SPSC Webinar & Café Sessions

Smart speakers and virtual assistants are part of our daily lives. Speech technology finds use not only in our homes, where it grants an unprecedented level of convenience, but also in health care, forensic science, banking, and payment systems; it is a dual-use technology. Consequently, we need to evolve our understanding of security & privacy for applications in speech communication.

SPSC features two formats.

  • Webinar — Monthly web seminars. The lectures range from keynote-style talks by senior researchers in industry and academia to practice talks for doctoral and master's defenses. We aim to keep the slot on the first Monday of the month (excluding bank holidays) at 10h Brussels time.
    Duration. 40-minute talk & up to 20 minutes of Q&A.
  • Café Journal Club — We meet for a café session to discuss current topics and papers related to our work. Bring a tea, coffee, or your favorite beverage, and let's exchange knowledge and ideas. Young researchers pick the topic for an interactive debate and bring ideas for collaboration: an excellent opportunity to meet other researchers in the field, expand your professional network, and find potential collaborators. Like the Webinar, we meet once per month, but near the end of the month. We are happy to receive suggestions for interesting papers and topics, and participants are welcome to volunteer as meeting leaders.


    Duration. 30-minute discussion followed by 10 minutes of idea exchange.

Outcome. The goal of the lecture talks is to understand other perspectives and discuss particular aspects of SPSC in its interdisciplinary setting. We need to leave our comfort zones to meaningfully anticipate the merger of speech technology with SPSC research areas, including user-interface design, law, cryptography, and the cognitive sciences.

Propose a talk. Simply drop us an email with speaker, date/time, title & abstract to: cafe@lists.spsc-sig.org

Open to everyone, including non-members (0 EUR fee). Please register to state your data privacy consent and to obtain the session URL.

Upcoming webinars

Webinar: 2024-03-04 (Mon) Anna-Maria Piskopani, University of Nottingham, UK — 10am Brussels time [registration]
    Voice modification in the creative sector and the risks to human rights
    Abstract [+]
In recent years, machine learning and generative AI have transformed speech technologies. The use of voice modification in the creative industries and media, even using the voices of deceased people, has sparked a legal and ethical debate about the limits of creative voice modification and alteration and the respect of individual rights. Voice, speech, and audio are protected in different contexts under several rights (identity, privacy, data protection, personality and private life, copyright, and intellectual property) in various jurisdictions (the UK, the USA, and European nations such as Germany and France). Generative AI companies now make it possible for anyone to easily craft realistic synthetic voices from just a few seconds of real speech. In this webinar we will discuss the ethical and legal issues raised by these technologies, analyse specific paradigms, and discuss the challenges of mitigating the risks to these legal rights.
Webinar: 2024-04-01 (Mon) Casandra Rusti, University of Southern California, USA, & Anna Leschanowsky, Fraunhofer IIS, Germany — 5pm Brussels time [registration]
    A Data Perspective on Ethical Challenges in Voice Biometrics Research
    Abstract [+]
Speaker recognition technology, integral to sectors like banking, education, recruitment, immigration, law enforcement, and healthcare, relies heavily on biometric data. However, the ethical implications and biases inherent in the datasets driving this technology have not been fully explored. This paper builds on our previous conference work by conducting a detailed metadata analysis of three key datasets — Switchboard, VoxCeleb, and ASVspoof — and addresses their collection methods and associated privacy risks. Through a longitudinal study spanning 2012 to 2021 and involving an analysis of around 700 papers, we investigate how community adoption of datasets has evolved alongside the widespread adoption of deep neural networks. Our study identifies the most commonly used datasets in the field, examines their usage patterns, and assesses their attributes that affect bias, fairness, and other ethical concerns. Our findings highlight the persistent dominance of certain datasets and reveal significant shifts in research focus and data practices. The study emphasizes the need for more inclusive, fair, and privacy-conscious methodologies in speaker recognition, spotlighting the biases, fairness challenges, and ethical considerations brought forth by our in-depth analysis.
Webinar: 2024-05-06 (Mon) Michele Panariello, EURECOM, France — 10am Brussels time [registration]
    Abstract [+]
Webinar: 2024-06-03 (Mon) Luke Richards, University of Maryland, Baltimore County, USA — 4pm Brussels time [registration]
    Abstract [+]
Webinar: 2024-07-01 (Mon) Lin Zhang, National Institute of Informatics, Japan — 10am Brussels time [registration]
    Partial Spoof
    Abstract [+]

Past webinars

Webinar: 2024-02-05 (Mon) Chao-Han Huck Yang, Nvidia Research — 9am Brussels time [slides, video (talk only)]
    Data Privacy and Evaluation Challenges of Large Language Model Based Speech Recognition
    Abstract [+]
Recently, large-scale pre-trained language models have demonstrated exceptional representational and generalization capabilities for solving unseen domain-specific tasks. As the amount of text data increases due to web-scale digitalization, the speech and audio research community has recently begun to incorporate these large-scale pre-trained models for acoustic processing. However, there are new challenges associated with pre-trained models that often utilize training data from undisclosed sources (e.g., Whisper, ChatGPT, and Bard), such as the risk of data leakage and unauthorized collection in speech evaluation tasks. In this talk, we will introduce these challenges and discuss recent advances in generative error correction for speech recognition and data unlearning algorithms.
Webinar: 2024-01-15 (Mon) Zhiyuan Yu, Washington University in St Louis, USA — 16h Brussels time [video (talk only)]
    Safeguarding Voices via Adversarial Examples: Defense and Way Forward in the Era of GenAI
    Abstract [+]
The recent advancement in generative AI is bringing paradigm shifts to society. Using contemporary AI-based voice synthesizers, it is now practical to produce speech that vividly mimics a specific person. While these technologies are designed to improve lives, they also pose significant risks of misuse, potentially harming voice actors' livelihoods and enabling financial scams. In recognition of such threats, existing strategies primarily focus on detecting synthetic speech. Complementary to these defenses, we propose AntiFake, a proactive approach that hinders unauthorized speech synthesis. AntiFake works by adding minor noises to speech samples, such that the attacker's synthesis attempts lead to audio that does not sound like the target speaker. To attain an optimal balance between sample quality, protection strength, and system usability, we propose adversarial optimization over the three-way trade-offs, guided by minimal user inputs. In this work, we make an initial step towards actively protecting our voices, and highlight the ongoing need for robust and sustainable defenses in this evolving landscape.

Webinar: 2023-11-06 (Mon) Jennifer Williams, University of Southampton, UK — 10h Brussels time
    AI Regulation and Speech Technology: Perspectives from the UK
    Abstract [+]
The UK is attempting to position itself as the global leader in AI regulation, and this talk comes just days after the AI Safety Summit, which gathered many stakeholders from around the world for an important conversation. Often missing from this conversation about AI regulation is our field: speech technology. Especially regarding speech privacy and security, as researchers we are left with many questions and opinions about how AI regulation may affect our freedom to innovate and conduct our research. For many of us, our livelihood depends on being able to conduct our research. This talk will present some of the issues and questions from a mostly UK perspective, including controversial issues that are (for now) still open for discussion and debate. Most importantly, it will emphasise why it is important for experts in our field to have a voice in regulation. Some of the issues we will discuss include open source, deepfakes, and privacy.
Webinar: 2023-10-02 (Mon) Yang Cao, Hokkaido University, Japan — 10h Brussels time [slides, video (talk only)]
    Towards Formalizing Speech Privacy for Speech Data Release and Analysis
    Abstract [+]
In the age of ubiquitous voice assistants, automated transcription services, and voice-enabled IoT devices, speech data has become a pivotal asset for companies and researchers. Alongside its value, however, lies a pressing concern: safeguarding individual privacy. Unlike traditional textual data, speech encapsulates not just content but also unique characteristics of an individual's voice, including emotions, accents, and biometric details. In this talk, I will provide a brief overview of the history of data privacy research, emphasizing the quest for a more principled definition of privacy. I will then survey recent studies that employ formal privacy definitions, like Differential Privacy, for privacy-preserving speech data release and analysis. Finally, I'll highlight unresolved challenges and potential research opportunities for formalizing speech privacy.
Webinar: 2023-07-04 Unoki Masashi, Japan Advanced Institute of Science and Technology, Japan — 10h Brussels time [slides]
    Introduction to audio/speech information techniques and future applications
    Abstract [+]
Audio information hiding (AIH) has recently attracted attention as a state-of-the-art technique for protecting copyrights and defending against attacks on and tampering with audio/speech content. The technique aims at embedding codes as watermarks in audio/speech content to protect copyrights, such that they are inaudible to and inseparable by users, and at detecting the embedded codes from watermarked signals. It also examines whether embedded codes can be robustly detected from watermarked signals (robust or fragile), whether they can be blindly detected (blind or non-blind), whether watermarked signals can be completely restored to the originals by removing the embedded codes (reversible or irreversible), and whether the method remains secure when its algorithms are made public (public or private methods). AIH methods, therefore, must satisfy some of the following five requirements to provide a useful and reliable form of watermarking: (a) inaudibility, (b) robustness, (c) blind detectability, (d) confidentiality, and (e) reversibility. In this talk, historical and typical AIH methods (including speech information hiding) are introduced and their drawbacks pointed out. Then our proposed methods based on human auditory characteristics (cochlear delay, adaptive phase modulation, singular spectrum analysis with a psychoacoustic model, formant enhancement, spread spectrum with LP residue) are introduced. In addition, current research issues such as speech spoofing and deepfake detection will also be introduced.
Webinar: 2023-06-12 (Mon) Helen Fraser, University of Melbourne, Australia — 10h Brussels time [video (talk only)]
    Enhancing forensic audio: What works, what doesn't - and how can we know?
    Abstract [+]
Recorded speech used as evidence in criminal trials is often of poor quality, making it hard for the court to understand the content. 'Enhancing' aims to improve the audio, via techniques such as noise reduction, gain control, or spectral subtraction (Maher, 2018). Results are usually evaluated by measuring the acoustic effects of the enhancing processes, and observing whether the audio sounds clearer, without unwanted artefacts. But what does it mean to say the audio 'sounds clearer'? Can we be sure that what sounds clear to the scientist will also sound clear to the judge and jury? Is it possible they might hear it ‘clearly’ but wrongly? This webinar gives new insights on these questions (Fraser 2019 gives a quick impression), and seeks assistance from the signal processing community in ensuring that the courts gain a reliable interpretation of poor-quality forensic audio.
Webinar: 2023-04-03 (Mon) Brij Mohan Lal Srivastava, Inria Startup Studio — 10h Brussels time
    Differentially Private Speaker Anonymization
    Abstract [+]
State-of-the-art speaker anonymization techniques operate by disentangling the speaker information from linguistic and prosodic attributes followed by re-synthesis based on the speaker embedding of a pseudospeaker. Prior research in the privacy community has shown that anonymization often provides brittle privacy protection, even less so any provable guarantee. In this talk, I will present our work where we showed that disentanglement is indeed not perfect: linguistic and prosodic attributes still contain speaker information. We remove speaker information from these attributes by introducing differentially private feature extractors, which are plugged into the state-of-the-art anonymization pipeline to generate differentially private utterances with a provable upper bound on the speaker information they contain.
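For readers who want a concrete feel for the differential-privacy mechanism this abstract alludes to, here is a minimal stand-alone sketch (not the speaker's actual pipeline): Laplace noise scaled to sensitivity/ε is added to a bounded scalar feature such as a normalized pitch value. The feature range and budget below are illustrative assumptions; a real anonymization system would apply such mechanisms inside learned feature extractors.

```python
import math
import random

def laplace_sample(scale: float) -> float:
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = random.random() - 0.5
    # Guard against log(0) in the (astronomically unlikely) edge case u == -0.5
    return -scale * math.copysign(1.0, u) * math.log(max(1.0 - 2.0 * abs(u), 1e-300))

def privatize(values, lo, hi, epsilon):
    """Clamp each value to [lo, hi] (bounding the sensitivity to hi - lo),
    then add Laplace noise calibrated for epsilon-differential privacy."""
    sensitivity = hi - lo
    scale = sensitivity / epsilon
    noisy = []
    for v in values:
        clamped = min(max(v, lo), hi)
        noisy.append(clamped + laplace_sample(scale))
    return noisy

# Toy example: pitch values (Hz) privatized with a per-value budget of eps = 1.0
pitches = [110.0, 220.0, 185.0]
noisy_pitches = privatize(pitches, lo=50.0, hi=400.0, epsilon=1.0)
```

Note the trade-off visible even in this toy: a tight budget (small ε) over a wide feature range yields large noise, which is exactly the utility/privacy tension the talk addresses.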
Webinar: 2023-03-06 (Mon) Xin Wang, National Institute of Informatics, Japan — 10h Brussels time [slides]
    Using vocoders to create training data for speech spoofing countermeasure
    Abstract [+]
A good training set for speech spoofing countermeasures requires diverse speech synthesis spoofing attacks, but generating such spoofed trials for a target speaker can be technically demanding. Instead of using full-fledged speech synthesis systems, we use vocoders to do copy-synthesis on bona fide utterances and use the produced utterances as spoofed training data. A copy-synthesized utterance can be treated as the output from a speech synthesis system with perfect acoustic modeling components. While it is not truly spoofed, the copy-synthesized data was found to be effective in training spoofing countermeasures, and the trained countermeasures generalized reasonably well to multiple unseen test sets including domain-mismatched ones.
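The data-preparation recipe in this abstract can be caricatured as follows. This is an illustrative sketch only: a toy phase-randomizing "vocoder" stands in for the real vocoders used in the work, and all names are made up for the example.

```python
import numpy as np

def toy_copy_synthesis(wave: np.ndarray, frame: int = 256) -> np.ndarray:
    """Stand-in for vocoder analysis/synthesis: keep each frame's magnitude
    spectrum but randomize the phase, then resynthesize. A real setup would
    use an actual vocoder (conventional or neural) instead."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(wave)
    for start in range(0, len(wave) - frame + 1, frame):
        spec = np.fft.rfft(wave[start:start + frame])
        phase = rng.uniform(-np.pi, np.pi, size=spec.shape)
        out[start:start + frame] = np.fft.irfft(np.abs(spec) * np.exp(1j * phase), n=frame)
    return out

def build_training_pairs(bona_fide_waves):
    """Label bona fide audio 0 and its copy-synthesized counterpart 1,
    yielding spoofed training data without a full TTS/VC system."""
    data = []
    for wave in bona_fide_waves:
        data.append((wave, 0))                      # bona fide
        data.append((toy_copy_synthesis(wave), 1))  # treated as spoofed
    return data

# One bona fide utterance stands in for a corpus: a 220 Hz tone at 16 kHz
waves = [np.sin(2 * np.pi * 220 * np.arange(4096) / 16000)]
pairs = build_training_pairs(waves)
```

The point of the recipe is that the "spoofed" class is cheap to produce at scale, since no target-speaker synthesis system has to be trained.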
Webinar: 2023-02-06 (Mon) You (Neil) Zhang, University of Rochester — 16h Brussels time [slides]
    Generalizing Voice Presentation Attack Detection to Unseen Synthetic Attacks
    Abstract [+]
Automatic Speaker Verification (ASV) systems aim to verify a speaker’s claimed identity through voice. However, voice can be easily forged with replay, text-to-speech (TTS), and voice conversion (VC) techniques, which may compromise ASV systems. Voice presentation attack detection (PAD) is developed to improve the reliability of speaker verification systems against such spoofing attacks. One main issue of voice PAD systems is their ability to generalize to unseen synthetic attacks, i.e., synthesis methods that are not seen during training of the presentation attack detection models. We propose one-class learning, in which the model compacts the distribution of learned representations of bona fide speech while pushing away spoofing attacks. We then propose speaker attractor multi-center one-class learning (SAMO), a novel representation learning framework for voice anti-spoofing that clusters bona fide speech around a number of speaker attractors and pushes spoofing attacks away from all the attractors in a high-dimensional embedding space. Our proposed system outperforms existing state-of-the-art single systems, demonstrating the effectiveness of our methods.
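A heavily simplified version of the one-class objective described in this abstract might look like the sketch below (illustrative only: a single center with cosine scores and assumed margin values, whereas the actual SAMO formulation uses multiple learned speaker attractors).

```python
import numpy as np

def cosine_scores(embeddings: np.ndarray, center: np.ndarray) -> np.ndarray:
    """Cosine similarity of each embedding to a single bona fide center."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = center / np.linalg.norm(center)
    return e @ c

def one_class_loss(scores, labels, m_bona=0.9, m_spoof=0.2, alpha=20.0):
    """One-class-style loss: bona fide (label 0) scores are pushed above
    m_bona; spoofed (label 1) scores are pushed below m_spoof."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    margins = np.where(labels == 0, m_bona, m_spoof)
    signs = np.where(labels == 0, 1.0, -1.0)
    return float(np.mean(np.log1p(np.exp(alpha * (margins - scores) * signs))))

# A well-separated toy batch incurs a lower loss than a poorly separated one
good = one_class_loss([0.95, 0.99, 0.05], [0, 0, 1])
bad = one_class_loss([0.50, 0.40, 0.80], [0, 0, 1])
```

The asymmetric margins capture the core one-class idea: bona fide speech is compacted into a tight region while spoofed inputs, whatever their synthesis method, are pushed out of it.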

Webinar: 2022-12-05 (Mon) Guglielmo Maccario, UNINT University of the International Studies of Rome & Maurizio Naldi, Università di Roma LUMSA — 17h Brussels time [slides]
    Privacy and smart speakers
    Abstract [+]
Since the release of the first Amazon Echo in the United States, academics and practitioners have been discussing the privacy risks associated with smart speakers, from acceptance models and perception surveys to privacy-protecting features and experiments with prototypes. In this talk, we will provide an overview of the literature concerning privacy issues in smart speakers. We will first look at the meta-data of papers (publication outlets, geographical and time distribution), and then propose a classification of studies based on their topic within the overall theme of privacy. We will also share some ongoing research results on the relevance of privacy as emerging from a text mining analysis of customers’ opinions collected on Amazon UK and some preliminary results from a similar investigation carried out in South Korea.
Webinar: 2022-11-07 (Mon) Francisco Teixeira, University of Lisbon — 16h Brussels time [slides, video (talk only)]
    Towards End-to-End Private Automatic Speaker Recognition
    Abstract [+]
The development of privacy-preserving automatic speaker verification systems has been the focus of a number of studies with the intent of allowing users to authenticate themselves without risking the privacy of their voice. However, current privacy-preserving methods assume that the template voice representations (or speaker embeddings) used for authentication are extracted locally by the user. This poses two important issues: first, knowledge of the speaker embedding extraction model may create security and robustness liabilities for the authentication system, as this knowledge might help attackers in crafting adversarial examples able to mislead the system; second, from the point of view of a service provider the speaker embedding extraction model is arguably one of the most valuable components in the system and, as such, disclosing it would be highly undesirable. In this work, we show how speaker embeddings can be extracted while keeping both the speaker's voice and the service provider's model private, using Secure Multiparty Computation. Further, we show that it is possible to obtain reasonable trade-offs between security and computational cost. This work is complementary to those showing how authentication may be performed privately, and thus can be considered as another step towards fully private automatic speaker recognition.
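To give a flavor of the secure multiparty computation primitive underlying this line of work, here is a textbook additive-secret-sharing sketch (not the authors' protocol; the field modulus and values are arbitrary for illustration).

```python
import random

PRIME = 2_147_483_647  # field modulus for additive secret sharing (2^31 - 1)

def share(value: int, n_parties: int = 2):
    """Split an integer into n additive shares modulo PRIME; any subset of
    fewer than n shares is uniformly random and reveals nothing."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine all shares to recover the secret."""
    return sum(shares) % PRIME

# Each party can add its shares of two secrets locally; reconstruction then
# yields the sum of the secrets, without either secret being revealed.
a_shares, b_shares = share(1234), share(4321)
sum_shares = [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]
```

Real protocols for embedding extraction need multiplications as well (e.g. via Beaver triples), which is where most of the computational cost discussed in the talk comes from.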
Webinar: 2022-09-05 (Mon) Pardis Emami-Naeini, Duke University — 16h Brussels time [slides]
    Designing an Informative and Usable Security and Privacy Label for IoT Devices
    Abstract [+]
IoT consumers are concerned about the privacy and security of their smart devices, but they cannot do much about it at the time of purchase. This is due to the unavailability of such information when making a purchase decision. Therefore, we decided to bring this much-needed transparency to consumers at the time of purchase. In this talk, I will discuss how we developed an informative and usable privacy and security label for IoT devices by conducting a series of studies and incorporating inputs from thousands of consumers and experts.
Webinar: 2022-08-01 (Mon) Raphael Franck Olivier, Carnegie Mellon University — 16h Brussels time
    Targeted and transferable adversarial perturbations on self-supervised ASR models
    Abstract [+]
The transferability of adversarial attacks between different models is a critical factor in estimating the danger they pose to real-life systems. On Automatic Speech Recognition (ASR) models, past work has shown that targeted optimization attacks display extremely limited transferability. However, these works were largely conducted on now-outdated models. We find that this property no longer holds for more recent neural architectures, in particular transformers pretrained with Self-Supervised Learning (SSL). With a thorough ablation study on factors such as model performance and number of parameters, we show that SSL pretraining makes models more vulnerable to attacks transferred from other SSL-pretrained models. This remains true regardless of the SSL objective used and even, to an extent, of the training data. We release a small adversarial dataset that can fool, with varying but always non-trivial success, all publicly available SSL-pretrained ASR models. We also propose an explanation for this intriguing property and discuss its important implications for the security of ASR systems.
Webinar: 2022-07-11 (Mon) Jennifer Williams, University of Southampton — 10h Brussels time
    Developing Privacy-Preserving Audio Capabilities for Smart Buildings
    Abstract [+]
Smart buildings have the potential to not only reduce the consumption of resources, such as electricity, but also to vastly improve the quality of life for individuals. Buildings of the future may be designed from the ground up or existing buildings may be retrofitted with certain capabilities that contribute to net-zero carbon goals and increased comfort. An important aspect of creating "smart" buildings includes the ability to sense multiple aspects of human occupancy, including how many people are inside, their personal comfort preferences, how they are distributed throughout a building, and their levels of movement or activity. This information can then be used to optimise energy consumption, e.g., by regulating heating, ventilation or lighting. Many types of sensors already exist but some are expensive or do not result in accurate people counts or provide fine-grained decision information. In this talk, we will discuss ongoing work relating to the development of privacy-preserving audio analysis for occupancy-detection. We will examine some initial findings, discuss some techniques for preserving privacy, and also discuss further work for equipping smart buildings with audio. The adoption of audio in smart buildings can have wide-ranging benefits from providing accessibility services and emergency services, to reduced energy costs, and even specialist capabilities such as automated meeting recordings and secure access.
Webinar: 2022-06-13 (Mon) Sneha Das, Technical University of Denmark — 10h Brussels time [slides]
    Influence of loss functions on the latent representation of speech emotions
    Abstract [+]
Speech emotion recognition (SER) refers to the technique of inferring the emotional state of an individual from speech signals. SER continues to garner interest due to its wide applicability. Although the domain is mainly founded on signal processing, machine learning, and deep learning, generalizing across languages remains a challenge. Developing generalizable and transferable models is critical due to a lack of sufficient resources, in terms of data and labels, for languages beyond the most commonly spoken ones. This talk will provide insights into the impact of loss functions on the latent space and its subsequent influence on the transferability of speech emotion models to resource-constrained regimes.
Webinar: 2022-04-04 (Mon) Nicolas Müller, Fraunhofer AISEC — 10h Brussels time [slides]
    Text-to-Speech Synthesis and the Threat of Audio-Deepfakes
    Abstract [+]
Text-to-speech (TTS), i.e., the synthesis of human voice by AI, has many benign applications, but it also enables misuse through so-called deepfakes. This talk will give a brief technical overview of audio synthesis, and then cover ways to deal with the threat of TTS-enabled deepfakes.
Webinar: 2022-02-07 (Mon) Hung-Yi Lee & Haibin Wu, National Taiwan University — 10h Brussels time [slides]
    Towards Universal Self-supervised Model for Speech Processing
    Abstract [+]
Self-supervised learning (SSL) has been shown to be vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art results on various tasks with minimal adaptation. However, existing work on SSL has explored only a limited number of speech processing tasks, obscuring the generalizability and re-usability of SSL models across speech processing tasks. This talk will first introduce the Speech processing Universal PERformance Benchmark (SUPERB), a leaderboard for benchmarking the performance of SSL models across a wide range of speech processing tasks. The results on SUPERB demonstrate that SSL representations show competitive generalizability across speech processing tasks. I will also share some ongoing research directions based on SUPERB.

Webinar: 2021-12-06 (Mon) Ingo Siegert, Otto-von-Guericke-University Magdeburg — 10h Brussels time
    First SPSC Symposium - Review and Outlook
    Abstract [+]
The first SPSC Symposium took place online on November 10-12, 2021. It was meant to bring together people with various backgrounds and perspectives on speech communication to present and discuss hot topics in security and privacy research. There were three invited talks, two workshops, one PhD symposium, and nearly 20 poster presentations. In this talk, Ingo and Karla will give some insights into the organizational process and their main takeaways. The audience is encouraged to actively participate by providing feedback and bringing in ideas for the next Symposium, planned to take place at Interspeech 2022.
Webinar: 2021-11-01 (Mon) Meeri Haataja, Saidot — 10h Brussels time [slides are available upon request]
    Ethical considerations in voice and speech technologies
    Abstract [+]
Meeri Haataja is the CEO and co-founder of Saidot, a Finland-based company with a mission of building responsible AI ecosystems, helping companies assess and document the ethical and legal aspects of their AI systems. The platform is used by major public and private organizations in deploying systematic AI governance and transparency via AI registers. In her presentation, Meeri will share best practices on AI governance and how to apply systematic ethical assessment in AI development projects. She will introduce how the platform helps AI teams collaborate on AI ethics and governance, recognizing the growing industry need to address the specificities of varying AI use cases and industries. After Meeri's presentation, we will discuss which specific aspects we should embed in AI governance methodologies when assessing use cases for voice technologies and speech interfaces, and how the academic speech community could contribute to creating standardized assessment and documentation approaches capable of addressing the ethical considerations specific to voice and speech technologies.
Webinar: 2021-10-04 (Mon) Tore Knudsen, Artist at Noodle — 16h Brussels time [video (talk only)]
    Project Alias - A parasite for the surveillance age.
    Abstract [+]
With Project Alias, Tore Knudsen and Bjørn Karmann from Denmark demonstrated a simple yet effective way to take back control over our own private sphere, which earned them the STARTS Prize of the European Commission in 2019. Today, Tore will show us the process and thoughts behind the project, along with other examples of his speculative design work exploring our relationship with technology, data, and privacy.
Webinar: 2021-09-06 (Mon) Christoph Lutz, Norwegian Business School — 10h Brussels time [slides, video (talk only)]
    Privacy and Smart Speakers - A Multi-Dimensional Approach
    Abstract [+]
Over the last few years, smart speakers such as Amazon Echo and Google Home have become increasingly present in many households. Yet, privacy remains a prominent concern in the public discourse about smart speakers, as well as in the nascent academic literature. We argue that privacy in the context of smart speakers is more complex than in other settings due to smart speakers' specific technological affordances and also the axial relationships between users, the device, device manufacturers, application developers, and other third parties such as moderation contractors and data brokers. With survey data from Amazon Echo and Google Home users in the UK, we explore users' privacy concerns and privacy protection behaviors related to smart speakers. We rely on a contextual understanding of privacy, assessing the prevalence of seven distinct privacy concern types as well as three privacy protection behaviors. The results indicate that concerns about third parties, such as contractors listening to smart speaker recordings, are most pronounced. Privacy protection behaviors are uncommon but partly affected by privacy concerns and motives such as social presence and utilitarian benefits. Taken together, our research paints a picture of privacy pragmatism or privacy cynicism among smart speaker users.
Webinar: 2021-08-02 (Mon) Andreas Nautsch, vitas.ai — 10h Brussels time [slides, video (talk only)]
    Metrics in VoicePrivacy and ASVspoof Challenges
    Abstract [+]
Security & privacy are essential to human-machine interaction; so is the assessment of countermeasures & safeguards. Whereas conventional machine learning systems are evaluated for their recognition performance, projecting the same metrics onto security & privacy contexts does not suffice. For security, when countermeasures are added on top, it is misleading to consider either the original system or the countermeasure in isolation: a new system is composed which needs to be evaluated in its entirety. For privacy, where an add-on mindset is inadequate to fulfill by-design and by-default demands, assessment aims at estimating the capacity of an adversary to infer sensitive information from data, without making further assumptions about that adversary. This talk reflects on both, using voice biometrics in speech technology as an example. The ASVspoof and VoicePrivacy challenges investigate security & privacy for speech technology. While ASVspoof aims at fake audio detection to protect voice biometrics from attacks through synthetic and replayed speech, VoicePrivacy aims to suppress biometric factors in audio data when only the recognition of what was said matters. Both challenges gather new communities to benchmark solutions with common protocols and datasets, from first steps to advanced ones. For this, task definition is as relevant as the development of metrics, and depending on the research challenge, new metrics have been introduced. The tandem detection cost function “t-DCF” and the zero-evidence biometric recognition assessment “ZEBRA” frameworks are presented, illustrated, and explained as ways to quantify security & privacy. As an “add-on,” the t-DCF framework extends the DCF metric, which has been used for over two decades in the evaluation of voice biometrics. By contrast, and not as an “add-on,” the ZEBRA framework is motivated by Shannon’s “perfect secrecy” and the validation methodology of automated systems in forensic sciences.
Both frameworks indicate directions for developing future capacities in better characterizing the security & privacy tasks at hand.
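To make the metric discussion concrete, here is a minimal sketch of the conventional detection cost function (DCF) that the t-DCF framework extends. The cost and prior values are illustrative defaults, not parameters from the challenges themselves.

```python
# Illustrative computation of the conventional DCF: a prior-weighted sum of
# miss and false-alarm costs. Parameter values here are hypothetical.

def dcf(p_miss, p_fa, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Weighted detection cost for given miss and false-alarm rates."""
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

def normalized_dcf(p_miss, p_fa, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Normalize by the cost of the best trivial system (always accept or
    always reject), so values above 1 mean 'worse than doing nothing'."""
    default = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return dcf(p_miss, p_fa, p_target, c_miss, c_fa) / default

print(normalized_dcf(p_miss=0.05, p_fa=0.02))  # → 2.03
```

The t-DCF generalizes this idea to the tandem of a spoofing countermeasure and a speaker verifier, which is why evaluating either subsystem alone can be misleading.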
Webinar: 2021-07-12 (Mon) Ivo Emanuilov & Katerina Yordanova, KU Leuven — 11h Brussels time [slides, video (talk only)]
    "Open the pod bay doors, HAL": legal limitations on the use of biometric data for emotion detection and speech recognition in human-robot collaboration on the smart shop floor
    Abstract [+]
Industrial applications of AI, IoT, augmented reality and digital twins in connected, distributed and scalable Factories of the Future bear the promise of new, more efficient and better optimized manufacturing. In the future, human workers and robots would no longer work in separation. Cobots would operate alongside humans with mobile robots assuming more and increasingly independent tasks on the shop floor. The expected optimization gains, however, would have to be balanced against the fundamental rights of workers. Indeed, the most promising use cases for human-robot collaboration entail also the most privacy-invasive practices, such as recognition of speech, gait analysis, and emotion detection. The rollout of these new scenarios would depend on balancing the values of human safety, IT and OT security and workplace privacy. In this talk, we push several selected use cases to their point of collision with EU privacy and data protection law and attempt to salvage as much as possible from the wreckage.
Webinar: 2021-06-07 (Mon) Ingo Siegert, Otto-von-Guericke-University Magdeburg — 10h Brussels time [slides, video (talk only)]
    Speech Behavior Matters - Automatically Detect Device Directed Speech for the application of addressee-detection
    Abstract [+]
Voice assistants are becoming more and more popular and are changing the way people interact. More people come into contact with them, and using them in daily routines has become common. Unfortunately, most systems are still little more than voice-controlled remote controls, and conversations with them still feel uncomfortable, especially since conversation activation needs a wake-word, which remains error-prone. This talk first discusses examples of errors in conversation initiation and depicts the state of the art in the research field of addressee detection, with a special focus on prosodic differences in addressee behavior. Afterwards, the speaker's own analyses of addressee behavior toward modern voice assistants are presented in two different settings: a) interactions with Amazon's Alexa in a lab setting, using a dataset of similar dialog complexity between human-human interaction (HHI) and human-computer interaction (HCI). Subsequently, analyses of self-reports and annotator feedback on speaking behavior are discussed, followed by an overview of different recognition experiments aimed at finally building an (intelligent) addressee-detection framework based on prosodic characteristics. The talk concludes by mentioning possible future research directions and open issues in experimental conditions.
Webinar: 2021-05-03 (Mon) Clara Hollomey, Austrian Academy of Sciences — 10h Brussels time
    Time-frequency analysis of speech signals: from algorithms to human perception
    Abstract [+]
Time-frequency analysis provides a wealth of information about audio signals, such as their pitch, formant resonances, and the presence of noisy components in the sound. Also in the human inner ear, sound is spread out according to frequency before being further processed along the auditory pathway. Consequently, time-frequency analysis forms the basis of most speech detection, classification, and discrimination tasks. In this talk, we outline advantages and constraints of common time-frequency analysis algorithms, such as the discrete Gabor, Fourier, and wavelet transforms, with regard to speech signals. We describe how those algorithms can be applied to modelling human speech processing and give examples of typical applications, such as speech denoising, pitch-shifting, and estimating short-term speech intelligibility. The algorithms, models, and their applications are illustrated by demos drawn from the Large Time Frequency Analysis Toolbox (LTFAT) and the Auditory Modeling Toolbox (AMT). While the LTFAT provides algorithmically efficient means for audio signal representation and modification, the AMT specifically targets speech signal analysis in the human auditory system. The LTFAT and the AMT are both open-source and can be freely downloaded from ltfat.github.io and http://amtoolbox.sourceforge.net/, respectively.
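As a minimal illustration of the windowed-transform idea underlying these toolboxes: the talk's demos use LTFAT/AMT (MATLAB/Octave), but the same basic short-time Fourier analysis can be sketched with SciPy, using a pure tone as a stand-in for a speech signal.

```python
# A toy time-frequency analysis: window the signal and take the DFT of each
# frame, yielding a spectrogram-like representation.
import numpy as np
from scipy.signal import stft

fs = 16000                        # sampling rate (Hz)
t = np.arange(fs) / fs            # one second of signal
x = np.sin(2 * np.pi * 440 * t)   # a 440 Hz tone standing in for speech

# f: frequency bins, times: frame centers, Z: complex time-frequency matrix
f, times, Z = stft(x, fs=fs, nperseg=512)

# The spectral peak of a frame in the middle of the signal should sit near
# 440 Hz (up to the bin resolution of fs / nperseg = 31.25 Hz).
peak_hz = f[np.argmax(np.abs(Z), axis=0)]
print(peak_hz[len(peak_hz) // 2])
```

For speech, the same frame-wise magnitudes would reveal pitch harmonics and formant resonances rather than a single peak.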
Webinar: 2021-04-12 (Mon) Olya Kudina, TU Delft — 10h Brussels time [slides are available upon request]
    Ethical considerations of the algorithmic processing of language and speech
    Abstract [+]
In this talk, Olya will discuss the ethical implications of voice-based interfaces, such as Siri, Google Home or Alexa. She will consider them from the theoretical perspective of technologies-as-mediators and ethics-as-accompaniment to show how voice-based technologies help to shape our lives. Olya will discuss specific instances of how they foster moral perceptions, choices and values, and in parallel present responses from the creative user and design communities. Together, this will provide clues on how to shape meaningful interactions with voice assistants, individually and collectively.
Webinar: 2021-03-01 (Mon) Yefim Shulman, Tel Aviv University — 10h Brussels time [slides, video (talk only)]
    Promised but Not Guaranteed: Understanding People's Ability to Control Their Personal Information
    Abstract [+]
The consensus in legal frameworks, such as the GDPR and CCPA, states that people (data subjects) should be able to exercise control over their personal information. Yet, having control over what happens to their personal information in practice remains a challenging endeavor for the data subjects. Based on a conceptual control theoretic analysis and select empirical findings, my talk will discuss what control over personal information may require and how it may be improved.
Webinar: 2021-02-01 (Mon) Lara Gauder & Leonardo Pepino, University of Buenos Aires — 16h Brussels time [slides, video (talk only)]
    A Study on the Manifestation of Trust in Speech
    Abstract [+]
Research has shown that trust is an essential aspect of human-computer interaction, determining the degree to which a person is willing to use a system. Predicting the level of trust that a user has in the skills of a certain system could be used to correct potential distrust by having the system take relevant measures, for example, explaining its actions more thoroughly. In our research project, we have explored the feasibility of automatically detecting the level of trust that a user has in a virtual assistant (VA) based on their speech. For this purpose, we designed a protocol for collecting speech data, consisting of an interactive session where the subject is asked to respond to a series of factual questions with the help of a virtual assistant, which they were led to believe was either very reliable or unreliable. We collected a speech corpus in Argentine Spanish and found that the reported level of trust was effectively elicited by the protocol. Preliminary results using random forest classifiers showed that the subject’s speech can be used to detect which type of VA they were using with an accuracy of up to 76%, compared to a random baseline of 50%.
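The classification setup described above can be sketched as follows. This is not the authors' pipeline: the features here are synthetic stand-ins for prosodic measurements (e.g. pitch mean, energy, speaking rate), and the two "VA conditions" are simulated by shifting their distributions.

```python
# Sketch of a random-forest classifier separating "reliable-VA" from
# "unreliable-VA" sessions based on speech features. All data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Binary condition label per session, and 3 hypothetical prosodic features
# whose distributions differ slightly between the two conditions.
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 3)) + 1.0 * y[:, None]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"held-out accuracy: {acc:.2f}")
```

As in the study, the interesting comparison is against the 50% chance baseline: any accuracy reliably above it indicates that the condition leaves a measurable trace in the features.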
Webinar: 2021-01-11 (Mon) Tom Bäckström, Aalto University — 10h Brussels time [slides, video (talk only)]
    Code of Conduct for Data Management in Speech Research - Starting the process
    Abstract [+]
For anyone working with speech it should, by now, be obvious that we need to take care of the rights of the people involved. The question is only: how do we do that? Data management in an ethically responsible manner is one aspect of this issue, and it touches all researchers; I have myself struggled many times with designing data management plans, and I have received many requests that the SPSC SIG should take a shot at this. Therefore I think we need community-wide guidelines on how to handle data management, especially with respect to privacy. For example, what is an acceptable level of anonymization of data? When do we need to anonymize? When do we need to limit access to data by, say, requiring the signature of a contract? What kind of expiry dates should data have? To what extent should data use be checked in the review processes of conferences, journals and grant applications? And so on. The objective of this session is to set up the process through which we create a code of conduct. That is, my intention is only to discuss how we want to make decisions about the code of conduct, not even to attempt writing a first draft. In this session I'll thus present an outline of a roadmap for how we can create a code of conduct. The session will consist of a short pre-recorded presentation, where I present my initial draft of the process. After the presentation, I invite everyone to join a discussion with webcams on. The session is not recorded, but I'll share my notes with the participants.

Webinar: 2020-12-07 (Mon) Birgit Brüggemeier, Fraunhofer Institute for Integrated Circuits IIS — 10h Brussels time [slides, video (talk only)]
    Conversational Privacy – Communicating Privacy and Security in Conversational User Interfaces
    Abstract [+]
In 2019, media scandals raised awareness about privacy and security violations in Conversational User Interfaces (CUI) like Alexa, Siri and Google. Users report that they perceive CUI as “creepy” and that they are concerned about their privacy. The General Data Protection Regulation (GDPR) gives users the right to control the processing of their data, for example by opting out or requesting deletion, and it gives them the right to obtain information about their data. Furthermore, the GDPR advises seamless communication of user rights, which is currently poorly implemented in CUI. This talk presents a data collection interface, called Chatbot Language (CBL), that we use to investigate how privacy and security can be communicated in a dialogue between user and machine. We find that conversational privacy can affect user perceptions of privacy and security positively. Moreover, user choices suggest that users are interested in obtaining information on their privacy and security in dialogue form. We discuss implications and limitations of this research.
Webinar: 2020-11-02 (Mon) Rainer Martin & Alexandru Nelus, Ruhr-Universität Bochum — 10h Brussels time [slides, video (talk only)]
    Privacy-preserving Feature Extraction and Classification in Acoustic Sensor Networks
    Abstract [+]
In this talk we present a brief introduction to acoustic sensor networks and to feature extraction schemes that aim to improve the privacy vs. utility trade-off for audio classification in acoustic sensor networks. Our privacy enhancement approach consists of neural-network-based feature extraction models which aim to minimize undesired extraneous information in the feature set. To this end, we present adversarial, siamese and variational information feature extraction schemes in conjunction with neural-network-based classification (trust) and attacker (threat) models. We consider and compare schemes with explicit knowledge of the threat model and without such knowledge. For the latter, we analyze and apply the variational information approach in a smart-home scenario. It is demonstrated that the proposed privacy-preserving feature representation generalizes well to variations in dataset size and scenario complexity while successfully countering speaker identification attacks.
Webinar: 2020-10-05 (Mon) Nick Gaubitch, Pindrop — 10h Brussels time [video (talk only)]
    Voice Security and Why We Should Care
    Abstract [+]
After a couple of decades of somewhat slow development, voice technologies have once again gained momentum. Much of this has been driven by large leaps in speech and speaker recognition performance and, consequently, the development of many voice interfaces. Some notably successful examples of current applications of voice are the Amazon Echo and Apple Siri, but we also see an increasing number of institutions that make use of voice recognition to replace more traditional customer identification methods. While much of this development is exciting for speech and audio processing research, it also creates new and significant challenges in security and privacy. Furthermore, new technologies for various forms of voice modification and synthesis are on the rise, which only exacerbates the problem.

In this talk we will first introduce Pindrop and the company’s mission in the world of voice security and we will take a glimpse into the global fraud landscape of call centres, which motivates the work that we do. Next, we will take a deeper dive into the specific topic of voice modification and some related research results. Finally, we will provide an outlook into the future of voice and voice security.
Webinar: 2020-09-07 (Mon) Pablo Pérez Zarazaga, Aalto University — 10h Brussels time [slides, video (talk only)]
    Acoustic Fingerprints for Access Management in Ad-Hoc Sensor Networks
    Abstract [+]
Voice user interfaces can offer intuitive interaction with our devices, but the usability and audio quality could be further improved if multiple devices could collaborate to provide a distributed voice user interface. To ensure that users' voices are not shared with unauthorized devices, it is however necessary to design an access management system that adapts to the users' needs. Prior work has demonstrated that a combination of audio fingerprinting and fuzzy cryptography yields a robust pairing of devices without sharing the information that they record. However, the robustness of these systems is partially based on the extensive duration of the recordings required to obtain the fingerprint. This talk analyzes methods for the robust generation of acoustic fingerprints in short periods of time, enabling the responsive pairing of devices according to changes in the acoustic scene; these methods can be integrated into other typical speech processing tools.
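A toy version of the fingerprint-extraction step can be sketched as follows: two devices in the same room derive matching bit strings from the temporal evolution of band energies in their recordings. This is a simplified stand-in for the methods in the talk (real systems add fuzzy cryptography on top so the raw audio is never shared), and the scheme and parameters here are illustrative.

```python
# Toy acoustic fingerprint: one bit per (frame, band), set when that band's
# energy rises from one frame to the next. Co-located devices hearing the
# same scene should produce largely matching bits.
import numpy as np

def fingerprint(signal, n_frames=32, n_bands=16):
    frames = np.array_split(signal, n_frames)
    # Per-frame band energies from the magnitude spectrum.
    energies = np.array([
        np.add.reduceat(np.abs(np.fft.rfft(f)) ** 2,
                        np.linspace(0, len(f) // 2, n_bands, dtype=int))
        for f in frames
    ])
    return (np.diff(energies, axis=0) > 0).astype(int).ravel()

rng = np.random.default_rng(1)
ambient = rng.normal(size=16000)                   # shared acoustic scene
mic_a = ambient + 0.02 * rng.normal(size=16000)    # device A + sensor noise
mic_b = ambient + 0.02 * rng.normal(size=16000)    # device B + sensor noise

agreement = np.mean(fingerprint(mic_a) == fingerprint(mic_b))
print(f"bit agreement: {agreement:.2f}")
```

An unauthorized device in a different room would record an independent signal and agree on only about half the bits, which is what lets the fingerprint gate access.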
Café: 2020-08-27 (Thu) Catherine Jasserand, Rijksuniversiteit Groningen — 16h Brussels time [slides]
    What is speech/voice from a data privacy perspective: Insights from the GDPR
    Abstract [+]
Catherine Jasserand, a postdoctoral researcher on privacy issues raised by biometric technologies, will discuss the notions of speech and voice from a data privacy perspective. If the GDPR mentions neither voice data nor speech data among the examples of personal data, it applies to both types of data when they relate to an identifiable or identified individual. The talk will be the opportunity to explain terminological issues (including what ‘identification’ means in the context of data protection).
Webinar: 2020-08-03 (Mon) Qiongxiu Li, Aalborg Universitet — 10h Brussels time [slides, video (talk only)]
    Privacy-Preserving Distributed Optimization via Subspace Perturbation: A General Framework
    Abstract [+]
As the modern world becomes increasingly digitized and interconnected, distributed signal processing has proven to be effective in processing the resulting large volumes of data. However, a main challenge limiting the broad use of distributed signal processing techniques is the issue of privacy in handling sensitive data. To address this privacy issue, we propose a novel yet general subspace perturbation method for privacy-preserving distributed optimization, which allows each node to obtain the desired solution while protecting its private data. In particular, we show that the dual variables introduced in each distributed optimizer will not converge in a certain subspace determined by the graph topology. Additionally, the optimization variable is ensured to converge to the desired solution, because it is orthogonal to this non-convergent subspace. We therefore propose to insert noise in the non-convergent subspace through the dual variable such that the private data are protected, and the accuracy of the desired solution is completely unaffected. Moreover, the proposed method is shown to be secure under two widely used adversary models: passive and eavesdropping. Furthermore, we consider several distributed optimizers such as ADMM and PDMM to demonstrate the general applicability of the proposed method. Finally, we test the performance through a set of applications. Numerical tests indicate that the proposed method is superior to existing methods in terms of several criteria, such as estimation accuracy, privacy level, communication cost and convergence rate. [pre-print]
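A simplified numerical illustration of the core intuition: in consensus-type distributed optimizers, the primal solution can be insensitive to noise placed in the dual variables. The sketch below is a toy stand-in, not the paper's subspace-perturbation construction: three nodes compute the average of their private values via ADMM, with dual variables initialized as large random noise, and the consensus value still converges to the true average.

```python
# Toy ADMM consensus averaging: minimize sum_i 0.5*(x_i - a_i)^2 subject to
# x_i = z, with dual variables deliberately initialized as large noise.
import numpy as np

a = np.array([3.0, -1.0, 5.0])     # private values, one per node
rho = 1.0

rng = np.random.default_rng(42)
u = 10.0 * rng.normal(size=3)      # noisy (scaled) dual variables
x = np.zeros(3)
z = 0.0

for _ in range(200):
    # Local primal update: argmin 0.5*(x_i - a_i)^2 + (rho/2)*(x_i - z + u_i)^2
    x = (a + rho * (z - u)) / (1.0 + rho)
    z = np.mean(x + u)             # global consensus update
    u = u + x - z                  # dual update

print(z)  # converges to mean(a) despite the noisy duals
```

In the paper's actual method, the noise is confined to the subspace in which the duals never converge, which is what makes the perturbation both privacy-protecting and exactly accuracy-preserving.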
Webinar: 2020-07-06 (Mon) Francisco Teixeira, INESC-ID / IST, Univ. of Lisbon — 10h Brussels time [slides, video (talk only)]
    Privacy in Health Oriented Paralinguistic and Extralinguistic Tasks
    Abstract [+]
The widespread use of cloud computing applications has created a society-wide debate on how user privacy is handled by online service providers. Regulations such as the European Union's General Data Protection Regulation (GDPR) have put forward restrictions on how such services are allowed to handle user data. The field of privacy-preserving machine learning is a response to this issue that aims to develop secure classifiers for remote prediction, where both the client's data and the server's model are kept private. This is particularly relevant in the case of speech, and concerns not only the linguistic contents but also the paralinguistic and extralinguistic information that may be extracted from the speech signal.
In this talk we provide a brief overview of the current state of the art in paralinguistic and extralinguistic tasks for health, a major application area in terms of privacy concerns, along with an introduction to cryptographic methods commonly used in privacy-preserving machine learning. These lay the groundwork for a review of the state of the art of privacy in paralinguistic and extralinguistic tasks for health applications. With this talk we hope to raise awareness of the problem of preserving privacy in these tasks and to provide an initial background for those who aim to contribute to this topic.