SPSC Webinar & Café Sessions

Smart speakers and virtual assistants are part of our daily lives. Their applications are used not only in our homes, where speech technology grants an unprecedented level of convenience, but also in health care, forensic sciences, banking, and payment methods; speech technology has dual-use applications. Consequently, we need to evolve our understanding of security and privacy for applications in speech communication.

SPSC features two formats.

Webinar. Once-a-month web seminars. The lectures range from keynote-style talks by senior experts in industry and academia to practice talks for doctoral and master's defenses. We try to keep the time slot on the first Monday of each month (unless it is a bank holiday) at 10h CET, but we are flexible with the scheduling (for both day and time).
Duration. 40 minutes talk & up to 20 minutes Q&A.
Café Journal Club. We meet for a Café session and discuss current topics and papers related to our work. Bring a tea, a coffee, or your favorite beverage, and let's exchange knowledge and ideas. Young researchers decide the topic for an interactive debate and bring ideas for collaboration. This is an excellent opportunity to meet other researchers in the field, expand your professional network, and find potential collaborators for a project. In the first part of the meeting we discuss a paper or topic (30 minutes), followed by 10 minutes of discussing ideas. We meet once per month, like the Webinar, but near the end of the month. We are happy to receive suggestions for interesting papers and topics, and we welcome participants who volunteer to lead a meeting.
Duration. 30 minutes followed by 10 minutes discussion.

Outcome. The goal of the lecture talks is to understand another perspective and to discuss particular aspects of SPSC in its interdisciplinary setting. We need to leave our comfort zones to meaningfully anticipate the merger of speech technology with SPSC research areas, including user-interface design, the study of law, cryptography, and cognitive sciences.

Open to everyone, including non-members (0 EUR fee). Please register to state your data privacy consent and to obtain the session URL.

Upcoming webinars

Propose a talk. Simply fill out this (self-)nomination form.

Webinar: 2025-10-20 (Mon) Viola Negroni, Politecnico di Milano — 10am (CEST) [registration]
Leveraging Mixture of Experts for improved Speech Deepfake Detection [paper1] [paper2]
Abstract [+]

Speech deepfake detection has become a critical field as synthetic speech generation grows increasingly realistic. Current detection systems face persistent challenges in generalization, struggling to maintain performance across unseen data, and in robustness, as perturbations greatly degrade accuracy. At the same time, improving explainability remains essential for building trustworthy detection models. In this talk, we will explore how Mixture of Experts (MoE) architectures provide an innovative and flexible framework to address these challenges. The strength of MoEs lies in their ability to integrate multiple specialized models: instead of rigidly fusing their decisions, a trainable gating network dynamically weighs each expert’s contribution. Such a modular approach positions MoEs as a promising direction for the next generation of reliable speech deepfake detectors.
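The gating idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: the function names, the linear gate, and all numeric values are hypothetical, and the experts are reduced to pre-computed scores. The sketch only shows how softmax gate weights, computed from the input, blend the experts' outputs instead of fusing them with fixed weights.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_gate(features, weight_rows, biases):
    """Hypothetical gate: one logit per expert, computed as w . x + b."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weight_rows, biases)
    ]


def moe_score(expert_scores, gate_logits):
    """Blend per-expert scores with softmax gate weights.

    expert_scores: each expert's estimate that the input is fake (assumed).
    gate_logits: one unnormalized gate output per expert.
    """
    if len(expert_scores) != len(gate_logits):
        raise ValueError("need exactly one gate logit per expert")
    weights = softmax(gate_logits)
    fused = sum(w * s for w, s in zip(weights, expert_scores))
    return fused, weights


# Illustrative values only: two experts, a two-dimensional input feature.
features = [1.0, 0.0]
logits = linear_gate(features, weight_rows=[[2.0, 0.0], [0.0, 2.0]], biases=[0.0, 0.0])
fused, weights = moe_score([0.9, 0.1], logits)
print(fused, weights)
```

In a trained system the gate parameters would be learned jointly with, or on top of, the experts; here they are fixed by hand so that the first expert dominates for this input.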

Webinar: 2025-12-09 Nicolas Müller, Fraunhofer AISEC — 11am (CET) [registration]
Title_TBD [ref]
Abstract [+]

Abstract_TBD

Past webinars

Webinar: 2025-10-06 (Mon) Junzuo Zhou, Chinese Academy of Sciences — 10am (CEST) [slides] [recording]
Traceability and Copyright Protection in Neural Speech Process [paper1][paper2][paper3]
Abstract [+]

Audio watermarking is a key technique for traceability and copyright protection in neural speech process. This talk introduces three categories of methods: post-hoc watermarking, task-integrated watermarking, and open-source model watermarking. We will focus on approaches that jointly optimize watermark embedding with speech generation or codec models, as well as parameter-level methods for open-source model protection. These techniques improve imperceptibility and robustness while enabling flexible and secure provenance marking, and we will conclude with key challenges for future development.

Webinar: 2025-08-07 (Thur) Kai Li, Tsinghua University — 3pm (CEST) [recording]
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models [paper]
Abstract [+]

The rapid advancement and expanding applications of Audio Large Language Models (ALLMs) demand a rigorous understanding of their trustworthiness. However, systematic research on evaluating these models, particularly concerning risks unique to the audio modality, remains largely unexplored. Existing evaluation frameworks primarily focus on the text modality or address only a restricted set of safety dimensions, failing to adequately account for the unique characteristics and application scenarios inherent to the audio modality. We introduce AudioTrust—the first multifaceted trustworthiness evaluation framework and benchmark specifically designed for ALLMs. AudioTrust facilitates assessments across six key dimensions: fairness, hallucination, safety, privacy, robustness, and authentication. To comprehensively evaluate these dimensions, AudioTrust is structured around 18 distinct experimental setups. Its core is a meticulously constructed dataset of over 4,420 audio/text samples, drawn from real-world scenarios (e.g., daily conversations, emergency calls, voice assistant interactions), specifically designed to probe the multifaceted trustworthiness of ALLMs. For assessment, the benchmark carefully designs 9 audio-specific evaluation metrics, and we employ a large-scale automated pipeline for objective and scalable scoring of model outputs. Experimental results reveal the trustworthiness boundaries and limitations of current state-of-the-art open-source and closed-source ALLMs when confronted with various high-risk audio scenarios, offering valuable insights for the secure and trustworthy deployment of future audio models. Our platform and benchmark are available at https://github.com/JusperLee/AudioTrust.

Webinar: 2025-07-17 (Thur) Rachid Riad, Callyope, Paris, France — 3pm (CEST) [recording]
Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio [paper]
Abstract [+]

Audio and speech data are increasingly used in machine learning applications such as speech recognition, speaker identification, and mental health monitoring. However, the passive collection of this data by audio listening devices raises significant privacy concerns. Fully homomorphic encryption (FHE) offers a promising solution by enabling computations on encrypted data and preserving user privacy. Despite its potential, prior attempts to apply FHE to audio processing have faced challenges, particularly in securely computing time-frequency representations, a critical step in many audio tasks. Here, we addressed this gap by introducing a fully secure pipeline that computes, with FHE and quantized neural network operations, four fundamental time-frequency representations: Short-Time Fourier Transform (STFT), Mel filterbanks, Mel-frequency cepstral coefficients (MFCCs), and gammatone filters. Our methods also support the private computation of audio descriptors and convolutional neural network (CNN) classifiers. In addition, we proposed approximate STFT algorithms that lighten computation and bit use for statistical and machine learning analyses. We ran experiments on the VocalSet and OxVoc datasets demonstrating the fully private computation of our approach. We showed significant performance improvements with STFT approximation in private statistical analysis of audio markers, and for vocal exercise classification with CNNs. Our results reveal that our approximations substantially reduce error rates compared to conventional STFT implementations in FHE. We also demonstrated a fully private classification based on the raw audio for gender and vocal exercise classification. Finally, we provided a practical heuristic for parameter selection, making quantized approximate signal processing accessible to researchers and practitioners aiming to protect sensitive audio data.

Webinar: 2025-06-03 (Tue) Piotr Kawa, PhD @ WUST, Researcher @ Resemble AI — 2pm (CEST) [recording]
Generalization and Robustness of Audio DeepFake Detection [paper1] [paper2] [paper3] [paper4]
Abstract [+]

Detecting audio deepfakes is an important strategy for combating disinformation. A major challenge in this field is ensuring the reliability of detectors across varying conditions and attack types. In this talk, we will first introduce the concepts of generalization and robustness in deepfake detection. Then, we will present insights from our research focused on addressing these challenges using advanced evaluation protocols, up-to-date multilingual datasets, and defense techniques against a range of attack scenarios.

Webinar: 2025-04-07 (Mon) Tom Bäckström, Aalto University — 10am (CET) [recording] [slides]
Overview of Privacy in Speech Technology [paper] [discussion]
Abstract [+]

The SPSC-SIG was founded in 2019 at a time when there was little activity in the area of privacy for speech technology. Since then we have seen a rapid rise in the volume of research in this area, through contributions that address specific issues. This presentation and the accompanying paper attempt to connect existing contributions and build an overarching perspective on and vision for the research area. By providing a unified framework for discussion and for presenting results, I hope to make it easier for newcomers to join the community and to make comparison of different works easier.

Webinar: 2025-03-03 (Mon) Yi Zhu, INRS & Reality Defender — 4pm (CET) [recording]
Generalizing Audio Deepfake Detection via Style-Linguistics Alignment Pretraining [NeurIPS paper] [ASVspoof5 paper]
Abstract [+]

Audio deepfake detection (ADD) is crucial to combat the misuse of speech synthesized by generative AI models. Existing ADD models suffer from generalization issues to unseen attacks, with a large performance discrepancy between in-domain and out-of-domain data. In this work, we introduce a new ADD model that explicitly uses the Style-LInguistics Mismatch (SLIM) in fake speech to separate them from real speech. SLIM first employs self-supervised pretraining on only real samples to learn the style-linguistics dependency in the real class. The learned features are then used in complement with standard pretrained acoustic features (e.g., Wav2vec) to learn a classifier on the real and fake classes. When the feature encoders are frozen, SLIM outperforms benchmark methods on out-of-domain datasets while achieving competitive results on in-domain data. The features learned by SLIM allow us to quantify the (mis)match between style and linguistic content in a sample, hence facilitating an explanation of the model decision.

Webinar: 2025-02-03 (Mon) Yara El-Tawil, University of Michigan — 4pm (CET) [recording]
Ethical Development of Speech-Centered Affective Computing for Health
Abstract [+]

Physicians face challenges in delivering effective, timely, and affordable health care. Additionally, several barriers, such as finances, stigma, and accessibility, can prevent a patient from seeking care. To address these issues, researchers are turning to technological solutions, such as the identification of diagnostic digital biomarkers. One such biomarker is emotion in speech, which, when collected and modeled, can provide information about mental and physical health. However, the data from which these emotion biomarkers are drawn and the inferences that result from them are both sensitive and may not be protected as health data by law. The implication is that both the data and inferences can significantly harm patients using these technologies and those around them whose speech data might also be captured. As this technology continues to advance, moving towards being deployed in the real world, steps must be taken to address open ethical and legal questions that surround their use. It is critical that technical communities participate in these discussions so that their expertise can inform emerging ethical and legal frameworks. This work explores the present and future capabilities of speech models in healthcare and discusses ethical, legal, and regulatory frameworks that guide their design and deployment.

Webinar: 2024-11-18 (Mon) Miao Xiaoxiao, Singapore Institute of Technology — 10am Brussels time
From small-scale to large-scale speech database anonymization
Abstract [+]

Voice anonymization aims to remove speaker identity while preserving other attributes to allow various downstream tasks. This talk begins with an overview of our recently proposed privacy-preserving solutions for speech, including anonymization approaches for both single-speaker and multi-speaker scenarios. We then apply the state-of-the-art voice anonymization technique to the original large-scale VoxCeleb2 dataset, creating SynVox2, a synthetic, privacy-friendly alternative. We evaluate SynVox2 in terms of privacy, utility, and fairness, and identify the challenges of employing synthetic data for the downstream task of speaker verification.

Webinar: 2024-09-30 (Mon) Matias Pizarro Bustamante, Ruhr University Bochum, Germany — 10am Brussels time
DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution.
Abstract [+]

Adversarial attacks can mislead automatic speech recognition (ASR) systems into predicting an arbitrary target text, thus posing a clear security threat. To prevent such attacks, we propose DistriBlock, an efficient detection strategy applicable to any ASR system that predicts a probability distribution over output tokens in each time step. We show that characteristics of the distribution over the output tokens can serve as features of binary classifiers. Through extensive analysis across different state-of-the-art ASR systems and language data sets, we demonstrate the effective performance of this approach. To assess the robustness of our method, we show that adaptive adversarial examples that can circumvent DistriBlock are much noisier, which makes them easier to detect through filtering and creates another avenue for preserving the system's robustness.
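As a rough illustration of using characteristics of the output distribution as detector features, the sketch below computes two simple statistics per utterance. It is not DistriBlock itself: the choice of entropy and maximum probability as features is illustrative, the fixed threshold stands in for a trained binary classifier, and the assumption that flagged inputs show higher entropy, the threshold value, and all function names are hypothetical.

```python
import math
from statistics import mean


def entropy(dist):
    """Shannon entropy (in nats) of one probability distribution over tokens."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def distribution_features(step_distributions):
    """Aggregate per-time-step statistics into utterance-level features.

    step_distributions: one probability distribution over output tokens
    per decoding time step, as produced by an ASR system.
    """
    entropies = [entropy(d) for d in step_distributions]
    max_probs = [max(d) for d in step_distributions]
    return {
        "mean_entropy": mean(entropies),
        "mean_max_prob": mean(max_probs),
    }


def flag_suspicious(step_distributions, entropy_threshold=0.5):
    """Hypothetical decision rule: flag utterances with high mean entropy.

    A real detector would train a binary classifier on such features;
    the threshold here is an arbitrary placeholder.
    """
    feats = distribution_features(step_distributions)
    return feats["mean_entropy"] > entropy_threshold


# Illustrative inputs: a confident (peaked) and an uncertain (flat) utterance.
peaked = [[0.97, 0.01, 0.01, 0.01], [0.01, 0.97, 0.01, 0.01]]
flat = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
print(distribution_features(peaked), flag_suspicious(peaked))
print(distribution_features(flat), flag_suspicious(flat))
```

Because the features depend only on the per-step token distributions, a sketch like this could sit behind any ASR system that exposes those distributions, which matches the applicability condition stated in the abstract.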

Webinar: 2024-07-17 (Wed) Lin Zhang, National Institute of Informatics, Japan — 10am Brussels time
"Whether When What": Detection Localization and Diarization of Partially Spoofed Audio
Abstract [+]

The advancement of generative models makes it easy to partially manipulate small parts of audio while significantly changing their meaning. We dub this new spoofing scenario "Partial Spoof" (PS) and began exploring this threatening scenario 3.5 years ago. This talk will delve into how we define and explore it, through a review of our previous and recently accepted Interspeech works. Our exploration so far includes three specific tasks, each associated with a key question. These tasks mainly focus on defining the task, establishing evaluation metrics, and proposing a benchmark model with some analysis: (1) Spoof Detection: Is the utterance spoofed? This task aligns with the common goal in the spoofing community of distinguishing whether an utterance is spoofed or bona fide. (2) Spoof Localization: When do spoofs happen? This task aims to determine the location of spoof segments within utterances. (3) Spoof Diarization: What spoofed when? This task not only locates the spoofed segments but also discriminates the specific spoofing techniques employed. Please come and discuss this exciting and trending topic, and you may find your next research focus here!

Webinar: 2024-06-03 (Mon) Luke Richards, University of Maryland, Baltimore County, USA — 4pm Brussels time
The Intersection of User Equality & Security in Machine Learning
Abstract [+]

Machine learning systems that enable speech interaction from human users are already present in our homes and workplaces, yet they often lack robustness for diverse user populations, creating a significant usability gap. When we extend an equitable treatment lens to a security context, where our goal is to protect and harden the model, it's essential to ask: "Who do our defenses protect?". I will explore the concept of biased vulnerability and its relationship to biased robustness, demonstrating in a speech case study that machine learning defenses do not have equal coverage for all user subpopulations. Finally, by examining the intersection of equitable interaction and biased vulnerability, we can better understand the importance of considering diverse user needs in our defense methods, ultimately leading to more effective and secure machine learning systems that serve all users equally.

Webinar: 2024-06-17 (Mon) Casandra Rusti, University of Southern California, USA, & Anna Leschanowsky, Fraunhofer IIS, Germany — 5pm Brussels time
A Data Perspective on Ethical Challenges in Voice Biometrics Research
Abstract [+]

Speaker recognition technology, integral to sectors like banking, education, recruitment, immigration, law enforcement, and healthcare, relies heavily on biometric data. However, the ethical implications and biases inherent in the datasets driving this technology have not been fully explored. This paper builds on our previous conference work by conducting a detailed metadata analysis of three key datasets — Switchboard, VoxCeleb, and ASVspoof — and addresses their collection methods and associated privacy risks. Through a longitudinal study spanning 2012 to 2021 and involving an analysis of around 700 papers, we investigate how community adoption of datasets has evolved alongside the widespread adoption of deep neural networks. Our study identifies the most commonly used datasets in the field, examines their usage patterns, and assesses their attributes that affect bias, fairness, and other ethical concerns. Our findings highlight the persistent dominance of certain datasets and reveal significant shifts in research focus and data practices. The study emphasizes the need for more inclusive, fair, and privacy-conscious methodologies in speaker recognition, spotlighting the biases, fairness challenges, and ethical considerations brought forth by our in-depth analysis.

Webinar: 2024-05-06 (Mon) Michele Panariello, EURECOM, France — 10am Brussels time [slides, video (talk only)]
Speaker anonymization: current methods, challenges and perspectives
Abstract [+]

Speaker anonymization, the task of masking a voice signal to conceal the identity of its speaker, has been gaining popularity among the speech community. This talk will illustrate the current state of the research on speaker anonymization, from the main models and algorithms to tackle it to the 2024 edition of the VoicePrivacy Challenge. The discussion will also include an overview of some open problems regarding the task, as well as a glance at possible future research directions.

Webinar: 2024-03-04 (Mon) Anna-Maria Piskopani, University of Nottingham, UK — 10am Brussels time
Voice modification in the creative sector and the risks to human rights
Abstract [+]

In recent years, machine learning and generative AI have been transformational for speech technologies. The use of voice modification in creative industries and media, using even the voices of deceased people, has sparked a legal and ethical debate regarding the limits of creative voice modification and alteration and the respect of individual rights. Voice, speech, and audio are all protected in different contexts under several rights (identity, privacy, data protection, personality and private life, copyright, and intellectual property) in various jurisdictions (UK, USA, and European nations such as Germany and France). Generative AI companies now enable anyone to easily craft realistic synthetic voices from just a few seconds of real speech. In this webinar we will discuss ethical and legal issues raised by these technologies, analyse specific paradigms, and discuss the challenges of mitigating the risks to these legal rights.

Webinar: 2024-02-05 (Mon) Chao-Han Huck Yang, Nvidia Research — 9am Brussels time [slides, video (talk only)]
Data Privacy and Evaluation Challenges of Large Language Model Based Speech Recognition
Abstract [+]

Recently, large-scale pre-trained language models have demonstrated exceptional representational and generalization capabilities for solving unseen domain-specific tasks. As the amount of text data increases due to web-scale digitalization, the speech and audio research community has recently begun to incorporate these large-scale pre-trained models for acoustic processing. However, there are new challenges associated with pre-trained models that often utilize training data from undisclosed sources (e.g., Whisper, ChatGPT, and Bard), such as the risk of data leakage and unauthorized collection in speech evaluation tasks. In this talk, we will introduce these challenges and discuss recent advances in generative error correction for speech recognition and data unlearning algorithms.

Webinar: 2024-01-15 (Mon) Zhiyuan Yu, Washington University in St Louis, USA — 16h Brussels time [video (talk only)]
Safeguarding Voices via Adversarial Examples: Defense and Way Forward in the Era of GenAI
Abstract [+]

The recent advancement in generative AI is bringing paradigm shifts to society. With contemporary AI-based voice synthesizers, it has become practical to produce speech that vividly mimics a specific person. While these technologies are designed to improve lives, they also pose significant risks of misuse, potentially harming voice actors' livelihoods and enabling financial scams. In recognition of such threats, existing strategies primarily focus on detecting synthetic speech. Complementing these defenses, we propose AntiFake as a proactive approach that hinders unauthorized speech synthesis. AntiFake works by adding minor noise to speech samples, such that the attacker's synthesis attempts will lead to audio that does not sound like the target speaker. To attain an optimal balance between sample quality, protection strength, and system usability, we propose adversarial optimization on the three-way trade-offs guided by minimal user inputs. In this work, we make an initial step towards actively protecting our voices, and highlight the ongoing need for robust and sustainable defenses in this evolving landscape.

Webinar: 2023-11-06 (Mon) Jennifer Williams, University of Southampton, UK — 10h Brussels time
AI Regulation and Speech Technology: Perspectives from the UK
Abstract [+]

The UK is attempting to position itself as the global leader in AI regulation, and this talk comes within just days following the AI Safety Summit which has gathered together many stakeholders from around the world for an important conversation. Often missing from this conversation about AI regulation is our field: speech technology. Especially for speech privacy and security, as researchers we are left with many questions and opinions about how AI regulation may affect our freedom to innovate and conduct our research. For many of us, our livelihood depends on being able to conduct our research. This talk will present some of the issues and questions from a mostly UK perspective, including controversial issues that are (for now) still open for discussion and debate. Most importantly, this talk will emphasise why it is important for experts in our field to have a voice about regulation. Some of the issues we will discuss include: open-source, deepfakes, and privacy.

Webinar: 2023-10-02 (Mon) Yang Cao, Hokkaido University, Japan — 10h Brussels time [slides, video (talk only)]
Towards Formalizing Speech Privacy for Speech Data Release and Analysis
Abstract [+]

In the age of ubiquitous voice assistants, automated transcription services, and voice-enabled IoT devices, speech data has become a pivotal asset for companies and researchers. Alongside its value, however, lies a pressing concern: safeguarding individual privacy. Unlike traditional textual data, speech encapsulates not just content but also unique characteristics of an individual's voice, including emotions, accents, and biometric details. In this talk, I will provide a brief overview of the history of data privacy research, emphasizing the quest for a more principled definition of privacy. I will then survey recent studies that employ formal privacy definitions, like Differential Privacy, for privacy-preserving speech data release and analysis. Finally, I'll highlight unresolved challenges and potential research opportunities for formalizing speech privacy.

Webinar: 2023-07-04 Unoki Masashi, Japan Advanced Institute of Science and Technology, Japan — 10h Brussels time [slides]
Introduction to audio/speech information techniques and future applications
Abstract [+]

Audio information hiding (AIH) has recently attracted attention as a state-of-the-art technique enabling copyrights to be protected and defended against attacks and tampering of audio/speech content. This technique has aimed at embedding codes as watermarks to protect copyrights in audio/speech content, which are inaudible to and inseparable by users, and at detecting embedded codes from watermarked signals. It has also aimed at verifying whether it can robustly detect embedded codes from watermarked signals (robust or fragile), whether it can blindly detect embedded codes from watermarked signals (blind or non-blind), whether it can completely restore watermarked signals to the originals by removing embedded codes from them (reversible or irreversible), and whether it can be secure against the publicity of algorithms employed in public or private methods. AIH methods, therefore, must satisfy some of the five following requirements to provide a useful and reliable form of watermarking: (a) inaudibility, (b) robustness, (c) blind detectability, (d) confidentiality, and (e) reversibility. In this talk, historical and typical AIH methods (including speech information hiding) are introduced and their drawbacks are pointed out. Then our proposed methods based on human auditory characteristics (cochlear delay, adaptive phase modulation, singular spectrum analysis with psychoacoustic model, formant enhancement, spread-spectrum with LP residue) are introduced. In addition, current research issues such as speech spoofing and deepfake detection will also be introduced.

Webinar: 2023-06-12 (Mon) Helen Fraser, University of Melbourne, Australia — 10h Brussels time [video (talk only)]
Enhancing forensic audio: What works, what doesn't - and how can we know?
Abstract [+]

Recorded speech used as evidence in criminal trials is often of poor quality, making it hard for the court to understand the content. 'Enhancing' aims to improve the audio, via techniques such as noise reduction, gain control, or spectral subtraction (Maher, 2018). Results are usually evaluated by measuring the acoustic effects of the enhancing processes, and observing whether the audio sounds clearer, without unwanted artefacts. But what does it mean to say the audio 'sounds clearer'? Can we be sure that what sounds clear to the scientist will also sound clear to the judge and jury? Is it possible they might hear it ‘clearly’ but wrongly? This webinar gives new insights on these questions (Fraser 2019 gives a quick impression), and seeks assistance from the signal processing community in ensuring that the courts gain a reliable interpretation of poor-quality forensic audio.

Fraser, H. 2019. Don’t believe your ears: “enhancing” forensic audio can mislead juries in criminal trials. The Conversation.
Fraser, H. 2020. Enhancing forensic audio: What works, what doesn’t, and why. Griffith Journal of Law and Human Dignity, 8(1), 85-102.
Maher, R. 2018. Principles of Forensic Audio Analysis. Springer.

Webinar: 2023-04-03 (Mon) Brij Mohan Lal Srivastava, Inria Startup Studio — 10h Brussels time
Differentially Private Speaker Anonymization
Abstract [+]

State-of-the-art speaker anonymization techniques operate by disentangling the speaker information from linguistic and prosodic attributes, followed by re-synthesis based on the speaker embedding of a pseudospeaker. Prior research in the privacy community has shown that anonymization often provides brittle privacy protection, let alone any provable guarantee. In this talk, I will present our work where we showed that disentanglement is indeed not perfect: linguistic and prosodic attributes still contain speaker information. We remove speaker information from these attributes by introducing differentially private feature extractors, which are plugged into the state-of-the-art anonymization pipeline to generate differentially private utterances with a provable upper bound on the speaker information they contain.

Webinar: 2023-03-06 (Mon) Xin Wang, National Institute of Informatics, Japan — 10h Brussels time [slides]
Using vocoders to create training data for speech spoofing countermeasure
Abstract [+]

A good training set for speech spoofing countermeasures requires diverse speech synthesis spoofing attacks, but generating such spoofed trials for a target speaker can be technically demanding. Instead of using full-fledged speech synthesis systems, we use vocoders to do copy-synthesis on bona fide utterances and use the produced utterances as spoofed training data. A copy-synthesized utterance can be treated as the output from a speech synthesis system with perfect acoustic modeling components. While it is not truly spoofed, the copy-synthesized data was found to be effective in training spoofing countermeasures, and the trained countermeasures generalized reasonably well to multiple unseen test sets including domain-mismatched ones.

Webinar: 2023-02-06 (Mon) You (Neil) Zhang, University of Rochester — 16h Brussels time [slides]
Generalizing Voice Presentation Attack Detection to Unseen Synthetic Attacks
Abstract [+]

Automatic Speaker Verification (ASV) systems aim to verify a speaker’s claimed identity through voice. However, voice can be easily forged with replay, text-to-speech (TTS), and voice conversion (VC) techniques, which may compromise ASV systems. Voice presentation attack detection (PAD) is developed to improve the reliability of speaker verification systems against such spoofing attacks. One main issue of voice PAD systems is their generalization ability to unseen synthetic attacks, i.e., synthesis methods that are not seen during training of the presentation attack detection models. We proposed one-class learning, where the model compacts the distribution of learned representations of bona fide speech while pushing away spoofing attacks to improve the results. We then propose speaker attractor multi-center one-class learning (SAMO) as a novel representation learning framework for voice anti-spoofing, which clusters bona fide speech around a number of speaker attractors and pushes away spoofing attacks from all the attractors in a high-dimensional embedding space. Our proposed system outperforms existing state-of-the-art single systems, demonstrating the effectiveness of our proposed methods.

Webinar: 2022-12-05 (Mon) Guglielmo Maccario, UNINT University of the International Studies of Rome & Maurizio Naldi, Università di Roma LUMSA — 17h Brussels time [slides]
Privacy and smart speakers
Abstract [+]

Since the release of the first Amazon Echo in the United States, academics and practitioners have been discussing the privacy risks associated with smart speakers, from acceptance models and perception surveys to privacy-protecting features and experiments with prototypes. In this talk, we will provide an overview of the literature concerning privacy issues in smart speakers. We will first look at the meta-data of papers (publication outlets, geographical and time distribution), and then propose a classification of studies based on their topic within the overall theme of privacy. We will also share some ongoing research results on the relevance of privacy as emerging from a text mining analysis of customers’ opinions collected on Amazon UK and some preliminary results from a similar investigation carried out in South Korea.

Webinar: 2022-11-07 (Mon) Francisco Teixeira, University of Lisbon — 16h Brussels time [slides, video (talk only)]
Towards End-to-End Private Automatic Speaker Recognition
Abstract [+]

The development of privacy-preserving automatic speaker verification systems has been the focus of a number of studies with the intent of allowing users to authenticate themselves without risking the privacy of their voice. However, current privacy-preserving methods assume that the template voice representations (or speaker embeddings) used for authentication are extracted locally by the user. This poses two important issues: first, knowledge of the speaker embedding extraction model may create security and robustness liabilities for the authentication system, as this knowledge might help attackers in crafting adversarial examples able to mislead the system; second, from the point of view of a service provider the speaker embedding extraction model is arguably one of the most valuable components in the system and, as such, disclosing it would be highly undesirable. In this work, we show how speaker embeddings can be extracted while keeping both the speaker's voice and the service provider's model private, using Secure Multiparty Computation. Further, we show that it is possible to obtain reasonable trade-offs between security and computational cost. This work is complementary to those showing how authentication may be performed privately, and thus can be considered as another step towards fully private automatic speaker recognition.
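
As background for the Secure Multiparty Computation mentioned above, the following minimal sketch shows additive secret sharing, one common building block of such protocols. It is not the protocol used in the talk; the modulus, the function names, and the two-party default are illustrative assumptions.

```python
import secrets

# Illustrative field modulus (a Mersenne prime); real protocols choose their own.
PRIME = 2**61 - 1

def share(value, n_parties=2):
    """Split a non-negative integer below PRIME into n random additive shares."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def reconstruct(shares):
    """Recover the secret by summing all shares modulo PRIME."""
    return sum(shares) % PRIME

def add_shares(a, b):
    """Each party adds its two shares locally; no communication is needed."""
    return [(x + y) % PRIME for x, y in zip(a, b)]

def scale_shares(a, k):
    """Each party multiplies its share by a public constant k."""
    return [(x * k) % PRIME for x in a]

# Usage: neither share alone reveals the secret, yet sums can be computed on shares.
x_shares = share(5)
y_shares = share(7)
total = reconstruct(add_shares(x_shares, y_shares))
```

Additions and multiplications by public constants work locally on shares. Multiplying two secret values, as needed when both the voice and the model are private, requires additional machinery that this sketch does not cover.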

Webinar: 2022-09-05 (Mon) Pardis Emami-Naeini, Duke University — 16h Brussels time [slides]
Designing an Informative and Usable Security and Privacy Label for IoT Devices
Abstract [+]

IoT consumers are concerned about the privacy and security of their smart devices, but they cannot do much about it at the time of purchase. This is due to the unavailability of such information when making a purchase decision. Therefore, we decided to bring this much-needed transparency to consumers at the time of purchase. In this talk, I will discuss how we developed an informative and usable privacy and security label for IoT devices by conducting a series of studies and incorporating inputs from thousands of consumers and experts.

Webinar: 2022-08-01 (Mon) Raphael Franck Olivier, Carnegie Mellon University — 16h Brussels time
Targeted and transferable adversarial perturbations on self-supervised ASR models
Abstract [+]

The transferability of adversarial attacks between different models is a critical factor in estimating the danger that they pose to real-life systems. On Automatic Speech Recognition (ASR) models, past works have shown that targeted optimization attacks display extremely limited transferability. However, these works were largely conducted on now outdated models. We find that this property no longer holds for more recent neural architectures, and in particular for transformers pretrained with Self-Supervised Learning (SSL). With a thorough ablation study on factors such as model performance or number of parameters, we show that SSL training makes models more vulnerable to transferred attacks from other SSL-pretrained models. This remains true regardless of the SSL objective used and even, to an extent, of the training data. We release a small adversarial dataset that can fool, with varying but always non-trivial success, all publicly available SSL-pretrained ASR models. We also propose an explanation for this intriguing property, and discuss its important implications for the security of ASR systems.

Webinar: 2022-07-11 (Mon) Jennifer Williams, University of Southampton — 10h Brussels time
Developing Privacy-Preserving Audio Capabilities for Smart Buildings
Abstract [+]

Smart buildings have the potential to not only reduce the consumption of resources, such as electricity, but also to vastly improve the quality of life for individuals. Buildings of the future may be designed from the ground up or existing buildings may be retrofitted with certain capabilities that contribute to net-zero carbon goals and increased comfort. An important aspect of creating "smart" buildings includes the ability to sense multiple aspects of human occupancy, including how many people are inside, their personal comfort preferences, how they are distributed throughout a building, and their levels of movement or activity. This information can then be used to optimise energy consumption, e.g., by regulating heating, ventilation or lighting. Many types of sensors already exist but some are expensive or do not result in accurate people counts or provide fine-grained decision information. In this talk, we will discuss ongoing work relating to the development of privacy-preserving audio analysis for occupancy-detection. We will examine some initial findings, discuss some techniques for preserving privacy, and also discuss further work for equipping smart buildings with audio. The adoption of audio in smart buildings can have wide-ranging benefits from providing accessibility services and emergency services, to reduced energy costs, and even specialist capabilities such as automated meeting recordings and secure access.

Webinar: 2022-06-13 (Mon) Sneha Das, Danish Technical University — 10h Brussels time [slides]
Influence of loss functions on the latent representation of speech emotions
Abstract [+]

Speech emotion recognition (SER) refers to the technique of inferring the emotional state of an individual from speech signals. SER systems continue to garner interest due to their wide applicability. Although the domain is mainly founded on signal processing, machine learning, and deep learning, generalizing over languages remains a challenge. Developing generalizable and transferable models is critical due to a lack of sufficient resources in terms of data and labels for languages beyond the most commonly spoken ones. This talk will provide insights into the impact of loss functions on the latent space and its subsequent influence on the transferability of speech emotion models to resource-constrained regimes.

Webinar: 2022-04-04 (Mon) Nicolas Müller, Fraunhofer AISEC — 10h Brussels time [slides]
Text-to-Speech Synthesis and the Threat of Audio-Deepfakes
Abstract [+]

Text-to-speech (TTS), i.e., the synthesis of human voice by AI, has many benign applications, but unfortunately also enables misuse by so-called deepfakes. This talk will give a brief, technical overview of audio synthesis creation, and then cover ways to deal with the threat of 'deepfakes' enabled by TTS in the first place.

Webinar: 2022-02-07 (Mon) Hung-Yi Lee & Haibin Wu, National Taiwan University — 10h Brussels time [slides]
Towards Universal Self-supervised Model for Speech Processing
Abstract [+]

Self-supervised learning (SSL) has been shown to be vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art results on various tasks with minimal adaptation. However, existing SSL works have explored only a limited number of speech processing tasks, obscuring the generalizability and re-usability of SSL models across speech processing tasks. This talk will first introduce the Speech processing Universal PERformance Benchmark (SUPERB), a leaderboard for benchmarking the performance of SSL models across a wide range of speech processing tasks. The results on SUPERB demonstrate that SSL representations show competitive generalizability across speech processing tasks. I will also share some ongoing research directions based on SUPERB.

Webinar: 2021-12-06 (Mon) Ingo Siegert, Otto-von-Guericke-University Magdeburg — 10h Brussels time
First SPSC Symposium - Review and Outlook
Abstract [+]

The first SPSC Symposium took place on November 10-12, 2021, online. It was meant to bring together people with various backgrounds and perspectives on speech communication to present and discuss hot topics in security and privacy research. There were three invited talks, two workshops, one PhD symposium, and nearly 20 poster presentations. In this talk, Ingo and Karla will give some insights into the organizational process and their main takeaways. The audience is encouraged to actively participate by providing feedback and bringing in ideas for the next Symposium, planned to take place at Interspeech 2022.

Webinar: 2021-11-01 (Mon) Meeri Haataja, Saidot — 10h Brussels time [slides are available upon request]
Ethical considerations in voice and speech technologies
Abstract [+]

Meeri Haataja is the CEO and co-founder of Saidot, a Finland-based company with a mission of building responsible AI ecosystems, helping companies assess and document the ethical and legal aspects of their AI systems. The platform is used by major public and private organizations in deploying systematic AI governance and transparency via AI registers. In her presentation, Meeri will share best practices on AI governance and how to apply systematic ethical assessment in AI development projects. She will introduce how the platform helps AI teams collaborate on AI ethics and governance, recognizing the growing industry need to address the specificities of varying AI use cases and industries. After Meeri's presentation, we will discuss which specific aspects we should embed in AI governance methodologies when assessing the use cases for voice technologies and speech interfaces. How could the academic speech community contribute to creating standardized assessment and documentation approaches capable of addressing the ethical considerations specific to voice and speech technologies?

Webinar: 2021-10-04 (Mon) Tore Knudsen, Artist at Noodle — 16h Brussels time [video (talk only)]
Project Alias - A parasite for the surveillance age.
Abstract [+]

With Project Alias, Tore Knudsen and Bjørn Karmann from Denmark demonstrated a simple yet effective way to take back control over our own private sphere, which earned them the STARTS Prize of the European Commission in 2019. Tore will show us the process and thoughts behind the project, while also showing other examples of his speculative design work that explores our relationship with technology, data and privacy.

Webinar: 2021-09-06 (Mon) Christoph Lutz, Norwegian Business School — 10h Brussels time [slides, video (talk only)]
Privacy and Smart Speakers - A Multi-Dimensional Approach
Abstract [+]

Over the last few years, smart speakers such as Amazon Echo and Google Home have become increasingly present in many households. Yet, privacy remains a prominent concern in the public discourse about smart speakers, as well as in the nascent academic literature. We argue that privacy in the context of smart speakers is more complex than in other settings due to smart speakers' specific technological affordances and also the axial relationships between users, the device, device manufacturers, application developers, and other third parties such as moderation contractors and data brokers. With survey data from Amazon Echo and Google Home users in the UK, we explore users' privacy concerns and privacy protection behaviors related to smart speakers. We rely on a contextual understanding of privacy, assessing the prevalence of seven distinct privacy concern types as well as three privacy protection behaviors. The results indicate that concerns about third parties, such as contractors listening to smart speaker recordings, are most pronounced. Privacy protection behaviors are uncommon but partly affected by privacy concerns and motives such as social presence and utilitarian benefits. Taken together, our research paints a picture of privacy pragmatism or privacy cynicism among smart speaker users.

Webinar: 2021-08-02 (Mon) Andreas Nautsch, vitas.ai — 10h Brussels time [slides, video (talk only)]
Metrics in VoicePrivacy and ASVspoof Challenges
Abstract [+]

Security & privacy are essential to human-machine interaction; so is the assessment of countermeasures & safeguards. Whereas conventional machine learning systems are evaluated for their recognition performance, projecting the same metrics to security & privacy contexts does not suffice. For security, when countermeasures are added on top, it is misleading to consider either the original system or the countermeasure alone: a new system is composed, which needs to be evaluated in its entirety. For privacy, where an add-on mindset is inadequate to fulfill by-design and by-default demands, assessment aims at estimating the capacity of an adversary to infer sensitive information from data while having no further knowledge about her. This talk reflects on both, using the example of voice biometrics in speech technology. In the ASVspoof and VoicePrivacy challenges, security & privacy are investigated for speech technology. While the aim of ASVspoof is fake audio detection to protect voice biometrics from attacks through synthetic and replayed speech, the aim of VoicePrivacy is to suppress biometric factors in audio data when only the recognition of what was said matters. Both challenges gather new communities for benchmarking solutions with common protocols and datasets at the level of first and advanced steps. For this, task definition is as relevant as the development of metrics. Depending on the research challenge, new metrics have been introduced. The tandem detection cost function “t-DCF” and the zero evidence biometric recognition assessment “ZEBRA” frameworks are presented, illustrated, and explained to tackle security & privacy quantification. As an “add-on,” the t-DCF framework extends the DCF metric, which has been used for over two decades in the evaluation of voice biometrics. By contrast, not as an “add-on,” the ZEBRA framework is motivated by Shannon’s “perfect secrecy” and the validation methodology of automated systems in forensic sciences.
Both frameworks indicate directions for developing future capacities in better characterizing the security & privacy tasks at hand.
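
For readers unfamiliar with the DCF that the t-DCF builds on, the following minimal sketch computes the classic detection cost at a fixed threshold. It covers only the conventional single-system DCF, not the tandem t-DCF or ZEBRA; the function names, the default costs, and the convention that scores at or above the threshold are accepted are assumptions for illustration.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Miss and false-alarm rates when scores >= threshold are accepted."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return p_miss, p_fa

def detection_cost(p_miss, p_fa, p_target, c_miss=1.0, c_fa=1.0):
    """Expected cost: misses weighted by the target prior, false alarms by its complement."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

# Hypothetical scores, threshold, and target prior, chosen only for the example.
targets = [2.0, 1.5, 0.2, 3.0]
nontargets = [-1.0, 0.5, -2.0, 1.0]
p_miss, p_fa = error_rates(targets, nontargets, threshold=0.4)
cost = detection_cost(p_miss, p_fa, p_target=0.05)
```

The cost makes explicit that the same error rates can be more or less harmful depending on the assumed prior and the cost of each error type, which is why a single error rate is not sufficient for security assessment.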

Webinar: 2021-07-12 (Mon) Ivo Emanuilov & Katerina Yordanova, KU Leuven — 11h Brussels time [slides, video (talk only)]
"Open the pod bay doors, HAL": legal limitations on the use of biometric data for emotion detection and speech recognition in human-robot collaboration on the smart shop floor
Abstract [+]

Industrial applications of AI, IoT, augmented reality and digital twins in connected, distributed and scalable Factories of the Future bear the promise of new, more efficient and better optimized manufacturing. In the future, human workers and robots would no longer work in separation. Cobots would operate alongside humans with mobile robots assuming more and increasingly independent tasks on the shop floor. The expected optimization gains, however, would have to be balanced against the fundamental rights of workers. Indeed, the most promising use cases for human-robot collaboration entail also the most privacy-invasive practices, such as recognition of speech, gait analysis, and emotion detection. The rollout of these new scenarios would depend on balancing the values of human safety, IT and OT security and workplace privacy. In this talk, we push several selected use cases to their point of collision with EU privacy and data protection law and attempt to salvage as much as possible from the wreckage.

Webinar: 2021-06-07 (Mon) Ingo Siegert, Otto-von-Guericke-University Magdeburg — 10h Brussels time [slides, video (talk only)]
Speech Behavior Matters - Automatically Detect Device Directed Speech for the application of addressee-detection
Abstract [+]

Voice assistants are becoming more and more popular and are changing the way people interact. Furthermore, more people come into contact with them, and using them in the daily routine has become common. Unfortunately, most systems are still just voice-controlled remotes and the conversations still feel uncomfortable, especially as activating a conversation requires a wake word, which is still error-prone. This talk first discusses examples of errors in conversation initiation and depicts the state of the art in the research field of addressee detection, with a special focus on prosodic differences in the addressee behavior. Afterwards, own analyses of the addressee behavior for modern voice assistants are presented in two different settings: a) interactions with Amazon's Alexa in a lab setting, with a dataset of similar dialog complexity between human-human interaction (HHI) and human-computer interaction (HCI). Subsequently, analyses of self-reports and annotator feedback on the speaking behavior will be discussed, followed by an overview of different recognition experiments to finally build an (intelligent) addressee-detection framework based on prosodic characteristics. The talk then concludes by mentioning possible future research directions and open issues in experimental conditions.

Webinar: 2021-05-03 (Mon) Clara Hollomey, Austrian Academy of Sciences — 10h Brussels time
Time-frequency analysis of speech signals: from algorithms to human perception
Abstract [+]

Time-frequency analysis provides a wealth of information about audio signals, such as their pitch, formant resonances, and the presence of noisy components in the sound. In the human inner ear, too, sound is spread out according to frequency before being further processed along the auditory pathway. Consequently, time-frequency analysis forms the basis of most speech detection, classification, and discrimination tasks. In this talk, we outline advantages and constraints of common time-frequency analysis algorithms, such as the discrete Gabor, Fourier, and wavelet transforms, with regard to speech signals. We describe how those algorithms can be applied to modelling human speech processing and give examples of typical applications, such as speech denoising, pitch-shifting, and estimating the short-term speech intelligibility. The algorithms, models, and their applications are illustrated by demos drawn from the Large Time Frequency Analysis Toolbox (LTFAT) and the Auditory Modeling Toolbox (AMT). While the LTFAT provides algorithmically efficient means for audio signal representation and modification, the AMT specifically targets speech signal analysis in the human auditory system. The AMT and the LTFAT are both open-source and can be freely downloaded from ltfat.github.io and http://amtoolbox.sourceforge.net/ respectively.
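
To make the notion of a windowed time-frequency transform concrete, here is a minimal short-time Fourier transform in plain NumPy. It does not use LTFAT or AMT and does not reflect their APIs; the sampling rate, window length, hop size, and test tone are assumptions chosen so that the result is easy to check.

```python
import numpy as np

def stft(signal, win_len=256, hop=128):
    """Short-time Fourier transform: window each frame, then take its FFT.

    Returns an array of shape (n_frames, win_len // 2 + 1).
    """
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + win_len] * window
        for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1)

# Hypothetical test signal: one second of a 1000 Hz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)

spec = np.abs(stft(tone))                        # magnitude spectrogram
peak_bin = int(np.argmax(spec.mean(axis=0)))     # strongest frequency bin on average
peak_hz = peak_bin * fs / 256                    # convert the bin index to Hz
```

A longer window sharpens the frequency resolution at the expense of time resolution, which is one of the constraints of such algorithms when applied to rapidly changing speech signals.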

Webinar: 2021-04-12 (Mon) Olya Kudina, TU Delft — 10h Brussels time [slides are available upon request]
Ethical considerations of the algorithmic processing of language and speech
Abstract [+]

In this talk, Olya will discuss the ethical implications of voice-based interfaces, such as Siri, Google Home or Alexa. She will consider them from the theoretical perspective of technologies-as-mediators and ethics-as-accompaniment to show how voice-based technologies help to shape our lives. Olya will discuss specific instances of how they foster moral perceptions, choices and values, and in parallel give a response from the creative user and design communities. Together, this will provide a clue on how to shape meaningful interactions with voice assistants, individually and collectively.

Webinar: 2021-03-01 (Mon) Yefim Shulman, Tel Aviv University — 10h Brussels time [slides, video (talk only)]
Promised but Not Guaranteed: Understanding People's Ability to Control Their Personal Information
Abstract [+]

The consensus in legal frameworks, such as the GDPR and CCPA, is that people (data subjects) should be able to exercise control over their personal information. Yet, having control over what happens to their personal information remains a challenging endeavor for data subjects in practice. Based on a conceptual control-theoretic analysis and select empirical findings, my talk will discuss what control over personal information may require and how it may be improved.

Webinar: 2021-02-01 (Mon) Lara Gauder & Leonardo Pepino, University of Buenos Aires — 16h Brussels time [slides, video (talk only)]
A Study on the Manifestation of Trust in Speech
Abstract [+]

Research has shown that trust is an essential aspect of human-computer interaction, determining the degree to which a person is willing to use a system. Predicting the level of trust that a user has in the skills of a certain system could be used to attempt to correct potential distrust by having the system take relevant measures, for example, explaining its actions more thoroughly. In our research project, we have explored the feasibility of automatically detecting the level of trust that a user has in a virtual assistant (VA) based on their speech. For this purpose, we designed a protocol for collecting speech data, consisting of an interactive session where the subject is asked to respond to a series of factual questions with the help of a virtual assistant, which they were led to believe was either very reliable or unreliable. We collected a speech corpus in Argentine Spanish and found that the reported level of trust was effectively elicited by the protocol. Preliminary results using random forest classifiers showed that the subject’s speech can be used to detect which type of VA they were using with an accuracy of up to 76%, compared to a random baseline of 50%.

Webinar: 2021-01-11 (Mon) Tom Bäckström, Aalto University — 10h Brussels time [slides, video (talk only)]
Code of Conduct for Data Management in Speech Research - Starting the process
Abstract [+]

For anyone working with speech it should, by now, be obvious that we need to take care of the rights of the people involved. The question is only, how do we do that? Data management in an ethically responsible manner is one aspect of this issue and it touches all researchers; I have myself struggled many times with designing data management plans and I have received many requests that the SPSC SIG should take a shot at this. Therefore I think we need community-wide guidelines on how to handle data management, especially with respect to privacy. For example, what is an acceptable level of anonymization of data? When do we need to anonymize? When do we need to limit access to data by, say, requiring signature of a contract? What kind of expiry dates should data have? To what extent should data use be checked in the review processes of conferences, journals and grant applications? And so on. The objective of this session is to set up the process through which we create a code of conduct. That is, my intention is only to discuss how we want to make decisions about the code of conduct and not even to attempt writing a first draft. In this session I'll thus present an outline of a roadmap of how we can create a code of conduct. The session will consist of a shorter pre-recorded presentation part, where I present my initial draft of the process. After the presentation, I invite everyone to join a discussion with web cameras on. The session is not recorded, but I'll share my notes with the participants.

Webinar: 2020-12-07 (Mon) Birgit Brüggemeier, Fraunhofer Institute for Integrated Circuits IIS — 10h Brussels time [slides, video (talk only)]
Conversational Privacy – Communicating Privacy and Security in Conversational User Interfaces
Abstract [+]

In 2019, media scandals raised awareness about privacy and security violations in Conversational User Interfaces (CUI) like Alexa, Siri and Google. Users report that they perceive CUI as “creepy” and that they are concerned about their privacy. The General Data Protection Regulation (GDPR) gives users the right to control processing of their data, for example by opting out or requesting deletion, and it gives them the right to obtain information about their data. Furthermore, the GDPR advises seamless communication of user rights, which is currently poorly implemented in CUI. This talk presents a data collection interface called Chatbot Language (CBL) that we use to investigate how privacy and security can be communicated in a dialogue between user and machine. We find that conversational privacy can affect user perceptions of privacy and security positively. Moreover, user choices suggest that users are interested in obtaining information on their privacy and security in dialogue form. We discuss implications and limitations of this research.

Webinar: 2020-11-02 (Mon) Rainer Martin & Alexandru Nelus, Ruhr-Universität Bochum — 10h Brussels time [slides, video (talk only)]
Privacy-preserving Feature Extraction and Classification in Acoustic Sensor Networks
Abstract [+]

In this talk we present a brief introduction to acoustic sensor networks and to feature extraction schemes that aim to improve the privacy vs. utility trade-off for audio classification in acoustic sensor networks. Our privacy enhancement approach consists of neural-network-based feature extraction models which aim to minimize undesired extraneous information in the feature set. To this end, we present adversarial, siamese and variational information feature extraction schemes in conjunction with neural-network-based classification (trust) and attacker (threat) models. We consider and compare schemes with explicit knowledge of the threat model and without such knowledge. For the latter, we analyze and apply the variational information approach in a smart-home scenario. It is demonstrated that the proposed privacy-preserving feature representation generalizes well to variations in dataset size and scenario complexity while successfully countering speaker identification attacks.

Webinar: 2020-10-05 (Mon) Nick Gaubitch, Pindrop — 10h Brussels time [video (talk only)]
Voice Security and Why We Should Care
Abstract [+]

After a couple of decades of somewhat slow development, voice technologies have once again gained momentum. Much of this has been driven by large leaps in speech and speaker recognition performance and, consequently, the development of many voice interfaces. Some notably successful examples of current applications of voice are the Amazon Echo and Apple Siri, but we also see an increasing number of institutions that make use of voice recognition to replace more traditional customer identification methods. While much of this development is exciting for speech and audio processing research, it also creates new and significant challenges in security and privacy. Furthermore, new technologies for various forms of voice modification and synthesis are on the rise, which only exacerbates the problem.

In this talk we will first introduce Pindrop and the company’s mission in the world of voice security and we will take a glimpse into the global fraud landscape of call centres, which motivates the work that we do. Next, we will take a deeper dive into the specific topic of voice modification and some related research results. Finally, we will provide an outlook into the future of voice and voice security.

Webinar: 2020-09-07 (Mon) Pablo Pérez Zarazaga, Aalto University — 10h Brussels time [slides, video (talk only)]
Acoustic Fingerprints for Access Management in Ad-Hoc Sensor Networks
Abstract [+]

Voice user interfaces can offer intuitive interaction with our devices, but the usability and audio quality could be further improved if multiple devices could collaborate to provide a distributed voice user interface. To ensure that users' voices are not shared with unauthorized devices, it is however necessary to design an access management system that adapts to the users' needs. Prior work has demonstrated that a combination of audio fingerprinting and fuzzy cryptography yields a robust pairing of devices without sharing the information that they record. However, the robustness of these systems is partially based on the extensive duration of the recordings that are required to obtain the fingerprint. This paper analyzes methods for the robust generation of acoustic fingerprints in short periods of time, which enable the responsive pairing of devices according to changes in the acoustic scenery and can be integrated into other typical speech processing tools.

Café: 2020-08-27 (Thu) Catherine Jasserand, Rijksuniversiteit Groningen — 16h Brussels time [slides]
What is speech/voice from a data privacy perspective: Insights from the GDPR
Abstract [+]

Catherine Jasserand, a postdoctoral researcher on privacy issues raised by biometric technologies, will discuss the notions of speech and voice from a data privacy perspective. Although the GDPR mentions neither voice data nor speech data among the examples of personal data, it applies to both types of data when they relate to an identifiable or identified individual. The talk will be an opportunity to explain terminological issues (including what ‘identification’ means in the context of data protection).

Webinar: 2020-08-03 (Mon) Qiongxiu Li, Aalborg Universitet — 10h Brussels time [slides, video (talk only)]
Privacy-Preserving Distributed Optimization via Subspace Perturbation: A General Framework
Abstract [+]

As the modern world becomes increasingly digitized and interconnected, distributed signal processing has proven to be effective in processing its large volume of data. However, a main challenge limiting the broad use of distributed signal processing techniques is the issue of privacy in handling sensitive data. To address this privacy issue, we propose a novel yet general subspace perturbation method for privacy-preserving distributed optimization, which allows each node to obtain the desired solution while protecting its private data. In particular, we show that the dual variables introduced in each distributed optimizer will not converge in a certain subspace determined by the graph topology. Additionally, the optimization variable is ensured to converge to the desired solution, because it is orthogonal to this non-convergent subspace. We therefore propose to insert noise in the non-convergent subspace through the dual variable such that the private data are protected, and the accuracy of the desired solution is completely unaffected. Moreover, the proposed method is shown to be secure under two widely-used adversary models: passive and eavesdropping. Furthermore, we consider several distributed optimizers such as ADMM and PDMM to demonstrate the general applicability of the proposed method. Finally, we test the performance through a set of applications. Numerical tests indicate that the proposed method is superior to existing methods in terms of several parameters like estimated accuracy, privacy level, communication cost and convergence rate. [pre-print]

Webinar: 2020-07-06 (Mon) Francisco Teixeira, INESC-ID / IST, Univ. of Lisbon — 10h Brussels time [slides, video (talk only)]
Privacy in Health Oriented Paralinguistic and Extralinguistic Tasks
Abstract [+]

The widespread use of cloud computing applications has created a society-wide debate on how user privacy is handled by online service providers. Regulations such as the European Union's General Data Protection Regulation (GDPR) have put forward restrictions on how such services are allowed to handle user data. The field of privacy-preserving machine learning is a response to this issue that aims to develop secure classifiers for remote prediction, where both the client's data and the server's model are kept private. This is particularly relevant in the case of speech, and concerns not only the linguistic contents, but also the paralinguistic and extralinguistic information that may be extracted from the speech signal.
In this talk we provide a brief overview of the current state of the art in paralinguistic and extralinguistic tasks for health, a major application area in terms of privacy concerns, along with an introduction to cryptographic methods commonly used in privacy-preserving machine learning. These will lay the groundwork for the review of the state of the art of privacy in paralinguistic and extralinguistic tasks for health applications. With this talk we hope to raise awareness of the problem of preserving privacy in this type of task and provide an initial background for those who aim to contribute to this topic.