SPSC Webinar & Café Sessions

Smart speakers and virtual assistants are part of our daily life. Their applications are used not only in our homes, where speech technology grants an unprecedented level of convenience, but also in health care, forensic sciences, and banking and payment methods; speech technology has dual-use applications. Consequently, we need to evolve our understanding of security & privacy for applications in speech communication.

SPSC features two formats.

  • Webinar — Once-a-month web seminars. The lectures range from keynote-style talks by senior figures in industry and academia to practice talks for doctoral and master's defenses. We try to keep the time slot on the first Monday of the month (unless it is a bank holiday) at 10h Brussels time.
    Duration. 40 minute talk & up to 20 minute Q&A.
  • Café Journal Club — We meet for a Café session and discuss current topics and papers related to the work we are doing. Bring a tea, coffee or your favorite beverage. Let's exchange knowledge and ideas. Young researchers decide the topic for an interactive debate and ideas for collaboration. This is an excellent opportunity to meet other researchers in the field, expand your professional network, and find potential collaborators on a project. In the first part of the meeting we will discuss a paper or topic (30 minutes), followed by 10 minutes of discussing ideas. We will meet once per month, similar to the Webinar, but near the end of the month. We are happy to receive suggestions for interesting papers and topics, and we are also open to participants volunteering to lead a meeting.


    Duration. 30 minutes followed by 10 mins discussion.

Outcome. The goal of the lecture talks is to understand another perspective and to discuss particular aspects of SPSC in its inter-disciplinary setting. We need to leave our comfort zones to meaningfully anticipate the merger of speech technology with SPSC research areas including: user-interface design, study of the law, cryptography, and cognitive sciences.

Propose a talk. Simply drop us an email with speaker, date/time, title & abstract to: cafe@lists.spsc-sig.org

Open to everyone. Including non-members (0 EUR fee). Please register to state your data privacy consent and to obtain the session URL.

Upcoming webinars

Webinar: 2021-12-06 (Mon) Ingo Siegert, Otto-von-Guericke-University Magdeburg — 10h Brussels time [registration]
    First SPSC Symposium - Review and Outlook
    Abstract [+]
The first SPSC Symposium took place on November 10-12, 2021, online. It was meant to bring together people with various backgrounds and perspectives on speech communication to present and discuss hot topics in security and privacy research. There were three invited talks, two workshops, one PhD symposium, and nearly 20 poster presentations. In this talk, Ingo and Karla will give some insights into the organizational process and their main takeaways. The audience is encouraged to actively participate by providing feedback and bringing in ideas for the next Symposium, planned to take place at Interspeech 2022.
Webinar: 2022-02-07 (Mon) Hung-Yi Lee & Haibin Wu, National Taiwan University — 10h Brussels time [registration]
    "Is self-supervised learning universal in speech processing tasks?" and "Characterizing adversarial robustness for speaker verification"
Webinar: 2022-03-07 (Mon) Nitin Sawhney, Aalto University — 16h Brussels time [registration]

Past webinars

Webinar: 2021-11-01 (Mon) Meeri Haataja, Saidot — 10h Brussels time [slides are available upon request]
    Ethical considerations in voice and speech technologies
    Abstract [+]
Meeri Haataja is the CEO and co-founder of Saidot, a Finland-based company with a mission of building responsible AI ecosystems, helping companies assess and document the ethical and legal aspects of their AI systems. The platform is used by major public and private organizations in deploying systematic AI governance and transparency via AI registers. In her presentation, Meeri will share best practices on AI governance and how to apply systematic ethical assessment in AI development projects. She will introduce how the platform will help AI teams collaborate on AI ethics and governance, recognizing the growing industry need to address the specificities of varying AI use cases and industries. After Meeri's presentation we will discuss which specific aspects we should embed in AI governance methodologies when assessing the use cases for voice technologies and speech interfaces. How could the academic speech community contribute to creating standardized assessment and documentation approaches capable of addressing the ethical considerations specific to voice and speech technologies?
Webinar: 2021-10-04 (Mon) Tore Knudsen, Artist at Noodle — 16h Brussels time [video (talk only)]
    Project Alias - A parasite for the surveillance age.
    Abstract [+]
With Project Alias, Tore Knudsen and Bjørn Karmann from Denmark demonstrated a simple yet effective way to take back control over our own private sphere, which earned them the STARTS Prize of the European Commission in 2019. Today, Tore will show us the process and thoughts behind the project, while also showing other examples of his speculative design work that explores our relationship with technology, data and privacy.
Webinar: 2021-09-06 (Mon) Christoph Lutz, Norwegian Business School — 10h Brussels time [slides, video (talk only)]
    Privacy and Smart Speakers - A Multi-Dimensional Approach
    Abstract [+]
Over the last few years, smart speakers such as Amazon Echo and Google Home have become increasingly present in many households. Yet, privacy remains a prominent concern in the public discourse about smart speakers, as well as in the nascent academic literature. We argue that privacy in the context of smart speakers is more complex than in other settings due to smart speakers' specific technological affordances and also the axial relationships between users, the device, device manufacturers, application developers, and other third parties such as moderation contractors and data brokers. With survey data from Amazon Echo and Google Home users in the UK, we explore users' privacy concerns and privacy protection behaviors related to smart speakers. We rely on a contextual understanding of privacy, assessing the prevalence of seven distinct privacy concern types as well as three privacy protection behaviors. The results indicate that concerns about third parties, such as contractors listening to smart speaker recordings, are most pronounced. Privacy protection behaviors are uncommon but partly affected by privacy concerns and motives such as social presence and utilitarian benefits. Taken together, our research paints a picture of privacy pragmatism or privacy cynicism among smart speaker users.
Webinar: 2021-08-02 (Mon) Andreas Nautsch, vitas.ai — 10h Brussels time [slides, video (talk only)]
    Metrics in VoicePrivacy and ASVspoof Challenges
    Abstract [+]
Security & privacy are essential to human machine interaction; so is the assessment of countermeasures & safeguards. Whereas conventional machine learning systems are evaluated for their recognition performance, projecting the same metrics to security & privacy contexts does not suffice. For security, when countermeasures are added on top, it is misleading to consider either the original system or the countermeasure alone: a new system is composed, which needs to be evaluated in its entirety. For privacy, where an add-on mindset is inadequate to fulfill by-design and by-default demands, assessment aims at estimating the capacity of an adversary to infer sensitive information from data while having no further knowledge about her. This talk reflects on both, using the example of voice biometrics in speech technology. In the ASVspoof and VoicePrivacy challenges, security & privacy are investigated for speech technology. While the aim of ASVspoof is fake audio detection to protect voice biometrics from attacks through synthetic and replayed speech, the aim of VoicePrivacy is to suppress biometric factors in audio data when only the recognition of what was said matters. Both challenges gather new communities for benchmarking solutions with common protocols and datasets at the level of first and advanced steps. For this, task definition is as relevant as the development of metrics. Depending on the research challenge, new metrics have been introduced. The tandem detection cost function “t-DCF” and the zero evidence biometric recognition assessment “ZEBRA” frameworks are presented, illustrated, and explained to tackle security & privacy quantification. As an “add-on,” the t-DCF framework extends the DCF metric, which has been used for over two decades in the evaluation of voice biometrics. In contrast, not as an “add-on,” the ZEBRA framework is motivated by Shannon’s “perfect secrecy” and the validation methodology of automated systems in forensic sciences. Both frameworks indicate directions for developing future capacities in better characterizing the security & privacy tasks at hand.
Webinar: 2021-07-12 (Mon) Ivo Emanuilov & Katerina Yordanova, KU Leuven — 11h Brussels time [slides, video (talk only)]
    "Open the pod bay doors, HAL": legal limitations on the use of biometric data for emotion detection and speech recognition in human-robot collaboration on the smart shop floor
    Abstract [+]
Industrial applications of AI, IoT, augmented reality and digital twins in connected, distributed and scalable Factories of the Future bear the promise of new, more efficient and better optimized manufacturing. In the future, human workers and robots would no longer work in separation. Cobots would operate alongside humans with mobile robots assuming more and increasingly independent tasks on the shop floor. The expected optimization gains, however, would have to be balanced against the fundamental rights of workers. Indeed, the most promising use cases for human-robot collaboration entail also the most privacy-invasive practices, such as recognition of speech, gait analysis, and emotion detection. The rollout of these new scenarios would depend on balancing the values of human safety, IT and OT security and workplace privacy. In this talk, we push several selected use cases to their point of collision with EU privacy and data protection law and attempt to salvage as much as possible from the wreckage.
Webinar: 2021-06-07 (Mon) Ingo Siegert, Otto-von-Guericke-University Magdeburg — 10h Brussels time [slides, video (talk only)]
    Speech Behavior Matters - Automatically Detect Device Directed Speech for the application of addressee-detection
    Abstract [+]
Voice assistants are getting more and more popular and change the way people interact. Furthermore, more people come into contact with them, and it is common to use them in the daily routine. Unfortunately, most systems are still just voice-controlled remotes and the conversations still feel uncomfortable, especially as the conversation activation needs a wake-word, which is still error-prone. This talk first discusses examples of errors in the conversation initiation and depicts the state of the art in the research field of addressee detection, with a special focus on prosodic differences in the addressee behavior. Afterwards, our own analyses of the addressee behavior for modern voice assistants in two different settings are presented: a) interactions with Amazon's Alexa in a lab setting, dataset of similar dialog complexity between HHI and HCI. Subsequently, analyses of self-reports and annotator feedback on the speaking behavior will be discussed, followed by an overview of different recognition experiments to finally build an (intelligent) addressee-detection framework based on prosodic characteristics. The talk then concludes by mentioning possible future research directions and open issues in experimental conditions.
Webinar: 2021-05-03 (Mon) Clara Hollomey, Austrian Academy of Sciences — 10h Brussels time
    Time-frequency analysis of speech signals: from algorithms to human perception
    Abstract [+]
Time-frequency analysis provides a wealth of information about audio signals, such as their pitch, formant resonances, and the presence of noisy components in the sound. Also in the human inner ear, sound is spread out according to frequency before being further processed along the auditory pathway. Consequently, time-frequency analysis forms the basis of most speech detection, classification, and discrimination tasks. In this talk, we outline advantages and constraints of common time-frequency analysis algorithms, such as the discrete Gabor, Fourier, and wavelet transforms, with regard to speech signals. We describe how those algorithms can be applied to modelling human speech processing and give examples of typical applications, such as speech denoising, pitch-shifting, and estimating the short-term speech intelligibility. The algorithms, models, and their applications are illustrated by demos drawn from the Large Time Frequency Analysis Toolbox (LTFAT) and the Auditory Modeling Toolbox (AMT). While the LTFAT provides algorithmically efficient means for audio signal representation and modification, the AMT specifically targets speech signal analysis in the human auditory system. The LTFAT and the AMT are both open-source and can be freely downloaded from ltfat.github.io and http://amtoolbox.sourceforge.net/, respectively.
Webinar: 2021-04-12 (Mon) Olya Kudina, TU Delft — 10h Brussels time [slides are available upon request]
    Ethical considerations of the algorithmic processing of language and speech
    Abstract [+]
In this talk, Olya will discuss the ethical implications of voice-based interfaces, such as Siri, Google Home or Alexa. She will consider them from the theoretical perspective of technologies-as-mediators and ethics-as-accompaniment to show how voice-based technologies help to shape our lives. Olya will discuss specific instances of how they foster moral perceptions, choices and values, and in parallel give a response from the creative user and design communities. Together, this will provide a clue on how to shape meaningful interactions with voice assistants, individually and collectively.
Webinar: 2021-03-01 (Mon) Yefim Shulman, Tel Aviv University — 10h Brussels time [slides, video (talk only)]
    Promised but Not Guaranteed: Understanding People's Ability to Control Their Personal Information
    Abstract [+]
The consensus in legal frameworks, such as the GDPR and CCPA, states that people (data subjects) should be able to exercise control over their personal information. Yet, having control over what happens to their personal information in practice remains a challenging endeavor for the data subjects. Based on a conceptual control theoretic analysis and select empirical findings, my talk will discuss what control over personal information may require and how it may be improved.
Webinar: 2021-02-01 (Mon) Lara Gauder & Leonardo Pepino, University of Buenos Aires — 16h Brussels time [slides, video (talk only)]
    A Study on the Manifestation of Trust in Speech
    Abstract [+]
Research has shown that trust is an essential aspect of human-computer interaction, determining the degree to which the person is willing to use a system. Predicting the level of trust that a user has in the skills of a certain system could be used to attempt to correct potential distrust by having the system take relevant measures such as, for example, explaining its actions more thoroughly. In our research project, we have explored the feasibility of automatically detecting the level of trust that a user has in a virtual assistant (VA) based on their speech. For this purpose, we designed a protocol for collecting speech data, consisting of an interactive session where the subject is asked to respond to a series of factual questions with the help of a virtual assistant, which they were led to believe was either very reliable or unreliable. We collected a speech corpus in Argentine Spanish and found that the reported level of trust was effectively elicited by the protocol. Preliminary results using random forest classifiers showed that the subject’s speech can be used to detect which type of VA they were using with an accuracy of up to 76%, compared to a random baseline of 50%.
Webinar: 2021-01-11 (Mon) Tom Bäckström, Aalto University — 10h Brussels time [slides, video (talk only)]
    Code of Conduct for Data Management in Speech Research - Starting the process
    Abstract [+]
For anyone working with speech it should, by now, be obvious that we need to take care of the rights of the people involved. The question is only, how do we do that? Data management in an ethically responsible manner is one aspect of this issue and it touches all researchers; I have myself struggled many times with designing data management plans and I have received many requests that the SPSC SIG should take a shot at this. Therefore I think we need community-wide guidelines on how to handle data management, especially with respect to privacy. For example, what is an acceptable level of anonymization of data? When do we need to anonymize? When do we need to limit access to data by, say, requiring the signature of a contract? What kind of expiry dates should data have? To what extent should data use be checked in the review processes of conferences, journals and grant applications? And so on. The objective of this session is to set up the process through which we create a code of conduct. That is, my intention is only to discuss how we want to make decisions about the code of conduct and not to even attempt to write anything for a first draft. In this session I'll thus present an outline of a roadmap of how we can create a code of conduct. The session will consist of a shorter pre-recorded presentation part, where I present my initial draft of the process. After the presentation, I invite everyone to join a discussion with webcams on. The session is not recorded, but I'll share my notes with the participants.

Webinar: 2020-12-07 (Mon) Birgit Brüggemeier, Fraunhofer Institute for Integrated Circuits IIS — 10h Brussels time [slides, video (talk only)]
    Conversational Privacy – Communicating Privacy and Security in Conversational User Interfaces
    Abstract [+]
In 2019, media scandals raised awareness about privacy and security violations in Conversational User Interfaces (CUI) like Alexa, Siri and Google. Users report that they perceive CUI as “creepy” and that they are concerned about their privacy. The General Data Protection Regulation (GDPR) gives users the right to control the processing of their data, for example by opting out or requesting deletion, and it gives them the right to obtain information about their data. Furthermore, GDPR calls for seamless communication of user rights, which is currently poorly implemented in CUI. This talk presents a data collection interface, called Chatbot Language (CBL), that we use to investigate how privacy and security can be communicated in a dialogue between user and machine. We find that conversational privacy can affect user perceptions of privacy and security positively. Moreover, user choices suggest that users are interested in obtaining information on their privacy and security in dialogue form. We discuss implications and limitations of this research.
Webinar: 2020-11-02 (Mon) Rainer Martin & Alexandru Nelus, Ruhr-Universität Bochum — 10h Brussels time [slides, video (talk only)]
    Privacy-preserving Feature Extraction and Classification in Acoustic Sensor Networks
    Abstract [+]
In this talk we present a brief introduction to acoustic sensor networks and to feature extraction schemes that aim to improve the privacy vs. utility trade-off for audio classification in acoustic sensor networks. Our privacy enhancement approach consists of neural-network-based feature extraction models which aim to minimize undesired extraneous information in the feature set. To this end, we present adversarial, siamese and variational information feature extraction schemes in conjunction with neural-network-based classification (trust) and attacker (threat) models. We consider and compare schemes with explicit knowledge of the threat model and without such knowledge. For the latter, we analyze and apply the variational information approach in a smart-home scenario. It is demonstrated that the proposed privacy-preserving feature representation generalizes well to variations in dataset size and scenario complexity while successfully countering speaker identification attacks.
Webinar: 2020-10-05 (Mon) Nick Gaubitch, Pindrop — 10h Brussels time [video (talk only)]
    Voice Security and Why We Should Care
    Abstract [+]
After a couple of decades of somewhat slow development, voice technologies have once again gained momentum. Much of this has been driven by large leaps in speech and speaker recognition performance and, consequently, the development of many voice interfaces. Some notably successful examples of current applications of voice are the Amazon Echo and Apple Siri, but we also see an increasing number of institutions that make use of voice recognition to replace more traditional customer identification methods. While much of this development is exciting for speech and audio processing research, it also creates new and significant challenges in security and privacy. Furthermore, new technologies for various forms of voice modification and synthesis are on the rise, which only exacerbates the problem.

In this talk we will first introduce Pindrop and the company’s mission in the world of voice security and we will take a glimpse into the global fraud landscape of call centres, which motivates the work that we do. Next, we will take a deeper dive into the specific topic of voice modification and some related research results. Finally, we will provide an outlook into the future of voice and voice security.
Webinar: 2020-09-07 (Mon) Pablo Pérez Zarazaga, Aalto University — 10h Brussels time [slides, video (talk only)]
    Acoustic Fingerprints for Access Management in Ad-Hoc Sensor Networks
    Abstract [+]
Voice user interfaces can offer intuitive interaction with our devices, but the usability and audio quality could be further improved if multiple devices could collaborate to provide a distributed voice user interface. To ensure that users' voices are not shared with unauthorized devices, it is however necessary to design an access management system that adapts to the users' needs. Prior work has demonstrated that a combination of audio fingerprinting and fuzzy cryptography yields a robust pairing of devices without sharing the information that they record. However, the robustness of these systems is partially based on the extensive duration of the recordings that are required to obtain the fingerprint. This paper analyzes methods for robust generation of acoustic fingerprints in short periods of time to enable the responsive pairing of devices according to changes in the acoustic scenery and can be integrated into other typical speech processing tools.
Café: 2020-08-27 (Thu) Catherine Jasserand, Rijksuniversiteit Groningen — 16h Brussels time [slides]
    What is speech/voice from a data privacy perspective: Insights from the GDPR
    Abstract [+]
Catherine Jasserand, a postdoctoral researcher on privacy issues raised by biometric technologies, will discuss the notions of speech and voice from a data privacy perspective. Although the GDPR mentions neither voice data nor speech data among the examples of personal data, it applies to both types of data when they relate to an identifiable or identified individual. The talk will be an opportunity to explain terminological issues (including what ‘identification’ means in the context of data protection).
Webinar: 2020-08-03 (Mon) Qiongxiu Li, Aalborg Universitet — 10h Brussels time [slides, video (talk only)]
    Privacy-Preserving Distributed Optimization via Subspace Perturbation: A General Framework
    Abstract [+]
As the modern world becomes increasingly digitized and interconnected, distributed signal processing has proven to be effective in processing its large volume of data. However, a main challenge limiting the broad use of distributed signal processing techniques is the issue of privacy in handling sensitive data. To address this privacy issue, we propose a novel yet general subspace perturbation method for privacy-preserving distributed optimization, which allows each node to obtain the desired solution while protecting its private data. In particular, we show that the dual variables introduced in each distributed optimizer will not converge in a certain subspace determined by the graph topology. Additionally, the optimization variable is ensured to converge to the desired solution, because it is orthogonal to this non-convergent subspace. We therefore propose to insert noise in the non-convergent subspace through the dual variable such that the private data are protected, and the accuracy of the desired solution is completely unaffected. Moreover, the proposed method is shown to be secure under two widely-used adversary models: passive and eavesdropping. Furthermore, we consider several distributed optimizers such as ADMM and PDMM to demonstrate the general applicability of the proposed method. Finally, we test the performance through a set of applications. Numerical tests indicate that the proposed method is superior to existing methods in terms of several parameters like estimated accuracy, privacy level, communication cost and convergence rate. [pre-print]
Webinar: 2020-07-06 (Mon) Francisco Teixeira, INESC-ID / IST, Univ. of Lisbon — 10h Brussels time [slides, video (talk only)]
    Privacy in Health Oriented Paralinguistic and Extralinguistic Tasks
    Abstract [+]
The widespread use of cloud computing applications has created a society-wide debate on how user privacy is handled by online service providers. Regulations such as the European Union's General Data Protection Regulation (GDPR) have put forward restrictions on how such services are allowed to handle user data. The field of privacy-preserving machine learning is a response to this issue that aims to develop secure classifiers for remote prediction, where both the client's data and the server's model are kept private. This is particularly relevant in the case of speech, and concerns not only the linguistic contents, but also the paralinguistic and extralinguistic information that may be extracted from the speech signal.
In this talk we provide a brief overview of the current state of the art in paralinguistic and extralinguistic tasks for health, a major application area in terms of privacy concerns, along with an introduction to cryptographic methods commonly used in privacy-preserving machine learning. These will lay the groundwork for the review of the state of the art of privacy in paralinguistic and extralinguistic tasks for health applications. With this talk we hope to raise awareness of the problem of preserving privacy in this type of task and provide an initial background for those who aim to contribute to this topic.