Data Mining & Speech Research Seminar on 18.1.2019

When: Friday 18.1, 12:00 (sharp!)-13:30

Where: TB180 (Joensuu) and F213 (Kuopio)


Dr. Esther Galbrun (12:00-12:30), A short introduction to redescription mining

Dr. Tanel Alumäe (12:30-13:00), Training Speaker Recognition Models with Recording-Level Labels

Mr. Lauri Tavi (13:00-13:30), Recognition of creaky voice from authentic emergency calls



Dr. Esther Galbrun, University of Eastern Finland

A short introduction to redescription mining

Abstract: A biologist interested in bioclimatic habitats of species needs to find geographical areas that admit two characterizations, one in terms of their climatic profile and one in terms of the occupying species. This is just one example of a general problem setting where we need to identify correspondences between data that have different nature (here, species vs. climate). To identify such correspondences over binary data sets, Ramakrishnan et al. proposed redescription mining in 2004. Subsequent research has extended the problem formulation to more complex correspondences and data types, making it applicable to a wide variety of data analysis tasks. We will give an introduction to redescription mining, from the problem definition to current algorithms, not forgetting applications.


Dr. Tanel Alumäe, senior researcher at Tallinn University of Technology, Estonia

Training Speaker Recognition Models with Recording-Level Labels

Abstract: We investigate training speaker recognition models using coarse-grained speaker labels provided only at the recording level. The approach is based on a weakly supervised training method that allows training a speaker recognition deep neural network with a special cost function that does not need segment-level annotations. Experiments are conducted on the VoxCeleb corpus. We show that, without using any reference segment-level labeling, the method can achieve a 1% speaker recognition error rate on the official VoxCeleb1 closed-set speaker recognition test set. By training an x-vector-based speaker verification system on the resegmented and relabeled VoxCeleb1 corpus, we can achieve 4.57% EER on the VoxCeleb speaker verification test set. When using VoxCeleb2 data, the speaker verification EER reduces to 2.91% (as opposed to 3.13% when using reference segmentations).


Bio: Tanel Alumäe received a PhD from Tallinn University of Technology in 2006. After graduation, he worked as a post-doc at CNRS/LIMSI in France and at Aalto University, and later at Raytheon BBN Technologies in the USA. Currently he is a Senior Researcher at Tallinn University of Technology and leads the Laboratory of Language Technology. His main research interests are speech and speaker recognition.


Mr. Lauri Tavi, University of Eastern Finland

Recognition of creaky voice from authentic emergency calls

Abstract: Although creaky voice, or vocal fry, is a widely studied phonation mode, open questions remain about its acoustic characterization and automatic recognition. Many of these questions persist because creak varies significantly with conversational context.

In this talk, I focus on the recognition of creaky voice from authentic emergency calls, because creak detection could potentially provide information about the caller’s emotional state or an attempt at voice disguise. Based on our ongoing study, I demonstrate differences in the amount of creak and in the spectral moments of creaky and modal voice in two different speech corpora: authentic Finnish emergency call recordings and high-quality conversational speech recordings, which are used as out-of-domain data. In addition, I introduce an exploratory creak recognizer based on a convolutional neural network (CNN), built specifically for emergency calls. Our results show that the system can perform moderately well using a limited amount of training data on challenging testing data, and that it has the potential to achieve higher F scores when more emergency calls are used for model training.


Bio: Lauri Tavi is a doctoral student at the Department of General Linguistics and Language Technology, University of Eastern Finland. In 2015, he completed his MA in Finnish language at the University of Eastern Finland and began work on his doctoral thesis on emotion-related prosodic and acoustic features in Finnish emergency calls. In 2017, he worked on his thesis at the Laboratory of Language Technology, Tallinn University of Technology, as a visiting doctoral student. Currently he is a member of the board of the Linguistic Association of Finland. His research interests include acoustic-phonetic analysis of emotions, forensic speech science, speech recognition systems and speech prosody.