Project funded by the ANR

VERA project (2013-2015): research on speech recognition

  • The VERA project aims to develop tools for the diagnosis, localization, and measurement of automatic transcription errors. The project is based on a consortium of leading academic players in this field. The objective is to study the errors in detail (at the perceptual, acoustic-phonetic, lexical, and syntactic levels) in order to produce a precise diagnosis of the possible shortcomings of current classical models for certain classes of linguistic phenomena.
  • At the application level, the VERA project is motivated by the following observation: many applications that provide access to the content of multimedia data are made possible by the automatic transcription of speech, such as video subtitling, search for specific portions of audio-visual archives, automated meeting reports, and information extraction and structuring (Speech Analytics) in multimedia content (Web, call centers, …). However, large-scale deployment is often slowed down because the transcriptions produced by automatic speech recognition systems contain too many errors. Research and development in speech recognition has so far focused, successfully, on improving the methods and models used in the transcription process, as measured by the word error rate; however, beyond a certain performance level, the cost of reducing the residual errors increases exponentially.
  • Transcription errors thus persist, and they are more or less problematic depending on the application. Information retrieval tolerates errors well (up to a 30% error rate), but systematic errors on certain named entities can be prohibitive. By contrast, subtitling and meeting transcription have a very low tolerance for errors: even word error rates below 5%, which are very low by state-of-the-art standards, remain too high for end users.
  • The value of error analysis is not limited to increasing the acceptance of applications based on automatic transcription. Error classification, impact measurement through perceptual tests, and error diagnosis for current state-of-the-art transcription systems constitute the first, crucial step in identifying the shortcomings of current models and preparing future generations of Automatic Speech Recognition systems.
  • Through close cooperation between complementary partners who excel in their fields, the VERA project aims to set up an infrastructure for the detection, diagnosis, and qualitative measurement of errors, making it possible to create a virtuous circle of improvement for large and very large vocabulary continuous speech recognition systems.


  • Many speech researchers from both computer science and the linguistic sciences advocate a re-convergence of their disciplines within the framework of an experimental speech science (see for instance [Liberman2010, Adda2011]). The basic idea is to use the automatic tools developed by computer scientists to explore vast speech corpora from a linguistic perspective. This exploration yields new knowledge, which in turn will lead to improvements in the modeling of speech in automatic systems.
  • In this framework, the study of errors, and especially of residual errors, is crucial. This study is multidisciplinary and takes many forms. It covers the comparison of human and machine performance [Lippmann1997], as well as the diagnostic study of errors made by machines [Goldwater_Jurafsky_Manning_2010], or by both humans and machines [Vasilescu11].
  • This new paradigm requires new tools, instruments, and corpora. For instance, most error diagnosis studies have been carried out in situations where the Word Error Rate (WER) was high, whereas we believe the focus should be on situations where the WER is low enough to allow easy classification: when the WER is high, the different error types overlap and interfere, which blocks any meaningful analysis. Such favorable situations, which are especially interesting to study, arise on tasks and corpora where many evaluations have already been carried out, or where the performance of automatic systems is reaching an asymptote. This suggests that the residual errors contain phenomena that classical techniques do not model well, and that are therefore worth studying.
  • The VERA project aims to develop a methodology and generic tools for the localization and diagnosis of Automatic Speech Recognition (ASR) errors. Together with these tools, new metrics will be developed to enable a contrastive focus on different types of errors, depending on the application.
  • The VERA project will use these new diagnostic tools and metrics to annotate the errors produced by many ASR systems on different corpora, in order to increase our knowledge of the nature of these residual errors. This knowledge, complementing the evaluation paradigm used in ASR until now, will help identify and overcome the main barriers facing the next generation of ASR systems.
  • To summarize, the aim of the project is to improve the performance of automatic systems and to increase our linguistic knowledge of speech.
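The Word Error Rate mentioned above is conventionally computed as WER = (S + D + I) / N, from a minimum-edit-distance alignment between a reference transcription and the recognizer's hypothesis, where S, D and I are the numbers of substituted, deleted and inserted words and N is the number of reference words. The sketch below illustrates only this standard metric, not the VERA project's own tools or metrics; the names `ErrorCounts` and `align_counts` are hypothetical. Splitting the total into S, D and I by backtracking through the alignment is the most basic form of error localization.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    """Hypothetical container for the error breakdown of one utterance."""
    substitutions: int
    deletions: int
    insertions: int
    ref_length: int

    @property
    def wer(self) -> float:
        # WER = (S + D + I) / N, where N is the number of reference words.
        if self.ref_length == 0:
            raise ValueError("WER is undefined for an empty reference")
        errors = self.substitutions + self.deletions + self.insertions
        return errors / self.ref_length


def align_counts(reference: list[str], hypothesis: list[str]) -> ErrorCounts:
    """Word-level Levenshtein alignment, backtracked to count S, D and I."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = minimum number of edits turning reference[:i] into hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # i deletions
    for j in range(1, m + 1):
        cost[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub,  # match or substitution
                cost[i - 1][j] + 1,        # deletion of a reference word
                cost[i][j - 1] + 1,        # insertion of a hypothesis word
            )

    # Walk back from the bottom-right cell, attributing each edit to a type.
    # Ties are broken in favor of match/substitution, then deletion.
    s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                s += sub
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return ErrorCounts(s, d, ins, n)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    counts = align_counts(ref, hyp)
    print(counts)  # ErrorCounts(substitutions=1, deletions=1, insertions=0, ref_length=6)
    print(round(counts.wer, 3))  # 0.333
```

When several alignments have the same total cost, the S/D/I breakdown depends on the tie-breaking rule; this sketch prefers diagonal moves, which is one common choice but not the only one, so breakdowns from different scoring tools may differ slightly even when their WER agrees.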

Position of the VERA project

  • The errors made by automatic systems have already been studied in the field of speech and language. But while the study of residuals is common and crucial in many sciences (for instance, signal processing), such studies remain marginal in the field of speech and language. Nevertheless, some authors [Adda2011, Abney2011] advocate a clear focus on this subject within the framework of experimental linguistics, as it highlights the differences between prediction and observation.
  • To our knowledge, no national or international project is underway on this subject (as of 2012). Given the international interest in setting up a common experimental framework for all speech sciences (see for instance [Liberman2010]), we expect international projects to emerge. At the national level, we see real convergence with the Labex Empirical Foundations of Linguistics (EFL), in which one of the partners (LPP) is a leading laboratory.
  • The VERA project aims to develop and distribute methodologies and tools that are crucial building blocks for this experimental framework. Through direct collaboration with EFL, as well as indirect collaboration with other institutions that have a direct interest in setting up this framework (for instance, agencies such as ELRA and LDC, or European networks such as CLARIN and META-NET), we will ensure clear visibility for the project and for the research on this subject.
home.txt · Last modified: 08/07/2013 00:07 by yannick Estève