An Effective Training Framework for Light-Weight Automatic Speech Recognition Models

  • Abdul Hannan
  • Alessio Brutti
  • Shah Nawaz
  • Mubashir Noman

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

Recent advancements in deep learning have encouraged the development of large automatic speech recognition (ASR) models that achieve promising results while ignoring computational and memory constraints. However, deploying such models on low-resource devices is impractical despite their favorable performance. Existing approaches (pruning, distillation, layer skipping, etc.) transform large models into smaller ones at the cost of significant performance degradation, or require prolonged training of the smaller models to reach comparable performance. To address these issues, we introduce an effective two-step representation-learning-based approach capable of producing several small models from a single large model while ensuring considerably better performance within a limited number of epochs. Comprehensive experiments on ASR benchmarks demonstrate the efficacy of our approach, achieving a three-fold training speed-up and up to a 12.54% word error rate improvement.
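The abstract does not detail the training procedure, but a minimal PyTorch sketch of what a two-step representation-learning pipeline of this kind could look like is given below, assuming step 1 aligns a small student encoder's hidden representations to those of a frozen large teacher and step 2 briefly fine-tunes the student on the ASR objective (CTC here). All names (`LargeTeacher`, `SmallEncoder`, the two step functions), the loss choices, and the dimensions are illustrative assumptions, not the paper's published method or code.

```python
# Hypothetical two-step pipeline sketch (not the paper's released code):
# step 1 matches the student's projected hidden states to a frozen large
# teacher's representations; step 2 fine-tunes the student with CTC.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LargeTeacher(nn.Module):
    """Stand-in for a frozen, pretrained large ASR encoder."""
    def __init__(self, in_dim=80, teacher_dim=768):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, teacher_dim, num_layers=4, batch_first=True)

    def forward(self, feats):
        h, _ = self.rnn(feats)          # (B, T, teacher_dim)
        return h


class SmallEncoder(nn.Module):
    """Hypothetical lightweight model derived from the large one."""
    def __init__(self, in_dim=80, hid_dim=256, teacher_dim=768, vocab=32):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hid_dim, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hid_dim, teacher_dim)  # map into teacher space
        self.head = nn.Linear(hid_dim, vocab)        # CTC output head

    def forward(self, feats):
        h, _ = self.rnn(feats)          # (B, T, hid_dim)
        return h


def step1_align_representations(student, teacher, loader, epochs=5, lr=1e-4):
    """Step 1: representation learning against the frozen teacher."""
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for feats, targets, feat_lens, target_lens in loader:
            with torch.no_grad():
                t_repr = teacher(feats)              # frozen teacher states
            s_repr = student.proj(student(feats))    # project to teacher dim
            loss = F.mse_loss(s_repr, t_repr)        # alignment loss
            opt.zero_grad()
            loss.backward()
            opt.step()


def step2_finetune_asr(student, loader, epochs=5, lr=1e-4):
    """Step 2: short supervised fine-tuning on the ASR task (CTC loss)."""
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for feats, targets, feat_lens, target_lens in loader:
            logits = student.head(student(feats))    # (B, T, vocab)
            log_probs = logits.log_softmax(-1).transpose(0, 1)  # (T, B, V)
            loss = ctc(log_probs, targets, feat_lens, target_lens)
            opt.zero_grad()
            loss.backward()
            opt.step()
```

Because step 1 requires no labels beyond the teacher's outputs, several students of different sizes could in principle be aligned to the same frozen teacher before a short supervised step 2, which is consistent with the abstract's claim of producing multiple small models from a single large model within a limited number of epochs.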
Original language: English
Title of host publication: InterSpeech 2025
Subtitle of host publication: 17-21 August 2025, Rotterdam, The Netherlands
Pages: 3613-3617
Number of pages: 5
Edition: 1
DOIs
Publication status: Published - 2025

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X

Fields of science

  • 102003 Image processing
  • 202002 Audiovisual media
  • 102001 Artificial intelligence
  • 102015 Information systems
  • 102 Computer Sciences

JKU Focus areas

  • Digital Transformation
