AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition

Proceedings of The 12th Language Resources and Evaluation Conference (LREC 2020).

Embedding Projections

Visualizing the MFCC vectors of speech samples from the dataset gives some insight into how the classes in AccentDB are distributed. We share embedding visualizations of the dataset with 4 accents and with 9 accents. Each includes 150 files per class, loaded in the TensorFlow Embedding Projector tool (choose Color by -> Label in the left menu to distinguish the classes). In the projector you can also apply PCA, UMAP, and t-SNE. Read more in section 2.6 of the paper, or download the processed vectors and metadata.
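As a rough illustration of how such a view could be prepared, the sketch below samples up to 150 vectors per class and writes them as two tab-separated files, the plain-text format the Embedding Projector can load (one vector per row, and a single-column metadata file with no header row). It is not the authors' pipeline: the function names `sample_per_class` and `write_projector_files`, the file names, the 13-dimensional vectors, and the labels `accent_a` and `accent_b` are all hypothetical, and random numbers stand in for real MFCC features.

```python
import csv
import random
import tempfile
from collections import defaultdict
from pathlib import Path


def sample_per_class(items, per_class=150, seed=0):
    """Pick up to `per_class` (vector, label) pairs for each label."""
    by_label = defaultdict(list)
    for vector, label in items:
        by_label[label].append(vector)
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    sampled = []
    for label in sorted(by_label):
        vectors = by_label[label]
        chosen = rng.sample(vectors, min(per_class, len(vectors)))
        sampled.extend((vector, label) for vector in chosen)
    return sampled


def write_projector_files(items, out_dir):
    """Write vectors.tsv and metadata.tsv (hypothetical names) to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    vectors_path = out_dir / "vectors.tsv"
    metadata_path = out_dir / "metadata.tsv"
    with open(vectors_path, "w", newline="") as vf, \
            open(metadata_path, "w", newline="") as mf:
        vector_writer = csv.writer(vf, delimiter="\t", lineterminator="\n")
        label_writer = csv.writer(mf, delimiter="\t", lineterminator="\n")
        for vector, label in items:
            # Row i of the metadata file labels row i of the vectors file.
            vector_writer.writerow([f"{x:.6f}" for x in vector])
            label_writer.writerow([label])
    return vectors_path, metadata_path


if __name__ == "__main__":
    # Synthetic stand-in data: 13-dimensional vectors (an assumed size)
    # for two placeholder classes; replace with real per-file features.
    rng = random.Random(42)
    data = [([rng.gauss(0.0, 1.0) for _ in range(13)], label)
            for label in ("accent_a", "accent_b")
            for _ in range(200)]
    subset = sample_per_class(data, per_class=150)
    with tempfile.TemporaryDirectory() as tmp:
        vectors_file, metadata_file = write_projector_files(subset, tmp)
        print(vectors_file.name, metadata_file.name, len(subset))
```

Sampling a fixed number of items per class keeps the projection balanced, so no single accent dominates the plot; the two files can then be loaded into the projector and coloured by label.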

Dataset

The current release of AccentDB, v1.0, comprises three datasets, licensed under a CC BY-NC 4.0 License.

release v1.0

Title              Size    Description                                                    Notes
accentdb_core      2.8 GB  4 non-native Indian English accents collected by the authors.  6,587 files
accentdb_extended  3.9 GB  Samples for 5 English accents + 4 accents from accentdb_core.  19,111 files
accentdb_raw       1.3 GB  Raw and unprocessed recordings for the core dataset.           11 files

To experiment with a smaller AccentDB dataset, we share the classification model described in section 3.1.2 of the paper. You can try out the model and the dataset in a Colab notebook in your browser.

Citation

If you have found our dataset or models to be useful, please cite us as below.

@InProceedings{ahamad-anand-bhargava:2020:LREC,
  author    = {Ahamad, Afroz and Anand, Ankit and Bhargava, Pranesh},
  title     = {AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition},
  booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
  month     = {May},
  year      = {2020},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {5353--5360},
  url       = {https://www.aclweb.org/anthology/2020.lrec-1.659}
}

People

  Contact us at accentdb.research@gmail.com