Overview
Designed for realistic speaking-rate analysis
GRATS supports research on Mandarin speaking-rate variation. Unlike datasets that rely on
artificially time-scaled references, GRATS provides naturally produced target-rate recordings
from the same speaker and sentence. This makes it possible to evaluate whether speech
time-scale modification systems approach authentic Mandarin target-rate realizations.
Anonymous review note
This project-page draft has been anonymized for double-blind review. Author names,
institutional affiliations, and direct identifying contact information have been removed
and should be restored only after the review process is complete.
Dataset version note
The published paper reports the original 25-speaker, 60-prompt, 7,500-utterance version.
The current expanded corpus adds another 25-speaker, 60-prompt, 7,500-utterance set,
resulting in 15,000 utterances in total. The expanded corpus consists of two separate
25 × 60 × 5 subsets; it is not a fully crossed 50 × 120 × 5 design.
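The counts in this note can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the variable names are illustrative and not part of any released tooling.

```python
# Figures taken from the version note; names are illustrative only.
SPEAKERS_PER_SUBSET = 25
PROMPTS_PER_SUBSET = 60
RATES = 5
SUBSETS = 2

# Each subset is fully crossed internally: speakers x prompts x rates.
utterances_per_subset = SPEAKERS_PER_SUBSET * PROMPTS_PER_SUBSET * RATES

# The expanded corpus is the union of the two subsets.
expanded_total = SUBSETS * utterances_per_subset

# A fully crossed 50 x 120 x 5 design would be twice as large.
fully_crossed_total = 50 * 120 * RATES

print(utterances_per_subset, expanded_total, fully_crossed_total)
```

The gap between the last two numbers reflects that each speaker reads only the 60 prompts of their own subset, not all 120.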
50 Native Mandarin speakers in expanded corpus
5 Speaking rates
15,000 Total utterances in expanded corpus
2 × 60 Prompt sets in expanded corpus
Datasets
Speech and audio-visual resources
Speech
GRATS Speech Dataset
A naturally recorded parallel multi-speed Mandarin speech dataset for speech time-scale
modification benchmarking. The original paper version contains 25 speakers, 60 prompts,
and 7,500 utterances. The expanded corpus adds another 25-speaker, 60-prompt,
7,500-utterance set.
- Natural target-rate references
- Controlled speaker-sentence-rate pairing
- Five target rates: 0.5×, 0.75×, 1.0×, 1.25×, 1.5×
Audio-visual
AV-GRATS Audio-Visual Dataset
An audio-visual extension for evaluating audio-visual speech recognition under
speaking-rate variation. The original paper version reports 25 speakers and 7,500
utterances, while the expanded release can document the additional set separately.
- Audio-video synchronized recordings
- Mandarin AVSR robustness analysis
- Controlled multi-speed setting
Examples
Audio examples for five speaking rates
Replace the sample sentence text, transcript, and file paths with your own clips. For
anonymous review, use audio only and remove identity-revealing metadata before uploading.
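One possible way to drop container-level metadata from a clip is to rewrite it with Python's standard `wave` module, which copies only the format header and the audio frames. This is a minimal sketch that assumes the clips are uncompressed PCM WAV files; the function name `strip_wav_metadata` is hypothetical. It does not anonymize the voice itself or any identifying information in file names.

```python
import io
import wave


def strip_wav_metadata(wav_bytes: bytes) -> bytes:
    """Return a copy of a PCM WAV file holding only format info and audio frames.

    The wave reader ignores chunks other than 'fmt ' and 'data', and the
    writer emits only those two, so extra chunks such as LIST/INFO tags
    are not carried over.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(reader.getnframes())

    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setparams(params)
        writer.writeframes(frames)
    return out.getvalue()
```

For files on disk, read the bytes, pass them through the function, and write the result to a new, neutrally named file.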
Access
Controlled-access research release
The dataset is provided for research use through a controlled-access request process.
Requesters should prepare their name, affiliation, institutional email, research purpose,
and agreement to the dataset terms of use.
Access form: Review link or the final access request link after approval.
During double-blind review, direct identifying contact details can remain hidden. The official project contact and access form will be provided after review.
Benchmark
Natural reference protocol
For speech time-scale modification evaluation, the naturally recorded 1.0× utterance is used as input.
System outputs at the target rates are compared against the corresponding naturally recorded
target-rate utterances from the same speaker and sentence.
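The pairing described above can be sketched as an enumeration over speakers, sentences, and target rates. This is an illustrative sketch only: the speaker and sentence identifiers and the dictionary layout are assumptions, not the released file format.

```python
from itertools import product

INPUT_RATE = 1.0
TARGET_RATES = (0.5, 0.75, 1.25, 1.5)


def evaluation_pairs(speakers, sentences):
    """Yield one evaluation item per (speaker, sentence, target rate).

    Each item names the natural 1.0x recording used as system input and the
    natural target-rate recording, from the same speaker and sentence, that
    the system output is compared against.
    """
    for speaker, sentence, rate in product(speakers, sentences, TARGET_RATES):
        yield {
            "input": (speaker, sentence, INPUT_RATE),
            "target_rate": rate,
            "reference": (speaker, sentence, rate),
        }


# Hypothetical identifiers for one 25-speaker, 60-prompt subset.
speakers = [f"S{i:02d}" for i in range(1, 26)]
sentences = [f"P{i:02d}" for i in range(1, 61)]
pairs = list(evaluation_pairs(speakers, sentences))
print(len(pairs))
```

Under these assumptions, one subset gives 25 × 60 × 4 evaluation items, because the 1.0× recording serves as the input and is not itself a target.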
01 Input
Natural 1.0× recording
02 Target rates
0.5×, 0.75×, 1.25×, 1.5×
03 References
Natural target-rate recordings
04 Metrics
CER, PESQ, STOI, DNSMOS, duration MAE, F0 correlation
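Two of the listed metrics are simple enough to sketch in plain Python. The definitions below are assumptions, since this page does not spell them out: CER is taken as character-level edit distance divided by reference length, and duration MAE as the mean absolute difference, in seconds, between each system output's duration and the duration of its natural reference. The other metrics need dedicated tools and are not shown.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, assumed here to be edits / reference length."""
    if not reference:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def duration_mae(output_durations, reference_durations) -> float:
    """Mean absolute duration error in seconds (assumed definition)."""
    if not output_durations or len(output_durations) != len(reference_durations):
        raise ValueError("need two non-empty lists of equal length")
    diffs = [abs(o - r) for o, r in zip(output_durations, reference_durations)]
    return sum(diffs) / len(diffs)


# Made-up values for illustration, not dataset results.
example_cer = cer("今天天气很好", "今天天汽很好")
example_mae = duration_mae([2.0, 3.0], [2.5, 2.0])
```

For Mandarin, scoring at the character level avoids the need for word segmentation, which is one reason CER is commonly preferred over word error rate.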
Terms of use
- Use the dataset only for non-commercial academic research unless separate permission is granted.
- Do not redistribute the raw audio, video, metadata, or derived speaker-identifiable data.
- Do not attempt to identify speakers or recover personal information.
- Cite the corresponding paper when using the dataset or benchmark protocol.
- Follow any additional data-use agreement provided during the access request process.
Citation
Anonymous placeholders
This anonymized draft intentionally omits author-identifying citation information. The final public citations will be provided after review.
@inproceedings{grats2026,
title = {GRATS: A Natural Multi-Speed Mandarin Dataset for Speech Time-Scale Modification Benchmarking},
author = {Anonymous Authors},
booktitle = {Proc. Venue Placeholder},
year = {2026}
}
@inproceedings{avgrats2026,
title = {AV-GRATS: A Parallel Multi-Speed Mandarin Audio-Visual Corpus for Speaking-Rate Robustness Analysis},
author = {Anonymous Authors},
booktitle = {Proc. Venue Placeholder},
year = {2026}
}