GRATS Dataset Project

Overview

Designed for realistic speaking-rate analysis

GRATS supports research on Mandarin speaking-rate variation. Unlike datasets that rely on artificially time-scaled references, GRATS provides naturally produced target-rate recordings from the same speaker and sentence. This makes it possible to evaluate whether speech time-scale modification systems approach authentic Mandarin target-rate realizations.

Anonymous review note

This project-page draft has been anonymized for double-blind review. Author names, institutional affiliations, and direct identifying contact information have been removed and should be restored only after the review process is complete.

Dataset version note

The published paper reports the original 25-speaker, 60-prompt, 7,500-utterance version. The expanded corpus adds a second 25-speaker, 60-prompt, 7,500-utterance set, for 15,000 utterances in total. The expanded corpus therefore consists of two separate 25 × 60 × 5 subsets (speakers × prompts × rates), not a fully crossed 50 × 120 × 5 design.
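The difference between two stacked subsets and a fully crossed design can be checked with a short Python sketch. It assumes, as the note implies, that the added set has its own 25 speakers and its own 60 prompts. The subset labels `A`/`B`, the helper `subset_keys`, and the ID scheme are hypothetical and do not reflect the release's actual file naming.

```python
from itertools import product

RATES = (0.5, 0.75, 1.0, 1.25, 1.5)

def subset_keys(subset: str, n_speakers: int = 25, n_prompts: int = 60):
    """Enumerate (subset, speaker, prompt, rate) keys for one 25 x 60 x 5 subset.

    Hypothetical ID scheme: speaker and prompt IDs are prefixed with the subset
    label, so the two subsets share no speakers and no prompts.
    """
    return [
        (subset, f"{subset}_spk{s:02d}", f"{subset}_p{p:02d}", r)
        for s, p, r in product(range(1, n_speakers + 1), range(1, n_prompts + 1), RATES)
    ]

original = subset_keys("A")  # the set reported in the published paper
added = subset_keys("B")     # the set added in the expanded corpus
corpus = original + added

print(len(original), len(added), len(corpus))  # 7500 7500 15000
# A fully crossed design would pair every speaker with every prompt:
print(50 * 120 * len(RATES))  # 30000
```

Under these assumptions the expanded corpus has 50 speakers and 120 prompts overall, but each speaker only reads the 60 prompts of their own subset.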

50 native Mandarin speakers in the expanded corpus

5 speaking rates

15,000 total utterances in the expanded corpus

2 × 60 prompts (two prompt sets of 60) in the expanded corpus

Datasets

Speech and audio-visual resources

Speech

GRATS Speech Dataset

A naturally recorded parallel multi-speed Mandarin speech dataset for speech time-scale modification benchmarking. The original paper version contains 25 speakers, 60 prompts, and 7,500 utterances. The expanded corpus adds another 25-speaker, 60-prompt, 7,500-utterance set.

Natural target-rate references
Controlled speaker-sentence-rate pairing
Five speaking rates: 0.5×, 0.75×, 1.0×, 1.25×, 1.5×

Audio-visual

AV-GRATS Audio-Visual Dataset

An audio-visual extension for evaluating audio-visual speech recognition (AVSR) under speaking-rate variation. The original paper version reports 25 speakers and 7,500 utterances; the additional set in the expanded release may be documented separately.

Synchronized audio-video recordings
Mandarin AVSR robustness analysis
Controlled multi-speed setting

Examples

Audio examples for five speaking rates

Replace the sample sentence, transcript, and file paths below with those of your own clips. For anonymous review, use audio only and remove identity-revealing metadata before uploading.

Example 1

Sentence: 這學期學校有書法比賽 (English: "There is a calligraphy competition at school this semester.")

Natural recordings

0.5×: 0.5x.wav
0.75×: 0.75x.wav
1.0×: 1.0x.wav
1.25×: 1.25x.wav
1.5×: 1.5x.wav

Access

Controlled-access research release

The dataset is provided for research use through a controlled-access request process. Requesters should prepare their name, affiliation, institutional email, research purpose, and agreement to the dataset terms of use.

Access form: review link, or the final access request link after approval.

During double-blind review, direct identifying contact details can remain hidden. The official project contact and access form will be provided after review.

Benchmark

Natural reference protocol

For speech time-scale modification evaluation, the naturally recorded 1.0× utterance is used as input. System outputs at the target rates are compared against the corresponding naturally recorded target-rate utterances from the same speaker and sentence.
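The pairing rule can be sketched in Python as follows. The directory layout `<root>/<speaker>/<sentence>/<rate>x.wav`, the helper names, and the example IDs `spk01`/`s001` are hypothetical; only the file names (`0.5x.wav` through `1.5x.wav`) follow the audio examples on this page. `nominal_duration` shows the duration that uniform time scaling would imply; natural target-rate recordings are not expected to match it exactly, which is why they serve as references.

```python
from dataclasses import dataclass

INPUT_RATE = 1.0                        # the natural 1.0x recording is the system input
TARGET_RATES = (0.5, 0.75, 1.25, 1.5)   # rates the TSM system must produce

@dataclass(frozen=True)
class EvalPair:
    speaker: str
    sentence: str
    rate: float
    input_path: str      # natural 1.0x recording (system input)
    reference_path: str  # natural recording at the target rate (reference)

def clip_path(root: str, speaker: str, sentence: str, rate: float) -> str:
    # Hypothetical layout, e.g. "grats/spk01/s001/0.75x.wav"
    return f"{root}/{speaker}/{sentence}/{rate}x.wav"

def build_pairs(root: str, speaker: str, sentence: str) -> list[EvalPair]:
    """One pair per target rate; input and reference share speaker and sentence."""
    src = clip_path(root, speaker, sentence, INPUT_RATE)
    return [
        EvalPair(speaker, sentence, r, src, clip_path(root, speaker, sentence, r))
        for r in TARGET_RATES
    ]

def nominal_duration(input_seconds: float, rate: float) -> float:
    """Duration implied by uniform time scaling: faster rates give shorter clips."""
    return input_seconds / rate

pairs = build_pairs("grats", "spk01", "s001")
for p in pairs:
    print(p.rate, p.input_path, "->", p.reference_path)
print(nominal_duration(2.0, 0.5), nominal_duration(3.0, 1.5))  # 4.0 2.0
```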

01. Input: natural 1.0× recording

02. Target rates: 0.5×, 0.75×, 1.25×, 1.5×

03. References: natural target-rate recordings

04. Metrics: CER (character error rate), PESQ, STOI, DNSMOS, duration MAE (mean absolute error), F0 correlation
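Two of these metrics can be sketched with the Python standard library. The page does not define them precisely, so the formulations below are assumptions: duration MAE as the mean absolute difference, in seconds, between each system output and its natural reference, and F0 correlation as the Pearson correlation over frames voiced in both contours, assuming the contours have already been extracted and time-aligned frame by frame. CER, PESQ, STOI, and DNSMOS depend on an ASR system or dedicated quality-estimation tools and are not sketched here.

```python
import math
from typing import Sequence

def duration_mae(output_secs: Sequence[float], reference_secs: Sequence[float]) -> float:
    """Mean absolute error between output and natural-reference durations (seconds)."""
    if len(output_secs) != len(reference_secs) or not output_secs:
        raise ValueError("need two equal-length, non-empty duration lists")
    return sum(abs(o - r) for o, r in zip(output_secs, reference_secs)) / len(output_secs)

def f0_correlation(f0_out: Sequence[float], f0_ref: Sequence[float]) -> float:
    """Pearson correlation over frames voiced (F0 > 0) in both contours.

    Assumption: both contours are already aligned to the same frames.
    """
    if len(f0_out) != len(f0_ref):
        raise ValueError("contours must be aligned to the same number of frames")
    voiced = [(a, b) for a, b in zip(f0_out, f0_ref) if a > 0 and b > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two jointly voiced frames")
    xs, ys = zip(*voiced)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in voiced)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("F0 is constant on voiced frames; correlation undefined")
    return cov / (sx * sy)

print(duration_mae([4.1, 1.9], [4.0, 2.2]))  # roughly 0.2 s
print(f0_correlation([0, 120, 130, 140, 0], [110, 121, 131, 141, 0]))  # close to 1.0
```

Restricting the correlation to jointly voiced frames keeps unvoiced frames (F0 = 0) from inflating the score.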

Terms

Responsible use

Use the dataset only for non-commercial academic research unless separate permission is granted.
Do not redistribute the raw audio, video, metadata, or derived speaker-identifiable data.
Do not attempt to identify speakers or recover personal information.
Cite the corresponding paper when using the dataset or benchmark protocol.
Follow any additional data-use agreement provided during the access request process.

Citation

Anonymous placeholders

This anonymized draft intentionally omits author-identifying citation information. The final public citations will be provided after review.

@inproceedings{grats2026,
  title     = {GRATS: A Natural Multi-Speed Mandarin Dataset for Speech Time-Scale Modification Benchmarking},
  author    = {Anonymous Authors},
  booktitle = {Proc. Venue Placeholder},
  year      = {2026}
}

@inproceedings{avgrats2026,
  title     = {AV-GRATS: A Parallel Multi-Speed Mandarin Audio-Visual Corpus for Speaking-Rate Robustness Analysis},
  author    = {Anonymous Authors},
  booktitle = {Proc. Venue Placeholder},
  year      = {2026}
}

Contact

Post-review update needed

For anonymous review, direct contact information is intentionally omitted. The official project or lab contact email, or a link to an approved access request form, will be provided after review.

anonymous-contact@example.com