HartmannPsi/Reliability-Aware-Score

Reliability-Aware Score

This repository contains the open-source code for the core evaluation components of our paper on reliability-aware automatic speech recognition (ASR): the definition of RAS (Reliability-Aware Score) and the code used to fit the trade-off parameter alpha from human preference annotations. The paper is available on arXiv (arXiv:2604.24278).

What is included

This release focuses on two pieces from the paper:

  1. RAS metric computation
    RAS extends edit-distance-based ASR evaluation by introducing a placeholder token (default: <ph>) for abstention. It discounts placeholder-related errors by a factor alpha, following the dynamic programming formulation described in the paper.

  2. Human-aligned alpha fitting
    alpha is not chosen heuristically; it is fitted to listening-test preference data with the Bradley-Terry-style objective described in the paper.

Repository structure

./
├── RAS.py
├── fit_alpha.py
├── requirements.txt
└── third_party/
    └── normalizers/
        ├── __init__.py
        ├── basic.py
        ├── english.py
        └── english.json

File-by-file description

  • RAS.py: Main implementation of the RAS metric. It:

    • normalizes English and code-switching text,
    • converts Traditional Chinese to Simplified Chinese with OpenCC,
    • merges consecutive placeholder tokens,
    • computes abstention-aware alignment with dynamic programming,
    • returns detailed counts such as C, S, D, I, S_ph, I_ph, and final RAS.
  • fit_alpha.py: Fits the trade-off parameter alpha from human listening-test annotations. It:

    • aggregates pairwise preference counts over transcript A / transcript B / tie,
    • reads precomputed transcript metrics,
    • optimizes alpha with PyTorch and Adam,
    • supports tie-aware fitting through the lambda_tie term.
  • requirements.txt: Python dependencies needed for the released code.

  • third_party/normalizers/__init__.py: Package entry point for the text normalization utilities.

  • third_party/normalizers/basic.py: Basic text normalization helpers adapted from OpenAI Whisper, including punctuation/symbol cleanup and Unicode normalization.

  • third_party/normalizers/english.py: English text normalization utilities adapted from OpenAI Whisper, including number normalization and English-specific cleanup applied before metric computation.

  • third_party/normalizers/english.json: Lookup/configuration data used by the English normalizer.

Installation

We recommend Python 3.10+.

cd Reliable-ASR/open-source
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

If opencc installation fails on your platform, install the corresponding system package for OpenCC first, then rerun pip install -r requirements.txt.

Usage

1. Compute RAS for a reference-hypothesis pair

The main entry point is compute_metrics in RAS.py.

from RAS import compute_metrics

ref = "the patient has a history of diabetes"
hyp = "the patient has <ph> of diabetes"

metrics = compute_metrics(ref, hyp, alpha=0.5064)
print(metrics)

Example returned fields:

{
    "N": ...,
    "C": ...,
    "S": ...,
    "D": ...,
    "I": ...,
    "S_ph": ...,
    "I_ph": ...,
    "ph_errors": ...,
    "non_ph_errors": ...,
    "RAS": ...
}

Metric definition used in the code

Consistent with the paper, the implementation computes:

RAS = (C - (non_placeholder_errors + alpha * placeholder_errors)) / N

where:

  • C is the number of correct matches,
  • N is the reference length,
  • non_placeholder_errors = S + D + I,
  • placeholder_errors = S_ph + I_ph.

The default placeholder token is <ph>.
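
As a quick illustration of the formula above, the following minimal sketch recomputes RAS from a set of counts. The function name ras_from_counts and the example counts are hypothetical and are not part of RAS.py; only the arithmetic mirrors the definition given here.

```python
def ras_from_counts(C, S, D, I, S_ph, I_ph, N, alpha=0.5064):
    """Recompute RAS from alignment counts (hypothetical helper, not in RAS.py)."""
    if N <= 0:
        raise ValueError("reference length N must be positive")
    non_placeholder_errors = S + D + I
    placeholder_errors = S_ph + I_ph
    return (C - (non_placeholder_errors + alpha * placeholder_errors)) / N


# Made-up counts chosen only to exercise the arithmetic:
# (7 - (1 + 0.5 * 2)) / 10 = 0.5
score = ras_from_counts(C=7, S=1, D=0, I=0, S_ph=2, I_ph=0, N=10, alpha=0.5)
print(score)
```

With alpha below 1, a placeholder error lowers the score less than an ordinary substitution, deletion, or insertion does.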

Text preprocessing behavior

Before alignment, the code:

  • normalizes English text,
  • tokenizes code-switching text,
  • converts Traditional Chinese to Simplified Chinese,
  • collapses consecutive <ph> tokens into a single placeholder.

These steps matter if you want your results to match the paper's implementation.
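
The last preprocessing step, collapsing consecutive placeholders, can be sketched as follows. This is an illustrative stand-in rather than the code in RAS.py; the function name collapse_placeholders is hypothetical, and the input is assumed to be already tokenized.

```python
def collapse_placeholders(tokens, placeholder="<ph>"):
    """Merge each run of consecutive placeholder tokens into a single one."""
    merged = []
    for tok in tokens:
        # Skip a placeholder that directly follows another placeholder.
        if tok == placeholder and merged and merged[-1] == placeholder:
            continue
        merged.append(tok)
    return merged


print(collapse_placeholders("the <ph> <ph> <ph> of diabetes".split()))
# → ['the', '<ph>', 'of', 'diabetes']
```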

2. Fit alpha from human preference data

fit_alpha.py is a release of the fitting logic used for the human-alignment experiment. Before running it, you need to replace the placeholder paths at the top of the file:

data_dir = 'path/to/human_choices'
wer_path = 'path/to/listen_test_full.json'

You also need to implement:

def get_path(item_id):
    # item_id: the listening-test item ID (renamed from `str` to avoid shadowing the built-in)
    return 'path/to/audio.wav'

This function should map a listening-test item ID to the corresponding audio path, because the script filters invalid annotations partly based on audio duration and response time.

Then run:

python fit_alpha.py

The script will:

  • read all human annotation JSON files under data_dir,
  • resolve whether each preference corresponds to transcript A or B,
  • filter overly fast responses,
  • aggregate counts kA, kB, kC,
  • optimize alpha.
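
The filtering and aggregation steps above can be sketched roughly as below. The record schema (item_id, choice, response_seconds, audio_seconds), the min_ratio threshold, and the reading of kC as the tie count are all assumptions made for illustration; the actual schema and filtering rule in fit_alpha.py may differ.

```python
from collections import defaultdict


def aggregate_choices(records, min_ratio=1.0):
    """Count preferences per item, dropping responses faster than the audio.

    Each record is assumed to already say whether the annotator preferred
    transcript "A", transcript "B", or declared a "tie".
    """
    counts = defaultdict(lambda: {"kA": 0, "kB": 0, "kC": 0})
    key_for = {"A": "kA", "B": "kB", "tie": "kC"}
    for rec in records:
        # Assumed validity rule: the annotator must spend at least
        # min_ratio times the audio duration before answering.
        if rec["response_seconds"] < min_ratio * rec["audio_seconds"]:
            continue
        counts[rec["item_id"]][key_for[rec["choice"]]] += 1
    return dict(counts)


records = [
    {"item_id": "x1", "choice": "A", "response_seconds": 6.0, "audio_seconds": 4.0},
    {"item_id": "x1", "choice": "B", "response_seconds": 1.0, "audio_seconds": 4.0},
    {"item_id": "x1", "choice": "tie", "response_seconds": 5.0, "audio_seconds": 4.0},
    {"item_id": "x2", "choice": "B", "response_seconds": 9.0, "audio_seconds": 3.0},
]
print(aggregate_choices(records))
```

In this made-up data, the second record is discarded because the response arrived before the clip could have finished playing.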

Expected input for alpha fitting

The current script assumes:

  • a directory of JSON files containing listening-test annotations,
  • a JSON file (wer_path) containing per-item transcript metrics for systems A and B,
  • accessible audio files for duration-based filtering.

Because this release is extracted from the paper codebase, you will likely need to adapt file paths and the exact data schema to your own annotation export format.
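
To give a feel for the fitting step, here is a simplified, dependency-free sketch of a Bradley-Terry-style fit. It is not the released procedure: it uses a grid search instead of PyTorch and Adam, ignores ties (so there is no lambda_tie term), treats the per-transcript counts as fixed while alpha varies, and introduces a hypothetical scale factor on the score difference. The item layout is likewise assumed.

```python
import math


def ras(m, alpha):
    """RAS from a dict of counts, following the formula in this README."""
    errors = m["S"] + m["D"] + m["I"] + alpha * (m["S_ph"] + m["I_ph"])
    return (m["C"] - errors) / m["N"]


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def neg_log_likelihood(alpha, items, scale=10.0):
    """Bradley-Terry negative log-likelihood; ties are ignored in this sketch."""
    total = 0.0
    for it in items:
        d = scale * (ras(it["A"], alpha) - ras(it["B"], alpha))
        total -= it["kA"] * log_sigmoid(d) + it["kB"] * log_sigmoid(-d)
    return total


def fit_alpha_grid(items, steps=1000, scale=10.0):
    """Pick the alpha in [0, 1] that minimizes the negative log-likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda a: neg_log_likelihood(a, items, scale))


# One made-up item: transcript A abstains twice, transcript B substitutes once.
items = [{
    "A": {"N": 10, "C": 8, "S": 0, "D": 0, "I": 0, "S_ph": 2, "I_ph": 0},
    "B": {"N": 10, "C": 9, "S": 1, "D": 0, "I": 0, "S_ph": 0, "I_ph": 0},
    "kA": 1, "kB": 3,
}]
print(fit_alpha_grid(items))
```

For this single item the score difference is -2 * alpha after scaling, so the likelihood peaks where sigmoid(-2 * alpha) equals the observed preference rate 1/4, that is, near alpha = ln(3) / 2.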

Notes on reproduction

  • RAS.py currently sets the default ALPHA = 0.5064, which is the fitted value reported in the paper.
  • fit_alpha.py is the fitting script used to estimate that value from human preferences.
  • The implementation follows the paper’s abstention-aware dynamic programming formulation, where one placeholder may align to multiple consecutive reference tokens.
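
The idea that one placeholder may align to several consecutive reference tokens can be illustrated with a small dynamic program. This is a sketch under stated assumptions, not the formulation in RAS.py or the paper: it assumes that a placeholder covering k reference tokens costs alpha * k, that an inserted placeholder costs alpha, and that all other edits cost 1. It returns only the minimum weighted error, not the full set of counts.

```python
def placeholder_aware_cost(ref, hyp, alpha=0.5064, placeholder="<ph>"):
    """Minimum weighted edit cost where a placeholder may span several ref tokens."""
    n, m = len(ref), len(hyp)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = float(i)  # delete every reference token
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + (alpha if hyp[j - 1] == placeholder else 1.0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            is_ph = hyp[j - 1] == placeholder
            best = dp[i - 1][j] + 1.0  # deletion of ref[i-1]
            best = min(best, dp[i][j - 1] + (alpha if is_ph else 1.0))  # insertion
            if is_ph:
                # Assumed rule: the placeholder absorbs the last k reference tokens.
                for k in range(1, i + 1):
                    best = min(best, dp[i - k][j - 1] + alpha * k)
            else:
                sub = 0.0 if ref[i - 1] == hyp[j - 1] else 1.0
                best = min(best, dp[i - 1][j - 1] + sub)  # match or substitution
            dp[i][j] = best
    return dp[n][m]


ref = "the patient has a history of diabetes".split()
hyp = "the patient has <ph> of diabetes".split()
print(placeholder_aware_cost(ref, hyp, alpha=0.5))  # → 1.0
```

Here the single placeholder covers "a history" at cost 2 * 0.5, which is cheaper than substituting one token and deleting the other (0.5 + 1).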

Citation

If you use this code, please cite our paper:

@misc{huang2026rasreliabilityorientedmetric,
      title={RAS: a Reliability Oriented Metric for Automatic Speech Recognition}, 
      author={Wenbin Huang and Yuhang Qiu and Bohan Li and Yiwei Guo and Jing Peng and Hankun Wang and Xie Chen and Kai Yu},
      year={2026},
      eprint={2604.24278},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2604.24278}, 
}
