[codex] expose optional ClearerVoice HQ model by rtzr-jelly · Pull Request #1 · rtzr/crisp

rtzr-jelly · 2026-06-23T12:06:26Z

What changed

Added an optional ClearerVoice file-HQ path to the File tab. The HQ model picker appears only when a ClearerVoice command or a local venv/checkpoint installation is detected.
Added scripts/clearvoice-wrapper.py as a file-in/file-out adapter for ClearerVoice and updated the HQ model PoC setup instructions.
Recorded ClearerVoice PoC results and updated status, licensing, and voice-enhancer docs.
Git-ignored locally downloaded external model checkpoints.
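A minimal sketch of a file-in/file-out adapter of the kind scripts/clearvoice-wrapper.py provides: the host passes an input path and an output path, waits for the process to exit, and treats a non-zero exit code as failure. The real script's arguments and its ClearerVoice calls are not shown in this PR, so enhance_pcm below is a hypothetical passthrough placeholder where the model would run. The WAV round-trip uses only Python's standard wave module.

```python
import argparse
import sys
import wave


def enhance_pcm(frames: bytes, params) -> bytes:
    """Hypothetical model hook: a real wrapper would run ClearerVoice here.

    This placeholder returns the PCM bytes unchanged (passthrough), which is
    enough to exercise the file-in/file-out plumbing end to end.
    """
    return frames


def run(in_path: str, out_path: str) -> None:
    # Read the WAV file the host app wrote for us.
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)

    processed = enhance_pcm(frames, params)

    # Write the result in the same format so the host can decode it.
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(processed)


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="file-in/file-out enhancer adapter (sketch)")
    parser.add_argument("input", help="input WAV path (the host's {in})")
    parser.add_argument("output", help="output WAV path (the host's {out})")
    args = parser.parse_args(argv)
    try:
        run(args.input, args.output)
    except (OSError, EOFError, wave.Error) as exc:
        # A non-zero exit code tells the host that the model step failed.
        print(f"enhance failed: {exc}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Swapping the placeholder for a real model call keeps the same contract, so the host-side subprocess plumbing does not need to change.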

Why

On the 12-sample corpus, ClearerVoice outperformed the built-in DSP file-HQ path on average SI-SDR, but it depends on Python/Torch and downloaded weights. Keeping it optional leaves the app bundle free of those dependencies and exposes the better HQ path only when it is installed locally.
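Detecting the optional install could look roughly like the sketch below. The .clearvoice.venv/bin/python path and scripts/clearvoice-wrapper.py come from this PR; the command name and the checkpoints directory are hypothetical placeholders, and the app's actual detection logic is not shown here.

```python
import os
import shutil
from pathlib import Path
from typing import List, Optional


def find_clearvoice(repo_root: Path, command: str = "clearvoice") -> Optional[List[str]]:
    """Return the argv prefix that runs the HQ model, or None if nothing is installed."""
    # 1) A command on PATH (the name is a hypothetical placeholder).
    exe = shutil.which(command)
    if exe:
        return [exe]
    # 2) A repo-local venv plus the wrapper script and downloaded checkpoints.
    python = repo_root / ".clearvoice.venv" / "bin" / "python"
    wrapper = repo_root / "scripts" / "clearvoice-wrapper.py"
    checkpoints = repo_root / "checkpoints"  # hypothetical checkpoint location
    if (python.is_file() and os.access(python, os.X_OK)
            and wrapper.is_file() and checkpoints.is_dir()):
        return [str(python), str(wrapper)]
    # Nothing found: the UI keeps the HQ model picker hidden.
    return None
```

Returning None rather than raising keeps the check cheap to run whenever the File tab is shown.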

Validation

swift test --package-path app — 32 tests, 0 failures
.clearvoice.venv/bin/python -m py_compile scripts/clearvoice-wrapper.py
git diff --check

Adds a swappable 2-stage pipeline and a DSP voice enhancer on top of the existing DeepFilterNet noise cancellation (krisp_like_mac_voice_enhancer_prd_v02.docx).
Engine (CrispEngine):
- AudioProcessing: ProcessingMode (Off/Noise/Voice/Clean+Enhance), AudioProcessingConfig, AudioProcessor protocol (PRD §4.5 swappable stages)
- Biquad: RBJ cookbook IIR + EnvelopeFollower (allocation-free, per-sample)
- VoiceEnhancer: HPF → tone EQ (3 presets) → compressor → de-esser → −1 dBFS limiter. Zero latency and non-generative (preserves the speaker's voice). With wetMix=0 the output is a bit-identical passthrough
- PipelineProcessor: denoise → enhance stage graph. Modes change only parameters (attenuation, dry/wet ramps), never the routing, so switching is click-free (PRD §4.1/§8.4)
- Loudness: BS.1770 K-weighting + gated LUFS measurement/normalization + peak ceiling
- FileEnhancer.enhance(options:): file HQ pipeline (Fast/HQ, tone, N-second preview, LUFS −16/−18, before/after report)
App:
- Added mode/enhanceStrength/tonePreset to AppState, with persistence. Parameter changes are applied live via update() without restarting the engine
- Switched LiveAudioEngine to PipelineProcessor
- MenuBar/Settings: processing mode + enhance strength + tone picker
- File tab: mode/quality/tone/loudness normalization/preview + result report
- Diagnostics: shows mode/strength/tone/estimated latency
Validation:
- vetool (new CLI): bit-identical passthrough, aligned==chunked determinism, limiter safety, RTF enhancer 0.012 / clean+enhance 0.11
- filetool enhance: file processing in all 4 modes, 48k mono wav/m4a, length preserved exactly
- 0 new third-party components or model weights (no change to the license BOM)
docs: voice-enhancer.md (new); updated architecture/STATUS/test-report/README/LICENSES
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
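The VoiceEnhancer chain starts with an HPF built from RBJ cookbook biquads. A minimal sketch of that first stage plus the wet/dry blend, assuming an 80 Hz cutoff and Q = 1/√2 (neither value is stated in this PR) and omitting the tone EQ, compressor, de-esser, and limiter stages. The wetMix=0 case returns the input untouched, which is one straightforward way to guarantee a bit-identical passthrough.

```python
import math
from typing import List


class Biquad:
    """RBJ Audio EQ Cookbook biquad, transposed direct form II."""

    def __init__(self) -> None:
        self.b0 = 1.0
        self.b1 = self.b2 = self.a1 = self.a2 = 0.0
        self.z1 = self.z2 = 0.0

    def set_highpass(self, fs: float, f0: float, q: float = 1 / math.sqrt(2)) -> None:
        w0 = 2 * math.pi * f0 / fs
        cos_w0 = math.cos(w0)
        alpha = math.sin(w0) / (2 * q)
        a0 = 1 + alpha
        # Cookbook HPF coefficients, normalized by a0.
        self.b0 = (1 + cos_w0) / 2 / a0
        self.b1 = -(1 + cos_w0) / a0
        self.b2 = (1 + cos_w0) / 2 / a0
        self.a1 = -2 * cos_w0 / a0
        self.a2 = (1 - alpha) / a0

    def process(self, x: float) -> float:
        # One sample in, one sample out; all state lives in z1/z2.
        y = self.b0 * x + self.z1
        self.z1 = self.b1 * x - self.a1 * y + self.z2
        self.z2 = self.b2 * x - self.a2 * y
        return y


def enhance(samples: List[float], fs: float, wet_mix: float, hpf_hz: float = 80.0) -> List[float]:
    """Toy enhancer stage: HPF only, blended with the dry signal."""
    if wet_mix == 0.0:
        # Early return instead of dry*1 + wet*0, so the output is bit-identical
        # to the input (no -0.0 or NaN surprises leaking in from the wet path).
        return list(samples)
    hpf = Biquad()
    hpf.set_highpass(fs, hpf_hz)
    return [(1 - wet_mix) * x + wet_mix * hpf.process(x) for x in samples]
```

Because the HPF has unity gain at Nyquist and a double zero at DC, a constant input decays to zero while an alternating ±1 input passes through at full level.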

Implements the PRD §8.1 test set, §8.2 automated evaluation, and §9.4 batch script + CSV deliverable. Everything is reproducible from the repo's real speech and noise assets, with no downloads.
Engine:
- AudioIO: 48k mono WAV read/write (AVFoundation, no ffmpeg)
- Metrics: RMS/peak dB, SI-SDR, delay-compensated SI-SDR (cross-correlation alignment), non-finite detection
- TestCorpus: builds a corpus from real speech + noise: noisy (SNR 0/5/10/20), reverb (Schroeder), clipping, bandlimit (8/16k), low level, hum
- Biquad: added setLowpass (for corpus band-limiting)
CLI:
- maketestset: generates the corpus → test/corpus/*.wav + manifest.csv
- evaltool: writes file metrics (LUFS/peak/RMS/SI-SDR) as CSV
Tests (swift test, 26 tests, 0 failures):
- BiquadTests/VoiceEnhancerTests/LoudnessTests/MetricsTests/PipelineTests
- IntegrationTests: corpus generation validity + file end-to-end (model). Skipped via XCTSkip when assets/model are missing
Scripts:
- scripts/quality-eval.sh: processes corpus × modes → quality-report.csv. Exits 1 on non-finite output or clipping (regression gate). Measured: noisy_snr0 denoise SI-SDR 0 → +4.89 dB; all outputs respect the −1 dBFS ceiling
test/corpus outputs (wav/manifest/report) are gitignored (reproducible); only the README is tracked.
docs: test set and evaluation results added to voice-enhancer/test-report/README.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
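SI-SDR projects the estimate onto the reference (alpha = <estimate, reference> / ||reference||², target = alpha · reference) and reports 10·log10(||target||² / ||estimate − target||²). The delay-compensated variant first finds the lag that maximizes cross-correlation. A brute-force sketch in plain Python; the 480-sample default search window (10 ms at 48 kHz) is a hypothetical choice, and an FFT-based correlation would be the usual choice for long files.

```python
import math
from typing import Sequence


def si_sdr(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Scale-invariant SDR in dB over the overlapping length of both signals."""
    n = min(len(reference), len(estimate))
    if n == 0:
        raise ValueError("empty signal")
    ref = [float(v) for v in reference[:n]]
    est = [float(v) for v in estimate[:n]]
    # Remove DC so the scale factor only reflects the signal component.
    mean_r, mean_e = sum(ref) / n, sum(est) / n
    ref = [v - mean_r for v in ref]
    est = [v - mean_e for v in est]
    ref_energy = sum(v * v for v in ref)
    if ref_energy == 0.0:
        raise ValueError("reference is silent")
    # Project the estimate onto the reference: target = alpha * reference.
    alpha = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [alpha * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(v * v for v in target)
    noise_energy = sum(v * v for v in noise)
    if noise_energy == 0.0:
        return math.inf
    if target_energy == 0.0:
        return -math.inf
    return 10.0 * math.log10(target_energy / noise_energy)


def best_lag(reference: Sequence[float], estimate: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) by which estimate trails reference, via brute-force cross-correlation."""
    best, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(estimate):
                score += r * estimate[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def aligned_si_sdr(reference: Sequence[float], estimate: Sequence[float], max_lag: int = 480) -> float:
    """SI-SDR after shifting the estimate by the best cross-correlation lag."""
    lag = best_lag(reference, estimate, max_lag)
    if lag >= 0:
        return si_sdr(reference, estimate[lag:])
    return si_sdr(reference[-lag:], estimate)
```

Without the alignment step, a model that only adds a few samples of latency would be scored as if it had destroyed the signal.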

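Level meter:
- Fixed flicker: on every suppressor hop carry frame (processed.isEmpty) the output meter mirrored the input level, and the raw RMS of each ~10 ms window was displayed as-is.
- Switched to a peak-hold-with-decay envelope (fast rise, gradual fall). On carry frames only the output envelope decays; it no longer follows the input.
Crash (SIGABRT):
- With the 44.1 kHz built-in microphone, passing an explicit format to installTap(format:) makes AVFAudio throw a "format mismatch" NSException (which Swift cannot catch), and the app terminates immediately. Because isEnabled is persisted, the engine starts as soon as the app launches, so the crash reproduces right at launch.
- Fixed by calling installTap(format: nil) so the node's own format is used, and by lazily creating the resampler from the actual buffer format (recreating it when the format changes). This is robust to arbitrary hardware sample rates.
Docs/evaluation:
- docs/enhance-behavior.md (new): summarizes the audible changes the enhancer actually makes and its limits, with measurements (LUFS/peak/crest/low band/high band). States explicitly that BWE, declipping, and dereverberation are not possible.
- Added Metrics.spectralTilt, plus crest/low-band/high-band output in evaltool (to measure tonal character). Updated the column indices in quality-eval.sh to match the changed evaltool CSV columns.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>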
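The meter fix can be sketched as an envelope that jumps up to a louder block immediately and otherwise falls at a fixed rate, while a carry frame (empty output block) only advances the decay. The 20 dB/s release, the −60 dB floor, and the use of block RMS as the envelope input are all assumptions for illustration.

```python
import math
from typing import Sequence


class LevelMeter:
    """Peak-hold-with-decay meter: rises instantly, falls at a fixed dB/s rate."""

    def __init__(self, release_db_per_s: float = 20.0, floor_db: float = -60.0) -> None:
        self.release_db_per_s = release_db_per_s
        self.floor_db = floor_db
        self.level_db = floor_db

    def update(self, block: Sequence[float], dt: float) -> float:
        """Advance the meter by dt seconds. An empty block is a hop-carry frame."""
        # Always decay first, so the display falls smoothly between loud blocks.
        self.level_db = max(self.floor_db, self.level_db - self.release_db_per_s * dt)
        if len(block) > 0:
            rms = math.sqrt(sum(x * x for x in block) / len(block))
            block_db = 20.0 * math.log10(rms) if rms > 0.0 else self.floor_db
            # Fast attack: jump straight up to a louder block.
            self.level_db = max(self.level_db, block_db)
        # Carry frames fall through with only the decay applied; the input
        # meter's level is never copied into the output meter.
        return self.level_db
```

A quiet block after a loud one only slows the fall rather than snapping the display down, which is what removes the per-window flicker.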

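Synthesizes deep-research results (multi-source web search + adversarial verification; 110 agents / 27 sources / 131 claims) in terms of our codebase. New: enhance-limitations-research.md. For each limitation (BWE / declipping / dereverb / generative unified restoration / adaptive processing / true-peak / SI-SDR ceiling) it covers SOTA, models, licensing, on-device feasibility, and integration recommendations.
Key conclusions:
- Feasible in real time now: true-peak oversampling limiter (in-house), AP-BWE (MIT, fast on CPU) PoC
- File HQ: adopt Resemble Enhance / ClearerVoice-Studio (weight licenses must be verified first)
- Generative unified restoration (FINALLY/AnyEnhance/Miipher-2) is mostly file-only and license-constrained
- Real-time generative (Stream.FM, 32 ms) is at the research frontier → watch
- RE-USE (NSCLv1), NISQA weights, AnyEnhance: non-commercial → excluded from release / internal use only
12 claims verified (3-0); some items could not be verified within the session limit (marked ⚠️; recheck before release).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>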

(a) True peak (resolves limitation #6):
- TruePeak.swift: ITU-R BS.1770 4× oversampled inter-sample peak (windowed-sinc fractional delay, DC normalization, edge-truncation guard). Loudness.normalize now applies the −1 dBTP ceiling using true peak instead of sample peak. Added truePeak to FileEnhanceReport and a true-peak column to evaltool. Verified that file HQ output stays at −1.0 dBTP.
(b) Direct verification of the ⚠️ research items against primary sources:
- AP-BWE: code and weights both MIT, 18.1× real time on CPU, 8/12/16/24k → 48k (confirmed)
- ClearerVoice-Studio: code Apache-2.0, 16k → 48k SR (confirmed)
- AnyEnhance: reproduction code is MIT, but the published weights carry no license → cannot be distributed (reference only)
Updated enhance-limitations-research.md.
(c) PoC for integrating external models into file HQ:
- ExternalEnhancer.swift: {in}/{out} subprocess worker (for PyTorch models, PRD §4.4). FileEnhancer.enhanceExternal: decode → external model → HQ post-DSP (LUFS + true-peak) → write, then delete temp files. Added the filetool external subcommand. Plumbing tested with a cp passthrough.
- scripts/hq-model-poc.sh: benchmarks corpus × {dsp, external model} → CSV. Skips gracefully when a model is not installed. --print-setup prints installation and wrapper instructions for Resemble/ClearerVoice.
docs: docs/hq-model-integration.md. 32 tests, 0 failures (+4 TruePeak, +2 External). App bundle builds successfully.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
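A sketch of the true-peak idea: interpolate three extra points between every pair of samples (4× oversampling) with DC-normalized windowed-sinc fractional-delay taps, then take the largest magnitude. It substitutes a 16-tap Hann-windowed sinc for the interpolation filter given in BS.1770's annex and simply treats samples outside the buffer as zero rather than reproducing TruePeak.swift's edge guard. normalization_gain_db is a hypothetical helper showing how a loudness gain can be capped so the result stays under −1 dBTP.

```python
import math
from typing import List, Sequence


def fractional_delay_taps(frac: float, half_width: int = 8) -> List[float]:
    """Hann-windowed sinc taps estimating x(i + frac) from x[i-half_width+1 .. i+half_width]."""
    taps = []
    for m in range(-half_width + 1, half_width + 1):
        t = frac - m  # distance (in samples) from tap m to the interpolation point
        sinc = 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 * (1.0 + math.cos(math.pi * t / half_width))
        taps.append(sinc * window)
    total = sum(taps)
    # DC normalization: taps sum to 1, so a constant signal interpolates to itself.
    return [c / total for c in taps]


def true_peak_dbtp(x: Sequence[float], oversample: int = 4, half_width: int = 8) -> float:
    """Max of |sample| and |interpolated value| at oversample-1 points between samples."""
    n = len(x)
    peak = max((abs(v) for v in x), default=0.0)
    offsets = range(-half_width + 1, half_width + 1)
    for k in range(1, oversample):
        taps = fractional_delay_taps(k / oversample, half_width)
        for i in range(n):
            acc = 0.0
            for c, m in zip(taps, offsets):
                j = i + m
                if 0 <= j < n:  # simple edge handling: samples outside the buffer count as 0
                    acc += c * x[j]
            peak = max(peak, abs(acc))
    return 20.0 * math.log10(peak) if peak > 0.0 else -math.inf


def normalization_gain_db(measured_lufs: float, target_lufs: float,
                          true_peak_db: float, ceiling_dbtp: float = -1.0) -> float:
    """Gain toward the LUFS target, reduced if it would push the true peak past the ceiling."""
    loudness_gain = target_lufs - measured_lufs
    headroom = ceiling_dbtp - true_peak_db  # a linear gain shifts the true peak by the same dB
    return min(loudness_gain, headroom)
```

For example, a quarter-sample-rate sine sampled at ±45° has a sample peak of about −3 dBFS while its true peak is about 0 dBTP, which is exactly the case a sample-peak ceiling misses.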

rtzr-jelly and others added 6 commits June 23, 2026 18:26

feat(file): expose optional ClearerVoice HQ model

ea82dbe

rtzr-jelly changed the title from "feat: add Voice Enhancer pipeline (PRD v0.2)" to "[codex] expose optional ClearerVoice HQ model" on Jun 26, 2026

rtzr-jelly marked this pull request as draft June 26, 2026 13:22

rtzr-jelly wants to merge 6 commits into main from feat/voice-enhancer-pipeline

1 participant
