🎙️ Grammar Scoring Engine from Voice Samples

An end-to-end AI system that evaluates spoken English grammar by converting audio to text, detecting grammatical errors using NLP, generating a normalized score, and mapping it to CEFR proficiency levels.

🔗 Live Demo

👉 Try it live →

Upload any English .wav audio file and get instant grammar feedback!

🎯 Problem Statement

Assessing spoken grammar is critical for:

Language learning platforms (Duolingo, IELTS prep)
Hiring tools (spoken English screening)
Educational assessments (CEFR-based evaluation)

This project builds a lightweight, cloud-deployable grammar evaluation pipeline that works on real-world audio input — no Java, no heavy dependencies.

🧠 System Architecture

Audio Input (.wav)
        │
        ▼
Speech-to-Text (Google Speech API)
        │
        ▼
Grammar Error Detection (LanguageTool NLP API)
        │
        ▼
Score Normalization (0–100)
        │
        ▼
CEFR Level Mapping (A1 → C2)
        │
        ▼
Streamlit Web Interface

⚙️ Tech Stack

Component	Technology
Language	Python 3.10+
Speech-to-Text	SpeechRecognition (Google Speech API)
Grammar Analysis	LanguageTool Public API (cloud-safe, Java-free)
UI & Deployment	Streamlit + Streamlit Cloud
Audio Samples	WAV format input

📊 Grammar Scoring Methodology

Spoken audio is transcribed into text using Google Speech API
Grammar errors are detected using rule-based NLP (LanguageTool)
Error density is computed against total word count
A normalized score (0–100) is generated
Score is mapped to CEFR proficiency level

🎯 CEFR Score Mapping

Score Range	CEFR Level	Description
90 – 100	C1 / C2	Advanced / Mastery
75 – 89	B2	Upper Intermediate
60 – 74	B1	Intermediate
40 – 59	A2	Elementary
< 40	A1	Beginner

✨ Features

🎤 Upload any English .wav audio file
📝 Real-time speech-to-text transcription
🔍 Automatic grammar error detection
📊 Normalized grammar score (0–100)
🏆 CEFR proficiency level estimation
✅ Highlighted error details
☁️ Fully cloud-deployed — works in browser, no setup needed

🚀 Run Locally

Step 1 — Clone the repo

git clone https://github.com/M1325-source/grammar-scoring-engine.git
cd grammar-scoring-engine

Step 2 — Install dependencies

pip install -r requirements.txt

Step 3 — Run the app

streamlit run app.py

Open http://localhost:8501 in your browser.

📁 Project Structure

grammar-scoring-engine/
│
├── app.py                  ← Streamlit web app (main entry)
├── train.py                ← Model training / scoring logic
├── requirements.txt        ← Python dependencies
├── audio_samples/          ← Sample .wav files for testing
└── README.md

🔍 Key Design Decisions

Avoided Java-based LanguageTool — used REST API instead for Streamlit Cloud compatibility
Dataset-agnostic pipeline — works on any real-world English audio
Modular architecture — each stage (STT, NLP, scoring) is independent
Cloud-first approach — no local server or runtime dependencies needed

🔮 Future Enhancements

ML-based grammar scoring model (regression on CEFR labels)
Pronunciation & fluency analysis
Multi-language support
Confidence & coherence metrics
Real-time microphone recording
Batch audio processing

👩‍💻 Author

Manisha Priya AI/ML Engineer | NLP Enthusiast

🐙 GitHub: @M1325-source
📧 Email: manishapriya1325@gmail.com

Built as part of an AI research assessment — demonstrates real-world NLP pipeline design, speech processing, and cloud deployment best practices.

⭐ If you found this useful, please star the repo!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🎙️ Grammar Scoring Engine from Voice Samples

🔗 Live Demo

🎯 Problem Statement

🧠 System Architecture

⚙️ Tech Stack

📊 Grammar Scoring Methodology

🎯 CEFR Score Mapping

✨ Features

🚀 Run Locally

Step 1 — Clone the repo

Step 2 — Install dependencies

Step 3 — Run the app

📁 Project Structure

🔍 Key Design Decisions

🔮 Future Enhancements

👩‍💻 Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
__pycache__		__pycache__
audio_samples		audio_samples
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt
train.py		train.py

Folders and files

Latest commit

History

Repository files navigation

🎙️ Grammar Scoring Engine from Voice Samples

🔗 Live Demo

🎯 Problem Statement

🧠 System Architecture

⚙️ Tech Stack

📊 Grammar Scoring Methodology

🎯 CEFR Score Mapping

✨ Features

🚀 Run Locally

Step 1 — Clone the repo

Step 2 — Install dependencies

Step 3 — Run the app

📁 Project Structure

🔍 Key Design Decisions

🔮 Future Enhancements

👩‍💻 Author

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages