ServiciosNLP is an educational backend platform built with Python and FastAPI. It provides NLP-based services for humanities researchers — including word frequency analysis, sentence classification, and document clustering — through a clean, documented REST API. The project grows incrementally, starting with basic text processing and progressively integrating LLM-powered features via the OpenAI API.
- REST API built with FastAPI
- Interactive API documentation via Swagger UI
- Word frequency analysis with Spanish stopword filtering and text normalization
- Sentence classification powered by OpenAI GPT using dynamic system/user prompts
- User-defined classification categories with few-shot examples
- Results exported as downloadable CSV and Excel (.xlsx) files
- Asynchronous task handling using FastAPI's BackgroundTasks
- Fully containerized with Docker
- Version-controlled with Git & GitHub using feature branches and pull requests
- React + TypeScript frontend with Vite and Tailwind CSS
When a request arrives, the API responds immediately with a unique request_id while the export runs in the background. The client polls /status/{request_id} and downloads the result from /results/{request_id} when ready. This simulates a real-world async job queue using FastAPI's BackgroundTasks and dependency injection system.
The classification service builds prompts dynamically at runtime. The system prompt defines the model's role and injects the user-defined categories. The user prompt provides the sentences and enforces a strict JSON response format. This separation follows best practices for structured LLM output and enables flexible, user-driven classification without redeploying the service.
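A minimal sketch of that system/user prompt separation; the category names and prompt wording here are hypothetical, not the project's actual prompts.

```python
# Illustrative prompt builders: the system prompt carries the model's role and
# the user-defined categories; the user prompt carries the sentences and the
# strict JSON response contract.


def build_system_prompt(categories: list[str]) -> str:
    return (
        "You are a text classifier for humanities research. "
        f"Assign each sentence exactly one of these categories: {', '.join(categories)}."
    )


def build_user_prompt(sentences: list[str]) -> str:
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(sentences, 1))
    return (
        "Classify the following sentences.\n"
        f"{numbered}\n"
        'Respond ONLY with JSON: {"labels": ["<category>", ...]}, '
        "one label per sentence, in order."
    )


messages = [
    {"role": "system", "content": build_system_prompt(["history", "politics"])},
    {"role": "user", "content": build_user_prompt(["La guerra terminó en 1945."])},
]
```

Because the categories are injected at runtime, users can change the taxonomy without redeploying the service.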
- Building modular REST APIs with FastAPI routers
- Input validation and schema definition with Pydantic
- Asynchronous background task execution with BackgroundTasks
- Dependency injection in FastAPI
- Text normalization and Spanish stopword filtering with regex
- Dynamic prompt construction for LLM APIs (system/user prompt separation)
- Structured JSON output enforcement with OpenAI
- CSV and Excel export from Python data structures
- Containerization with Docker and hot reload in development
- Environment variable management with `.env` files
- Git branching workflow with feature branches and pull requests
- React component architecture with single responsibility principle
- TypeScript interfaces and types for frontend data modeling
- Responsive two-column layout with Tailwind CSS
- Progressive disclosure UI pattern
- Async job polling from the frontend
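The normalization and stopword filtering listed above can be sketched as follows. The stopword set here is a tiny illustrative subset, not the project's full Spanish list.

```python
# Hedged sketch of word-frequency counting with normalization and
# Spanish stopword filtering via regex.
import re
from collections import Counter

SPANISH_STOPWORDS = {"el", "la", "los", "las", "de", "en", "y", "que", "un", "una"}


def word_frequencies(text: str) -> Counter:
    # Lowercase, then keep only runs of letters (including accented ones).
    tokens = re.findall(r"[a-záéíóúüñ]+", text.lower())
    return Counter(t for t in tokens if t not in SPANISH_STOPWORDS)
```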
ServiciosNLP/
├── app/
│ ├── main.py → FastAPI entry point, router registration
│ ├── api/
│ │ ├── health.py → Health check endpoint
│ │ ├── word_count.py → Word count endpoints (POST, GET status, GET results)
│ │ └── sentence_classification.py → Sentence classification endpoints
│ └── core/
│ ├── word_count.py → NLP logic: normalization, stopword filtering, counting
│ ├── sentence_classification.py → LLM prompt construction and classification logic
│ └── export.py → CSV and Excel export utilities
├── frontend/
│ ├── src/
│ │ ├── types/ → TypeScript type definitions
│ │ ├── services/ → API communication layer
│ │ ├── components/ → Reusable UI components
│ │ └── App.tsx → Root component
│ ├── package.json → Frontend dependencies
│ └── vite.config.ts → Vite and Tailwind configuration
├── .env → Environment variables (not committed)
├── .env_example → Environment variables template
├── Dockerfile → Container configuration
└── requirements.txt → Python dependencies

- Python 3.12+
- Docker (optional, recommended)
- An OpenAI account with a valid API key (required for sentence classification)
- A `.env` file in the project root (see below)
This project requires a .env file at the root of the project to configure the OpenAI API key. This file is excluded from version control via .gitignore to protect sensitive credentials.
Never commit your `.env` file to version control. It contains secret keys that must remain private.
In the project root, create a file named .env with the following content:
OPENAI_API_KEY=your-api-key-here
Refer to .env_example (included in the repository) for the expected format.
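A minimal sketch of how the backend might read the key at startup; `require_api_key` is a hypothetical helper, not the project's actual code.

```python
import os


def require_api_key() -> str:
    # In the container, Docker's --env-file .env populates the environment;
    # in local development, a loader such as python-dotenv can do the same.
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; see .env_example")
    return api_key
```

Failing fast like this surfaces a missing key at startup instead of on the first classification request.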
- Create an account at platform.openai.com
- Navigate to API Keys and generate a new key
- Paste the key into your `.env` file
The sentence classification service will not function without a valid API key. All other services (word count, health check) work without it.
pip install -r requirements.txt
uvicorn app.main:app --reload

Open in your browser:

- App: http://127.0.0.1:8000
- API docs: http://127.0.0.1:8000/docs
docker build -t nlpservices .
For development (mounts ./app for hot reload):

docker run --rm -p 8000:80 -v ./app:/code/app -v ~/nltk_data:/root/nltk_data --env-file .env -it nlpservices

Or, without the source mount:

docker run --rm -p 8000:80 -v ~/nltk_data:/root/nltk_data --env-file .env nlpservices

The `--env-file .env` flag injects your environment variables into the container at runtime, so secrets never get baked into the Docker image.
This project is fully containerized. Clone the repository on any machine with Docker installed, add your .env file, build the image, and run the container exposing the desired port.
- Project structure with FastAPI routers
- Health check endpoint
- Word count service with CSV export and async processing
- Sentence classification with OpenAI and Excel export
- Docker volume for NLTK punkt_tab model persistence
- CORS configuration
- React frontend — text input and service selector
- React frontend — analysis history with job status polling
- React frontend — persistent history via backend endpoint
- Named Entity Recognition (NER)
- Document clustering
- Paragraph clustering
- File upload support (TXT, PDF)
This project is licensed under the GNU Affero General Public License v3.0.