Image2Code uses loguru for structured, production-ready logging. The logging system is centralized, configurable, and designed to track the entire code extraction pipeline for debugging and production monitoring.
Key Features:
- Structured logging with timestamps, log levels, module names
- File-based logging with automatic rotation
- Console output in development/debug modes
- JSON output for production (machine-parseable)
- Configuration via YAML file + environment variable overrides
- No business logic changes - purely observational logging
Logging is automatically initialized when you run the application:
python main.py Images/
Logs will be written to logs/app.log.
# Real-time tail
tail -f logs/app.log
# View today's log
cat logs/app.log
# Search for errors
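grep ERROR logs/app.log
To enable debug logging for a single run, set the environment variable:
LOG_LEVEL=DEBUG python main.py Images/
Or edit logging_config.yaml:
log_level: DEBUG
For plain-text output:
LOG_FORMAT=plain python main.py Images/
Output: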
2026-02-24 10:34:56 | INFO | core.pipeline:run:103 - Starting extraction | image_count: 3
2026-02-24 10:34:57 | DEBUG | core.extractor:_extract_llm:181 - LLM extraction starting | model: qwen3-vl:235b-cloud | image: Images/code1.png
For JSON output:
LOG_FORMAT=json python main.py Images/
Output:
{
  "timestamp": "2026-02-24T10:34:56.789123Z",
  "level": "INFO",
  "module": "core.pipeline",
  "function": "run",
  "line": 103,
  "message": "Starting extraction | image_count: 3"
}
The logging_config.yaml file, located at the workspace root, controls logging behavior:
version: 1
environment: dev # dev, prod, debug
log_directory: logs # Where to store logs
log_level: DEBUG # DEBUG, INFO, WARNING, ERROR
log_format: plain # plain or json
rotation:
  type: daily # daily or size
  retention: all # all or "7 days"
Environment variable names must be uppercase:
| Variable | Values | Default | Purpose |
|---|---|---|---|
| LOG_LEVEL | DEBUG, INFO, WARNING, ERROR, CRITICAL | INFO | Minimum log level |
| LOG_FORMAT | plain, json | plain | Output format |
| LOG_DIRECTORY | any path | logs/ | Log file location |
| ENVIRONMENT | dev, prod, debug | dev | Environment type |
Priority: Environment Variables > YAML Config > Defaults
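The precedence rule can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual loader: the function name resolve_settings and the DEFAULTS and ENV_NAMES mappings are hypothetical, the defaults are copied from the environment variable table, and the YAML file is assumed to be already parsed into a dict.

```python
import os

# Defaults copied from the environment variable table (assumed, not read from the project).
DEFAULTS = {
    "log_level": "INFO",
    "log_format": "plain",
    "log_directory": "logs/",
    "environment": "dev",
}

# Hypothetical mapping from setting name to its environment variable.
ENV_NAMES = {
    "log_level": "LOG_LEVEL",
    "log_format": "LOG_FORMAT",
    "log_directory": "LOG_DIRECTORY",
    "environment": "ENVIRONMENT",
}


def resolve_settings(yaml_config, environ=None):
    """Merge settings: environment variables > YAML config > defaults."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    # YAML values override the defaults (only known keys are considered).
    for key in DEFAULTS:
        if yaml_config.get(key) is not None:
            settings[key] = yaml_config[key]
    # Non-empty environment variables override everything else.
    for key, env_name in ENV_NAMES.items():
        if environ.get(env_name):
            settings[key] = environ[env_name]
    return settings


# The YAML file says DEBUG, but the environment variable wins.
print(resolve_settings({"log_level": "DEBUG"}, {"LOG_LEVEL": "WARNING"}))
```

Passing the environment as an argument keeps the function easy to test without touching the real process environment.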
# Development
LOG_LEVEL=DEBUG LOG_FORMAT=plain ENVIRONMENT=dev python main.py Images/
# Production
LOG_LEVEL=INFO LOG_FORMAT=json ENVIRONMENT=prod python main.py Images/
# Debug
LOG_LEVEL=DEBUG LOG_FORMAT=plain ENVIRONMENT=debug python main.py Images/

| Level | Usage | Example |
|---|---|---|
| DEBUG | Detailed info for debugging | Image encoding size, LLM response details, overlap detection |
| INFO | General workflow events | Pipeline start/complete, extraction success, blocks created |
| WARNING | Potential issues, fallbacks | JSON parsing fallback, overlap strategy fallback |
| ERROR | Errors that should be fixed | LLM API failure, invalid image path, parsing errors |
| CRITICAL | System-level failures | Unrecoverable errors (rare) |
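LOG_LEVEL acts as a minimum threshold: a message is written only if its severity is at or above the configured level. The sketch below illustrates that comparison. The function name is_emitted is hypothetical. The numeric severities are the ones used by both loguru and Python's standard logging module.

```python
# Numeric severities shared by loguru and the standard logging module.
SEVERITY = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def is_emitted(message_level, minimum_level):
    """Return True when a message at message_level passes the minimum level."""
    return SEVERITY[message_level.upper()] >= SEVERITY[minimum_level.upper()]


# With LOG_LEVEL=WARNING, only WARNING, ERROR, and CRITICAL pass.
for level in SEVERITY:
    print(level, is_emitted(level, "WARNING"))
```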
Pipeline:
✓ Pipeline start/complete
✓ Each stage start/complete (extraction, ordering, reconstruction)
✓ Block counts at each stage
✓ Timing information (stages)
Extraction:
✓ Extraction start/complete
✓ Per-image LLM call start/complete
✓ JSON parsing success/failure
✓ Fallback to raw text (if needed)
✓ Image metadata (format, language, cells)
✓ Failed images count
Ordering:
✓ Ordering start/complete
✓ Strategy used (line_numbers > timestamp > overlap > fallback)
✓ Blocks successfully ordered
Grouping:
✓ Grouping start/complete
✓ Groups created count
✓ Per-group details (blocks, total lines)
Reconstruction:
✓ Reconstruction type (notebook vs code)
✓ Block processing (per-block details)
✓ Overlap detection and merge count
✓ Final code statistics (line count)
Here's a complete run logged from start to finish:
2026-02-24 10:34:56 | INFO | __main__:main:50 - ============================================================
2026-02-24 10:34:56 | INFO | __main__:main:51 - Code Extraction Pipeline Started
2026-02-24 10:34:56 | INFO | __main__:main:52 - Arguments: ['Images/']
2026-02-24 10:34:56 | INFO | __main__:main:62 - Found 3 image(s) to process
2026-02-24 10:34:56 | INFO | core.pipeline:run:107 - Pipeline started | image_paths: 3
2026-02-24 10:34:56 | INFO | core.pipeline:extract_blocks:28 - Starting extraction | image_count: 3
2026-02-24 10:34:56 | INFO | core.extractor:extract_from_images:231 - Starting extraction from images | count: 3
2026-02-24 10:34:56 | DEBUG | core.extractor:extract_from_images:234 - Extracting image 1/3 | path: Images/code1.png
2026-02-24 10:34:56 | INFO | core.extractor:_extract_llm:177 - Calling LLM model: qwen3-vl:235b-cloud
2026-02-24 10:34:58 | DEBUG | core.extractor:_extract_llm:185 - LLM response received | response_size: 1850
2026-02-24 10:34:58 | DEBUG | core.extractor:extract_from_image:206 - JSON parsed successfully | format: python | cells: 1
2026-02-24 10:34:58 | INFO | core.extractor:extract_from_image:223 - Block extracted | format: python | lines: 45
... [similar for images 2 and 3] ...
2026-02-24 10:35:02 | INFO | core.extractor:extract_from_images:245 - Extraction complete | success: 3 | failed: 0
2026-02-24 10:35:02 | INFO | core.pipeline:order_blocks:37 - Starting ordering | block_count: 3
2026-02-24 10:35:02 | INFO | core.ordering:order:25 - Starting block ordering | block_count: 3
2026-02-24 10:35:02 | INFO | core.ordering:order:31 - Ordering strategy: LINE_NUMBERS | blocks: 3
2026-02-24 10:35:02 | INFO | core.pipeline:order_blocks:39 - Ordering complete | blocks_ordered: 3
2026-02-24 10:35:02 | INFO | core.pipeline:reconstruct:43 - Starting reconstruction | block_count: 3 | format: python
2026-02-24 10:35:02 | INFO | core.reconstruction:reconstruct:31 - Reconstruction type: CODE
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:57 - Processing code block 1/3 | lines: 45
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:68 - First block appended | total_lines: 45
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:73 - Processing code block 2/3 | lines: 50
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:78 - Overlap detected | block: 2 | overlap_lines: 3
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:81 - Block merged | total_lines: 92
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:73 - Processing code block 3/3 | lines: 55
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:78 - Overlap detected | block: 3 | overlap_lines: 2
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:81 - Block merged | total_lines: 145
2026-02-24 10:35:02 | INFO | core.reconstruction:_reconstruct_code:84 - Code reconstruction complete | final_lines: 145 | filename: script.py
2026-02-24 10:35:02 | INFO | core.pipeline:reconstruct:44 - Reconstruction complete | status: success
2026-02-24 10:35:02 | INFO | __main__:save_output:38 - Output saved successfully | path: outputs/script.py
2026-02-24 10:35:02 | INFO | __main__:main:77 - Reconstruction complete | saved to: outputs/script.py
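For analysis beyond grep, plain-format lines like the ones in this run can be split into fields with a regular expression. The following is a sketch that assumes every record follows the layout shown in the sample (timestamp | LEVEL | module:function:line - message). The names LINE_RE and parse_line are hypothetical and not part of the project.

```python
import re

# Layout assumed from the sample run: "timestamp | LEVEL | module:function:line - message"
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    r" \| (?P<level>[A-Z]+)\s*"
    r"\| (?P<module>[^:\s]+):(?P<function>[^:\s]+):(?P<line>\d+)"
    r" - (?P<message>.*)$"
)


def parse_line(text):
    """Parse one plain-format log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(text.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["line"] = int(record["line"])  # source line number as an integer
    return record


sample = (
    "2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:78"
    " - Overlap detected | block: 2 | overlap_lines: 3"
)
print(parse_line(sample))
```

Lines that are not log records (for example a multi-line traceback) return None, so callers can skip them.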
# All errors
grep ERROR logs/app.log
# Everything logged by the extractor
grep core.extractor logs/app.log
# Everything about one image
grep "code1.png" logs/app.log
# LLM calls
grep "Calling LLM" logs/app.log
# Ordering strategy used
grep "Ordering strategy:" logs/app.log
# Overlap detections
grep "Overlap detected" logs/app.log
# Count errors
grep ERROR logs/app.log | wc -l
# LLM responses
grep "LLM response received" logs/app.log

- Use JSON format for log aggregation:
  # Update .env or environment:
  export LOG_FORMAT=json
  export LOG_LEVEL=INFO
  export ENVIRONMENT=prod
- Store logs in a persistent location:
  # Configure in logging_config.yaml:
  log_directory: /var/log/image2code/
- Set up log rotation:
  rotation:
    type: daily # or "size" with size parameter
    retention: "30 days" # or "all"
- Redirect to an external logging service (future enhancement):
  - JSON logs can be piped to ELK Stack, Datadog, etc.
  - Use:
    tail -f logs/app.log | send-to-logging-service
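As a rough illustration of what a downstream consumer of the JSON logs might do, the sketch below keeps only the records at selected levels. It assumes each record is written as a single JSON object per line with the keys shown in the JSON output example. The function name select_records is hypothetical, and the ERROR record in the sample is made up for the demonstration.

```python
import json


def select_records(lines, levels=("ERROR", "CRITICAL")):
    """Yield parsed JSON records whose level is in `levels`, skipping unparsable lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON record, e.g. a plain-format line
        if isinstance(record, dict) and record.get("level") in levels:
            yield record


# One record from the JSON output example plus a made-up ERROR record.
sample = [
    json.dumps({"timestamp": "2026-02-24T10:34:56.789123Z", "level": "INFO",
                "module": "core.pipeline", "function": "run", "line": 103,
                "message": "Starting extraction | image_count: 3"}),
    json.dumps({"level": "ERROR", "message": "hypothetical failure"}),
]
for record in select_records(sample):
    print(record["level"], record["message"])
```

Because the function accepts any iterable of lines, it works equally well with an open file object or a stream.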
Watch for:
- ERROR logs → immediate attention needed
- WARNING logs → potential issues to investigate
- Extraction failures → check LLM availability
- High overlap counts → possible duplicate detection issues
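A quick health check over a plain-format log can be sketched as follows. It counts records per level and the number of "Overlap detected" messages, which covers the items in the list above. The function name summarize is hypothetical, and the sketch assumes fields are separated by " | " as in the sample output.

```python
from collections import Counter


def summarize(lines):
    """Count log levels and overlap detections in plain-format log lines."""
    levels = Counter()
    overlaps = 0
    for line in lines:
        # Expected fields: timestamp, level, then location and message.
        parts = line.split(" | ", 2)
        if len(parts) < 3:
            continue  # not a plain-format record
        levels[parts[1].strip()] += 1
        if "Overlap detected" in parts[2]:
            overlaps += 1
    return levels, overlaps


sample = [
    "2026-02-24 10:35:02 | INFO | core.reconstruction:reconstruct:31 - Reconstruction type: CODE",
    "2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:78 - Overlap detected | block: 2 | overlap_lines: 3",
    "2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:78 - Overlap detected | block: 3 | overlap_lines: 2",
]
levels, overlaps = summarize(sample)
print(dict(levels), overlaps)
```

To run it against a real file, pass an open file object, for example summarize(open("logs/app.log")).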
- Check log directory exists:
  ls -la logs/
- Check file permissions:
  chmod 755 logs/
- Verify environment variables:
  echo $LOG_LEVEL
  echo $LOG_FORMAT
- Check configuration file:
  cat logging_config.yaml
Change log level:
LOG_LEVEL=WARNING python main.py Images/
Use plain format for development:
LOG_FORMAT=plain python main.py Images/
Ensure ENVIRONMENT is dev/debug, not prod:
ENVIRONMENT=dev python main.py Images/

- Import logger:
  from core.logger import get_logger
  logger = get_logger(__name__)
- Use appropriate level:
  logger.debug("Variable values", extra={"var1": value1})
  logger.info("Stage complete", extra={"stage": "extraction"})
  logger.warning("Fallback used", extra={"reason": "missing data"})
  logger.error("LLM failed", extra={"error": str(e)})
- Add context with extra dict:
  logger.info(
      "Processing image",
      extra={
          "image": image_path,
          "format": detected_format,
          "cells": cell_count
      }
  )
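The sample log lines show context rendered as "| key: value" pairs after the message. One way such rendering could work is sketched below. This is an assumption for illustration only, not the actual core.logger implementation, and the helper name format_with_context is hypothetical.

```python
def format_with_context(message, extra=None):
    """Append key/value context to a message as ' | key: value' pairs."""
    if not extra:
        return message
    context = " | ".join(f"{key}: {value}" for key, value in extra.items())
    return f"{message} | {context}"


print(format_with_context(
    "Processing image",
    {"image": "Images/code1.png", "format": "python", "cells": 1},
))
```

Keeping context in a dict rather than interpolating it into the message string means the same values can also be emitted as separate fields in JSON output.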
# test_logging.py
import os
os.environ['LOG_LEVEL'] = 'DEBUG'
from core.logger import initialize_logger, get_logger
initialize_logger()
logger = get_logger("test")
logger.debug("Test debug message")
logger.info("Test info message")
logger.warning("Test warning message")
logger.error("Test error message")
Run: python test_logging.py and check logs/app.log
- Easy to use: import get_logger from core.logger and log away
- Zero logic changes: Pure observation, no business logic modified
- Flexible configuration: YAML + environment variables
- Production ready: JSON output, automatic rotation
- Comprehensive: Every stage of pipeline tracked
- Debuggable: Detailed logs for troubleshooting multi-image processing
Need Help?
- Check logs:
  cat logs/app.log
- View recent errors:
  tail -20 logs/app.log
- Search logs:
  grep -i "error\|warning" logs/app.log