Logging Documentation

Overview

Image2Code uses loguru for structured, production-ready logging. The logging system is centralized, configurable, and designed to track the entire code extraction pipeline for debugging and production monitoring.

Key Features:

Structured logging with timestamps, log levels, module names
File-based logging with automatic rotation
Console output in development/debug modes
JSON output for production (machine-parseable)
Configuration via YAML file + environment variable overrides
No business logic changes - purely observational logging

Quick Start

1. Basic Setup

Logging is automatically initialized when you run the application:

python main.py Images/

Logs are written to logs/app.log by default.

2. Viewing Logs

# Real-time tail
tail -f logs/app.log

# View today's log
cat logs/app.log

# Search for errors
grep ERROR logs/app.log

3. Changing Log Level

Option A: Environment Variable (Temporary)

LOG_LEVEL=DEBUG python main.py Images/

Option B: Config File (Permanent)

Edit logging_config.yaml:

log_level: DEBUG

4. Log Format

Option A: Human-Readable (Development)

LOG_FORMAT=plain python main.py Images/

Output:

2026-02-24 10:34:56 | INFO     | core.pipeline:run:103 - Starting extraction | image_count: 3
2026-02-24 10:34:57 | DEBUG    | core.extractor:_extract_llm:181 - LLM extraction starting | model: qwen3-vl:235b-cloud | image: Images/code1.png
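
For scripted analysis, the plain format above can be split into its fields. The following is a minimal sketch that assumes every line follows the layout shown in the sample output; `parse_plain_line` and `PLAIN_LINE` are hypothetical names, not part of the project.

```python
import re
from typing import Optional

# Pattern for the plain layout shown in the sample output:
# "<timestamp> | <LEVEL> | <module>:<function>:<line> - <message>"
PLAIN_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \| "
    r"(?P<level>[A-Z]+)\s*\| "
    r"(?P<module>[^:]+):(?P<function>[^:]+):(?P<line>\d+) - "
    r"(?P<message>.*)$"
)


def parse_plain_line(text: str) -> Optional[dict]:
    """Return the fields of one plain-format log line, or None if it does not match."""
    match = PLAIN_LINE.match(text.rstrip("\n"))
    if match is None:
        return None
    record = match.groupdict()
    record["line"] = int(record["line"])  # source line number as an integer
    return record
```

Lines that do not match, such as blank lines, return None and can be skipped by the caller.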

Option B: Machine-Readable JSON (Production)

LOG_FORMAT=json python main.py Images/

Output:

{
  "timestamp": "2026-02-24T10:34:56.789123Z",
  "level": "INFO",
  "module": "core.pipeline",
  "function": "run",
  "line": 103,
  "message": "Starting extraction | image_count: 3"
}
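
JSON records like the one above can be filtered with a few lines of Python. This sketch assumes the file is a sequence of JSON objects using the field names shown in the sample (`level`, `message`, and so on); it accepts both one-object-per-line and pretty-printed output. `iter_json_records` and `records_at_level` are hypothetical helper names.

```python
import json
from typing import Iterator


def iter_json_records(text: str) -> Iterator[dict]:
    """Yield each JSON object found in text, whether compact or pretty-printed."""
    decoder = json.JSONDecoder()
    position = 0
    while position < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while position < len(text) and text[position].isspace():
            position += 1
        if position >= len(text):
            break
        record, position = decoder.raw_decode(text, position)
        yield record


def records_at_level(text: str, level: str) -> list:
    """Return the records whose "level" field equals the given level."""
    return [record for record in iter_json_records(text) if record.get("level") == level]
```

For example, `records_at_level(open("logs/app.log").read(), "ERROR")` would return the error records, provided the log was written with LOG_FORMAT=json.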

Configuration

Configuration File: `logging_config.yaml`

Located at workspace root, controls logging behavior:

version: 1
environment: dev              # dev, prod, debug

log_directory: logs           # Where to store logs
log_level: DEBUG              # DEBUG, INFO, WARNING, ERROR, CRITICAL
log_format: plain             # plain or json

rotation:
  type: daily                 # daily or size
  retention: all              # all or "7 days"
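
How these settings reach loguru is not shown in this document. As a rough sketch, they could be translated into keyword arguments for loguru's `logger.add()`, which accepts `level`, `serialize`, `rotation`, and `retention`. The function name `build_sink_options` and the `size` key are assumptions made for illustration.

```python
def build_sink_options(config: dict) -> dict:
    """Translate the YAML settings into keyword arguments for loguru's logger.add()."""
    rotation = config.get("rotation", {})
    if rotation.get("type") == "size":
        # "size" is a hypothetical key; the config only hints that a size parameter exists.
        rotate_when = rotation.get("size", "10 MB")
    else:
        # A time of day makes loguru rotate once per day at that time (midnight here).
        rotate_when = "00:00"
    retention = rotation.get("retention", "all")
    return {
        "level": config.get("log_level", "INFO"),
        # serialize=True is loguru's built-in JSON mode; its record shape may
        # differ from the JSON sample in this document.
        "serialize": config.get("log_format", "plain") == "json",
        "rotation": rotate_when,
        # retention=None tells loguru to keep every rotated file.
        "retention": None if retention == "all" else retention,
    }


# Possible usage (requires loguru to be installed):
#   from loguru import logger
#   logger.add("logs/app.log", **build_sink_options(config))
```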

Environment Variables (Override Config File)

All variable names are uppercase:

| Variable | Values | Default | Purpose |
| --- | --- | --- | --- |
| `LOG_LEVEL` | DEBUG, INFO, WARNING, ERROR, CRITICAL | INFO | Minimum log level |
| `LOG_FORMAT` | plain, json | plain | Output format |
| `LOG_DIRECTORY` | any path | logs/ | Log file location |
| `ENVIRONMENT` | dev, prod, debug | dev | Environment type |

Priority: Environment Variables > YAML Config > Defaults
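
The priority rule can be sketched as a small lookup. This is an illustration only, not the project's actual loader: `resolve_settings` is a hypothetical name, the defaults come from the table above, and the mapping from variable names to YAML keys is assumed from the sample config file.

```python
import os
from typing import Mapping, Optional

# Defaults taken from the environment variable table.
DEFAULTS = {
    "LOG_LEVEL": "INFO",
    "LOG_FORMAT": "plain",
    "LOG_DIRECTORY": "logs/",
    "ENVIRONMENT": "dev",
}

# Assumed mapping from each variable to its key in logging_config.yaml.
YAML_KEYS = {
    "LOG_LEVEL": "log_level",
    "LOG_FORMAT": "log_format",
    "LOG_DIRECTORY": "log_directory",
    "ENVIRONMENT": "environment",
}


def resolve_settings(
    yaml_config: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Apply the order: environment variable, then YAML value, then default."""
    env = os.environ if env is None else env
    resolved = {}
    for name, default in DEFAULTS.items():
        if env.get(name):  # a non-empty environment variable wins
            resolved[name] = env[name]
        elif YAML_KEYS[name] in yaml_config:  # otherwise use the YAML value
            resolved[name] = yaml_config[YAML_KEYS[name]]
        else:  # otherwise fall back to the default
            resolved[name] = default
    return resolved
```

With `log_level: DEBUG` in the YAML file and `LOG_FORMAT=json` in the environment, this would resolve to level DEBUG and format json, with the remaining settings at their defaults.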

Example Configurations

Development (verbose, local disk)

LOG_LEVEL=DEBUG LOG_FORMAT=plain ENVIRONMENT=dev python main.py Images/

Production (info level, JSON format)

LOG_LEVEL=INFO LOG_FORMAT=json ENVIRONMENT=prod python main.py Images/

Troubleshooting (max verbosity)

LOG_LEVEL=DEBUG LOG_FORMAT=plain ENVIRONMENT=debug python main.py Images/

Log Levels

| Level | Usage | Example |
| --- | --- | --- |
| `DEBUG` | Detailed info for debugging | Image encoding size, LLM response details, overlap detection |
| `INFO` | General workflow events | Pipeline start/complete, extraction success, blocks created |
| `WARNING` | Potential issues, fallbacks | JSON parsing fallback, overlap strategy fallback |
| `ERROR` | Errors that should be fixed | LLM API failure, invalid image path, parsing errors |
| `CRITICAL` | System-level failures | Unrecoverable errors (rare) |
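
LOG_LEVEL acts as a minimum threshold: a message is written only when its level is at least as severe as the configured one. The sketch below uses the numeric severities that loguru assigns to these five levels; `is_emitted` is a hypothetical helper for illustration.

```python
# Numeric severities loguru assigns to the levels in the table.
LEVEL_SEVERITY = {
    "DEBUG": 10,
    "INFO": 20,
    "WARNING": 30,
    "ERROR": 40,
    "CRITICAL": 50,
}


def is_emitted(message_level: str, minimum_level: str) -> bool:
    """Return True if a message at message_level passes the minimum_level threshold."""
    return LEVEL_SEVERITY[message_level] >= LEVEL_SEVERITY[minimum_level]
```

So with LOG_LEVEL=WARNING, `is_emitted("INFO", "WARNING")` is False and `is_emitted("ERROR", "WARNING")` is True.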

What Gets Logged

Core Pipeline (core/pipeline.py)

✓ Pipeline start/complete
✓ Each stage start/complete (extraction, ordering, reconstruction)
✓ Block counts at each stage
✓ Timing information (stages)

Image Extraction (core/extractor.py)

✓ Extraction start/complete
✓ Per-image LLM call start/complete
✓ JSON parsing success/failure
✓ Fallback to raw text (if needed)
✓ Image metadata (format, language, cells)
✓ Failed images count

Block Ordering (core/ordering.py)

✓ Ordering start/complete
✓ Strategy used (line_numbers > timestamp > overlap > fallback)
✓ Blocks successfully ordered

Block Grouping (core/grouping.py)

✓ Grouping start/complete
✓ Groups created count
✓ Per-group details (blocks, total lines)

Code Reconstruction (core/reconstruction.py)

✓ Reconstruction type (notebook vs code)
✓ Block processing (per-block details)
✓ Overlap detection and merge count
✓ Final code statistics (line count)

Example Log Flow

Here's a complete run logged from start to finish:

2026-02-24 10:34:56 | INFO  | __main__:main:50 - ============================================================
2026-02-24 10:34:56 | INFO  | __main__:main:51 - Code Extraction Pipeline Started
2026-02-24 10:34:56 | INFO  | __main__:main:52 - Arguments: ['Images/']

2026-02-24 10:34:56 | INFO  | __main__:main:62 - Found 3 image(s) to process

2026-02-24 10:34:56 | INFO  | core.pipeline:run:107 - Pipeline started | image_paths: 3

2026-02-24 10:34:56 | INFO  | core.pipeline:extract_blocks:28 - Starting extraction | image_count: 3
2026-02-24 10:34:56 | INFO  | core.extractor:extract_from_images:231 - Starting extraction from images | count: 3
2026-02-24 10:34:56 | DEBUG | core.extractor:extract_from_images:234 - Extracting image 1/3 | path: Images/code1.png
2026-02-24 10:34:56 | INFO  | core.extractor:_extract_llm:177 - Calling LLM model: qwen3-vl:235b-cloud
2026-02-24 10:34:58 | DEBUG | core.extractor:_extract_llm:185 - LLM response received | response_size: 1850
2026-02-24 10:34:58 | DEBUG | core.extractor:extract_from_image:206 - JSON parsed successfully | format: python | cells: 1
2026-02-24 10:34:58 | INFO  | core.extractor:extract_from_image:223 - Block extracted | format: python | lines: 45

... [similar for images 2 and 3] ...

2026-02-24 10:35:02 | INFO  | core.extractor:extract_from_images:245 - Extraction complete | success: 3 | failed: 0

2026-02-24 10:35:02 | INFO  | core.pipeline:order_blocks:37 - Starting ordering | block_count: 3
2026-02-24 10:35:02 | INFO  | core.ordering:order:25 - Starting block ordering | block_count: 3
2026-02-24 10:35:02 | INFO  | core.ordering:order:31 - Ordering strategy: LINE_NUMBERS | blocks: 3
2026-02-24 10:35:02 | INFO  | core.pipeline:order_blocks:39 - Ordering complete | blocks_ordered: 3

2026-02-24 10:35:02 | INFO  | core.pipeline:reconstruct:43 - Starting reconstruction | block_count: 3 | format: python
2026-02-24 10:35:02 | INFO  | core.reconstruction:reconstruct:31 - Reconstruction type: CODE
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:57 - Processing code block 1/3 | lines: 45
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:68 - First block appended | total_lines: 45
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:73 - Processing code block 2/3 | lines: 50
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:78 - Overlap detected | block: 2 | overlap_lines: 3
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:81 - Block merged | total_lines: 92
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:73 - Processing code block 3/3 | lines: 55
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:78 - Overlap detected | block: 3 | overlap_lines: 2
2026-02-24 10:35:02 | DEBUG | core.reconstruction:_reconstruct_code:81 - Block merged | total_lines: 145
2026-02-24 10:35:02 | INFO  | core.reconstruction:_reconstruct_code:84 - Code reconstruction complete | final_lines: 145 | filename: script.py

2026-02-24 10:35:02 | INFO  | core.pipeline:reconstruct:44 - Reconstruction complete | status: success

2026-02-24 10:35:02 | INFO  | __main__:save_output:38 - Output saved successfully | path: outputs/script.py

2026-02-24 10:35:02 | INFO  | __main__:main:77 - Reconstruction complete | saved to: outputs/script.py

Analyzing Logs

Find Errors Only

grep ERROR logs/app.log

Find Specific Module

grep core.extractor logs/app.log

Find Specific Image

grep "code1.png" logs/app.log

Find LLM Calls

grep "Calling LLM" logs/app.log

Find Ordering Strategy Used

grep "Ordering strategy:" logs/app.log

Find Overlaps Detected

grep "Overlap detected" logs/app.log

Count Failures

grep ERROR logs/app.log | wc -l
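
A plain `grep ERROR` also matches lines that merely mention the word in their message. As an alternative sketch, the level column of the plain format can be counted directly; `count_levels` is a hypothetical helper and assumes the ` | ` separators shown in the sample output.

```python
from collections import Counter
from typing import Iterable


def count_levels(lines: Iterable[str]) -> Counter:
    """Count plain-format log lines per level, reading only the level column."""
    counts = Counter()
    for line in lines:
        # "<timestamp> | <LEVEL> | <rest>" splits into three parts.
        parts = line.split(" | ", 2)
        if len(parts) == 3:
            counts[parts[1].strip()] += 1
    return counts
```

For example, `count_levels(open("logs/app.log"))["ERROR"]` gives the number of ERROR-level lines.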

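Performance: Find slowest image

# Lists the start and end of every LLM call; compare the timestamps to spot the slowest image
# (the "LLM response received" lines are only written at LOG_LEVEL=DEBUG)
grep -E "Calling LLM|LLM response received" logs/app.log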
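
To rank images by LLM latency, the timestamps of the call and response lines can be paired up. This is a sketch under several assumptions: the log is in plain format at DEBUG level, images are processed one at a time, and the message texts match the example log flow above. Plain timestamps have one-second resolution, so the results are coarse. `llm_call_durations` is a hypothetical helper.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def llm_call_durations(lines: Iterable[str]) -> List[Tuple[Optional[str], float]]:
    """Return (image_path, seconds) pairs for each LLM call, slowest first."""
    durations = []
    current_image = None
    started_at = None
    for line in lines:
        if "Extracting image" in line and "path: " in line:
            # Remember which image the following LLM call belongs to.
            current_image = line.split("path: ", 1)[1].strip()
        elif "Calling LLM" in line:
            started_at = datetime.strptime(line[:19], TIMESTAMP_FORMAT)
        elif "LLM response received" in line and started_at is not None:
            finished_at = datetime.strptime(line[:19], TIMESTAMP_FORMAT)
            durations.append((current_image, (finished_at - started_at).total_seconds()))
            started_at = None
    return sorted(durations, key=lambda item: item[1], reverse=True)
```

Calling `llm_call_durations(open("logs/app.log"))` returns the slowest image first.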

Production Deployment

Setup for Production

Use JSON format for log aggregation:

# Update .env or environment:
export LOG_FORMAT=json
export LOG_LEVEL=INFO
export ENVIRONMENT=prod

Store logs in persistent location:

# Configure in logging_config.yaml:
log_directory: /var/log/image2code/

Setup log rotation:

rotation:
  type: daily        # or "size" with size parameter
  retention: "30 days"  # or "all"

Redirect to external logging service (future enhancement):
- JSON logs can be easily piped to ELK Stack, Datadog, etc.
- Use: tail -f logs/app.log | send-to-logging-service (send-to-logging-service is a placeholder for your log shipper's command)
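
Until such an integration exists, a small polling forwarder is one possible approach. The sketch below reads any complete lines appended since the last call and hands each one to a callback; `forward_new_lines` and the `send` callback are hypothetical and stand in for whatever client your logging service provides.

```python
from typing import Callable


def forward_new_lines(path: str, offset: int, send: Callable[[str], None]) -> int:
    """Send complete lines written after byte `offset`; return the new offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        while True:
            raw = handle.readline()
            if not raw.endswith(b"\n"):
                # End of file, or a line that is still being written.
                break
            send(raw.decode("utf-8").rstrip("\n"))
            offset = handle.tell()
    return offset
```

A caller would store the returned offset and call the function again on a timer. The offset has to be reset to 0 when the log file is rotated, which this sketch does not detect.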

Monitoring in Production

Watch for:

ERROR logs → immediate attention needed
WARNING logs → potential issues to investigate
Extraction failures → check LLM availability
High overlap counts → possible duplicate detection issues

Troubleshooting

Logs not appearing

Check log directory exists:
```
ls -la logs/
```
Fix directory permissions if the application cannot write to it:
```
chmod 755 logs/
```
Verify environment variables:
```
echo $LOG_LEVEL
echo $LOG_FORMAT
```
Check configuration file:
```
cat logging_config.yaml
```
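
The directory checks above can also be run from Python, for example at application startup. This is a minimal sketch; `check_log_directory` is a hypothetical helper and is not part of core.logger.

```python
import os
from typing import List


def check_log_directory(path: str = "logs") -> List[str]:
    """Return a list of problems that would prevent writing logs to `path`."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
    elif not os.access(path, os.W_OK | os.X_OK):
        # Creating files in a directory needs both write and execute permission.
        problems.append(f"{path} is not writable by the current user")
    return problems
```

An empty list means the directory exists and the current user can create files in it.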

Too much output

Change log level:

LOG_LEVEL=WARNING python main.py Images/

JSON format too verbose

Use plain format for development:

LOG_FORMAT=plain python main.py Images/

Console not showing logs

Ensure ENVIRONMENT is dev/debug, not prod:

ENVIRONMENT=dev python main.py Images/

For Developers

Add Logging to New Code

Import logger:

from core.logger import get_logger
logger = get_logger(__name__)

Use appropriate level:

logger.debug("Variable values", extra={"var1": value1})
logger.info("Stage complete", extra={"stage": "extraction"})
logger.warning("Fallback used", extra={"reason": "missing data"})
logger.error("LLM failed", extra={"error": str(e)})  # inside an `except Exception as e:` block

Add context with extra dict:

logger.info(
    "Processing image",
    extra={
        "image": image_path,
        "format": detected_format,
        "cells": cell_count
    }
)
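
In the sample output earlier in this document, context fields appear after the message as `| key: value` pairs. How core.logger renders the extra dict is not shown here, so the following is only an illustration of that layout; `format_with_context` is a hypothetical helper.

```python
def format_with_context(message: str, context: dict) -> str:
    """Append context fields to a message in the "| key: value" style of the sample logs."""
    fields = " | ".join(f"{key}: {value}" for key, value in context.items())
    return f"{message} | {fields}" if fields else message
```

With the context `{"image_count": 3}`, the message "Starting extraction" is rendered the same way as in the sample plain output.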

Testing Logging

# test_logging.py
import os
os.environ['LOG_LEVEL'] = 'DEBUG'

from core.logger import initialize_logger, get_logger
initialize_logger()

logger = get_logger("test")
logger.debug("Test debug message")
logger.info("Test info message")
logger.warning("Test warning message")
logger.error("Test error message")

Run: python test_logging.py and check logs/app.log

Summary

Easy to use: from core.logger import get_logger and log away
Zero logic changes: Pure observation, no business logic modified
Flexible configuration: YAML + environment variables
Production ready: JSON output, automatic rotation
Comprehensive: Every stage of pipeline tracked
Debuggable: Detailed logs for troubleshooting multi-image processing

Need Help?

Check logs: cat logs/app.log
View recent errors: grep ERROR logs/app.log | tail -20
Search logs: grep -i "error\|warning" logs/app.log
