# Model Selection Guide

## Quick Start - Just Tell Me What To Use!

### Best Overall Choice

Recommendation: OpenAI GPT-4o-mini

```bash
# Set in your .env file
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-4o-mini
OPENAI_API_KEY=your-api-key-here
```

Why: Perfect accuracy, fast processing (10.85s), complete metadata extraction.
### Best Local/Offline Choice

Recommendation: Gemma2:9B

```bash
# Set in your .env file
LLM_PROVIDER=ollama
OLLAMA_MODEL=gemma2:9b
OLLAMA_BASE_URL=http://localhost:11434
```

Why: Fastest local model (6.65s), excellent quality, privacy-first.
### Most Cost-Effective

Recommendation: Self-hosted Gemma2:9B

- Zero marginal cost after setup
- Excellent performance for unlimited processing
- Full privacy and control
## Decision Tree

```mermaid
flowchart TD
    A["Need LLM for Bank Statement Processing?"] --> B{"Privacy Required?"}
    B -->|Yes| C{"Speed Priority?"}
    B -->|No| D{"Budget Conscious?"}
    C -->|Yes| E["Gemma2:9B<br/>6.65s"]
    C -->|Accuracy| F["Mistral:Instruct<br/>Perfect Segmentation"]
    D -->|Yes| G["OpenAI GPT-4o-mini<br/>Best $/accuracy"]
    D -->|Performance| G
    E --> H["Ollama Setup Required"]
    F --> H
    G --> I["API Key Required"]
    H --> J["Success!"]
    I --> J
```
## Detailed Selection Criteria

### 1. Accuracy Requirements

#### Maximum Accuracy Needed

- Primary: OpenAI GPT-4o-mini
- Backup: Mistral:Instruct (local)
- Use Case: Financial institutions, legal compliance, audit requirements

#### Good Accuracy Acceptable

- Primary: Gemma2:9B
- Backup: Qwen2.5-Coder
- Use Case: Personal finance, small business, development

#### Basic Processing OK

- Primary: Pattern Fallback (no LLM)
- Backup: Any functional Ollama model
- Use Case: Bulk processing, non-critical applications
### 2. Speed Requirements

#### Ultra-Fast (< 8 seconds)

- Gemma2:9B - 6.65s ⚡
- Mistral:Instruct - 7.63s

#### Fast (8-12 seconds)

- Qwen2.5:latest - 8.53s
- Qwen2.5-Coder - 8.59s
- OpenHermes - 8.66s
- OpenAI GPT-4o-mini - 10.85s

#### Moderate (12-20 seconds)

- Acceptable for batch processing
- DeepSeek-r1:latest, Phi4:latest

#### Slow (> 20 seconds)

- Only for background processing
- Avoid: Qwen3, Llama3.2
### 3. Privacy & Deployment

#### Privacy-First (Local Only)

- Gemma2:9B - Best local performance
- Mistral:Instruct - Open source, reliable
- Qwen2.5-Coder - Feature complete

#### Cloud OK (Best Performance)

- OpenAI GPT-4o-mini - Industry leading
- Gemma2:9B - Local backup option
- Mistral:Instruct - Local alternative

#### Hybrid (Flexible)

- Primary: OpenAI for critical documents
- Fallback: Gemma2:9B for routine processing
- Configure both in the environment (see the routing sketch below)
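As a rough illustration of the hybrid pattern, provider selection can be decided per document before processing begins. The `pick_provider` helper below is hypothetical, not part of the tool's API; it simply prefers OpenAI when an API key is configured and the document is flagged as critical:

```python
import os


def pick_provider(critical: bool) -> str:
    """Route critical documents to OpenAI and routine ones to local Ollama.

    Hypothetical helper for illustration; the tool itself selects providers
    via LLM_PROVIDER and its fallback settings.
    """
    if critical and os.getenv("OPENAI_API_KEY"):
        return "openai"
    return "ollama"


print(pick_provider(critical=True))   # "openai" when an API key is set
print(pick_provider(critical=False))  # "ollama"
```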
### 4. Technical Resources

#### Limited Resources (< 8GB RAM)

- OpenAI GPT-4o-mini (cloud)
- Mistral:Instruct (4.1GB model)
- OpenHermes (4.1GB model)

#### Moderate Resources (8-12GB RAM)

- Gemma2:9B (5.4GB model) ✅ Recommended
- Qwen2.5 variants (4.7GB each)
- Multiple models can be loaded

#### High Resources (12GB+ RAM)

- All models available
- DeepSeek-Coder-v2 (8.9GB) for development
- Phi4 (9.1GB) for Microsoft ecosystem
### 5. Rate Limiting & Backoff Considerations

#### API-Based Models (OpenAI)

- Rate Limiting: 50 requests/minute, 1000/hour default limits
- Backoff Strategy: Automatic exponential backoff with jitter on rate limits
- Burst Capacity: 10 immediate requests allowed
- Best For: High-volume processing with built-in reliability

#### Local Models (Ollama)

- Rate Limiting: None (limited by local hardware)
- Backoff Strategy: Minimal (only for temporary resource issues)
- Burst Capacity: Limited by available RAM/CPU
- Best For: Consistent processing without API delays
#### Rate Limiting Configuration

```bash
# OpenAI rate limiting (default values)
OPENAI_REQUESTS_PER_MINUTE=50
OPENAI_BURST_LIMIT=10
OPENAI_BACKOFF_MIN=1.0
OPENAI_BACKOFF_MAX=60.0

# For high-volume processing, increase limits
OPENAI_REQUESTS_PER_MINUTE=100
OPENAI_BURST_LIMIT=20
```
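The backoff behavior described above follows the standard exponential-backoff-with-jitter pattern. The sketch below is a minimal illustration of that pattern using the `OPENAI_BACKOFF_MIN`/`OPENAI_BACKOFF_MAX` values; the retry loop and error type are stand-ins, not the tool's internal implementation:

```python
import os
import random
import time

BACKOFF_MIN = float(os.getenv("OPENAI_BACKOFF_MIN", "1.0"))
BACKOFF_MAX = float(os.getenv("OPENAI_BACKOFF_MAX", "60.0"))


class RateLimitError(Exception):
    """Stand-in for the provider's rate-limit error."""


def call_with_backoff(request, max_retries: int = 5):
    """Retry `request` with exponential backoff plus jitter on rate limits."""
    for attempt in range(max_retries):
        try:
            return request()
        except RateLimitError:
            # Double the delay each attempt, cap it at BACKOFF_MAX, and add
            # random jitter so concurrent clients do not retry in lockstep.
            delay = min(BACKOFF_MIN * (2 ** attempt), BACKOFF_MAX)
            time.sleep(delay + random.uniform(0, delay / 2))
    raise RateLimitError("retries exhausted")
```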
## Configuration Examples

### Production Setup (High Accuracy)

```bash
# Primary provider
LLM_PROVIDER=openai
OPENAI_API_KEY=your-key
OPENAI_MODEL=gpt-4o-mini

# Fallback enabled
LLM_FALLBACK_ENABLED=true
OLLAMA_MODEL=gemma2:9b
OLLAMA_BASE_URL=http://localhost:11434
```
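With `LLM_FALLBACK_ENABLED=true`, a failure on the primary provider should fall through to the local model. A minimal sketch of that chain, with hypothetical analyzer callables standing in for the real providers:

```python
import os


def analyze_with_openai(text: str) -> dict:
    """Hypothetical primary analyzer; raises on outages or rate limits."""
    raise ConnectionError("simulated OpenAI outage")


def analyze_with_ollama(text: str) -> dict:
    """Hypothetical local fallback analyzer."""
    return {"provider": "ollama", "segments": []}


def analyze(text: str) -> dict:
    """Try the primary provider; fall back to Ollama only if enabled."""
    try:
        return analyze_with_openai(text)
    except Exception:
        if os.getenv("LLM_FALLBACK_ENABLED", "false").lower() == "true":
            return analyze_with_ollama(text)
        raise


os.environ["LLM_FALLBACK_ENABLED"] = "true"
print(analyze("statement text"))  # falls back: {'provider': 'ollama', ...}
```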
### Development Setup (Speed Focus)

```bash
# Fast local processing
LLM_PROVIDER=ollama
OLLAMA_MODEL=gemma2:9b
OLLAMA_BASE_URL=http://localhost:11434

# No fallback for consistent testing
LLM_FALLBACK_ENABLED=false
```
### Privacy Setup (Local Only)

```bash
# Local only - no cloud services
LLM_PROVIDER=ollama
OLLAMA_MODEL=gemma2:9b
OLLAMA_BASE_URL=http://localhost:11434

# Pattern fallback only
ENABLE_FALLBACK_PROCESSING=true
LLM_FALLBACK_ENABLED=false
```
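`ENABLE_FALLBACK_PROCESSING=true` keeps the pattern-based (no-LLM) path available. As a rough sketch of what pattern-based boundary detection can look like — the regex and five-line window here are illustrative assumptions, not the tool's actual rules:

```python
import re

# Illustrative header pattern; the tool's real patterns may differ.
STATEMENT_HEADER = re.compile(
    r"(?i)^\s*(account\s+statement|statement\s+of\s+account)"
)


def find_statement_starts(pages: list[str]) -> list[int]:
    """Return indexes of pages whose opening lines look like a statement header."""
    starts = []
    for i, page in enumerate(pages):
        first_lines = page.splitlines()[:5]
        if any(STATEMENT_HEADER.match(line) for line in first_lines):
            starts.append(i)
    return starts


print(find_statement_starts(["ACCOUNT STATEMENT\n...", "page 2 text"]))  # [0]
```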
### Budget Setup (Minimize Costs)

```bash
# Free local processing
LLM_PROVIDER=ollama
OLLAMA_MODEL=mistral:instruct
OLLAMA_BASE_URL=http://localhost:11434

# OpenAI for critical documents only
# (uncomment to enable)
# OPENAI_API_KEY=your-key
```
## Use Case Specific Recommendations

### Personal Finance Management

- Model: Gemma2:9B
- Reason: Fast, accurate, zero ongoing cost
- Setup: Local Ollama installation

### Small Business Accounting

- Model: OpenAI GPT-4o-mini
- Reason: Maximum accuracy for tax compliance
- Setup: Cloud API with local backup

### Enterprise/Financial Institution

- Model: OpenAI GPT-4o-mini + Gemma2:9B hybrid
- Reason: Accuracy for compliance, local for privacy
- Setup: Dual provider configuration

### Software Development

- Model: Qwen2.5-Coder
- Reason: Optimized for structured document processing
- Setup: Local Ollama with development flags

### Research/Academic

- Model: Multiple models for comparison
- Reason: Study model behavior and accuracy
- Setup: Full Ollama installation with all models
## Common Issues & Solutions

### "Model is too slow"

- ✅ Switch to Gemma2:9B (fastest)
- ✅ Check GPU availability for Ollama
- ✅ Increase Ollama memory allocation
- ⚠️ Consider OpenAI for speed + accuracy

### "Accuracy is poor"

- ✅ Switch to OpenAI GPT-4o-mini
- ✅ Try Mistral:Instruct for better segmentation
- ✅ Check document quality (scanned vs native PDF)
- ⚠️ Enable fallback processing as backup
"Model keeps failing"¶
- ✅ Check Ollama server status:
ollama list - ✅ Restart Ollama:
ollama serve - ✅ Switch to different model temporarily
- ✅ Enable fallback processing
"High memory usage"¶
- ✅ Use smaller models (Mistral, OpenHermes)
- ✅ Switch to OpenAI (cloud processing)
- ✅ Process fewer documents simultaneously
- ✅ Restart Ollama between large batches
"Inconsistent results"¶
- ✅ Set LLM_TEMPERATURE=0 for deterministic output
- ✅ Use OpenAI for maximum consistency
- ✅ Enable validation strictness: VALIDATION_STRICTNESS=strict
- ✅ Check document format consistency
## Performance Monitoring

### Key Metrics to Track

- Processing Time: Target < 15 seconds per document
- Accuracy Rate: Track segmentation errors
- Memory Usage: Monitor during processing
- Error Rate: LLM failures vs fallback usage
### Monitoring Commands

```bash
# Check Ollama status
ollama ps

# Monitor memory usage
htop  # or Activity Monitor on Mac

# Check processing logs
tail -f logs/statement_processing.log

# Test model performance
uv run python -m src.bank_statement_separator.main process test.pdf --dry-run
```
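To track the < 15-second target programmatically, a dry run can be timed from a small script. This sketch only wraps the documented CLI command; the printed summary line is the script's own output, not tool output:

```python
import subprocess
import time

# Time a single dry run of the documented CLI command.
cmd = [
    "uv", "run", "python", "-m",
    "src.bank_statement_separator.main",
    "process", "test.pdf", "--dry-run",
]
start = time.perf_counter()
result = subprocess.run(cmd, capture_output=True, text=True)
elapsed = time.perf_counter() - start
print(f"exit code {result.returncode}, {elapsed:.2f}s (target < 15s)")
```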
## Getting Help

### Model-Specific Issues

- OpenAI: Check API key, quota limits, model availability
- Ollama: Verify installation, model downloads, memory allocation
- Pattern Fallback: Review document format, enable debug logging

### Performance Issues

- Run model comparison tests (see LLM Model Testing)
- Check Model Comparison Tables
- Review Troubleshooting Guide

### Community Resources

- GitHub Issues: Report bugs and feature requests
- Discussions: Model performance comparisons
- Documentation: Detailed technical references