Error Handling Guide¶
Comprehensive guide to understanding and managing errors in the Workflow Bank Statement Separator.
Error Handling Overview¶
The system provides a comprehensive error handling framework with multiple layers of protection:
- Pre-validation: Document format and content validation before processing
- Processing Errors: Smart handling of failures during workflow execution
- Quarantine System: Automatic isolation of problematic documents
- Recovery Suggestions: Actionable guidance for resolving issues
Error Categories¶
Critical Errors¶
These errors stop processing immediately and quarantine the document:
- Password Protection: PDF requires password to access
- File Corruption: PDF structure is damaged or invalid
- Access Denied: Insufficient permissions to read/write files
- Resource Exhaustion: Out of memory or disk space
Recoverable Errors¶
These errors trigger retry logic with exponential backoff and jitter:
- Network Timeouts: API requests that timeout
- Temporary File Locks: Files temporarily in use
- Rate Limiting: API rate limits exceeded (automatic backoff)
- Transient API Errors: Temporary service issues
- Resource Contention: Temporary system resource issues
Backoff Strategy Details¶
The system implements a sophisticated backoff mechanism:
- Exponential Delay: Base delay doubles with each retry attempt
- Jitter: Random variation (10–100% of the computed delay) prevents the thundering-herd effect
- Maximum Delay: Capped at 60 seconds to prevent excessive waits
- Selective Retries: Only retries on specific error types (RateLimitError, timeouts)
- Configurable Limits: Adjustable retry counts and base delays
For detailed information about the backoff implementation, see the Backoff Mechanisms Design Document.
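As an illustration of the schedule (not the application's own code), the delays can be reproduced in a few lines of shell using the OPENAI_BACKOFF_* values shown under Network Reliability below, rounded to integers for shell arithmetic:
# Illustrative backoff schedule: delay = min(max, min * multiplier^attempt), then 10-100% jitter
OPENAI_BACKOFF_MIN=1
OPENAI_BACKOFF_MAX=60
OPENAI_BACKOFF_MULTIPLIER=2
for attempt in 0 1 2 3; do
    delay=$(( OPENAI_BACKOFF_MIN * OPENAI_BACKOFF_MULTIPLIER ** attempt ))
    (( delay > OPENAI_BACKOFF_MAX )) && delay=$OPENAI_BACKOFF_MAX
    jitter_pct=$(( RANDOM % 91 + 10 ))   # random factor between 10% and 100%
    echo "attempt $attempt: wait $(( delay * jitter_pct / 100 ))s"
done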
Validation Warnings¶
These generate warnings but may allow processing to continue:
- Old Documents: Files older than configured age limit
- Large Files: Files exceeding recommended size limits
- Low Text Content: Documents with minimal extractable text
- Missing Metadata: Statements without clear account information
Quarantine System¶
How It Works¶
When a document fails critical validation or processing:
- Document Quarantine: File moved to quarantine directory with timestamp
- Error Report: Detailed JSON report generated with failure details
- Recovery Suggestions: Actionable steps provided for resolution
- Audit Logging: Complete failure trail recorded
Quarantine Directory Structure¶
quarantine/
├── failed_20240831_143022_statement.pdf    # Quarantined document
├── failed_20240831_143055_document.pdf     # Another failed document
└── reports/                                # Error reports directory
    ├── error_report_20240831_143022.json   # Detailed error report
    └── error_report_20240831_143055.json   # Another error report
Example Error Report¶
{
  "timestamp": "2024-08-31T14:30:22",
  "quarantine_file": "/quarantine/failed_20240831_143022_statement.pdf",
  "original_file": "/input/problematic_statement.pdf",
  "error_reason": "Document format validation failed: Password protected",
  "workflow_step": "pdf_ingestion_format_error",
  "configuration": {
    "validation_strictness": "normal",
    "max_file_size_mb": 100,
    "allowed_extensions": [".pdf"]
  },
  "recovery_suggestions": [
    "Remove password protection from the PDF",
    "Use a PDF tool to unlock the document",
    "Contact the document source for an unlocked version"
  ],
  "system_info": {
    "python_version": "3.11.0",
    "memory_available": "4.2 GB",
    "disk_space": "150 GB"
  }
}
Validation Strictness Levels¶
Configure error handling behavior with the VALIDATION_STRICTNESS setting:
Strict Mode¶
- All validation issues are treated as errors
- Processing stops on first failure
- Highest accuracy, lowest success rate
- Best for critical financial processing
Normal Mode (Default)¶
- Balanced approach between validation and success
- Some issues generate warnings but allow processing
- Good accuracy with reasonable success rate
- Recommended for most use cases
Lenient Mode¶
- Most validation issues generate warnings only
- Processing continues unless critical failures occur
- Lower accuracy, highest success rate
- Best for exploratory or bulk processing
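The level can be set globally or overridden for a single run; for example, to force strict validation on one document:
# Override validation strictness for a single invocation
VALIDATION_STRICTNESS=strict uv run python -m src.bank_statement_separator.main process input.pdf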
Common Error Scenarios¶
Password-Protected PDFs¶
Error: Document requires password for access
Recovery Steps:
- Remove password protection using PDF tools
- Request unprotected version from source
- Use PDF utilities like qpdf or Adobe Acrobat
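If the password is known, qpdf can write an unlocked copy (file names are illustrative):
# Decrypt a protected statement with qpdf
qpdf --password=YOUR_PASSWORD --decrypt protected_statement.pdf unlocked_statement.pdf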
File Corruption¶
Error: PDF structure is damaged or incomplete
Recovery Steps:
- Re-download or re-scan the original document
- Use PDF repair tools
- Convert to different format and back to PDF
# Attempt PDF repair with Ghostscript
gs -o repaired.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress input.pdf
API Quota Exceeded¶
Error: OpenAI API quota or rate limits exceeded
Recovery Steps:
- Wait for quota reset (usually monthly)
- Upgrade OpenAI plan for higher limits
- Use fallback processing mode
# Process without API key (fallback mode)
OPENAI_API_KEY="" uv run python -m src.bank_statement_separator.main process input.pdf
Insufficient Text Content¶
Error: Document appears to be image-only or has minimal text
Recovery Steps:
- Check if document is scanned image
- Apply OCR processing before separation
- Adjust minimum text content ratio
# Adjust text content requirements
MIN_TEXT_CONTENT_RATIO=0.05 # Lower threshold
REQUIRE_TEXT_CONTENT=false # Disable requirement
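For scanned, image-only statements, one option is to add a text layer with an OCR tool such as ocrmypdf before reprocessing (assuming it is installed; file names are illustrative):
# Add a searchable text layer so text extraction can succeed
ocrmypdf scanned_statement.pdf searchable_statement.pdf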
Large File Processing¶
Error: File exceeds size limits or causes memory issues
Recovery Steps:
- Increase file size limits in configuration
- Process on machine with more memory
- Split large documents before processing
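If splitting is the practical route, qpdf can break a large bundle into fixed-size chunks (the page count here is illustrative):
# Split a large bundle into 100-page chunks before processing
qpdf --split-pages=100 large_statement_bundle.pdf chunk-%d.pdf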
Error Prevention¶
Pre-Processing Validation¶
Enable comprehensive validation before processing starts:
# Enable all validation checks
VALIDATE_PDF_STRUCTURE=true
CHECK_PDF_CORRUPTION=true
REQUIRE_TEXT_CONTENT=true
MIN_TEXT_CONTENT_RATIO=0.1
Resource Management¶
Prevent resource-related errors:
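A minimal sketch of the kind of limits involved; MAX_FILE_SIZE_MB mirrors the max_file_size_mb value in the error report above, while the page-limit name is an illustrative placeholder rather than a documented setting:
# Keep individual documents within predictable resource bounds
MAX_FILE_SIZE_MB=100         # Reject oversized files before processing
MAX_PAGES_PER_DOCUMENT=500   # Illustrative placeholder for a page-count cap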
Network Reliability¶
Configure robust API handling with advanced backoff mechanisms:
# API reliability settings
API_TIMEOUT_SECONDS=120
MAX_RETRY_ATTEMPTS=3
# Backoff configuration
OPENAI_BACKOFF_MIN=1.0 # Minimum delay between retries
OPENAI_BACKOFF_MAX=60.0 # Maximum delay cap
OPENAI_BACKOFF_MULTIPLIER=2.0 # Exponential growth factor
# Rate limiting
OPENAI_REQUESTS_PER_MINUTE=50
OPENAI_BURST_LIMIT=10
CLI Error Management¶
Check Quarantine Status¶
# View quarantine directory status
uv run python -m src.bank_statement_separator.main quarantine-status
# Detailed view with error analysis
uv run python -m src.bank_statement_separator.main quarantine-status --verbose
Clean Quarantine Directory¶
# Preview cleanup (safe)
uv run python -m src.bank_statement_separator.main quarantine-clean --dry-run
# Clean files older than 30 days
uv run python -m src.bank_statement_separator.main quarantine-clean --days 30
# Force cleanup without confirmation
uv run python -m src.bank_statement_separator.main quarantine-clean --yes
Error Log Analysis¶
# View recent errors
tail -f logs/statement_processing.log | grep ERROR
# Search for specific error types
grep "quarantined" logs/statement_processing.log
# Monitor API failures
grep "API_ERROR" logs/audit.log
Recovery Workflows¶
Document Recovery Process¶
- Identify Issue: Check error report for specific problem
- Apply Fix: Follow recovery suggestions in report
- Reprocess: Attempt processing with corrected document
- Verify Results: Confirm successful processing
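A single-document pass over these steps might look like the following (requires jq for reading the report; paths are illustrative):
# Identify the issue from the quarantine report
jq '.error_reason, .recovery_suggestions' quarantine/reports/error_report_20240831_143022.json
# Reprocess the corrected document after applying the suggested fix
uv run python -m src.bank_statement_separator.main process fixed_statement.pdf --output ./recovered
# Verify the result in the processing log
grep "fixed_statement" logs/statement_processing.log | tail -n 5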
Batch Recovery¶
For multiple failed documents:
#!/bin/bash
# recover_quarantined.sh
QUARANTINE_DIR="./quarantine"
RECOVERED_DIR="./recovered"
for pdf in "$QUARANTINE_DIR"/failed_*.pdf; do
    echo "Attempting to recover: $pdf"

    # Try processing with lenient validation
    VALIDATION_STRICTNESS=lenient uv run python -m src.bank_statement_separator.main \
        process "$pdf" --output "$RECOVERED_DIR" --yes

    if [[ $? -eq 0 ]]; then
        echo "✅ Successfully recovered: $pdf"
    else
        echo "❌ Still failing: $pdf"
    fi
done
Monitoring and Alerts¶
Error Rate Monitoring¶
# Calculate daily error rate
grep "quarantined" logs/statement_processing.log | \
grep "$(date +%Y-%m-%d)" | wc -l
# Success rate over last 100 operations
tail -100 logs/statement_processing.log | \
grep -E "(SUCCESS|ERROR)" | \
awk '/SUCCESS/{s++} /ERROR/{e++} END{print "Success rate: " s/(s+e)*100 "%"}'
Automated Alerts¶
Set up monitoring scripts for production:
#!/bin/bash
# monitor_errors.sh
ERROR_COUNT=$(grep "ERROR" logs/statement_processing.log | \
    grep "$(date +%Y-%m-%d)" | wc -l)

if [[ $ERROR_COUNT -gt 10 ]]; then
    echo "High error rate detected: $ERROR_COUNT errors today" | \
        mail -s "Bank Separator Alert" admin@company.com
fi

# Check quarantine size
QUARANTINE_SIZE=$(du -sm quarantine/ | cut -f1)
if [[ $QUARANTINE_SIZE -gt 1000 ]]; then
    echo "Quarantine directory size: ${QUARANTINE_SIZE}MB" | \
        mail -s "Quarantine Size Alert" admin@company.com
fi
Configuration for Error Handling¶
Production Configuration¶
# High reliability production setup
VALIDATION_STRICTNESS=strict
MAX_RETRY_ATTEMPTS=3
ENABLE_ERROR_REPORTING=true
AUTO_QUARANTINE_CRITICAL_FAILURES=true
PRESERVE_FAILED_OUTPUTS=true
CONTINUE_ON_VALIDATION_WARNINGS=false
# Comprehensive logging
ENABLE_AUDIT_LOGGING=true
LOG_LEVEL=INFO
LOG_API_CALLS=true
Development Configuration¶
# Permissive development setup
VALIDATION_STRICTNESS=lenient
MAX_RETRY_ATTEMPTS=1
CONTINUE_ON_VALIDATION_WARNINGS=true
PRESERVE_FAILED_OUTPUTS=true
ENABLE_ERROR_REPORTING=true
# Debug logging
LOG_LEVEL=DEBUG
DEVELOPMENT_MODE=true
Best Practices¶
Error Prevention¶
- Validate Early: Enable pre-processing validation
- Set Appropriate Limits: Configure reasonable file size and page limits
- Monitor Resources: Watch memory and disk usage
- Test Regularly: Run test suite to catch regressions
Error Response¶
- Review Error Reports: Always check detailed error reports
- Follow Recovery Steps: Apply suggested recovery actions
- Update Configuration: Adjust settings based on error patterns
- Document Issues: Keep track of common problems and solutions
Monitoring¶
- Track Error Rates: Monitor success/failure ratios
- Review Quarantine: Regularly check quarantined documents
- Clean Up: Implement automated cleanup of old files (a cron sketch follows this list)
- Alert on Issues: Set up monitoring for critical errors
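For the clean-up item above, the quarantine-clean command from the CLI section can be scheduled with cron; the schedule and install path are illustrative, and detailed cron setups are covered in the Technical Reference:
# /etc/cron.d/bank-separator-cleanup: purge quarantined files older than 30 days, weekly
0 2 * * 0  appuser  cd /opt/bank-statement-separator && uv run python -m src.bank_statement_separator.main quarantine-clean --days 30 --yes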
Troubleshooting Common Issues¶
High Error Rates¶
If you're seeing many errors:
- Check validation strictness level
- Review file quality in your input
- Verify API key and quota status (a quick check is shown below)
- Monitor system resources
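For the API key check in particular, a direct request to the OpenAI models endpoint confirms whether the key is accepted (a 401 status indicates an invalid key):
# Print only the HTTP status code for an authenticated request
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models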
Quarantine Filling Up¶
If quarantine directory grows large:
- Review error patterns in reports
- Fix common document issues at source
- Implement regular cleanup schedule
- Consider adjusting validation settings
Processing Slowdowns¶
If processing becomes slow:
- Check for high retry rates due to rate limiting
- Monitor API response times and backoff delays
- Review system resource usage
- Consider adjusting rate limits for your use case
- Enable backoff monitoring to track delay patterns
- Consider batch processing optimization
Rate Limiting Issues¶
If experiencing frequent rate limit errors:
- Check Current Limits: Review the OPENAI_REQUESTS_PER_MINUTE setting
- Monitor Usage: Use rate limiter statistics to understand patterns
- Adjust Burst Capacity: Increase OPENAI_BURST_LIMIT for traffic spikes
- Optimize Timing: Process during off-peak hours if possible
- Consider Local Models: Switch to Ollama for unlimited local processing
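As an illustrative sketch of the local-model option (the variable names here are placeholders; consult the provider configuration documentation for the exact settings):
# Illustrative only: point processing at a local Ollama instance instead of OpenAI
LLM_PROVIDER=ollama                      # placeholder name, not a documented setting
OLLAMA_BASE_URL=http://localhost:11434   # Ollama's default local endpoint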
Technical Reference¶
For detailed technical configuration, implementation details, and advanced error handling setup, see the Error Handling Technical Reference.
This technical guide includes:
- Complete environment variable configurations
- Production deployment best practices
- Advanced monitoring and maintenance procedures
- Detailed cron job setups for automation
- Low-level implementation details