# Error Tagging Testing Guide

A comprehensive testing guide for the error detection and tagging system, which automatically tags Paperless-ngx documents that encounter processing errors.

## Overview

The error tagging system identifies 6 types of processing errors and automatically applies configurable tags to affected documents in Paperless-ngx for manual review. This guide covers testing strategies and manual verification procedures.

## Testing Components

### Core Components
- **ErrorDetector** (`src/bank_statement_separator/utils/error_detector.py`)
  - Detects processing errors from workflow state
  - Evaluates error severity levels
  - Applies threshold filtering
- **ErrorTagger** (`src/bank_statement_separator/utils/error_tagger.py`)
  - Applies error tags to Paperless documents
  - Supports batch and individual tagging modes
  - Handles API errors gracefully
- **Workflow Integration** (`src/bank_statement_separator/workflow.py`)
  - Integrates error detection into the processing pipeline
  - Triggers tagging during Paperless upload
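
To make the division of labor concrete, here is a minimal sketch of the detect-then-tag flow. It is illustrative only: the real `ErrorDetector` and `ErrorTagger` live at the paths above, and the names used here (`DetectedError`, `detect_errors`, `tag_documents`, `client.add_tag`) are assumptions rather than the project's actual API.

```python
from dataclasses import dataclass

@dataclass
class DetectedError:
    error_type: str  # e.g. "llm", "confidence", "pdf" (hypothetical labels)
    severity: str    # "low" | "medium" | "high" | "critical"
    message: str

def detect_errors(workflow_state: dict) -> list[DetectedError]:
    """Stand-in detector: flag any *_error step recorded in the state."""
    step = workflow_state.get("current_step", "")
    if not step.endswith("_error"):
        return []
    return [DetectedError(
        error_type=step.removesuffix("_error"),
        severity="high",
        message=workflow_state.get("error_message", ""),
    )]

def tag_documents(client, document_ids, errors):
    """Stand-in tagger: apply one tag per detected error to each document."""
    for doc_id in document_ids:
        for err in errors:
            client.add_tag(doc_id, f"error:{err.error_type}")
```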
## Unit Testing

### Running Unit Tests

```bash
# Run all error tagging tests
uv run pytest tests/unit/test_error_tagging.py -v

# Run configuration tests
uv run pytest tests/unit/test_error_tagging_config.py -v

# Run with coverage
uv run pytest tests/unit/test_error_tagging*.py \
  --cov=src/bank_statement_separator/utils/error_detector \
  --cov=src/bank_statement_separator/utils/error_tagger \
  --cov-report=html
```
### Test Coverage

The unit tests cover 28 test cases across three areas:

- **Error Detection** (10 tests)
  - Detection of all 6 error types
  - Threshold filtering
  - Severity level evaluation
  - Configuration validation
- **Automatic Tagging** (12 tests)
  - Document tagging with a mock Paperless client
  - Batch vs. individual tagging modes
  - Error handling and graceful degradation
  - Tag application verification
- **Workflow Integration** (6 tests)
  - End-to-end workflow with error detection
  - Upload results with error tagging
  - Configuration scenarios
### Key Test Scenarios

```python
# Example test cases from the test suite
def test_detect_llm_analysis_error():
    """Test LLM analysis error detection."""

def test_detect_low_confidence_boundaries():
    """Test boundary detection with low confidence."""

def test_apply_error_tags_success():
    """Test successful error tag application."""

def test_workflow_integration_with_errors():
    """Test full workflow with error detection enabled."""
```
## Manual Integration Testing

### Test Environment Setup

1. **Configure the test environment**

   ```bash
   # .env.testing
   PAPERLESS_ENABLED=true
   PAPERLESS_URL=https://your-paperless-instance.com
   PAPERLESS_TOKEN=your-test-api-token

   # Error detection configuration
   PAPERLESS_ERROR_DETECTION_ENABLED=true
   PAPERLESS_ERROR_TAGS=test:error-detection,test:automated-tagging
   PAPERLESS_ERROR_TAG_THRESHOLD=0.0
   PAPERLESS_ERROR_SEVERITY_LEVELS=low,medium,high,critical
   PAPERLESS_ERROR_BATCH_TAGGING=false
   ```

2. **Create the required tags in Paperless**
   - Go to Paperless Settings → Tags
   - Create test tags: `test:error-detection`, `test:automated-tagging`
   - Create error type tags: `error:llm`, `error:confidence`, `error:pdf`, etc.
   - Create severity tags: `error:severity:high`, `error:severity:critical`
### Manual Test Scripts

The repository includes manual test scripts in `tests/manual/`:

#### 1. Storage Path Verification

#### 2. Complete Integration Test

```bash
# Full end-to-end test with document creation
uv run python tests/manual/test_final_complete_integration.py
```

#### 3. Error Tagging with Existing Documents

```bash
# Test error tagging on existing documents
uv run python tests/manual/test_with_existing_documents.py
```

#### 4. Results Verification
### Expected Test Results

Successful integration tests should show:

```text
🎉 COMPLETE SUCCESS!
✅ All documents are in 'test' storage path with error tags applied!
✅ Error detection and tagging system is fully operational!

FINAL RESULTS SUMMARY:
• Total documents found: 2
• Successfully configured: 2
• Success rate: 100.0%
```
## Error Types Testing

### 1. LLM Analysis Failures

Test by simulating API failures or invalid responses:

```python
# Test scenario: LLM analysis timeout
workflow_state = {
    "current_step": "llm_analysis_error",
    "error_message": "OpenAI API timeout after 60 seconds",
    "llm_responses": [],
    "api_calls_made": 3,
    "total_api_failures": 3,
}
```
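
In a unit test, a state like this is handed to the detector and the resulting severity is asserted. The heuristic below is a stand-in, since `ErrorDetector`'s actual scoring rules are not documented here:

```python
def classify_llm_failure(state: dict) -> str | None:
    """Illustrative heuristic: escalate severity when every API call failed."""
    if state.get("current_step") != "llm_analysis_error":
        return None
    calls = state.get("api_calls_made", 0)
    failures = state.get("total_api_failures", 0)
    return "critical" if calls and failures >= calls else "high"

# With the state above (3 calls, 3 failures), this classifies as "critical".
```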
### 2. Low Confidence Boundaries

Test boundary detection with low confidence scores:

```python
# Test scenario: Poor boundary detection
workflow_state = {
    "current_step": "boundary_detection",
    "detected_boundaries": [
        {"confidence": 0.2, "start_page": 1, "end_page": 5},
        {"confidence": 0.3, "start_page": 6, "end_page": 10},
    ],
}
```
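
The interesting assertion here is which boundaries fall below the tagging threshold. A minimal filter, assuming a 0.5 cut-off for illustration:

```python
def low_confidence_boundaries(state: dict, threshold: float = 0.5) -> list[dict]:
    """Return boundaries whose confidence falls below the threshold."""
    return [
        b for b in state.get("detected_boundaries", [])
        if b.get("confidence", 1.0) < threshold
    ]

# Both boundaries above (0.2 and 0.3) fall below the 0.5 cut-off.
```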
### 3. PDF Processing Errors

Test PDF generation failures:

```python
# Test scenario: PDF generation failure
workflow_state = {
    "current_step": "pdf_generation_error",
    "error_message": "PDF generation failed: memory limit exceeded",
    "generated_files": [],
    "expected_files": ["statement1.pdf", "statement2.pdf"],
}
```
### 4. Metadata Extraction Issues

Test metadata extraction failures:

```python
# Test scenario: Metadata extraction failure
workflow_state = {
    "current_step": "metadata_extraction",
    "extracted_metadata": {},
    "metadata_extraction_errors": [
        "Failed to extract bank name",
        "No account number found",
    ],
}
```
### 5. File Output Problems

Test file system issues:

```python
# Test scenario: File output failure
workflow_state = {
    "current_step": "file_output_error",
    "generated_files": [],
    "expected_files": ["statement1.pdf", "statement2.pdf"],
    "file_system_errors": ["Disk space full", "Permission denied"],
}
```
### 6. Validation Failures

Test output validation issues:

```python
# Test scenario: Validation failure
workflow_state = {
    "current_step": "validation",
    "validation_results": {
        "is_valid": False,
        "checks": {
            "page_count": {"status": "failed", "expected": 10, "actual": 8},
            "content_sampling": {"status": "failed", "error": "No readable text"},
        },
    },
}
```
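
A test consuming this state might collect the names of the failed checks, which could then drive validation error tags. An illustrative helper:

```python
def failed_checks(state: dict) -> list[str]:
    """Collect the names of failed validation checks from the workflow state."""
    checks = state.get("validation_results", {}).get("checks", {})
    return [
        name for name, result in checks.items()
        if result.get("status") == "failed"
    ]

# With the state above: ["page_count", "content_sampling"]
```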
## Configuration Testing

### Threshold Testing

Test different threshold values:

```bash
# High threshold (0.8) - only critical errors trigger tagging
PAPERLESS_ERROR_TAG_THRESHOLD=0.8

# Medium threshold (0.5) - medium and above trigger tagging
PAPERLESS_ERROR_TAG_THRESHOLD=0.5

# Low threshold (0.0) - all errors trigger tagging
PAPERLESS_ERROR_TAG_THRESHOLD=0.0
```
### Severity Level Testing

Test different severity configurations:

```bash
# Only critical errors
PAPERLESS_ERROR_SEVERITY_LEVELS=critical

# High and critical errors
PAPERLESS_ERROR_SEVERITY_LEVELS=high,critical

# All error levels
PAPERLESS_ERROR_SEVERITY_LEVELS=low,medium,high,critical
```
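
The two settings interact: an error triggers tagging only if its severity is in the configured set and its numeric score clears the threshold. The score mapping below is an assumption chosen to match the threshold examples above, not the project's documented values:

```python
# Hypothetical severity-to-score mapping, consistent with the examples above
SEVERITY_SCORE = {"low": 0.25, "medium": 0.5, "high": 0.75, "critical": 1.0}

def should_tag(severity: str, threshold: float, allowed_levels: set[str]) -> bool:
    """Tag only when the severity is configured AND clears the threshold."""
    return severity in allowed_levels and SEVERITY_SCORE[severity] >= threshold

assert should_tag("critical", 0.8, {"high", "critical"})         # tagged
assert not should_tag("medium", 0.8, {"low", "medium", "high"})  # below 0.8
```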
## Performance Testing

### Batch vs Individual Tagging

Test the performance difference between the two modes:

```bash
# Individual tagging mode (default)
PAPERLESS_ERROR_BATCH_TAGGING=false

# Batch tagging mode (better for high volume)
PAPERLESS_ERROR_BATCH_TAGGING=true
```
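
The difference between the modes can be sketched as follows; the client methods (`add_tag`, `bulk_add_tags`) are hypothetical names used for illustration:

```python
def tag_individually(client, doc_ids, tags):
    """Individual mode: one API call per document per tag (simple, slower)."""
    for doc_id in doc_ids:
        for tag in tags:
            client.add_tag(doc_id, tag)

def tag_in_batch(client, doc_ids, tags):
    """Batch mode: a single bulk call covering all documents (fewer requests)."""
    client.bulk_add_tags(doc_ids, tags)
```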
### High Volume Testing

Test with multiple documents:

- Create 10+ test documents with processing errors
- Measure tagging performance (see the timing sketch below)
- Verify all documents are tagged correctly
- Check for API rate-limiting issues
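
A simple timing harness for the comparison might look like this, reusing either mode function from the sketch above:

```python
import time

def measure_tagging(tag_fn, client, doc_ids, tags) -> float:
    """Return wall-clock seconds for one tagging run."""
    start = time.perf_counter()
    tag_fn(client, doc_ids, tags)
    return time.perf_counter() - start
```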
## Troubleshooting Tests

### Permission Testing

Test API token permissions:

```bash
# Test tag creation permission
curl -X POST \
  -H "Authorization: Token $PAPERLESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "test-permission-check"}' \
  "$PAPERLESS_URL/api/tags/"
```
### Network Error Simulation
Test error handling for network issues:
- Temporarily block network access
- Run error tagging tests
- Verify graceful degradation
- Check error logging
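
Rather than blocking the network at the OS level, unit tests can simulate the outage with a mock that raises. The sketch below asserts the degradation behavior against an inline stand-in, since the real `ErrorTagger` error handling is not shown here:

```python
import logging
from unittest.mock import MagicMock

logger = logging.getLogger(__name__)

def tag_with_degradation(client, doc_id, tags) -> bool:
    """Graceful degradation: network errors are logged, never raised."""
    try:
        for tag in tags:
            client.add_tag(doc_id, tag)
        return True
    except ConnectionError as exc:
        logger.error("Failed to apply error tags to %s: %s", doc_id, exc)
        return False

def test_network_failure_is_swallowed():
    client = MagicMock()
    client.add_tag.side_effect = ConnectionError("network blocked")
    assert tag_with_degradation(client, 101, ["error:llm"]) is False
```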
### Tag Existence Testing
Test behavior when tags don't exist:
- Remove error tags from Paperless
- Run error detection tests
- Verify tag creation attempts
- Check fallback behavior
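
Robust fallback behavior usually amounts to get-or-create: look the tag up by name and create it when absent. A sketch against the Paperless-ngx REST API (the `name__iexact` filter on the tags endpoint is an assumption):

```python
import requests

def get_or_create_tag(base_url: str, token: str, name: str) -> int:
    """Return the id of an existing tag, creating it if it does not exist."""
    headers = {"Authorization": f"Token {token}"}
    resp = requests.get(
        f"{base_url}/api/tags/",
        headers=headers, params={"name__iexact": name}, timeout=10,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    if results:
        return results[0]["id"]
    created = requests.post(
        f"{base_url}/api/tags/", headers=headers, json={"name": name}, timeout=10,
    )
    created.raise_for_status()
    return created.json()["id"]
```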
## Continuous Integration Testing

### GitHub Actions Integration

The error tagging tests run in CI/CD:

```yaml
# Example CI test configuration
- name: Test Error Tagging
  run: |
    uv run pytest tests/unit/test_error_tagging.py --cov=src --cov-report=xml

- name: Upload Coverage
  uses: codecov/codecov-action@v3
  with:
    file: ./coverage.xml
```
### Automated Testing Strategy
- Unit Tests: Run on every commit
- Integration Tests: Run on PR to main
- Manual Tests: Run before release
- Performance Tests: Run weekly
## Monitoring and Observability

### Log Analysis

Monitor error tagging in the logs:

```bash
# Check error detection results
grep "Detected.*processing errors" logs/statement_processing.log

# Count tagging runs and failures to gauge the success rate
grep -c "Error Tagging Results" logs/statement_processing.log
grep -c "Failed to apply error tags" logs/statement_processing.log
```
### Metrics Collection

Key metrics to track (a log-parsing sketch follows this list):
- Error detection rate by type
- Tag application success rate
- Processing time impact
- API error rates
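
These counts can be pulled straight from the log using the message patterns shown in the Log Analysis section; a minimal parsing sketch:

```python
import re
from pathlib import Path

def tagging_metrics(log_path: str) -> dict:
    """Count detection and tagging-failure messages in the processing log."""
    text = Path(log_path).read_text()
    return {
        "errors_detected": len(re.findall(r"Detected.*processing errors", text)),
        "tagging_failures": len(re.findall(r"Failed to apply error tags", text)),
    }
```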
### Dashboard Queries

Example monitoring queries:

```bash
# Error detection counts by type
grep "error_type" logs/statement_processing.log | \
  sort | uniq -c | sort -nr

# Tagged-document counts by hour for today (assumes a log format with the
# time in the second field and a numeric count in the last field)
grep "tagged_documents" logs/statement_processing.log | \
  grep "$(date +%Y-%m-%d)" | \
  awk '{hour=substr($2,1,2); success[hour]+=$NF} END{for (h in success) print h ":00 " success[h]}'
```
## Best Practices

### Testing Best Practices
- Test with Real Data: Use actual bank statement PDFs when possible
- Test Error Scenarios: Don't just test success cases
- Verify Cleanup: Ensure test documents are properly tagged/removed
- Performance Testing: Test with realistic document volumes
- Security Testing: Verify proper permission handling
### Documentation Best Practices
- Document Test Cases: Keep test documentation up-to-date
- Error Scenarios: Document how to reproduce each error type
- Expected Outcomes: Clearly define success criteria
- Troubleshooting: Document common issues and solutions
### Development Best Practices
- Test-Driven Development: Write tests before implementing features
- Mock External Dependencies: Use mocks for Paperless API in unit tests
- Integration Testing: Test with real Paperless instances
- Error Handling: Test all error paths thoroughly
- Performance Monitoring: Track performance impact of error detection
This testing guide ensures comprehensive validation of the error detection and tagging system across all scenarios and environments.