Release Notes v0.3.0¶
Release Date: September 10, 2025 Version: 0.3.0
Overview¶
This major release introduces a comprehensive Error Detection and Tagging System for Paperless integration, providing automatic identification and tagging of documents that encounter processing issues during workflow execution. This enhancement significantly improves workflow reliability by enabling automated error tracking and document flagging for manual review.
๐ New Features¶
Error Detection and Tagging System¶
- Automatic Error Detection: Identifies 6 categories of processing errors during workflow execution:
- LLM Analysis Failures (API errors, model failures, fallback usage)
- Boundary Detection Issues (low confidence boundaries, suspicious patterns)
- PDF Processing Errors (file corruption, access issues, format problems)
- Metadata Extraction Failures (missing account data, date parsing issues)
- Validation Failures (content validation, integrity checks)
-
File Output Issues (write failures, permissions, disk space)
-
Configurable Error Tagging: Apply customizable error tags to documents with processing issues
- Environment-based configuration with severity filtering
- Support for both individual and batch tagging modes
- Configurable error severity thresholds and tag customization
-
Automatic rollback capabilities on tagging failures
-
Enhanced Workflow Integration: Seamlessly integrated into existing 8-node workflow
- Error detection occurs after successful Paperless uploads
- Graceful degradation with comprehensive audit logging
- Non-blocking operation - workflow continues even if tagging fails
Configuration Options¶
New environment variables for error detection system:
PAPERLESS_ERROR_DETECTION_ENABLED: Enable/disable error detection systemPAPERLESS_ERROR_TAGS: Base error tags to apply to all error documentsPAPERLESS_ERROR_TAG_THRESHOLD: Confidence threshold for boundary error detectionPAPERLESS_ERROR_SEVERITY_LEVELS: Error severity levels that trigger taggingPAPERLESS_ERROR_BATCH_TAGGING: Use batch mode vs individual document taggingPAPERLESS_TAG_WAIT_TIME: Wait time between tagging operations
๐ง Improvements¶
Code Quality Enhancements¶
- Performance Optimization: Combined boundary detection loops for improved efficiency
- Maintainability: Replaced magic numbers with named constants
- Configuration Consistency: Dynamic severity level configuration instead of hardcoded values
- Error Handling: Replaced bare except clauses with specific exception handling
Documentation Updates¶
- Enhanced Workflow Architecture: Updated diagrams to include error detection and tagging flows
- Comprehensive User Guide: Complete documentation for error detection configuration and usage
- Developer Testing Guide: Manual testing scripts and validation procedures
- Environment Variable Reference: Detailed documentation of all error detection settings
๐ Technical Details¶
Error Detection Categories¶
| Category | Detection Criteria | Applied Tags | Severity |
|---|---|---|---|
| LLM Analysis Failure | API errors, model failures, fallback usage | error:llm, error:api-failure |
High |
| Boundary Detection Issues | Low confidence boundaries, suspicious patterns | error:confidence, error:boundary |
Medium |
| PDF Processing Errors | File corruption, access issues, format problems | error:pdf, error:processing |
High |
| Metadata Extraction Failure | Missing account data, date parsing issues | error:metadata, error:extraction |
Medium |
| Validation Failures | Content validation, integrity checks | error:validation |
Medium-High |
| File Output Issues | Write failures, permissions, disk space | error:output, error:file-system |
Critical |
Testing Infrastructure¶
- Unit Tests: Comprehensive test coverage for error detection and tagging functionality
- Integration Tests: End-to-end testing with mock and real Paperless instances
- Manual Test Scripts: Complete testing suite in
tests/manual/directory - Configuration Tests: Validation of environment variable loading and parsing
๐ Usage Examples¶
Basic Configuration¶
# Enable error detection with basic configuration
PAPERLESS_ERROR_DETECTION_ENABLED=true
PAPERLESS_ERROR_TAGS="error:processing,needs:review"
PAPERLESS_ERROR_SEVERITY_LEVELS="medium,high,critical"
Advanced Configuration¶
# Advanced error detection with batch tagging
PAPERLESS_ERROR_DETECTION_ENABLED=true
PAPERLESS_ERROR_TAGS="error:automated,quality:review-required,status:failed"
PAPERLESS_ERROR_TAG_THRESHOLD=0.7
PAPERLESS_ERROR_SEVERITY_LEVELS="high,critical"
PAPERLESS_ERROR_BATCH_TAGGING=true
PAPERLESS_TAG_WAIT_TIME=3
โ ๏ธ Important Notes¶
Backward Compatibility¶
- Full Compatibility: All existing functionality remains unchanged
- Optional Feature: Error detection is disabled by default - existing workflows unaffected
- Configuration Driven: Enable only the features you need via environment variables
Performance Impact¶
- Minimal Overhead: Error detection adds <1 second to workflow execution
- Efficient Processing: Optimized boundary detection with single-loop analysis
- Configurable Delays: Respect API rate limits with configurable wait times
๐งช Testing¶
Manual Testing¶
Run comprehensive error detection tests:
# Test error detection with mock client
uv run python tests/manual/test_error_tagging_e2e.py
# Test with real Paperless instance (creates real documents)
uv run python tests/manual/test_real_paperless_error_tagging.py
# Verify results in Paperless
uv run python tests/manual/verify_final_results.py
Unit Testing¶
# Run error detection unit tests
uv run pytest tests/unit/test_error_tagging.py -v
uv run pytest tests/unit/test_error_tagging_config.py -v
๐ Documentation¶
Updated Documentation¶
- Workflow Architecture: Enhanced with error detection diagrams
- Paperless Integration: Complete error detection configuration guide
- Environment Variables: Full reference for error detection settings
- Error Tagging Testing: Developer testing and validation guide
๐ฏ Future Enhancements¶
This release establishes the foundation for advanced error handling capabilities:
- Machine Learning Integration: Potential integration with ML models for error prediction
- Custom Error Rules: User-defined error detection patterns and criteria
- Advanced Analytics: Error pattern analysis and reporting dashboards
- Automated Recovery: Self-healing capabilities for common error scenarios
Contributors: Stephen Eaton GitHub: v0.3.0 Release