Installation Guide
Complete installation instructions for the Workflow Bank Statement Separator.
System Requirements
Minimum Requirements
- Python: 3.11 or higher
- Memory: 4GB RAM
- Storage: 1GB free space
- Network: Internet access for AI API calls
Recommended Requirements
- Python: 3.12+
- Memory: 8GB+ RAM (for large documents)
- Storage: 5GB+ free space (for quarantine and logs)
- CPU: Multi-core processor for faster processing
Operating Systems
- Linux: Ubuntu 20.04+, CentOS 8+, any modern distribution
- macOS: macOS 11 (Big Sur) or later
- Windows: Windows 10+ with WSL2 recommended
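The minimum Python version and free storage listed above can be checked before installing. The following is a minimal sketch using only the standard library. The function names are illustrative and not part of the project, and memory is not checked because the standard library offers no portable way to query it.

```python
import shutil
import sys
from pathlib import Path

MIN_PYTHON = (3, 11)   # minimum Python version from the requirements list
MIN_FREE_GB = 1.0      # minimum free storage from the requirements list


def check_python(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= tuple(minimum)


def free_space_gb(path="."):
    """Return free disk space at `path` in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(Path(path)).free / 1e9


if __name__ == "__main__":
    py = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {py}:", "ok" if check_python() else "too old")
    free = free_space_gb()
    print(f"Free space {free:.1f} GB:", "ok" if free >= MIN_FREE_GB else "low")
```

Run it with the interpreter you plan to use for the project, from the directory where the project will live, so that the disk check covers the right volume.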
Installation Methods
UV is the recommended way to install and manage dependencies. Installing with pip in a virtual environment is also supported.
Install UV
# Linux/macOS
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# Verify installation
uv --version
Install Project
With UV, clone the repository and run uv sync from the project root. If you prefer pip, follow these steps:
# Clone repository
git clone <repository-url>
cd bank-statement-separator
# Create virtual environment
python -m venv .venv
# Activate virtual environment
# Linux/macOS:
source .venv/bin/activate
# Windows:
.venv\Scripts\activate
# Upgrade pip
pip install --upgrade pip
# Install project
pip install -e .
# Install development dependencies (optional)
pip install -e ".[dev]"
Verification
After installation, verify that everything works:
1. Test Imports
# Using UV
uv run python -c "import src.bank_statement_separator; print('✅ Import successful')"
# Using pip/venv
python -c "import src.bank_statement_separator; print('✅ Import successful')"
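If the one-line import test fails, it helps to know whether Python can locate the package at all. The sketch below is one way to check this with the standard library. The helper name can_import is hypothetical, and the module path is taken from the commands above.

```python
import importlib.util


def can_import(module_name):
    """Return True if Python can locate `module_name`.

    For dotted names, find_spec imports the parent packages first,
    so a missing parent (here `src`) is reported as not importable.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    name = "src.bank_statement_separator"
    if can_import(name):
        print(f"{name}: found")
    else:
        print(f"{name}: not found (run from the repository root with the venv active)")
```

A "not found" result usually points to a wrong working directory or an inactive virtual environment rather than a broken installation.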
2. Check CLI
# Using UV
uv run python -m src.bank_statement_separator.main --help
# Using pip/venv
python -m src.bank_statement_separator.main --help
Expected output:
Usage: main.py [OPTIONS] COMMAND [ARGS]...
  Workflow Bank Statement Separator - AI-powered document processing
Commands:
  process            Process a PDF file containing multiple bank statements
  quarantine-clean   Clean old files from quarantine directory
  quarantine-status  Show quarantine directory status
3. Run Test Suite
Run the test suite with pytest, for example uv run pytest, and confirm that all tests pass.
Configuration Setup
1. Environment Variables
2. Required Variables
Set these core variables in your .env file. The OpenAI API key is recommended but optional:
# AI Processing (recommended but optional)
OPENAI_API_KEY=sk-your-api-key-here
# Core Configuration
LLM_MODEL=gpt-4o-mini
DEFAULT_OUTPUT_DIR=./separated_statements
LOG_LEVEL=INFO
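A quick way to confirm that the .env file contains the variables shown above is to parse it and list anything missing. This is a minimal sketch with a hand-written parser for simple KEY=VALUE lines. It makes no claim about how the project itself loads configuration, and treating all four keys as expected is an assumption.

```python
from pathlib import Path

# Keys copied from the example above; treating all of them as expected is an assumption.
EXPECTED_KEYS = ("OPENAI_API_KEY", "LLM_MODEL", "DEFAULT_OUTPUT_DIR", "LOG_LEVEL")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent or empty."""
    return [key for key in expected if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        print("Missing:", missing_keys(parse_env(env_file.read_text())) or "none")
    else:
        print(".env not found in the current directory")
```

Because the API key is optional, a missing OPENAI_API_KEY is a warning rather than an error.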
3. Directory Structure
The system creates these directories automatically:
bank-statement-separator/
├── logs/ # Processing logs
├── separated_statements/ # Default output directory
├── quarantine/ # Failed documents
│ └── reports/ # Error reports
├── test/
│ ├── input/ # Test input files
│ └── output/ # Test outputs
└── .env # Your configuration
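Although the directories are created automatically, pre-creating them can be useful, for example to set permissions before the first run. The sketch below mirrors the tree above with pathlib. The helper name ensure_dirs is hypothetical and not part of the project.

```python
from pathlib import Path

# Directory names copied from the tree above.
SUBDIRS = (
    "logs",
    "separated_statements",
    "quarantine/reports",
    "test/input",
    "test/output",
)


def ensure_dirs(root):
    """Create the expected directories under `root`.

    Existing directories are left untouched. Returns the list of
    subdirectories that were actually created.
    """
    root = Path(root)
    created = []
    for sub in SUBDIRS:
        path = root / sub
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)
            created.append(sub)
    return created
```

Calling ensure_dirs(".") from the project root returns an empty list once every directory is in place, so it is safe to run repeatedly.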
Optional Integrations
Paperless-ngx Integration
To enable automatic document management with Paperless-ngx, add these settings:
# Add to .env file
PAPERLESS_ENABLED=true
PAPERLESS_URL=http://your-paperless-instance:8000
PAPERLESS_TOKEN=your-api-token
PAPERLESS_TAGS=bank-statement,automated
PAPERLESS_TAG_WAIT_TIME=5
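Before enabling the integration, it can help to confirm that the URL and token are accepted by the Paperless-ngx instance. The sketch below builds a token-authenticated request against the documents endpoint of the Paperless-ngx REST API using only the standard library. The helper names are hypothetical, and splitting PAPERLESS_TAGS on commas is an assumption based on the example value above.

```python
import urllib.request


def build_paperless_request(base_url, token):
    """Build an authenticated GET request for the documents endpoint."""
    url = base_url.rstrip("/") + "/api/documents/"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {token}", "Accept": "application/json"},
    )


def parse_tags(raw):
    """Split a comma-separated tag list into clean tag names (assumed format)."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def check_connection(base_url, token, timeout=10):
    """Return the HTTP status code; raises URLError or HTTPError on failure."""
    request = build_paperless_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A status of 200 from check_connection indicates that the instance is reachable and the token is valid. An HTTPError with code 401 or 403 usually means the token is wrong or lacks permission.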
Development Tools
For development work, install additional tools:
# Using UV
uv sync --group dev
# Using pip
pip install -e ".[dev]"
# Verify development tools
uv run black --version
uv run ruff --version
uv run pytest --version
Troubleshooting Installation
Common Issues
Performance Optimization
For better performance, especially with large documents:
# Install optional performance packages
uv add numpy pandas # For faster data processing
uv add pillow # For better image handling
# Set environment variables for performance
echo "OMP_NUM_THREADS=4" >> .env
echo "MAX_FILE_SIZE_MB=500" >> .env
Production Deployment
For production environments:
1. Security Configuration
# Set secure directories
echo "ALLOWED_INPUT_DIRS=/secure/input" >> .env
echo "ALLOWED_OUTPUT_DIRS=/secure/output" >> .env
echo "QUARANTINE_DIRECTORY=/secure/quarantine" >> .env
# Enable comprehensive logging
echo "ENABLE_AUDIT_LOGGING=true" >> .env
echo "LOG_LEVEL=INFO" >> .env
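To illustrate what a directory allow-list such as ALLOWED_INPUT_DIRS is meant to guarantee, the sketch below checks whether a path resolves to a location inside an allowed directory. This is not the project's implementation. The helper names are hypothetical, and the comma-separated format for multiple directories is an assumption.

```python
from pathlib import Path


def parse_allowed_dirs(raw):
    """Split a comma-separated directory list (assumed format) into resolved paths."""
    return [Path(part.strip()).resolve() for part in raw.split(",") if part.strip()]


def is_allowed(candidate, allowed_dirs):
    """Return True if `candidate` resolves to a location inside an allowed directory.

    Resolving first collapses ".." segments and symlinks, so traversal
    attempts such as /secure/input/../other are rejected.
    """
    resolved = Path(candidate).resolve()
    return any(resolved.is_relative_to(base) for base in allowed_dirs)
```

For example, with the allow-list "/secure/input", the path /secure/input/statements.pdf is accepted, while /secure/input/../other/file.pdf and /secure/input-evil/file.pdf are both rejected, because the comparison works on whole path components rather than string prefixes.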
2. System Service (Linux)
Create a systemd service for automated processing:
# /etc/systemd/system/bank-separator.service
[Unit]
Description=Bank Statement Separator
After=network.target
[Service]
Type=simple
User=app
WorkingDirectory=/opt/bank-statement-separator
Environment="PATH=/opt/bank-statement-separator/.venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/opt/bank-statement-separator/.venv/bin/python -m src.bank_statement_separator.main process /input/statements.pdf
Restart=on-failure
[Install]
WantedBy=multi-user.target
3. Log Rotation
# /etc/logrotate.d/bank-separator
/opt/bank-statement-separator/logs/*.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
sharedscripts
}
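On systems without logrotate, such as macOS or Windows, a similar retention policy can be approximated inside a Python process. The sketch below uses the standard library's TimedRotatingFileHandler as a substitute technique. It is not how the project configures logging, the file name separator.log is hypothetical, and unlike the logrotate rule above it does not compress old files.

```python
import logging
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path


def make_rotating_handler(log_dir, filename="separator.log"):
    """Rotate at midnight and keep 30 old files, mirroring `daily` and `rotate 30`."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    handler = TimedRotatingFileHandler(
        log_dir / filename,
        when="midnight",
        backupCount=30,
        encoding="utf-8",
        delay=True,  # do not open the file until the first record is written
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    return handler
```

Attach the handler with logging.getLogger().addHandler(make_rotating_handler("logs")). Use either this handler or logrotate for a given file, not both, since two rotation mechanisms acting on the same log can interfere with each other.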
Next Steps
After successful installation:
- Configure the system: review the Configuration Guide
- Test with sample data: follow the Quick Start Guide
- Learn the CLI: explore the CLI Commands reference
- Set up integrations: configure the Paperless Integration
Support
Need help with installation?
- Check the Troubleshooting Guide
- Review the Working Notes
- Report installation issues on GitHub