OSINT Enrichment Roadmap

Technical recommendations for enhancing the OSINT skill with automated enrichment capabilities.

Current State

The OSINT skill provides 13 comprehensive workflows covering person, domain, and company intelligence. This document outlines enrichment techniques to automate data collection and improve intelligence quality.

1. API Integration Layer

Priority 1: Core OSINT APIs

API	Purpose	Free Tier	Integration Priority
Shodan	Infrastructure/IoT scanning	100 queries/month	HIGH
SecurityTrails	DNS history, subdomains	Limited	HIGH
Hunter.io	Email enumeration	25 searches/month	HIGH
HaveIBeenPwned	Breach checking	Paid API key for account lookups (rate limited)	HIGH
VirusTotal	Domain/IP reputation	4 req/min	MEDIUM

Priority 2: Enhanced Intelligence

API	Purpose	Free Tier	Integration Priority
Censys	Certificate transparency	Limited	MEDIUM
BuiltWith	Technology stack detection	Limited	MEDIUM
Clearbit	Company enrichment	50 req/month	MEDIUM
FullContact	Person enrichment	Limited	LOW
Pipl	Identity resolution	Paid only	LOW

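Implementation Approach

// Proposed structure: src/skills/osint/Tools/EnrichmentEngine.ts
// Target and EnrichmentResult are shared types that this roadmap does not define yet.
interface EnrichmentProvider {
  name: string;
  apiKey: string | undefined;  // undefined when the key is missing from .env
  rateLimit: { requests: number; window: string };  // e.g. { requests: 4, window: "1m" }
  enrich(target: Target): Promise<EnrichmentResult>;
}

# Configuration in .env (separate file, never committed)
SHODAN_API_KEY="..."
SECURITYTRAILS_API_KEY="..."
HUNTER_API_KEY="..."
HIBP_API_KEY="..."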
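The `rateLimit` field above implies that each provider needs some form of request throttling. The following is a minimal sketch of one way to enforce it, assuming a sliding-window policy and window strings of the form `"30s"`, `"1m"`, `"24h"`, or `"7d"`. The names `SlidingWindowLimiter`, `parseWindowMs`, and `UNIT_MS` are hypothetical and not part of the skill.

```typescript
// Hypothetical helpers; only the rateLimit shape comes from EnrichmentProvider.
type RateLimit = { requests: number; window: string };

const UNIT_MS: { [unit: string]: number } = {
  s: 1000,
  m: 60000,
  h: 3600000,
  d: 86400000,
};

// Parse a window string such as "1m" or "24h" into milliseconds.
function parseWindowMs(windowSpec: string): number {
  const match = /^(\d+)([smhd])$/.exec(windowSpec.trim());
  if (!match) throw new Error("Unsupported window format: " + windowSpec);
  return Number(match[1]) * UNIT_MS[match[2]];
}

class SlidingWindowLimiter {
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private timestamps: number[] = [];

  constructor(limit: RateLimit) {
    this.maxRequests = limit.requests;
    this.windowMs = parseWindowMs(limit.window);
  }

  // Returns true and records the call if a request is allowed at nowMs.
  tryAcquire(nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Drop calls that have fallen out of the window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.maxRequests) return false;
    this.timestamps.push(nowMs);
    return true;
  }
}
```

A provider wrapper could call `tryAcquire(Date.now())` before each `enrich` call and queue or skip the request when it returns `false`. Passing the clock in as a parameter keeps the limiter easy to test.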

2. Automated Data Pipeline

Collection Layer

┌─────────────────────────────────────────────────────────┐
│                   COLLECTION SOURCES                     │
├──────────────┬──────────────┬──────────────┬────────────┤
│ Web Scraping │ API Queries  │ DNS/WHOIS    │ Feeds      │
│ (Browser)    │ (Shodan etc) │ (dig, whois) │ (RSS/STIX) │
└──────┬───────┴──────┬───────┴──────┬───────┴─────┬──────┘
       │              │              │             │
       └──────────────┴──────────────┴─────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│                   NORMALIZATION LAYER                    │
│  • Deduplicate records                                  │
│  • Standardize formats (dates, names, identifiers)      │
│  • Extract entities (NER for people, orgs, locations)   │
│  • Assign confidence scores                              │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│                   ENRICHMENT LAYER                       │
│  • Cross-reference with breach databases                │
│  • Resolve identities across platforms                  │
│  • Geolocate IPs and domains                            │
│  • Calculate risk scores                                 │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│                   STORAGE LAYER                          │
│  • Knowledge Graph (entities + relationships)           │
│  • File Reports (human-readable)                        │
│  • Temporal metadata (validity tracking)                 │
└─────────────────────────────────────────────────────────┘
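The normalization layer in the diagram can be illustrated with a small sketch that standardizes identifiers, merges duplicates, and assigns a confidence score. The record shapes (`RawRecord`, `NormalizedRecord`) and the scoring rule (0.3 per independent source, capped at 0.9) are assumptions made for illustration, not decisions taken by this roadmap.

```typescript
// Illustrative shapes; the roadmap does not define a record schema.
interface RawRecord {
  kind: "email" | "domain" | "ip";
  value: string;
  source: string;     // e.g. "shodan", "whois"
  observedAt: string; // any timestamp that Date can parse
}

interface NormalizedRecord {
  kind: RawRecord["kind"];
  value: string;
  sources: string[];
  firstSeen: string; // ISO 8601, UTC
  lastSeen: string;
  confidence: number;
}

function canonicalValue(record: RawRecord): string {
  const value = record.value.trim().toLowerCase();
  // Strip the trailing dot of a fully qualified domain name.
  return record.kind === "domain" ? value.replace(/\.$/, "") : value;
}

function normalizeRecords(records: RawRecord[]): NormalizedRecord[] {
  const byKey = new Map<string, NormalizedRecord>();
  for (const record of records) {
    const value = canonicalValue(record);
    // toISOString throws a RangeError on unparsable timestamps.
    const seen = new Date(record.observedAt).toISOString();
    const key = record.kind + ":" + value;
    const existing = byKey.get(key);
    if (existing === undefined) {
      byKey.set(key, {
        kind: record.kind,
        value: value,
        sources: [record.source],
        firstSeen: seen,
        lastSeen: seen,
        confidence: 0,
      });
      continue;
    }
    if (existing.sources.indexOf(record.source) === -1) existing.sources.push(record.source);
    // ISO 8601 UTC strings sort chronologically as plain strings.
    if (seen < existing.firstSeen) existing.firstSeen = seen;
    if (seen > existing.lastSeen) existing.lastSeen = seen;
  }
  const merged = Array.from(byKey.values());
  for (const record of merged) {
    // Assumed rule: each independent source adds 0.3, capped at 0.9.
    record.confidence = Math.min(0.9, 0.3 * record.sources.length);
  }
  return merged;
}
```

Keeping `firstSeen` and `lastSeen` on the merged record also gives the storage layer the temporal metadata it needs for validity tracking.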

3. Entity Resolution Engine

Cross-Platform Identity Linking

Technique: Build a unified "digital persona" from fragmented platform data.

resolution_signals:
  strong:
    - Same email across platforms (weight: 0.95)
    - Same phone number (weight: 0.90)
    - Linked accounts (OAuth connections) (weight: 0.85)

  medium:
    - Username consistency (weight: 0.60)
    - Profile photo similarity (weight: 0.55)
    - Bio text overlap (weight: 0.50)
    - Posting time patterns (weight: 0.45)

  weak:
    - Similar display names (weight: 0.30)
    - Geographic overlap (weight: 0.25)
    - Interest/topic correlation (weight: 0.20)
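The weights above do not say how several signals combine into one match confidence. One option, shown here as a sketch, is a noisy-OR: treat each weight as the probability that the signal alone indicates a match and assume the signals are independent. That independence assumption is a simplification (username and display name often correlate), and both the combination rule and the key names in `SIGNAL_WEIGHTS` are illustrative choices, not something this roadmap specifies.

```typescript
// Weights copied from resolution_signals; the key names are hypothetical.
const SIGNAL_WEIGHTS = {
  same_email: 0.95,
  same_phone: 0.90,
  linked_accounts: 0.85,
  username_consistency: 0.60,
  profile_photo_similarity: 0.55,
  bio_text_overlap: 0.50,
  posting_time_patterns: 0.45,
  similar_display_names: 0.30,
  geographic_overlap: 0.25,
  topic_correlation: 0.20,
} as const;

type SignalName = keyof typeof SIGNAL_WEIGHTS;

// Noisy-OR: confidence = 1 - product of (1 - weight) over distinct signals.
function combineSignals(observed: SignalName[]): number {
  // Count each signal once, even if it was reported several times.
  const distinct = observed.filter((name, i) => observed.indexOf(name) === i);
  let missProbability = 1;
  for (const name of distinct) {
    missProbability *= 1 - SIGNAL_WEIGHTS[name];
  }
  return 1 - missProbability;
}
```

Under this rule a matching username plus a similar profile photo gives 1 - (0.40 × 0.45) = 0.82, which is close to a single strong signal, while weak signals alone never add up to certainty.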

Graph-Based Resolution

Leverage the knowledge skill's graph capabilities:

// Entity resolution query pattern
// Entity and ResolutionSignal are not defined in this document.
interface IdentityCluster {
  primary_entity: Entity;
  linked_entities: Entity[];
  confidence: number;                   // 0 to 1
  resolution_path: ResolutionSignal[];  // signals that justified the links
}

// Store resolution results
add_memory({
  name: "Identity Resolution",
  episode_body: JSON.stringify({
    cluster_id: "...",
    entities: [/* ... */],
    confidence: 0.85,
    signals: [/* ... */]
  }),
  source: "json",
  group_id: "osint_identities"
});
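Before a cluster can be stored, pairwise matches have to be grouped into clusters. A minimal sketch of that grouping step follows, assuming pairwise confidences are already computed and that any link at or above a threshold merges two entities. It uses a union-find structure; `PairwiseLink` and `clusterIdentities` are hypothetical names, and entity IDs are assumed to be unique.

```typescript
interface PairwiseLink {
  a: string;          // entity ID
  b: string;          // entity ID
  confidence: number; // 0 to 1
}

// Group entity IDs into clusters connected by links at or above the threshold.
function clusterIdentities(
  entityIds: string[],
  links: PairwiseLink[],
  threshold: number
): string[][] {
  const parent = new Map<string, string>();
  for (const id of entityIds) parent.set(id, id);

  // Find the cluster representative, compressing the path on the way.
  function find(id: string): string {
    let root = id;
    while (parent.get(root) !== root) root = parent.get(root)!;
    let current = id;
    while (parent.get(current) !== root) {
      const next = parent.get(current)!;
      parent.set(current, root);
      current = next;
    }
    return root;
  }

  for (const link of links) {
    if (link.confidence < threshold) continue;
    // Ignore links that mention unknown entities.
    if (!parent.has(link.a) || !parent.has(link.b)) continue;
    const rootA = find(link.a);
    const rootB = find(link.b);
    if (rootA !== rootB) parent.set(rootB, rootA);
  }

  const groups = new Map<string, string[]>();
  for (const id of entityIds) {
    const root = find(id);
    const members = groups.get(root);
    if (members === undefined) groups.set(root, [id]);
    else members.push(id);
  }
  return Array.from(groups.values());
}
```

Because merging is transitive, one wrong high-confidence link can fuse two unrelated personas, so a conservative threshold and a recorded `resolution_path` are worth keeping for review.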

4. Missing Workflow Recommendations

Based on the skill review, these workflows would enhance coverage:

4.1 Email OSINT Workflow

File: Workflows/EmailRecon.md

## Email Reconnaissance Workflow

### Collection Steps
1. **Email Validation** - Verify deliverability
2. **Breach Check** - HaveIBeenPwned lookup
3. **Email-to-Social** - Find linked accounts (Gravatar, GitHub, etc.)
4. **Domain Analysis** - MX records, organization identification
5. **Header Analysis** - For provided email samples

### Enrichment APIs
- Hunter.io (email finder/verifier)
- HaveIBeenPwned (breach data)
- Gravatar (avatar hash lookup)
- GitHub API (email-to-user resolution)
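The header analysis step of this workflow can be sketched as a parser for `Received:` lines, which record each relay a message passed through. This is a heuristic sketch, not a full RFC 5322 parser: `parseReceivedHeaders` and `ReceivedHop` are hypothetical names, the regular expressions only cover the common `from ... by ...` layout, and `Received` headers below the first trusted relay can be forged.

```typescript
interface ReceivedHop {
  from: string | null; // host name the relay claims to have received from
  by: string | null;   // host name of the receiving relay
  ip: string | null;   // bracketed address literal, if present
}

function parseReceivedHeaders(rawHeaders: string): ReceivedHop[] {
  // Unfold continuation lines: a line starting with whitespace continues the previous header.
  const unfolded = rawHeaders.replace(/\r?\n[ \t]+/g, " ");
  const hops: ReceivedHop[] = [];
  for (const line of unfolded.split(/\r?\n/)) {
    if (!/^received:/i.test(line)) continue;
    const from = /\bfrom\s+([^\s;()]+)/i.exec(line);
    const by = /\bby\s+([^\s;()]+)/i.exec(line);
    const ip = /\[(?:IPv6:)?([0-9a-fA-F:.]+)\]/.exec(line);
    hops.push({
      from: from ? from[1] : null,
      by: by ? by[1] : null,
      ip: ip ? ip[1] : null,
    });
  }
  // Each relay prepends its header, so reverse to get origin-first order.
  return hops.reverse();
}
```

The first hop in the returned list is the one closest to the sender, which is usually the most interesting address for follow-up reputation or geolocation checks.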

4.2 Phone Number OSINT Workflow

File: Workflows/PhoneRecon.md

## Phone Number Reconnaissance Workflow

### Collection Steps
1. **Carrier Lookup** - Identify carrier and line type
2. **VoIP Detection** - Distinguish landline, mobile, and VoIP lines
3. **Caller ID** - Name association lookups
4. **Reverse Lookup** - Associated addresses
5. **Social Correlation** - Platform phone verification

### Enrichment Sources
- NumVerify API
- Twilio Lookup
- Open carrier databases
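Lookup services generally expect numbers in E.164 form (a `+` followed by at most 15 digits), so the workflow needs a normalization step before any of the lookups above. The sketch below is deliberately simplified: `toE164` is a hypothetical helper, it assumes `00` is the international dialing prefix, it strips leading trunk zeros (which is wrong for some countries, such as Italy), and the 7-digit minimum is a heuristic. A production workflow would more likely rely on a dedicated phone-number parsing library.

```typescript
// Normalize a user-supplied number to E.164, or return null if it cannot be valid.
// defaultCountryCode is the calling code without "+", e.g. "1" or "44".
function toE164(input: string, defaultCountryCode: string): string | null {
  const trimmed = input.trim();
  const digits = trimmed.replace(/\D/g, "");
  let candidate: string;
  if (trimmed.charAt(0) === "+") {
    // Already in international form.
    candidate = digits;
  } else if (trimmed.indexOf("00") === 0) {
    // Assumed international dialing prefix.
    candidate = digits.slice(2);
  } else {
    // National form: drop trunk zeros and prepend the default calling code.
    candidate = defaultCountryCode + digits.replace(/^0+/, "");
  }
  // E.164: at most 15 digits, first digit non-zero. The minimum length is a heuristic.
  return /^[1-9]\d{6,14}$/.test(candidate) ? "+" + candidate : null;
}
```

Normalizing once at the start of the workflow also makes the number usable as a deduplication key and as a strong signal for entity resolution.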

4.3 Image/Photo OSINT Workflow

File: Workflows/ImageRecon.md

## Image OSINT Workflow

### Collection Steps
1. **Reverse Image Search** - Google, TinEye, Yandex
2. **EXIF Extraction** - Metadata, GPS, camera info
3. **Facial Recognition** - PimEyes, similar services
4. **Manipulation Detection** - Deepfake/edit detection
5. **Context Analysis** - Background location identification

### Tools
- ExifTool
- TinEye API
- Google Vision API
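For the EXIF extraction step, GPS coordinates are stored as degrees, minutes, and seconds plus a hemisphere reference, so they need converting before they can be mapped. The sketch below assumes the metadata tool has already decoded the EXIF rationals into plain numbers. `ExifGps`, `dmsToDecimal`, and `exifGpsToLatLon` are hypothetical names, while the four `GPS*` field names follow the standard EXIF tag names.

```typescript
type Dms = [number, number, number]; // degrees, minutes, seconds
type GpsRef = "N" | "S" | "E" | "W";

// Convert degrees/minutes/seconds to signed decimal degrees.
function dmsToDecimal(dms: Dms, ref: GpsRef): number {
  const magnitude = dms[0] + dms[1] / 60 + dms[2] / 3600;
  // South and west are negative by convention.
  return ref === "S" || ref === "W" ? -magnitude : magnitude;
}

interface ExifGps {
  GPSLatitude: Dms;
  GPSLatitudeRef: "N" | "S";
  GPSLongitude: Dms;
  GPSLongitudeRef: "E" | "W";
}

// Returns null when the decoded values fall outside valid coordinate ranges.
function exifGpsToLatLon(gps: ExifGps): { lat: number; lon: number } | null {
  const lat = dmsToDecimal(gps.GPSLatitude, gps.GPSLatitudeRef);
  const lon = dmsToDecimal(gps.GPSLongitude, gps.GPSLongitudeRef);
  if (Math.abs(lat) > 90 || Math.abs(lon) > 180) return null;
  return { lat: lat, lon: lon };
}
```

For example, 40° 26' 46" N works out to 40 + 26/60 + 46/3600, roughly 40.4461 decimal degrees.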

5. Real-Time Monitoring

Alert-Based Collection

monitoring_capabilities:
  username_monitoring:
    - New account creation alerts
    - Username availability changes
    - Paste site mentions (Pastebin, etc.)

  domain_monitoring:
    - DNS record changes
    - SSL certificate updates
    - New subdomain discovery
    - WHOIS changes

  company_monitoring:
    - SEC filing alerts
    - News/press mentions
    - Job posting changes (growth signals)
    - Domain portfolio changes
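Most of the domain monitoring items above reduce to comparing two snapshots and alerting on the difference. A minimal sketch for DNS record changes follows. The snapshot shape (record type mapped to a list of values) and the names `DnsSnapshot`, `RecordChange`, and `diffSnapshots` are assumptions, and collecting the snapshots themselves is out of scope here.

```typescript
// Record type (A, MX, TXT, ...) mapped to the values seen for one domain.
type DnsSnapshot = { [recordType: string]: string[] };

interface RecordChange {
  recordType: string;
  added: string[];
  removed: string[];
}

// Compare two snapshots of the same domain and report per-type changes.
function diffSnapshots(previous: DnsSnapshot, current: DnsSnapshot): RecordChange[] {
  const recordTypes = Object.keys(previous)
    .concat(Object.keys(current))
    .filter((type, i, all) => all.indexOf(type) === i)
    .sort();
  const changes: RecordChange[] = [];
  for (const recordType of recordTypes) {
    const before = previous[recordType] || [];
    const after = current[recordType] || [];
    const added = after.filter((value) => before.indexOf(value) === -1);
    const removed = before.filter((value) => after.indexOf(value) === -1);
    if (added.length > 0 || removed.length > 0) {
      changes.push({ recordType: recordType, added: added, removed: removed });
    }
  }
  return changes;
}
```

The same diff works for subdomain lists or WHOIS fields if they are stored in the same key-to-values shape. An empty result means no alert is needed.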

Implementation

// Proposed: src/skills/osint/Tools/Monitor.ts
// AlertCondition is not defined in this document.
interface MonitoringRule {
  target_type: "username" | "domain" | "company" | "person";
  target_value: string;
  check_interval: string;  // "1h", "24h", etc.
  alert_conditions: AlertCondition[];
}

// Store rules in knowledge graph (rule is a MonitoringRule)
add_memory({
  name: "OSINT Monitor Rule",
  episode_body: JSON.stringify(rule),
  source: "json",
  group_id: "osint_monitors"
});

6. Threat Intelligence Integration

STIX/TAXII Support

For CTI use cases, integrate standard threat intel formats:

threat_intel_enrichment:
  feeds:
    - AlienVault OTX (free)
    - Abuse.ch (free)
    - CIRCL (free)

  enrichment_types:
    - IP reputation
    - Domain reputation
    - File hash lookup
    - MITRE ATT&CK mapping
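To hand discoveries to CTI tooling in a standard format, each indicator can be wrapped as a STIX 2.1 Indicator object. The sketch below covers only the required Indicator properties and three indicator kinds. `toStixIndicator`, `IocKind`, and `STIX_PATHS` are hypothetical names, and the caller is assumed to supply a valid `indicator--<UUID>` identifier.

```typescript
type IocKind = "ipv4" | "domain" | "sha256";

// STIX patterning object paths for each supported indicator kind.
const STIX_PATHS: { [K in IocKind]: string } = {
  ipv4: "ipv4-addr:value",
  domain: "domain-name:value",
  sha256: "file:hashes.'SHA-256'",
};

interface StixIndicator {
  type: "indicator";
  spec_version: "2.1";
  id: string;          // "indicator--<UUID>", supplied by the caller
  created: string;
  modified: string;
  pattern: string;
  pattern_type: "stix";
  valid_from: string;
}

// Escape backslashes and single quotes inside a STIX pattern string literal.
function escapeStixString(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
}

function toStixIndicator(kind: IocKind, value: string, id: string, observedAt: Date): StixIndicator {
  const timestamp = observedAt.toISOString(); // UTC timestamp ending in "Z"
  return {
    type: "indicator",
    spec_version: "2.1",
    id: id,
    created: timestamp,
    modified: timestamp,
    pattern: "[" + STIX_PATHS[kind] + " = '" + escapeStixString(value) + "']",
    pattern_type: "stix",
    valid_from: timestamp,
  };
}
```

Optional properties such as `name`, `labels`, or `indicator_types` can be added once the CTI skill's expectations are known.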

Integration with CTI Skill

The OSINT skill can feed indicators to the CTI skill for analysis:

OSINT discovers → IOC → CTI analyzes → MITRE mapping → Risk score

7. Implementation Priorities

Phase 1: Core API Integration (Recommended First)

- Create EnrichmentEngine.ts abstraction layer
- Integrate Shodan for InfraMapping workflow
- Integrate Hunter.io for email enumeration
- Integrate HaveIBeenPwned for breach checking

Phase 2: Missing Workflows

- EmailRecon.md workflow
- PhoneRecon.md workflow
- ImageRecon.md workflow

Phase 3: Advanced Capabilities

- Entity resolution engine
- Real-time monitoring framework
- STIX/TAXII integration

Phase 4: AI Enhancement

- NLP-based entity extraction
- Automated confidence scoring
- Pattern detection across investigations
- Anomaly detection in timelines

8. Security Considerations

API Key Management

- Store all API keys in .env (never commit)
- Implement key rotation reminders
- Rate limit tracking to avoid bans
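A startup check can report which providers are usable without ever printing a key. The sketch below takes the environment as a plain object so it can be tested without touching real secrets. `auditApiKeys`, `maskKey`, and `OSINT_KEY_NAMES` are hypothetical names, and the key list mirrors the `.env` entries proposed earlier in this document.

```typescript
type EnvMap = { [name: string]: string | undefined };

const OSINT_KEY_NAMES = [
  "SHODAN_API_KEY",
  "SECURITYTRAILS_API_KEY",
  "HUNTER_API_KEY",
  "HIBP_API_KEY",
];

// Split the expected keys into configured and missing ones.
function auditApiKeys(env: EnvMap): { configured: string[]; missing: string[] } {
  const configured: string[] = [];
  const missing: string[] = [];
  for (const name of OSINT_KEY_NAMES) {
    const value = (env[name] || "").trim();
    // Treat the "..." placeholder from a template .env as not configured.
    if (value === "" || value === "...") missing.push(name);
    else configured.push(name);
  }
  return { configured: configured, missing: missing };
}

// Show only the last four characters of a key in logs.
function maskKey(key: string): string {
  return key.length <= 4 ? "****" : "****" + key.slice(-4);
}
```

In a Node.js process the audit could be run as `auditApiKeys(process.env)`, with providers in `missing` disabled instead of failing at request time.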

OPSEC

- Proxy support for sensitive queries
- User-agent rotation for web scraping
- Request timing randomization
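Request timing randomization and user-agent rotation can both be expressed as small pure functions with an injectable random source. This is a sketch under simple assumptions: uniform jitter around a base delay and uniform selection from a caller-supplied pool. `jitteredDelayMs` and `pickUserAgent` are hypothetical names.

```typescript
// Delay drawn uniformly from [base * (1 - ratio), base * (1 + ratio)].
function jitteredDelayMs(
  baseMs: number,
  jitterRatio: number,
  random: () => number = Math.random
): number {
  const spread = baseMs * jitterRatio;
  return Math.round(baseMs - spread + random() * 2 * spread);
}

// Pick one user agent from a non-empty pool supplied by the caller.
function pickUserAgent(pool: string[], random: () => number = Math.random): string {
  if (pool.length === 0) throw new Error("User-agent pool is empty");
  // Clamp in case the random source returns exactly 1.
  const index = Math.min(pool.length - 1, Math.floor(random() * pool.length));
  return pool[index];
}
```

With `jitteredDelayMs(1000, 0.2)`, consecutive requests are spaced between 800 ms and 1200 ms apart. Passing a fixed `random` function makes the behavior reproducible in tests.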

Data Protection

- Encrypt stored intelligence at rest
- Implement retention policies
- Access logging for audit trails

References

- theHarvester - Multi-source OSINT tool
- SpiderFoot - Automated OSINT platform
- Recon-ng - Web reconnaissance framework
- Maltego - Graph-based OSINT visualization
- OSINT Framework - Collection of OSINT tools

Last updated: 2026-01-11 | Version: 1.0.0