# Ollama Model Compatibility Guide
This guide documents which Ollama models work with the Madeinoz Knowledge System for entity extraction and embeddings.
## Architecture Overview
The knowledge system uses two types of AI models:
| Component | Purpose | Requirements |
|---|---|---|
| LLM | Entity extraction, relationship detection | Must output valid JSON matching Pydantic schemas |
| Embedder | Vector embeddings for semantic search | Must support /v1/embeddings endpoint |
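As a quick sanity check, you can query an endpoint directly to confirm it speaks the OpenAI embeddings format (the hostname and model below are placeholders for your own setup):

```bash
# Expects a JSON response with a "data" array containing an "embedding" vector
curl http://your-ollama-server:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "mxbai-embed-large", "input": "knowledge graph test"}'
```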
## Recommended Configuration
For cost savings while maintaining reliability, we recommend a hybrid setup:
```bash
# LLM: gemini-2.0-flash (reliable JSON output for entity extraction)
MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai
MADEINOZ_KNOWLEDGE_MODEL_NAME=google/gemini-2.0-flash-001
MADEINOZ_KNOWLEDGE_OPENAI_BASE_URL=https://openrouter.ai/api/v1
MADEINOZ_KNOWLEDGE_OPENAI_API_KEY=your-openrouter-api-key

# Embedder: Ollama (free local embeddings)
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
This configuration:

- Uses `gemini-2.0-flash-001` via OpenRouter for the LLM (reliable and cost-effective); note that `LLM_PROVIDER=openai` because OpenRouter is OpenAI-compatible
- Uses Ollama for embeddings (free, runs locally)
- Reduces cloud costs while maintaining extraction quality
## Cost Comparison
| Configuration | LLM Cost | Embedding Cost | Total |
|---|---|---|---|
| Full OpenAI | ~$0.15/1M tokens | ~$0.02/1M tokens | $$$ |
| Hybrid (recommended) | ~$0.15/1M tokens | Free | $$ |
| Full Ollama | Free | Free | Free* |
*Full Ollama has reliability trade-offs for entity extraction.
## Model Test Results
Tested on: 2026-01-18
### Basic JSON Extraction Test
We tested 16 Ollama LLM models for basic JSON extraction capability. This test uses a simple entity extraction prompt to evaluate JSON output quality.
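The test harness itself isn't shown here; as a rough sketch of what it does, you can ask a model for JSON-mode output via Ollama's native chat endpoint and check whether the response parses (the model and prompt below are illustrative, not the actual test prompt):

```bash
# Request structured output and eyeball whether the returned JSON is valid
curl http://your-ollama-server:11434/api/chat -d '{
  "model": "mistral:instruct",
  "format": "json",
  "stream": false,
  "messages": [{
    "role": "user",
    "content": "Extract entities and relationships from: Alice works at Acme Corp in Sydney. Reply as JSON with keys entities and relationships."
  }]
}'
```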
#### Passed (15 models)
| Model | Entities | Relationships | Response Time |
|---|---|---|---|
| deepseek-r1:8b | 5 | 4 | 3,164ms |
| mistral:instruct | 4 | 3 | 3,210ms |
| tulu3:latest | 4 | 2 | 3,742ms |
| llama3.1:latest | 4 | 1 | 3,837ms |
| mistral:latest | 3 | 1 | 4,257ms |
| phi4:latest | 4 | 3 | 5,459ms |
| qwen3-coder:latest | 5 | 4 | 5,852ms |
| deepseek-r1:latest | 5 | 4 | 5,896ms |
| deepseek-coder-v2:latest | 4 | 2 | 6,243ms |
| gemma2:9b | 5 | 2 | 6,802ms |
| dolphin-mistral | 5 | 1 | 6,844ms |
| phi3:medium | 5 | 2 | 7,993ms |
| qwen3:latest | 5 | 3 | 9,581ms |
| codestral:latest | 4 | 3 | 10,108ms |
| qwen3:8b | 5 | 3 | 16,005ms |
#### Failed (1 model)
| Model | Reason |
|---|---|
| llama3.2:latest | Truncated JSON response |
### Important Caveats
**Basic test vs. real usage:** The test above uses a simplified entity extraction prompt. Graphiti uses more complex Pydantic schemas with specific field requirements, so models that pass the basic test may still fail against Graphiti's actual schemas.

**Observed issues in production:**
| Model | Basic Test | Graphiti Production | Issue |
|---|---|---|---|
| llama3.2:latest | ❌ | ❌ | Truncated responses |
| deepseek-r1:8b | ✅ | ❌ | ValidationError on NodeResolutions - outputs schema instead of data |
| deepseek-r1:latest | ✅ | ❌ | ValidationError on NodeResolutions |
| mistral | ✅ | ❌ | Malformed JSON on ExtractedEdges |
**Latest test (2026-01-18):** deepseek-r1:8b tested against actual Graphiti schemas:

```text
Error processing queued episode: 1 validation error for NodeResolutions
entity_resolutions
  Field required [type=missing, input_value={'$defs': {'NodeDuplicate...
```
The model outputs JSON schema definitions instead of data conforming to the schema.
**Recommendation:** For production LLM use, prefer hosted models (e.g., gpt-4o-mini, gpt-4o, or gemini-2.0-flash-001 via OpenRouter), which reliably produce valid JSON matching Graphiti's Pydantic schemas.
## Embedding Models
### Performance Comparison (Tested 2026-01-18)
| Rank | Model | Quality | Speed | Dimensions |
|---|---|---|---|---|
| 🥇 | mxbai-embed-large | 77.0% | 156ms | 1024 |
| 🥈 | nomic-embed-text-v2-moe | 76.4% | 2507ms | 768 |
| 🥉 | embeddinggemma | 75.8% | 384ms | 768 |
| 4 | qwen3-embedding:0.6b | 73.3% | 312ms | 1024 |
| 5 | nomic-embed-text | 66.0% | 426ms | 768 |
**Quality score:** based on semantic similarity tests (higher = better at distinguishing similar from dissimilar content).
### Recommended Configuration
```bash
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
**Why mxbai-embed-large?**
- Fastest response time (156ms avg)
- Highest semantic quality (77.0%)
- Higher dimensions (1024) capture more nuance
### Ollama vs OpenAI Embeddings (Tested 2026-01-18)
Direct comparison between Ollama mxbai-embed-large and OpenAI text-embedding-3-small:
| Test Case | Expected | Ollama | OpenAI | Winner |
|---|---|---|---|---|
| Cat/Feline synonyms | high | 80.8% | 75.7% | Ollama |
| Job title synonyms | high | 93.5% | 87.1% | Ollama |
| Weather vs Programming | low | 43.4% | 22.7% | OpenAI |
| ML/DL related | high | 73.8% | 79.9% | OpenAI |
| Different countries | medium | 61.7% | 57.9% | Tie |
| Finance vs Personal | low | 28.5% | 19.9% | OpenAI |
**Summary:**
| Metric | Ollama mxbai-embed-large | OpenAI text-embedding-3-small |
|---|---|---|
| Dimensions | 1024 | 1536 |
| Avg response time | ~21ms | ~610ms |
| Quality wins | 3 | 2 |
| Cost | Free | ~$0.02/1M tokens |
**Verdict:** Ollama's mxbai-embed-large wins on quality (3 wins to 2, with one tie), is roughly 29x faster in our tests, and is completely free. This makes the hybrid configuration (cloud LLM + Ollama embeddings) the clear choice.
## Full Ollama Configuration (Experimental)
If you want to run both LLM and embeddings on Ollama (completely free, no cloud costs):
```bash
# LLM Configuration (Ollama)
MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai
MADEINOZ_KNOWLEDGE_MODEL_NAME=mistral:instruct
MADEINOZ_KNOWLEDGE_OPENAI_BASE_URL=http://your-ollama-server:11434/v1

# Embedder Configuration (Ollama)
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
**Warning:** Full Ollama mode may have reliability issues with entity extraction due to JSON output formatting. Monitor logs for validation errors.
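One way to watch for these failures, assuming a containerized deployment (the container name below is a placeholder for your own):

```bash
# Surface extraction failures as they happen; adjust the container name
docker logs -f your-knowledge-container 2>&1 | grep -iE "validationerror|malformed"
```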
### Best Models for Full Ollama Mode
Based on our testing, if you must use Ollama for the LLM, try these in order:

1. mistral:instruct - Fast (3.2s), good JSON compliance
2. deepseek-r1:8b - Fast (3.1s), extracts more relationships (but see the production caveats above: it fails Graphiti's NodeResolutions schema)
3. phi4:latest - Good balance of speed and quality
4. qwen3-coder:latest - Thorough extraction, slower
## Running the Model Test
To test models on your Ollama server:
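The exact command depends on your checkout; a hypothetical invocation might look like this (the script name and flag are assumptions, not the repo's confirmed interface):

```bash
# Hypothetical test-runner invocation -- adjust the script path to your repo
./scripts/test-ollama-models.sh --host http://your-ollama-server:11434
```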
Results are saved to `test-results.json`.
## Troubleshooting
"Using Embedder provider: openai" in logs¶
This is expected. The provider label indicates the API format (OpenAI-compatible), not the actual service. Check the HTTP request logs to verify the actual endpoint:
```text
POST http://your-server:11434/v1/embeddings "HTTP/1.1 200 OK"   # Ollama
POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"     # OpenAI
```
### ValidationError on entity extraction
Local models may produce JSON that doesn't match Graphiti's exact Pydantic schema requirements. Solutions:
- Switch to OpenAI for LLM (recommended)
- Try a different local model
- Use hybrid mode (OpenAI LLM + Ollama embeddings)
### Model not found
Ensure the model is pulled on your Ollama server:
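For example, to pull the recommended embedding model and confirm it is available:

```bash
ollama pull mxbai-embed-large
ollama list   # the model should appear in this listing
```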
### Connection refused
Check that Ollama is running and accessible:
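A quick reachability check from the machine running the knowledge system (hostname is a placeholder):

```bash
# A 200 response with a model list means Ollama is up and reachable
curl http://your-ollama-server:11434/api/tags
```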
## OpenAI-Compatible Cloud Providers
In addition to Ollama (local), the Madeinoz patch supports OpenAI-compatible cloud providers:
### Supported Providers
| Provider | Base URL | Best For |
|---|---|---|
| OpenRouter | `https://openrouter.ai/api/v1` | Access to 200+ models (Claude, GPT-4, Llama) |
| Together AI | `https://api.together.xyz/v1` | Fast Llama inference |
| Fireworks AI | `https://api.fireworks.ai/inference/v1` | Low latency |
| DeepInfra | `https://api.deepinfra.com/v1/openai` | Serverless GPUs |
### Configuration Example (OpenRouter)
```bash
# LLM: OpenRouter
MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai
MADEINOZ_KNOWLEDGE_MODEL_NAME=anthropic/claude-3.5-sonnet
MADEINOZ_KNOWLEDGE_OPENAI_API_KEY=sk-or-v1-your-openrouter-key
MADEINOZ_KNOWLEDGE_OPENAI_BASE_URL=https://openrouter.ai/api/v1

# Embedder: Ollama (free, local)
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
This configuration:
- Uses OpenRouter for LLM (access to Claude, GPT-4, etc.)
- Uses Ollama for embeddings (free, runs locally)
- Gives you flexibility to choose any model on OpenRouter
## Running the Interactive Installer
The easiest way to configure OpenAI-compatible providers is through the interactive installer:
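The invocation below is an assumption about the repo layout (the actual entry point may differ in your checkout):

```bash
# Hypothetical installer entry point -- adjust to your repo's layout
./install.sh
```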
The installer will:
1. Ask you to select "OpenAI-compatible (OpenRouter, Together, etc.)"
2. Let you choose a specific provider
3. Prompt for the API key
4. Offer Ollama or OpenAI for embeddings
5. Let you select models from the provider's catalog