# Ollama Model Compatibility Guide
This guide documents which Ollama models work with the Madeinoz Knowledge System for entity extraction and embeddings.
## Architecture Overview
The knowledge system uses two types of AI models:
| Component | Purpose | Requirements |
|---|---|---|
| LLM | Entity extraction, relationship detection | Must output valid JSON matching Pydantic schemas |
| Embedder | Vector embeddings for semantic search | Must support /v1/embeddings endpoint |
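Before wiring a model into the system, you can check the embedder requirement directly with an HTTP call to the OpenAI-compatible endpoint. This is a sketch: the hostname and model name are placeholders for your own setup, and a successful response contains a `data[0].embedding` array.

```shell
# Verify the server exposes /v1/embeddings (OpenAI-compatible format).
# Hostname and model name are example placeholders.
curl -s http://your-ollama-server:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "mxbai-embed-large", "input": "hello world"}'
```

If this returns an error or plain text instead of JSON, the model cannot be used as the embedder.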
## Recommended Configuration
For cost savings while maintaining reliability, we recommend a hybrid setup:
### Option 1: Free Trinity Models (Recommended for cost savings)
**Option 1a: Trinity Large Preview** (more detailed extraction)
```bash
# LLM: Trinity Large Preview (FREE, passes all tests)
MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai
MADEINOZ_KNOWLEDGE_MODEL_NAME=arcee-ai/trinity-large-preview:free
MADEINOZ_KNOWLEDGE_OPENAI_BASE_URL=https://openrouter.ai/api/v1
MADEINOZ_KNOWLEDGE_OPENAI_API_KEY=your-openrouter-api-key

# Embedder: Ollama (free local embeddings)
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
**Option 1b: Trinity Mini** (faster processing)
```bash
# LLM: Trinity Mini (FREE, faster processing ~16s)
MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai
MADEINOZ_KNOWLEDGE_MODEL_NAME=arcee-ai/trinity-mini:free
MADEINOZ_KNOWLEDGE_OPENAI_BASE_URL=https://openrouter.ai/api/v1
MADEINOZ_KNOWLEDGE_OPENAI_API_KEY=your-openrouter-api-key

# Embedder: Ollama (free local embeddings)
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
Both configurations:
- Use Trinity models via OpenRouter for LLM (completely free)
- Successfully pass all Graphiti entity extraction tests
- Use Ollama for embeddings (free, runs locally)
- Zero cloud LLM costs while maintaining extraction quality
- **Trinity Large Preview**: more detailed entity extraction, ~25s processing time
- **Trinity Mini**: faster processing (~16s), good-quality extraction
### Option 2: Gemini Flash (fast, reliable, cost-effective; RETIRING March 2026)
```bash
# LLM: gemini-2.0-flash-001 (reliable JSON output for entity extraction)
MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai
MADEINOZ_KNOWLEDGE_MODEL_NAME=google/gemini-2.0-flash-001
MADEINOZ_KNOWLEDGE_OPENAI_BASE_URL=https://openrouter.ai/api/v1
MADEINOZ_KNOWLEDGE_OPENAI_API_KEY=your-openrouter-api-key

# Embedder: Ollama (free local embeddings)
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
This configuration:
- Uses gemini-2.0-flash-001 via OpenRouter for LLM (reliable and cost effective)
- WARNING: Gemini will be RETIRED in March 2026; use Trinity as an alternative
- Note: `MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai` because OpenRouter is OpenAI-compatible
- Uses Ollama for embeddings (free, runs locally)
- Reduces cloud costs while maintaining extraction quality
## Cost Comparison
| Configuration | LLM Cost | Embedding Cost | Total |
|---|---|---|---|
| Full OpenAI | ~$0.15/1M tokens | ~$0.02/1M tokens | $$$ |
| Hybrid (recommended) | Free | Free | Free |
| Gemini Flash | ~$0.15/1M tokens | Free | $$ |
| Full Ollama | Free | Free | Free* |
*Full Ollama has reliability trade-offs for entity extraction.
## Model Test Results
Tested on: 2026-01-18
### Basic JSON Extraction Test
We tested 16 Ollama LLM models for basic JSON extraction capability. This test uses a simple entity extraction prompt to evaluate JSON output quality.
#### Passed (15 models)
| Model | Entities | Relationships | Response Time |
|---|---|---|---|
| deepseek-r1:8b | 5 | 4 | 3,164ms |
| mistral:instruct | 4 | 3 | 3,210ms |
| tulu3:latest | 4 | 2 | 3,742ms |
| llama3.1:latest | 4 | 1 | 3,837ms |
| mistral:latest | 3 | 1 | 4,257ms |
| phi4:latest | 4 | 3 | 5,459ms |
| qwen3-coder:latest | 5 | 4 | 5,852ms |
| deepseek-r1:latest | 5 | 4 | 5,896ms |
| deepseek-coder-v2:latest | 4 | 2 | 6,243ms |
| gemma2:9b | 5 | 2 | 6,802ms |
| dolphin-mistral | 5 | 1 | 6,844ms |
| phi3:medium | 5 | 2 | 7,993ms |
| qwen3:latest | 5 | 3 | 9,581ms |
| codestral:latest | 4 | 3 | 10,108ms |
| qwen3:8b | 5 | 3 | 16,005ms |
#### Failed (1 model)
| Model | Reason |
|---|---|
| llama3.2:latest | Truncated JSON response |
### Important Caveats
Basic Test vs Real Usage: The test above uses a simplified entity extraction prompt. Graphiti uses more complex Pydantic schemas with specific field requirements. Models that pass the basic test may still fail with Graphiti's actual schemas.
Observed Issues in Production:
| Model | Basic Test | Graphiti Production | Issue |
|---|---|---|---|
| llama3.2:latest | ❌ | ❌ | Truncated responses |
| deepseek-r1:8b | ✅ | ❌ | ValidationError on NodeResolutions; outputs schema instead of data |
| deepseek-r1:latest | ✅ | ❌ | ValidationError on NodeResolutions |
| mistral | ✅ | ❌ | Malformed JSON on ExtractedEdges |
Latest Test (2026-01-18): deepseek-r1:8b tested with actual Graphiti schemas:
```
Error processing queued episode: 1 validation error for NodeResolutions
entity_resolutions
  Field required [type=missing, input_value={'$defs': {'NodeDuplicate...
```
The model outputs JSON schema definitions instead of data conforming to the schema.
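A quick way to spot this failure mode in raw responses: a JSON Schema payload carries a top-level `$defs` key, while valid data starts with the expected field. A minimal sketch (the payloads below are abbreviated examples, not actual model output):

```shell
# Abbreviated examples of the two response shapes:
bad='{"$defs": {"NodeDuplicate": {}}}'   # schema echoed back (invalid)
good='{"entity_resolutions": []}'        # data matching the schema (valid)

# A top-level "$defs" key marks a schema, not data:
echo "$bad"  | grep -q '"\$defs"'             && echo "schema, not data"
echo "$good" | grep -q '"entity_resolutions"' && echo "valid data shape"
```

The same check works against responses captured from the service logs.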
Recommendation: For production LLM use, choose from these tested options:
- Free: `arcee-ai/trinity-large-preview:free` (passes all tests)
- Low-cost: `google/gemini-2.0-flash-001` (fast, reliable, but RETIRING March 2026)
- Premium: OpenAI models (`gpt-4o-mini`, `gpt-4o`), which reliably produce valid JSON matching Graphiti's Pydantic schemas
## Embedding Models
### Performance Comparison (Tested 2026-01-18)
| Rank | Model | Quality | Speed | Dimensions |
|---|---|---|---|---|
| 🥇 | mxbai-embed-large | 77.0% | 156ms | 1024 |
| 🥈 | nomic-embed-text-v2-moe | 76.4% | 2507ms | 768 |
| 🥉 | embeddinggemma | 75.8% | 384ms | 768 |
| 4 | qwen3-embedding:0.6b | 73.3% | 312ms | 1024 |
| 5 | nomic-embed-text | 66.0% | 426ms | 768 |
Quality Score: Based on semantic similarity tests (higher = better at distinguishing similar vs dissimilar content)
### Recommended Configuration
```bash
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
Why mxbai-embed-large?
- Fastest response time (156ms avg)
- Highest semantic quality (77.0%)
- Higher dimensions (1024) capture more nuance
### Ollama vs OpenAI Embeddings (Tested 2026-01-18)
Direct comparison between Ollama mxbai-embed-large and OpenAI text-embedding-3-small:
| Test Case | Expected | Ollama | OpenAI | Winner |
|---|---|---|---|---|
| Cat/Feline synonyms | high | 80.8% | 75.7% | Ollama |
| Job title synonyms | high | 93.5% | 87.1% | Ollama |
| Weather vs Programming | low | 43.4% | 22.7% | OpenAI |
| ML/DL related | high | 73.8% | 79.9% | OpenAI |
| Different countries | medium | 61.7% | 57.9% | Tie |
| Finance vs Personal | low | 28.5% | 19.9% | OpenAI |
Summary:
| Metric | Ollama mxbai-embed-large | OpenAI text-embedding-3-small |
|---|---|---|
| Dimensions | 1024 | 1536 |
| Avg response time | ~21ms | ~610ms |
| Quality wins | 2 | 3 |
| Cost | Free | ~$0.02/1M tokens |
Verdict: OpenAI text-embedding-3-small edges out Ollama on per-test quality (3 wins to 2), but mxbai-embed-large is roughly 29x faster and completely free. This still makes the hybrid configuration (cloud LLM + Ollama embeddings) the clear choice.
## Full Ollama Configuration (Experimental)
If you want to run both LLM and embeddings on Ollama (completely free, no cloud costs):
```bash
# LLM Configuration (Ollama)
MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai
MADEINOZ_KNOWLEDGE_MODEL_NAME=mistral:instruct
MADEINOZ_KNOWLEDGE_OPENAI_BASE_URL=http://your-ollama-server:11434/v1

# Embedder Configuration (Ollama)
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
Warning: Full Ollama mode may have reliability issues with entity extraction due to JSON output formatting. Monitor logs for validation errors.
### Best Models for Full Ollama Mode
Based on our testing, if you must use Ollama for LLM, try these in order:
1. `mistral:instruct` - fast (3.2s), good JSON compliance
2. `deepseek-r1:8b` - fast (3.1s), extracts more relationships (but see the production caveats above)
3. `phi4:latest` - good balance of speed and quality
4. `qwen3-coder:latest` - thorough extraction, slower
## Running the Model Test
To test models on your Ollama server:
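The exact invocation depends on your checkout; the script path and flag below are hypothetical placeholders, so substitute the test script shipped with your installation and your own server address:

```shell
# Hypothetical placeholder: replace the script path and host with your own.
./scripts/test-ollama-models.sh --host http://your-ollama-server:11434
```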
Results are saved to `test-results.json`.
## Troubleshooting
### "Using Embedder provider: openai" in logs
This is expected. The provider label indicates the API format (OpenAI-compatible), not the actual service. Check the HTTP request logs to verify the actual endpoint:
```
POST http://your-server:11434/v1/embeddings "HTTP/1.1 200 OK"   # Ollama
POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"     # OpenAI
```
### ValidationError on entity extraction
Local models may produce JSON that doesn't match Graphiti's exact Pydantic schema requirements. Solutions:
- Switch to OpenAI for LLM (recommended)
- Try a different local model
- Use hybrid mode (OpenAI LLM + Ollama embeddings)
### Model not found
Ensure the model is pulled on your Ollama server:
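With the Ollama CLI, pulling the embedding model from the recommended configuration looks like this:

```shell
# Download the model onto the Ollama server (run on that host)
ollama pull mxbai-embed-large
```

`ollama list` then shows the installed models.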
### Connection refused
Check that Ollama is running and accessible:
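A plain HTTP request against the Ollama API is enough to confirm reachability (the hostname is a placeholder for your server):

```shell
# Lists installed models; a JSON response confirms the server
# is up and listening on the expected port.
curl -s http://your-ollama-server:11434/api/tags
```

If this fails, check that the Ollama service is started and that port 11434 is not blocked by a firewall.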
## OpenAI-Compatible Cloud Providers
In addition to Ollama (local), the Madeinoz patch supports OpenAI-compatible cloud providers:
### Supported Providers
| Provider | Base URL | Best For |
|---|---|---|
| OpenRouter | `https://openrouter.ai/api/v1` | Access to 200+ models (Claude, GPT-4, Llama) |
| Together AI | `https://api.together.xyz/v1` | Fast Llama inference |
| Fireworks AI | `https://api.fireworks.ai/inference/v1` | Low latency |
| DeepInfra | `https://api.deepinfra.com/v1/openai` | Serverless GPUs |
### Configuration Example (OpenRouter)
```bash
# LLM: OpenRouter
MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai
MADEINOZ_KNOWLEDGE_MODEL_NAME=anthropic/claude-3.5-sonnet
MADEINOZ_KNOWLEDGE_OPENAI_API_KEY=sk-or-v1-your-openrouter-key
MADEINOZ_KNOWLEDGE_OPENAI_BASE_URL=https://openrouter.ai/api/v1

# Embedder: Ollama (free, local)
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
This configuration:
- Uses OpenRouter for LLM (access to Claude, GPT-4, etc.)
- Uses Ollama for embeddings (free, runs locally)
- Gives you flexibility to choose any model on OpenRouter
## Free Cloud Models (OpenRouter)
Several free models on OpenRouter work correctly with Graphiti. These are excellent options for cost-conscious users.
### Tested Free Models
| Model | Status | Notes | Test Date |
|---|---|---|---|
| arcee-ai/trinity-large-preview:free | ✅ PASSING | Reliable JSON output, recommended | 2026-02-03 |
| google/gemini-2.0-flash-001 | ✅ PASSING | Low-cost (not free); fast, reliable, but RETIRING March 2026 | 2026-02-03 |
| z-ai/glm-4.5-air:free | ❌ FAILING | ValidationError on ExtractedEntities | 2026-02-03 |
### Trinity Model Test Results
Model: arcee-ai/trinity-large-preview:free (via OpenRouter)
Test Results (2026-02-03):
| Test | Result | Details |
|---|---|---|
| Episode processing | ✅ PASS | All episodes processed successfully |
| Entity extraction | ✅ PASS | Proper JSON schema compliance |
| Relationship extraction | ✅ PASS | Entities and relationships extracted correctly |
| Validation errors | ✅ NONE | No ValidationError for ExtractedEntities |
Example Test Episodes:
- SQL Databases → Extracted: "SQL", "tables", "primary keys", "foreign keys"
- GraphQL → Extracted: "GraphQL", "query language", "strongly typed schemas"
- Black Holes → Extracted: "Black holes", "event horizon", "Sagittarius A*"
Configuration:
```bash
MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai
MADEINOZ_KNOWLEDGE_MODEL_NAME=arcee-ai/trinity-large-preview:free
MADEINOZ_KNOWLEDGE_OPENAI_BASE_URL=https://openrouter.ai/api/v1
MADEINOZ_KNOWLEDGE_OPENAI_API_KEY=your-openrouter-api-key
```
Advantages:
- Completely free - no LLM costs
- Reliable entity extraction
- No JSON validation errors
- Works with OpenRouter's free tier
Comparison with Paid Models:
| Model | Cost | Speed | Quality |
|---|---|---|---|
| Trinity (free) | $0 | Medium | Good |
| Gemini 2.0 Flash | ~$0.07/1M tokens | Fast | Excellent |
| GPT-4o-mini | ~$0.15/1M tokens | Fast | Excellent |
### Failing Free Models
**GLM Models** (`z-ai/glm-4.5-air:free` and variants)
These models fail with Graphiti due to JSON output issues:
Issue: GLM models return text or malformed JSON instead of the structured Pydantic schema required by Graphiti.
Workaround: Use Trinity, Gemini, or paid models (GPT-4o-mini, Claude) instead.
**Avoid GLM Models**
Do not use z-ai/glm-* models with this knowledge system. They consistently fail entity extraction due to incompatible JSON output.
## Running the Interactive Installer
The easiest way to configure OpenAI-compatible providers is through the interactive installer:
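The exact entry point depends on your checkout; the script name below is a hypothetical placeholder for the installer shipped with the project:

```shell
# Hypothetical placeholder: run the project's installer script
# from the repository root.
./install.sh
```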
The installer will:
- Ask you to select "OpenAI-compatible (OpenRouter, Together, etc.)"
- Let you choose a specific provider
- Prompt for the API key
- Offer Ollama or OpenAI for embeddings
- Let you select models from the provider's catalog