# Ollama Model Compatibility Guide
This guide documents which Ollama models work with the Madeinoz Knowledge System for entity extraction and embeddings.
## Architecture Overview
The knowledge system uses two types of AI models:
| Component | Purpose | Requirements |
|---|---|---|
| LLM | Entity extraction, relationship detection | Must output valid JSON matching Pydantic schemas |
| Embedder | Vector embeddings for semantic search | Must support /v1/embeddings endpoint |
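Before wiring a model into the system, you can check the embedder requirement directly with an HTTP call to the OpenAI-compatible endpoint. This is a sketch: the hostname and model name are placeholders for your own setup, and a successful response contains a `data[0].embedding` array.

```shell
# Verify the server exposes /v1/embeddings (OpenAI-compatible format).
# Hostname and model name are example placeholders.
curl -s http://your-ollama-server:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "mxbai-embed-large", "input": "hello world"}'
```

If this returns an error or plain text instead of JSON, the model cannot be used as the embedder.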
## Recommended Configuration
For cost savings while maintaining reliability, we recommend a hybrid setup:
### Option 1: Free Trinity Models (Recommended for cost savings)
**Option 1a: Trinity Large Preview** (more detailed extraction)
```bash
# LLM: Trinity Large Preview (FREE, passes all tests)
MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai
MADEINOZ_KNOWLEDGE_MODEL_NAME=arcee-ai/trinity-large-preview:free
MADEINOZ_KNOWLEDGE_OPENAI_BASE_URL=https://openrouter.ai/api/v1
MADEINOZ_KNOWLEDGE_OPENAI_API_KEY=your-openrouter-api-key

# Embedder: Ollama (free local embeddings)
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
**Option 1b: Trinity Mini** (faster processing)
```bash
# LLM: Trinity Mini (FREE, faster processing ~16s)
MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai
MADEINOZ_KNOWLEDGE_MODEL_NAME=arcee-ai/trinity-mini:free
MADEINOZ_KNOWLEDGE_OPENAI_BASE_URL=https://openrouter.ai/api/v1
MADEINOZ_KNOWLEDGE_OPENAI_API_KEY=your-openrouter-api-key

# Embedder: Ollama (free local embeddings)
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
Both configurations:
- Use Trinity models via OpenRouter for LLM (completely free)
- Successfully pass all Graphiti entity extraction tests
- Use Ollama for embeddings (free, runs locally)
- Zero cloud LLM costs while maintaining extraction quality
- **Trinity Large Preview**: more detailed entity extraction, ~25s processing time
- **Trinity Mini**: faster processing (~16s), good-quality extraction
### Option 2: Gemini Flash (fast, reliable, cost-effective; RETIRING March 2026)
```bash
# LLM: gemini-2.0-flash-001 (reliable JSON output for entity extraction)
MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai
MADEINOZ_KNOWLEDGE_MODEL_NAME=google/gemini-2.0-flash-001
MADEINOZ_KNOWLEDGE_OPENAI_BASE_URL=https://openrouter.ai/api/v1
MADEINOZ_KNOWLEDGE_OPENAI_API_KEY=your-openrouter-api-key

# Embedder: Ollama (free local embeddings)
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
This configuration:
- Uses gemini-2.0-flash-001 via OpenRouter for LLM (reliable and cost effective)
- WARNING: Gemini will be RETIRED in March 2026; use Trinity as an alternative
- Note: `MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai` because OpenRouter is OpenAI-compatible
- Uses Ollama for embeddings (free, runs locally)
- Reduces cloud costs while maintaining extraction quality
## Cost Comparison
| Configuration | LLM Cost | Embedding Cost | Total |
|---|---|---|---|
| Full OpenAI | ~$0.15/1M tokens | ~$0.02/1M tokens | $$$ |
| Hybrid (recommended) | Free | Free | Free |
| Gemini Flash | ~$0.15/1M tokens | Free | $$ |
| Full Ollama | Free | Free | Free* |
*Full Ollama has reliability trade-offs for entity extraction.
## Model Test Results
Tested on: 2026-01-18
### Basic JSON Extraction Test
We tested 16 Ollama LLM models for basic JSON extraction capability. This test uses a simple entity extraction prompt to evaluate JSON output quality.
#### Passed (15 models)
| Model | Entities | Relationships | Response Time |
|---|---|---|---|
| deepseek-r1:8b | 5 | 4 | 3,164ms |
| mistral:instruct | 4 | 3 | 3,210ms |
| tulu3:latest | 4 | 2 | 3,742ms |
| llama3.1:latest | 4 | 1 | 3,837ms |
| mistral:latest | 3 | 1 | 4,257ms |
| phi4:latest | 4 | 3 | 5,459ms |
| qwen3-coder:latest | 5 | 4 | 5,852ms |
| deepseek-r1:latest | 5 | 4 | 5,896ms |
| deepseek-coder-v2:latest | 4 | 2 | 6,243ms |
| gemma2:9b | 5 | 2 | 6,802ms |
| dolphin-mistral | 5 | 1 | 6,844ms |
| phi3:medium | 5 | 2 | 7,993ms |
| qwen3:latest | 5 | 3 | 9,581ms |
| codestral:latest | 4 | 3 | 10,108ms |
| qwen3:8b | 5 | 3 | 16,005ms |
#### Failed (1 model)
| Model | Reason |
|---|---|
| llama3.2:latest | Truncated JSON response |
### Important Caveats
Basic Test vs Real Usage: The test above uses a simplified entity extraction prompt. Graphiti uses more complex Pydantic schemas with specific field requirements. Models that pass the basic test may still fail with Graphiti's actual schemas.
Observed Issues in Production:
| Model | Basic Test | Graphiti Production | Issue |
|---|---|---|---|
| llama3.2:latest | ❌ | ❌ | Truncated responses |
| deepseek-r1:8b | ✅ | ❌ | ValidationError on NodeResolutions; outputs schema instead of data |
| deepseek-r1:latest | ✅ | ❌ | ValidationError on NodeResolutions |
| mistral | ✅ | ❌ | Malformed JSON on ExtractedEdges |
Latest Test (2026-01-18): deepseek-r1:8b tested with actual Graphiti schemas:
```
Error processing queued episode: 1 validation error for NodeResolutions
entity_resolutions
  Field required [type=missing, input_value={'$defs': {'NodeDuplicate...
```
The model outputs JSON schema definitions instead of data conforming to the schema.
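A quick way to spot this failure mode in raw responses: a JSON Schema payload carries a top-level `$defs` key, while valid data starts with the expected field. A minimal sketch (the payloads below are abbreviated examples, not actual model output):

```shell
# Abbreviated examples of the two response shapes:
bad='{"$defs": {"NodeDuplicate": {}}}'   # schema echoed back (invalid)
good='{"entity_resolutions": []}'        # data matching the schema (valid)

# A top-level "$defs" key marks a schema, not data:
echo "$bad"  | grep -q '"\$defs"'             && echo "schema, not data"
echo "$good" | grep -q '"entity_resolutions"' && echo "valid data shape"
```

The same check works against responses captured from the service logs.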
Recommendation: For production LLM use, choose from these tested options:
- Free: `arcee-ai/trinity-large-preview:free` (passes all tests)
- Low-cost: `google/gemini-2.0-flash-001` (fast, reliable, but RETIRING March 2026)
- Premium: OpenAI models (`gpt-4o-mini`, `gpt-4o`), which reliably produce valid JSON matching Graphiti's Pydantic schemas
## Embedding Models
### Performance Comparison (Tested 2026-01-18)
| Rank | Model | Quality | Speed | Dimensions |
|---|---|---|---|---|
| 🥇 | mxbai-embed-large | 77.0% | 156ms | 1024 |
| 🥈 | nomic-embed-text-v2-moe | 76.4% | 2507ms | 768 |
| 🥉 | embeddinggemma | 75.8% | 384ms | 768 |
| 4 | qwen3-embedding:0.6b | 73.3% | 312ms | 1024 |
| 5 | nomic-embed-text | 66.0% | 426ms | 768 |
Quality Score: Based on semantic similarity tests (higher = better at distinguishing similar vs dissimilar content)
### Recommended Configuration
```bash
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
Why mxbai-embed-large?
- Fastest response time (156ms avg)
- Highest semantic quality (77.0%)
- Higher dimensions (1024) capture more nuance
### Ollama vs OpenAI Embeddings (Tested 2026-01-18)
Direct comparison between Ollama mxbai-embed-large and OpenAI text-embedding-3-small:
| Test Case | Expected | Ollama | OpenAI | Winner |
|---|---|---|---|---|
| Cat/Feline synonyms | high | 80.8% | 75.7% | Ollama |
| Job title synonyms | high | 93.5% | 87.1% | Ollama |
| Weather vs Programming | low | 43.4% | 22.7% | OpenAI |
| ML/DL related | high | 73.8% | 79.9% | OpenAI |
| Different countries | medium | 61.7% | 57.9% | Tie |
| Finance vs Personal | low | 28.5% | 19.9% | OpenAI |
Summary:
| Metric | Ollama mxbai-embed-large | OpenAI text-embedding-3-small |
|---|---|---|
| Dimensions | 1024 | 1536 |
| Avg response time | ~21ms | ~610ms |
| Quality wins | 2 | 3 |
| Cost | Free | ~$0.02/1M tokens |
Verdict: OpenAI text-embedding-3-small edges out Ollama on per-test quality (3 wins to 2), but mxbai-embed-large is roughly 29x faster and completely free. This still makes the hybrid configuration (cloud LLM + Ollama embeddings) the clear choice.
## Full Ollama Configuration (Experimental)
If you want to run both LLM and embeddings on Ollama (completely free, no cloud costs):
```bash
# LLM Configuration (Ollama)
MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai
MADEINOZ_KNOWLEDGE_MODEL_NAME=mistral:instruct
MADEINOZ_KNOWLEDGE_OPENAI_BASE_URL=http://your-ollama-server:11434/v1

# Embedder Configuration (Ollama)
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
Warning: Full Ollama mode may have reliability issues with entity extraction due to JSON output formatting. Monitor logs for validation errors.
### Best Models for Full Ollama Mode
Based on our testing, if you must use Ollama for LLM, try these in order:
1. `mistral:instruct` - fast (3.2s), good JSON compliance
2. `deepseek-r1:8b` - fast (3.1s), extracts more relationships (but see the production caveats above)
3. `phi4:latest` - good balance of speed and quality
4. `qwen3-coder:latest` - thorough extraction, slower
## Running the Model Test
To test models on your Ollama server:
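The exact invocation depends on your checkout; the script path and flag below are hypothetical placeholders, so substitute the test script shipped with your installation and your own server address:

```shell
# Hypothetical placeholder: replace the script path and host with your own.
./scripts/test-ollama-models.sh --host http://your-ollama-server:11434
```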
Results are saved to `test-results.json`.
## Troubleshooting
### "Using Embedder provider: openai" in logs
This is expected. The provider label indicates the API format (OpenAI-compatible), not the actual service. Check the HTTP request logs to verify the actual endpoint:
```
POST http://your-server:11434/v1/embeddings "HTTP/1.1 200 OK"   # Ollama
POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"     # OpenAI
```
### ValidationError on entity extraction
Local models may produce JSON that doesn't match Graphiti's exact Pydantic schema requirements. Solutions:
- Switch to OpenAI for LLM (recommended)
- Try a different local model
- Use hybrid mode (OpenAI LLM + Ollama embeddings)
### Model not found
Ensure the model is pulled on your Ollama server:
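With the Ollama CLI, pulling the embedding model from the recommended configuration looks like this:

```shell
# Download the model onto the Ollama server (run on that host)
ollama pull mxbai-embed-large
```

`ollama list` then shows the installed models.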
### Connection refused
Check that Ollama is running and accessible:
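A plain HTTP request against the Ollama API is enough to confirm reachability (the hostname is a placeholder for your server):

```shell
# Lists installed models; a JSON response confirms the server
# is up and listening on the expected port.
curl -s http://your-ollama-server:11434/api/tags
```

If this fails, check that the Ollama service is started and that port 11434 is not blocked by a firewall.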
## OpenAI-Compatible Cloud Providers
In addition to Ollama (local), the Madeinoz patch supports OpenAI-compatible cloud providers:
### Supported Providers
| Provider | Base URL | Best For |
|---|---|---|
| OpenRouter | `https://openrouter.ai/api/v1` | Access to 200+ models (Claude, GPT-4, Llama) |
| Together AI | `https://api.together.xyz/v1` | Fast Llama inference |
| Fireworks AI | `https://api.fireworks.ai/inference/v1` | Low latency |
| DeepInfra | `https://api.deepinfra.com/v1/openai` | Serverless GPUs |
### Configuration Example (OpenRouter)
```bash
# LLM: OpenRouter
MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai
MADEINOZ_KNOWLEDGE_MODEL_NAME=anthropic/claude-3.5-sonnet
MADEINOZ_KNOWLEDGE_OPENAI_API_KEY=sk-or-v1-your-openrouter-key
MADEINOZ_KNOWLEDGE_OPENAI_BASE_URL=https://openrouter.ai/api/v1

# Embedder: Ollama (free, local)
MADEINOZ_KNOWLEDGE_EMBEDDER_PROVIDER=openai
MADEINOZ_KNOWLEDGE_EMBEDDER_BASE_URL=http://your-ollama-server:11434/v1
MADEINOZ_KNOWLEDGE_EMBEDDER_MODEL=mxbai-embed-large
MADEINOZ_KNOWLEDGE_EMBEDDER_DIMENSIONS=1024
```
This configuration:
- Uses OpenRouter for LLM (access to Claude, GPT-4, etc.)
- Uses Ollama for embeddings (free, runs locally)
- Gives you flexibility to choose any model on OpenRouter
## Free Cloud Models (OpenRouter)
Several free models on OpenRouter work correctly with Graphiti. These are excellent options for cost-conscious users.
### Tested Free Models
| Model | Status | Notes | Test Date |
|---|---|---|---|
| arcee-ai/trinity-large-preview:free | ✅ PASSING | Reliable JSON output, recommended | 2026-02-03 |
| google/gemini-2.0-flash-001 | ✅ PASSING | Low-cost (not free); fast, reliable, but RETIRING March 2026 | 2026-02-03 |
| z-ai/glm-4.5-air:free | ❌ FAILING | ValidationError on ExtractedEntities | 2026-02-03 |
### Trinity Model Test Results
Model: arcee-ai/trinity-large-preview:free (via OpenRouter)
Test Results (2026-02-03):
| Test | Result | Details |
|---|---|---|
| Episode processing | ✅ PASS | All episodes processed successfully |
| Entity extraction | ✅ PASS | Proper JSON schema compliance |
| Relationship extraction | ✅ PASS | Entities and relationships extracted correctly |
| Validation errors | ✅ NONE | No ValidationError for ExtractedEntities |
Example Test Episodes:
- SQL Databases → Extracted: "SQL", "tables", "primary keys", "foreign keys"
- GraphQL → Extracted: "GraphQL", "query language", "strongly typed schemas"
- Black Holes → Extracted: "Black holes", "event horizon", "Sagittarius A*"
Configuration:
```bash
MADEINOZ_KNOWLEDGE_LLM_PROVIDER=openai
MADEINOZ_KNOWLEDGE_MODEL_NAME=arcee-ai/trinity-large-preview:free
MADEINOZ_KNOWLEDGE_OPENAI_BASE_URL=https://openrouter.ai/api/v1
MADEINOZ_KNOWLEDGE_OPENAI_API_KEY=your-openrouter-api-key
```
Advantages:
- Completely free - no LLM costs
- Reliable entity extraction
- No JSON validation errors
- Works with OpenRouter's free tier
Comparison with Paid Models:
| Model | Cost | Speed | Quality |
|---|---|---|---|
| Trinity (free) | $0 | Medium | Good |
| Gemini 2.0 Flash | ~$0.07/1M tokens | Fast | Excellent |
| GPT-4o-mini | ~$0.15/1M tokens | Fast | Excellent |
### Failing Free Models
**GLM Models** (`z-ai/glm-4.5-air:free` and variants)
These models fail with Graphiti due to JSON output issues:
Issue: GLM models return text or malformed JSON instead of the structured Pydantic schema required by Graphiti.
Workaround: Use Trinity, Gemini, or paid models (GPT-4o-mini, Claude) instead.
**Avoid GLM Models**
Do not use z-ai/glm-* models with this knowledge system. They consistently fail entity extraction due to incompatible JSON output.
## Running the Interactive Installer
The easiest way to configure OpenAI-compatible providers is through the interactive installer:
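The exact entry point depends on your checkout; the script name below is a hypothetical placeholder for the installer shipped with the project:

```shell
# Hypothetical placeholder: run the project's installer script
# from the repository root.
./install.sh
```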
The installer will:
- Ask you to select "OpenAI-compatible (OpenRouter, Together, etc.)"
- Let you choose a specific provider
- Prompt for the API key
- Offer Ollama or OpenAI for embeddings
- Let you select models from the provider's catalog