This guide covers setting up the LLM and embedding models for SEA-Forge™.
```bash
# Install Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh

# Pull required models
ollama pull llama3.2            # Chat/completion model
ollama pull embeddinggemma:300m # Embedding model (recommended)
```
The default .env configuration uses Ollama’s native models:
```bash
OLLAMA_BASE_URL=http://localhost:11434
LLM_DEFAULT_MODEL=ollama/llama3.2
LLM_DEFAULT_EMBEDDING_MODEL=embeddinggemma:300m
```
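With those values in place, a quick sanity check that the configured base URL actually serves the chat model (this uses Ollama's standard `/api/generate` endpoint, not a SEA-Forge command):

```bash
# One-off, non-streaming completion against the configured base URL
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "Say hello", "stream": false}'
```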
For systems with limited RAM/VRAM, use the Q8_0 quantized embedding model:
```bash
just ollama-import-gguf
```
This automatically downloads and imports the Q8_0 build of unsloth/EmbeddingGemma-300M-GGUF.
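For reference, the recipe amounts to roughly the manual steps described below (a sketch, not the actual justfile contents; paths follow the directory layout at the end of this guide):

```bash
# Approximate equivalent of `just ollama-import-gguf`
mkdir -p models/models
wget -O models/models/embeddinggemma-300M-Q8_0.gguf \
  "https://huggingface.co/unsloth/EmbeddingGemma-300M-GGUF/resolve/main/EmbeddingGemma-300M-Q8_0.gguf"
printf 'FROM ./embeddinggemma-300M-Q8_0.gguf\n' > models/models/Modelfile.embeddinggemma
(cd models/models && ollama create embeddinggemma:300m-q8_0 -f Modelfile.embeddinggemma)
```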
If you already have the GGUF file:
```bash
# 1. Navigate to your GGUF location
cd models/models

# 2. Create a Modelfile
cat > Modelfile.embeddinggemma <<EOF
FROM ./embeddinggemma-300M-Q8_0.gguf
EOF

# 3. Import into Ollama
ollama create embeddinggemma:300m-q8_0 -f Modelfile.embeddinggemma

# 4. Verify
ollama list | grep embeddinggemma
```
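Optionally, inspect the imported model's metadata (architecture, quantization) with `ollama show`:

```bash
ollama show embeddinggemma:300m-q8_0
```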
If you don't have the GGUF file yet, download it manually:

```bash
# Download the GGUF file (saved under the filename the Modelfile expects)
wget -O models/models/embeddinggemma-300M-Q8_0.gguf \
  "https://huggingface.co/unsloth/EmbeddingGemma-300M-GGUF/resolve/main/EmbeddingGemma-300M-Q8_0.gguf"

# Then follow Option 2 steps
```
After importing the Q8_0 model, update your .env:
```bash
# Default (Ollama native - higher quality)
# LLM_DEFAULT_EMBEDDING_MODEL=embeddinggemma:300m

# Override for low-memory systems
LLM_DEFAULT_EMBEDDING_MODEL=embeddinggemma:300m-q8_0
```
| Model | Dimensions | Size | Use Case |
|---|---|---|---|
| `embeddinggemma:300m` | 384 | ~600MB | Default, highest quality |
| `embeddinggemma:300m-q8_0` | 384 | ~300MB | Low-memory systems |
| `llama3.2` | N/A | ~2GB | Chat/completion |
Note: Both embedding models output 384 dimensions, compatible with `PGVECTOR_DIM=384`.
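You can confirm the dimension directly against Ollama's `/api/embed` endpoint (requires `jq`; the count should match `PGVECTOR_DIM`):

```bash
# Print the length of one embedding vector
curl -s http://localhost:11434/api/embed \
  -d '{"model": "embeddinggemma:300m", "input": "dimension check"}' \
  | jq '.embeddings[0] | length'
```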
To verify the setup:

```bash
# Check Ollama is running
curl http://localhost:11434/api/tags

# List available models
ollama list

# Test the embedding model (embedding models don't support `ollama run`;
# use the embeddings API instead)
curl -s http://localhost:11434/api/embed \
  -d '{"model": "embeddinggemma:300m-q8_0", "input": "Test embedding"}'
```
If Ollama isn't running, start it:

```bash
ollama serve &
```
If the embedding model is missing from `ollama list`:

```bash
# Re-import the model
cd models/models
ollama create embeddinggemma:300m-q8_0 -f Modelfile.embeddinggemma
```
The expected directory layout:

```text
models/
└── models/
    ├── embeddinggemma-300M-Q8_0.gguf    # Downloaded GGUF file
    └── Modelfile.embeddinggemma         # Ollama import config
```
See Also: