This guide covers setting up the LLM and embedding models for SEA-Forge™.
```bash
# Install Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh

# Pull required models
ollama pull llama3.2            # Chat/completion model
ollama pull embeddinggemma:300m # Embedding model (recommended)
```
The default .env configuration uses Ollama’s native models:
```bash
OLLAMA_BASE_URL=http://localhost:11434
LLM_DEFAULT_MODEL=ollama/llama3.2
LLM_DEFAULT_EMBEDDING_MODEL=embeddinggemma:300m
```
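With those values in place, a quick sanity check that the configured base URL actually serves the chat model (this uses Ollama's standard `/api/generate` endpoint, not a SEA-Forge command):

```bash
# One-off, non-streaming completion against the configured base URL
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "Say hello", "stream": false}'
```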
For systems with limited RAM/VRAM, use the Q8_0 quantized embedding model:
```bash
just ollama-import-gguf
```
This automatically downloads and imports the Q8_0 build of unsloth/EmbeddingGemma-300M-GGUF.
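For reference, the recipe amounts to roughly the manual steps described below (a sketch, not the actual justfile contents; paths follow the directory layout at the end of this guide):

```bash
# Approximate equivalent of `just ollama-import-gguf`
mkdir -p models/models
wget -O models/models/embeddinggemma-300M-Q8_0.gguf \
  "https://huggingface.co/unsloth/EmbeddingGemma-300M-GGUF/resolve/main/EmbeddingGemma-300M-Q8_0.gguf"
printf 'FROM ./embeddinggemma-300M-Q8_0.gguf\n' > models/models/Modelfile.embeddinggemma
(cd models/models && ollama create embeddinggemma:300m-q8_0 -f Modelfile.embeddinggemma)
```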
If you already have the GGUF file:
```bash
# 1. Navigate to your GGUF location
cd models/models

# 2. Create a Modelfile
cat > Modelfile.embeddinggemma <<EOF
FROM ./embeddinggemma-300M-Q8_0.gguf
EOF

# 3. Import into Ollama
ollama create embeddinggemma:300m-q8_0 -f Modelfile.embeddinggemma

# 4. Verify
ollama list | grep embeddinggemma
```
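Optionally, inspect the imported model's metadata (architecture, quantization) with `ollama show`:

```bash
ollama show embeddinggemma:300m-q8_0
```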
If you don't have the GGUF file yet, download it manually:

```bash
# Download the GGUF file (saved under the filename the Modelfile expects)
wget -O models/models/embeddinggemma-300M-Q8_0.gguf \
  "https://huggingface.co/unsloth/EmbeddingGemma-300M-GGUF/resolve/main/EmbeddingGemma-300M-Q8_0.gguf"

# Then follow Option 2 steps
```
After importing the Q8_0 model, update your .env:
```bash
# Default (Ollama native - higher quality)
# LLM_DEFAULT_EMBEDDING_MODEL=embeddinggemma:300m

# Override for low-memory systems
LLM_DEFAULT_EMBEDDING_MODEL=embeddinggemma:300m-q8_0
```
| Model | Dimensions | Size | Use Case |
|---|---|---|---|
| `embeddinggemma:300m` | 384 | ~600MB | Default, highest quality |
| `embeddinggemma:300m-q8_0` | 384 | ~300MB | Low-memory systems |
| `llama3.2` | N/A | ~2GB | Chat/completion |
Note: Both embedding models output 384 dimensions, compatible with `PGVECTOR_DIM=384`.
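You can confirm the dimension directly against Ollama's `/api/embed` endpoint (requires `jq`; the count should match `PGVECTOR_DIM`):

```bash
# Print the length of one embedding vector
curl -s http://localhost:11434/api/embed \
  -d '{"model": "embeddinggemma:300m", "input": "dimension check"}' \
  | jq '.embeddings[0] | length'
```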
To verify the setup:

```bash
# Check Ollama is running
curl http://localhost:11434/api/tags

# List available models
ollama list

# Test the embedding model (embedding models don't support `ollama run`;
# use the embeddings API instead)
curl -s http://localhost:11434/api/embed \
  -d '{"model": "embeddinggemma:300m-q8_0", "input": "Test embedding"}'
```
If Ollama isn't running, start it:

```bash
ollama serve &
```
If the embedding model is missing from `ollama list`:

```bash
# Re-import the model
cd models/models
ollama create embeddinggemma:300m-q8_0 -f Modelfile.embeddinggemma
```
The expected directory layout:

```text
models/
└── models/
    ├── embeddinggemma-300M-Q8_0.gguf    # Downloaded GGUF file
    └── Modelfile.embeddinggemma         # Ollama import config
```
See Also: