Structured Outputs with Ollama: A Complete Guide with Instructor

This guide demonstrates how to use Ollama with Instructor to generate structured outputs. You'll learn how to use JSON schema mode with local LLMs to create type-safe responses.

Open-source LLMs are gaining popularity, and the release of Ollama's OpenAI compatibility layer has made it possible to obtain structured outputs from local models using JSON schema.

By the end of this blog post, you will know how to use Instructor with Ollama effectively. But before we proceed, let's first explore the concept of patching.

Patching

Instructor's patch enhances the OpenAI client with the following features:

  • response_model in create calls that returns a pydantic model
  • max_retries in create calls that retries the call if it fails by using a backoff strategy
  • timeout parameter for controlling total retry duration (especially important for Ollama)
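
As a minimal sketch of these features working together (assuming a local Ollama server on the default port), the patched create call returns a validated Pydantic instance directly, and max_retries re-asks the model when validation fails:

from openai import OpenAI
from pydantic import BaseModel, field_validator
import instructor

class User(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def name_is_uppercase(cls, v: str) -> str:
        # On failure, instructor feeds this error back to the model
        # on the next retry so it can correct its output.
        if v != v.upper():
            raise ValueError("name must be uppercase")
        return v

client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),
    mode=instructor.Mode.JSON,
)

user = client.chat.completions.create(
    model="llama2",
    messages=[{"role": "user", "content": "Extract: jason is 25 years old"}],
    response_model=User,
    max_retries=3,  # re-ask up to 3 times if validation fails
)
assert isinstance(user, User)  # create() returns the model, not raw JSON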

Learn More

To learn more, please refer to the docs. To understand the benefits of using Pydantic with Instructor, visit the tips and tricks section of the why use Pydantic page.

Timeout Handling with Ollama

The Ollama integration now properly supports the timeout parameter to ensure reliable request handling:

from openai import OpenAI
from pydantic import BaseModel
import instructor

class Character(BaseModel):
    name: str
    age: int

client = instructor.from_openai(
    OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama",  # required, but unused
    ),
    mode=instructor.Mode.JSON,
)

resp = client.chat.completions.create(
    model="llama2",
    messages=[
        {
            "role": "user",
            "content": "Tell me about Harry Potter",
        }
    ],
    response_model=Character,
    max_retries=2,
    timeout=10.0,  # Total timeout across all retry attempts
)

The timeout parameter provides:

  • Total timeout control: Limits the total time spent across all retry attempts, not per individual attempt
  • Ollama compatibility: Prevents timeout issues where retries would multiply the total wait time
  • Predictable behavior: A 3-second timeout stays 3 seconds total, not 9+ seconds when retrying
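
When the total timeout is exceeded or all retries fail, instructor raises an exception you can catch. Here is a minimal sketch, assuming instructor's InstructorRetryException (check the exceptions module of your installed version):

import instructor
from instructor.exceptions import InstructorRetryException
from openai import OpenAI
from pydantic import BaseModel

class Character(BaseModel):
    name: str
    age: int

client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),
    mode=instructor.Mode.JSON,
)

try:
    resp = client.chat.completions.create(
        model="llama2",
        messages=[{"role": "user", "content": "Tell me about Harry Potter"}],
        response_model=Character,
        max_retries=2,
        timeout=3.0,  # 3 seconds total, shared across both attempts
    )
except InstructorRetryException as exc:
    # Raised once retries are exhausted or the total timeout is hit
    print(f"Structured extraction failed: {exc}")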

Timeout Best Practices

When using Ollama, especially with larger models, set appropriate timeout values based on your model's response time. The timeout applies to the total retry duration, making response times more predictable.

Ollama

Start by downloading Ollama, and then pull a model such as Llama 2 or Mistral.

Make sure you update Ollama to the latest version!

ollama pull llama2
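
If the server isn't already running (the desktop app starts it automatically), launch it and confirm the OpenAI-compatible endpoint responds; this assumes the default port 11434:

ollama serve
curl http://localhost:11434/v1/models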

Quick Start with Auto Client

You can use Ollama with Instructor's auto client for a simple setup:

import instructor
from pydantic import BaseModel

class Character(BaseModel):
    name: str
    age: int

# Simple setup - automatically configured for Ollama
client = instructor.from_provider("ollama/llama2")

resp = client.chat.completions.create(
    messages=[{"role": "user", "content": "Tell me about Harry Potter"}],
    response_model=Character,
)

Intelligent Mode Selection

The auto client automatically selects the best mode based on your model:

  • Function Calling Models (llama3.1, llama3.2, llama4, mistral-nemo, qwen2.5, etc.): Uses TOOLS mode for enhanced function calling support
  • Other Models: Uses JSON mode for structured output

# These models automatically use TOOLS mode
client = instructor.from_provider("ollama/llama3.1")
client = instructor.from_provider("ollama/qwen2.5")

# Other models use JSON mode
client = instructor.from_provider("ollama/llama2")

You can also override the mode manually:

import instructor

# Force JSON mode
client = instructor.from_provider("ollama/llama3.1", mode=instructor.Mode.JSON)

# Force TOOLS mode  
client = instructor.from_provider("ollama/llama2", mode=instructor.Mode.TOOLS)

Manual Setup

from openai import OpenAI
from pydantic import BaseModel, Field
from typing import List

import instructor


class Character(BaseModel):
    name: str
    age: int
    fact: List[str] = Field(..., description="A list of facts about the character")


# enables `response_model` in create call
client = instructor.from_openai(
    OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama",  # required, but unused
    ),
    mode=instructor.Mode.JSON,
)

resp = client.chat.completions.create(
    model="llama2",
    messages=[
        {
            "role": "user",
            "content": "Tell me about the Harry Potter",
        }
    ],
    response_model=Character,
)
print(resp.model_dump_json(indent=2))
"""
{
  "name": "Harry James Potter",
  "age": 37,
  "fact": [
    "He is the chosen one.",
    "He has a lightning-shaped scar on his forehead.",
    "He is the son of James and Lily Potter.",
    "He attended Hogwarts School of Witchcraft and Wizardry.",
    "He is a skilled wizard and sorcerer.",
    "He fought against Lord Voldemort and his followers.",
    "He has a pet owl named Snowy."
  ]
}
"""