DSPy: Architecture and Framework Deep Dive

A comprehensive deep dive into DSPy’s philosophy, architecture, core abstractions (signatures, modules, predictors), optimization system, and how to use it to build reliable, data-driven LLM applications.

Dec 7, 2025
DSPy (Declarative Self-improving Python) is a framework that fundamentally reimagines how we build applications with language models. Instead of crafting prompts by hand, you write structured Python programs that DSPy compiles into optimized prompts, fine-tuned weights, or both. This document provides a comprehensive understanding of DSPy's architecture, components, and how they work together.

The DSPy Philosophy

The Problem with Traditional Prompt Engineering

Traditional LLM application development suffers from several issues:
Traditional approach:
  1. Write a prompt by hand
  2. Test it on a few examples
  3. Tweak wording when it fails
  4. Add more instructions
  5. Prompt becomes unwieldy
  6. Model changes → prompts break
  7. Repeat forever

Problems:
  • Prompts are brittle (small changes break them)
  • No systematic optimization
  • Prompt-model coupling (prompts don't transfer)
  • Hard to maintain at scale
  • Human intuition is the only guide

DSPy's Solution: Programming, Not Prompting

DSPy treats LM interactions as a programming problem, not a prompt-writing problem:
DSPy approach:
  1. Define WHAT you want (signatures)
  2. Define HOW to compose (modules)
  3. Provide training examples
  4. Let DSPy optimize prompts/weights
  5. Get compiled, optimized program

Benefits:
  • Prompts are generated, not hand-written
  • Systematic optimization with metrics
  • Portable across models
  • Modular and maintainable
  • Data-driven improvement

The Key Insight

THE DSPY INSIGHT

  Traditional: You write prompts, hope they work
  DSPy:        You write programs, DSPy writes prompts

  The prompt becomes a COMPILED ARTIFACT, not source code

Architecture Overview

DSPy's architecture consists of several interconnected layers:
DSPy Architecture

  APPLICATION LAYER
    Your DSPy Program (Modules composed together)
        │
        ▼
  ABSTRACTION LAYER
    Signatures (schemas) │ Modules (logic) │ Predictors (LM calls)
        │
        ▼
  OPTIMIZATION LAYER
    Teleprompters │ Metrics │ Assertions
        │
        ▼
  INTEGRATION LAYER
    LM Adapters │ Retrieval Models │ Tools

Layer Responsibilities

Layer        │ Purpose                       │ Components
─────────────┼───────────────────────────────┼────────────────────────────────────
Application  │ Your business logic           │ Custom modules, pipelines
Abstraction  │ Define structure and behavior │ Signatures, Modules, Predictors
Optimization │ Improve program quality       │ Teleprompters, Metrics, Assertions
Integration  │ Connect to external systems   │ LM adapters, Retrievers, Tools

Core Abstractions

Signatures

Signatures are the foundational abstraction in DSPy. They define the input-output contract for a task without specifying how to accomplish it.

What is a Signature?

import dspy

class QuestionAnswering(dspy.Signature):
    """Answer questions based on provided context."""

    context = dspy.InputField(desc="Background information")
    question = dspy.InputField(desc="The question to answer")
    answer = dspy.OutputField(desc="A concise answer")

A signature specifies:
  • Docstring: Natural language description of the task
  • Input fields: What data goes in
  • Output fields: What data comes out
  • Field descriptions: Hints about each field's purpose

Signature Anatomy

SIGNATURE STRUCTURE

class TaskName(dspy.Signature):
    """Task description (becomes part of prompt)"""

    # Input fields - data provided to the LM
    input1 = dspy.InputField(desc="description")
    input2 = dspy.InputField()  # desc is optional

    # Output fields - data extracted from LM response
    output1 = dspy.OutputField(desc="description")
    output2 = dspy.OutputField()

Inline Signatures

For simple cases, you can define signatures inline as strings:
# Inline signature format: "input1, input2 -> output1, output2"

# Simple QA
qa = dspy.Predict("question -> answer")

# With context
rag = dspy.Predict("context, question -> answer")

# Multiple outputs
analysis = dspy.Predict("text -> sentiment, confidence, keywords")
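
To make the string format concrete, here is a minimal sketch of how an inline signature could be split into field names. `parse_inline_signature` is a hypothetical helper written for this illustration; it is not DSPy's own parser, which also handles details such as type annotations.

```python
def parse_inline_signature(spec: str) -> tuple[list[str], list[str]]:
    """Split "in1, in2 -> out1, out2" into (input names, output names)."""
    if spec.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {spec!r}")
    left, right = spec.split("->")
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    if not inputs or not outputs:
        raise ValueError("a signature needs at least one input and one output")
    return inputs, outputs


inputs, outputs = parse_inline_signature("context, question -> answer")
print(inputs, outputs)
```

Everything left of the arrow becomes an input field and everything right of it an output field, which is the same split the class-based form expresses with InputField and OutputField.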

Why Signatures Matter

Signatures serve as:
  1. CONTRACTS: Define what the module expects and produces
  2. DOCUMENTATION: Self-documenting code via docstrings and field descriptions
  3. PROMPT TEMPLATES: DSPy generates prompts from signature structure
  4. TYPE HINTS: Enable validation and IDE support
  5. OPTIMIZATION TARGETS: Teleprompters know what to optimize based on signatures

Modules

Modules are the building blocks of DSPy programs. Inspired by PyTorch's nn.Module, they encapsulate logic and can be composed hierarchically.

Basic Module Structure

class MyModule(dspy.Module):
    def __init__(self):
        super().__init__()
        # Initialize sub-modules and predictors
        self.predictor = dspy.Predict(MySignature)

    def forward(self, **kwargs):
        # Define the logic
        result = self.predictor(**kwargs)
        return result

Module Composition

Modules can contain other modules, enabling complex pipelines:
class RAGPipeline(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # Step 1: Retrieve relevant passages
        passages = self.retrieve(question).passages
        context = "\n".join(passages)

        # Step 2: Generate answer with context
        answer = self.generate(context=context, question=question)
        return answer

Module Hierarchy

MODULE COMPOSITION

ComplexPipeline (dspy.Module)
├── QueryExpander (dspy.Module)
│   └── dspy.ChainOfThought("query -> expanded_queries")
├── MultiRetriever (dspy.Module)
│   ├── dspy.Retrieve(k=5)   # Primary retriever
│   └── dspy.Retrieve(k=3)   # Fallback retriever
├── Ranker (dspy.Module)
│   └── dspy.Predict("passages, query -> ranked_passages")
└── Generator (dspy.Module)
    └── dspy.ChainOfThought("context, query -> answer")

Predictors

Predictors are the interface between your program and the language model. They take a signature and handle the actual LM invocation.

The Predict Class

dspy.Predict is the basic predictor that directly calls the LM:
class BasicQA(dspy.Module):
    def __init__(self):
        super().__init__()
        # Create a predictor from a signature
        self.qa = dspy.Predict("context, question -> answer")

    def forward(self, context, question):
        # Call the predictor
        result = self.qa(context=context, question=question)
        return result.answer

What Predictors Do

When you call a predictor:
  1. SIGNATURE → PROMPT: Convert signature + inputs into a prompt
  2. PROMPT → LM: Send prompt to configured language model
  3. LM RESPONSE → PARSING: Extract output fields from response
  4. PARSED → dspy.Prediction: Return structured prediction object
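
The four steps can be sketched in plain Python. This is an illustrative toy, not DSPy's implementation: the prompt layout, the `fake_lm` stand-in, and all function names are assumptions made for this example.

```python
import re


def build_prompt(instructions, inputs, output_names):
    """Step 1: render instructions, filled-in inputs, and empty output slots."""
    lines = [instructions, ""]
    lines += [f"{name}: {value}" for name, value in inputs.items()]
    lines += [f"{name}:" for name in output_names]
    return "\n".join(lines)


def fake_lm(prompt):
    """Step 2 stand-in: a canned completion instead of a real model call."""
    return "answer: Paris"


def parse_outputs(completion, output_names):
    """Step 3: pull each 'name: value' line out of the completion."""
    parsed = {}
    for name in output_names:
        match = re.search(rf"^{re.escape(name)}:\s*(.*)$", completion, re.MULTILINE)
        parsed[name] = match.group(1).strip() if match else None
    return parsed


def predict(instructions, inputs, output_names, lm=fake_lm):
    prompt = build_prompt(instructions, inputs, output_names)
    completion = lm(prompt)
    return parse_outputs(completion, output_names)  # Step 4: structured result


result = predict(
    "Answer the question.",
    {"question": "What is the capital of France?"},
    ["answer"],
)
print(result["answer"])
```

Because the prompt is derived from field names rather than written by hand, swapping the instructions or adding demonstrations changes the prompt without touching the calling code.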

Built-in Modules

DSPy provides several built-in modules for common patterns:

dspy.Predict

The basic predictor - direct input-to-output mapping:
# Simple prediction
predict = dspy.Predict("question -> answer")
result = predict(question="What is 2+2?")
print(result.answer)  # "4"

dspy.ChainOfThought

Adds reasoning before the answer:
# With chain-of-thought reasoning
cot = dspy.ChainOfThought("question -> answer")
result = cot(question="What is 2+2?")
print(result.reasoning)  # "I need to add 2 and 2..."
print(result.answer)     # "4"

How ChainOfThought works:
Standard Predict:
  Input:  question
  Output: answer

ChainOfThought:
  Input:  question
  Output: reasoning, answer   ← reasoning is automatically added

The prompt instructs the LM to "think step by step" before answering.
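
The signature change can be pictured as a small transformation on the field lists. The sketch below is only an illustration of that idea; `with_reasoning` is a hypothetical helper, not a DSPy function.

```python
def with_reasoning(inputs, outputs):
    """Return the same fields with a 'reasoning' output placed first."""
    if "reasoning" in outputs:
        return list(inputs), list(outputs)
    return list(inputs), ["reasoning", *outputs]


inputs, outputs = with_reasoning(["question"], ["answer"])
print(f"{', '.join(inputs)} -> {', '.join(outputs)}")
```

Placing the reasoning field before the answer matters: the model writes its reasoning first, so the answer is conditioned on it.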

dspy.ChainOfThoughtWithHint

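Chain-of-thought with an optional hint supplied at call time. This module ships with older DSPy releases; where it is unavailable, declare the hint as an ordinary input field (for example "question, hint -> answer") on dspy.ChainOfThought.

cot_hint = dspy.ChainOfThoughtWithHint("question -> answer")
result = cot_hint(
    question="What is the capital of France?",
    hint="Think about European geography",
)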

dspy.ProgramOfThought

Generates and executes code to solve problems:
pot = dspy.ProgramOfThought("question -> answer")
result = pot(question="What is 15% of 80?")
# LM generates: result = 80 * 0.15
# DSPy executes the code
# Returns: 12

dspy.ReAct

Combines reasoning and action (tool use):
# Define available tools
def search(query: str) -> str:
    """Search the web for information."""
    return web_search(query)  # web_search is a placeholder for your own search client

def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    # eval() is unsafe on untrusted input; use a restricted math parser in real code
    return str(eval(expression))

# Create ReAct agent
react = dspy.ReAct("question -> answer", tools=[search, calculate])

result = react(question="What is the population of France times 2?")
# LM reasons: "I need to find France's population, then multiply"
# LM acts:    search("population of France")
# LM reasons: "Got 67 million, now multiply by 2"
# LM acts:    calculate("67000000 * 2")
# LM answers: "134,000,000"

dspy.Retrieve

Retrieves relevant passages:
retrieve = dspy.Retrieve(k=5)  # Get top 5 passages
results = retrieve("What causes rainbows?")
for passage in results.passages:
    print(passage)

dspy.MultiChainComparison

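Compares several reasoning chains and synthesizes a final answer. The candidate chains are produced separately (for example with dspy.ChainOfThought) and passed in; the number of chains is set with the M argument:

# Generate candidate reasoning chains first (sample with temperature > 0 so they differ)
cot = dspy.ChainOfThought("question -> answer")
question = "Complex reasoning problem..."
completions = [cot(question=question) for _ in range(3)]

# Compare the 3 chains and produce a final answer
mcc = dspy.MultiChainComparison("question -> answer", M=3)
result = mcc(completions, question=question)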

Module Summary

BUILT-IN MODULES

Module                 │ Description
───────────────────────┼──────────────────────────────────
Predict                │ Direct input→output mapping
ChainOfThought         │ Adds reasoning before answer
ChainOfThoughtWithHint │ CoT with optional hints
ProgramOfThought       │ Generates & executes code
ReAct                  │ Reasoning + tool use
Retrieve               │ Retrieves relevant passages
MultiChainComparison   │ Compares multiple chains, picks best

Language Model Integration

DSPy abstracts away the specifics of different LM providers through a unified interface.

Configuring Language Models

import dspy

# Pick ONE of the following; each assignment replaces the previous lm.

# OpenAI
lm = dspy.LM("openai/gpt-4o", api_key="...")

# Anthropic
lm = dspy.LM("anthropic/claude-3-opus-20240229", api_key="...")

# Local models (via Ollama)
lm = dspy.LM("ollama/llama3.1")

# Azure OpenAI
lm = dspy.LM("azure/gpt-4", api_key="...", api_base="...")

# Configure globally
dspy.configure(lm=lm)

The LM Adapter System

LM ADAPTER SYSTEM

  Your DSPy Program
        │
        ▼
     dspy.LM              (unified interface)
        │
        ▼
  Provider Adapters
    OpenAI │ Anthropic │ Azure │ Local │ ...
      │         │          │       │
      ▼         ▼          ▼       ▼
    [APIs]    [APIs]     [APIs]  [Local]

LM Configuration Options

lm = dspy.LM(
    "openai/gpt-4o",
    api_key="...",
    temperature=0.7,   # Sampling temperature
    max_tokens=1000,   # Max response length
    top_p=0.9,         # Nucleus sampling
    cache=True,        # Cache responses
    num_retries=3,     # Retry on failure
)

Multiple LMs

You can configure different LMs for different parts of your program:
# Configure default LM
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Use a different LM for specific calls
with dspy.context(lm=dspy.LM("openai/gpt-4o")):
    # This uses gpt-4o
    result = expensive_module(input)

# Back to default (gpt-4o-mini)
result = cheap_module(input)
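
The scoped-override behavior can be sketched with the standard library's contextvars module. This is a toy model of the pattern, under the assumption that a setting is just a value looked up at call time; the names `configure`, `context`, and `active_lm` here are local to the example and are not DSPy's internals.

```python
import contextvars
from contextlib import contextmanager

_current_lm = contextvars.ContextVar("current_lm", default="default-lm")


def configure(lm):
    """Set the default LM for the current context."""
    _current_lm.set(lm)


@contextmanager
def context(lm):
    """Temporarily override the active LM, restoring the old one on exit."""
    token = _current_lm.set(lm)
    try:
        yield
    finally:
        _current_lm.reset(token)


def active_lm():
    return _current_lm.get()


configure("small-model")
with context("large-model"):
    inside = active_lm()
outside = active_lm()
print(inside, outside)
```

The try/finally guarantees the previous value is restored even if the code inside the with block raises, which is what makes a temporary override safe to nest.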

Retrieval Integration

DSPy integrates with retrieval systems for RAG (Retrieval-Augmented Generation) applications.

Configuring Retrievers

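import dspy

# ColBERT v2
rm = dspy.ColBERTv2(url="http://your-colbert-server:8080")

# Qdrant (qdrant_client is a client object you create beforehand)
from dspy.retrieve.qdrant_rm import QdrantRM
rm = QdrantRM("collection_name", qdrant_client)

# ChromaDB (persist_directory is the path to your Chroma store)
from dspy.retrieve.chromadb_rm import ChromadbRM
rm = ChromadbRM("collection_name", persist_directory)

# These import paths apply to DSPy releases that bundle the integrations;
# check your installed version.

# Configure globally
dspy.configure(rm=rm)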

Using Retrieve

class RAGModule(dspy.Module):
    def __init__(self, k=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=k)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # Retrieve relevant passages
        retrieved = self.retrieve(question)
        context = "\n\n".join(retrieved.passages)

        # Generate answer
        return self.generate(context=context, question=question)

Retrieval Flow

RETRIEVAL FLOW

1. Query comes in
   "What causes climate change?"
        │
        ▼
2. dspy.Retrieve(k=3)
   ├── Encodes query
   ├── Searches vector store
   └── Returns top-k passages
        │
        ▼
3. Passages returned
   ├── "Greenhouse gases trap heat..."
   ├── "CO2 levels have risen 50%..."
   └── "Human activities since 1850..."
        │
        ▼
4. Context provided to LM
   dspy.ChainOfThought(context=passages, question=query)
        │
        ▼
5. Grounded answer generated
   "Climate change is caused by greenhouse gases..."
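
Steps 2 to 4 of the flow can be sketched without any retrieval server. The toy `retrieve` function below ranks passages by word overlap, a deliberately simple stand-in for the vector search a real retriever performs; the corpus and function name are made up for this illustration.

```python
def retrieve(query, corpus, k=3):
    """Rank passages by how many query words they share, highest first."""
    query_words = set(query.lower().split())

    def overlap(passage):
        return len(query_words & set(passage.lower().split()))

    ranked = sorted(corpus, key=overlap, reverse=True)
    return ranked[:k]


corpus = [
    "Greenhouse gases trap heat in the atmosphere",
    "Rainbows form when light refracts in water droplets",
    "Human activities have raised CO2 levels since 1850",
]
passages = retrieve("what gases trap heat", corpus, k=2)
context = "\n\n".join(passages)  # the string handed to the generator as context
print(context)
```

Whatever the ranking method, the contract is the same: a query goes in, the top-k passages come out, and they are joined into the context field of the generating module.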

The Optimization System

DSPy's optimization system is what makes it truly powerful. It automatically improves your programs based on data and metrics.

The Optimization Triangle

OPTIMIZATION TRIANGLE

            TRAINING DATA
                 ▲
                / \
               /   \
              /     \
             ▼       ▼
        METRIC ◄───► PROGRAM

All three are required for optimization:
  - Program: The DSPy modules to optimize
  - Metric: How to measure success
  - Training data: Examples to learn from

Defining Metrics

Metrics measure how good a prediction is:
# Simple exact match metric
def exact_match(example, prediction, trace=None):
    return example.answer.lower() == prediction.answer.lower()

# F1 score metric (token overlap between predicted and gold answers)
def f1_metric(example, prediction, trace=None):
    pred_tokens = set(prediction.answer.lower().split())
    gold_tokens = set(example.answer.lower().split())
    if not pred_tokens or not gold_tokens:
        return 0.0
    precision = len(pred_tokens & gold_tokens) / len(pred_tokens)
    recall = len(pred_tokens & gold_tokens) / len(gold_tokens)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Semantic similarity metric
def semantic_similarity(example, prediction, trace=None):
    # Use embeddings to compute similarity
    # (compute_cosine_similarity is a placeholder for your own embedding helper)
    return compute_cosine_similarity(example.answer, prediction.answer)

# Composite metric
def composite_metric(example, prediction, trace=None):
    em = exact_match(example, prediction, trace)
    f1 = f1_metric(example, prediction, trace)
    return 0.3 * em + 0.7 * f1
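
As a worked example of the token-level F1 idea, the sketch below scores a partially correct answer. It restates the metric so the block runs on its own, and uses SimpleNamespace objects as stand-ins for DSPy examples and predictions (an assumption made only to avoid needing a configured LM).

```python
from types import SimpleNamespace


def f1_metric(example, prediction, trace=None):
    pred_tokens = set(prediction.answer.lower().split())
    gold_tokens = set(example.answer.lower().split())
    if not pred_tokens or not gold_tokens:
        return 0.0
    overlap = len(pred_tokens & gold_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


gold = SimpleNamespace(answer="William Shakespeare")
pred = SimpleNamespace(answer="Shakespeare")

# precision = 1/1, recall = 1/2, so F1 = 2 * 1 * 0.5 / 1.5
score = f1_metric(gold, pred)
print(round(score, 3))
```

Exact match would score this prediction 0, while F1 gives partial credit, which is why a graded metric often gives an optimizer a more useful signal.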

Training Data (Examples)

# Create training examples
trainset = [
    dspy.Example(
        question="What is the capital of France?",
        answer="Paris",
    ).with_inputs("question"),
    dspy.Example(
        question="Who wrote Romeo and Juliet?",
        answer="William Shakespeare",
    ).with_inputs("question"),
    # ... more examples
]

# with_inputs() specifies which fields are inputs (the rest are treated as labels)

The Compilation Process

flowchart TD
    A[Unoptimized Program] --> B[Optimizer/Teleprompter]
    C[Training Data] --> B
    D[Metric Function] --> B
    B --> E[Compilation Process]
    E --> F{Optimizer Type}
    F -->|BootstrapFewShot| G[Select few-shot examples]
    F -->|MIPROv2| H[Optimize instructions + examples]
    F -->|BootstrapFinetune| I[Generate fine-tuning data]
    F -->|COPRO| J[Coordinate instruction optimization]
    G --> K[Compiled Program]
    H --> K
    I --> K
    J --> K

Optimizers (Teleprompters)

Optimizers (historically called "teleprompters") are algorithms that improve DSPy programs. Each optimizer has different strengths.

BootstrapFewShot

The simplest optimizer - selects effective few-shot examples:
from dspy.teleprompt import BootstrapFewShot

optimizer = BootstrapFewShot(
    metric=my_metric,
    max_bootstrapped_demos=4,  # Max demos generated by running the program
    max_labeled_demos=16,      # Max demos taken directly from labeled training examples
    max_rounds=1,              # Bootstrapping attempts per example
)

compiled = optimizer.compile(
    student=my_program,
    trainset=training_data,
)

How it works:
BootstrapFewShot algorithm:
  1. Run the program on training examples
  2. Collect successful traces (where the metric passes)
  3. Keep up to max_bootstrapped_demos of those traces as demonstrations
  4. Insert the demonstrations into the prompt
  5. Return the compiled program with few-shot examples
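
The bootstrapping loop can be sketched in a few lines. This is a simplified illustration of the idea, not the optimizer's actual code: `bootstrap_demos`, `toy_program`, and the use of SimpleNamespace in place of DSPy examples are all assumptions made for the example.

```python
from types import SimpleNamespace


def bootstrap_demos(program, trainset, metric, max_demos=4):
    """Collect question/answer pairs from runs that pass the metric."""
    demos = []
    for example in trainset:
        prediction = program(example.question)
        if metric(example, prediction):
            demos.append({"question": example.question, "answer": prediction.answer})
        if len(demos) >= max_demos:
            break
    return demos


def toy_program(question):
    """Stand-in for an LM-backed module: knows one fact, guesses otherwise."""
    known = {"What is the capital of France?": "Paris"}
    return SimpleNamespace(answer=known.get(question, "unknown"))


def exact_match(example, prediction):
    return example.answer.lower() == prediction.answer.lower()


trainset = [
    SimpleNamespace(question="What is the capital of France?", answer="Paris"),
    SimpleNamespace(question="Who wrote Romeo and Juliet?", answer="William Shakespeare"),
]
demos = bootstrap_demos(toy_program, trainset, exact_match)
print(demos)
```

Only the run that passed the metric becomes a demonstration; the failed run is discarded, so the demonstrations are filtered by the same metric you optimize for.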

BootstrapFewShotWithRandomSearch

Adds random search over example combinations:
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

optimizer = BootstrapFewShotWithRandomSearch(
    metric=my_metric,
    max_bootstrapped_demos=4,
    num_candidate_programs=10,  # Number of combinations to try
    num_threads=4,              # Parallel evaluation
)

MIPROv2

State-of-the-art optimizer that jointly optimizes instructions and examples:
from dspy.teleprompt import MIPROv2

optimizer = MIPROv2(
    metric=my_metric,
    auto=None,             # Disable auto presets so the manual budget below applies
    num_candidates=10,     # Instruction candidates
    init_temperature=1.0,  # Exploration temperature
    verbose=True,
)

compiled = optimizer.compile(
    student=my_program,
    trainset=training_data,
    valset=validation_data,  # For evaluation
    num_trials=30,           # Optimization budget
)

How MIPROv2 works:
MIPROv2 algorithm:
  1. EXAMPLE BOOTSTRAPPING: Run the program on training data to collect effective few-shot examples
  2. INSTRUCTION PROPOSAL: An LM generates candidate instructions based on the task
  3. JOINT OPTIMIZATION: Bayesian optimization over the instruction-example space
  4. EVALUATION: Validate candidate configurations on held-out data
  5. SELECTION: Return the best configuration found
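
The joint search over instructions and example sets can be pictured with a small loop. The sketch below substitutes uniform random sampling for the Bayesian optimization step, purely to keep the example short; the candidates, the scoring function, and `search_configs` are hypothetical and not part of DSPy.

```python
import random


def search_configs(instructions, demo_sets, score, num_trials=10, seed=0):
    """Sample (instruction, demo set) pairs and keep the best-scoring one.

    Uniform random sampling is a simple stand-in here for the
    Bayesian optimization step of the real optimizer.
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(num_trials):
        config = (rng.choice(instructions), rng.choice(demo_sets))
        trial_score = score(*config)
        if trial_score > best_score:
            best_config, best_score = config, trial_score
    return best_config, best_score


# Hypothetical candidates and a made-up scoring function for illustration.
instructions = ["Answer briefly.", "Think step by step, then answer."]
demo_sets = [(), ("demo_a",), ("demo_a", "demo_b")]


def toy_score(instruction, demos):
    return len(demos) + (1.0 if "step by step" in instruction else 0.0)


best_config, best_score = search_configs(instructions, demo_sets, toy_score, num_trials=50)
print(best_config, best_score)
```

In a real run the score comes from evaluating the program on validation data, which is why each trial is expensive and a sample-efficient search strategy pays off.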

COPRO

Coordinate Prompt Optimization - optimizes instructions across modules:
from dspy.teleprompt import COPRO

optimizer = COPRO(
    metric=my_metric,
    depth=3,    # Optimization depth
    breadth=5,  # Candidates per iteration
)

BootstrapFinetune

Creates fine-tuning data from program traces:
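from dspy.teleprompt import BootstrapFinetune

optimizer = BootstrapFinetune(
    metric=my_metric,
    multitask=True,  # Train on multiple tasks
)

# Recent DSPy releases fine-tune the LM attached to the student program
# (for example one set with my_program.set_lm(...) for a model such as
# meta-llama/Llama-3-8B) rather than taking a target-model argument here.
# Check the compile() signature of your installed version.
compiled = optimizer.compile(
    student=my_program,
    trainset=training_data,
)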

GEPA

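GEPA (Genetic-Pareto) evolves prompts through LM reflection on program traces and keeps a Pareto front of candidates. The parameter names below follow recent DSPy releases; verify them against your installed version:

from dspy.teleprompt import GEPA

optimizer = GEPA(
    metric=my_metric,  # GEPA expects a metric accepting (gold, pred, trace, pred_name, pred_trace)
    auto="light",      # Budget preset
    reflection_lm=dspy.LM("openai/gpt-4o"),  # LM that proposes prompt revisions
)

(See the GEPA Deep Dive for a detailed explanation)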

Optimizer Comparison

OPTIMIZER COMPARISON

Optimizer            │ What it Optimizes                 │ Sample Cost
─────────────────────┼───────────────────────────────────┼────────────
BootstrapFewShot     │ Few-shot examples                 │ Low
BootstrapFewShot+RS  │ Example combinations              │ Medium
COPRO                │ Instructions                      │ Medium
MIPROv2              │ Instructions + examples           │ High
BootstrapFinetune    │ Model weights                     │ Very High
GEPA                 │ Instructions (Pareto-diverse set) │ High

Recommended starting point: BootstrapFewShot
For production: MIPROv2 or GEPA

Assertions and Constraints

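DSPy provides assertions to add constraints and self-correction to programs. dspy.Assert and dspy.Suggest belong to the DSPy 2.x line; newer releases replace them with dspy.Refine and dspy.BestOfN, so check which API your installed version provides.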

dspy.Assert

Hard constraints that must be satisfied:
class ConstrainedQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.qa = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        result = self.qa(question=question)

        # Hard constraint: answer must be less than 50 words
        dspy.Assert(
            len(result.answer.split()) < 50,
            "Answer must be concise (under 50 words)",
        )
        return result

dspy.Suggest

Soft constraints that guide but don't halt:
class GuidedQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.qa = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        result = self.qa(question=question)

        # Soft constraint: prefer answers with citations
        dspy.Suggest(
            "[" in result.answer and "]" in result.answer,
            "Consider including citations in brackets",
        )
        return result

How Assertions Work

ASSERTION BEHAVIOR

dspy.Assert (Hard Constraint):
├── If constraint fails:
│   ├── Add feedback to prompt
│   ├── Retry with constraint context
│   └── If still fails after retries: raise exception
└── If constraint passes: continue normally

dspy.Suggest (Soft Constraint):
├── If constraint fails:
│   ├── Log suggestion
│   ├── May retry once with hint
│   └── Continue even if still fails
└── If constraint passes: continue normally
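
The retry-with-feedback behavior can be sketched as a plain loop. This is an illustration of the control flow only, not DSPy's backtracking implementation; `run_with_constraint`, `ConstraintError`, and `toy_generate` are hypothetical names for this example.

```python
class ConstraintError(Exception):
    """Raised when a hard constraint still fails after all retries."""


def run_with_constraint(generate, check, message, max_retries=2, hard=True):
    """Call generate(feedback) until check(output) passes or retries run out."""
    feedback = None
    output = generate(feedback)
    for _ in range(max_retries):
        if check(output):
            return output
        feedback = message  # fed back into the next attempt
        output = generate(feedback)
    if check(output) or not hard:
        return output  # soft constraints never halt the program
    raise ConstraintError(message)


def toy_generate(feedback):
    """Stand-in for an LM call: becomes terse only after receiving feedback."""
    if feedback is None:
        return "a very long and rambling answer that goes on and on"
    return "short answer"


answer = run_with_constraint(
    toy_generate,
    check=lambda text: len(text.split()) < 5,
    message="Answer must be concise (under 5 words)",
)
print(answer)
```

The only difference between the hard and soft variants is the final branch: a hard constraint raises once retries are exhausted, while a soft one returns the last output regardless.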

Assertion Configuration

# Configure assertion behavior (DSPy 2.x)
# Assertions take effect once activated on a program; max_backtracks bounds
# how many times a failing constraint triggers a retry with feedback.
program = ConstrainedQA().activate_assertions(max_backtracks=3)

The Compilation Process

Compilation transforms an unoptimized DSPy program into an optimized one.

Before and After Compilation

BEFORE COMPILATION:

class QAModule(dspy.Module):
    def __init__(self):
        super().__init__()
        self.qa = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.qa(question=question)

# Prompt is minimal, no examples, generic instruction

AFTER COMPILATION:

# Same code, but internal state is different.
# (Illustrative view: in practice this state lives on the underlying
# predictor and its signature.)

compiled_qa.qa.demos = [
    # Carefully selected few-shot examples
    Example(question="...", reasoning="...", answer="..."),
    Example(question="...", reasoning="...", answer="..."),
    Example(question="...", reasoning="...", answer="..."),
]

compiled_qa.qa.instructions = """
You are an expert question answering system. Given a
question, think step by step and provide a clear,
accurate answer. Focus on factual accuracy...
"""

# Prompt is now rich with examples and refined instruction

What Gets Compiled

Compilation affects:
  1. INSTRUCTIONS
     • Task descriptions in signatures
     • Generated from optimization
  2. DEMONSTRATIONS (few-shot examples)
     • Selected from successful traces
     • Optimized for diversity and quality
  3. FIELD PREFIXES
     • How input/output fields are labeled
     • Can be optimized for clarity
  4. (Optionally) MODEL WEIGHTS
     • If using BootstrapFinetune

Saving and Loading Compiled Programs

# Save compiled program
compiled.save("my_compiled_program.json")

# Load compiled program
loaded = MyModule()
loaded.load("my_compiled_program.json")

# Or load state into existing program
my_program.load_state(compiled.dump_state())
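
Since compilation changes state rather than code, saving a program amounts to serializing that state. The sketch below shows the idea with a toy class and the standard json module; `TinyPredictor` and its state layout are assumptions for illustration and do not mirror DSPy's actual file format.

```python
import json
import os
import tempfile


class TinyPredictor:
    """Holds the two pieces of learned state: instructions and demos."""

    def __init__(self, instructions="Answer the question."):
        self.instructions = instructions
        self.demos = []

    def dump_state(self):
        return {"instructions": self.instructions, "demos": self.demos}

    def load_state(self, state):
        self.instructions = state["instructions"]
        self.demos = list(state["demos"])


compiled = TinyPredictor("Think step by step, then answer.")
compiled.demos.append({"question": "What is 2+2?", "answer": "4"})

path = os.path.join(tempfile.mkdtemp(), "compiled.json")
with open(path, "w") as f:
    json.dump(compiled.dump_state(), f)

fresh = TinyPredictor()  # same code, default state
with open(path) as f:
    fresh.load_state(json.load(f))
print(fresh.instructions, fresh.demos)
```

This is also why loading requires constructing the module first: the file carries the learned state, while the program structure comes from your code.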

Execution Flow

Understanding how DSPy executes a call helps debug and optimize.

Call Flow Diagram

flowchart TD
    A[User calls module.forward] --> B[Module logic executes]
    B --> C[Predictor called]
    C --> D[Build prompt from signature]
    D --> E{Demos available?}
    E -->|Yes| F[Add few-shot examples]
    E -->|No| G[Skip demos]
    F --> H[Add current input]
    G --> H
    H --> I[Call language model]
    I --> J[Parse LM response]
    J --> K{Assertions?}
    K -->|Assert fails| L[Retry with feedback]
    K -->|Assert passes| M[Return prediction]
    L --> I
    M --> N[Continue module logic]
    N --> O[Return final result]
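The retry branch of the flowchart (parse, check, retry with feedback) can be sketched in plain Python. This is a conceptual illustration, not DSPy's implementation: `call_with_retries`, `fake_lm`, and the feedback wording are hypothetical names introduced here so the loop can run without a real model.

```python
def call_with_retries(lm, prompt, parse, validate, max_retries=2):
    """Call `lm`, parse the reply, and retry with feedback when validation fails."""
    feedback = None
    for attempt in range(max_retries + 1):
        # On a retry, the failure message is appended so the LM can correct itself.
        if feedback is None:
            full_prompt = prompt
        else:
            full_prompt = f"{prompt}\n\nPrevious attempt failed: {feedback}"
        prediction = parse(lm(full_prompt))
        error = validate(prediction)  # None means the prediction is acceptable
        if error is None:
            return prediction, attempt
        feedback = error
    raise ValueError(f"validation still failing after {max_retries} retries: {feedback}")


# Stand-in LM (hypothetical): answers in words first, in digits once it sees feedback.
def fake_lm(prompt):
    return "Answer: 4" if "Previous attempt failed" in prompt else "Answer: four"


def parse_answer(text):
    return text.split("Answer:", 1)[1].strip()


def must_be_digits(answer):
    return None if answer.isdigit() else "answer must contain only digits"


prediction, retries_used = call_with_retries(
    fake_lm, "Question: 2 + 2?", parse_answer, must_be_digits
)
print(prediction, retries_used)
```

Bounding the number of retries matters in practice: each retry is another LM call, so an unbounded loop can become expensive when a constraint is impossible to satisfy.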

Trace Inspection

DSPy records traces for debugging:
    # Enable tracing
    dspy.configure(trace=[])

    # Run program
    result = my_program(question="What is AI?")

    # Inspect trace. In recent DSPy versions each entry is a
    # (predictor, inputs, outputs) tuple rather than a dict.
    for predictor, inputs, outputs in dspy.settings.trace:
        print(f"Module: {type(predictor).__name__}")
        print(f"Input: {inputs}")
        print(f"Output: {outputs}")
        print("---")
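For longer programs, a compact one-line-per-step view of the trace is often easier to scan. The helper below is a sketch that assumes each trace entry is a `(predictor, inputs, outputs)` tuple, as in recent DSPy releases; `summarize_trace` and the `FakePredict` stand-in are hypothetical names used so the example runs without an LM.

```python
def summarize_trace(trace):
    """Return one summary line per trace entry.

    Assumes each entry is a (predictor, inputs, outputs) tuple.
    """
    lines = []
    for step, (predictor, inputs, outputs) in enumerate(trace, start=1):
        name = type(predictor).__name__
        input_names = ", ".join(sorted(inputs))
        lines.append(f"{step}. {name}({input_names}) -> {outputs}")
    return lines


# Stand-in objects so the sketch runs offline (hypothetical data).
class FakePredict:
    pass


fake_trace = [
    (FakePredict(), {"question": "What is AI?"}, {"answer": "..."}),
]
for line in summarize_trace(fake_trace):
    print(line)
```

With a real program, the same function could be applied to `dspy.settings.trace` after a call, provided the tuple assumption holds for the installed version.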

Prompt Inspection

    # See what prompt was actually sent
    lm = dspy.LM("openai/gpt-4o", cache=False)
    dspy.configure(lm=lm)

    # Run the program first so there is a call to inspect
    result = my_program(question="What is AI?")

    # Print the full prompt and response of the last call
    lm.inspect_history(n=1)

Advanced Patterns

Multi-Stage Pipelines

    class MultiStagePipeline(dspy.Module):
        def __init__(self):
            super().__init__()
            self.decompose = dspy.ChainOfThought("question -> sub_questions")
            self.answer_sub = dspy.ChainOfThought("sub_question -> sub_answer")
            self.synthesize = dspy.ChainOfThought(
                "question, sub_answers -> final_answer"
            )

        def forward(self, question):
            # Stage 1: Decompose question (one sub-question per line, blanks dropped)
            decomposition = self.decompose(question=question)
            sub_questions = [
                line.strip()
                for line in decomposition.sub_questions.split("\n")
                if line.strip()
            ]

            # Stage 2: Answer each sub-question
            sub_answers = []
            for sq in sub_questions:
                answer = self.answer_sub(sub_question=sq)
                sub_answers.append(answer.sub_answer)

            # Stage 3: Synthesize final answer
            final = self.synthesize(
                question=question,
                sub_answers="\n".join(sub_answers)
            )
            return final
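Stage 1 depends on the LM returning one sub-question per line, but models often add numbering or bullets. A small parser makes that step more robust. This is a sketch: `parse_sub_questions` is a hypothetical helper, and the set of list markers it strips is an assumption about typical LM output.

```python
import re


def parse_sub_questions(text):
    """Split free-form LM output into clean sub-questions.

    Strips leading list markers such as "1.", "2)", "-", "*" and drops blank lines.
    """
    questions = []
    for line in text.splitlines():
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned:
            questions.append(cleaned)
    return questions


raw = "1. Who founded the company?\n\n2) When was it founded?\n- Where is it based?"
print(parse_sub_questions(raw))
```

In the pipeline, this helper would stand in for the plain newline split on `decomposition.sub_questions`.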

Branching Logic

    class BranchingModule(dspy.Module):
        def __init__(self):
            super().__init__()
            self.classifier = dspy.Predict("question -> category")
            self.factual_qa = dspy.ChainOfThought("question -> answer")
            self.creative_qa = dspy.ChainOfThought("question -> answer")
            self.analytical_qa = dspy.ChainOfThought("question -> answer")

        def forward(self, question):
            # Classify the question type
            classification = self.classifier(question=question)
            # Normalize the label so "Factual" or " factual " still match
            category = classification.category.strip().lower()

            # Route to appropriate handler
            if category == "factual":
                return self.factual_qa(question=question)
            elif category == "creative":
                return self.creative_qa(question=question)
            else:
                return self.analytical_qa(question=question)
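Routing on a free-text label is fragile, because the classifier may answer with extra words or punctuation. One way to harden it is to map the raw output onto a fixed label set with a default. The helper below is a sketch; `normalize_category` and `ALLOWED_CATEGORIES` are hypothetical names, and the fallback to "analytical" simply mirrors the `else` branch of the module above.

```python
ALLOWED_CATEGORIES = ("factual", "creative", "analytical")


def normalize_category(raw, allowed=ALLOWED_CATEGORIES, default="analytical"):
    """Map a free-form classifier output onto one of the allowed labels."""
    text = raw.strip().lower()
    if text in allowed:
        return text
    # Otherwise take the first label (in `allowed` order) that appears in the output.
    for label in allowed:
        if label in text:
            return label
    return default


print(normalize_category(" Factual\n"))
print(normalize_category("Category: creative."))
print(normalize_category("not sure"))
```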

Ensemble Patterns

    class EnsembleModule(dspy.Module):
        def __init__(self, n_models=3):
            super().__init__()
            self.predictors = [
                dspy.ChainOfThought("question -> answer")
                for _ in range(n_models)
            ]
            self.aggregator = dspy.Predict("answers -> best_answer")

        def forward(self, question):
            # Get answers from all predictors
            answers = []
            for predictor in self.predictors:
                result = predictor(question=question)
                answers.append(result.answer)

            # Aggregate
            combined = "\n".join(f"- {a}" for a in answers)
            final = self.aggregator(answers=combined)
            return final
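When the answers are short (labels, numbers, entity names), a deterministic majority vote can replace the LM-based aggregator and saves one model call. The function below is a plain-Python sketch; `majority_vote` and its normalization rule are assumptions made for illustration. DSPy also ships a `dspy.majority` helper for voting over completions; check the documentation of your installed version for its exact behavior.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer after light normalization.

    Ties go to the answer that appeared first.
    """
    if not answers:
        raise ValueError("need at least one answer")
    normalized = [a.strip().lower() for a in answers]
    counts = Counter(normalized)
    # max() returns the first key with the highest count, and Counter keeps
    # insertion order, so the earliest answer wins a tie.
    winner = max(counts, key=counts.get)
    # Return the original spelling of the first answer that matched the winner.
    return answers[normalized.index(winner)].strip()


print(majority_vote(["Paris", "paris ", "Lyon"]))
```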

Self-Refinement

    class SelfRefiningModule(dspy.Module):
        def __init__(self, max_iterations=3):
            super().__init__()
            self.max_iterations = max_iterations
            self.generate = dspy.ChainOfThought("question -> answer")
            self.critique = dspy.Predict("question, answer -> critique, needs_improvement")
            self.refine = dspy.ChainOfThought("question, answer, critique -> improved_answer")

        def forward(self, question):
            # Initial answer
            result = self.generate(question=question)
            answer = result.answer

            # Iterative refinement
            for _ in range(self.max_iterations):
                # Critique current answer
                critique = self.critique(question=question, answer=answer)

                # Stop unless the critic explicitly asks for another pass
                if critique.needs_improvement.strip().lower() != "yes":
                    break

                # Refine based on critique
                refined = self.refine(
                    question=question,
                    answer=answer,
                    critique=critique.critique
                )
                answer = refined.improved_answer

            return dspy.Prediction(answer=answer)
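The loop's stopping condition compares the critic's output to the exact string "yes", so a reply such as "Yes, the answer omits a date" would end refinement early. A tolerant parser is one way around this. The sketch below is an assumption about how such a field might be interpreted; `wants_another_pass` is a hypothetical helper, and defaulting to `False` keeps the original behavior of stopping when the signal is unclear.

```python
import re


def wants_another_pass(raw, default=False):
    """Interpret a free-form yes/no field by looking at its first word."""
    match = re.search(r"[a-z]+", raw.strip().lower())
    word = match.group(0) if match else ""
    if word in ("yes", "true"):
        return True
    if word in ("no", "false"):
        return False
    return default


print(wants_another_pass("Yes, the answer omits a date."))
print(wants_another_pass("No"))
print(wants_another_pass("unclear"))
```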

Comparison with Traditional Approaches

DSPy vs Manual Prompting

Aspect          │ Manual Prompting │ DSPy
────────────────┼──────────────────┼──────────────────────
Prompt creation │ Hand-written     │ Auto-generated
Optimization    │ Trial and error  │ Systematic algorithms
Maintainability │ Difficult        │ Modular, structured
Portability     │ Model-specific   │ Model-agnostic
Reproducibility │ Low              │ High (data-driven)
Debugging       │ Print statements │ Traces, assertions
Testing         │ Ad-hoc           │ Metric-based

DSPy vs LangChain

Aspect          │ LangChain        │ DSPy
────────────────┼──────────────────┼─────────────────────
Philosophy      │ Chaining tools   │ Programming LMs
Prompts         │ Templates        │ Compiled from data
Optimization    │ Manual           │ Automatic
Abstraction     │ Chains, agents   │ Signatures, modules
Focus           │ Orchestration    │ Optimization
Learning curve  │ Moderate         │ Steeper

They're complementary: LangChain for orchestration, DSPy for optimization. They can be used together.

When to Use DSPy

USE DSPy WHEN:
├── You have training data (even small amounts)
├── You need reproducible, optimized prompts
├── You're building production LM applications
├── You want modular, testable code
├── Prompt quality matters significantly
└── You're willing to invest in the learning curve

USE SIMPLER APPROACHES WHEN:
├── One-off scripts or experiments
├── No training data available
├── Simple, single-prompt tasks
├── Rapid prototyping is the priority
└── Team is unfamiliar with DSPy

Summary

DSPy represents a paradigm shift in how we build LLM applications:

CORE PHILOSOPHY: "Program, don't prompt"
├── Prompts are compiled artifacts, not source code
└── Optimization is data-driven, not intuition-driven

KEY ABSTRACTIONS:
├── Signatures: Define WHAT (input/output contracts)
├── Modules: Define HOW (composable logic units)
├── Predictors: Interface to LMs
└── Optimizers: Improve programs automatically

OPTIMIZATION REQUIRES:
├── Program (modules to optimize)
├── Metric (how to measure success)
└── Data (examples to learn from)

BUILT-IN MODULES:
├── Predict: Direct mapping
├── ChainOfThought: Reasoning + answer
├── ReAct: Reasoning + actions
├── ProgramOfThought: Code generation + execution
└── Retrieve: RAG integration

OPTIMIZERS:
├── BootstrapFewShot: Select few-shot examples
├── MIPROv2: Joint instruction + example optimization
├── COPRO: Instruction optimization via coordinate ascent
├── BootstrapFinetune: Create fine-tuning data
└── GEPA: Evolutionary + Pareto optimization

BENEFITS:
├── Systematic optimization
├── Modular, maintainable code
├── Model-agnostic programs
├── Reproducible results
└── Production-ready patterns

The DSPy Mental Model

Traditional LLM Development:
    You → (write prompt) → Prompt → LLM → Output

DSPy Development:
    You → (write program) → Program
    Data + Metric → Optimizer → (compiles) → Optimized Prompt
    Optimized Prompt → LLM → Output

The key difference: you focus on the PROGRAM and METRICS; DSPy handles the PROMPT.

Getting Started Checklist

1. [ ] Install DSPy: pip install dspy
2. [ ] Configure LM: dspy.configure(lm=dspy.LM("..."))
3. [ ] Define signatures for your tasks
4. [ ] Build modules that compose signatures
5. [ ] Collect training examples
6. [ ] Define a metric function
7. [ ] Choose an optimizer (start with BootstrapFewShot)
8. [ ] Compile and evaluate
9. [ ] Iterate on data, metrics, and program structure
10. [ ] Deploy compiled program
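Step 6 is often the least familiar part of the checklist, so here is a minimal sketch of a metric. DSPy metrics are functions of the form `metric(example, prediction, trace=None)`; everything else below (`normalize`, `exact_match_metric`, `evaluate`, and the toy data) is hypothetical and uses stand-in objects so the example runs without an LM.

```python
from types import SimpleNamespace


def normalize(text):
    """Lowercase, trim, and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def exact_match_metric(example, prediction, trace=None):
    """Score 1.0 when the predicted answer matches the gold answer, else 0.0.

    `trace` is accepted to match the metric signature DSPy optimizers call;
    this sketch ignores it.
    """
    return float(normalize(example.answer) == normalize(prediction.answer))


def evaluate(program, devset, metric):
    """Average the metric over a dev set; `program` is any callable taking `question=`."""
    scores = [metric(ex, program(question=ex.question)) for ex in devset]
    return sum(scores) / len(scores)


# Stand-ins so the sketch runs offline (hypothetical data and program).
devset = [
    SimpleNamespace(question="Capital of France?", answer="Paris"),
    SimpleNamespace(question="2 + 2?", answer="4"),
]


def toy_program(question):
    answers = {"Capital of France?": "  paris "}
    return SimpleNamespace(answer=answers.get(question, "unknown"))


score = evaluate(toy_program, devset, exact_match_metric)
print(score)
```

The same metric function can be handed to an optimizer in step 7 and reused for evaluation in step 8, which keeps what is optimized and what is measured identical.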
DSPy transforms LLM application development from an art into an engineering discipline. By treating prompts as compiled artifacts and providing systematic optimization, it enables building reliable, maintainable, and high-performing LLM applications.