A comprehensive deep dive into DSPy’s philosophy, architecture, core abstractions (signatures, modules, predictors), optimization system, and how to use it to build reliable, data-driven LLM applications.

Atharv Yeolekar

Dec 7, 2025

DSPy (Declarative Self-improving Python) is a framework that fundamentally reimagines how we build applications with language models. Instead of crafting prompts by hand, you write structured Python programs that DSPy compiles into optimized prompts, fine-tuned weights, or both. This document provides a comprehensive understanding of DSPy's architecture, components, and how they work together.

The DSPy Philosophy

The Problem with Traditional Prompt Engineering

Traditional LLM application development suffers from several issues:


Traditional Approach:

1. Write a prompt by hand
2. Test it on a few examples
3. Tweak wording when it fails
4. Add more instructions
5. Prompt becomes unwieldy
6. Model changes → prompts break
7. Repeat forever

Problems:
├── Prompts are brittle (small changes break them)
├── No systematic optimization
├── Prompt-model coupling (prompts don't transfer)
├── Hard to maintain at scale
└── Human intuition is the only guide

DSPy's Solution: Programming, Not Prompting

DSPy treats LM interactions as a programming problem, not a prompt-writing problem:


DSPy Approach:

1. Define WHAT you want (signatures)
2. Define HOW to compose (modules)
3. Provide training examples
4. Let DSPy optimize prompts/weights
5. Get compiled, optimized program

Benefits:
├── Prompts are generated, not hand-written
├── Systematic optimization with metrics
├── Portable across models
├── Modular and maintainable
└── Data-driven improvement

The Key Insight


┌─────────────────────────────────────────────────────────────┐
│                    THE DSPY INSIGHT                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Traditional: You write prompts, hope they work             │
│                                                             │
│  DSPy: You write programs, DSPy writes prompts              │
│                                                             │
│  The prompt becomes a COMPILED ARTIFACT, not source code    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Architecture Overview

DSPy's architecture consists of several interconnected layers:


┌─────────────────────────────────────────────────────────────┐
│                      DSPy Architecture                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                  APPLICATION LAYER                   │   │
│  │  Your DSPy Program (Modules composed together)       │   │
│  └──────────────────────┬──────────────────────────────┘   │
│                         │                                   │
│  ┌──────────────────────▼──────────────────────────────┐   │
│  │                  ABSTRACTION LAYER                   │   │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────────┐    │   │
│  │  │Signatures │  │ Modules   │  │  Predictors   │    │   │
│  │  │(schemas)  │  │(logic)    │  │  (LM calls)   │    │   │
│  │  └───────────┘  └───────────┘  └───────────────┘    │   │
│  └──────────────────────┬──────────────────────────────┘   │
│                         │                                   │
│  ┌──────────────────────▼──────────────────────────────┐   │
│  │                 OPTIMIZATION LAYER                   │   │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────────┐    │   │
│  │  │Telepromp- │  │ Metrics   │  │  Assertions   │    │   │
│  │  │ters       │  │           │  │               │    │   │
│  │  └───────────┘  └───────────┘  └───────────────┘    │   │
│  └──────────────────────┬──────────────────────────────┘   │
│                         │                                   │
│  ┌──────────────────────▼──────────────────────────────┐   │
│  │                 INTEGRATION LAYER                    │   │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────────┐    │   │
│  │  │    LM     │  │ Retrieval │  │    Tools      │    │   │
│  │  │ Adapters  │  │  Models   │  │               │    │   │
│  │  └───────────┘  └───────────┘  └───────────────┘    │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Layer Responsibilities

Layer	Purpose	Components
Application	Your business logic	Custom modules, pipelines
Abstraction	Define structure and behavior	Signatures, Modules, Predictors
Optimization	Improve program quality	Teleprompters, Metrics, Assertions
Integration	Connect to external systems	LM adapters, Retrievers, Tools

Core Abstractions

Signatures

Signatures are the foundational abstraction in DSPy. They define the input-output contract for a task without specifying how to accomplish it.

What is a Signature?


class QuestionAnswering(dspy.Signature):
    """Answer questions based on provided context."""

    context = dspy.InputField(desc="Background information")
    question = dspy.InputField(desc="The question to answer")
    answer = dspy.OutputField(desc="A concise answer")

A signature specifies:

Docstring: Natural language description of the task

Input fields: What data goes in

Output fields: What data comes out

Field descriptions: Hints about each field's purpose

Signature Anatomy


┌─────────────────────────────────────────────────────────────┐
│                    SIGNATURE STRUCTURE                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  class TaskName(dspy.Signature):                            │
│      """Task description (becomes part of prompt)"""        │
│                                                             │
│      # Input fields - data provided to the LM               │
│      input1 = dspy.InputField(desc="description")           │
│      input2 = dspy.InputField()  # desc is optional         │
│                                                             │
│      # Output fields - data extracted from LM response      │
│      output1 = dspy.OutputField(desc="description")         │
│      output2 = dspy.OutputField()                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Inline Signatures

For simple cases, you can define signatures inline as strings:


# Inline signature format: "input1, input2 -> output1, output2"

# Simple QA
qa = dspy.Predict("question -> answer")

# With context
rag = dspy.Predict("context, question -> answer")

# Multiple outputs
analysis = dspy.Predict("text -> sentiment, confidence, keywords")
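
To make the string format concrete, here is a minimal sketch of how such a string could be split into input and output field names. `parse_inline_signature` is a hypothetical helper written for illustration, not DSPy's actual parser, and it ignores extras such as type annotations.

```python
def parse_inline_signature(signature: str) -> tuple[list[str], list[str]]:
    """Split 'in1, in2 -> out1, out2' into (inputs, outputs).

    Hypothetical helper for illustration; not DSPy's parser.
    """
    if signature.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {signature!r}")
    left, right = signature.split("->")
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    if not inputs or not outputs:
        raise ValueError("a signature needs at least one input and one output")
    return inputs, outputs


inputs, outputs = parse_inline_signature("text -> sentiment, confidence, keywords")
print(inputs)   # ['text']
print(outputs)  # ['sentiment', 'confidence', 'keywords']
```

Everything left of the arrow is treated as input and everything right of it as output, which is all the inline form needs to express.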

Why Signatures Matter


Signatures serve as:

1. CONTRACTS
   └── Define what the module expects and produces

2. DOCUMENTATION
   └── Self-documenting code via docstrings and field descriptions

3. PROMPT TEMPLATES
   └── DSPy generates prompts from signature structure

4. TYPE HINTS
   └── Enable validation and IDE support

5. OPTIMIZATION TARGETS
   └── Teleprompters know what to optimize based on signatures

Modules

Modules are the building blocks of DSPy programs. Inspired by PyTorch's nn.Module, they encapsulate logic and can be composed hierarchically.

Basic Module Structure


class MyModule(dspy.Module):
    def __init__(self):
        super().__init__()
        # Initialize sub-modules and predictors
        self.predictor = dspy.Predict(MySignature)

    def forward(self, **kwargs):
        # Define the logic
        result = self.predictor(**kwargs)
        return result

Module Composition

Modules can contain other modules, enabling complex pipelines:


class RAGPipeline(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # Step 1: Retrieve relevant passages
        passages = self.retrieve(question).passages
        context = "\n".join(passages)

        # Step 2: Generate answer with context
        answer = self.generate(context=context, question=question)
        return answer

Module Hierarchy


┌─────────────────────────────────────────────────────────────┐
│                    MODULE COMPOSITION                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ComplexPipeline (dspy.Module)                              │
│  │                                                          │
│  ├── QueryExpander (dspy.Module)                            │
│  │   └── dspy.ChainOfThought("query -> expanded_queries")   │
│  │                                                          │
│  ├── MultiRetriever (dspy.Module)                           │
│  │   ├── dspy.Retrieve(k=5)  # Primary retriever            │
│  │   └── dspy.Retrieve(k=3)  # Fallback retriever           │
│  │                                                          │
│  ├── Ranker (dspy.Module)                                   │
│  │   └── dspy.Predict("passages, query -> ranked_passages") │
│  │                                                          │
│  └── Generator (dspy.Module)                                │
│      └── dspy.ChainOfThought("context, query -> answer")    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Predictors

Predictors are the interface between your program and the language model. They take a signature and handle the actual LM invocation.

The Predict Class

dspy.Predict is the basic predictor that directly calls the LM:


class BasicQA(dspy.Module):
    def __init__(self):
        super().__init__()
        # Create a predictor from a signature
        self.qa = dspy.Predict("context, question -> answer")

    def forward(self, context, question):
        # Call the predictor
        result = self.qa(context=context, question=question)
        return result.answer

What Predictors Do


When you call a predictor:

1. SIGNATURE → PROMPT
   └── Convert signature + inputs into a prompt

2. PROMPT → LM
   └── Send prompt to configured language model

3. LM RESPONSE → PARSING
   └── Extract output fields from response

4. PARSED → dspy.Prediction
   └── Return structured prediction object
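
The four steps above can be sketched in plain Python. This is a toy illustration under loud assumptions: `build_prompt`, `fake_lm`, `parse_response`, and `predict` are hypothetical names, the prompt layout is invented, and the "LM" is a canned string. It is not how DSPy formats prompts or parses responses.

```python
from types import SimpleNamespace


def build_prompt(instructions, inputs, output_names):
    """Step 1: turn a task description plus inputs into prompt text."""
    lines = [instructions, ""]
    for name, value in inputs.items():
        lines.append(f"{name}: {value}")
    # Ask for one "name: value" line per output field.
    lines.append("Respond with: " + ", ".join(f"{n}: <{n}>" for n in output_names))
    return "\n".join(lines)


def fake_lm(prompt):
    """Step 2 stand-in: a canned response instead of a real LM call."""
    return "answer: 4"


def parse_response(text, output_names):
    """Step 3: pull 'name: value' lines for the expected output fields."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() in output_names:
            fields[name.strip()] = value.strip()
    return fields


def predict(instructions, output_names, **inputs):
    prompt = build_prompt(instructions, inputs, output_names)
    raw = fake_lm(prompt)
    # Step 4: wrap the parsed fields so they read like attributes.
    return SimpleNamespace(**parse_response(raw, output_names))


result = predict("Answer the question.", ["answer"], question="What is 2+2?")
print(result.answer)  # 4
```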

Built-in Modules

DSPy provides several built-in modules for common patterns:

dspy.Predict

The basic predictor - direct input-to-output mapping:


# Simple prediction
predict = dspy.Predict("question -> answer")
result = predict(question="What is 2+2?")
print(result.answer)  # "4"

dspy.ChainOfThought

Adds reasoning before the answer:


# With chain-of-thought reasoning
cot = dspy.ChainOfThought("question -> answer")
result = cot(question="What is 2+2?")
print(result.reasoning)  # "I need to add 2 and 2..."
print(result.answer)  # "4"

How ChainOfThought works:


Standard Predict:
  Input: question
  Output: answer

ChainOfThought:
  Input: question
  Output: reasoning, answer  ← reasoning is automatically added

The prompt instructs the LM to "think step by step" before answering.

dspy.ChainOfThoughtWithHint

Chain-of-thought with optional hints:


cot_hint = dspy.ChainOfThoughtWithHint("question -> answer")
result = cot_hint(
    question="What is the capital of France?",
    hint="Think about European geography"
)

dspy.ProgramOfThought

Generates and executes code to solve problems:


pot = dspy.ProgramOfThought("question -> answer")
result = pot(question="What is 15% of 80?")
# LM generates: result = 80 * 0.15
# DSPy executes the code
# Returns: 12
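
The generate-then-execute idea can be sketched without an LM. In this minimal example the "generated" code is a hard-coded string, and `run_generated_code` is a hypothetical helper, not part of DSPy. Stripping builtins only limits accidents; it is not a security boundary, so real systems should run generated code in a proper sandbox.

```python
def run_generated_code(code: str, result_name: str = "result"):
    """Execute generated code in a fresh namespace and return one variable.

    exec() runs arbitrary code; removing builtins is NOT a real sandbox.
    """
    namespace = {}
    exec(code, {"__builtins__": {}}, namespace)
    return namespace[result_name]


# Stand-in for what an LM might generate for "What is 15% of 80?"
generated = "result = 80 * 15 / 100"
print(run_generated_code(generated))  # 12.0
```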

dspy.ReAct

Combines reasoning and action (tool use):


# Define available tools
def search(query: str) -> str:
    """Search the web for information."""
    # web_search is a placeholder for your own search function
    return web_search(query)

def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    # eval() is unsafe on untrusted input; acceptable only in a demo
    return str(eval(expression))

# Create ReAct agent
react = dspy.ReAct(
    "question -> answer",
    tools=[search, calculate]
)

result = react(question="What is the population of France times 2?")
# LM reasons: "I need to find France's population, then multiply"
# LM acts: search("population of France")
# LM reasons: "Got 67 million, now multiply by 2"
# LM acts: calculate("67000000 * 2")
# LM answers: "134,000,000"
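
Because a tool such as `calculate` receives text written by the model, passing it to `eval` is risky. A possible alternative, sketched here with the standard-library `ast` module, accepts only basic arithmetic. `safe_calculate` is a hypothetical name, and the supported operator set is an assumption chosen for brevity.

```python
import ast
import operator

# Only these arithmetic operators are allowed.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node):
    """Recursively evaluate a restricted arithmetic syntax tree."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError("unsupported expression")


def safe_calculate(expression: str) -> str:
    """Evaluate +, -, *, / arithmetic without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return str(_evaluate(tree.body))


print(safe_calculate("67000000 * 2"))  # 134000000
```

Anything outside numbers and the four operators, including function calls and attribute access, raises `ValueError` instead of being executed.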

dspy.Retrieve

Retrieves relevant passages:


retrieve = dspy.Retrieve(k=5)  # Get top 5 passages
results = retrieve("What causes rainbows?")
for passage in results.passages:
    print(passage)

dspy.MultiChainComparison

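Compares several previously sampled reasoning chains and produces a final answer:


# Sample several reasoning chains first
cot = dspy.ChainOfThought("question -> answer", n=3)
completions = cot(question="Complex reasoning problem...").completions

# M is the number of chains to compare
mcc = dspy.MultiChainComparison("question -> answer", M=3)
result = mcc(completions, question="Complex reasoning problem...")
# Compares the 3 sampled chains to select the best answer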

Module Summary


┌─────────────────────────────────────────────────────────────┐
│                    BUILT-IN MODULES                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Module              │ Description                          │
│  ────────────────────┼──────────────────────────────────    │
│  Predict             │ Direct input→output mapping          │
│  ChainOfThought      │ Adds reasoning before answer         │
│  ChainOfThoughtHint  │ CoT with optional hints              │
│  ProgramOfThought    │ Generates & executes code            │
│  ReAct               │ Reasoning + tool use                 │
│  Retrieve            │ Retrieves relevant passages          │
│  MultiChainComparison│ Multiple chains, picks best          │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Language Model Integration

DSPy abstracts away the specifics of different LM providers through a unified interface.

Configuring Language Models


import dspy

# OpenAI
lm = dspy.LM("openai/gpt-4o", api_key="...")

# Anthropic
lm = dspy.LM("anthropic/claude-3-opus-20240229", api_key="...")

# Local models (via Ollama)
lm = dspy.LM("ollama/llama3.1")

# Azure OpenAI
lm = dspy.LM("azure/gpt-4", api_key="...", api_base="...")

# Configure globally
dspy.configure(lm=lm)

The LM Adapter System


┌─────────────────────────────────────────────────────────────┐
│                    LM ADAPTER SYSTEM                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Your DSPy Program                                          │
│         │                                                   │
│         ▼                                                   │
│  ┌─────────────────┐                                        │
│  │   dspy.LM       │  Unified interface                     │
│  └────────┬────────┘                                        │
│           │                                                 │
│           ▼                                                 │
│  ┌─────────────────────────────────────────┐                │
│  │           Provider Adapters              │                │
│  │  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐       │                │
│  │  │OpenAI│Anthro│ Azure │ Local │ ...   │                │
│  │  └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘       │                │
│  └─────┼───────┼───────┼───────┼──────────┘                │
│        │       │       │       │                            │
│        ▼       ▼       ▼       ▼                            │
│     [APIs]  [APIs]  [APIs]  [Local]                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘

LM Configuration Options


lm = dspy.LM(
    "openai/gpt-4o",
    api_key="...",
    temperature=0.7,  # Sampling temperature
    max_tokens=1000,  # Max response length
    top_p=0.9,  # Nucleus sampling
    cache=True,  # Cache responses
    num_retries=3,  # Retry on failure
)

Multiple LMs

You can configure different LMs for different parts of your program:


# Configure default LM
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Use a different LM for specific calls
with dspy.context(lm=dspy.LM("openai/gpt-4o")):
    # This uses gpt-4o
    result = expensive_module(input)

# Back to default (gpt-4o-mini)
result = cheap_module(input)
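
The scoped-override pattern itself is easy to reproduce with the standard library. The following sketch uses `contextvars` to show why the default comes back once the `with` block ends. `use_lm` and `call_module` are hypothetical stand-ins, and this is not a description of how DSPy implements `dspy.context` internally.

```python
import contextvars
from contextlib import contextmanager

# Holds the "current" LM name; the default plays the role of the global config.
_current_lm = contextvars.ContextVar("current_lm", default="openai/gpt-4o-mini")


@contextmanager
def use_lm(name: str):
    """Temporarily override the current LM, restoring it on exit."""
    token = _current_lm.set(name)
    try:
        yield
    finally:
        _current_lm.reset(token)


def call_module() -> str:
    """Stand-in for a module call; reports which LM it would use."""
    return _current_lm.get()


with use_lm("openai/gpt-4o"):
    inside = call_module()
outside = call_module()

print(inside)   # openai/gpt-4o
print(outside)  # openai/gpt-4o-mini
```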

Retrieval Integration

DSPy integrates with retrieval systems for RAG (Retrieval-Augmented Generation) applications.

Configuring Retrievers


import dspy

# ColBERT v2
rm = dspy.ColBERTv2(url="http://your-colbert-server:8080")

# Qdrant
from dspy.retrieve.qdrant_rm import QdrantRM
rm = QdrantRM("collection_name", qdrant_client)

# ChromaDB
from dspy.retrieve.chromadb_rm import ChromadbRM
rm = ChromadbRM("collection_name", persist_directory)

# Configure globally
dspy.configure(rm=rm)

Using Retrieve


class RAGModule(dspy.Module):
    def __init__(self, k=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=k)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # Retrieve relevant passages
        retrieved = self.retrieve(question)
        context = "\n\n".join(retrieved.passages)

        # Generate answer
        return self.generate(context=context, question=question)

Retrieval Flow


┌─────────────────────────────────────────────────────────────┐
│                    RETRIEVAL FLOW                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Query comes in                                          │
│     "What causes climate change?"                           │
│                    │                                        │
│                    ▼                                        │
│  2. dspy.Retrieve(k=3)                                      │
│     ├── Encodes query                                       │
│     ├── Searches vector store                               │
│     └── Returns top-k passages                              │
│                    │                                        │
│                    ▼                                        │
│  3. Passages returned                                       │
│     ├── "Greenhouse gases trap heat..."                     │
│     ├── "CO2 levels have risen 50%..."                      │
│     └── "Human activities since 1850..."                    │
│                    │                                        │
│                    ▼                                        │
│  4. Context provided to LM                                  │
│     dspy.ChainOfThought(context=passages, question=query)   │
│                    │                                        │
│                    ▼                                        │
│  5. Grounded answer generated                               │
│     "Climate change is caused by greenhouse gases..."       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
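
Steps 2 and 3 of the flow (encode, search, return top-k) can be illustrated with a toy retriever. This sketch assumes a bag-of-words "encoder" and an in-memory list instead of a learned embedding model and a vector store; `embed`, `cosine`, and `retrieve` are hypothetical names, not DSPy APIs.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'encoder': lowercase word counts instead of a learned embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Return the k passages most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(corpus, key=lambda p: cosine(query_vec, embed(p)), reverse=True)
    return ranked[:k]


corpus = [
    "greenhouse gases trap heat and drive climate change",
    "rainbows form when light refracts in water droplets",
    "human activities cause climate change through emissions",
]
passages = retrieve("what causes climate change", corpus, k=2)
context = "\n\n".join(passages)  # what would be handed to the generator
print(passages)
```

The unrelated rainbow passage shares no words with the query, so it scores zero and is left out of the context.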

The Optimization System

DSPy's optimization system automatically improves your programs based on data and metrics, replacing manual prompt tuning.

The Optimization Triangle


┌─────────────────────────────────────────────────────────────┐
│                  OPTIMIZATION TRIANGLE                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│                      TRAINING DATA                          │
│                          ▲                                  │
│                         /│\                                 │
│                        / │ \                                │
│                       /  │  \                               │
│                      /   │   \                              │
│                     /    │    \                             │
│                    /     │     \                            │
│                   /      │      \                           │
│                  ▼       │       ▼                          │
│            METRIC ◄──────┴──────► PROGRAM                   │
│                                                             │
│  All three are required for optimization:                   │
│  - Program: The DSPy modules to optimize                    │
│  - Metric: How to measure success                           │
│  - Training data: Examples to learn from                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Defining Metrics

Metrics measure how good a prediction is:


# Simple exact match metric
def exact_match(example, prediction, trace=None):
    return example.answer.lower() == prediction.answer.lower()

# F1 score metric
def f1_metric(example, prediction, trace=None):
    pred_tokens = set(prediction.answer.lower().split())
    gold_tokens = set(example.answer.lower().split())

    if not pred_tokens or not gold_tokens:
        return 0.0

    precision = len(pred_tokens & gold_tokens) / len(pred_tokens)
    recall = len(pred_tokens & gold_tokens) / len(gold_tokens)

    if precision + recall == 0:
        return 0.0

    return 2 * precision * recall / (precision + recall)

# Semantic similarity metric
def semantic_similarity(example, prediction, trace=None):
    # Use embeddings to compute similarity
    # (compute_cosine_similarity is a placeholder for your own embedding helper)
    return compute_cosine_similarity(example.answer, prediction.answer)

# Composite metric
def composite_metric(example, prediction, trace=None):
    em = exact_match(example, prediction, trace)
    f1 = f1_metric(example, prediction, trace)
    return 0.3 * em + 0.7 * f1
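
A quick worked example helps check a metric before handing it to an optimizer. The sketch below restates the token-set F1 idea in a self-contained form and scores one partially correct prediction. `SimpleNamespace` is used as a stand-in for `dspy.Example` and `dspy.Prediction`, which is an assumption made so the snippet runs without DSPy installed.

```python
from types import SimpleNamespace


def f1_metric(example, prediction, trace=None):
    """Token-set F1 between gold and predicted answers."""
    pred_tokens = set(prediction.answer.lower().split())
    gold_tokens = set(example.answer.lower().split())
    if not pred_tokens or not gold_tokens:
        return 0.0
    overlap = len(pred_tokens & gold_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Stand-ins for a gold example and a model prediction.
gold = SimpleNamespace(answer="William Shakespeare")
pred = SimpleNamespace(answer="Shakespeare")

# precision = 1/1, recall = 1/2, so F1 = 2 * 1 * 0.5 / 1.5
print(round(f1_metric(gold, pred), 3))  # 0.667
```

Exact match would score this prediction 0, while F1 gives partial credit, which is one reason to blend the two as in the composite metric.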

Training Data (Examples)


# Create training examples
trainset = [
    dspy.Example(
        question="What is the capital of France?",
        answer="Paris"
    ).with_inputs("question"),

    dspy.Example(
        question="Who wrote Romeo and Juliet?",
        answer="William Shakespeare"
    ).with_inputs("question"),

    # ... more examples
]

# with_inputs() specifies which fields are inputs (rest are outputs)

The Compilation Process



flowchart TD
    A[Unoptimized Program] --> B[Optimizer/Teleprompter]
    C[Training Data] --> B
    D[Metric Function] --> B
    B --> E[Compilation Process]
    E --> F{Optimizer Type}
    F -->|BootstrapFewShot| G[Select few-shot examples]
    F -->|MIPROv2| H[Optimize instructions + examples]
    F -->|BootstrapFinetune| I[Generate fine-tuning data]
    F -->|COPRO| J[Coordinate instruction optimization]
    G --> K[Compiled Program]
    H --> K
    I --> K
    J --> K

Optimizers (Teleprompters)

Optimizers (historically called "teleprompters") are algorithms that improve DSPy programs. Each optimizer has different strengths.

BootstrapFewShot

The simplest optimizer - selects effective few-shot examples:


from dspy.teleprompt import BootstrapFewShot

optimizer = BootstrapFewShot(
    metric=my_metric,
    max_bootstrapped_demos=4,  # Max examples to include
    max_labeled_demos=16,  # Max labeled examples to try
    max_rounds=1,  # Optimization rounds
)

compiled = optimizer.compile(
    student=my_program,
    trainset=training_data,
)

How it works:


BootstrapFewShot Algorithm:

1. Run program on training examples
2. Collect successful traces (where metric passes)
3. Select diverse, high-quality traces as demonstrations
4. Insert demonstrations into prompt
5. Return compiled program with few-shot examples
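
The core of these steps, running the program, keeping traces that pass the metric, and capping the number of demonstrations, can be sketched as follows. This is a simplified illustration, not DSPy's implementation: `bootstrap_demos` and `toy_program` are hypothetical, examples are plain dicts, and the sketch keeps the first passing traces in order rather than ranking them.

```python
def bootstrap_demos(program, trainset, metric, max_demos=4):
    """Collect question/answer traces that pass the metric, up to max_demos.

    Simplified: keeps the first passing traces, with no diversity ranking.
    """
    demos = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        output = program(example["question"])
        if metric(example, output):
            demos.append({"question": example["question"], "answer": output})
    return demos


def toy_program(question):
    """Stand-in for an LM-backed program: knows only one fact."""
    return {"What is the capital of France?": "Paris"}.get(question, "unknown")


def exact_match(example, output):
    return example["answer"].lower() == output.lower()


trainset = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "Who wrote Romeo and Juliet?", "answer": "William Shakespeare"},
]
demos = bootstrap_demos(toy_program, trainset, exact_match)
print(demos)  # [{'question': 'What is the capital of France?', 'answer': 'Paris'}]
```

Only the trace that passed the metric becomes a demonstration; the failed one is discarded rather than shown to the model as an example.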

BootstrapFewShotWithRandomSearch

Adds random search over example combinations:


from dspy.teleprompt import BootstrapFewShotWithRandomSearch

optimizer = BootstrapFewShotWithRandomSearch(
    metric=my_metric,
    max_bootstrapped_demos=4,
    num_candidate_programs=10,  # Number of combinations to try
    num_threads=4,  # Parallel evaluation
)

MIPROv2

An optimizer that jointly tunes instructions and few-shot examples:


from dspy.teleprompt import MIPROv2

optimizer = MIPROv2(
    metric=my_metric,
    auto=None,  # Disable preset budgets so the manual settings below apply
    num_candidates=10,  # Instruction candidates
    init_temperature=1.0,  # Exploration temperature
    verbose=True,
)

compiled = optimizer.compile(
    student=my_program,
    trainset=training_data,
    valset=validation_data,  # For evaluation
    num_trials=30,  # Optimization budget
)

How MIPROv2 works:


MIPROv2 Algorithm:

1. INSTRUCTION PROPOSAL
   └── LLM generates candidate instructions based on task

2. EXAMPLE SELECTION
   └── Bootstrap effective few-shot examples

3. JOINT OPTIMIZATION
   └── Bayesian optimization over instruction-example space

4. EVALUATION
   └── Validate on held-out data

5. SELECTION
   └── Return best configuration found
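
The joint search over instructions and demonstrations can be illustrated with a toy example. Note the substitution: this sketch uses exhaustive search over a tiny grid in place of Bayesian optimization, and a scripted `toy_run` function in place of an LM. All names here are hypothetical; none are DSPy or MIPROv2 APIs.

```python
from itertools import product


def evaluate(run, instruction, demos, valset):
    """Average score of one (instruction, demos) configuration on valset."""
    scores = [float(run(instruction, demos, q) == gold) for q, gold in valset]
    return sum(scores) / len(scores)


def search(run, instructions, demo_sets, valset):
    """Try every instruction/demo-set pair and return the best one.

    Exhaustive search stands in for Bayesian optimization here.
    """
    best = max(
        product(instructions, demo_sets),
        key=lambda cfg: evaluate(run, cfg[0], cfg[1], valset),
    )
    return best, evaluate(run, best[0], best[1], valset)


def toy_run(instruction, demos, question):
    """Stand-in for an LM call whose behavior depends on the configuration."""
    answer = {"2+2": "4", "3+3": "6"}[question]
    # Pretend the model only answers tersely when told to.
    if "only the number" not in instruction:
        answer = f"The answer is {answer}"
    # Pretend the second question needs at least one demonstration.
    if question == "3+3" and not demos:
        answer = "unknown"
    return answer


instructions = ["Answer the question.", "Reply with only the number."]
demo_sets = [(), (("1+1", "2"),)]
valset = [("2+2", "4"), ("3+3", "6")]

best_config, best_score = search(toy_run, instructions, demo_sets, valset)
print(best_config, best_score)
```

Neither the better instruction nor the demonstration reaches a perfect score alone in this toy setup, which is the motivation for searching over both jointly.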

COPRO

Coordinate Prompt Optimization - optimizes instructions across modules:


from dspy.teleprompt import COPRO

optimizer = COPRO(
    metric=my_metric,
    depth=3,  # Optimization depth
    breadth=5,  # Candidates per iteration
)

BootstrapFinetune

Creates fine-tuning data from program traces:


from dspy.teleprompt import BootstrapFinetune

optimizer = BootstrapFinetune(
    metric=my_metric,
    multitask=True,  # Train on multiple tasks
)

compiled = optimizer.compile(
    student=my_program,
    trainset=training_data,
    target_model="meta-llama/Llama-3-8B",  # Model to fine-tune
)

GEPA

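Genetic-Pareto: evolves prompts through LM reflection on execution traces while keeping a Pareto front of candidates:


from dspy.teleprompt import GEPA

optimizer = GEPA(
    metric=my_metric,  # Must accept (gold, pred, trace, pred_name, pred_trace)
    auto="light",  # Budget preset: "light", "medium", or "heavy"
    reflection_lm=dspy.LM("openai/gpt-4o"),  # LM that proposes prompt revisions
)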

(See the GEPA Deep Dive for detailed explanation)

Optimizer Comparison


┌─────────────────────────────────────────────────────────────┐
│                  OPTIMIZER COMPARISON                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Optimizer            │ What it Optimizes    │ Sample Cost  │
│  ─────────────────────┼──────────────────────┼────────────  │
│  BootstrapFewShot     │ Few-shot examples    │ Low          │
│  BootstrapFewShot+RS  │ Example combinations │ Medium       │
│  COPRO                │ Instructions         │ Medium       │
│  MIPROv2              │ Instructions+Examples│ High         │
│  BootstrapFinetune    │ Model weights        │ Very High    │
│  GEPA                 │ All + Pareto diverse │ High         │
│                                                             │
│  Recommended starting point: BootstrapFewShot               │
│  For production: MIPROv2 or GEPA                            │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Assertions and Constraints

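DSPy provides assertions to add constraints and self-correction to programs. dspy.Assert and dspy.Suggest belong to DSPy 2.5 and earlier; later releases replace them with dspy.Refine and dspy.BestOfN, so check which your installed version supports.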

dspy.Assert

Hard constraints that must be satisfied:


class ConstrainedQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.qa = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        result = self.qa(question=question)

        # Hard constraint: answer must be less than 50 words
        dspy.Assert(
            len(result.answer.split()) < 50,
            "Answer must be concise (under 50 words)"
        )

        return result

dspy.Suggest

Soft constraints that guide but don't halt:


class GuidedQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.qa = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        result = self.qa(question=question)

        # Soft constraint: prefer answers with citations
        dspy.Suggest(
            "[" in result.answer and "]" in result.answer,
            "Consider including citations in brackets"
        )

        return result

How Assertions Work


┌─────────────────────────────────────────────────────────────┐
│                  ASSERTION BEHAVIOR                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  dspy.Assert (Hard Constraint):                             │
│  ├── If constraint fails:                                   │
│  │   ├── Add feedback to prompt                             │
│  │   ├── Retry with constraint context                      │
│  │   └── If still fails after retries: raise exception      │
│  └── If constraint passes: continue normally                │
│                                                             │
│  dspy.Suggest (Soft Constraint):                            │
│  ├── If constraint fails:                                   │
│  │   ├── Log suggestion                                     │
│  │   ├── May retry once with hint                           │
│  │   └── Continue even if still fails                       │
│  └── If constraint passes: continue normally                │
│                                                             │
└─────────────────────────────────────────────────────────────┘
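
The retry-with-feedback behavior in the diagram can be sketched in a few lines. This is an illustration of the control flow only, under stated assumptions: `run_with_constraint` and `toy_generate` are hypothetical, the "LM" is scripted, and the retry bookkeeping does not mirror DSPy's internal backtracking.

```python
def run_with_constraint(generate, check, message, max_retries=2, hard=True):
    """Call generate(feedback) until check(output) passes or retries run out.

    hard=True mirrors a hard constraint (raise on final failure);
    hard=False mirrors a soft one (return the last output anyway).
    """
    feedback = None
    output = generate(feedback)
    for _ in range(max_retries):
        if check(output):
            return output
        feedback = message  # fed into the next attempt
        output = generate(feedback)
    if check(output) or not hard:
        return output
    raise ValueError(f"constraint failed after {max_retries} retries: {message}")


def toy_generate(feedback):
    """Stand-in for an LM call: becomes concise only when given feedback."""
    if feedback is None:
        return "word " * 60
    return "A short answer."


def is_concise(text):
    return len(text.split()) < 50


answer = run_with_constraint(toy_generate, is_concise, "Answer must be under 50 words")
print(answer)  # A short answer.
```

The first attempt fails the check, the failure message is passed back as feedback, and the second attempt passes, so no exception is raised.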

Assertion Configuration


# Configure assertion behavior
dspy.configure(
    assert_max_retries=3,  # Max retries for Assert
    suggest_max_retries=1,  # Max retries for Suggest
    backoff_time=0.5,  # Delay between retries
)

The Compilation Process

Compilation transforms an unoptimized DSPy program into an optimized one.

Before and After Compilation


BEFORE COMPILATION:
┌─────────────────────────────────────────────────────────────┐
│  class QAModule(dspy.Module):                               │
│      def __init__(self):                                    │
│          self.qa = dspy.ChainOfThought("question -> answer")│
│                                                             │
│      def forward(self, question):                           │
│          return self.qa(question=question)                  │
│                                                             │
│  # Prompt is minimal, no examples, generic instruction      │
└─────────────────────────────────────────────────────────────┘

AFTER COMPILATION:
┌─────────────────────────────────────────────────────────────┐
│  # Same code, but internal state is different:              │
│                                                             │
│  compiled_qa.qa.demos = [                                   │
│      # Carefully selected few-shot examples                 │
│      Example(question="...", reasoning="...", answer="..."),│
│      Example(question="...", reasoning="...", answer="..."),│
│      Example(question="...", reasoning="...", answer="..."),│
│  ]                                                          │
│                                                             │
│  compiled_qa.qa.instructions = """                          │
│  You are an expert question answering system. Given a       │
│  question, think step by step and provide a clear,          │
│  accurate answer. Focus on factual accuracy...              │
│  """                                                        │
│                                                             │
│  # Prompt is now rich with examples and refined instruction │
└─────────────────────────────────────────────────────────────┘

What Gets Compiled


Compilation affects:

1. INSTRUCTIONS
   ├── Task descriptions in signatures
   └── Generated from optimization

2. DEMONSTRATIONS (Few-shot examples)
   ├── Selected from successful traces
   └── Optimized for diversity and quality

3. FIELD PREFIXES
   ├── How input/output fields are labeled
   └── Can be optimized for clarity

4. (Optionally) MODEL WEIGHTS
   └── If using BootstrapFinetune

Saving and Loading Compiled Programs


# Save compiled program
compiled.save("my_compiled_program.json")

# Load compiled program
loaded = MyModule()
loaded.load("my_compiled_program.json")

# Or load state into existing program
my_program.load_state(compiled.dump_state())
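
Conceptually, the saved JSON file holds learned state (demos, instructions) rather than code, which is why loading starts from a fresh `MyModule()` instance. The sketch below imitates that round trip with plain dictionaries. `TinyPredictor` is a stand-in written for this example, and its state layout is an assumption; the state DSPy actually writes contains more keys.

```python
import json
import os
import tempfile


class TinyPredictor:
    """Hypothetical stand-in for a predictor: code is fixed, state is learnable."""

    def __init__(self):
        self.instructions = "Given the fields `question`, produce the fields `answer`."
        self.demos = []

    def dump_state(self):
        return {"instructions": self.instructions, "demos": self.demos}

    def load_state(self, state):
        self.instructions = state["instructions"]
        self.demos = list(state["demos"])


# "Compile": only the state changes, never the class definition.
compiled = TinyPredictor()
compiled.instructions = "Answer the question concisely and accurately."
compiled.demos = [{"question": "What is 2 + 2?", "answer": "4"}]

path = os.path.join(tempfile.mkdtemp(), "state.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(compiled.dump_state(), f, indent=2)

# Loading needs a fresh instance of the same class to receive the state.
loaded = TinyPredictor()
with open(path, encoding="utf-8") as f:
    loaded.load_state(json.load(f))
```

Because the file stores only state, the class definition must be importable wherever the program is loaded, and it should match the structure that was compiled.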

Execution Flow

Understanding how DSPy executes a call makes programs easier to debug and optimize.

Call Flow Diagram


flowchart TD
    A[User calls module.forward] --> B[Module logic executes]
    B --> C[Predictor called]
    C --> D[Build prompt from signature]
    D --> E{Demos available?}
    E -->|Yes| F[Add few-shot examples]
    E -->|No| G[Skip demos]
    F --> H
    G --> H[Add current input]
    H --> I[Call language model]
    I --> J[Parse LM response]
    J --> K{Assertions?}
    K -->|Assert fails| L[Retry with feedback]
    K -->|Assert passes| M[Return prediction]
    L --> I
    M --> N[Continue module logic]
    N --> O[Return final result]
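
The retry branch of the diagram (call the LM, parse, check, retry with feedback) can be sketched without any framework. This is an illustration of the control flow only, not DSPy's assertion machinery; `call_with_retries`, `fake_lm`, and `short_answer` are hypothetical names, and the stub model simply reacts to the feedback text.

```python
def call_with_retries(lm, prompt, check, max_retries=2):
    """Call `lm`, validate the output, and retry with feedback on failure."""
    feedback = None
    for attempt in range(max_retries + 1):
        if feedback is None:
            full_prompt = prompt
        else:
            full_prompt = f"{prompt}\n\nPrevious attempt failed: {feedback}"
        output = lm(full_prompt)
        ok, feedback = check(output)
        if ok:
            return output, attempt + 1
    raise ValueError(
        f"constraint still failing after {max_retries + 1} attempts: {feedback}"
    )


def fake_lm(prompt):
    # Stub model: only gives a short answer once it has seen feedback.
    if "Previous attempt failed" in prompt:
        return "Paris"
    return "The capital of France is the city of Paris."


def short_answer(output):
    # Returns (passed, feedback); feedback is None when the check passes.
    ok = len(output.split()) <= 3
    return ok, (None if ok else "answer must be at most 3 words")


answer, attempts = call_with_retries(
    fake_lm, "What is the capital of France?", short_answer
)
print(answer, attempts)
```

Bounding the number of retries matters: without `max_retries`, a constraint the model cannot satisfy would loop forever.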

Trace Inspection

DSPy records traces for debugging:


# Enable tracing
dspy.configure(trace=[])

# Run program
result = my_program(question="What is AI?")

# Inspect trace: each entry is a (predictor, inputs, outputs) tuple
for predictor, inputs, outputs in dspy.settings.trace:
    print(f"Module: {type(predictor).__name__}")
    print(f"Input: {inputs}")
    print(f"Output: {outputs}")
    print("---")

Prompt Inspection


# Configure the LM (cache=False forces a fresh call each time)
lm = dspy.LM("openai/gpt-4o", cache=False)
dspy.configure(lm=lm)

# Run the program first so there is a call to inspect
result = my_program(question="What is AI?")

# Print the full prompt and response of the last call
lm.inspect_history(n=1)

Advanced Patterns

Multi-Stage Pipelines


class MultiStagePipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.decompose = dspy.ChainOfThought("question -> sub_questions")
        self.answer_sub = dspy.ChainOfThought("sub_question -> sub_answer")
        self.synthesize = dspy.ChainOfThought(
            "question, sub_answers -> final_answer"
        )

    def forward(self, question):
        # Stage 1: Decompose question
        decomposition = self.decompose(question=question)
        # Drop blank lines so no LM call is spent on an empty sub-question
        sub_questions = [
            line.strip()
            for line in decomposition.sub_questions.split("\n")
            if line.strip()
        ]

        # Stage 2: Answer each sub-question
        sub_answers = []
        for sq in sub_questions:
            answer = self.answer_sub(sub_question=sq)
            sub_answers.append(answer.sub_answer)

        # Stage 3: Synthesize final answer
        final = self.synthesize(
            question=question,
            sub_answers="\n".join(sub_answers)
        )

        return final
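
Splitting on newlines assumes the model returns exactly one sub-question per line. In practice the output often carries numbering or bullet markers, so a small cleanup helper can make stage 2 more robust. The helper below is a hypothetical addition, not part of DSPy, and the set of list markers it strips is an assumption about typical model output.

```python
import re

# Leading "-", "*", "•", "1.", or "2)" style list markers.
_LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s*")


def parse_sub_questions(raw):
    """Split LM output into clean sub-questions, dropping blanks and list markers."""
    questions = []
    for line in raw.splitlines():
        cleaned = _LIST_MARKER.sub("", line).strip()
        if cleaned:
            questions.append(cleaned)
    return questions


raw = "1. Who founded the company?\n2) When was it founded?\n\n- Where is it based?\n"
print(parse_sub_questions(raw))
```

Inside `forward`, this would replace the plain `split("\n")` call, for example `sub_questions = parse_sub_questions(decomposition.sub_questions)`.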

Branching Logic


class BranchingModule(dspy.Module):
    def __init__(self):
        super().__init__()
        self.classifier = dspy.Predict("question -> category")
        self.factual_qa = dspy.ChainOfThought("question -> answer")
        self.creative_qa = dspy.ChainOfThought("question -> answer")
        self.analytical_qa = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        # Classify the question type
        classification = self.classifier(question=question)

        # Route to appropriate handler
        if classification.category == "factual":
            return self.factual_qa(question=question)
        elif classification.category == "creative":
            return self.creative_qa(question=question)
        else:
            return self.analytical_qa(question=question)
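
The routing above compares the classifier output to exact strings, so a label such as "Factual." or "Creative writing" would silently fall through to the analytical branch. One way to guard against that is to normalize the label before routing. The helper and its category list are assumptions made for this sketch, not DSPy features.

```python
KNOWN_CATEGORIES = ("factual", "creative", "analytical")


def normalize_category(raw, default="analytical"):
    """Map a free-text label such as ' Factual. ' onto a known category."""
    cleaned = raw.strip().strip(".\"'").lower()
    for category in KNOWN_CATEGORIES:
        if cleaned.startswith(category):
            return category
    # Unknown labels fall back to the default branch.
    return default


for label in (" Factual. ", '"creative"', "Creative writing", "something else"):
    print(repr(label), "->", normalize_category(label))
```

In `forward`, the comparison would then read `normalize_category(classification.category) == "factual"`, keeping the fallback explicit rather than accidental.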

Ensemble Patterns


class EnsembleModule(dspy.Module):
    def __init__(self, n_models=3):
        super().__init__()
        self.predictors = [
            dspy.ChainOfThought("question -> answer")
            for _ in range(n_models)
        ]
        self.aggregator = dspy.Predict("answers -> best_answer")

    def forward(self, question):
        # Get answers from all models
        answers = []
        for predictor in self.predictors:
            result = predictor(question=question)
            answers.append(result.answer)

        # Aggregate
        combined = "\n".join(f"- {a}" for a in answers)
        final = self.aggregator(answers=combined)

        return final
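
For short, closed-form answers, the LM-based aggregator can be swapped for a deterministic majority vote, which costs no extra model call. The function below is a hypothetical helper sketched in plain Python; its normalization (trimming whitespace and lowercasing) is an assumption that suits short answers and would not work for free-form text.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer after light normalization; ties go to the earliest."""
    if not answers:
        raise ValueError("answers must not be empty")
    normalized = [a.strip().lower() for a in answers]
    counts = Counter(normalized)
    best_count = max(counts.values())
    # Walk in original order so ties resolve to the answer seen first.
    for original, key in zip(answers, normalized):
        if counts[key] == best_count:
            return original.strip()


print(majority_vote(["Paris", "paris ", "Lyon"]))
```

Voting only helps when the individual predictors can disagree, so it pairs best with predictors that differ in demos, instructions, or sampling settings.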

Self-Refinement


class SelfRefiningModule(dspy.Module):
    def __init__(self, max_iterations=3):
        super().__init__()
        self.max_iterations = max_iterations
        self.generate = dspy.ChainOfThought("question -> answer")
        self.critique = dspy.Predict("question, answer -> critique, needs_improvement")
        self.refine = dspy.ChainOfThought("question, answer, critique -> improved_answer")

    def forward(self, question):
        # Initial answer
        result = self.generate(question=question)
        answer = result.answer

        # Iterative refinement
        for _ in range(self.max_iterations):
            # Critique current answer
            critique = self.critique(question=question, answer=answer)

            if critique.needs_improvement.lower() != "yes":
                break

            # Refine based on critique
            refined = self.refine(
                question=question,
                answer=answer,
                critique=critique.critique
            )
            answer = refined.improved_answer

        return dspy.Prediction(answer=answer)
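
The stop condition compares `needs_improvement` to the exact string "yes", so an output like "Yes." or "yes, the answer lacks detail" would end the loop even though the critic asked for another pass. A more forgiving parser is sketched below. It is a hypothetical helper, and the set of accepted affirmative words is an assumption.

```python
import re


def wants_improvement(flag):
    """Interpret a free-text yes/no field; anything not clearly affirmative means stop."""
    if isinstance(flag, bool):
        return flag
    # Look only at the first word, ignoring case and trailing punctuation.
    match = re.match(r"\s*([a-z]+)", str(flag).lower())
    return match is not None and match.group(1) in {"yes", "y", "true"}


for raw in ("yes", "Yes.", "yes, the answer lacks detail", "No", ""):
    print(repr(raw), "->", wants_improvement(raw))
```

Treating unclear output as "stop" keeps the loop bounded by both `max_iterations` and the parser, so a malformed critique cannot trigger extra refinement calls.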

Comparison with Traditional Approaches

DSPy vs Manual Prompting


┌─────────────────────────────────────────────────────────────┐
│              DSPy vs MANUAL PROMPTING                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Aspect          │ Manual Prompting │ DSPy                  │
│  ────────────────┼──────────────────┼─────────────────────  │
│  Prompt creation │ Hand-written     │ Auto-generated        │
│  Optimization    │ Trial and error  │ Systematic algorithms │
│  Maintainability │ Difficult        │ Modular, structured   │
│  Portability     │ Model-specific   │ Model-agnostic        │
│  Reproducibility │ Low              │ High (data-driven)    │
│  Debugging       │ Print statements │ Traces, assertions    │
│  Testing         │ Ad-hoc           │ Metric-based          │
│                                                             │
└─────────────────────────────────────────────────────────────┘

DSPy vs LangChain


┌─────────────────────────────────────────────────────────────┐
│              DSPy vs LANGCHAIN                               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Aspect          │ LangChain        │ DSPy                  │
│  ────────────────┼──────────────────┼─────────────────────  │
│  Philosophy      │ Chaining tools   │ Programming LMs       │
│  Prompts         │ Templates        │ Compiled from data    │
│  Optimization    │ Manual           │ Automatic             │
│  Abstraction     │ Chains, agents   │ Signatures, modules   │
│  Focus           │ Orchestration    │ Optimization          │
│  Learning curve  │ Moderate         │ Steeper               │
│                                                             │
│  They're complementary - LangChain for orchestration,       │
│  DSPy for optimization. Can be used together.               │
│                                                             │
└─────────────────────────────────────────────────────────────┘

When to Use DSPy


USE DSPy WHEN:
├── You have training data (even small amounts)
├── You need reproducible, optimized prompts
├── You're building production LM applications
├── You want modular, testable code
├── Prompt quality matters significantly
└── You're willing to invest in the learning curve

USE SIMPLER APPROACHES WHEN:
├── One-off scripts or experiments
├── No training data available
├── Simple, single-prompt tasks
├── Rapid prototyping is priority
└── Team is unfamiliar with DSPy

Summary

DSPy represents a paradigm shift in how we build LLM applications:


┌─────────────────────────────────────────────────────────────┐
│                    DSPY SUMMARY                              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  CORE PHILOSOPHY:                                           │
│  "Program, don't prompt"                                    │
│  - Prompts are compiled artifacts, not source code          │
│  - Optimization is data-driven, not intuition-driven        │
│                                                             │
│  KEY ABSTRACTIONS:                                          │
│  ├── Signatures: Define WHAT (input/output contracts)       │
│  ├── Modules: Define HOW (composable logic units)           │
│  ├── Predictors: Interface to LMs                           │
│  └── Optimizers: Improve programs automatically             │
│                                                             │
│  OPTIMIZATION REQUIRES:                                     │
│  ├── Program (modules to optimize)                          │
│  ├── Metric (how to measure success)                        │
│  └── Data (examples to learn from)                          │
│                                                             │
│  BUILT-IN MODULES:                                          │
│  ├── Predict: Direct mapping                                │
│  ├── ChainOfThought: Reasoning + answer                     │
│  ├── ReAct: Reasoning + actions                             │
│  ├── ProgramOfThought: Code generation + execution          │
│  └── Retrieve: RAG integration                              │
│                                                             │
│  OPTIMIZERS:                                                │
│  ├── BootstrapFewShot: Select few-shot examples             │
│  ├── MIPROv2: Joint instruction + example optimization      │
│  ├── COPRO: Instruction optimization                        │
│  ├── BootstrapFinetune: Create fine-tuning data             │
│  └── GEPA: Evolutionary + Pareto optimization               │
│                                                             │
│  BENEFITS:                                                  │
│  ├── Systematic optimization                                │
│  ├── Modular, maintainable code                             │
│  ├── Model-agnostic programs                                │
│  ├── Reproducible results                                   │
│  └── Production-ready patterns                              │
│                                                             │
└─────────────────────────────────────────────────────────────┘

The DSPy Mental Model


Traditional LLM Development:
  You → (write prompt) → Prompt → LLM → Output

DSPy Development:
  You → (write program) → Program
  Data + Metric → Optimizer → (compiles) → Optimized Prompt
  Optimized Prompt → LLM → Output

The key difference: You focus on the PROGRAM and METRICS,
DSPy handles the PROMPT.

Getting Started Checklist


1. [ ] Install DSPy: pip install dspy
2. [ ] Configure LM: dspy.configure(lm=dspy.LM("..."))
3. [ ] Define signatures for your tasks
4. [ ] Build modules that compose signatures
5. [ ] Collect training examples
6. [ ] Define a metric function
7. [ ] Choose an optimizer (start with BootstrapFewShot)
8. [ ] Compile and evaluate
9. [ ] Iterate on data, metrics, and program structure
10. [ ] Deploy compiled program
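
Step 6 is often the least familiar part, so here is a minimal sketch of a metric. It follows the `(example, pred, trace=None)` calling convention that DSPy metrics use; the `SimpleNamespace` objects are stand-ins for a training example and a prediction so the snippet runs without DSPy or an LM, and the `answer` field name is an assumption carried over from the "question -> answer" signature.

```python
from types import SimpleNamespace


def exact_match(example, pred, trace=None):
    """True when the predicted answer equals the gold answer, ignoring case and outer whitespace."""
    return example.answer.strip().lower() == pred.answer.strip().lower()


# Stand-ins for a training example and two candidate predictions.
gold = SimpleNamespace(question="What is the capital of France?", answer="Paris")
good = SimpleNamespace(answer=" paris ")
bad = SimpleNamespace(answer="Lyon")

scores = [exact_match(gold, p) for p in (good, bad)]
accuracy = sum(scores) / len(scores)
print(scores, accuracy)
```

Starting with a strict, cheap metric like this makes early optimizer runs easy to interpret; it can later be relaxed or replaced once the failure modes are clearer.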

DSPy transforms LLM application development from an art into an engineering discipline. By treating prompts as compiled artifacts and providing systematic optimization, it enables building reliable, maintainable, and high-performing LLM applications.
