Dafny AST-aware chunker for cocoindex-code (ccc) using the official Dafny parser (DafnyCore.dll) via pythonnet.
Uses Dafny's own C# parser for perfect parsing of all Dafny syntax — no parse errors, correct handling of abstract modules, refinement, includes, codatatypes, greatest/least predicates, and all language features.
Every indexed declaration is a self-contained chunk with metadata:
// Module: Std.Collections.Seq
// Kind: ghost lemma
// Name: SortedUnique
lemma SortedUnique<T(!new)>(xs: seq<T>, ys: seq<T>, R: (T, T) -> bool)
requires SortedBy(R, xs)
requires SortedBy(R, ys)
requires TotalOrdering(R)
requires multiset(xs) == multiset(ys)
ensures xs == ys
{ ... }
Module and class headers show the full API surface — imports, types, fields, constructors, and method signatures with contracts:
// Module: Std.Collections.Seq
// Kind: module
// Name: Seq
module Std.Collections.Seq {
import opened Wrappers
import opened Relations
import Math
function First<T>(xs: seq<T>): T
requires |xs| > 0
function DropFirst<T>(xs: seq<T>): seq<T>
requires |xs| > 0
ghost lemma LemmaSplitAt<T>(xs: seq<T>, pos: nat)
requires pos < |xs|
ensures xs[..pos] + xs[pos..] == xs
function MergeSortBy<T>(lessThanOrEq: (T, T) -> bool, a: seq<T>): seq<T>
requires TotalOrdering(lessThanOrEq)
ensures multiset(a) == multiset(result)
ensures SortedBy(lessThanOrEq, result)
...
}
// Module:— fully qualified module path (use forimportstatements)// Kind:— declaration kind with modifiers (ghost lemma,opaque function,abstract module,codatatype, etc.)// Name:— declaration name- Source — the complete declaration including signature, contracts (
requires/ensures), and body
- Import and export declarations
- Type declarations and constants
- Constructor (full source for classes)
- All method/function/lemma/predicate signatures with:
- Generic type parameters (
<T, U>) - Parameters (
(xs: seq<T>, pos: nat)) - Named return types (
(m': imap<X, Y>)) - Pre/postconditions, reads/modifies/decreases clauses
- Generic type parameters (
Headers are summaries only — they don't include method bodies. Search for a specific declaration by name to get the full implementation.
-
.NET 8.0+ runtime — download
-
DafnyCore.dll and dependencies — download from a Dafny release:
# Download and extract the latest Dafny release for your platform # e.g., dafny-4.x.x-x64-win.zip, dafny-4.x.x-x64-linux.tar.gz, etc.
The DLLs are in the
dafny/directory of the release archive. SetDAFNY_CORE_DLL_PATHto that directory (it contains DafnyCore.dll and all Boogie dependencies).Alternatively, build from source:
git clone https://github.com/dafny-lang/dafny.git cd dafny/Source/DafnyCore dotnet publish -c Release -o ~/dafny-lib
-
cocoindex-code (
ccc) — version 0.2.9+ with custom chunker support:uv tool install cocoindex-code --prerelease explicit \ --with "cocoindex>=1.0.0a24"Or install from the latest source if the chunker feature isn't released yet.
Install ccc with this chunker and pythonnet:
uv tool install cocoindex-code --prerelease explicit \
--with "cocoindex>=1.0.0a24" \
--with pythonnet \
--with cocoindex-code-dafnyOr if installing from a local checkout:
uv tool install --editable /path/to/cocoindex-code --prerelease explicit \
--with "cocoindex>=1.0.0a24" \
--with pythonnet \
--with "cocoindex-code-dafny @ file:///path/to/cocoindex-code-dafny-v2"Edit ~/.cocoindex_code/global_settings.yml:
envs:
DAFNY_CORE_DLL_PATH: /path/to/dafny-libReplace /path/to/dafny-lib with the directory containing DafnyCore.dll from the publish step.
In the same ~/.cocoindex_code/global_settings.yml, configure an embedding model. We recommend nomic-embed-code for Dafny code:
# Local model (no API key needed)
embedding:
provider: sentence-transformers
model: nomic-ai/nomic-embed-code
device: cpu# Or via LM Studio / OpenAI-compatible API
embedding:
provider: litellm
model: openai/text-embedding-nomic-embed-code
envs:
OPENAI_API_BASE: http://localhost:1234/v1
OPENAI_API_KEY: not-needed
DAFNY_CORE_DLL_PATH: /path/to/dafny-libSee the cocoindex-code docs for more embedding options.
cd /path/to/your/dafny/project
ccc initEdit .cocoindex_code/settings.yml to include .dfy files and register the chunker:
include_patterns:
- '**/*.dfy'
language_overrides:
- ext: dfy
lang: dafny
chunkers:
- ext: dfy
module: cocoindex_code_dafny:dafny_chunkerccc index
ccc search "lemma about sorted sequences"
ccc search "predicate checking if map is injective"
ccc search "multiset(xs) == multiset(f(xs)) && sorted(f(xs))"The chunker works transparently with ccc's MCP server, exposing a search tool that returns file paths, line numbers, and chunk content:
# For Claude Code
claude mcp add cocoindex-code -- ccc mcp
# For Codex
codex mcp add cocoindex-code -- ccc mcpEach search result includes:
file_path— relative path to the source filestart_line/end_line— line range in the filecontent— the chunk text with metadata headerslanguage—"dafny"for Dafny chunksscore— similarity score (0-1, higher is better)
- Module headers are summaries — they show signatures but not implementations. Follow up with a name-specific query to get the full body.
- Code-pattern queries work — the embedding model understands Dafny syntax:
search("requires Valid() ensures fresh(Repr)") - Use the module path for imports —
// Module: Std.Collections.Seqmeansimport opened Std.Collections.Seq - Filter by language — use
languages: ["dafny"]to exclude non-Dafny files
See DAFNY_SEARCH_SKILL.md for a comprehensive guide to effective Dafny code search.
To index multiple Dafny codebases under one ccc project, create directory junctions:
# Windows
mklink /J Dafny44 C:\path\to\your\other\dafny\project
# Linux/macOS
ln -s /path/to/your/other/dafny/project Dafny44Then ccc index will walk into the junction and index those files too.
MIT