Skip to content

hath995/cocoindex-code-dafny-v2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

cocoindex-code-dafny

Dafny AST-aware chunker for cocoindex-code (ccc) using the official Dafny parser (DafnyCore.dll) via pythonnet.

Uses Dafny's own C# parser for perfect parsing of all Dafny syntax — no parse errors, correct handling of abstract modules, refinement, includes, codatatypes, greatest/least predicates, and all language features.

What it produces

Every indexed declaration is a self-contained chunk with metadata:

// Module: Std.Collections.Seq
// Kind: ghost lemma
// Name: SortedUnique

  lemma SortedUnique<T(!new)>(xs: seq<T>, ys: seq<T>, R: (T, T) -> bool)
    requires SortedBy(R, xs)
    requires SortedBy(R, ys)
    requires TotalOrdering(R)
    requires multiset(xs) == multiset(ys)
    ensures xs == ys
  { ... }

Module and class headers show the full API surface — imports, types, fields, constructors, and method signatures with contracts:

// Module: Std.Collections.Seq
// Kind: module
// Name: Seq

module Std.Collections.Seq {
    import opened Wrappers
    import opened Relations
    import Math

    function First<T>(xs: seq<T>): T
        requires |xs| > 0
    function DropFirst<T>(xs: seq<T>): seq<T>
        requires |xs| > 0
    ghost lemma LemmaSplitAt<T>(xs: seq<T>, pos: nat)
        requires pos < |xs|
        ensures xs[..pos] + xs[pos..] == xs
    function MergeSortBy<T>(lessThanOrEq: (T, T) -> bool, a: seq<T>): seq<T>
        requires TotalOrdering(lessThanOrEq)
        ensures multiset(a) == multiset(result)
        ensures SortedBy(lessThanOrEq, result)
    ...
}

What's in a chunk

  • // Module: — fully qualified module path (use for import statements)
  • // Kind: — declaration kind with modifiers (ghost lemma, opaque function, abstract module, codatatype, etc.)
  • // Name: — declaration name
  • Source — the complete declaration including signature, contracts (requires/ensures), and body

What's in a module/class header

  • Import and export declarations
  • Type declarations and constants
  • Constructor (full source for classes)
  • All method/function/lemma/predicate signatures with:
    • Generic type parameters (<T, U>)
    • Parameters ((xs: seq<T>, pos: nat))
    • Named return types ((m': imap<X, Y>))
    • Pre/postconditions, reads/modifies/decreases clauses

Headers are summaries only — they don't include method bodies. Search for a specific declaration by name to get the full implementation.

Prerequisites

  1. .NET 8.0+ runtimedownload

  2. DafnyCore.dll and dependencies — download from a Dafny release:

    # Download and extract the latest Dafny release for your platform
    # e.g., dafny-4.x.x-x64-win.zip, dafny-4.x.x-x64-linux.tar.gz, etc.

    The DLLs are in the dafny/ directory of the release archive. Set DAFNY_CORE_DLL_PATH to that directory (it contains DafnyCore.dll and all Boogie dependencies).

    Alternatively, build from source:

    git clone https://github.com/dafny-lang/dafny.git
    cd dafny/Source/DafnyCore
    dotnet publish -c Release -o ~/dafny-lib
  3. cocoindex-code (ccc) — version 0.2.9+ with custom chunker support:

    uv tool install cocoindex-code --prerelease explicit \
      --with "cocoindex>=1.0.0a24"

    Or install from the latest source if the chunker feature isn't released yet.

Installation

Install ccc with this chunker and pythonnet:

uv tool install cocoindex-code --prerelease explicit \
  --with "cocoindex>=1.0.0a24" \
  --with pythonnet \
  --with cocoindex-code-dafny

Or if installing from a local checkout:

uv tool install --editable /path/to/cocoindex-code --prerelease explicit \
  --with "cocoindex>=1.0.0a24" \
  --with pythonnet \
  --with "cocoindex-code-dafny @ file:///path/to/cocoindex-code-dafny-v2"

Setup

1. Configure the DafnyCore DLL path

Edit ~/.cocoindex_code/global_settings.yml:

envs:
  DAFNY_CORE_DLL_PATH: /path/to/dafny-lib

Replace /path/to/dafny-lib with the directory containing DafnyCore.dll from the publish step.

2. Configure the embedding model

In the same ~/.cocoindex_code/global_settings.yml, configure an embedding model. We recommend nomic-embed-code for Dafny code:

# Local model (no API key needed)
embedding:
  provider: sentence-transformers
  model: nomic-ai/nomic-embed-code
  device: cpu
# Or via LM Studio / OpenAI-compatible API
embedding:
  provider: litellm
  model: openai/text-embedding-nomic-embed-code
envs:
  OPENAI_API_BASE: http://localhost:1234/v1
  OPENAI_API_KEY: not-needed
  DAFNY_CORE_DLL_PATH: /path/to/dafny-lib

See the cocoindex-code docs for more embedding options.

3. Initialize and configure a project

cd /path/to/your/dafny/project
ccc init

Edit .cocoindex_code/settings.yml to include .dfy files and register the chunker:

include_patterns:
  - '**/*.dfy'

language_overrides:
  - ext: dfy
    lang: dafny

chunkers:
  - ext: dfy
    module: cocoindex_code_dafny:dafny_chunker

4. Index and search

ccc index
ccc search "lemma about sorted sequences"
ccc search "predicate checking if map is injective"
ccc search "multiset(xs) == multiset(f(xs)) && sorted(f(xs))"

MCP Server for AI Agents

The chunker works transparently with ccc's MCP server, exposing a search tool that returns file paths, line numbers, and chunk content:

# For Claude Code
claude mcp add cocoindex-code -- ccc mcp

# For Codex
codex mcp add cocoindex-code -- ccc mcp

Each search result includes:

  • file_path — relative path to the source file
  • start_line / end_line — line range in the file
  • content — the chunk text with metadata headers
  • language"dafny" for Dafny chunks
  • score — similarity score (0-1, higher is better)

Search tips for agents

  • Module headers are summaries — they show signatures but not implementations. Follow up with a name-specific query to get the full body.
  • Code-pattern queries work — the embedding model understands Dafny syntax: search("requires Valid() ensures fresh(Repr)")
  • Use the module path for imports// Module: Std.Collections.Seq means import opened Std.Collections.Seq
  • Filter by language — use languages: ["dafny"] to exclude non-Dafny files

See DAFNY_SEARCH_SKILL.md for a comprehensive guide to effective Dafny code search.

Including additional directories

To index multiple Dafny codebases under one ccc project, create directory junctions:

# Windows
mklink /J Dafny44 C:\path\to\your\other\dafny\project

# Linux/macOS
ln -s /path/to/your/other/dafny/project Dafny44

Then ccc index will walk into the junction and index those files too.

License

MIT

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages