codira: your code ferret
codira is a repository-local indexing and context retrieval tool for
agent-assisted development. It gives coding agents a concise, deterministic
map of the codebase so they can answer focused questions, find the right files,
and start edits with less broad scanning.
The practical effect is simple: when codira is used alongside a coding
agent, the same task can often be handled with fewer tokens because the agent
receives a small, relevant context pack instead of rediscovering the repository
from scratch.
codira builds a SQLite index inside the target repository and supports exact
symbol lookup, docstring auditing, deterministic local semantic embeddings,
static call and callable-reference inspection, plugin discovery, and
deterministic context generation for natural-language queries.
The current release indexes mixed-language repositories through registered language analyzers:
- Python via the first-party `codira-analyzer-python` plugin
- JSON via the first-party `codira-analyzer-json` plugin for JSON Schema, `package.json`, and `.releaserc.json`
- C-family `*.c` and `*.h` files via the first-party `codira-analyzer-c` plugin backed by `tree-sitter-c`
- Bash scripts via the first-party `codira-analyzer-bash` plugin backed by `tree-sitter-bash`
- SQLite persistence via the first-party `codira-backend-sqlite` backend plugin
Storage and query persistence remain SQLite-backed through the active backend registry.
Coding agents are strongest when they start with the right local facts. codira
turns the repository into a compact map of symbols, docstring issues, semantic
matches, call edges, callable references, and plugin coverage. Instead of asking
an agent to spend a large part of the context window scanning files, you can run
one focused command and hand it the relevant slice.
That usually means:
- fewer tokens spent on broad repository exploration
- fewer repeated file reads for the same task
- more deterministic handoffs between human intent and agent action
- a clearer audit trail for why a file or symbol was considered relevant
The repository-local operational and contributor documentation is organized
under docs/.
Start with:
- `docs/getting_started.md`
- `docs/CONTRIBUTING.md`
- `docs/architecture/index.md`
- `docs/plugins/index.md`
- `docs/release/checklist.md`
- `docs/release/process.md`
- `docs/process/branching.md`
- `docs/process/decisions.md`
- `docs/adr/index.md`
The project documentation site uses the project badge on its landing page and the small badge as the MkDocs favicon.
For the official runtime with the first-party analyzers and SQLite backend:
```bash
pip install codira-bundle-official
```

`codira` and the official bundle are published on PyPI. If you only need to use the tool and do not need a development branch, prefer the published package instead of an editable checkout.
The installed command is:
```bash
codira --help
```

For a core-only install:

```bash
pip install codira
```

Install codira into the virtual environment of the repository you want to analyze.
Example: from a target repository such as Fontshow:
```bash
source .venv/bin/activate
pip install -e ../codira
```

The editable install keeps the `codira` CLI available in the target repository's virtual environment while still using the live source tree from this repository.
Developer automation is uv-based for dependency resolution and lockfile
maintenance. Local validation and CI execute against the uv-managed .venv
and the editable first-party package set.
Install optional analyzer packages only when needed. For repository-local development inside this repo, the bootstrap flow installs the official first-party packages automatically through:
```bash
uv run python scripts/install_first_party_packages.py \
  --include-core \
  --core-extra dev \
  --core-extra docs \
  --core-extra semantic
```

For an editable install into another repository with the current source tree:

```bash
source .venv/bin/activate
uv run python ../codira/scripts/install_first_party_packages.py \
  --python "$VIRTUAL_ENV/bin/python" \
  --include-core \
  --core-extra semantic
```

The published end-user bundle is `codira-bundle-official`. Inside the current checkout, install the extracted first-party analyzers and backend from `packages/`; the canonical local install set is owned by `scripts/install_first_party_packages.py`.
Use `codira plugins` to inspect discovery. The report marks each plugin as `origin=core`, `origin=first_party`, or `origin=third_party`.
After the completed ADR-004 migration work, the current architecture is:

- one active backend per repository instance, selected through `codira.registry`
- SQLite as the default first-party backend, distributed through `codira-backend-sqlite`
- multiple language analyzers in one indexing run
- deterministic mixed-language indexing for tracked Python, supported JSON, Bash, and C-family files
- query-time retrieval planning with deterministic intent families for behavior, test, configuration, API-surface, and architecture/navigation queries
The first-party JSON analyzer is intentionally family-based rather than generic. It currently supports:
- JSON Schema documents
- npm-style `package.json` manifests
- semantic-release `.releaserc.json` files
It intentionally does not claim lockfiles, VS Code JSONC settings, or generic unclassified JSON blobs.
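The family-based approach can be pictured as a small classifier that claims a file only when it matches a known family. The sketch below is hypothetical: `classify_json_family` and its detection rules (exact file names for `package.json` and `.releaserc.json`, a `$schema` URL on `json-schema.org` for JSON Schema) are illustrative assumptions, not the analyzer's actual code.

```python
from __future__ import annotations

import json
from pathlib import PurePath
from typing import Optional


def classify_json_family(path: PurePath, text: str) -> Optional[str]:
    """Return a family name for a supported JSON file, or None if unclaimed.

    Hypothetical rules for illustration; the real analyzer may decide differently.
    """
    # Manifest-style families are recognized by their exact file names.
    if path.name == "package.json":
        return "npm-manifest"
    if path.name == ".releaserc.json":
        return "semantic-release"
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # JSONC (comments, trailing commas) is not strict JSON, so it is not claimed.
        return None
    # JSON Schema documents usually point "$schema" at a json-schema.org dialect.
    if isinstance(data, dict) and "json-schema.org" in str(data.get("$schema", "")):
        return "json-schema"
    # Lockfiles and other unclassified JSON blobs fall through unclaimed.
    return None


print(classify_json_family(PurePath("package.json"), "{}"))
print(classify_json_family(PurePath("package-lock.json"), '{"lockfileVersion": 3}'))
```

Returning `None` rather than a generic "json" family keeps the analyzer's claims narrow, which matches the intent of not covering arbitrary JSON.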
The detailed architecture and migration record live under:
- `docs/architecture/index.md`
- `docs/adr/ADR-004-pluggable-backends-migration-plan.md`
Build or refresh the repository-local index:
```bash
codira index
```

Indexing also precomputes local deterministic embeddings for indexed symbols. Unchanged files are reused by default.
Index a repository other than the current working directory and store `.codira` state somewhere else:

```bash
codira index --path /mnt/readonly/repo --output-dir /tmp/codira-run
```

Use environment variables when you want the same target and output roots across multiple commands:

```bash
export CODIRA_TARGET_DIR=/mnt/readonly/repo
export CODIRA_OUTPUT_DIR=/tmp/codira-run
codira index
codira sym build_parser
codira ctx "schema migration rules"
```

Force a full rebuild:

```bash
codira index --full
```

Show incremental reuse decisions:

```bash
codira index --explain
```

Inspect canonical-directory analyzer coverage without building the index:
```bash
codira cov
codira cov --json
```

Coverage checks tracked files under `src/`, `tests/`, and `scripts/`. A file is considered covered only when some active analyzer both discovers it and returns `True` from `supports_path()`.
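The coverage rule can be expressed as a small check. This is a sketch under stated assumptions: the `Analyzer` record, its `discovers` predicate, and the call shape of `supports_path()` are hypothetical stand-ins for codira's plugin interface; only the "discovers and supports" rule and the three canonical directories come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, Iterable, List, Sequence

# Canonical directories named in the coverage description.
CANONICAL_DIRS = ("src", "tests", "scripts")


@dataclass(frozen=True)
class Analyzer:
    """Hypothetical stand-in for an analyzer plugin."""

    name: str
    discovers: Callable[[PurePosixPath], bool]      # assumed discovery predicate
    supports_path: Callable[[PurePosixPath], bool]  # mirrors supports_path()


def uncovered_files(tracked: Iterable[str], analyzers: Sequence[Analyzer]) -> List[str]:
    """Return tracked canonical-directory files that no active analyzer covers."""
    missing = []
    for raw in sorted(tracked):
        path = PurePosixPath(raw)
        # Files outside src/, tests/, and scripts/ do not count toward coverage.
        if not path.parts or path.parts[0] not in CANONICAL_DIRS:
            continue
        # Covered means a single analyzer both discovers the file and accepts it.
        if not any(a.discovers(path) and a.supports_path(path) for a in analyzers):
            missing.append(raw)
    return missing


python_like = Analyzer(
    name="python",
    discovers=lambda p: p.suffix == ".py",
    supports_path=lambda p: True,
)
print(uncovered_files(["src/pkg/a.py", "scripts/build.sh", "docs/index.md"], [python_like]))
```

Requiring both conditions from the same analyzer means a file that one plugin discovers but rejects is still reported, which is the kind of gap `--require-full-coverage` is meant to surface.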
Require full canonical coverage before indexing:
```bash
codira index --require-full-coverage
```

Audit indexed docstrings:

```bash
codira audit
codira audit --json
codira audit --prefix src/codira/query
```

For Python callables, `audit` applies Python-aware result-section rules:

- regular functions should document `Returns` and not `Yields`
- generator and async-generator functions should document `Yields`
- generators may also document `Returns` only when they explicitly use `return <value>` to produce a terminal `StopIteration.value`
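These rules depend on whether a function body yields and whether it returns a value. A rough approximation using the standard-library `ast` module is sketched below; `allowed_result_sections` is a hypothetical helper, not codira's audit implementation, and it only inspects top-level functions.

```python
from __future__ import annotations

import ast
from typing import Iterator, Union

FuncNode = Union[ast.FunctionDef, ast.AsyncFunctionDef]
# Nested scopes whose yields/returns belong to a different function.
_SCOPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Lambda)


def _own_nodes(func: FuncNode) -> Iterator[ast.AST]:
    """Yield nodes of func's body, skipping nested functions, classes, and lambdas."""
    stack = [node for node in func.body if not isinstance(node, _SCOPES)]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(c for c in ast.iter_child_nodes(node) if not isinstance(c, _SCOPES))


def allowed_result_sections(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the docstring result sections it may use."""
    allowed: dict[str, set[str]] = {}
    for func in ast.parse(source).body:
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        nodes = list(_own_nodes(func))
        is_generator = any(isinstance(n, (ast.Yield, ast.YieldFrom)) for n in nodes)
        returns_value = any(isinstance(n, ast.Return) and n.value is not None for n in nodes)
        if not is_generator:
            allowed[func.name] = {"Returns"}  # regular function
        elif returns_value and isinstance(func, ast.FunctionDef):
            allowed[func.name] = {"Yields", "Returns"}  # return <value> sets StopIteration.value
        else:
            allowed[func.name] = {"Yields"}  # plain or async generator
    return allowed


example = "def g():\n    yield 1\n    return 2\n"
print(sorted(allowed_result_sections(example)["g"]))
```

Async generators never reach the `{"Yields", "Returns"}` branch because Python rejects `return <value>` inside them.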
Query exact symbols:
```bash
codira sym build_parser
codira sym build_parser --json
codira sym build_parser --prefix src/codira
```

List indexed symbols with graph metrics:
```bash
codira symlist
codira symlist --json
codira symlist --limit 20
```

Inspect embedding-only matches and backend metadata:
```bash
codira emb "schema migration rules"
codira emb "schema migration rules" --json
codira emb "schema migration rules" --prefix src/codira/query
```

Inspect static call edges:
```bash
codira calls context_for
codira calls context_for --json
codira calls context_for --tree
codira calls context_for --tree --dot
codira calls imported_helper --module pkg.b --incoming
codira calls imported_helper --module pkg.b --incoming --prefix src/codira/query
```

Inspect callable-object references such as registry bindings:
```bash
codira refs _retrieve_script_candidates --module codira.query.context --incoming
codira refs _retrieve_script_candidates --incoming --json
codira refs _retrieve_script_candidates --incoming --tree
codira refs _retrieve_script_candidates --incoming --tree --dot
codira refs _retrieve_script_candidates --incoming --prefix src/codira/query
```

Generate deterministic context for a natural-language query:
```bash
codira ctx "missing numpy docstring"
codira ctx "missing numpy docstring" --prefix src/codira
```

Embedding-assisted retrieval works best for natural-language queries such as:

```bash
codira ctx "schema migration rules"
```

Emit structured JSON for agent workflows:
```bash
codira ctx "missing numpy docstring" --json
codira ctx "missing numpy docstring" --json --prefix src/codira/query
```

Emit a prompt-oriented view:

```bash
codira ctx "parse inventory validation flow" --prompt
```

Use the subcommands in roughly this order during development and maintenance: refresh the index, inspect exact symbols or relations, then ask for broader task-oriented context.
Use `index` to build or refresh the repository-local snapshot that every other
retrieval command depends on.
Suggested use cases:
- first use in a repository
- after switching branches
- after rebases, pulls, or merges
- after significant code or structure changes
- before trusting results from `sym`, `calls`, `refs`, `ctx`, or `audit`

Examples:

```bash
codira index
codira index --require-full-coverage
```

Expected result semantics:

- refreshes `.codira/` for the current working tree
- makes later queries deterministic against the current indexed state
- should be rerun when you suspect the current snapshot is stale
Use `sym` when you already know the exact symbol name and want the indexed
definition sites.
Suggested use cases:
- jump to `build_parser`, `context_for`, or another known symbol
- confirm exact defining files before editing
- narrow down repeated symbol names with `--prefix`

Examples:

```bash
codira sym build_parser
codira sym build_parser --json
codira sym build_parser --prefix src/codira
```

Expected result semantics:
- returns exact symbol-name matches, not semantic approximations
- is best when the symbol name is already known
Use `symlist` when you want an indexed symbol inventory with static call and
callable-reference connectivity counts.
Suggested use cases:
- inspect repository structure after indexing
- identify symbols with many incoming calls or references
- export a deterministic symbol inventory for tooling with `--json`
- include test symbols explicitly with `--include-tests`

Examples:

```bash
codira symlist
codira symlist --json
codira symlist --limit 20
codira symlist --include-tests
codira symlist --prefix src/codira
```

Expected result semantics:

- sorts symbols by `(module, name)` before applying `--limit`
- excludes `tests` modules by default
- reports `calls_out`, `calls_in`, `refs_out`, and `refs_in` counts
- includes unresolved outgoing graph edges in total and unresolved counts
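Sorting before limiting is what keeps `--limit` deterministic: the same index always yields the same first N rows. A minimal sketch of that ordering, assuming a hypothetical `SymbolRow` record and treating any module with a `tests` path component as a test module (codira's exact test-module rule may differ):

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SymbolRow:
    """Hypothetical inventory row; the count fields mirror the reported columns."""

    module: str
    name: str
    calls_out: int = 0
    calls_in: int = 0
    refs_out: int = 0
    refs_in: int = 0


def is_test_module(module: str) -> bool:
    # Assumption: any dotted module with a "tests" component counts as a test module.
    return "tests" in module.split(".")


def symbol_inventory(
    rows: Iterable[SymbolRow], limit: Optional[int] = None, include_tests: bool = False
) -> List[SymbolRow]:
    """Filter, sort by (module, name), and only then apply the limit."""
    kept = [r for r in rows if include_tests or not is_test_module(r.module)]
    kept.sort(key=lambda r: (r.module, r.name))
    return kept if limit is None else kept[:limit]
```

Applying the limit before sorting would make the first N rows depend on storage order, which the documented ordering avoids.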
Use `emb` to inspect the embedding channel by itself.
Suggested use cases:
- debug semantic recall
- inspect backend metadata and raw embedding-ranked matches
- compare embedding-only behavior with `ctx`

Examples:

```bash
codira emb "schema migration rules"
codira emb "schema migration rules" --json
```

Expected result semantics:

- shows embedding-ranked matches only
- does not include the multi-channel merge used by `ctx`
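To make the idea of a deterministic local embedding concrete, here is a toy feature-hashing embedder with cosine ranking. It is not codira's embedding backend: the tokenizer, the 64-dimension size, and the BLAKE2 hashing scheme are all assumptions chosen to show how such a backend can be reproducible without an external model.

```python
from __future__ import annotations

import hashlib
import math
import re
from typing import Dict, List, Tuple

DIM = 64  # assumed vector size, chosen only for illustration


def embed(text: str, dim: int = DIM) -> List[float]:
    """Deterministic hashed bag-of-words embedding (illustrative only)."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # blake2b is stable across processes, unlike Python's salted hash().
        digest = hashlib.blake2b(token.encode(), digest_size=8).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] & 1 else -1.0
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def rank(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank documents by cosine similarity; ties break on the key for determinism."""
    q = embed(query)
    scored = [(key, sum(a * b for a, b in zip(q, embed(text)))) for key, text in docs.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Because neither the hash nor the tie-breaking depends on process state, repeated runs produce identical vectors and identical rankings, the property a deterministic embedding channel needs.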
Use `calls` to inspect direct indexed static call edges.
Suggested use cases:
- see what a function directly calls
- see who directly calls a function with `--incoming`
- render a bounded traversal with `--tree`
- export a bounded traversal as Graphviz DOT with `--tree --dot`

Examples:

```bash
codira calls context_for
codira calls context_for --incoming
codira calls context_for --tree
codira calls context_for --tree --dot
```

Expected result semantics:

- covers direct static call edges only
- tree mode remains bounded by `--max-depth` and `--max-nodes`
- DOT export is opt-in and only available for bounded tree mode
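The bounded tree mode can be pictured as a breadth-first walk with two budgets. The sketch below is an illustration under assumptions: `bounded_tree` and `to_dot` are hypothetical names, the graph is a plain adjacency dict, and codira's actual traversal order and DOT layout may differ.

```python
from __future__ import annotations

from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def bounded_tree(edges: Dict[str, List[str]], root: str, max_depth: int, max_nodes: int) -> List[Edge]:
    """Breadth-first traversal that stops at max_depth hops or max_nodes distinct nodes."""
    seen = {root}
    out: List[Edge] = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for target in sorted(edges.get(node, [])):  # sorted for deterministic output
            if target not in seen:
                if len(seen) >= max_nodes:
                    return out  # node budget exhausted
                seen.add(target)
                queue.append((target, depth + 1))
            out.append((node, target))  # cycles are kept as edges to seen nodes
    return out


def to_dot(tree: List[Edge], name: str = "calls") -> str:
    """Render collected edges as a Graphviz DOT digraph."""
    lines = [f"digraph {name} {{"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in tree]
    lines.append("}")
    return "\n".join(lines)


graph = {"context_for": ["rank", "expand"], "expand": ["score", "context_for"], "rank": ["score"]}
print(to_dot(bounded_tree(graph, "context_for", max_depth=2, max_nodes=10)))
```

Both budgets cap the output size independently of how large the call graph is, which is why DOT export stays tied to the bounded mode.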
Use `refs` to inspect callable-object references rather than direct call sites.
Suggested use cases:
- inspect registry bindings
- see which owners return or store a callable object
- trace incoming owners of one callable target with `--incoming`
- render or export a bounded reference tree

Examples:

```bash
codira refs _retrieve_script_candidates --incoming
codira refs _retrieve_script_candidates --incoming --tree
codira refs _retrieve_script_candidates --incoming --tree --dot
```

Expected result semantics:

- focuses on callable-object references such as registries, assignment values, and returned function objects
- is complementary to `calls`, not interchangeable with it
Use `ctx` when you have a task or question rather than an exact symbol
name.
Suggested use cases:
- understand where behavior lives for a bug fix
- prepare a maintenance or refactor pass
- gather bounded context for an agent or review workflow
- inspect retrieval diagnostics with `--explain`

Examples:

```bash
codira ctx "schema migration rules"
codira ctx "missing numpy docstring" --json
codira ctx "parse inventory validation flow" --prompt
codira ctx "missing numpy docstring" --explain
```

Expected result semantics:
- uses bounded multi-channel retrieval rather than exact lookup only
- can use bounded graph evidence during ranking
- can expand related cross-module symbols after ranking
- is a focused context pack, not a full repository report
Use `audit` to inspect indexed docstring problems directly.
Suggested use cases:
- run a documentation cleanup pass
- focus audits on one subtree with `--prefix`
- emit machine-readable results for automation with `--json`

Examples:

```bash
codira audit
codira audit --prefix src/codira/query
codira audit --json
```

Expected result semantics:

- reports indexed docstring issues, not arbitrary style suggestions
- is most useful after a fresh `codira index`
Use `plugins` to inspect which capabilities are active and where they come
from.
Suggested use cases:
- confirm whether a capability came from core, an official package, or a third-party plugin
- verify packaging and installation state in a repository
Examples:
```bash
codira plugins
codira plugins --json
```

Expected result semantics:
- reports installed or active plugin and capability surfaces
- is useful when debugging environment or packaging issues
Use `caps` when a tool, contributor, or agent needs codira to declare what it
can answer before making retrieval decisions. The longer `capabilities` command
is kept as a compatibility alias.
Suggested use cases:
- inspect the canonical ontology used by active analyzers
- verify analyzer declarations after plugin changes
- inspect command and retrieval-channel guarantees
- feed deterministic capability metadata into agent workflows
Examples:
```bash
codira caps
codira caps --json
codira caps --strict --json
```

Expected result semantics:

- exports command, channel, analyzer, and retrieval-producer declarations
- reports degraded metadata if an active analyzer does not explicitly declare ontology coverage
- fails on missing or invalid analyzer declarations only when `--strict` is set
- describes capability surfaces only; it does not index or query repository content
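The difference between the default and `--strict` behavior can be sketched as one validation pass with two failure modes. Everything here is hypothetical: `build_report`, `CapabilityReport`, and the `KNOWN_TERMS` vocabulary are illustrative, and treating invalid declarations as degraded in non-strict mode is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical ontology vocabulary; codira's canonical ontology is not reproduced here.
KNOWN_TERMS = {"symbols", "docstrings", "calls", "refs"}


@dataclass
class CapabilityReport:
    analyzers: Dict[str, List[str]] = field(default_factory=dict)
    degraded: List[str] = field(default_factory=list)


def build_report(declarations: Dict[str, Optional[List[str]]], strict: bool = False) -> CapabilityReport:
    """Collect analyzer ontology declarations; fail only in strict mode."""
    report = CapabilityReport()
    for name in sorted(declarations):
        terms = declarations[name]
        problem = None
        if not terms:
            problem = "declares no ontology coverage"
        elif any(t not in KNOWN_TERMS for t in terms):
            problem = "declares unknown ontology terms"
        if problem is None:
            report.analyzers[name] = sorted(terms)
        elif strict:
            raise ValueError(f"analyzer {name!r} {problem}")
        else:
            report.degraded.append(name)  # still reported, but marked degraded
    return report
```

In non-strict mode the report stays usable and simply flags weak declarations; strict mode turns the same findings into a hard failure suitable for CI.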
The most important cross-cutting flags are:
- `--prefix`: scope results to one subtree or file
- `--json`: machine-readable output
- `--prompt`: compact agent handoff for `ctx`
- `--explain`: retrieval diagnostics for `ctx`
- `--tree`: bounded traversal mode for `calls` and `refs`
- `--dot`: Graphviz DOT export for bounded `calls` and `refs` trees
Practical rule:
- use exact commands first when you already know what you are looking for
- use `ctx` when the task is known but the exact symbol is not
- rerun `index` whenever you would not trust the current snapshot
- always read the referenced files before patching
Use `--prefix <path>` to scope supported read/query subcommands to one
repo-root-relative directory or file.
Examples:
```bash
codira sym build_parser --prefix src/codira
codira symlist --prefix src/codira
codira emb "schema migration rules" --prefix src/codira/query
codira calls imported_helper --module pkg.b --incoming --prefix src/codira/query
codira refs _retrieve_script_candidates --incoming --prefix src/codira/query
codira audit --prefix src/codira/query
codira ctx "missing numpy docstring" --json --prefix src/codira/query
```

Semantics:

- `sym --prefix P NAME`: only symbols whose defining file is under `P`
- `symlist --prefix P`: only inventory symbols whose defining file is under `P`
- `emb --prefix P QUERY`: only matched symbols whose file is under `P`
- `ctx --prefix P QUERY`: retrieval, expansion, issues, and references are restricted to files under `P`
- `calls --prefix P NAME`: only call edges whose caller file is under `P`
- `refs --prefix P NAME`: only callable-object references whose owner file is under `P`
- `audit --prefix P`: only issues for symbols defined under `P`

`--prefix` must be relative to the repository root. It may point to either a
directory or a single file.
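A minimal sketch of the directory-or-file rule, assuming prefixes are compared path component by path component (an assumption; the exact matching codira performs is not specified here). `under_prefix` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def under_prefix(file_path: str, prefix: str) -> bool:
    """Return True when a repo-root-relative file equals the prefix or lies under it."""
    root = PurePosixPath(prefix)
    if root.is_absolute():
        raise ValueError("--prefix must be relative to the repository root")
    target = PurePosixPath(file_path)
    # Compare whole path components, so "src/codira" does not match "src/codira_extra".
    return target == root or root in target.parents


print(under_prefix("src/codira/query/context.py", "src/codira"))
print(under_prefix("src/codira_extra/x.py", "src/codira"))
```

Component-wise matching avoids the `str.startswith` pitfall where `src/codira` would also match `src/codira_extra/`, and it lets a single-file prefix match exactly that file.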
Use `--json` on the exact/query subcommands when another tool or agent needs a
machine-readable result instead of human-oriented text.
Supported subcommands:
- `sym`
- `symlist`
- `emb`
- `calls`
- `refs`
- `audit`
- `ctx`
- `caps`
Examples:
```bash
codira sym build_parser --json
codira symlist --json --limit 20
codira emb "schema migration rules" --json --prefix src/codira/query
codira calls imported_helper --module pkg.b --incoming --json
codira refs _retrieve_script_candidates --incoming --json --prefix src/codira/query
codira audit --json --prefix src/codira/query
codira ctx "missing numpy docstring" --json
```

For `sym`, `emb`, `calls`, `refs`, and `audit`, the JSON contract uses a lightweight shared envelope:
```json
{
  "schema_version": "1.0",
  "command": "symbol",
  "status": "ok",
  "query": {
    "name": "build_parser",
    "prefix": "src/codira"
  },
  "results": []
}
```

Status values:

- `ok`: one or more results were found
- `no_matches`: the filtered query returned no results
- `not_indexed`: the command requires indexed embedding data that is not present
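An agent or tool consuming this envelope mostly needs to branch on `status`. A minimal Python consumer is sketched below; `parse_envelope` is a hypothetical helper, and treating any `1.x` `schema_version` as compatible is an assumption rather than a documented guarantee.

```python
from __future__ import annotations

import json
from typing import Any, Dict, List


def parse_envelope(raw: str) -> List[Dict[str, Any]]:
    """Return the results of a query envelope, raising on unusable responses."""
    doc = json.loads(raw)
    version = str(doc.get("schema_version", ""))
    # Assumption: any 1.x schema_version keeps the same envelope fields.
    if version.split(".")[0] != "1":
        raise ValueError(f"unsupported schema_version: {version!r}")
    status = doc.get("status")
    if status == "ok":
        return list(doc.get("results", []))
    if status == "no_matches":
        return []  # a valid, empty answer rather than an error
    if status == "not_indexed":
        # Embedding data is missing; rebuilding the index is the likely remedy.
        raise RuntimeError("no indexed embedding data; run `codira index` first")
    raise ValueError(f"unknown status: {status!r}")
```

In practice, `raw` would be the standard output of a command such as `codira sym build_parser --json`. Keeping `no_matches` distinct from `not_indexed` lets a caller tell an empty answer apart from a missing index.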
`ctx --json` keeps its existing richer retrieval schema. It is not part
of the lightweight query-envelope contract above.

`symlist --json` uses an inventory schema:

```json
{
  "schema_version": "1.0",
  "status": "ok",
  "symbols": []
}
```

Use `codira ctx "<query>" --prompt` when you want a compact,
copy-ready prompt for an agent session.
Recommended use cases:
- starting a focused bug-fix task
- preparing a docstring audit pass
- analyzing an external repository before patching
- resuming work on a specific subsystem after context switching
Recommended workflow:
1. Verify likely symbols or files with `rg`.
2. Run `codira index`.
3. Run `codira ctx "<query>" --prompt`.
4. Read the returned files and symbols before editing.
The prompt view is optimized for fast operator handoff. It is not a substitute for reading the referenced files.
Use the plain text mode when you want a compact human-readable summary across the symbol, semantic, and embedding channels:
```bash
codira ctx "missing numpy docstring"
```

Use JSON when another tool or agent workflow needs structured output:

```bash
codira ctx "missing numpy docstring" --json
codira sym build_parser --json
```

Use prompt mode when you want a copy-ready task preamble:

```bash
codira ctx "parse inventory validation flow" --prompt
```

Use explain mode when you need retrieval diagnostics:

```bash
codira ctx "missing numpy docstring" --explain
```

Practical rule:

- plain text: human inspection
- `--json`: automation and downstream tooling
- `--prompt`: agent handoff
- `--explain`: debugging retrieval behavior
The emb command is a debugging surface for the embedding channel only.
Use it when you want backend metadata and raw embedding-ranked matches without
the normal multi-channel merge used by ctx.
Natural-language queries:
```bash
codira ctx "missing numpy docstring"
codira ctx "parse inventory validation flow"
codira ctx "where is schema validation performed"
codira ctx "how does release tagging work"
codira ctx "semantic merge ordering"
```

Exact symbol lookup:

```bash
codira sym build_parser
codira sym context_for
codira sym validate_docstring
```

Static call-edge inspection:

```bash
codira calls context_for
codira calls imported_helper --module pkg.b --incoming
```

Callable-reference inspection:

```bash
codira refs _retrieve_script_candidates --module codira.query.context --incoming
```

The most useful queries are usually:
- behavior-oriented
- scoped to one subsystem
- phrased in terms of the problem you are solving
Prefer specific queries over broad ones such as "project structure" or
"everything about indexing".
Rerun codira index when the repository state has changed enough that the
existing .codira/ snapshot may no longer reflect the current code.
Typical cases:
- after significant code changes
- after switching branches
- after rebases, pulls, or merges
- before a larger audit session
- before querying a repository that has not been indexed yet
The index is repository-local and intentionally conservative. Rebuilding it is cheap compared with working from stale symbol or docstring data.
Practical rule:
```bash
codira index
```

Run it again whenever you would not trust an earlier search result to describe the current working tree.
codira is a retrieval and inspection tool. It narrows search and improves
determinism, but it does not replace direct source inspection.
Important limits:
- it includes a deterministic in-repo embedding backend rather than a full external-model semantic stack
- stored embeddings carry explicit backend and version metadata so the backend can be replaced later without changing the retrieval interface
- it does not prove behavior correctness on its own
- it does not replace reading the referenced files
- it does not authorize blind edits based only on retrieved snippets
- it is only as current as the indexed repository state
- embedding recall is intentionally lightweight and local-first in the current implementation
- `codira calls` only covers direct static call sites
- `codira refs` should be used for callable-object references such as registry values, assignment values, and returned function objects
- `codira calls --tree` and `codira refs --tree` provide bounded traversal views, and `--dot` renders those bounded trees as Graphviz DOT
- `ctx` now uses bounded graph evidence during retrieval and then uses stored call and callable-reference data to pull in related cross-module symbols around top function and method matches
Recommended use:
- use `codira` to find likely files, symbols, and related issues
- use `rg` to verify concrete symbol existence
- read the actual files before patching
- rerun tests and validation after changes
Run codira from the target repository, not from the codira source
tree.
Example workflow:
1. Activate the target repository virtual environment.
2. Run `codira index`.
3. Verify candidate symbols with `rg <query>` before patching.
4. Run `codira ctx "<query>" --json`.
5. Inspect the actual files and symbols returned.
6. Apply changes only after verification.
7. Rebuild the index after material source changes.
This keeps the .codira/ cache local to the analyzed repository and avoids
cross-repo state drift.
```bash
alias ri='codira'
alias ri-index='codira index'
alias ri-audit='codira audit'
alias ri-ctx='codira ctx'
alias ri-docs='codira ctx "missing numpy docstring" --json'
```

A thin wrapper script in the target repository can make the workflow more repeatable:
```bash
#!/usr/bin/env bash
set -euo pipefail
source .venv/bin/activate
codira "$@"
```

Example target-repo setup:
```bash
mkdir -p scripts
cat > scripts/ri.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
source .venv/bin/activate
codira "$@"
EOF
chmod +x scripts/ri.sh
```

Then run:
```bash
./scripts/ri.sh index
./scripts/ri.sh audit
./scripts/ri.sh ctx "missing numpy docstring" --json
```

Use codira as a developer tool.
Recommended:
- install it editable into the target repository virtual environment
- keep the index local to the target repository
- verify symbol existence with `rg` before editing

Not recommended:

- global installation for day-to-day work
- treating `codira` as a runtime dependency of the target project
- relying on ad-hoc `PYTHONPATH` launch patterns for normal usage
If you want a target repository to standardize codira usage, this snippet
can be copied into its `AGENTS.md`:

```markdown
### codira Workflow

Use `codira` as a repository-local developer tool.

Before broad code exploration or patching:

1. Activate the repository virtual environment.
2. Run `codira index`.
3. Verify candidate symbols with `rg <query>` before editing.
4. Run `codira ctx "<query>" --json` or `--prompt` as needed.
5. Inspect the referenced files before applying changes.

Use output modes as follows:

- plain `ctx`: compact human-readable context
- `ctx --json`: structured tool/agent workflows
- `ctx --prompt`: copy-ready agent preamble
- `ctx --explain`: retrieval diagnostics

`codira` narrows search and improves determinism. It does not replace
reading the actual source files before editing.
```
