Add "misra_help" query docs generator #1114

Draft
data-douser wants to merge 1 commit into main from dd/misra-qhelp/1

Conversation

@data-douser
Contributor

Introduces scripts/generate_rules/misra_help/, a two-stage pipeline for (mostly) idempotent generation of per-query .md help files. It uses MISRA rule text as input and creates (or updates) documentation for codeql-coding-standards queries in C and C++.

Initial supported standards:

  • MISRA C 2012 / 2023
  • MISRA C++ 2023

Stage 1 — deterministic, docling-based extraction and rendering, with a JSON sidecar for downstream consumption.

Stage 2 — a headless Python driver for the Copilot SDK that rewrites each help file from the JSON sidecar against a fixed Markdown schema, normalized to American English.

See scripts/generate_rules/misra_help/README.md for usage, architecture, and operational notes.

Description

Adds a new internal tooling package under scripts/generate_rules/misra_help/ that automates generation of per-query Markdown help files for MISRA C/C++ queries. No query files, query metadata, rule packages, shared libraries, tests, .expected files, or release artifacts are modified by this PR — it is purely additive tooling (7 new files, ~2.1k lines, all under scripts/generate_rules/misra_help/).

The pipeline is split so that the deterministic extraction stage can be re-run cheaply and audited independently of the LLM-driven rewrite stage. The JSON sidecar is the contract between the two stages, which keeps Stage 2 reproducible against a pinned input.
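One way to picture "reproducible against a pinned input" is to record a content digest of the sidecar before Stage 2 runs. The following is only a minimal sketch, assuming the sidecar is a plain JSON document; the field names (`rule_id`, `title`) and the helper `sidecar_digest` are hypothetical and are not taken from this PR.

```python
import hashlib
import json


def sidecar_digest(sidecar: dict) -> str:
    """Return a stable SHA-256 digest of a sidecar payload.

    Keys are sorted and separators are fixed, so the digest depends only
    on the content and not on key order or formatting.
    """
    canonical = json.dumps(sidecar, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical sidecar records; the real schema is defined by the Stage 1 code.
a = {"rule_id": "RULE-0-0", "title": "Placeholder title"}
b = {"title": "Placeholder title", "rule_id": "RULE-0-0"}  # same content, new order
c = {"rule_id": "RULE-0-0", "title": "Changed title"}
```

If the digest stored alongside a Stage 2 output matches the digest of the current sidecar, the rewrite was produced from the same input; a mismatch signals that Stage 1 output has changed since.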

Change request type

  • Release or process automation (GitHub workflows, internal scripts)
  • Internal documentation
  • External documentation
  • Query files (.ql, .qll, .qls or unit tests)
  • External scripts (analysis report or other code shipped as part of a release)

Rules with added or modified queries

  • No rules added
  • Queries have been added for the following rules:
    • rule number here
  • Queries have been modified for the following rules:
    • rule number here

Release change checklist

A change note (development_handbook.md#change-notes) is required for any pull request which modifies:

  • The structure or layout of the release artifacts.
  • The evaluation performance (memory, execution time) of an existing query.
  • The results of an existing query in any circumstance.

If you are only adding new rule queries, a change note is not required.

Author: Is a change note required?

  • Yes
  • No — tooling-only change under scripts/generate_rules/misra_help/; no release artifacts, query results, or query performance are affected.

🚨🚨🚨
Reviewer: Confirm that the format of shared queries (not the .qll file, but the
.ql file that imports it) is valid by running them within VS Code.

  • N/A — no .ql/.qll files modified.

Reviewer: Confirm that either a change note is not required or the change note is required and has been added.

  • Confirmed

Query development review checklist

For PRs that add new queries or modify existing queries, the following checklist should be completed by both the author and reviewer:

N/A for this PR — no queries are added or modified. This section is left as-is for the reviewer to confirm.

Author

  • Have all the relevant rule package description files been checked in?
  • Have you verified that the metadata properties of each new query are set appropriately?
  • Do all the unit tests contain both "COMPLIANT" and "NON_COMPLIANT" cases?
  • Are the alert messages properly formatted and consistent with the style guide?
  • Have you run the queries on OpenPilot and verified that the performance and results are acceptable?
    As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.
  • Does the query have an appropriate level of in-query comments/documentation?
  • Have you considered/identified possible edge cases?
  • Does the query not reinvent features in the standard library?
  • Can the query be simplified further (not golfed!)?

Reviewer

  • Have all the relevant rule package description files been checked in?
  • Have you verified that the metadata properties of each new query are set appropriately?
  • Do all the unit tests contain both "COMPLIANT" and "NON_COMPLIANT" cases?
  • Are the alert messages properly formatted and consistent with the style guide?
  • Have you run the queries on OpenPilot and verified that the performance and results are acceptable?
    As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.
  • Does the query have an appropriate level of in-query comments/documentation?
  • Have you considered/identified possible edge cases?
  • Does the query not reinvent features in the standard library?
  • Can the query be simplified further (not golfed!)?

Introduces scripts/generate_rules/misra_help/, a two-stage
pipeline for (mostly) idempotent generation of per-query .md help
files. Uses MISRA rules as input and creates (or updates, as needed)
documentation for codeql-coding-standards queries for C and C++.

Focuses on immediate support for:

- MISRA C 2012/2023
- MISRA C++ 2023.

Stage 1: deterministic docling-based extraction and rendering, with
a JSON sidecar for downstream consumption.

Stage 2: a headless Python driver for the Copilot SDK that rewrites
each help file from the JSON sidecar against a fixed Markdown
schema and American English spelling.

Adds documentation in scripts/generate_rules/misra_help/README.md.

@data-douser data-douser self-assigned this Apr 21, 2026
# Drop trailing references of the form "C90 [...]" / "C99 [...]" etc.
s = re.sub(
r"\s+(?:C90|C99|C11|C17|C18)\s*\[[^\]]*\]"
r"(?:\s*[,;]?\s*(?:C90|C99|C11|C17|C18)\s*\[[^\]]*\])*\s*$",
ap.add_argument("--standard", required=True, choices=SUPPORTED_STANDARDS,
help="MISRA standard to populate (the source language is "
"derived from this)")
ap.add_argument("--query-repo", type=Path, default=DEFAULT_QUERY_REPO,

Collaborator

IMO some of this stuff should be deleted as YAGNI.

I think it's totally fine to either assume that the working directory is the project root, or to find the project root via a path relative to __file__. We already have other scripts that assume the help repo can be found via ../codeql-coding-standards-help.

Given the size of this PR, I'd rather not add too many bells and whistles.
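
For illustration, the relative-path approach could look roughly like the sketch below. It assumes the script lives three directories below the repository root (as scripts/generate_rules/misra_help/ does); the helper names `find_repo_root` and `find_help_repo` are hypothetical.

```python
from pathlib import Path


def find_repo_root(script_path: Path, levels_up: int = 3) -> Path:
    """Return the directory `levels_up` levels above the script's directory.

    For scripts/generate_rules/misra_help/<script>.py, three levels above
    the containing directory is the repository root.
    """
    return script_path.resolve().parents[levels_up]


def find_help_repo(repo_root: Path) -> Path:
    """Sibling checkout, following the ../codeql-coding-standards-help convention."""
    return repo_root.parent / "codeql-coding-standards-help"


# Inside a real script this would be: find_repo_root(Path(__file__))
```

This needs no CLI flag, environment variable, or glob search; a missing help checkout can then be reported with a single existence check.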

#
# If none of those resolve to exactly one file, we abort with a clear message.
PDF_ENV_VARS = {
"MISRA-C-2023": "MISRA_C_PDF",

Collaborator

So just a clarification here. We should not differentiate C-2023 and C-2012 at all.

Every rule we have is part of both MISRA C 2012 and MISRA C 2023; there isn't an actual distinction. MISRA C 2012 with all amendments included is MISRA C 2023.
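
If both spellings still need to be accepted on the command line, one option is to fold them into a single key as early as possible. This is a sketch only: the alias table, the choice of "MISRA-C-2012" as the canonical key, and the function name `canonical_standard` are assumptions, not code from this PR.

```python
# Hypothetical alias table: every accepted spelling maps to one canonical key.
_STANDARD_ALIASES = {
    "MISRA-C-2012": "MISRA-C-2012",
    "MISRA-C-2023": "MISRA-C-2012",  # same rule set, so same key
    "MISRA-C++-2023": "MISRA-C++-2023",
}


def canonical_standard(name: str) -> str:
    """Map a user-supplied standard name to its canonical key."""
    try:
        return _STANDARD_ALIASES[name.strip().upper()]
    except KeyError:
        raise ValueError(f"unsupported standard: {name!r}") from None
```

With this in place, per-standard tables such as PDF lookups need only one entry for C.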

}

RULE_DIR_RE = re.compile(r"^(?:RULE|DIR)-\d+(?:-\d+){1,2}$")
QL_NAME_RE = re.compile(r"@name\s+(?:RULE|DIR)-\d+(?:-\d+){1,2}:\s+(?P<title>.+?)\s*$")

Collaborator

This is reproducing some prior art and needs to be consolidated.

We have normalized titles etc already in rule_packages/*.json. The script scripts/generate_rules/generate_package_files.py already takes the parsed rule_package data which is organized per rule and per query, and that's what's used to fill in the existing help template.

What's especially important is that some rule_package.json entries have an implementation_scope property (see here) that's added to the query help. This is critical, because it is the only part of our query help that isn't a direct copy of the MISRA text, but rather describes expected FPs and FNs to the user.
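
As a rough sketch of what reading that property could look like: the layout below (standard, then rule id, then a `queries` list whose entries may carry `implementation_scope`) is an assumption about the rule package data, the sample content is a placeholder, and `implementation_scopes` is a hypothetical helper rather than anything in generate_package_files.py.

```python
def implementation_scopes(package: dict) -> dict:
    """Map (rule_id, query short_name) -> implementation_scope, where present."""
    found = {}
    for rules in package.values():          # one entry per standard
        for rule_id, rule in rules.items():
            for query in rule.get("queries", []):
                scope = query.get("implementation_scope")
                if scope:
                    found[(rule_id, query.get("short_name"))] = scope
    return found


# Placeholder, trimmed-down package content for illustration only.
sample = {
    "MISRA-C-2012": {
        "RULE-0-0": {
            "title": "Placeholder rule title",
            "queries": [
                {
                    "short_name": "PlaceholderQuery",
                    "implementation_scope": {
                        "description": "Placeholder note on expected false negatives."
                    },
                },
                {"short_name": "QueryWithoutScope"},
            ],
        }
    }
}
```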

if not cli_pdf.is_file():
raise SystemExit(f"error: --pdf {cli_pdf} does not exist")
return cli_pdf
env_var = PDF_ENV_VARS[standard]

Collaborator

This is another thing we should cut via YAGNI. There's no need to support setting the PDF path via an environment variable; it just creates more code to maintain.

f"error: ${env_var} is set to {p} which does not exist")
return p
matches: list[Path] = []
for pattern in PDF_FILE_GLOBS[standard]:

Collaborator

This is also unnecessary magic.

p.add_argument("--model", default=DEFAULT_MODEL,
help=f"Copilot model id. Default: {DEFAULT_MODEL}. "
f"Known good: {', '.join(MODEL_FALLBACKS)}.")
p.add_argument("--no-overwrite", action="store_true",

Collaborator

Again, I'd probably prefer that the default behavior doesn't overwrite, and that overwriting requires --overwrite.
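
A minimal sketch of the opt-in variant, using only standard argparse behavior; `build_parser` and `should_write` are hypothetical names used for illustration.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Sketch of an opt-in overwrite flag")
    # store_true defaults to False, so existing files are kept unless asked.
    p.add_argument("--overwrite", action="store_true",
                   help="Replace help files that already exist "
                        "(default: skip them).")
    return p


def should_write(target: Path, overwrite: bool) -> bool:
    """Write new files always; write existing files only with --overwrite."""
    return overwrite or not target.exists()
```

The safe behavior is then what a bare invocation gets, and replacing existing help files is an explicit choice.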

import requests


SUPPORTED_STANDARDS = ("MISRA-C-2012", "MISRA-C-2023", "MISRA-C++-2023")

Collaborator

Again, there aren't really two C standards.

return time.time() + slack_seconds >= self.expires_at


def fetch_copilot_token(oauth_token: str) -> CopilotToken:

Collaborator

Should we be using the copilot API, or copilot CLI?



# ---------------------------------------------------------------------------
# Prompt construction (mirrors codeql-coding-standards-agent/src/rewriteHelp.ts)

Collaborator

Note that this is a "nice to have."

The "must have" is that we have query help. As a first pass, this should be a word-for-word match to the MISRA documents.

"8. End with these two sections verbatim, with the rule id and the short rule statement substituted in:",
" \"## Implementation notes\"",
" \"\"",
" \"None\"",

Collaborator

This is provided by the implementation_scope field in our rule_packages JSON files, and should not be None.
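
A sketch of how that section could be filled deterministically instead of hard-coded in the prompt. It assumes implementation_scope is an object with an optional `description` string and an optional `items` list; the exact layout of the existing help template is not reproduced here, and `render_implementation_notes` is a hypothetical helper.

```python
def render_implementation_notes(scope) -> str:
    """Render the "Implementation notes" section from an implementation_scope.

    Falls back to "None" only when the rule package provides no scope.
    """
    lines = ["## Implementation notes", ""]
    body = []
    if scope:
        if scope.get("description"):
            body.append(scope["description"])
        items = scope.get("items") or []
        if items:
            if body:
                body.append("")  # blank line between description and list
            body.extend(f"- {item}" for item in items)
    lines.extend(body or ["None"])
    return "\n".join(lines)
```

Rendering this section outside the LLM step would keep the FP/FN notes identical to the rule package data rather than relying on the model to copy them.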
