Add "misra_help" query docs generator by data-douser · Pull Request #1114 · github/codeql-coding-standards

data-douser · 2026-04-21T05:11:18Z

Introduces scripts/generate_rules/misra_help/, a two-stage pipeline for (mostly) idempotent generation of per-query .md help files. It uses MISRA rule text as input and creates (or updates) documentation for codeql-coding-standards queries in C and C++.

Initial supported standards:

- MISRA C 2012 / 2023
- MISRA C++ 2023

Stage 1 — deterministic, docling-based extraction and rendering, with a JSON sidecar for downstream consumption.

Stage 2 — a headless Python driver for the Copilot SDK that rewrites each help file from the JSON sidecar against a fixed Markdown schema, normalized to American English.

See scripts/generate_rules/misra_help/README.md for usage, architecture, and operational notes.

Description

Adds a new internal tooling package under scripts/generate_rules/misra_help/ that automates generation of per-query Markdown help files for MISRA C/C++ queries. No query files, query metadata, rule packages, shared libraries, tests, .expected files, or release artifacts are modified by this PR — it is purely additive tooling (7 new files, ~2.1k lines, all under scripts/generate_rules/misra_help/).

The pipeline is split so that the deterministic extraction stage can be re-run cheaply and audited independently of the LLM-driven rewrite stage. The JSON sidecar is the contract between the two stages, which keeps Stage 2 reproducible against a pinned input.
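As a rough illustration of the sidecar-as-contract idea, the sketch below writes a deterministic JSON file in one step and reads it back in another. The field names (`rule_id`, `title`, `body`) and both function names are hypothetical and not taken from the PR; the real sidecar layout may differ.

```python
import json
import tempfile
from pathlib import Path


def write_sidecar(path: Path, rules: list[dict]) -> None:
    """Stage 1 (sketch): serialize extracted rules deterministically."""
    # Sorting records and keys keeps the output byte-identical across runs,
    # so re-running extraction on the same input produces no diff.
    ordered = sorted(rules, key=lambda r: r["rule_id"])
    text = json.dumps(ordered, indent=2, sort_keys=True) + "\n"
    path.write_text(text, encoding="utf-8")


def read_sidecar(path: Path) -> dict[str, dict]:
    """Stage 2 (sketch): load the sidecar and index it by rule id."""
    records = json.loads(path.read_text(encoding="utf-8"))
    return {r["rule_id"]: r for r in records}


if __name__ == "__main__":
    # Placeholder records; real titles and bodies would come from extraction.
    rules = [
        {"rule_id": "RULE-2-1", "title": "Placeholder title B", "body": "..."},
        {"rule_id": "RULE-1-2", "title": "Placeholder title A", "body": "..."},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        sidecar = Path(tmp) / "rules.json"
        write_sidecar(sidecar, rules)
        print(sorted(read_sidecar(sidecar)))
```

Because the second stage only ever reads the file, a sidecar checked in or archived at a known revision can serve as the pinned input mentioned above.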

Change request type

Release or process automation (GitHub workflows, internal scripts)
Internal documentation
External documentation
Query files (.ql, .qll, .qls or unit tests)
External scripts (analysis report or other code shipped as part of a release)

Rules with added or modified queries

No rules added
Queries have been added for the following rules:
- rule number here
Queries have been modified for the following rules:
- rule number here

Release change checklist

A change note (development_handbook.md#change-notes) is required for any pull request which modifies:

The structure or layout of the release artifacts.
The evaluation performance (memory, execution time) of an existing query.
The results of an existing query in any circumstance.

If you are only adding new rule queries, a change note is not required.

Author: Is a change note required?

Yes
No — tooling-only change under scripts/generate_rules/misra_help/; no release artifacts, query results, or query performance are affected.

🚨🚨🚨
Reviewer: Confirm that the format of shared queries (not the .qll file, the .ql file that imports it) is valid by running them within VS Code.

N/A — no .ql/.qll files modified.

Reviewer: Confirm that either a change note is not required or the change note is required and has been added.

Confirmed

Query development review checklist

For PRs that add new queries or modify existing queries, the following checklist should be completed by both the author and reviewer:

N/A for this PR — no queries are added or modified. This section is left as-is for the reviewer to confirm.

Author

Have all the relevant rule package description files been checked in?
Have you verified that the metadata properties of each new query are set appropriately?
Do all the unit tests contain both "COMPLIANT" and "NON_COMPLIANT" cases?
Are the alert messages properly formatted and consistent with the style guide?
Have you run the queries on OpenPilot and verified that the performance and results are acceptable?
As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.
Does the query have an appropriate level of in-query comments/documentation?
Have you considered/identified possible edge cases?
Does the query not reinvent features in the standard library?
Can the query be simplified further (not golfed!)?

Reviewer

Have all the relevant rule package description files been checked in?
Have you verified that the metadata properties of each new query are set appropriately?
Do all the unit tests contain both "COMPLIANT" and "NON_COMPLIANT" cases?
Are the alert messages properly formatted and consistent with the style guide?
Have you run the queries on OpenPilot and verified that the performance and results are acceptable?
As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.
Does the query have an appropriate level of in-query comments/documentation?
Have you considered/identified possible edge cases?
Does the query not reinvent features in the standard library?
Can the query be simplified further (not golfed!)?

+    # Drop trailing references of the form "C90 [...]" / "C99 [...]" etc.
+    s = re.sub(
+        r"\s+(?:C90|C99|C11|C17|C18)\s*\[[^\]]*\]"
+        r"(?:\s*[,;]?\s*(?:C90|C99|C11|C17|C18)\s*\[[^\]]*\])*\s*$",
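The hunk above is cut off before the replacement and subject arguments of `re.sub`. As a minimal sketch of what the pattern does, the snippet below compiles the same two raw-string fragments and substitutes an empty string; the empty replacement and the function name `strip_trailing_std_refs` are assumptions, not taken from the PR.

```python
import re

# Same pattern as in the hunk above. The replacement string ("") and the
# function name are assumptions, since the hunk ends before those arguments.
_TRAILING_STD_REFS = re.compile(
    r"\s+(?:C90|C99|C11|C17|C18)\s*\[[^\]]*\]"
    r"(?:\s*[,;]?\s*(?:C90|C99|C11|C17|C18)\s*\[[^\]]*\])*\s*$"
)


def strip_trailing_std_refs(s: str) -> str:
    """Remove one or more 'C90 [...]'-style references at the end of s."""
    return _TRAILING_STD_REFS.sub("", s)


if __name__ == "__main__":
    # Trailing references are dropped, including a comma-separated run.
    print(strip_trailing_std_refs("A statement C90 [Undefined 12], C99 [Undefined 15]"))
    # A reference that is not at the end of the string is left alone.
    print(strip_trailing_std_refs("Mentions C99 [Annex J] mid-sentence only"))
```

The pattern is anchored with `$`, so only references at the very end of the string are removed.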


MichaelRFairhurst · 2026-04-21T16:43:02Z

+    ap.add_argument("--standard", required=True, choices=SUPPORTED_STANDARDS,
+                    help="MISRA standard to populate (the source language is "
+                         "derived from this)")
+    ap.add_argument("--query-repo", type=Path, default=DEFAULT_QUERY_REPO,


IMO some of this stuff should be deleted as YAGNI.

I think it's totally fine to either assume that the working directory is the project root, or to find the project root via relative path to __FILE__. We already have other scripts that assume the help repo can be found via ../codeql-coding-standards-help.

Given the size of this PR, I'd rather not add too many bells and whistles.
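One way to read this suggestion, sketched in Python (where the script's own location is available as `__file__`): derive the project root from the script's path and look for the help repo as a sibling directory. The function names are hypothetical, and the `parents[3]` offset assumes the script lives directly in scripts/generate_rules/misra_help/.

```python
from pathlib import Path


def repo_root_from_script(script_path: Path) -> Path:
    """Return the project root for a script in scripts/generate_rules/misra_help/."""
    # parents[0] = misra_help, [1] = generate_rules, [2] = scripts, [3] = root
    return script_path.parents[3]


def default_help_repo(script_path: Path) -> Path:
    """Return the sibling checkout ../codeql-coding-standards-help."""
    return repo_root_from_script(script_path).parent / "codeql-coding-standards-help"


if __name__ == "__main__":
    # In a real script this would be Path(__file__).resolve().
    here = Path(
        "/work/codeql-coding-standards/scripts/generate_rules/misra_help/populate_help.py"
    )
    print(repo_root_from_script(here))
    print(default_help_repo(here))
```

With defaults derived this way, a `--query-repo` style option would not be needed for the common case.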

MichaelRFairhurst · 2026-04-21T16:45:53Z

+#
+# If none of those resolve to exactly one file, we abort with a clear message.
+PDF_ENV_VARS = {
+    "MISRA-C-2023":   "MISRA_C_PDF",


So just a clarification here. We should not differentiate C-2023 and C-2012 at all.

Every rule we have is both a part of MISRA C 2012 and a part of MISRA C 2023; there isn't an actual distinction. MISRA C 2012 with all amendments included = MISRA C 2023.
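If both spellings still need to be accepted on the command line, one possible approach is to collapse them to a single key before any lookup. This is only a sketch: the alias table, the function name, and the choice of `MISRA-C-2012` as the canonical key are assumptions.

```python
# Hypothetical alias table: both C spellings map to one canonical key.
# Which key is canonical is an assumption made for this sketch.
_STANDARD_ALIASES = {
    "MISRA-C-2012": "MISRA-C-2012",
    "MISRA-C-2023": "MISRA-C-2012",
}


def canonical_standard(name: str) -> str:
    """Collapse equivalent standard names; unknown names pass through unchanged."""
    return _STANDARD_ALIASES.get(name, name)


if __name__ == "__main__":
    for name in ("MISRA-C-2012", "MISRA-C-2023", "MISRA-C++-2023"):
        print(name, "->", canonical_standard(name))
```

Per-standard tables such as `PDF_ENV_VARS` would then need only one entry for C.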

MichaelRFairhurst · 2026-04-21T16:50:52Z

+}
+
+RULE_DIR_RE = re.compile(r"^(?:RULE|DIR)-\d+(?:-\d+){1,2}$")
+QL_NAME_RE = re.compile(r"@name\s+(?:RULE|DIR)-\d+(?:-\d+){1,2}:\s+(?P<title>.+?)\s*$")


This is reproducing some prior art and needs to be consolidated.

We have normalized titles etc already in rule_packages/*.json. The script scripts/generate_rules/generate_package_files.py already takes the parsed rule_package data which is organized per rule and per query, and that's what's used to fill in the existing help template.

What's especially important is that some rule_package.json entries have an implementation_scope property (see here) that's added to the query help. This is critical, because it is the only part of our query help that isn't a direct copy of the misra text, but rather describes expected FPs and FNs to the user.
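To make the point concrete, the sketch below collects `implementation_scope` entries from rule package data. It assumes a layout of standard, then rule id, then a `queries` list whose entries carry `short_name` and an optional `implementation_scope`; the sample data is a placeholder, and this is not the code in generate_package_files.py.

```python
import json

# Placeholder data in the assumed rule package layout; not copied from the repo.
SAMPLE_PACKAGE = json.loads("""
{
  "MISRA-C-2012": {
    "RULE-0-0": {
      "title": "Placeholder rule title",
      "queries": [
        {
          "short_name": "PlaceholderQueryA",
          "implementation_scope": {
            "description": "Placeholder note about expected false negatives."
          }
        },
        {"short_name": "PlaceholderQueryB"}
      ]
    }
  }
}
""")


def implementation_scopes(package: dict) -> dict[tuple[str, str], dict]:
    """Map (rule id, query short_name) to its implementation_scope, if any."""
    found = {}
    for rules in package.values():
        for rule_id, rule in rules.items():
            for query in rule.get("queries", []):
                scope = query.get("implementation_scope")
                if scope:
                    found[(rule_id, query["short_name"])] = scope
    return found


if __name__ == "__main__":
    for key, scope in implementation_scopes(SAMPLE_PACKAGE).items():
        print(key, "->", scope["description"])
```

Queries without the property are simply absent from the result, so a generator can tell "no known limitations" apart from a documented one.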

MichaelRFairhurst · 2026-04-21T16:53:09Z

+        if not cli_pdf.is_file():
+            raise SystemExit(f"error: --pdf {cli_pdf} does not exist")
+        return cli_pdf
+    env_var = PDF_ENV_VARS[standard]


This is another thing we should cut via YAGNI -- there's no need to support setting the PDF path as an environment variable; it just creates more code to maintain.

MichaelRFairhurst · 2026-04-21T16:53:51Z

+                f"error: ${env_var} is set to {p} which does not exist")
+        return p
+    matches: list[Path] = []
+    for pattern in PDF_FILE_GLOBS[standard]:


this is also unnecessary magic
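If both the environment-variable lookup and the glob search were dropped as suggested, PDF resolution could reduce to a single required flag. A minimal sketch, reusing the `--pdf` flag and error message visible in the hunk above; `build_parser` and `resolve_pdf` are hypothetical names.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Parser with an explicit, required --pdf and no fallbacks."""
    ap = argparse.ArgumentParser(description="Sketch: explicit PDF path only")
    ap.add_argument("--pdf", type=Path, required=True,
                    help="Path to the licensed MISRA PDF")
    return ap


def resolve_pdf(cli_pdf: Path) -> Path:
    """Return the PDF path, or abort with a clear message if it is missing."""
    if not cli_pdf.is_file():
        raise SystemExit(f"error: --pdf {cli_pdf} does not exist")
    return cli_pdf


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(resolve_pdf(args.pdf))
```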

MichaelRFairhurst · 2026-04-21T18:08:44Z

+    p.add_argument("--model", default=DEFAULT_MODEL,
+                   help=f"Copilot model id. Default: {DEFAULT_MODEL}. "
+                        f"Known good: {', '.join(MODEL_FALLBACKS)}.")
+    p.add_argument("--no-overwrite", action="store_true",


Again, I'd probably prefer that the default behavior not overwrite, and that overwriting require --overwrite.
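A minimal sketch of the inverted default, assuming a plain `argparse` parser: existing help files are skipped unless `--overwrite` is passed. The helper `should_write` is a hypothetical name.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Sketch: opt-in overwrite")
    # store_true defaults to False, so existing files are kept by default.
    p.add_argument("--overwrite", action="store_true",
                   help="Replace help files that already exist")
    return p


def should_write(path: Path, overwrite: bool) -> bool:
    """Write new files always; rewrite existing ones only when asked."""
    return overwrite or not path.exists()


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(should_write(Path("example.md"), args.overwrite))
```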

MichaelRFairhurst · 2026-04-21T18:09:16Z

+import requests
+
+
+SUPPORTED_STANDARDS = ("MISRA-C-2012", "MISRA-C-2023", "MISRA-C++-2023")


again, there aren't really two C standards

MichaelRFairhurst · 2026-04-21T18:09:46Z

+        return time.time() + slack_seconds >= self.expires_at
+
+
+def fetch_copilot_token(oauth_token: str) -> CopilotToken:


Should we be using the copilot API, or copilot CLI?

MichaelRFairhurst · 2026-04-21T18:11:34Z

+
+
+# ---------------------------------------------------------------------------
+# Prompt construction (mirrors codeql-coding-standards-agent/src/rewriteHelp.ts)


Note that this is a "nice to have."

The "must have" is that we have query help. As a first pass, this should be a word for word match to the MISRA documents.

MichaelRFairhurst · 2026-04-21T18:15:06Z

+        "8. End with these two sections verbatim, with the rule id and the short rule statement substituted in:",
+        "   \"## Implementation notes\"",
+        "   \"\"",
+        "   \"None\"",


This is provided by the implementation_scope field in our rule_packages json files, and should not be None.
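A sketch of how the section could be filled from that field instead of a hard-coded value. It assumes `implementation_scope` is a dict with a `description` string and an optional `items` list, and falls back to "None" only when the field is absent; the function name is hypothetical and the existing help template may render this differently.

```python
from typing import Optional


def render_implementation_notes(scope: Optional[dict]) -> str:
    """Render the '## Implementation notes' section from implementation_scope."""
    lines = ["## Implementation notes", ""]
    if not scope:
        # No scope recorded for this query: keep the literal fallback.
        lines.append("None")
    else:
        lines.append(scope["description"])
        items = scope.get("items", [])
        if items:
            lines.append("")
            lines.extend(f"- {item}" for item in items)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_implementation_notes(None))
    print(render_implementation_notes(
        {"description": "Placeholder limitation.", "items": ["case a", "case b"]}
    ))
```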

data-douser requested a review from MichaelRFairhurst April 21, 2026 05:11

data-douser self-assigned this Apr 21, 2026

github-advanced-security AI found potential problems Apr 21, 2026

Comment thread scripts/generate_rules/misra_help/populate_help.py

MichaelRFairhurst requested changes Apr 21, 2026

data-douser wants to merge 1 commit into main from dd/misra-qhelp/1
