HYPERFLEET-1199 - feat: Add /e2e-debug skill#63
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files.
📝 Walkthrough (CodeRabbit)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 10 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🧹 Nitpick comments (1)
hyperfleet-devtools/skills/e2e-debug/SKILL.md (1)
6-6: 💤 Low value
Remove unused `Read` from allowed-tools.
Line 6 declares `allowed-tools: Bash, Read, WebFetch, AskUserQuestion`, but the skill only uses Bash (for gh, gcloud, kubectl, jira commands), WebFetch (for GCS artifacts), and AskUserQuestion (line 26). The Read tool is not exercised. Per coding guidelines, do not request tools the skill does not use.
♻️ Proposed fix
-allowed-tools: Bash, Read, WebFetch, AskUserQuestion
+allowed-tools: Bash, WebFetch, AskUserQuestion
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hyperfleet-devtools/skills/e2e-debug/SKILL.md` at line 6, the allowed-tools declaration includes the Read tool, but reviewing the skill implementation shows it only uses Bash, WebFetch, and AskUserQuestion. Remove Read from the allowed-tools list in the SKILL.md file to ensure only the tools actually used by the skill are declared, per coding guidelines.
Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hyperfleet-devtools/skills/e2e-debug/SKILL.md`:
- Line 266: The jira issue list command on line 266 is vulnerable to JQL
injection because the keyword-from-error is interpolated directly into the query
string without validation or escaping. To fix this, validate the keyword before
interpolation by filtering it to only alphanumeric characters, underscores, and
hyphens (removing or replacing any special characters or JQL operators), then
use the sanitized keyword in the query string. Alternatively, if the jira CLI
supports structured parameter passing or environment variables for query
parameters, use those mechanisms instead of string interpolation to avoid
injection entirely.
- Line 366: The kubectl port-forward commands at
hyperfleet-devtools/skills/e2e-debug/SKILL.md lines 366, 419, and 424 lack
timeout protection, causing indefinite hangs if the service is unreachable or
kubectl context is misconfigured. For each of these three locations, replace the
current `kubectl port-forward ... & PF_PID=$!; sleep 2; curl ...` pattern with a
timeout wrapper around the port-forward command (e.g., `timeout 5 kubectl
port-forward ...`), followed by a check to verify the process started
successfully using `kill -0 $PF_PID`, and only proceed with the curl command if
the process is running. This ensures the skill fails safely if port-forward
cannot establish within the timeout period, satisfying the fail-safe requirement
for dynamic context.
- Line 47: The SKILL.md file accepts a JIRA ticket input in the format
HYPERFLEET-XXXX without validating its format before passing it to the jira CLI
query around lines 262-270. This creates a security vulnerability where
malformed or attacker-controlled input could inject JQL metacharacters. Add
format validation early in the step (before line 266 where the jira query is
executed) to ensure the JIRA_TICKET variable matches the expected pattern of
uppercase project key characters, followed by a hyphen, followed by one or more
digits using a regex pattern like ^[A-Z]+-[0-9]+$. If the format is invalid,
output an error message and exit the step. This validation should be documented
or referenced at line 47 where the argument-hint is defined, and the actual
validation logic should be placed before the jira CLI execution in the 262-270
line range.
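The keyword sanitization requested for line 266 can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `sanitize_jql_keyword` and the sample input are hypothetical and do not come from SKILL.md.

```shell
# Hypothetical helper (not taken from SKILL.md): reduce an error-log keyword
# to letters, digits, underscores, and hyphens before it reaches a JQL string.
sanitize_jql_keyword() {
  # tr -c replaces every character outside the allowed set with a space,
  # -s squeezes runs of those spaces, and sed trims both ends.
  printf '%s' "$1" \
    | LC_ALL=C tr -cs 'A-Za-z0-9_-' ' ' \
    | sed 's/^ *//; s/ *$//'
}

# Example with a made-up hostile keyword.
keyword=$(sanitize_jql_keyword 'timeout" OR project = "X')
printf '%s\n' "$keyword"   # prints: timeout OR project X
```

The sanitized value would then replace the raw keyword in the `-q` query. Quotes and operators such as `=` do not survive the filter, so the keyword should not be able to close the string literal or append clauses of its own.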
---
Nitpick comments:
In `@hyperfleet-devtools/skills/e2e-debug/SKILL.md`:
- Line 6: The allowed-tools declaration includes the Read tool, but reviewing
the skill implementation shows it only uses Bash, WebFetch, and AskUserQuestion.
Remove Read from the allowed-tools list in the SKILL.md file to ensure only the
tools actually used by the skill are declared, per coding guidelines.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 4e34d43d-e223-4891-a79e-9d2861e10f4e
📒 Files selected for processing (8)
- .claude-plugin/marketplace.json
- AGENTS.md
- hyperfleet-devtools/.claude-plugin/plugin.json
- hyperfleet-devtools/README.md
- hyperfleet-devtools/docs/e2e-debug-presentation.md
- hyperfleet-devtools/skills/e2e-debug/SKILL.md
- hyperfleet-devtools/skills/e2e-debug/references/ci-quick-reference.md
- hyperfleet-devtools/skills/e2e-debug/references/known-failure-patterns.md
🔗 Linked repositories identified
CodeRabbit considers these linked repositories for cross-repo context during reviews:
- openshift-hyperfleet/architecture (manual)
- openshift-hyperfleet/hyperfleet-api (manual)
- openshift-hyperfleet/hyperfleet-sentinel (manual)
- openshift-hyperfleet/hyperfleet-adapter (manual)
- openshift-hyperfleet/hyperfleet-broker (manual)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hyperfleet-devtools/skills/e2e-debug/SKILL.md`:
- Line 47: Add input validation to prevent JIRA and JQL injection
vulnerabilities by implementing two complementary fixes. At
hyperfleet-devtools/skills/e2e-debug/SKILL.md#L47-L47, add a format validation
check using the regex pattern `^[A-Z]+-[0-9]+$` to validate the JIRA ticket
input before it is used in Step 3c jira execution command. At
hyperfleet-devtools/skills/e2e-debug/SKILL.md#L262-L270, add keyword
sanitization that filters out non-alphanumeric characters from the extracted
keywords before line 266 where the keywords are interpolated into the JQL query.
Both fixes address the root cause of missing input validation on external input
passed directly to CLI commands.
- Around line 266-270: The jira issue list commands on lines 266 and subsequent
lines interpolate error keywords directly into JQL queries without sanitization,
creating a query injection vulnerability. Before interpolating the
keyword_from_error variable into the -q parameter, sanitize it to contain only
alphanumeric characters, hyphens, and underscores by using sed or similar
filtering (e.g., sed 's/[^a-zA-Z0-9_-]/ /g'). Apply this sanitization to all
locations where error text is extracted from logs and inserted into JQL query
strings to prevent injection of JQL operators or quotes that could modify the
query logic.
- Line 366: The kubectl port-forward commands at line 366 (Maestro DB check),
line 419 (Sentinel metrics), and line 424 (API status check) lack timeout
protection, which can cause indefinite hangs and block skill execution. For each
of these three instances, wrap the kubectl port-forward command with a timeout
wrapper (e.g., timeout 5), add a process-alive check using kill -0 on the PF_PID
variable to verify the port-forward started successfully, execute the curl
command only if the process is running, and ensure the process is cleaned up
with kill. This prevents hangs and ensures fail-safe behavior when services are
unreachable.
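The port-forward guard described above might be structured like the sketch below. It is not the code in SKILL.md: `guarded_forward`, the probe function, and the service and port names in the usage comment are hypothetical, and it assumes the coreutils `timeout` command is installed.

```shell
# Hypothetical helper: run a long-lived command (e.g. kubectl port-forward)
# under `timeout`, confirm it is still alive, run a probe, then clean up.
# Usage: guarded_forward <seconds> <probe-command> <command> [args...]
guarded_forward() {
  local limit="$1" probe="$2" pf_pid rc=0
  shift 2

  # `timeout` ends the command after $limit seconds, so it cannot hang forever.
  timeout "$limit" "$@" &
  pf_pid=$!

  sleep 1  # give the command a moment to start or fail

  # kill -0 sends no signal; it only reports whether the process exists.
  if ! kill -0 "$pf_pid" 2>/dev/null; then
    echo "ERROR: '$1' exited before the probe could run" >&2
    return 1
  fi

  # Run the probe and remember its exit status.
  "$probe" || rc=$?

  # Always stop the background process, whatever the probe returned.
  kill "$pf_pid" 2>/dev/null || true
  wait "$pf_pid" 2>/dev/null || true
  return "$rc"
}

# Example (names are placeholders, not from the PR):
#   probe_metrics() { curl -sf --max-time 5 http://localhost:8080/metrics; }
#   guarded_forward 10 probe_metrics kubectl port-forward svc/sentinel 8080:8080
```

Taking the limit as a parameter keeps the choice between the suggested `timeout 5` and a longer value in the caller's hands, and routing all three port-forward sites through one helper keeps the alive check and cleanup from drifting apart.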
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 9bd2e055-80de-4459-ae65-f504bf0ee2c0
✅ Files skipped from review due to trivial changes (4)
- hyperfleet-devtools/.claude-plugin/plugin.json
- .claude-plugin/marketplace.json
- AGENTS.md
- hyperfleet-devtools/skills/e2e-debug/references/ci-quick-reference.md
- Add JIRA ticket format validation (^[A-Z]+-[0-9]+$) before CLI use
- Sanitize JQL keywords from error logs (strip metacharacters)
- Add timeout wrapper (timeout 10) to all kubectl port-forward commands
- Add kill -0 process check before curl to fail safely if port-forward hangs
- Keep Read in allowed-tools (needed for loading references/ files)
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hyperfleet-devtools/skills/e2e-debug/SKILL.md`:
- Line 47: The JIRA ticket input handling at line 47 instructs to validate the
format against the regex pattern but contains no executable validation code,
creating a security risk for injection attacks. Add explicit validation code
immediately after the JIRA ticket input is identified and before any CLI command
is constructed: check if the input matches the specified regex pattern
(^[A-Z]+-[0-9]+$), and if validation fails, output an error message to stderr
and exit with a non-zero status code. Only proceed with the jira CLI call to
fetch the ticket and search for pipeline links once validation succeeds,
matching the defensive pattern already applied to keyword sanitization elsewhere
in the document.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 136bd9b0-79f3-4875-80d0-d7d35e861ce8
📒 Files selected for processing (1)
hyperfleet-devtools/skills/e2e-debug/SKILL.md
CodeRabbit flagged that line 47 had a validation instruction but no executable code, unlike the JQL keyword sanitization at line 268. Added explicit grep -qE validation with error message before jira CLI use. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
hyperfleet-devtools/skills/e2e-debug/SKILL.md (1)
555-556: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Keep live cluster data in the evidence set.
This guardrail excludes kubectl/gcloud evidence even though Step 5 makes live cluster inspection mandatory when available and the opening instructions already require live cluster corroboration. That contradiction can let the model certify a diagnosis without the only data source that confirms node drains, restarts, or Maestro state.
♻️ Proposed fix
-- **NO GUESSWORK:** Base your root cause ONLY on the intersection of logs, the debugging handbook, and the repository state.
+- **NO GUESSWORK:** Base your root cause ONLY on the intersection of logs, the debugging handbook, the repository state, and live cluster data.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hyperfleet-devtools/skills/e2e-debug/SKILL.md` around lines 555 - 556, the guardrails in the "NO HALLUCINATIONS" and "NO GUESSWORK" rules are excluding kubectl and gcloud evidence from the evidence set, which contradicts the requirement in Step 5 to mandatorily inspect the live cluster when available and the opening instructions requiring live cluster corroboration. Modify these two guardrail statements to explicitly include kubectl/gcloud evidence in the evidence set when available, ensuring that live cluster data about node drains, restarts, and Maestro state is always retained and used to confirm diagnoses rather than allowing certifications without this critical data source.
♻️ Duplicate comments (1)
hyperfleet-devtools/skills/e2e-debug/SKILL.md (1)
49-56: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
Fail closed on invalid JIRA keys.
The regex check only logs an error; it still falls through to `jira issue view`, so malformed ticket IDs remain eligible for CLI use. That leaves the SEC-01 guard unenforced and reopens the injection path.
🔒 Proposed fix
 if ! echo "$JIRA_INPUT" | grep -qE '^[A-Z]+-[0-9]+$'; then
   echo "ERROR: Invalid JIRA ticket format. Expected: HYPERFLEET-1234. Received: $JIRA_INPUT" >&2
-  # Stop — do not pass unvalidated input to jira CLI
+  exit 1
 fi
 jira issue view "$JIRA_INPUT" --plain 2>/dev/null
As per coding guidelines, SEC-01: validate input at system boundaries before passing untrusted data to CLI commands.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hyperfleet-devtools/skills/e2e-debug/SKILL.md` around lines 49 - 56, the JIRA input validation in this script checks the format using grep with the regex pattern but fails to stop execution when validation fails. Currently, after logging the error message for an invalid JIRA ticket format, the script continues and passes the malformed JIRA_INPUT to the jira issue view command, creating a security vulnerability. Add an exit statement or equivalent control flow termination immediately after the error log within the validation block to ensure that execution stops and the jira CLI command is never invoked with invalid input, thus enforcing the SEC-01 validation requirement at the system boundary.
Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@hyperfleet-devtools/skills/e2e-debug/SKILL.md`:
- Around line 555-556: The guardrails in the "NO HALLUCINATIONS" and "NO
GUESSWORK" rules are excluding kubectl and gcloud evidence from the evidence
set, which contradicts the requirement in Step 5 to mandatorily inspect the live
cluster when available and the opening instructions requiring live cluster
corroboration. Modify these two guardrail statements to explicitly include
kubectl/gcloud evidence in the evidence set when available, ensuring that live
cluster data about node drains, restarts, and Maestro state is always retained
and used to confirm diagnoses rather than allowing certifications without this
critical data source.
---
Duplicate comments:
In `@hyperfleet-devtools/skills/e2e-debug/SKILL.md`:
- Around line 49-56: The JIRA input validation in this script checks the format
using grep with the regex pattern but fails to stop execution when validation
fails. Currently, after logging the error message for an invalid JIRA ticket
format, the script continues and passes the malformed JIRA_INPUT to the jira
issue view command, creating a security vulnerability. Add an exit statement or
equivalent control flow termination immediately after the error log within the
validation block to ensure that execution stops and the jira CLI command is
never invoked with invalid input, thus enforcing the SEC-01 validation
requirement at the system boundary.
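A fail-closed version of this check could be written as a function that returns non-zero on any malformed key, so the caller cannot reach `jira issue view` without passing it. The sketch below is an illustration under assumptions: `validate_jira_key` and `fetch_ticket` are hypothetical names. It uses `case` patterns rather than the `grep -qE` shown in the review, because `grep` tests each input line separately and a multi-line value whose first line looks valid would pass.

```shell
# Hypothetical helper: succeed only if $1 is <UPPERCASE>-<digits>, the same
# shape as ^[A-Z]+-[0-9]+$, checked against the whole string.
validate_jira_key() {
  local LC_ALL=C            # keep A-Z and 0-9 as plain ASCII ranges
  local project="${1%%-*}"  # text before the first hyphen
  local number="${1#*-}"    # text after the first hyphen
  case "$project" in ''|*[!A-Z]*) return 1 ;; esac
  case "$number"  in ''|*[!0-9]*) return 1 ;; esac
  return 0
}

# Hypothetical caller: stop before the CLI call when validation fails.
fetch_ticket() {
  if ! validate_jira_key "$1"; then
    echo "ERROR: Invalid JIRA ticket format. Expected: HYPERFLEET-1234. Received: $1" >&2
    return 1
  fi
  jira issue view "$1" --plain
}
```

Returning from a function instead of calling `exit 1` is a design choice for this sketch: it stops the step in the same way while leaving the calling shell alive to report the failure.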
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 950c36e4-7691-4160-85b3-d0607c31f38a
📒 Files selected for processing (1)
hyperfleet-devtools/skills/e2e-debug/SKILL.md
Summary
- Adds the `/e2e-debug` skill to the `hyperfleet-devtools` plugin: an AI-powered forensic debugger that automates root cause analysis of failed E2E CI pipeline runs
What it does
The skill runs a 6-step forensic workflow:
- … `finished.json`, cross-run comparison (commit, chart versions, GKE cluster version, node names)
- … (`gcloud container operations list`), maintenance policy, pod health, Sentinel metrics
Key design decisions
- … `since=`/`until=`, PR queries use `merged:>`, gcloud operations use date filters. No `--limit` flags anywhere
- … `gcloud container operations list` finds UPGRADE_NODES overlapping the test window (see HYPERFLEET-1225)
- … `gh` CLI alone. kubectl/gcloud add live cluster validation but are optional. Confidence capped at MEDIUM if kubectl is available but not used
Negative scenarios handled
- … `finished.json`, reports status
- … `all-resources.txt` → notes "node assignments unknown" and proceeds
Files changed
- hyperfleet-devtools/skills/e2e-debug/SKILL.md
- hyperfleet-devtools/skills/e2e-debug/references/known-failure-patterns.md
- hyperfleet-devtools/skills/e2e-debug/references/ci-quick-reference.md
- hyperfleet-devtools/.claude-plugin/plugin.json
- hyperfleet-devtools/README.md
- .claude-plugin/marketplace.json
- AGENTS.md
- hyperfleet-devtools/docs/
Test plan
- `claude --plugin-dir ./hyperfleet-devtools`
- … `/hyperfleet-devtools:e2e-debug` in available skills
- … `latest-build.txt` and resolve