Skip to content

Improve error handling and metrics visibility#611

Open
satyakigh wants to merge 2 commits into
mainfrom
error-metrics
Open

Improve error handling and metrics visibility#611
satyakigh wants to merge 2 commits into
mainfrom
error-metrics

Conversation

@satyakigh

@satyakigh satyakigh commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

This PR refactors the error-handling and telemetry plumbing to surface richer, structured information when errors occur, while tightening the rules around what counts as a "client error" (and therefore should be suppressed as a fault). Stack-trace sanitization is extracted into a reusable module, and AWS-aware classification is propagated into telemetry attributes.

  1. New Sanitizer module (src/utils/Sanitizer.ts)
    Extracts the previously inline stack-sanitization logic from Errors.ts / ErrorStackInfo.ts into a standalone, reusable module:
  • sensitiveInfo() — returns strings to redact (install path segments, hostname(), homedir()).
  • sanitizeMessage() — normalizes path separators, redacts sensitive identity strings, redacts ARNs containing account IDs (arn:aws:), and redacts standalone 12-digit account IDs (<ACCOUNT_ID>).
  • Also adds an allowlist for the literals aws and cloudformation-languageserver so they're not redacted.
  1. ErrorStackInfo.ts rewritten
    Replaces the old determineSensitiveInfo() cache helper with the actual error-attribute extraction logic (moved out of Errors.ts):
  • extractLocationFromStack() — returns error.message and error.stack, sanitized.
  • errorAttributes() — adds error.origin, plus error.cause.message / error.cause.stack when a cause chain exists.
  • errorType() — emits error.type, error.code, error.http.status, AWS classification (error.aws.category, error.aws.http.status), and corresponding error.cause.* fields.
  1. Errors.ts — richer client-network detection and error introspection
  • Splits CLIENT_NETWORK_ERROR_PATTERNS into:
    • CLIENT_NETWORK_ERROR_CODES (errno-style: ECONNRESET, ETIMEDOUT, EPIPE, EHOSTUNREACH, ENETUNREACH, NGHTTP2_REFUSED_STREAM, etc., each with an explanatory comment).
    • CLIENT_NETWORK_ERROR_MESSAGE_PATTERNS (free-form text patterns for SSL/proxy issues).
  • isClientNetworkError() now also inspects the error's code and name fields, not just the message.
  • New helpers exported: extractRootCause() (walks commitError then cause), extractErrorCode() (handles code / Code / CODE / errno), extractHttpStatus() (handles $metadata.httpStatusCode, response.status, status).
  • Removes EBADF and the duplicate errno entries that were also pattern-matched by message.
  1. ScopedTelemetry.error() — always emit error type by default
    Previously error.type and error.code were only attached when the caller opted in. Now they are emitted unconditionally; captureErrorAttributes only controls whether the more verbose error.message / error.stack / cause attributes are included.

  2. FaultSuppression.ts — symbol-keyed flag, broader marking

  • Switches from a public suppressFault?: true property to a non-enumerable Symbol('SUPPRESS_FAULT') so the flag doesn't pollute the object's shape or get serialized.
  • markSuppressFault() / markIfClientError() now accept unknown and become no-ops on non-objects.
  • markIfClientError() now also suppresses errors classified by isClientNetworkError(), not just AWS client errors. Throttling (429) is explicitly not suppressed.
  1. AwsErrorMapper.ts — share network-error code set
    Network error names are now built by merging AWS_NETWORK_ERROR_NAMES (NetworkingError, TimeoutError) with the canonical CLIENT_NETWORK_ERROR_CODES set from Errors.ts, removing the previously duplicated errno list.

@satyakigh satyakigh requested a review from a team as a code owner June 16, 2026 19:34
@github-code-quality

Copy link
Copy Markdown

Code Coverage Overview

Languages: TypeScript

TypeScript / code-coverage/vitest

The overall coverage in the branch remains at 89%, unchanged from the branch.

Show a code coverage summary of the most impacted files.
File b12129f 164a383 +/-
src/telemetry/S...pedTelemetry.ts 88% 87% -1%
src/utils/FaultSuppression.ts 100% 100% 0%
src/utils/Errors.ts 97% 100% +3%
src/utils/AwsErrorMapper.ts 95% 98% +3%
src/utils/ErrorStackInfo.ts 80% 95% +15%
src/utils/Sanitizer.ts 0% 89% +89%

Code Coverage is in Public Preview. Learn more and provide us with your feedback.

@kddejong kddejong left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Verified the telemetry data paths — no new user data exposure. errorType() now always emits error.type/error.code (structural metadata only, sanitized), and error.message/error.stack remain gated behind captureErrorAttributes. Import graph is a clean DAG, no circular dependency risk. Test coverage is comprehensive.

url: url,
responseType: 'arraybuffer',
proxy: getProxyConfig(),
maxRedirects: 7,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change in this PR?


const CLIENT_FAULT_CATEGORIES: ReadonlySet<AwsErrorCategory> = new Set(['credentials', 'network', 'permissions']);

export function isClientError(error: unknown): boolean {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are making the distinction, let's rename this to isAwsClientError

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants