
fix(pgsql): Escape Cypher Characters in Literals - BED-7883#71

Open
LawsonWillard wants to merge 3 commits into main from BED-7883

LawsonWillard commented on May 5, 2026

Description

  • Escapes Cypher string literals at the PGSQL translation layer.
  • Adds unit tests to ensure escaping works as expected.

Resolves: BED-7883

Type of Change

  • Bug fix (a change that fixes an issue)

Testing

Added new unit tests

  • Unit tests added / updated
  • Integration tests added / updated
  • Manual integration tests run (go test -tags manual_integration ./integration/...)

Screenshots (if appropriate):

Driver Impact

  • PostgreSQL driver (drivers/pg)
  • Neo4j driver (drivers/neo4j)

Checklist

  • Code is formatted
  • All existing tests pass
  • go.mod / go.sum are up to date if dependencies changed

Summary by CodeRabbit

  • Bug Fixes

    • String literals now properly escape special characters (backslashes and single quotes) when formatted in Cypher.
    • String escape sequences are now correctly decoded during translation from Cypher to PostgreSQL.
  • Tests

    • Added comprehensive test coverage for string literal escaping validation.
    • Added test coverage for escape sequence decoding including various escape types and quote styles.
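The escaping rule in the summary above (backslashes and single quotes) can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the PR's code: `escapeCypherString` is a hypothetical name, and it assumes the surrounding single quotes are added separately when the literal is rendered.

```go
package main

import (
	"fmt"
	"strings"
)

// cypherEscaper escapes the two characters that would otherwise terminate or
// corrupt a single-quoted Cypher string literal. A strings.Replacer scans the
// input once, so the backslashes it inserts are never escaped a second time.
var cypherEscaper = strings.NewReplacer(`\`, `\\`, `'`, `\'`)

// escapeCypherString is a hypothetical helper returning the escaped body of a
// single-quoted Cypher literal; the surrounding quotes are added separately.
func escapeCypherString(s string) string {
	return cypherEscaper.Replace(s)
}

func main() {
	for _, in := range []string{`O'Brien`, `C:\temp`, `\'`} {
		fmt.Println("'" + escapeCypherString(in) + "'")
	}
	// Output:
	// 'O\'Brien'
	// 'C:\\temp'
	// '\\\''
}
```

Using a single strings.Replacer matters here: chaining two strings.ReplaceAll calls with quotes first and backslashes second would double-escape the backslash introduced for each quote.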

LawsonWillard self-assigned this May 5, 2026
LawsonWillard added the bug ("Something isn't working") and go ("Pull requests that update go code") labels May 5, 2026
coderabbitai Bot commented May 5, 2026

Walkthrough

The PR adds proper escaping for Cypher string literals and corresponding decoding during PostgreSQL translation. NewStringLiteral now escapes backslashes and single quotes; the formatter tests verify rendering with escaped literals; the PostgreSQL translator decodes Cypher escape sequences instead of simply stripping quotes.

Changes

String Literal Escaping and Decoding

  • Escaping API (cypher/models/cypher/model.go): NewStringLiteral now escapes backslashes and single quotes for Cypher single-quoted literals. The Literal type documentation clarifies the escaping contract.
  • Format / Emit Tests (cypher/models/cypher/format/format_test.go): TestNewStringLiteral_Escaping verifies escaping of backslashes, quotes, and combinations. TestNewStringLiteral_InQuery asserts that escaped literals render correctly in full Cypher queries.
  • PostgreSQL Decoding (cypher/models/pgsql/translate/translator.go): The string literal translation path now decodes Cypher escape sequences (backslash, single/double quotes, control characters) via a new decodeCypherStringLiteral helper instead of blindly stripping quotes.
  • Decoding Tests (cypher/models/pgsql/translate/translator_test.go): TestDecodeCypherStringLiteral covers successful decoding of escaped sequences and error handling for invalid quote styles, dangling escapes, and malformed literals.
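The decoding step described for translator.go can be approximated by the sketch below. It is an assumption-laden stand-in for decodeCypherStringLiteral, not its actual source: `decodeLiteral` is a hypothetical name, it accepts either single- or double-quoted literals with matching delimiters, and it handles only the escapes the summary lists, so \u and \U are still rejected here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// decodeLiteral is a hypothetical stand-in for decodeCypherStringLiteral.
// It expects a complete source-form literal delimited by matching single or
// double quotes and decodes backslash escapes into their literal characters.
func decodeLiteral(lit string) (string, error) {
	if len(lit) < 2 {
		return "", errors.New("literal too short")
	}
	quote := lit[0]
	if (quote != '\'' && quote != '"') || lit[len(lit)-1] != quote {
		return "", fmt.Errorf("invalid quote style in %q", lit)
	}
	body := lit[1 : len(lit)-1]

	var b strings.Builder
	b.Grow(len(body))
	for i := 0; i < len(body); i++ {
		if body[i] != '\\' {
			b.WriteByte(body[i]) // ordinary byte, copied through unchanged
			continue
		}
		if i+1 >= len(body) {
			return "", errors.New("dangling escape at end of literal")
		}
		switch c := body[i+1]; c {
		case '\\', '\'', '"':
			b.WriteByte(c)
		case 'b', 'B':
			b.WriteByte('\b')
		case 'f', 'F':
			b.WriteByte('\f')
		case 'n', 'N':
			b.WriteByte('\n')
		case 'r', 'R':
			b.WriteByte('\r')
		case 't', 'T':
			b.WriteByte('\t')
		default:
			return "", fmt.Errorf("invalid escape \\%c", c)
		}
		i++ // skip the escape letter consumed above
	}
	return b.String(), nil
}

func main() {
	s, err := decodeLiteral(`'O\'Brien\tC:\\temp'`)
	fmt.Printf("%q %v\n", s, err) // "O'Brien\tC:\\temp" <nil>

	_, err = decodeLiteral(`'abc\'`)
	fmt.Println(err) // dangling escape at end of literal

	_, err = decodeLiteral(`'caf\u00e9'`)
	fmt.Println(err) // invalid escape \u
}
```

Because the backslash is ASCII, walking the literal byte by byte is safe for multi-byte UTF-8 content: every non-escape byte is copied through unchanged.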

Sequence Diagram

sequenceDiagram
    participant User
    participant CypherModel as Cypher Model<br/>(NewStringLiteral)
    participant Formatter as Cypher Formatter<br/>(NewCypherEmitter)
    participant Translator as PostgreSQL Translator<br/>(decodeCypherStringLiteral)
    participant PG as PostgreSQL Literal

    User->>CypherModel: Input string with quotes/backslashes
    CypherModel->>CypherModel: Escape \\ and ' for Cypher
    CypherModel-->>User: Literal with escaped value
    
    User->>Formatter: Literal in query context
    Formatter->>Formatter: Render with surrounding quotes
    Formatter-->>User: Cypher query string with escaped literal
    
    User->>Translator: Cypher escaped literal from query
    Translator->>Translator: Decode escape sequences
    Translator->>Translator: Validate and strip quotes
    Translator->>PG: Decoded string value
    PG-->>User: PostgreSQL literal

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop, skip, and jump through strings we go,
Escaping quotes with a careful flow,
From Cypher's grasp to Postgres free,
Each backslash decoded, just right—hooray! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 25.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title clearly and specifically describes the main change: fixing Cypher character escaping in PostgreSQL literals, with a reference to the related ticket.
  • Description check: ✅ Passed. The description follows the template structure, includes the resolved ticket number, selects the appropriate change type, documents testing additions, identifies driver impact, and confirms checklist items.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cypher/models/pgsql/translate/translator.go`:
- Around lines 479-500: The decodeCypherStringLiteral function currently rejects
\u and \U Unicode escapes; update the escape handling in translator.go (inside
decodeCypherStringLiteral switch) to support '\u' followed by exactly 4 hex
digits and '\U' followed by exactly 8 hex digits: validate there are enough
bytes remaining, parse the hex digits, return an error on non-hex digits or
incomplete sequences, validate the resulting code point is a legal Unicode
scalar (<= 0x10FFFF and not a surrogate 0xD800–0xDFFF), and append the
corresponding UTF-8 encoding to the output buffer; keep existing error behavior
for other invalid escapes and update tests that previously expected failure for
\u/\U.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0a0681da-6b00-4182-93b8-3dd9d31aae5e

📥 Commits

Reviewing files that changed from the base of the PR and between 1eace19 and ac616cc.

📒 Files selected for processing (4)
  • cypher/models/cypher/format/format_test.go
  • cypher/models/cypher/model.go
  • cypher/models/pgsql/translate/translator.go
  • cypher/models/pgsql/translate/translator_test.go

Comment on lines +479 to +500
	switch c := body[i+1]; c {
	case '\\', '\'', '"':
		b.WriteByte(c)
		i++
	case 'b', 'B':
		b.WriteByte('\b')
		i++
	case 'f', 'F':
		b.WriteByte('\f')
		i++
	case 'n', 'N':
		b.WriteByte('\n')
		i++
	case 'r', 'R':
		b.WriteByte('\r')
		i++
	case 't', 'T':
		b.WriteByte('\t')
		i++
	default:
		return "", fmt.Errorf("invalid escape \\%c", c)
	}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does openCypher support \uXXXX and \UXXXXXXXX escapes in quoted string literals, and should they decode to Unicode code points?

💡 Result:

Yes. openCypher (Cypher string literals) supports Unicode escapes of the form \uXXXX and \UXXXXXXXX inside quoted string literals, and those escapes should decode to the Unicode code point they represent.

Evidence:

  • Cypher/openCypher string literal escape sequences include \uxxxx, defined as “Unicode UTF-16 code point (4 hex digits must follow the \u)” [1]. This corresponds to U+XXXX code points in the Basic Multilingual Plane (BMP).
  • The openCypher lexer grammar (ANTLR) for string escapes includes both “\u” followed by 4 hex digits and “\U” followed by 8 hex digits as valid EscapeSequence forms [2].
  • The openCypher/Cypher grammars therefore treat these sequences as escapes within the string literal tokenization stage; they are intended to represent the corresponding Unicode character, i.e., decode to the code point indicated by the hex digits [1][2].

Practical implication:

  • In openCypher, you should write \uXXXX for BMP code points and \UXXXXXXXX for non-BMP code points, expecting them to produce the character with that Unicode code point (not the literal characters backslash-u / backslash-U) [1][2].

Citations:


🏁 Script executed:

# First, let's find and examine the file
find . -name "translator.go" -type f | head -20

Repository: SpecterOps/DAWGS

Length of output: 106


🏁 Script executed:

# Search for the decodeCypherStringLiteral function to understand its full context
rg -n "decodeCypherStringLiteral" --type go -B 5 -A 30

Repository: SpecterOps/DAWGS

Length of output: 6588


🏁 Script executed:

# Look for Literal contract or documentation about string escapes
rg -n "Literal.*contract|Unicode.*escape|escape.*sequence" --type go -i | head -30

Repository: SpecterOps/DAWGS

Length of output: 584


🏁 Script executed:

# Examine the Literal contract around line 818
sed -n '770,830p' ./cypher/models/cypher/model.go

Repository: SpecterOps/DAWGS

Length of output: 2804


🏁 Script executed:

# Also search for any documentation about string literal escapes in the Cypher model
rg -n "escape|Literal" ./cypher/models/cypher/model.go -B 2 -A 2 | head -50

Repository: SpecterOps/DAWGS

Length of output: 2196


decodeCypherStringLiteral rejects documented Unicode escapes (\u / \U).

The Literal contract documents \uXXXX and \UXXXXXXXX as valid escape sequences in source-form string literals, but the decoder returns invalid escape for these. This breaks translation for otherwise valid Cypher string literals containing Unicode escapes.

The function currently handles \\, \', \", \b, \f, \n, \r, and \t, but has no cases for u or U. The proposed fix adds support for both Unicode escape forms with proper validation: \uXXXX for 4-digit hex code points and \UXXXXXXXX for 8-digit code points, including checks for incomplete escapes, invalid hex digits, and invalid Unicode code points.

Note: Existing tests explicitly expect these escapes to fail and will need updating.
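The \u / \U handling requested in the review can be sketched as a small helper that a decoder's escape switch could call for the 'u' and 'U' cases. The helper name `decodeUnicodeEscape` and its return shape are hypothetical; it follows the review prompt's rule (exactly 4 hex digits after \u, exactly 8 after \U) and relies on utf8.ValidRune to reject surrogates and values above U+10FFFF.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// decodeUnicodeEscape is a hypothetical helper. The caller guarantees that
// body[i] is a backslash and body[i+1] is 'u' or 'U'. It returns the decoded
// rune and the number of bytes consumed, counting the backslash.
func decodeUnicodeEscape(body string, i int) (rune, int, error) {
	width := 4 // \uXXXX
	if body[i+1] == 'U' {
		width = 8 // \UXXXXXXXX
	}
	start := i + 2
	if start+width > len(body) {
		return 0, 0, fmt.Errorf("incomplete \\%c escape", body[i+1])
	}
	digits := body[start : start+width]

	// Base 16 with no prefix: signs, underscores, and non-hex bytes are errors.
	v, err := strconv.ParseUint(digits, 16, 32)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid hex digits in \\%c%s", body[i+1], digits)
	}

	// ValidRune rejects surrogates (U+D800..U+DFFF) and anything above U+10FFFF.
	// Values above 0x7FFFFFFF wrap negative in the conversion and are also rejected.
	r := rune(v)
	if !utf8.ValidRune(r) {
		return 0, 0, fmt.Errorf("invalid Unicode scalar U+%X", v)
	}
	return r, 2 + width, nil
}

func main() {
	body := `caf\u00e9 \U0001F600`
	var out []byte
	for i := 0; i < len(body); i++ {
		if body[i] == '\\' && i+1 < len(body) && (body[i+1] == 'u' || body[i+1] == 'U') {
			r, n, err := decodeUnicodeEscape(body, i)
			if err != nil {
				fmt.Println("error:", err)
				return
			}
			out = utf8.AppendRune(out, r) // requires Go 1.18+
			i += n - 1                    // the loop's i++ consumes the last byte
			continue
		}
		out = append(out, body[i])
	}
	fmt.Println(string(out)) // café 😀
}
```

Delegating the scalar check to utf8.ValidRune covers both rejection rules from the review in one call, while the explicit length check is what enforces the exact digit count, since strconv.ParseUint alone would accept shorter hex strings.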

