fix(pgsql): Escape Cypher Characters in Literals - BED-7883#71
fix(pgsql): Escape Cypher Characters in Literals - BED-7883#71LawsonWillard wants to merge 3 commits intomainfrom
Conversation
WalkthroughThe PR adds proper escaping for Cypher string literals and corresponding decoding during PostgreSQL translation. ChangesString Literal Escaping and Decoding
Sequence DiagramsequenceDiagram
participant User
participant CypherModel as Cypher Model<br/>(NewStringLiteral)
participant Formatter as Cypher Formatter<br/>(NewCypherEmitter)
participant Translator as PostgreSQL Translator<br/>(decodeCypherStringLiteral)
participant PG as PostgreSQL Literal
User->>CypherModel: Input string with quotes/backslashes
CypherModel->>CypherModel: Escape \\ and ' for Cypher
CypherModel-->>User: Literal with escaped value
User->>Formatter: Literal in query context
Formatter->>Formatter: Render with surrounding quotes
Formatter-->>User: Cypher query string with escaped literal
User->>Translator: Cypher escaped literal from query
Translator->>Translator: Decode escape sequences
Translator->>Translator: Validate and strip quotes
Translator->>PG: Decoded string value
PG-->>User: PostgreSQL literal
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cypher/models/pgsql/translate/translator.go`:
- Around line 479-500: The decodeCypherStringLiteral function currently rejects
\u and \U Unicode escapes; update the escape handling in translator.go (inside
decodeCypherStringLiteral switch) to support '\u' followed by exactly 4 hex
digits and '\U' followed by exactly 8 hex digits: validate there are enough
bytes remaining, parse the hex digits, return an error on non-hex digits or
incomplete sequences, validate the resulting code point is a legal Unicode
scalar (<= 0x10FFFF and not a surrogate 0xD800–0xDFFF), and append the
corresponding UTF-8 encoding to the output buffer; keep existing error behavior
for other invalid escapes and update tests that previously expected failure for
\u/\U.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0a0681da-6b00-4182-93b8-3dd9d31aae5e
📒 Files selected for processing (4)
cypher/models/cypher/format/format_test.gocypher/models/cypher/model.gocypher/models/pgsql/translate/translator.gocypher/models/pgsql/translate/translator_test.go
| switch c := body[i+1]; c { | ||
| case '\\', '\'', '"': | ||
| b.WriteByte(c) | ||
| i++ | ||
| case 'b', 'B': | ||
| b.WriteByte('\b') | ||
| i++ | ||
| case 'f', 'F': | ||
| b.WriteByte('\f') | ||
| i++ | ||
| case 'n', 'N': | ||
| b.WriteByte('\n') | ||
| i++ | ||
| case 'r', 'R': | ||
| b.WriteByte('\r') | ||
| i++ | ||
| case 't', 'T': | ||
| b.WriteByte('\t') | ||
| i++ | ||
| default: | ||
| return "", fmt.Errorf("invalid escape \\%c", c) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Does openCypher support \uXXXXand\UXXXXXXXX escapes in quoted string literals, and should they decode to Unicode code points?
💡 Result:
Yes. openCypher (Cypher string literals) supports Unicode escapes of the form \uXXXX and \UXXXXXXXX inside quoted string literals, and those escapes should decode to the Unicode code point they represent. Evidence: - Cypher/OpenCypher string literal escape sequences include \uxxxx, defined as “Unicode UTF-16 code point (4 hex digits must follow the \u)” [1]. This corresponds to U+XXXX code points in the Basic Multilingual Plane (BMP). - openCypher lexer grammar (ANTLR) for string escapes includes both “\u” followed by 4 hex digits and “\U” followed by 8 hex digits as valid EscapeSequence forms [2]. - The openCypher/Cypher grammars therefore treat these sequences as escapes within the string literal tokenization stage; they are intended to represent the corresponding Unicode character, i.e., decode to the code point indicated by the hex digits [1][2]. Practical implication: - In openCypher, you should write \uXXXX for BMP code points and \UXXXXXXXX for non-BMP code points, expecting them to produce the character with that Unicode code point (not the literal characters backslash-u / backslash-U) [1][2].
Citations:
- 1: https://neo4j.com/docs/cypher-manual/current/values-and-types/boolean-numeric-string/
- 2: https://github.com/memgraph/memgraph/blob/ba4d7c3a/src/query/frontend/opencypher/grammar/CypherLexer.g4
🏁 Script executed:
# First, let's find and examine the file
find . -name "translator.go" -type f | head -20Repository: SpecterOps/DAWGS
Length of output: 106
🏁 Script executed:
# Search for the decodeCypherStringLiteral function to understand its full context
rg -n "decodeCypherStringLiteral" --type go -B 5 -A 30Repository: SpecterOps/DAWGS
Length of output: 6588
🏁 Script executed:
# Look for Literal contract or documentation about string escapes
rg -n "Literal.*contract|Unicode.*escape|escape.*sequence" --type go -i | head -30Repository: SpecterOps/DAWGS
Length of output: 584
🏁 Script executed:
# Examine the Literal contract around line 818
sed -n '770,830p' ./cypher/models/cypher/model.goRepository: SpecterOps/DAWGS
Length of output: 2804
🏁 Script executed:
# Also search for any documentation about string literal escapes in the Cypher model
rg -n "escape|Literal" ./cypher/models/cypher/model.go -B 2 -A 2 | head -50Repository: SpecterOps/DAWGS
Length of output: 2196
decodeCypherStringLiteral rejects documented Unicode escapes (\u / \U).
The Literal contract documents \uXXXX and \UXXXXXXXX as valid escape sequences in source-form string literals, but the decoder returns invalid escape for these. This breaks translation for otherwise valid Cypher string literals containing Unicode escapes.
The function currently handles \\, \', ", \b, \f, \n, \r, and \t, but has no cases for u or U. The proposed fix adds support for both Unicode escape forms with proper validation: \uXXXX for 4-digit hex code points and \UXXXXXXXX for 8-digit code points, including checks for incomplete escapes, invalid hex digits, and invalid Unicode code points.
Note: Existing tests explicitly expect these escapes to fail and will need updating.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cypher/models/pgsql/translate/translator.go` around lines 479 - 500, The
decodeCypherStringLiteral function currently rejects \u and \U Unicode escapes;
update the escape handling in translator.go (inside decodeCypherStringLiteral
switch) to support '\u' followed by exactly 4 hex digits and '\U' followed by
exactly 8 hex digits: validate there are enough bytes remaining, parse the hex
digits, return an error on non-hex digits or incomplete sequences, validate the
resulting code point is a legal Unicode scalar (<= 0x10FFFF and not a surrogate
0xD800–0xDFFF), and append the corresponding UTF-8 encoding to the output
buffer; keep existing error behavior for other invalid escapes and update tests
that previously expected failure for \u/\U.
Description
Resolves: BED-7883
Type of Change
Testing
Added new unit tests
go test -tags manual_integration ./integration/...)Screenshots (if appropriate):
Driver Impact
drivers/pg)drivers/neo4j)Checklist
go.mod/go.sumare up to date if dependencies changedSummary by CodeRabbit
Bug Fixes
Tests