fix(pgsql): Escape Cypher Characters in Literals - BED-7883 by LawsonWillard · Pull Request #71 · SpecterOps/DAWGS

LawsonWillard · 2026-05-05T02:34:06Z

Description

Escapes special characters in Cypher string literals and decodes Cypher escape sequences at the PGSQL translation layer.
Adds unit tests to ensure escaping works as expected.
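A minimal sketch of the escaping step, assuming it only has to handle the two characters named in the summary below (backslash and single quote). `escapeCypherString` is a hypothetical helper for illustration, not the DAWGS `NewStringLiteral` implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// cypherEscaper escapes the two characters that can break out of a
// single-quoted Cypher literal. strings.NewReplacer scans the input once,
// so a backslash it inserts is never escaped a second time.
var cypherEscaper = strings.NewReplacer(`\`, `\\`, `'`, `\'`)

// escapeCypherString (hypothetical) returns s escaped for placement between
// single quotes in a Cypher query. The surrounding quotes are not added.
func escapeCypherString(s string) string {
	return cypherEscaper.Replace(s)
}

func main() {
	fmt.Println(escapeCypherString(`O'Brien`))        // O\'Brien
	fmt.Println(escapeCypherString(`C:\Users\Admin`)) // C:\\Users\\Admin
}
```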

Resolves: BED-7883

Type of Change

Bug fix (a change that fixes an issue)

Testing

Added new unit tests

Unit tests added / updated
Integration tests added / updated
Manual integration tests run (go test -tags manual_integration ./integration/...)

Screenshots (if appropriate):

Driver Impact

PostgreSQL driver (drivers/pg)
Neo4j driver (drivers/neo4j)

Checklist

Code is formatted
All existing tests pass
go.mod / go.sum are up to date if dependencies changed

Summary by CodeRabbit

Bug Fixes
- String literals now properly escape special characters (backslashes and single quotes) when formatted in Cypher.
- String escape sequences are now correctly decoded during translation from Cypher to PostgreSQL.
Tests
- Added comprehensive test coverage for string literal escaping validation.
- Added test coverage for escape sequence decoding including various escape types and quote styles.
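If the two characters are escaped with sequential `strings.ReplaceAll` calls rather than a single-pass replacer, backslashes have to be escaped first; otherwise the backslash inserted in front of each quote gets doubled and the quote ends up unescaped. A small illustrative sketch (both function names are hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// escapeWrongOrder escapes quotes first; the second pass then doubles the
// backslash it just inserted, so the quote is no longer escaped.
func escapeWrongOrder(s string) string {
	s = strings.ReplaceAll(s, `'`, `\'`)
	return strings.ReplaceAll(s, `\`, `\\`)
}

// escapeRightOrder escapes backslashes first, so only backslashes that were
// already in the input are doubled.
func escapeRightOrder(s string) string {
	s = strings.ReplaceAll(s, `\`, `\\`)
	return strings.ReplaceAll(s, `'`, `\'`)
}

func main() {
	fmt.Println(escapeWrongOrder(`it's`)) // it\\'s  (escaped backslash, then a bare quote)
	fmt.Println(escapeRightOrder(`it's`)) // it\'s
}
```

In Cypher, `'it\\'s'` would end the literal right after the escaped backslash, which is exactly the breakage the escaping is meant to prevent.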


coderabbitai · 2026-05-05T02:34:18Z

Walkthrough

The PR adds proper escaping for Cypher string literals and corresponding decoding during PostgreSQL translation. NewStringLiteral now escapes backslashes and single quotes; the formatter tests verify rendering with escaped literals; the PostgreSQL translator decodes Cypher escape sequences instead of simply stripping quotes.
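Why stripping quotes fell short, as a minimal sketch that assumes the pre-fix path only trimmed the outer quote characters (`stripQuotes` is a hypothetical stand-in). The Cypher literal `'C:\\Temp'` denotes the value `C:\Temp`, but trimming the quotes leaves the escape in place, so an equality match against the stored property misses.

```go
package main

import (
	"fmt"
	"strings"
)

// stripQuotes models the assumed pre-fix behavior: it trims the surrounding
// single quotes but leaves escape sequences untouched.
func stripQuotes(literal string) string {
	return strings.TrimSuffix(strings.TrimPrefix(literal, "'"), "'")
}

func main() {
	stored := `C:\Temp`     // property value as stored in the graph
	literal := `'C:\\Temp'` // the same value written as a Cypher literal
	value := stripQuotes(literal)
	fmt.Println(value)           // C:\\Temp (the escape is still present)
	fmt.Println(value == stored) // false, so the equality match misses
}
```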

Changes

String Literal Escaping and Decoding

Layer / File(s)	Summary
Escaping API `cypher/models/cypher/model.go`	`NewStringLiteral` now escapes backslashes and single quotes for Cypher single-quoted literals. `Literal` type documentation clarifies the escaping contract.
Format / Emit Tests `cypher/models/cypher/format/format_test.go`	`TestNewStringLiteral_Escaping` verifies escaping of backslashes, quotes, and combinations. `TestNewStringLiteral_InQuery` asserts escaped literals render correctly in full Cypher queries.
PostgreSQL Decoding `cypher/models/pgsql/translate/translator.go`	String literal translation path now decodes Cypher escape sequences (backslash, single/double quotes, control characters) via new `decodeCypherStringLiteral` helper instead of blindly stripping quotes.
Decoding Tests `cypher/models/pgsql/translate/translator_test.go`	`TestDecodeCypherStringLiteral` covers successful decoding of escaped sequences and error handling for invalid quote styles, dangling escapes, and malformed literals.
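A minimal sketch of what a decoder like `decodeCypherStringLiteral` might look like, reconstructed from the review hunk further down and the test descriptions above (quote-style validation, dangling escapes, malformed literals). The name `decodeLiteral`, the error messages, and accepting both `'` and `"` as delimiters are assumptions; the real function may differ. Unicode escapes are intentionally omitted, matching the reviewed revision.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// decodeLiteral is a hypothetical stand-in for decodeCypherStringLiteral.
// It validates the delimiters of a quoted Cypher literal and decodes the
// escape sequences shown in the review hunk.
func decodeLiteral(literal string) (string, error) {
	if len(literal) < 2 {
		return "", errors.New("malformed string literal")
	}
	quote := literal[0]
	if (quote != '\'' && quote != '"') || literal[len(literal)-1] != quote {
		return "", fmt.Errorf("invalid quote style in %q", literal)
	}
	body := literal[1 : len(literal)-1]

	var b strings.Builder
	b.Grow(len(body))
	for i := 0; i < len(body); i++ {
		if body[i] != '\\' {
			b.WriteByte(body[i])
			continue
		}
		// A trailing backslash means the closing quote was escaped.
		if i+1 >= len(body) {
			return "", errors.New("dangling escape at end of literal")
		}
		switch c := body[i+1]; c {
		case '\\', '\'', '"':
			b.WriteByte(c)
		case 'b', 'B':
			b.WriteByte('\b')
		case 'f', 'F':
			b.WriteByte('\f')
		case 'n', 'N':
			b.WriteByte('\n')
		case 'r', 'R':
			b.WriteByte('\r')
		case 't', 'T':
			b.WriteByte('\t')
		default:
			return "", fmt.Errorf("invalid escape \\%c", c)
		}
		i++ // skip the character that followed the backslash
	}
	return b.String(), nil
}

func main() {
	s, err := decodeLiteral(`'O\'Brien\\Temp'`)
	fmt.Println(s, err) // O'Brien\Temp <nil>
}
```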

Sequence Diagram

sequenceDiagram
    participant User
    participant CypherModel as Cypher Model<br/>(NewStringLiteral)
    participant Formatter as Cypher Formatter<br/>(NewCypherEmitter)
    participant Translator as PostgreSQL Translator<br/>(decodeCypherStringLiteral)
    participant PG as PostgreSQL Literal

    User->>CypherModel: Input string with quotes/backslashes
    CypherModel->>CypherModel: Escape \\ and ' for Cypher
    CypherModel-->>User: Literal with escaped value
    
    User->>Formatter: Literal in query context
    Formatter->>Formatter: Render with surrounding quotes
    Formatter-->>User: Cypher query string with escaped literal
    
    User->>Translator: Cypher escaped literal from query
    Translator->>Translator: Decode escape sequences
    Translator->>Translator: Validate and strip quotes
    Translator->>PG: Decoded string value
    PG-->>User: PostgreSQL literal
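To illustrate the formatter step in the diagram: once a value is escaped, it can be placed between single quotes without terminating the literal early. This is a hypothetical sketch; `renderEquality` and the query shape are illustrative and not DAWGS's emitter.

```go
package main

import (
	"fmt"
	"strings"
)

// escaper mirrors the escaping contract described for NewStringLiteral:
// backslashes and single quotes are prefixed with a backslash.
var escaper = strings.NewReplacer(`\`, `\\`, `'`, `\'`)

// renderEquality (hypothetical) emits a property equality predicate with the
// value rendered as a single-quoted Cypher literal. The property name is
// assumed to be a trusted identifier and is not escaped here.
func renderEquality(property, value string) string {
	return fmt.Sprintf("MATCH (n) WHERE n.%s = '%s' RETURN n", property, escaper.Replace(value))
}

func main() {
	// Unescaped, the quote in this value would close the literal and the
	// remainder would be parsed as Cypher.
	fmt.Println(renderEquality("name", `x' OR true //`))
	// MATCH (n) WHERE n.name = 'x\' OR true //' RETURN n
}
```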

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop, skip, and jump through strings we go,
Escaping quotes with a careful flow,
From Cypher's grasp to Postgres free,
Each backslash decoded, just right—hooray! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: fixing Cypher character escaping in PostgreSQL literals, with a reference to the related ticket.
Description check	✅ Passed	The description follows the template structure, includes the resolved ticket number, selects appropriate change type, documents testing additions, identifies driver impact, and confirms checklist items.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cypher/models/pgsql/translate/translator.go`:
- Around line 479-500: The decodeCypherStringLiteral function currently rejects
\u and \U Unicode escapes; update the escape handling in translator.go (inside
decodeCypherStringLiteral switch) to support '\u' followed by exactly 4 hex
digits and '\U' followed by exactly 8 hex digits: validate there are enough
bytes remaining, parse the hex digits, return an error on non-hex digits or
incomplete sequences, validate the resulting code point is a legal Unicode
scalar (<= 0x10FFFF and not a surrogate 0xD800–0xDFFF), and append the
corresponding UTF-8 encoding to the output buffer; keep existing error behavior
for other invalid escapes and update tests that previously expected failure for
\u/\U.
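A sketch of the `\u`/`\U` handling this comment proposes, written as a standalone helper (`decodeUnicodeEscape` is a hypothetical name). It assumes the caller has already consumed the backslash and passes the text starting at the `u` or `U`; it returns the UTF-8 bytes and the number of input bytes consumed, so the caller can advance its index. `utf8.ValidRune` covers both checks the review asks for: it rejects values above 0x10FFFF and surrogates in the range 0xD800 to 0xDFFF.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// decodeUnicodeEscape decodes \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex
// digits). s starts at the 'u' or 'U' that followed the backslash. It
// returns the UTF-8 encoding of the code point and the bytes of s consumed.
func decodeUnicodeEscape(s string) ([]byte, int, error) {
	if s == "" {
		return nil, 0, fmt.Errorf("empty unicode escape")
	}
	width := 4
	if s[0] == 'U' {
		width = 8
	} else if s[0] != 'u' {
		return nil, 0, fmt.Errorf("not a unicode escape: \\%c", s[0])
	}
	if len(s) < 1+width {
		return nil, 0, fmt.Errorf("incomplete unicode escape \\%s", s)
	}
	digits := s[1 : 1+width]
	// With an explicit base of 16, ParseUint rejects signs, prefixes,
	// underscores, and any non-hex character.
	cp, err := strconv.ParseUint(digits, 16, 32)
	if err != nil {
		return nil, 0, fmt.Errorf("invalid hex digits in \\%c%s", s[0], digits)
	}
	r := rune(cp)
	// ValidRune rejects surrogates (0xD800-0xDFFF) and values above 0x10FFFF;
	// 8-digit values that overflow int32 become negative and are rejected too.
	if !utf8.ValidRune(r) {
		return nil, 0, fmt.Errorf("invalid code point U+%X", cp)
	}
	return utf8.AppendRune(nil, r), 1 + width, nil
}

func main() {
	b, n, err := decodeUnicodeEscape("u00e9 rest")
	fmt.Println(string(b), n, err) // é 5 <nil>
	b, n, err = decodeUnicodeEscape("U0001F600")
	fmt.Println(string(b), n, err) // 😀 9 <nil>
	_, _, err = decodeUnicodeEscape("uD800")
	fmt.Println(err) // invalid code point U+D800
}
```

One design consequence worth checking before adopting this: rejecting lone surrogates also rejects a UTF-16 surrogate pair written as two `\u` escapes. Whether Neo4j accepts such pairs is not verified here.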


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0a0681da-6b00-4182-93b8-3dd9d31aae5e

📥 Commits

Reviewing files that changed from the base of the PR and between 1eace19 and ac616cc.

📒 Files selected for processing (4)

cypher/models/cypher/format/format_test.go
cypher/models/cypher/model.go
cypher/models/pgsql/translate/translator.go
cypher/models/pgsql/translate/translator_test.go

coderabbitai · 2026-05-05T02:37:39Z

+		switch c := body[i+1]; c {
+		case '\\', '\'', '"':
+			b.WriteByte(c)
+			i++
+		case 'b', 'B':
+			b.WriteByte('\b')
+			i++
+		case 'f', 'F':
+			b.WriteByte('\f')
+			i++
+		case 'n', 'N':
+			b.WriteByte('\n')
+			i++
+		case 'r', 'R':
+			b.WriteByte('\r')
+			i++
+		case 't', 'T':
+			b.WriteByte('\t')
+			i++
+		default:
+			return "", fmt.Errorf("invalid escape \\%c", c)
+		}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does openCypher support \uXXXX and \UXXXXXXXX escapes in quoted string literals, and should they decode to Unicode code points?

💡 Result:

Yes. openCypher (Cypher string literals) supports Unicode escapes of the form \uXXXX and \UXXXXXXXX inside quoted string literals, and those escapes should decode to the Unicode code point they represent.

Evidence:
- Cypher/openCypher string literal escape sequences include \uxxxx, defined as “Unicode UTF-16 code point (4 hex digits must follow the \u)” [1]. This corresponds to U+XXXX code points in the Basic Multilingual Plane (BMP).
- The openCypher lexer grammar (ANTLR) for string escapes includes both “\u” followed by 4 hex digits and “\U” followed by 8 hex digits as valid EscapeSequence forms [2].
- The openCypher/Cypher grammars therefore treat these sequences as escapes within the string literal tokenization stage; they are intended to represent the corresponding Unicode character, i.e., decode to the code point indicated by the hex digits [1][2].

Practical implication:
- In openCypher, write \uXXXX for BMP code points and \UXXXXXXXX for non-BMP code points, and expect them to produce the character with that Unicode code point (not the literal characters backslash-u / backslash-U) [1][2].

Citations:

1: https://neo4j.com/docs/cypher-manual/current/values-and-types/boolean-numeric-string/

2: https://github.com/memgraph/memgraph/blob/ba4d7c3a/src/query/frontend/opencypher/grammar/CypherLexer.g4

🏁 Script executed:

# First, let's find and examine the file
find . -name "translator.go" -type f | head -20

Repository: SpecterOps/DAWGS

Length of output: 106

🏁 Script executed:

# Search for the decodeCypherStringLiteral function to understand its full context
rg -n "decodeCypherStringLiteral" --type go -B 5 -A 30

Repository: SpecterOps/DAWGS

Length of output: 6588

🏁 Script executed:

# Look for Literal contract or documentation about string escapes
rg -n "Literal.*contract|Unicode.*escape|escape.*sequence" --type go -i | head -30

Repository: SpecterOps/DAWGS

Length of output: 584

🏁 Script executed:

# Examine the Literal contract around line 818
sed -n '770,830p' ./cypher/models/cypher/model.go

Repository: SpecterOps/DAWGS

Length of output: 2804

🏁 Script executed:

# Also search for any documentation about string literal escapes in the Cypher model
rg -n "escape|Literal" ./cypher/models/cypher/model.go -B 2 -A 2 | head -50

Repository: SpecterOps/DAWGS

Length of output: 2196

decodeCypherStringLiteral rejects documented Unicode escapes (\u / \U).

The `Literal` contract documents \uXXXX and \UXXXXXXXX as valid escape sequences in source-form string literals, but the decoder returns an `invalid escape` error for them. This breaks translation of otherwise valid Cypher string literals that contain Unicode escapes.

The function currently handles \\, \', \", \b, \f, \n, \r, and \t, but has no cases for u or U. The proposed fix adds support for both Unicode escape forms with proper validation: \uXXXX for 4-digit hex code points and \UXXXXXXXX for 8-digit code points, including checks for incomplete escapes, invalid hex digits, and invalid Unicode code points.

Note: Existing tests explicitly expect these escapes to fail and will need updating.


LawsonWillard added 3 commits April 28, 2026 13:06

escape special characters in string literals

653c2fc

Merge branch 'main' of github.com:SpecterOps/DAWGS into BED-7883

c99ba25

fix(pgsql): equality match against properties containing backslashes …

ac616cc

…(BED-7883

LawsonWillard self-assigned this May 5, 2026

LawsonWillard added the bug (Something isn't working) and go (Pull requests that update go code) labels May 5, 2026

coderabbitai Bot reviewed May 5, 2026

LawsonWillard wants to merge 3 commits into main from BED-7883


1 participant
