gh-148762: speed up SRE_AT_BEGINNING_LINE regexes #148778

Open
haampie wants to merge 1 commit into python:main from haampie:hs/fix/multiline-caret


haampie commented Apr 19, 2026

SRE(search) has an early exit for SRE_AT_BEGINNING and
SRE_AT_BEGINNING_STRING, but lacks a similar fast-forward for
SRE_AT_BEGINNING_LINE. This means that a regex of the following form
is slow:

re.compile("^foo", re.MULTILINE)
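For context, a small sketch of what this pattern matches. With `re.MULTILINE`, `^` matches at the start of the string and right after each newline, so those are the only positions where a match can begin. The sample text is made up for illustration.

```python
import re

pattern = re.compile("^foo", re.MULTILINE)

# Hypothetical sample text: "foo" appears mid-line on the first line
# and at the start of the second line.
text = "a foo here\nfoo there\nbar"

match = pattern.search(text)
start = match.start()

# The mid-line "foo" is skipped; the match begins right after a newline.
print(start, repr(text[start - 1]))
```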

The current implementation loops character by character and calls
SRE(match) on each iteration, which adds a lot of overhead.

This commit

  • ensures SRE(match) is only called right after a newline
  • optimizes fast-forwarding to the next newline by calling memchr in
    the UCS-1 case

This can lead to 10x or even 100x speedups in the no-match case with
long lines, while not causing overhead in the case of short lines.
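One way to observe the difference is to time a no-match search over inputs of equal size but different line lengths, before and after the change. The sketch below is not the benchmark used for this PR; the line lengths, input size, and helper names are arbitrary choices, and absolute timings depend on the interpreter build and machine.

```python
import re
import timeit

pattern = re.compile("^foo", re.MULTILINE)

def make_haystack(line_length, total_chars=1_000_000):
    """Build roughly total_chars of text in which no line starts with 'foo'."""
    line = "x" * line_length + "\n"
    return line * (total_chars // len(line))

def time_search(haystack, repeats=5):
    """Return the best time in seconds for a single no-match search."""
    timings = timeit.repeat(lambda: pattern.search(haystack), number=1, repeat=repeats)
    return min(timings)

long_lines = make_haystack(100_000)   # few, very long lines
short_lines = make_haystack(10)       # many short lines

print(f"long lines:  {time_search(long_lines):.6f} s")
print(f"short lines: {time_search(short_lines):.6f} s")
```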

Signed-off-by: Harmen Stoppels <[email protected]>