Regular Expressions

Overview

This page contains recommendations for using regular expressions.

General

  • Do not use regular expressions if there is a clean non-regex solution, for example, searching for a substring or using if conditions.

  • Use regular expression engines that guarantee linear-time matching, at least when matching user-provided regular expressions or when matching "hard-coded" expressions against user-controlled data; see the Linear time regular expression matching implementation section.
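As an illustration of the first recommendation, here is a minimal Python sketch where plain string operations replace a regex search. The header string is a made-up example:

```python
import re

header = "Content-Encoding: gzip"  # illustrative input only

# Regex-based check: works, but brings in a regex engine for a fixed substring.
has_gzip_regex = re.search(r"gzip", header) is not None

# Equivalent plain string operations, with no regex engine involved at all.
has_gzip = "gzip" in header
is_encoding_header = header.startswith("Content-Encoding:")

print(has_gzip_regex, has_gzip, is_encoding_header)  # True True True
```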

Clarification

Many regex engines use backtracking, which can make matching extremely slow for some patterns and inputs (the running time can grow exponentially with the input length); see the Vulnerability Mitigation: Regular Expression Denial of Service (ReDoS) page.

  • Do not use multi-line matching mode in regexes that are used for validation. If you cannot avoid it, make sure that full-string matching with ^...$ works as expected, or rewrite the regexes using anchors that always refer to the whole string, such as \A...\z (in Python, \A...\Z).

    • Remember that in some engines multi-line mode is the default, for example, in Ruby's built-in regex engine.

Clarification

In multi-line mode, ^ and $ are matched differently: ^ also matches at the start of each line, and $ matches not only at the end of the string but also at the end of each line (right before a newline). So, if a validation uses a regular expression in multi-line mode, an attacker can insert a newline character (\x0a) to bypass it. Consider the Python regular expression match in the snippet below.
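A minimal Python sketch of such a check, assuming the standard `re` module and the pattern `^\d{1,3}$`; the pattern and the test values are illustrative:

```python
import re

# Full-string validation of one to three digits, multi-line mode NOT enabled.
PATTERN = re.compile(r"^\d{1,3}$")

for value in ["7", "137", "1370", "abc", "137\nabc"]:
    # Only "7" and "137" are accepted; "137\nabc" is rejected.
    print(repr(value), bool(PATTERN.match(value)))
```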

The regex from the snippet is intended to match only whole strings consisting of a number one to three digits long. However, enabling multi-line mode completely changes this behaviour.
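A sketch of the same illustrative pattern compiled with `re.MULTILINE`. Here `$` also matches right before each newline, so everything after the first line is ignored:

```python
import re

# Same pattern, but ^ and $ now also match at line boundaries.
MULTILINE_PATTERN = re.compile(r"^\d{1,3}$", re.MULTILINE)

value = "137\nabc"
# "137" is followed by a newline, so $ matches there and validation passes.
print(bool(MULTILINE_PATTERN.match(value)))  # True
```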

As can be seen, in multi-line mode the string 137\nabc will be successfully matched. To avoid this behaviour, disable multi-line mode (this is the preferred solution) or rewrite the regex using \A and \Z:
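A sketch of the anchored rewrite in Python. In Python's `re`, `\A` matches only at the start of the string and `\Z` only at its very end, and neither is affected by `re.MULTILINE`. Unlike `$`, which accepts one trailing newline even without multi-line mode, `\Z` rejects it. `re.fullmatch` is an alternative that requires the whole string to match. The test values are illustrative:

```python
import re

# \A and \Z anchor to the whole string, even with re.MULTILINE enabled.
STRICT = re.compile(r"\A\d{1,3}\Z", re.MULTILINE)
# For comparison: $ without multi-line mode still accepts one trailing newline.
DOLLAR = re.compile(r"^\d{1,3}$")

for value in ["137", "137\n", "137\nabc"]:
    print(repr(value),
          bool(STRICT.match(value)),
          bool(DOLLAR.match(value)),
          bool(re.fullmatch(r"\d{1,3}", value)))
```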

  • Validate strings before matching them, at least their length and allowed characters; see the Input Validation page.

  • Use the following practices to simplify regular expressions and reduce the likelihood of problems with catastrophic backtracking:

    • Avoid nested quantifiers, for example (a+)+.

    • Be as precise as possible and avoid the catch-all . pattern.

    • Use bounded repetition with reasonable limits, for example {1,10}, instead of unbounded * and + quantifiers.

    • Keep character classes as narrow as possible, for example [ab] instead of [a-z0-9] when only a and b are valid.
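A minimal Python sketch combining input validation with a precise, bounded pattern. The field (a hypothetical order ID such as `ABC-1234`), the length limit, and the allow-list are assumptions for illustration only:

```python
import re
import string

MAX_LENGTH = 14  # hypothetical limit: 3 letters + "-" + up to 10 digits
ALLOWED_CHARS = frozenset(string.ascii_uppercase + string.digits + "-")  # hypothetical allow-list

# Precise and bounded: no ".", no nested quantifiers, no unbounded repetition.
ORDER_ID = re.compile(r"[A-Z]{3}-[0-9]{1,10}")


def is_valid_order_id(value: str) -> bool:
    # Cheap checks first, so oversized or unexpected input never reaches the regex engine.
    if len(value) > MAX_LENGTH:
        return False
    if not set(value) <= ALLOWED_CHARS:
        return False
    # fullmatch requires the whole string to match, so no trailing newline slips through.
    return ORDER_ID.fullmatch(value) is not None


for candidate in ["ABC-1234", "abc-1234", "ABC-1234\n", "A" * 1000]:
    print(repr(candidate[:12]), is_valid_order_id(candidate))
```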

Detecting catastrophic backtracking

You can use doyensec/regexploit to detect catastrophic backtracking in your regexes that can lead to ReDoS.

regexploit is not guaranteed to detect every vulnerable regex; it is just one relatively easy way to check your regexes.
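For illustration, here is a hypothetical pattern with a nested quantifier, the classic shape behind catastrophic backtracking, next to a rewrite that follows the practices listed above. Only the rewrite is run on the adversarial input, because running the risky pattern on it with Python's backtracking `re` engine could take an extremely long time:

```python
import re

# Risky: nested quantifier. On input like "aaaa...a!" a backtracking engine may
# try exponentially many ways to split the "a"s between the inner and outer +.
RISKY = re.compile(r"(a+)+")

# Safer: no nesting and a bounded repetition. It accepts the same strings as
# (a+)+ up to 10 characters long, with only one way to match each "a".
SAFER = re.compile(r"a{1,10}")

adversarial = "a" * 30 + "!"

# Do NOT call RISKY.fullmatch(adversarial): it may run for a very long time.
print(SAFER.fullmatch(adversarial))  # None
print(bool(SAFER.fullmatch("aaaaa")))  # True
```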

Linear time regular expression matching implementation

The re2 engine provides linear-time matching. Try to find a library for your language that is based on re2.

In Go, use the standard regexp package, which accepts re2 syntax and guarantees matching in time linear in the size of the input.
