Regular Expressions
Overview
This page contains recommendations for using regular expressions.
General
Do not use regular expressions if there is a clean non-regex solution, for example, searching for a substring or using if conditions.
Use regular expression engines that provide linear time expression matching, at least for user-provided regular expressions or for matching "hard-coded" expressions against user-controlled data, see the Linear time regular expression matching implementation section.
Clarification
Many regex engines rely on backtracking, which can make matching very slow in some cases (the running time can grow exponentially with the input size), see the Vulnerability Mitigation: Regular Expression Denial of Service (ReDoS) page.
Do not use multi-line matching mode in regexes that are used for validation. Otherwise, make sure that full string matching ^...$ works as expected or rewrite regexes using more specific expressions like \A...\z.
Remember that in some engines multi-line matching mode is the default mode, for example, the built-in regex engine in Ruby.
Clarification
In multi-line mode, ^ and $ are matched differently. For example, $ matches not only at the end of the string but also at the end of each line. So, if a validation uses a regular expression in multi-line mode, an attacker can use a newline character (\x0a) to bypass this validation. Consider the Python regular expression match in the snippet below.
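The original snippet is not preserved in this text. A minimal reconstruction, assuming the pattern ^\d{1,3}$ and the default (non-multi-line) mode, might look like this:

```python
import re

# Assumed pattern: the whole string is 1 to 3 digits.
pattern = re.compile(r"^\d{1,3}$")

print(bool(pattern.match("137")))       # True
print(bool(pattern.match("137\nabc")))  # False: "$" cannot match before "\nabc"
```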
The regex from the snippet matches full strings containing numbers from 1 to 3 digits long. However, enabling multi-line mode completely changes this behaviour.
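A sketch of the multi-line case, again assuming the pattern ^\d{1,3}$ (the original snippet is not preserved):

```python
import re

# Same assumed pattern, now compiled with multi-line mode enabled.
multiline_pattern = re.compile(r"^\d{1,3}$", re.MULTILINE)

# "$" now also matches at the end of the first line, before "\n".
print(bool(multiline_pattern.match("137\nabc")))  # True
```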
As can be seen, in multi-line mode the string 137\nabc will be successfully matched. To avoid this behaviour, disable multi-line mode (this is the preferred solution) or rewrite the regex using \A and \Z:
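A sketch of the rewritten regex, assuming the same 1-to-3-digit pattern. The use of re.fullmatch is an additional option that is not mentioned in the original text:

```python
import re

# \A and \Z anchor to the start and end of the whole string in Python,
# regardless of whether multi-line mode is enabled.
strict_pattern = re.compile(r"\A\d{1,3}\Z", re.MULTILINE)

print(bool(strict_pattern.match("137")))       # True
print(bool(strict_pattern.match("137\nabc")))  # False

# Alternative: fullmatch requires the entire string to match the pattern.
print(bool(re.fullmatch(r"\d{1,3}", "137\nabc")))  # False
```

Note that in Python, $ without multi-line mode still matches just before a trailing newline, so ^\d{1,3}$ accepts "137\n", while \A\d{1,3}\Z and re.fullmatch reject it.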
Implement input validation for strings before matching them, at least for string length and allowed characters, see the Input Validation page.
Use the following practices to simplify regular expressions and reduce the likelihood of problems with catastrophic backtracking:
Avoid nested quantifiers, for example (a+)+.
Try to be as precise as possible and avoid the . pattern.
Use reasonable ranges, for example {1,10}, for repeating patterns instead of unbounded * and + patterns.
Simplify character ranges, for example [ab] instead of [a-z0-9].
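The practices above can be sketched as follows. The "order code" format, the names RISKY_PATTERN, ORDER_CODE and is_order_code are hypothetical and serve only as an illustration:

```python
import re

# Risky shape (kept as a string only, never executed here):
# nested quantifiers, the "." wildcard and unbounded "+".
RISKY_PATTERN = r"^(.+)+-(.+)+$"

# Tighter alternative for a hypothetical order code such as "abc-123":
# explicit character classes, bounded repetition, no nested quantifiers.
ORDER_CODE = re.compile(r"[a-z]{1,10}-[0-9]{1,10}")


def is_order_code(value: str) -> bool:
    """Return True only if the whole string is a valid order code."""
    return ORDER_CODE.fullmatch(value) is not None


print(is_order_code("abc-123"))  # True
print(is_order_code("abc-"))     # False
```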
Detecting catastrophic backtracking
You can use doyensec/regexploit to detect catastrophic backtracking in your regexes that can lead to ReDoS.
regexploit does not guarantee that it detects every vulnerable regex. It is just one of the relatively easy ways to check your regexes.
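As a complementary manual check (this is not how regexploit works, only a rough empirical sketch), you can time a suspicious regex against growing inputs that almost match. The helper time_match and the chosen input sizes are assumptions for illustration:

```python
import re
import time


def time_match(pattern: str, text: str) -> float:
    """Return the time in seconds spent on one match attempt."""
    compiled = re.compile(pattern)
    start = time.perf_counter()
    compiled.match(text)
    return time.perf_counter() - start


# Nested quantifier from the list of practices above.
NESTED = r"^(a+)+$"

# The trailing "!" forces the match to fail, which triggers backtracking.
# Sizes are kept small on purpose: the time roughly doubles per extra "a".
timings = [(n, time_match(NESTED, "a" * n + "!")) for n in (8, 12, 16)]

for n, seconds in timings:
    print(f"n={n:2d}: {seconds:.6f} s")
```

If the time grows much faster than the input length, the regex is a likely ReDoS candidate and should be rewritten.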
Log regex failures, especially if a regex is used for validation, see the Logging and Monitoring page.
Comply with requirements from the Error and Exception Handling page.
Use regular expression engines that provide linear time expression matching for all regular expressions, see the Linear time regular expression matching implementation section.
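The input validation and logging recommendations above can be combined roughly as in the sketch below. The logger name, MAX_LENGTH, NUMBER_PATTERN and validate_number are hypothetical names chosen for this example:

```python
import logging
import re

logger = logging.getLogger("input_validation")  # hypothetical logger name

MAX_LENGTH = 3  # assumed upper bound for this particular field
NUMBER_PATTERN = re.compile(r"[0-9]{1,3}")


def validate_number(value: str) -> bool:
    """Check length first, then match the whole string; log every failure."""
    if len(value) > MAX_LENGTH:
        logger.warning("validation failed: input too long (%d chars)", len(value))
        return False
    if NUMBER_PATTERN.fullmatch(value) is None:
        logger.warning(
            "validation failed: input does not match %s", NUMBER_PATTERN.pattern
        )
        return False
    return True


print(validate_number("137"))   # True
print(validate_number("1370"))  # False, too long
```

The sketch logs the input length and the pattern rather than the raw value, on the assumption that the input may contain sensitive data.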
Linear time regular expression matching implementation
The re2 engine provides linear time expression matching. Try to find a library for your language that is based on the re2 engine.
For example, in Python, use the google-re2 package, which is a wrapper for the re2 engine.
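A minimal sketch of using the google-re2 package from Python, assuming it is installed and exposes the re2 module with an interface similar to the standard re module. The fallback to re exists only so that the sketch runs without the package and does not provide linear time matching:

```python
try:
    import re2 as regex  # provided by the google-re2 package
except ImportError:
    import re as regex   # fallback for this sketch only; NOT linear time

# re2 does not support backreferences or lookaround,
# so keep patterns within the syntax it accepts.
NUMBER = regex.compile(r"\d{1,3}")


def is_short_number(value: str) -> bool:
    """Return True only if the whole string is 1 to 3 digits."""
    return NUMBER.fullmatch(value) is not None


print(is_short_number("137"))       # True
print(is_short_number("137\nabc"))  # False
```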