
Regular Expressions


Last updated 1 year ago

Overview

This page contains recommendations for using regular expressions.

General

  • Do not use regular expressions if there is a clean non-regex solution, for example, searching for a substring or using if conditions.
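Many checks that are often written as regexes have direct string-method equivalents. A minimal Python sketch follows; the helper names (has_api_prefix, contains_token, is_short_digits) are hypothetical and chosen only for illustration.

```python
# Hypothetical helpers (names are illustrative, not from the handbook):
# each one replaces a small regex with a plain string operation.


def has_api_prefix(path: str) -> bool:
    # Instead of re.match(r'^/api/', path)
    return path.startswith('/api/')


def contains_token(text: str, token: str) -> bool:
    # Instead of re.search(re.escape(token), text)
    return token in text


def is_short_digits(value: str) -> bool:
    # Instead of re.fullmatch(r'[0-9]{1,3}', value)
    # isascii() rejects non-ASCII digits that str.isdigit() would accept
    return 1 <= len(value) <= 3 and value.isascii() and value.isdigit()


print(has_api_prefix('/api/users'))
print(contains_token('error: disk full', 'disk'))
print(is_short_digits('137'), is_short_digits('137\n'))
```

Besides being easier to read, these checks have no backtracking behaviour to reason about.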

  • Do not use multi-line matching mode in regexes that are used for validation. Otherwise, make sure that full string matching ^...$ works as expected or rewrite regexes using more specific expressions like \A...\z.

    • Remember that in some engines multi-line matching mode is the default, for example, in Ruby's built-in regex engine.

Clarification

In multi-line mode, ^ and $ are matched differently: $ matches not only at the end of the string but also at the end of each line. So, if validation relies on a regular expression in multi-line mode, an attacker can use a newline character \x0a to bypass it. Consider the following regular expression matching in Python.

import re

p = re.compile(r'^\d{1,3}$')

p.match('137') is not None
# => True
p.match('1337') is not None
# => False
p.match('abc') is not None
# => False
p.match('137\nabc') is not None
# => False

The regex from the snippet matches strings that consist of 1 to 3 digits. However, enabling multi-line mode completely changes this behaviour.

import re

p = re.compile(r'^\d{1,3}$', re.MULTILINE)

p.match('137') is not None
# => True
p.match('1337') is not None
# => False
p.match('abc') is not None
# => False
p.match('137\nabc') is not None
# => True

As can be seen, in multi-line mode the string 137\nabc will be successfully matched. To avoid this behaviour, disable multi-line mode (this is the preferred solution) or rewrite the regex using \A and \Z:

import re

# preferred
p = re.compile(r'^\d{1,3}$')

p.match('137\nabc') is not None
# => False

# or
p = re.compile(r'\A\d{1,3}\Z', re.MULTILINE)

p.match('137\nabc') is not None
# => False
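
One more detail is worth checking in Python: even without multi-line mode, $ also matches just before a trailing newline, so ^...$ accepts '137\n'. The short sketch below shows the difference, using \Z or re.fullmatch (both standard features of the re module) to require the true end of the string.

```python
import re

dollar = re.compile(r'^\d{1,3}$')
strict = re.compile(r'\A\d{1,3}\Z')

print(dollar.match('137\n') is not None)
# => True
print(strict.match('137\n') is not None)
# => False
print(re.fullmatch(r'\d{1,3}', '137\n') is not None)
# => False
```
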
  • Use the following practices to simplify regular expressions and reduce the likelihood of problems with catastrophic backtracking:

    • Avoid nested quantifiers, for example (a+)+.

    • Try to be as precise as possible and avoid the . pattern.

    • Use reasonable ranges, for example {1,10}, for repeating patterns instead of unbounded * and + patterns.

    • Keep character classes as narrow as possible, for example [ab] instead of [a-z0-9] when only a and b are expected.
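
As an illustration of these practices, consider a pattern such as v\w*_\w*_\w*$. Because \w also matches _, a run of underscores can be split between the three \w* groups in many ways, and a backtracking engine tries them all when the match fails. The sketch below rewrites the pattern with non-overlapping classes and bounded repetition. The bound of 64 and the constant names are assumptions chosen for illustration, so the rewritten pattern accepts the same strings only as long as each segment stays within that bound.

```python
import re

# Ambiguous: \w also matches '_', so the three \w* groups overlap.
AMBIGUOUS = re.compile(r'v\w*_\w*_\w*$')

# More precise: [^\W_] means "a word character except '_'", so each underscore
# separator can only be consumed in one place. {0,64} is an assumed limit.
PRECISE = re.compile(r'v[^\W_]{0,64}_[^\W_]{0,64}_\w{0,64}$')

for s in ['v1_2_3', 'v__', 'v_only', 'v1_2_3!']:
    print(s, AMBIGUOUS.search(s) is not None, PRECISE.search(s) is not None)

# An input that forces heavy backtracking in AMBIGUOUS is rejected quickly here.
print(PRECISE.search('v' + '_' * 3456 + '!') is not None)
# => False
```
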

Detecting Catastrophic backtracking

regexploit does not guarantee detection of every vulnerable regex; it is just one relatively easy way to check your regexes.

$ python3 -m venv .env
$ source .env/bin/activate
$ pip install regexploit
$ regexploit
v\w*_\w*_\w*$
Pattern: v\w*_\w*_\w*$
---
Worst-case complexity: 3 ⭐⭐⭐ (cubic)
Repeated character: [5f:_]
Final character to cause backtracking: [^WORD]
Example: 'v' + '_' * 3456 + '!'
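
A pattern check with a tool covers only one side of the problem. On the input side, bounding the length and the allowed characters before matching limits how much work any regex can do, and logging rejected input makes abuse visible. A minimal Python sketch follows, using a bounded variant of the pattern from the run above. The function name validate_version_tag, the limits, and the logger name are assumptions for illustration.

```python
import logging
import re
import string

logger = logging.getLogger('regex.validation')  # assumed logger name

MAX_LEN = 64  # assumed limit; choose one that fits the field
ALLOWED = set(string.ascii_letters + string.digits + '_')
TAG = re.compile(r'\Av[^\W_]{0,20}_[^\W_]{0,20}_\w{0,20}\Z', re.ASCII)


def validate_version_tag(value: str) -> bool:
    # Cheap checks first: reject oversized or unexpected input before matching.
    if len(value) > MAX_LEN or not set(value) <= ALLOWED:
        # Log only the length, not the raw value, to keep untrusted data out of logs.
        logger.warning('regex validation rejected input: length=%d', len(value))
        return False
    if TAG.match(value) is None:
        logger.warning('regex validation failed: length=%d', len(value))
        return False
    return True


print(validate_version_tag('v1_2_3'))
print(validate_version_tag('v' + '_' * 3456 + '!'))
```

The length check runs in constant time and the character check in linear time, so the regex never sees the 3458-character input from the example.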

  • You can use doyensec/regexploit to detect catastrophic backtracking in your regexes that leads to ReDoS.

  • Use regular expression engines that provide linear time expression matching, at least for user-provided regular expressions or for matching "hard-coded" expressions against user-controlled data, see the Linear time regular expression matching implementation section.

Clarification

Many regex engines use backtracking, which makes them very slow in some cases (matching time can grow exponentially with the input size), see the Vulnerability Mitigation: Regular Expression Denial of Service (ReDoS) page.

  • Implement input validation for strings to be matched, at least for string length and allowed characters, see the Input Validation page.

  • Log regex failures, especially if a regex is used for validation, see the Logging and Monitoring page.

  • Comply with requirements from the Error and Exception Handling page.

  • Use regular expression engines that provide linear time expression matching for matching all regular expressions, see the Linear time regular expression matching implementation section.

Linear time regular expression matching implementation

There is the re2 engine that provides linear time expression matching. Try to find a library that is based on the re2 engine.

Go

Use the standard regexp package. It accepts the re2 syntax and guarantees matching in time linear in the size of the input.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    inputData := "some text to match"
    // MatchString reports whether the pattern matches anywhere in inputData
    match, err := regexp.MatchString("[a-z]{1,16}", inputData)
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    fmt.Println("Match:", match)
}

Java

Use the re2j package, which is a port of the C++ re2 library to pure Java.

import com.google.re2j.Matcher;
import com.google.re2j.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("[a-z]{1,16}");
        Matcher m = p.matcher("some text to match");
        // find() reports whether the pattern matches anywhere in the input
        System.out.println("Match: " + m.find());
    }
}

Node.js

Use the node-re2 package, which is a wrapper for the re2 engine.

const RE2 = require("re2");

const re = new RE2("[a-z]{1,16}");
const result = re.exec("some text to match");
console.log(result);

Python

Use the google-re2 package, which is a wrapper for the re2 engine.

import re2

p = re2.compile('[a-z]{1,16}')
m = p.match('some text to match')
# match() returns None when the pattern does not match at the start of the string
print(m.group() if m else None)

References

  • GitLab Docs: Secure coding development guidelines - Regular Expressions guidelines