# Path Traversal

## Overview

Path traversal (or directory traversal) is a vulnerability that allows an attacker to read or write arbitrary files on the server that is running an application.

For example, consider an application that loads images via some HTML like the following:

```html
<img src="/load?filename=218.png">
```

The `/load` URL takes the `filename` parameter and returns the contents of the specified file. The image files themselves are stored on disk in the location `/var/www/images/`. To return an image, the application:

1. appends the requested filename `218.png` to the base directory `/var/www/images/`,
2. uses a filesystem API to read the contents of the `/var/www/images/218.png` file.

This behaviour can be abused by an attacker. An attacker can request the following URL to retrieve an arbitrary file from the server:

```
https://vulnerable-website.local/load?filename=../../../etc/passwd
```

This causes the application to read from the following file path:

```
/var/www/images/../../../etc/passwd
```

As a result, the application will return to an attacker a content of the `/etc/passwd`.

You can find more details at [PortSwigger Web Security Academy: Directory traversal](https://portswigger.net/web-security/file-path-traversal).

This page contains recommendations for the implementation of protection against path traversal attacks.

## General

<div align="left"><img src="https://1795604890-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FaH8j4W1MtabOUlUc8Trn%2Fuploads%2Fgit-blob-b41291c03c4de901e1f0faa235c5ad68838b2947%2Ftype-base-icon.svg?alt=media" alt=""></div>

* Do **not** allow a user to control data that is passed to the file system API.
* Generate random folder/file names using a UUID or random generator instead of relying on user-provided names, see the [Cryptography: Universal Unique Identifier (UUID)](https://0xn3va.gitbook.io/application-security-handbook/web-application/cryptography/universal-unique-identifier-uuid) and [Cryptography: Random Generators](https://0xn3va.gitbook.io/application-security-handbook/web-application/cryptography/random-generators) pages.
* If data can be controlled by a user, implement comprehensive input validation for all data that is passed to the file system API, see the [Input Validation](https://0xn3va.gitbook.io/application-security-handbook/web-application/input-validation) page.
  * Use an allow list of paths to validate user-provided data if possible.
  * Include only alphanumeric characters in an allow list of characters.
  * If it is necessary to use special characters such as `.` or `/`, prevent the use of combinations represented as regular expressions below.

    ```
    \A\.\z
    \A\.\.\z
    \A\.\.[/\\]
    [/\\]\.\.\z
    [/\\]\.\.[/\\]
    \A/
    \A~
    \n
    \r
    ```
  * Do **not** rely solely on a block list validation, it can be bypassed in many cases.
* Make sure the canonicalized path starts with the expected base directory, see the [Implementing the canonical path validation](#implementing-the-canonical-path-validation) section.
* Methods used to construct file paths can have non-intuitive behaviour. Make sure the methods you use work as expected with data like this:

  ```
  .
  ..
  /
  ~/some/path
  /etc/passwd
  ../../../etc/passwd
  /../../../etc/passwd
  ../../../../../../../../../../../../etc/passwd
  ```

<details>

<summary>Clarification</summary>

For example, the `Pathname.join` method in Ruby, which joins pathnames, handles absolute names unintuitively.

```ruby
require 'pathname'

p = Pathname.new('tmp')

user_controlled_input = 'etc/passwd'
print(p.join('log', user_controlled_input, 'foo'))
# => tmp/log/etc/passwd/foo

user_controlled_input = '/etc/passwd'
print(p.join('log', user_controlled_input, ''))
# => /etc/passwd
```

As can be seen, if `user_controller_input` contains an absolute path, `Pathname.join` will ignore everything up to the argument with the absolute path. In other words, it will allow an attacker to craft an arbitrary path.

</details>

<div align="left"><img src="https://1795604890-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FaH8j4W1MtabOUlUc8Trn%2Fuploads%2Fgit-blob-4b891edb8f5a26f439d757f90613f83f54c2107c%2Ftype-advanced-icon.svg?alt=media" alt=""></div>

* Use a sandbox to obtain or save data.

## Implementing the canonical path validation

{% tabs %}
{% tab title="Go" %}
Use the [path.Join](https://pkg.go.dev/path#Join) function to get a canonical path.

```go
package main

import (
    "fmt"
    "path"
    "strings"
)

func inTrustedBasePath(basePath string, userControlledPath string) bool {
    basePath = path.Join(basePath)
    fullPath := path.Join(basePath, userControlledPath)
    if !strings.HasPrefix(fullPath, basePath+"/") {
        return false
    }
    return true
}

func main() {
    basePath := "/tmp/path/to/base/folder"
    userControlledData := "nothing/dangerous.txt"
    fmt.Println(inTrustedBasePath(basePath, userControlledData))
    // => true

    userControlledData = "../../../path/traversal"
    fmt.Println(inTrustedBasePath(basePath, userControlledData))
    // => false
}
```

{% endtab %}

{% tab title="Java" %}
Use the [File.getCanonicalPath](https://docs.oracle.com/javase/8/docs/api/java/io/File.html#getCanonicalPath--) method to get a canonical path.

```java
import java.io.*;

public class CanonicalPathValidation {

    public static void main(String []args){
        String basePath = "/tmp/path/to/base/folder";
        String userControlledData = "nothing/dangerous.txt";
        System.out.println(inTrustedBasePath(basePath, userControlledData));
        // => true

        String basePath = "/tmp/path/to/base/folder";
        String userControlledData = "../../../path/traversal";
        System.out.println(inTrustedBasePath(basePath, userControlledData));
        // => false
    }

    private static boolean inTrustedBasePath(String basePath, String userControlledPath) {
        try {
            File file = new File(basePath, userControlledPath);
            if (!file.getCanonicalPath().startsWith(basePath)) {
                return false;
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

{% endtab %}

{% tab title="Python" %}
Use the [os.path.realpath](https://docs.python.org/3/library/os.path.html#os.path.realpath) function to get a canonical path.

```python
import os
from pathlib import Path


def in_trusted_base_path(base_path: str, user_controlled_path: str) -> bool:
    base_path = Path(base_path)
    full_path = Path(base_path, user_controlled_path)
    real_path = os.path.realpath(full_path)
    return Path(real_path).is_relative_to(base_path)


def main() -> None:
    base_path = '/tmp/path/to/base/folder'
    user_controlled_data = 'nothing/dangerous.txt'
    print(in_trusted_base_path(base_path, user_controlled_data))
    # => True

    base_path = '/tmp/path/to/base/folder'
    user_controlled_data = '../../../path/traversal'
    print(in_trusted_base_path(base_path, user_controlled_data))
    # => False
```

{% endtab %}
{% endtabs %}

## References

* [GitLab Docs: Secure coding development guidelines](https://docs.gitlab.com/ee/development/secure_coding_guidelines.html#path-traversal-guidelines)
