We analyzed 2M AI-agent requests. None asked for LLMs.txt.

11/30/2025 · 4 min read · Rajeev Kumar

Over the past few months we’ve been analyzing AI-agent traffic across websites using SonicLinker.

Across roughly 2 million AI-agent requests per month, we expected to see at least some requests for /llms.txt, a proposed standard for helping LLMs understand website content.

But after monitoring traffic across more than 100 websites:

We haven’t seen a single request for llms.txt.

Not from ChatGPT.
Not from Claude.
Not from Perplexity.
Not from other AI agents.

This surprised us, because llms.txt has been getting a lot of attention in discussions about AI-readable websites.

So we decided to look deeper into the data.

What is llms.txt?

llms.txt is a proposed convention for helping LLMs understand how to consume a website.

The idea is similar to robots.txt, which websites use to communicate with search engine crawlers.

A typical llms.txt file might:

  • describe the purpose of the website
  • list important documentation or pages
  • provide structured entry points for AI agents
  • guide LLMs toward machine-readable content

Example (the proposal specifies a markdown file with a title, a short summary, and sections of links):

```
# Example API

> Documentation for the Example API

## Docs

- [Main docs](https://example.com/docs)
- [API reference](https://example.com/api)
```

The hope is that AI agents will fetch /llms.txt and use it as a starting point for understanding the site.

However, real-world traffic tells a different story.

Our dataset

Our observations come from traffic data across SonicLinker customers’ websites.

| Metric | Value |
| --- | --- |
| Websites monitored | 100+ |
| Monthly AI-agent requests | ~2 million |
| Time window | Multiple months |
| Agents observed | ChatGPT, Claude, Perplexity, and others |

The traffic includes requests generated by:

  • LLM browsing tools
  • AI search assistants
  • automated agent workflows
  • AI-powered developer tools

These agents fetch normal web pages while answering questions, researching topics, or retrieving documentation.

Why we’re confident in this data

One common question when analyzing traffic logs is: Are you actually seeing all requests?

In our case, yes.

SonicLinker sits at the CDN edge, meaning every HTTP request passes through our infrastructure before reaching the website origin.

For each request, we capture:

  • full request path
  • user agent
  • headers
  • IP
  • response metadata
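
Conceptually, each captured request can be thought of as a small structured record. This is an illustrative sketch only; the field names are hypothetical, not SonicLinker’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class EdgeRequest:
    """Illustrative shape of one request captured at the CDN edge.

    Field names are hypothetical, not an actual SonicLinker schema.
    """
    path: str
    user_agent: str
    ip: str
    headers: dict = field(default_factory=dict)
    status: int = 200  # stands in for response metadata

req = EdgeRequest(path="/llms.txt", user_agent="GPTBot/1.0", ip="192.0.2.1")
print(req.path)
```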

This includes all types of traffic:

  • humans
  • search engines
  • AI agents
  • bots
  • scrapers

Importantly, this happens before application servers or origin infrastructure.

So if an AI agent attempted to request:

/llms.txt

we would see it.

Across millions of AI-agent requests, we observed zero such requests.
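
The check itself is easy to reproduce against any access log. A minimal sketch, assuming a simple space-separated log line format (the format here is hypothetical; adapt the parsing to your own logs):

```python
def count_path_hits(log_lines, target="/llms.txt"):
    """Count requests whose path exactly matches target.

    Assumes a hypothetical log format: "<ip> <method> <path> <status>".
    """
    hits = 0
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 3 and parts[2] == target:
            hits += 1
    return hits

sample = [
    "192.0.2.1 GET / 200",
    "192.0.2.2 GET /docs 200",
    "192.0.2.3 GET /robots.txt 200",
]
print(count_path_hits(sample))  # 0
```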

What AI agents actually request

Instead of llms.txt, AI agents mostly behave like normal web clients.

The most common requests we see include:

  • /
  • /docs
  • /pricing
  • /blog
  • /features
  • /sitemap.xml
  • /robots.txt

In other words, agents usually fetch normal website pages and parse them directly.
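
Ranking paths like this is a one-liner over extracted request paths. A small sketch with made-up sample data (the real dataset is obviously much larger):

```python
from collections import Counter

# Hypothetical sample of request paths pulled from edge logs
requests = [
    "/", "/docs", "/pricing", "/docs", "/sitemap.xml",
    "/robots.txt", "/", "/blog", "/docs",
]

# Tally and report the most frequently requested paths
top = Counter(requests).most_common(2)
print(top)  # [('/docs', 3), ('/', 2)]
```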

This aligns with how many AI systems currently retrieve information:

  1. search the web
  2. open relevant pages
  3. extract useful content
  4. summarize it

In that workflow, a special discovery file like llms.txt may not be necessary.
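
The extraction step in that workflow can be sketched with the standard library alone. This is a simplified illustration of pulling readable text out of fetched HTML, not any specific agent’s implementation:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping script/style content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

html = "<html><body><h1>Pricing</h1><script>x=1</script><p>Free tier available.</p></body></html>"
p = TextExtractor()
p.feed(html)
print(" ".join(p.chunks))  # Pricing Free tier available.
```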

How LLMs currently explore websites

Based on the traffic patterns we observe, most AI agents follow a process similar to this:

Step 1: Find a page

Usually through search engines or internal knowledge.

Step 2: Fetch the page

Agents request the full page HTML.

Step 3: Extract text

They parse the page and extract readable content.

Step 4: Follow links

If needed, they fetch additional pages linked from the original page.

This means LLMs are currently relying more on page readability than on dedicated metadata files.
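
The link-following step above (step 4) amounts to collecting anchors and resolving them against the current page URL. A minimal sketch using only the standard library; the page content and base URL are made up for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Gather href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

base = "https://example.com/docs/"
page = '<a href="intro">Intro</a> <a href="/api">API</a>'
c = LinkCollector()
c.feed(page)

# Resolve relative hrefs against the page URL before fetching them
resolved = [urljoin(base, h) for h in c.links]
print(resolved)
```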

Why llms.txt might not be used (yet)

There are several possible explanations.

1. The standard is still experimental

llms.txt is a relatively new proposal and has not yet been widely adopted by AI companies.

Most LLM systems likely rely on existing web discovery mechanisms.

2. Many agents use search engines

A large portion of AI-agent traffic originates from systems that first use:

  • Google
  • Bing
  • internal search indexes

In these cases, agents already know which page they want to fetch.

They don’t need a discovery file.

3. Agents simply parse the page

Modern LLMs are very good at extracting information from:

  • HTML
  • Markdown
  • documentation pages
  • structured headings

Because of this, they can often understand a site without additional guidance.

What actually helps LLMs understand websites

From observing AI-agent traffic and behavior, several patterns are becoming clear.

Websites that are easier for LLMs to understand tend to have:

  • clean HTML structure
  • clear headings
  • documentation-style content
  • minimal client-side JavaScript
  • readable text content

Pages that rely heavily on:

  • complex JavaScript rendering
  • deeply nested UI components
  • text embedded in images

are much harder for AI agents to interpret.

In practice, content readability matters far more than discovery files.

The bigger shift: AI-readable websites

The web has historically been designed for human browsers.

But AI agents are quickly becoming another major consumer of web content.

This raises an interesting question:

Instead of adding files like llms.txt, should websites focus on becoming more AI-readable by default?

Some emerging approaches include:

  • markdown-based content
  • simplified HTML
  • structured documentation
  • AI-optimized content representations

These approaches improve how both humans and machines understand the web.

We’d love to hear what others are seeing

Our observations are based on millions of AI-agent requests, but the ecosystem is evolving quickly.

If you operate:

  • large websites
  • developer documentation platforms
  • AI infrastructure
  • crawler systems

We’d be curious to know:

Have you seen AI agents request /llms.txt in real traffic logs?

The idea behind llms.txt is interesting — but based on current traffic patterns, it does not yet appear to be used by major LLM systems.
