# ZAP 009 - Structured docs for the agentic era
12-15 min read · 3,200 words
## Summary
Documentation now serves two distinct audiences: human readers and the AI agents[^1] that increasingly act on their behalf. Agents that cannot reliably access current, well-structured documentation fall back on aging training data, hallucinate[^2], or surface competitor products instead. The difference between documentation that merely exists and documentation that works well for machines is becoming a business-critical concern.
The central challenge for documentation teams is token[^3] efficiency. AI vendors are moving enterprise customers to token-based billing, context windows[^4] are finite, and reasoning models[^5] are especially sensitive to context pollution[^6]. Every irrelevant sentence injected into an agent's context degrades output and increases cost. Documentation must be precise, self-contained, and retrievable in targeted fragments – not served as monolithic pages designed for linear reading.
Topic-based authoring[^7] is the content-level answer. Composing documentation from minimal, self-contained topics assembled into purpose-built outputs – user guides, admin guides, API references – ensures a high signal-to-noise ratio for both humans and agents. Conditional processing produces machine-oriented variants without duplicating source content. This is covered in detail in ZAPs 006 and 008.
Markdown is the language of choice for AI agents. The source Markdown, however, is not the right input – it is typically transformed in the build pipeline before rendering to HTML. What agents need is a Markdown representation of content after pipeline stages have processed include statements, macros, and key resolution in topic-based authoring.
Finally, agents need to retrieve information in chunks that contain everything relevant and nothing that isn't. Information retrieval systems should support incremental discovery rather than delivering large blocks of undifferentiated content.
This ZAP outlines how Zensical addresses this through a documentation pipeline built for both audiences – humans and agents – and why their needs are closer than you might think.
Already using topic-based authoring?
If your team already works with structured authoring – DITA, a CCMS, or any topic-based approach – you're closer to agent-ready than you might think. The practices behind good structured content – self-contained topics, conditional processing, single-sourcing – turn out to be exactly what agents need.
This ZAP treats topic-based authoring as a first-class citizen throughout. If you're considering a move toward Docs-as-Code, Zensical Spark is where we're working through that transition – together with teams in the same position.
## Problem statement
Already familiar with the AI landscape? Skip to our design.
In 2026, your documentation has two very different audiences. People are searching, browsing, and reading your content as they always have. Increasingly, however, large language models[^10] (LLMs) and AI agents are mediating how customers, potential customers, and even your own employees find answers.
While human readers are still the primary target for documentation sites, the same content is increasingly consumed by machines, including LLM training pipelines, retrieval-augmented generation[^11] systems, and LLMs used via web interfaces, apps, or AI agents. Their information-seeking behavior has not been sufficiently studied, and their needs differ from those of human readers in specific ways. LLMs and agents that cannot effectively access up-to-date documentation fall back on their training data, which quickly becomes outdated, especially for popular, fast-moving projects or products. If they cannot find answers or solutions, they may well surface competitor products and work with those instead. Both Gitbook[^12] and Mintlify[^13] report that traffic to documentation sites from AI agents is growing rapidly and will likely soon surpass human readership. With agents becoming the primary consumers of content, the requirements for documentation are changing[^14].
Documentation is part of a product's interface, and this matters more than ever as coding agents and generic agents (such as Microsoft's ubiquitous Copilot products or OpenClaw) perform sequences of actions without human intervention. They generate and run code. Reasoning models spend extra computation to generate an internal reasoning chain before producing output. Unlike simple chatbots, these agents do not serve merely as tools for information retrieval and text summarization. When an agent cannot complete a task based on your documentation, takes too long, or even gets stuck in a loop, your user experience and customer satisfaction suffer. AI agents present unprecedented challenges for authors of technical documentation:
- Limited attention span: Like human readers, agents have a limited "attention span" in the form of their context window. Information extraneous to the task degrades performance. Minimizing such polluting noise within the context window is critical to your product's success in the world of agentic AI, especially as agents act autonomously and perform complex, multi-step reasoning and generation tasks.
- Precision: Documentation needs to have a high signal-to-noise ratio for both humans and machines. However, agents require documentation to be precise, to a degree that would frustrate most human readers. Imprecise information can lead LLMs to fill in blanks from their aging training data or to straight-up hallucinate. It can cause an agent to spin and burn tokens during a planning or reasoning stage that never ends.
- Preventing context rot: Ultimately, the goal of technical writing for agentic AI must be to prevent context rot[^16], in which relevant information is pushed out of the context window, degrading the model's performance. Minimizing context pollution is one strategy. Another is to ensure that authored content is self-contained so that LLMs do not need to rely on prior context to perform their tasks.
- Discovery: Authors providing alternative Markdown versions of their documentation must ensure that machines find these, even if search engines index the version produced for human consumption. There are various approaches to ensuring this, but standards are only just emerging, with adoption uneven.
- Safeguards to limit potential for harm: Where technical writers may rely on common sense on the part of the human reader to ease comprehension, there is no such thing in an LLM, and any warnings and restrictions must be stated explicitly for a machine. Recent incidents of agents dropping production databases or offering rebates that did not exist show how much damage agents can cause.
- HTML is poor input for agents: AI tooling works best with plain-text formats and Markdown. However, most websites are designed to be accessible to humans and contain visual elements that are either of no use to LLMs or are actually detrimental. An example is syntax highlighting that renders code as a multitude of `span` elements. These are pure noise for agents, just eat up tokens, and make the content more difficult to parse.
- Producing Markdown as output: It is not sufficient to simply deliver the source Markdown, as the sources will typically undergo transformations, such as macro expansions or assembly into multiple outputs as part of topic-based authoring. Instead, Markdown must be the target format, as Zensical must process Markdown extensions and MDX syntax[^17] before AI agents can use it.
- Compatibility with topic-based authoring: When documentation consists of multiple overlapping outputs, such as user guides, admin guides, and product references, agents need guidance to locate the appropriate one for the task they are performing.
As a result, there is a push to "make the docs AI-ready," but there is uncertainty about which strategies and technologies work in practice. With the landscape of AI technologies constantly shifting, making the right choices is not easy, and measures of success are hard to find. As model vendors seek economically sustainable pricing models and move enterprise customers away from subscription pricing, pressure on AI budgets increases.
For technical writers and decision-makers alike, the challenge is to identify the right strategy and select the appropriate technologies to produce high-quality documentation that meets the requirements of both humans and machines.
## Purpose
In this ZAP, we critically review the state of the art in content production for humans and machines and outline the strategy we pursue at Zensical to deliver a coherent set of systems that allows organizations to master these challenges. In particular, we discuss the relevance of topic-based authoring (as covered in ZAPs 006 and 008) to achieving this goal.
In the context of this ZAP, we can only sketch the outlines of our design. The functionality will be specified more precisely in a subsequent ZAP after prototyping in Zensical Spark.
## Background
Initial uses of AI in technical documentation focused either on content generation or on aiding human information-seeking. Examples of uses of AI technologies in content generation include machine translation, grammar and style checking, terminology management, and reuse analysis. Some of these predate the rise of large language models (LLMs). They are relatively well-established and, since they are hidden from the documentation user's view, documentation teams can integrate them relatively smoothly into existing workflows and quality control processes.
### Chatbots
The release of ChatGPT, the first publicly available generative model with a conversational interface, opened up new opportunities to support human information-seeking. Over time, with the emergence of competitors such as Anthropic's Claude, Google's Gemini, and DeepSeek, frontier models improved, as did the tooling around them, making practical applications feasible.
The problem that training data built into the model inevitably ages was addressed by creating retrieval-augmented generation (RAG) pipelines that supplement the training data with up-to-date information. It became feasible to embed "ask the docs" widgets in documentation sites, giving users a conversational interface for the documentation that promised automated Q&A and problem-solving. The promise was that the self-service model offered by chatbots would improve the customer experience by reducing wait times while also reducing support costs for the organization.
AI chatbots began appearing on many customer-facing sites, especially as third-party services became available as a commodity that integrates relatively easily into almost any site. It is now a matter of securing the budget and adding a few lines of JavaScript to integrate one of these services into a documentation site. The quality of a chatbot depends on the quality of the content it can draw on, as well as the robustness of the RAG pipeline behind it. A chatbot that hallucinates too much will do more harm than good. There have been several cases where chatbots were harmful to the brand or where people managed to jailbreak them.
However, even if we assume that the quality of chatbots has improved with the general improvement of frontier models and of the tooling, there are reasons to question whether they will be the predominant model for human information-seeking:
- Chatbots do not serve users well when users ask questions that go beyond the bot's scope, such as product comparisons.
- Search engines now typically include AI-generated answers in their results. This means that a portion of users will not even access the documentation site.
- With the increased use of frontier models via web interfaces and apps, the number of users getting their questions answered without visiting the documentation site is also increasing.
- AI coding tools will not use integrated chatbots but instead other endpoints (see below).
There is an established market for chatbot services, and their widgets can be integrated into a documentation site with ease. As maintainers of Material for MkDocs, we have received many requests over the years to provide some form of chatbot integration, but have always considered it a matter of customization. We included a chatbot in the Material for MkDocs documentation as a trial, but concluded that chatbots were likely to be only a bridging technology that would eventually be replaced. Arguably, this is now happening with the arrival of agentic AI.
### Agentic AI and MCP servers
The human-oriented "ask the docs" widget is increasingly complemented by machine interfaces, such as MCP[^9] servers or other APIs that expose documentation to whatever assistant the user is already using (Claude Code, Codex, Copilot, Devin, Cursor, etc.). Documentation is becoming more of a queryable knowledge source for agents than a destination for human users.
Vendors that offer "ask the docs" widgets now typically also offer MCP endpoints. These essentially wrap the same RAG pipeline that powers the chatbot into a customer-specific MCP endpoint that agents can consume. Similarly, providers of documentation search services are beginning to offer MCP wrappers around their services.
A challenge, however, is that users must install each specific MCP endpoint into their agent tooling. While companies like kapa.ai or Mintlify can claim that many companies adopt their MCP services, it is unclear how much usage these endpoints actually see.
There are now several AI startups that index publicly available documentation and code repositories and then charge users of coding agents for retrieval services. This turns the economics of the service on its head: they charge the user of the information, not the provider. This can be an interesting proposition for those whose documentation is public, especially given how expensive it is to run an MCP server or procure one as a service.
Some of these services allow documentation owners to register their libraries and receive benefits such as usage statistics, a chat widget for the documentation site, or the ability to add private sources.
These services also act as aggregators, since end users only need to install the service once in their coding agent tooling to access documentation for many libraries. The vendors encourage the documentation providers to advertise their services to end users by using badges in their documentation and repositories.
It is difficult to predict which of the many coding agents your users will use, never mind which of these retrieval engines they will subscribe to. Companies paying for MCP services bet that their customers will install the product-specific MCP endpoints in their tooling. On the other hand, the startups offering a single endpoint for many products are betting on developers' willingness to pay extra for their specialized services. The default model for many users, however, is still to use the web search capabilities of their coding agents. It is unclear what percentage of users also use MCP endpoints for documentation, but the number is likely much smaller than the number of users of coding agents overall.
Registering public documentation with one or more of the documentation MCP services could certainly be worth the effort, and using a badge can signal to customers that your company is "AI-ready". Since the effort in registering is relatively low and there are no costs involved, this seems like an obvious thing to do. However, it is important to note that mere registration does not guarantee a good user experience, and effort will likely be needed to ensure that the documentation can be effectively ingested and used by these tools.
Another important distinction among services is the source material they ingest. Some crawl the published documentation. This means they index what users see, but need to handle HTML-to-text conversion. Other services work off public repositories and ingest the source Markdown. This can be problematic when the Markdown contains include statements or macros, or is otherwise pre-processed when the documentation is built.
### Critique of MCP
Anthropic announced the MCP protocol in November 2024[^18] and it has gained widespread adoption amid the rise of agentic AI. There have been voices criticizing the design of MCP, especially the fact that it can be verbose, both in how MCP servers ship their tool schema to the client on connection and in how many MCP servers produce responses with data that will not ultimately be used. Some of the critique applies to the MCP protocol per se, some is more a question of how it is used. Specifically:
1. As a generic protocol, MCP necessarily supports dynamic tool discovery, so an MCP server provides a description of the tools it offers at startup. Depending on what the MCP server wraps, processing just this description can burn through tens of thousands of tokens[^19].
2. To call a tool via MCP, the LLM has to infer the data to send to the tool[^20], which replicates information already in its context, costing tokens and leading to context rot, where relevant information is pushed out of the context window.
3. Composing different MCP endpoints means passing the data they return through the LLM's context window. Doing so eats up tokens and can lead to context rot.
4. Many services are not very specific in the data they return. So, a request for a particular data item may result in a response that consumes too many tokens and too much of the context window.
MCP is an attempt to solve an M×N mapping problem, mapping multiple LLMs or agent implementations to multiple services for integration. M×N mapping solutions tend to be complex.[^21] The question is whether, in the age of agentic AI and coding assistants, this M×N mapping is even necessary, since we can now automatically generate more specific and efficient 1:1 mappings. Instead of letting an LLM repeatedly parse the self-description of an MCP endpoint, let it write code to access whatever service got wrapped in MCP.[^22]
Consider, for example, problem 4 above: if a weather service returns the temperature, barometric pressure, GPS coordinates, and other data when only the wind speed for a given location is needed, then all this information ends up in the context. An LLM calling an MCP endpoint will extract the correct information, but the token cost must be paid for all returned data items.
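To make the cost concrete, here is a toy illustration in Python. The weather payload, the field names, and the characters-per-token heuristic are all invented for the example; real token counts depend on the model's tokenizer:

```python
# Toy illustration: an over-broad tool response vs. a targeted one.
# The weather payload and the 4-chars-per-token heuristic are invented
# for this example; real counts depend on the model's tokenizer.
import json

full_response = {
    "location": {"lat": 47.37, "lon": 8.54, "name": "Zurich"},
    "temperature_c": 6.2,
    "pressure_hpa": 1017,
    "humidity_pct": 78,
    "wind_speed_kmh": 14.0,
    "wind_direction_deg": 230,
    "forecast_next_24h": ["cloudy", "rain", "rain", "clearing"],
}
targeted_response = {"wind_speed_kmh": 14.0}

def rough_tokens(payload: dict) -> int:
    """Crude estimate: roughly four characters per token."""
    return len(json.dumps(payload)) // 4

print(rough_tokens(full_response))      # every field enters the context
print(rough_tokens(targeted_response))  # only the requested value does
```

The agent extracts the right value either way, but the context cost is paid for the entire payload.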
At the same time, MCP is the protocol that is pretty much universally supported by vendors. When using an LLM via a web interface or mobile app, MCP servers are the only thing the user can install to connect the LLM to a documentation site.
### Documentation authoring tools, SaaS, and AI
Many documentation platforms advertise themselves as "AI-ready". This typically boils down to the platform delivering Markdown to agents (via the `Accept: text/markdown` header), providing an MCP endpoint, and supporting the creation and delivery of an llms.txt file[^23].
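Mechanically, Markdown delivery via this header is ordinary HTTP content negotiation. A minimal client-side sketch using Python's requests library; the URL is a placeholder, and whether the header is honored depends entirely on the server's configuration:

```python
# Content negotiation sketch: request the Markdown variant of a page.
# https://docs.example.com is a placeholder; whether the Accept header
# is honored depends entirely on how the server is configured.
import requests

response = requests.get(
    "https://docs.example.com/setup/",
    headers={"Accept": "text/markdown"},
)
content_type = response.headers.get("Content-Type", "")
if "markdown" in content_type:
    print(response.text)  # token-friendly Markdown variant
else:
    print(f"Server ignored the header and sent: {content_type or 'unknown'}")
```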
Unlike the MCP services discussed above, documentation platforms that support both authoring and hosting can offer adaptations for agentic AI at the content level.
- Fern provides `<llms-only>` and `<llms-ignore>` tags to vary the output for human readers vs. LLMs. Mintlify has a visibility component that does the same job.
- Fern supports agent directives that are inserted at the top of the page content and provide instructions for agents on how to use the documentation. While this does consume space in the context window, it works without requiring the user to instruct their agent, unlike skills[^8] or MCP endpoints.
There are currently significant differences across platforms in the features they support, how they implement them, and how they price them. As this field is very young, platform vendors are still working on their positioning and pricing strategies.
More traditional documentation authoring tools and CCMSs[^24] are lagging behind in their support, leaving the market open for third-party solutions to fill the gap. Even though DITA-based[^25] documentation, in particular, contains structured and semantically rich content, vendors have left it to third parties to build pipelines that leverage it. However, it seems likely that vendors will move in the coming months, either announcing their own solutions or integrating third-party add-ons and services for their products.
### Context files: llms.txt
An attempt to help LLMs use websites is the proposal to create llms.txt files that contain Markdown content specifically for LLMs and AI agents. These files are deployed at the root of a documentation site, and the idea is that LLMs and agents would access them to learn how to use a product and its documentation.
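For illustration, a build step might emit such a file along the following lines. This is a sketch: the structure (an H1 title, a blockquote summary, and sections of links) follows the llms.txt proposal, while the product name, paths, and URLs are invented:

```python
# Sketch of a build step emitting an llms.txt file. The structure
# (H1 title, blockquote summary, H2 sections of links) follows the
# llms.txt proposal; the product name and URLs are invented.
from pathlib import Path

LLMS_TXT = """\
# GeoProc

> GeoProc is a geospatial processing library. The links below point to
> Markdown variants of the documentation.

## Guides

- [Installation](https://example.com/install.md): set up GeoProc
- [Migration to v2](https://example.com/migrate-v2.md): breaking changes

## Reference

- [API reference](https://example.com/api.md): all public functions
"""

out = Path("site/llms.txt")
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(LLMS_TXT, encoding="utf-8")
```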
However, the proposal has not yet seen widespread adoption, and none of the frontier model vendors have declared that their tooling supports it. It is also unclear how effective these files are when they are used.
AGENTS.md and CLAUDE.md files are similar, except that they are typically provided in source code repositories to guide agents working on the code they contain. A recent study[^26] suggested that generated AGENTS.md files may actually be detrimental to model performance compared to the same model simply using existing documentation. Handwritten files yielded only small improvements in this study. However, both resulted in a significant increase in token usage. That said, this was just one study and has not yet been peer-reviewed or replicated.
### Agent skills
Agent skills[^27] package instructions, code, and resources that an AI agent can use on demand to handle specific tasks. Each skill lives in a folder containing a SKILL.md file that describes when to use it and what it covers. The code distributed with a skill can be run locally by the agent to perform tasks using the provided resources. The resources can also be used directly by the agent. What does this mean for the delivery and use of agent-focused documentation?
One way that skills are typically used is to ship agent-specific instructions to teach an agent how to use a product. However, we can produce skills that include not just a reduced set of instructions but also more extensive documentation, including a search index to ensure the agent can retrieve relevant parts as needed. The search engine tooling can be included directly in the skill.
How much of a product's agent-oriented documentation a skill contains is a decision that documentation teams can make based on project needs and hosting arrangements. If a skill includes only the search index, then agents will follow the URLs to the actual content. On the other hand, the index can be populated with content that can then be delivered locally to the LLM, which enables offline use.
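As a sketch of what such a packaged skill could look like – the layout follows the convention of a folder containing a SKILL.md plus resources, but the specific file names, frontmatter fields, and index tooling shown here are hypothetical:

```python
# Hypothetical scaffold for a documentation skill. A skill is a folder
# with a SKILL.md describing when to use it, plus bundled resources;
# the file names and index tooling below are invented for illustration.
from pathlib import Path

SKILL_MD = """\
---
name: geoproc-docs
description: Search and read the GeoProc v2 documentation. Use this
  skill whenever the user asks about GeoProc APIs or migration from v1.
---

Run `python search.py "<query>"` to query the bundled index, then open
the returned Markdown files from docs/ as needed.
"""

skill = Path("geoproc-docs")
(skill / "docs").mkdir(parents=True, exist_ok=True)
(skill / "SKILL.md").write_text(SKILL_MD, encoding="utf-8")
(skill / "docs" / "migration-v2.md").write_text("# Migrating to v2\n")
# A real build would also copy the search tooling and its prebuilt
# index into the folder, e.g. search.py plus an index file.
```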
Skills can be distributed with the product or through their own distribution channels. For Python or JavaScript developers, the library-skills tool makes it easy to install skills included in the packages their project uses.
One convenient side effect of distributing documentation as part of the product's skills is that it ensures the correct version is used. Like documentation that is searchable and retrievable via MCP, documentation delivered as a skill can be used to update the out-of-date training data that LLMs have encoded in their parameter sets.
Another advantage is that skills are a transparent mechanism that runs locally and is under the user's control. They can be adapted to the user's needs; for example, users may extend the documentation by adding their own content.
### Topic-based authoring
One aspect that has not been addressed so far is that machines do not read documentation for their own purposes. They read it on behalf of a human with a specific intent, role, and task in mind. The relevant dimension, therefore, is not human vs. LLM.
Rather, documentation needs to be written with specific audiences, their intents, roles, and tasks in mind. Topic-based authoring allows documentation to be written as sets of self-contained topics. From a single set of topics, intentionally crafted target outputs are assembled (cf. ZAPs 006 and 008). Different collections of topics make up user manuals, installation manuals, reference material, and product brochures. Conditional processing addresses additional dimensions, such as different programming languages, operating systems, or regulatory requirements.
The human vs. AI agent distinction adds another layer to this. A decision-maker looking to compare products using an LLM will want to ensure the machine is accessing the kind of information they would find in a product brochure. An administrator asking an AI agent to put together a plan for installing a system would want the agent to draw on the kind of information typically found in an installation manual. Intentionally designed outputs focused on these broad goals contain more relevant information, more signal than noise.
The information architecture that documentation teams design with great care is more relevant than ever in the age of AI agents, even as those agents usually ignore navigation structures. It is relevant because it reflects the needs of humans using AI agents.
Splitting documentation into topics allows each topic to focus on a specific task, concept, or reference material. Each topic is authored to be self-contained but minimal. This authoring style prevents exhausting a human reader's attention span just as it avoids filling up an LLM's context window with unnecessary information. It ensures an optimal signal-to-noise ratio and reduces context rot.
Finally, there is often a need to distinguish among different versions of a product, the operating systems it can run on, or the programming languages it supports. If the differences between outputs are relatively minor, they can be modeled as conditional elements. The mechanisms for conditional processing can also be used to distinguish content written for human readers from that written for LLMs. While both have a limited "attention span", content for LLMs can sometimes be written in a more concise, dry style that would make human readers give up.
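A minimal sketch of the idea, assuming an invented conditional syntax – this is not Zensical syntax, just an illustration of how one source can yield audience-specific Markdown:

```python
# Conditional processing sketch: one source, two audience-specific
# outputs. The comment-based block syntax and the "audience" condition
# are invented for illustration and are not Zensical syntax.
import re

SOURCE = """\
Install the package with pip.

<!-- if audience=human -->
Take a moment to verify your Python version first; a mismatch here is
the most common source of confusing installation errors.
<!-- endif -->
<!-- if audience=llm -->
Requires Python >= 3.10. Verify before installing.
<!-- endif -->
"""

def build(source: str, audience: str) -> str:
    """Keep blocks matching the audience, drop the others."""
    pattern = re.compile(
        r"<!-- if audience=(\w+) -->\n(.*?)<!-- endif -->\n",
        re.DOTALL,
    )
    def resolve(match: re.Match) -> str:
        return match.group(2) if match.group(1) == audience else ""
    return pattern.sub(resolve, source)

print(build(SOURCE, "llm"))    # concise variant for agents
print(build(SOURCE, "human"))  # friendlier variant for readers
```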
Topic-based authoring ensures that the information on a single page (or in a single LLM-targeted Markdown file) is self-contained, with preconditions clearly stated instead of implied or carried over from other pages browsed previously. This makes sense for human readers who use search engines to arrive at a page. It is necessary for LLMs, which do not rely on information architecture to find the relevant information.
Zensical will support topic-based authoring as described above (see our roadmap). This will allow us to leverage the semantic structure of content to produce outputs that cater to each combination of agent (human or LLM), intent, role, and specific task.
| Adaptation | What it does | Delivery | Requirements | Notes |
|---|---|---|---|---|
| none | N/A | N/A | N/A | Typically wasteful of context window/tokens |
| llms.txt | Markdown "sitemap" for agents | Part of static site | Can be hand-crafted or generated | No evidence of usage |
| llms-full.txt | Full documentation in one Markdown file | Part of static site | Generated by documentation tooling | No evidence of usage |
| SKILL.md | Instructions, scripts, resources for agents | Downloadable package | Hand-crafted, possibly adapted to different agent implementations | Proposed standard, interpreted differently by vendors |
| Conditional text | Adapts content to machine usage | Build-time adaptation of content | Requires documentation tooling support | Adaptation only to agents |
| Topic-based authoring | Allows systematic reuse and assembly of content | Build-time adaptation of content | Requires documentation tooling support | Adaptation to agent x intent x role x task. |
| Copy markdown | Allows user to copy Markdown for inclusion in LLM prompts. | Directly through the UI. | Markdown version needs to be available. | Very simple, but can be effective as it leaves people in control. |
| `Accept: text/markdown` | Alternative content format for agents | Needs to be configured in the web server | Needs web server configuration and generation of Markdown by the docs tooling | Used in practice; uses fewer tokens than HTML |
| RAG pipeline (chatbot) | Provides a chat interface for users | A widget that plugs into the documentation site | Typically a third-party service, installed in the site with extra JS | Actual implementation vendor-specific |
| RAG pipeline (through MCP) | Helps agents find relevant content for context | Web service, added to agent/LLM by the user | Needs to be hosted or procured as a service | MCP has been shown to be quite wasteful (context, token usage); widely used though |
| Search interface | Content retrieval for humans and machines | Web service, configured via SKILL.md | Can be an existing search interface with additional options, possibly with vector search and facets | Can be more efficient than MCP |
## Considerations
As the landscape of AI models and tools is constantly shifting, it is difficult to predict which of the machine-focused adaptations listed in the table above will remain relevant. None of them provides a comprehensive and universally adopted solution, and different projects may require different combinations. Below, we identify aspects of a coherent design that Zensical will offer.
### 1. Values
Zensical values simplicity and portability; any solution we recommend should work across the range of deployment scenarios Zensical supports, including static hosting without dynamic backends. We are committed to topic-based authoring and Docs-as-Code workflows, so any machine-readable affordances must integrate with existing source structures rather than requiring parallel content trees.
There is also a potential tension between optimizing for current agent behavior and preserving writing quality for humans. Community voices have raised the concern that documentation optimized for LLMs may degrade for human readers, analogous to cities redesigned for cars becoming hostile to pedestrians.[^28] Our approach should avoid producing content that reads as boilerplate or machine-voiced, even while making it machine-readable.
### 2. Token-efficient operation
AI vendors are increasingly seeking sustainable business models, adapting their subscriptions, and moving enterprise customers to token-based billing. It seems certain that token-efficient operation will be a major topic in 2026 and beyond. Beyond the economic argument, limited context window sizes also mean that token efficiency is the order of the day. Context rot is a major problem with reasoning models and agentic AI.
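The difference is easy to measure. A small experiment with OpenAI's tiktoken tokenizer illustrates why serving Markdown instead of highlighted HTML saves tokens; the snippet is illustrative, and exact counts vary by tokenizer and content:

```python
# Compare the token cost of the same content as highlighted HTML
# (with syntax-highlighting spans) vs. plain Markdown. Counts are
# illustrative; cl100k_base is used here as a common baseline.
import tiktoken

markdown = '```python\nprint("hello")\n```\n'
html = (
    '<div class="highlight"><pre><span class="nb">print</span>'
    '<span class="p">(</span><span class="s2">&quot;hello&quot;</span>'
    '<span class="p">)</span></pre></div>\n'
)

enc = tiktoken.get_encoding("cl100k_base")
print(len(enc.encode(markdown)))  # a handful of tokens
print(len(enc.encode(html)))      # several times as many
```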
Token efficiency is a cross-cutting concern that runs through the considerations that follow.
### 3. Information retrieval
With Disco, we already have a highly modular search engine that can incorporate semantic search and be hosted on a highly scalable content delivery/edge computing infrastructure. Its small size also allows it to be delivered as part of an AI agent skill. That is, the same information retrieval engine can serve both the local deployment case and the hosted case to provide an API or an MCP endpoint. Disco is already in use in the static sites built with Zensical. Additionally:
- Progressive disclosure: search operations should return limited information that is best suited to answering the question at hand or enabling a subsequent, more focused search. Returning exhaustive search results would lead to context pollution. (A sketch of this interaction follows this list.)
- Disco supports faceted search, allowing information retrieved to be highly specific. For example, information can be filtered by operating system or programming language.
- Disco's highly modular design enables the implementation of specific functionality, such as progressive disclosure.
- Similarly, the search algorithm can be optimized for use by AI agents. For human readers, we implemented an algorithm that produces stable lookahead results; we are currently exploring the ideal ranking algorithm for agentic use.
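Disco's agent-facing API is still being prototyped, so the following sketch is purely illustrative: the client class, method names, facet keys, and toy in-memory index are all hypothetical.

```python
# Purely illustrative sketch of progressive disclosure with faceted
# search over a toy in-memory index. DiscoClient and its methods are
# hypothetical; Disco's actual agent-facing API is being prototyped.
from dataclasses import dataclass

@dataclass
class Topic:
    url: str
    title: str
    text: str
    facets: dict

INDEX = [
    Topic("/linux/tls.md", "Configure TLS (Linux)",
          "# Configure TLS\nGenerate a key, then ...", {"os": "linux"}),
    Topic("/windows/tls.md", "Configure TLS (Windows)",
          "# Configure TLS\nOpen the certificate store ...", {"os": "windows"}),
]

class DiscoClient:
    def search(self, query: str, facets: dict, limit: int = 5):
        """Return short snippets only, never full documents, so the
        agent's context stays small (progressive disclosure)."""
        hits = [
            t for t in INDEX
            if query.split()[-1].lower() in t.title.lower()
            and all(t.facets.get(k) == v for k, v in facets.items())
        ]
        return [(t.url, t.title, t.text[:40]) for t in hits[:limit]]

    def fetch(self, url: str) -> str:
        """Deliver the full Markdown of one topic, only on demand."""
        return next(t.text for t in INDEX if t.url == url)

client = DiscoClient()
hits = client.search("configure TLS", facets={"os": "linux"})
full_text = client.fetch(hits[0][0])  # fetch only what is needed
```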
### 4. Markdown as the target format
We have established that topic-based authoring helps produce specifically targeted documentation from a single set of source topics. Now, we need to consider what the topics are compiled into. As already mentioned, simply shipping the source Markdown would not work, not least because, in topic-based authoring, conditional processing occurs and key references[^29] are resolved.
For this reason, content needs to run through the entire processing pipeline up to the point where it is rendered to HTML. At this point, it needs to be rendered to Markdown instead. Zensical's modularity makes this possible, and the move to a CommonMark parser with an AST representation[^30] will make it significantly easier to achieve this goal.
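To illustrate the principle with a deliberately simplified, string-based pipeline – the `{{include:...}}` and `{{key:...}}` syntax is an invented stand-in, and the real pipeline will operate on a CommonMark AST rather than on strings:

```python
# Deliberately simplified pipeline: source Markdown in, fully resolved
# Markdown out. The {{include:...}} and {{key:...}} syntax is an
# invented stand-in; the real pipeline will transform a CommonMark AST
# and render to HTML or Markdown only as the final step.
import re

TOPICS = {"support.md": "Contact support at the community forum."}
KEYS = {"product": "GeoProc"}

def resolve_includes(text: str) -> str:
    """Inline the content of referenced topics."""
    return re.sub(r"\{\{include:(\S+)\}\}",
                  lambda m: TOPICS[m.group(1)], text)

def resolve_keys(text: str) -> str:
    """Replace named placeholders with target-specific values."""
    return re.sub(r"\{\{key:(\w+)\}\}",
                  lambda m: KEYS[m.group(1)], text)

source = "# Installing {{key:product}}\n\n{{include:support.md}}\n"
for stage in (resolve_includes, resolve_keys):
    source = stage(source)
print(source)  # fully resolved Markdown, ready for agents
```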
### 5. MCP support
As MCP is the only way to extend many tools (such as Claude.ai or the Claude mobile app), support for MCP will be necessary, regardless of the critique discussed above. The question is not whether to build an MCP endpoint, but how to do so. The problems MCP presents point in the following directions:
- Progressive disclosure - see under information retrieval above.
- Limited endpoints: One criticism of MCP is that the protocol is often used to wrap many tools into a single endpoint, increasing the size of the description that the MCP endpoint must send to the LLM. We will seek to minimize this surface area (see the sketch after this list).
- Local tool alternative: for agentic tooling that runs local code, it is more efficient to use the Disco CLI rather than MCP.
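As a sketch of such a limited endpoint, using the official MCP Python SDK (the `mcp` package): one tool, one short description, snippet-sized results. The tool name and parameters are illustrative, and the Disco-backed search is stubbed out.

```python
# Sketch of a deliberately small MCP endpoint using the official MCP
# Python SDK (package: mcp). One tool, a short description, and
# snippet-sized results; the Disco-backed search is stubbed out.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("docs-search")

@mcp.tool()
def search_docs(query: str, version: str = "latest", limit: int = 5) -> list[str]:
    """Search the documentation. Returns short snippets with URLs;
    fetch a URL only if its snippet looks relevant."""
    # Stub: a real implementation would query the Disco index here.
    return [f"[stub] searched for {query!r} in version {version}"][:limit]

if __name__ == "__main__":
    mcp.run()  # serves the endpoint, typically over stdio for local agents
```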
### 6. Topic-based authoring
Topic-based authoring presents challenges and creates opportunities to support human readers and AI agents.
- Faceted search: conditionals that are not resolved at build time, such as variants for different operating systems or programming languages rendered as content tabs, should become facets in the search, so that AI agents can specify the relevant parameters and get back focused content.
- Markdown output: Markdown has to be an output format, as discussed in consideration 4 above.
### 7. Air-gapped deployment
Some of our users run air-gapped systems. It must be possible for them to benefit from our work on supporting agentic AI. The deployment via skills with integrated search serves this use case, but it should also be possible for organizations to host a local Disco deployment and use the Disco CLI for search.
### 8. Monitoring/analytics
As AI agents increasingly operate autonomously, reading and acting on documentation to complete tasks, the question of what they're actually consuming becomes surprisingly important. An agent that repeatedly fetches the same API reference page, or that consistently misreads a particular section and triggers downstream errors, is telling you something useful about your docs, but only if you're watching.
- Both deployment scenarios (MCP and skills) should enable data collection on the effectiveness of agents' use of the documentation (a sketch of such a record follows this list).
- In the skills case, this information will be useful to the end user, who can choose to share it with the Zensical team or the documentation authors for analysis.
- In the MCP case, analytics can be used at an aggregate level across many users, producing detailed statistics that can guide further development of the Zensical skill and the documentation.
- A hosted service will include a suitable privacy policy.
- It must be possible for documentation teams to identify which parts of the documentation are ambiguous or outdated, causing agents to retry or fail, and which sections are polluting the context.
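To make the requirement concrete, a collected retrieval event might look roughly like this; every field name here is hypothetical, sketched only to illustrate the kind of data involved:

```python
# Hypothetical shape of a retrieval-analytics record; the field names
# are invented to make the monitoring requirement concrete.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class RetrievalEvent:
    query: str                # what the agent asked for
    returned_urls: list[str]  # which topics were served
    tokens_served: int        # context cost of the response
    follow_up: bool           # did the agent immediately re-query?
    outcome: str              # e.g. "used", "retried", "abandoned"
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

# Repeated "retried" outcomes for the same topic would flag a section
# as ambiguous or outdated; high tokens_served with "abandoned" would
# flag context pollution.
event = RetrievalEvent(
    query="rotate API keys",
    returned_urls=["/admin/keys.md"],
    tokens_served=412,
    follow_up=True,
    outcome="retried",
)
```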
Evaluation and content-quality tooling is getting more attention, partly because RAG systems have exposed how poorly structured much of the existing documentation is for retrieval.
## Scenario 1: GeoProc
Kumiko maintains GeoProc, an Open Source geospatial processing library with a few thousand users, most of them data engineers and scientists. She shipped a significant API redesign in v2.0 that is cleaner and more consistent, but includes breaking changes.
The problem is immediate: coding agents keep suggesting v1 patterns. Users get error messages and turn to the agents for help, but none of the fixes work because they, too, are written for the old API. The issues queue fills up with questions that are answered in the v2 migration guide, which the agents have never seen.
Kumiko is spending an hour a day on support. She remembers seeing that Zensical can ship agent-focused documentation as part of skills embedded in Python packages. She builds a skill and uses the Zensical tooling to include it in the next release's build.
Kumiko is deliberate about what goes into the index. Her docs include a set of introductory tutorials aimed at people new to geospatial data. The tutorials cover the basics, such as what a coordinate reference system is, how projections work, and why geometries need to be in the same CRS before you can compare them. For LLMs, this information is noise, as they already encode it in their parameters. Kumiko flags the tutorial files with metadata, and they are excluded from the agent index at build time.
Then, she provides instructions to help users add the skill to their coding agents, and after the release, she posts about it on the library's Discord server to spread the word. The GeoProc project is hosted entirely on GitHub and runs no infrastructure of its own, so Kumiko is happy that the skill is deployed via the library package on PyPI.
## Scenario 2: ACME Shield
Dmitri is a senior engineer at defense contractor ACME Shield. His team writes tooling for mission-planning systems: internal frameworks, proprietary data formats, and classified APIs. None of it has ever appeared in a training corpus, and none of it ever will.
The team runs QWEN3.6 on workstations inside the secure enclave. The model is useful for general programming tasks, but useless for anything that touches the internal stack. Every new engineer spends weeks reverse-engineering conventions that are documented somewhere in a sprawling Confluence instance, if they know to look. The senior engineers field the same questions repeatedly. In the last month alone, Dmitri has written the same explanation of the coordinate system abstraction layer four times in the team chat across different threads.
There is no network path to a remote endpoint, as any dependency on external services is a non-starter from a security standpoint. Deploying an internal MCP server would have been an option, but Dmitri knows that getting approval for this would have taken forever.
The solution is to build a documentation skill for the internal tooling. The Confluence pages are exported to Markdown, the index is built, and the skill is bundled into the internal Python package that every project already depends on. The developers install the skill in their coding agent harnesses, and from here on, the AI can answer questions about the internal stack.
## Scenario 3: ACME Logistics
Amara runs developer relations at ACME Logistics, a company that provides logistics solutions and planning software. Their customers are operations teams at retailers and freight companies that use the logistics planning software, which is provided as a SaaS product with a REST API.
They want to use agentic frameworks like Copilot Studio to automate scenarios they regularly handle. They generate scripts they can run themselves or as part of Copilot automations. The problem is that the models behind Copilot were trained on an early version of Amara's API documentation and are unaware of some of the recently added functionality. They also often use deprecated endpoints. Amara's team closes six to ten support tickets of this type a week.
Amara is looking for a way to inject the updated documentation into Copilot. One concern is that the ACME Logistics Planner has recently been split into three versions targeting different market segments. She worries that Copilot will confuse the features available in the different versions, all of which are covered by a single integrated documentation site, where feature availability is indicated by icons.
A colleague on LinkedIn points her to Zensical, which supports topic-based authoring and provides a documentation MCP service that ensures agents receive information for the product version specified in the request. All searches contain a facet selector for the version, and only relevant results are returned.
## Evaluation
Desirability: We have evidence – from conversations with organizations and professionals, and from issues submitted to Material for MkDocs in the past and now to Zensical – that the problems described in this ZAP are pressing for documentarians. The growth of agentic AI and its use of documentation featured prominently in this year's State of Docs report[^15]. Discussions in the Write the Docs community also document a strong appetite for practical, standards-based approaches over elaborate vendor-specific integrations. The direction outlined in this ZAP derives from Zensical's broader vision and commitment to simplicity and author productivity.
Feasibility: Key elements are in place through the Disco search engine. A proof-of-concept of the skills-based solution exists. Producing Markdown output is feasible and will be easier once we switch to CommonMark and a Markdown parser that produces an AST. As mentioned in the State of Docs report, the most important adaptation to agentic AI lies in how content is authored: adding context so pages are self-contained and investing in structured data. This is exactly what topic-based authoring is about, as covered in ZAPs 006 and 008.
Viability: Given that core elements such as Disco and Zensical's extensible architecture are in place, we are in an excellent position to implement this functionality. Offering an MCP endpoint will entail setting up infrastructure, and keeping things aligned with an evolving AI landscape means a maintenance burden. However, the coherent approach outlined in this ZAP can serve as a competitive advantage, facilitating the adoption of Zensical across a wide range of projects.
Usability: The proposal does not affect the authoring or developer experience. It aims to make a significant positive impact on the end-user experience when working with agentic AI or through LLM chat interfaces.
Join the discussion in Zensical Spark
We discuss ZAPs with our members in Zensical Spark. To gain further insights or provide feedback to ensure alignment with your organization's needs, get a Zensical Spark membership.
[^1]: A software system that uses an LLM to act on behalf of a user, often with tools for web access, file manipulation, or API calls. AI agents differ from chatbots in that they typically operate with some autonomy over multiple steps and can access resources and modify data on the user's machine.

[^2]: The generation of plausible-sounding but factually incorrect content by an LLM. Hallucinations occur because LLMs predict likely text rather than retrieving verified facts; they are more likely when the model lacks reliable information, for example, when its training data does not sufficiently cover a product or topic. In agentic contexts, a hallucinated API call or configuration value can cause silent failures or data loss.

[^3]: The basic unit of text that an LLM processes. Common short words are typically a single token, while longer or less frequent words are split into several. LLMs measure all input and output in tokens: context windows are sized in tokens, and token-based billing charges by the number processed. Because every piece of content injected into an agent's context has a token cost, minimizing unnecessary content directly reduces both expense and the risk of context rot.

[^4]: The fixed amount of text (measured in tokens) that an LLM can consider at one time. Content that would otherwise be relevant but exceeds this window is truncated, summarized, or omitted, often silently.

[^5]: A variant of LLM that generates an internal chain of reasoning steps before producing its final output. This extra computation improves performance on complex tasks but significantly increases latency and token usage. Because the reasoning chain occupies the context window, reasoning models are particularly sensitive to context pollution and context rot.

[^6]: Irrelevant or misleading content that gets introduced into the context window and adversely affects subsequent answers or actions by the LLM. By its very nature, context pollution can lead to context rot.

[^8]: A packaged set of instructions that tells an LLM how to approach a specific task or domain. In Anthropic's framing, skills instruct a model to consult specific documentation, use a particular search strategy, or ignore certain training data.

[^9]: A protocol published by Anthropic for exposing tools and data to LLMs in a structured way. Requires a server component and per-client configuration.

[^10]: A type of AI model trained on large volumes of text to predict and generate language. LLMs are the engine behind both chatbots and AI agents. Modern LLMs can perform reasoning tasks by generating an internal chain of reasoning steps before producing their final output.

[^11]: The pattern of fetching relevant content at query time and injecting it into the LLM's context rather than relying on training data alone.

[^14]: This is reflected in the fact that there is a new AI and documentation consumption section in the 2026 State of Docs report.

[^15]: https://www.stateofdocs.com/2026/ai-and-documentation-consumption

[^16]: A failure mode in long LLM conversations. Context rot is the gradual degradation of model behavior as a conversation grows longer. The model's attention gets diluted across more and more tokens, so earlier instructions, established facts, or agreed-upon constraints carry less weight than they did at the start.

[^17]: An extension of Markdown that allows JSX (JavaScript XML) syntax to be embedded directly in Markdown files. MDX can serve as the basis for a component model in a static site generator like Zensical, replacing Markdown extensions and macros while also opening up more opportunities, such as implementing topic-based authoring.

[^19]: Mario Zechner, "What If You Don't Need MCP At All?" (November 2025)

[^20]: Manuel Odendahl, "MCPs are Boring" (June 2025)

[^21]: CORBA was famous for this. It tried to solve a similar problem for multiple programming languages and operating systems. XML can be seen in a similar light.

[^22]: Armin Ronacher, "Tools: Code Is All You Need" (July 2025)

[^23]: A proposed (unadopted) file format, typically hosted at the site root, that provides a curated index of documentation URLs for LLM consumption.

[^24]: A content management system that stores, manages, and reuses information at a component level. A component is typically a self-contained topic or another reusable element at a finer level of granularity, such as a paragraph or an admonition. CCMSs support topic-based authoring and are typically based on the DITA standard, but there are important exceptions.

[^25]: The Darwin Information Typing Architecture (DITA) is an XML-based open standard for structuring technical content as reusable topics (see topic-based authoring). DITA enforces a strict separation between content and format, and classifies every topic by type so that content can be written once, assembled into different target documents, and published to multiple formats and channels without reformatting.

[^26]: Gloaguen et al., "Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?" arXiv preprint arXiv:2602.11988, 2026. https://arxiv.org/abs/2602.11988

[^28]: mcc, Mastodon post comparing AI-optimized documentation to car-oriented urban redesign, 29 March 2026. https://mastodon.social/@mcc/116314231162423866

[^29]: In topic-based authoring, keys are named placeholders that reference content or links. Within the content, authors use keys instead of hardcoding a product name or URL directly into a topic. Crucially, there are mechanisms to assign different values to keys depending on which target document is built. For example, the contact information may be a technical support forum for a reference manual and a support hotline for the user manual.

[^30]: A tree-shaped data structure that represents the grammatical structure of a document or program after it has been parsed. Working with an AST makes it straightforward to transform content without fragile search-and-replace operations.