Autonomy Notebook on Agentic Systems · Vol. I, No. 3 · 2026

Skills in AI Agents

A short, illustrated monograph on the folder that taught your agent a new trick — its anatomy, its loading rules, and the design choices that decide whether it ever gets used.

By Majid Mazouchi · An interactive primer · Reading time ≈ 14 min · With six interactive figures

Contents

  1. The Idea, in One Page
  2. Anatomy of a Skill
  3. Progressive Disclosure
  4. Skills in the Wider Agent Loop
  5. How an Agent Picks a Skill
  6. Writing a Description That Triggers
  7. The Skill Audit
  8. Skills vs. Tools vs. MCP
  9. A Worked Example: the PDF Skill
  10. Build Your First Skill, by Hand
  11. Skill Composition
  12. Anti-Patterns Gallery
  13. Security & Trust
  14. Skills as a Team Artifact
  15. Practical Notes & Pitfalls
  16. References

§ I. The Idea, in One Page

A skill is a folder of instructions that an agent reads only when it needs to. Nothing more mysterious than that.

Imagine you hire a brilliant generalist. They are fluent in a dozen languages, can reason their way through almost any puzzle, and have read more books than any human alive. But on Tuesday morning you ask them to fill out a very specific form, in a very specific way, that your company has used since 1994. They don’t know that form. They could figure it out, but you’d rather not pay them to rediscover it every week.

So you leave a one-page note on their desk: “When someone asks you to fill the 1994 form, here is exactly how to do it. The blank template is in drawer B. The examples are in folder C.” They glance at the note, do the job, and put it back. The next time, they glance again.

That note is a skill. In the language of modern AI agents, a skill is a folder containing one required file, SKILL.md, plus optional scripts, references, and templates, which the agent discovers, loads, and applies on demand. The agent reads only what it needs, when it needs it. The folder can be huge. The context window stays small.

A skill is a self-contained folder of instructions and resources that an AI agent loads on-demand to do a specialized task well, without polluting its working memory the rest of the time.

Anthropic introduced this pattern formally in October 2025, and the design has spread fast because it solves a real problem: agents that try to know everything up front end up confused, slow, and expensive. Skills let an agent know where to look instead of knowing everything — which, as any librarian will tell you, is the older and wiser strategy.

§ II. Anatomy of a Skill

Click any file or folder below. The right-hand panel shows what lives inside it and why it exists.

Every skill is a directory with at least one required file: SKILL.md. The rest is optional — scripts, reference documents, templates, sample data. The convention is deliberately humble: a skill looks like a tidy folder on your laptop, because that is exactly what it is.

📁 pdf-skill/
├── SKILL.md
├── FORMS.md
├── REFERENCE.md
├── 📁 scripts/
│   ├── fill_form.py
│   └── merge.py
└── 📁 templates/
    └── invoice.pdf
Figure 1. A typical skill folder. Click any item to inspect its role. The only required file is SKILL.md; everything else is supporting material.

The frontmatter is what gets read first

The very top of SKILL.md is a small YAML block called the frontmatter. It carries two fields that matter: a name and a description. These, and only these, are what the agent sees at the start of every conversation. Together they cost on the order of a hundred tokens per skill, which is why an agent can be aware of hundreds of skills without drowning.

--- SKILL.md ---
---
name: pdf
description: Use this skill whenever the user wants to do anything
  with PDF files — read, extract, fill forms, merge, split, watermark,
  encrypt, or OCR scanned PDFs. Trigger if the user mentions a .pdf
  file or asks to produce one.
---

# PDF Processing

This skill helps you read and create PDFs reliably.
For form filling, see FORMS.md. For OCR or advanced usage, see REFERENCE.md.

## Quick start
...

The frontmatter is a contract: name is how humans refer to the skill, description is how the agent decides whether to open the folder. Everything else in the body is loaded only after that decision is made.
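To make the contract concrete, here is a minimal sketch of how a runtime might pull the two routing fields out of a SKILL.md. It is an illustration under assumptions, not the loader any real platform uses: `parse_frontmatter` is a hypothetical helper, and it handles only simple `key: value` lines with indented continuation lines. A production loader would more likely rely on a full YAML parser.

```python
def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (frontmatter fields, markdown body).

    Hypothetical helper: supports only `key: value` lines plus indented
    continuation lines, which is enough for `name` and `description`.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter fence is never closed") from None

    fields: dict[str, str] = {}
    key = None
    for line in lines[1:end]:
        if line[:1].isspace() and key is not None:
            # Indented line: continuation of the previous value.
            fields[key] = f"{fields[key]} {line.strip()}".strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


SAMPLE = """---
name: pdf
description: Use this skill whenever the user wants to do anything
  with PDF files. Trigger if the user mentions a .pdf
  file or asks to produce one.
---

# PDF Processing
"""

meta, body = parse_frontmatter(SAMPLE)
print(meta["name"])          # pdf
print(body.splitlines()[0])  # # PDF Processing
```

Only `meta` would need to be held in context at session start; `body` is what gets loaded once the skill is selected.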

§ III. Progressive Disclosure

The single most important idea: the agent never loads more than it needs.

A skill folder can be enormous. Megabytes of reference docs, dozens of scripts, hundreds of templates. None of that hurts the agent, because none of it gets loaded eagerly. The pattern is called progressive disclosure — sometimes more naturally called progressive discovery — and it works in three layers.

i. Layer 1: Discovery
Always in context · name + description only · ~80 tokens / skill

At session start the agent sees a catalogue of every available skill, each entry just a name and a one-sentence description. It has no other detail.

ii. Layer 2: Activation
Loaded when the skill is judged relevant · ~2k tokens median

When the agent decides this skill matches the task, it reads the full body of SKILL.md: the procedural knowledge, workflows, and pointers to deeper files.

iii. Layer 3: On-demand depth
Loaded only if a sub-task demands it · effectively unbounded

Supporting files (FORMS.md, REFERENCE.md, scripts, templates) are read only if the workflow reaches a step that needs them.

[Interactive token counter: 17 skills, idle · 1 skill in use · Deep sub-task]
Figure 2. The three layers of progressive disclosure. Toggle the bottom switch to compare against loading every skill eagerly. Token estimates follow measurements published across Anthropic’s 17 official skills, where discovery cost is ≈ 55–235 tokens and bodies range ≈ 275–8,000 tokens.

This is the architectural payoff: the agent knows that the skill exists for the price of one sentence, and pays for the full instructions only when it actually commits to using them. In practice an agent can sit on top of a library of hundreds of skills without breaking its context budget, which is hard to match for MCP-style setups that load every connected tool definition up front.
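The arithmetic behind that payoff is easy to check. The sketch below uses the article's rounded figures (about 80 tokens per catalogue entry, about 2,000 tokens for a median body); the function names are illustrative, real skills vary widely around these numbers, and layer-3 files are left out because their size is unbounded.

```python
DISCOVERY_TOKENS = 80   # per skill, always in context (rounded figure from the text)
BODY_TOKENS = 2_000     # median SKILL.md body, loaded only on activation


def progressive_cost(n_skills: int, n_active: int = 0) -> int:
    """Tokens spent when only activated skills load their bodies."""
    if not 0 <= n_active <= n_skills:
        raise ValueError("n_active must be between 0 and n_skills")
    return n_skills * DISCOVERY_TOKENS + n_active * BODY_TOKENS


def eager_cost(n_skills: int) -> int:
    """Tokens spent if every skill body were loaded up front."""
    return n_skills * (DISCOVERY_TOKENS + BODY_TOKENS)


for label, cost in [
    ("17 skills, idle", progressive_cost(17)),
    ("1 skill in use", progressive_cost(17, n_active=1)),
    ("all 17 loaded eagerly", eager_cost(17)),
]:
    print(f"{label:>22}: {cost:,} tokens")
```

Under these assumptions the idle catalogue costs 1,360 tokens, one active skill brings the total to 3,360, and eager loading would cost 35,360.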

§ IV. Skills in the Wider Agent Loop

Where exactly does a skill sit in an agent’s working memory? The picture is less mysterious than it sounds.

An agent’s context window at any moment is not a single block of text. It is a stack of layers, each contributed by a different actor in the system, and the agent reads all of them together when deciding what to do next. A skill is one of those layers — sometimes invisible, sometimes loud — and seeing the stack as a whole is what makes the rest of this monograph click into place.

Figure 3. The agent’s context window as a stack of layers. Toggle the state to see what changes when a skill activates: a new block of instructions is injected, and everything else stays as it was.

Two things are worth absorbing from this picture. First, the skill catalogue is always present — a small, persistent layer of one-sentence pitches that lets the agent reason about which skills are available. Second, when a skill activates, its body does not replace anything; it is added alongside the rest. The system prompt, the conversation, the tool definitions, the user’s message — all of it remains. The skill is a new voice in the same room.

Think of the context window as a desk. The system prompt is the etiquette guide you keep on the desk. Tool definitions are the labels on the drawers. The skill catalogue is a card index of which folders live in those drawers. When a skill activates, the agent pulls one folder out and lays it open next to everything else. Nothing on the desk gets thrown away — but the open folder now influences the next move.
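The desk metaphor can be written down as a toy data structure. This is a sketch for intuition only: the layer names and their order are assumptions, and `ContextStack` is a hypothetical class, not part of any agent runtime.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStack:
    """Toy model of the layers an agent reads together on each turn."""
    layers: list[tuple[str, str]] = field(default_factory=list)

    def add(self, source: str, text: str) -> None:
        self.layers.append((source, text))

    def activate_skill(self, name: str, body: str) -> None:
        # Activation appends a new layer; existing layers are untouched.
        self.add(f"skill:{name}", body)

    def sources(self) -> list[str]:
        return [source for source, _ in self.layers]


ctx = ContextStack()
ctx.add("system_prompt", "You are a careful assistant.")
ctx.add("tool_definitions", "web_search(query), run_shell(cmd)")
ctx.add("skill_catalogue", "pdf: Read, extract, fill, merge, split, OCR PDF files.")
ctx.add("user_message", "Pull the tables out of report.pdf")

before = ctx.sources()
ctx.activate_skill("pdf", "# PDF Processing\n...")
print(ctx.sources())
```

The four original layers are still present after activation; the skill body is a fifth entry alongside them.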

§ V. How an Agent Picks a Skill

No keyword index, no embedding search. The agent reads descriptions and reasons.

When a user’s message arrives, the agent looks at its catalogue — the layer-1 entries — and decides, in plain language, whether any of those one-sentence pitches fit the request. This is pure reasoning, not retrieval. There is no vector store, no fuzzy match. The agent reads the descriptions the way you read the spines on a bookshelf, and picks the one whose label matches the job.

That makes the description not just metadata but the entire interface between user intent and skill activation. A description that is vague, jargony, or written in the first person is likely to be passed over, even if the skill behind it is excellent.

Tap a sample task. Watch which skill on the shelf lights up.

Sample tasks:
Read this contract.pdf and summarize it
Build a budget spreadsheet with formulas
Make me a 10-slide pitch deck
Write me a formal client letter
Design a landing page hero
What’s the capital of France?

Skills on the shelf:
pdf · Read, extract, fill, merge, split, OCR PDF files.
docx · Create or edit Word documents: letters, reports, memos.
xlsx · Build or edit spreadsheets with formulas and charts.
pptx · Create slide decks, presentations, and pitch decks.
frontend-design · Design polished web UI: pages, components, landing pages.
Figure 4. Skill discovery is a reasoning step, not a search. The agent reads each description and judges fit. When no skill matches the task — such as a plain factual question — none are loaded, and the agent answers from its general knowledge.
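Because selection is a reasoning step, the only code involved is whatever puts the catalogue in front of the model. The sketch below shows one way that might look; `SkillEntry`, `render_catalogue`, and the wording of the header line are assumptions for illustration. Note that it contains no matching logic at all: deciding which entry fits is left to the model.

```python
from typing import NamedTuple


class SkillEntry(NamedTuple):
    name: str
    description: str


def render_catalogue(skills: list[SkillEntry]) -> str:
    """Format layer-1 entries as plain text for the model to read."""
    if not skills:
        return "No skills are installed."
    lines = ["Available skills (load one only if it fits the request):"]
    lines += [f"- {s.name}: {s.description}" for s in skills]
    return "\n".join(lines)


SHELF = [
    SkillEntry("pdf", "Read, extract, fill, merge, split, OCR PDF files."),
    SkillEntry("docx", "Create or edit Word documents: letters, reports, memos."),
    SkillEntry("xlsx", "Build or edit spreadsheets with formulas and charts."),
    SkillEntry("pptx", "Create slide decks, presentations, and pitch decks."),
    SkillEntry("frontend-design", "Design polished web UI: pages, components, landing pages."),
]

print(render_catalogue(SHELF))
```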

§ VI. Writing a Description That Triggers

Most failed skills fail at the front door. The description either over-promises, under-specifies, or speaks to the wrong audience.

The agent picks skills by reading their description. So the description must do three jobs at once: name the thing, name the trigger conditions, and name the edge cases where it should not fire. Write it in the third person — you are addressing the model, not the end user — and include the literal words a user is likely to say.

Figure 5. Two descriptions for the same imaginary skill. The weak one looks fine in isolation but loses the agent’s attention. The strong one is explicit about what it does, who it serves, and what it should refuse.

Rules of thumb for descriptions

§ VII. The Skill Audit

A quick rubric for grading any description before you ship it.

You can spend hours wondering why a skill never triggers, or you can run the description through a five-check audit in thirty seconds. The audit below scores a draft on the properties that correlate most strongly with reliable matching. Edit the box; the rubric reacts.

Figure 6. A live audit of a description. Each check corresponds to a property that experience shows correlates with reliable triggering. None of these guarantees a skill will fire — but a description that fails most of them almost certainly will not.
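A rubric of this kind can be approximated with a few string checks. The sketch below is a stand-in, not the article's own audit: the five checks are assumptions drawn from the advice in § VI (third person, stated triggers, stated exclusions, the user's literal words), plus an arbitrary length window. The regular expressions and thresholds are illustrative and will misjudge some descriptions.

```python
import re

# Heuristic patterns; all of them are assumptions made for this sketch.
FIRST_PERSON = re.compile(r"\b(I|[Mm]y|[Ww]e|[Oo]ur|[Mm]e)\b")
TRIGGER = re.compile(r"\b(use (this|when)|trigger|whenever)\b", re.IGNORECASE)
EXCLUSION = re.compile(r"\b(do not|don['’]t|not for|except)\b", re.IGNORECASE)


def audit_description(description: str, user_terms: list[str]) -> dict[str, bool]:
    """Score a draft description on five rough checks (True means pass)."""
    lowered = description.lower()
    return {
        "third_person": FIRST_PERSON.search(description) is None,
        "states_trigger": TRIGGER.search(description) is not None,
        "states_exclusion": EXCLUSION.search(description) is not None,
        "uses_user_words": any(t.lower() in lowered for t in user_terms),
        "reasonable_length": 10 <= len(description.split()) <= 120,
    }


WEAK = "I help with documents."
STRONG = ("Use this skill when the user asks to read, fill, or merge PDF files. "
          "Do not use it for Word or Excel documents.")
TERMS = ["pdf", "merge", "fill"]  # words a user is likely to type

for draft in (WEAK, STRONG):
    result = audit_description(draft, TERMS)
    print(f"{sum(result.values())}/5  {draft[:40]}")
```

As with the live figure, a high score does not guarantee the skill will fire; a low score is the more useful signal.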

What the five checks mean

§ VIII. Skills vs. Tools vs. MCP

Three overlapping ideas, often confused. Here is the clean separation.

An agent acquires capabilities in several distinct ways, and beginners often blur them. Tools are atomic verbs the agent can invoke — search the web, run a shell command, query a database. MCP servers are external processes that publish a bundle of tools the agent can call over a protocol. Skills are knowledge — instructions, workflows, examples — that teach the agent how to use whatever tools it already has, well, for a specialized task.

| | Skills | Tools | MCP |
|---|---|---|---|
| What it is | A folder of instructions & resources teaching the agent how to do something. | A single callable function the agent can invoke (e.g. web_search). | A protocol & remote server that exposes one or more tools over a network. |
| Loaded when | Body loads only after the agent decides the skill fits the task. | Function signature is in context whenever the tool is enabled. | All tools from the server are pulled into context on connection. |
| Context cost | ~80 tokens to know it exists; ~2k when used. | Hundreds of tokens per tool definition, always present. | Cost scales with every tool the server exposes; can be heavy. |
| Lives where | A local folder, often shared as a zip or via a marketplace. | Defined in the agent runtime or by the platform. | Remote server, hosted by a vendor or self-hosted. |
| Best for | Repeatable workflows that need procedure, style, or domain context. | A single, well-bounded action the agent invokes. | Bringing live external data or vendor APIs into the agent. |
| Weakness | Bad descriptions = never triggered. Hard to test in isolation. | Too many tools crowd the context window and degrade selection. | Many tools loaded eagerly; accuracy drops past 2–3 connected servers. |
Figure 7. Skills, tools, and MCP are complements, not substitutes. A good agent uses all three: tools to act, MCP to reach external systems, and skills to know how to behave.

Tools are verbs. MCP is a bridge to verbs that live elsewhere. Skills are the playbook that tells the agent which verbs to chain together, in what order, for what kind of job.

§ IX. A Worked Example: the PDF Skill

Walking through one real skill end-to-end clarifies what the other sections describe in the abstract.

Suppose a user says: “Here’s a 40-page PDF. Pull out the tables and email me a summary.” Here is the lifecycle of the agent’s response, with the PDF skill playing its role at each step.

  1. Catalogue scan. Agent reads layer-1 descriptions of all installed skills.
  2. PDF skill matches. Description names “PDF” and “extract.” The agent selects it.
  3. Load SKILL.md. Full body enters context. Quick-start, tool list, pointers.
  4. Execute workflow. Runs pdfplumber per the skill’s recipe; extracts tables.
  5. Stop & reply. Forms guide never loads. Reference doc never loads. Done.

Figure 8. Lifecycle of a single skill invocation. The deeper files in the folder — FORMS.md, REFERENCE.md — were available the whole time but never touched, because this particular task did not require them.

What is invisible but important: the agent did not blindly call a tool. It read a procedure that recommended a specific library (pdfplumber), explained when to fall back to OCR, and warned about scanned PDFs. That procedural knowledge is the entire point of the skill. Without it, the agent might have tried three different libraries, produced inconsistent output, or asked the user a clarifying question that the skill had already answered in advance.
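For a sense of what the “execute workflow” step might run, here is a hedged sketch of table extraction with pdfplumber. It is not the PDF skill’s actual recipe, which is not reproduced in this article; `clean_table` and `extract_tables` are illustrative names. The pdfplumber calls used (`pdfplumber.open`, `pdf.pages`, `page.extract_tables`) are part of that library, which must be installed separately.

```python
from typing import Optional

Table = list[list[Optional[str]]]  # pdfplumber returns None for empty cells


def clean_table(table: Table) -> list[list[str]]:
    """Replace empty cells with "", trim whitespace, and drop blank rows."""
    rows = [[(cell or "").strip() for cell in row] for row in table]
    return [row for row in rows if any(row)]


def extract_tables(pdf_path: str) -> list[list[list[str]]]:
    """Collect every table pdfplumber detects, page by page."""
    import pdfplumber  # third-party; imported lazily so clean_table runs without it

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                tables.append(clean_table(table))
    return tables


raw = [["Item", " Qty "], [None, None], ["Widget", None]]
print(clean_table(raw))  # [['Item', 'Qty'], ['Widget', '']]
```

A scanned PDF has no text layer, so `extract_tables` would return nothing for it; that is the case where the skill’s procedure points to OCR instead.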

Skills are how you stop teaching the agent the same thing for the tenth time. You teach it once, in a file, and it learns once, for every future conversation. — Paraphrased from the Anthropic Skills course

§ X. Build Your First Skill, by Hand

The whole pattern is so light that the cleanest way to learn it is to build one.

Below is a small builder. Fill in the three fields and a real SKILL.md assembles itself in the panel on the right. Drop the result into a folder named after your skill, place that folder where your agent looks for skills, and on the next session it will be discoverable.

Figure 9. A live SKILL.md builder. Edit any field and watch the file update. The pattern is intentionally simple: a YAML header for routing, then a markdown body that reads like instructions to a careful colleague.
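The same assembly can be done in a few lines of code. The sketch below assumes the three fields are a name, a description, and a body; `render_skill_md`, `write_skill`, and `skills_root` are hypothetical names, and the directory your agent actually scans for skills depends on the platform.

```python
import json
import tempfile
from pathlib import Path


def render_skill_md(name: str, description: str, body: str) -> str:
    """Assemble a SKILL.md: YAML frontmatter, then a markdown body."""
    one_line = " ".join(description.split())
    # A JSON string is also a valid YAML double-quoted scalar, so quoting this
    # way keeps colons or '#' in the description from breaking the header.
    quoted = json.dumps(one_line, ensure_ascii=False)
    return f"---\nname: {name}\ndescription: {quoted}\n---\n\n{body.strip()}\n"


def write_skill(skills_root: Path, name: str, description: str, body: str) -> Path:
    """Create <skills_root>/<name>/SKILL.md and return its path."""
    folder = skills_root / name
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "SKILL.md"
    path.write_text(render_skill_md(name, description, body), encoding="utf-8")
    return path


# Demo in a throwaway directory; substitute your agent's real skills folder.
with tempfile.TemporaryDirectory() as tmp:
    created = write_skill(
        Path(tmp),
        "meeting-notes",
        "Use this skill when the user asks to turn a transcript into meeting notes.",
        "# Meeting Notes\n\n1. List decisions.\n2. List action items.",
    )
    print(created.read_text(encoding="utf-8"))
```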

Where to put the folder

Most skills don’t trigger on the first try. The fix is almost always the description, not the body. When something doesn’t fire, put the user message that should have triggered it next to the description; the mismatch usually becomes obvious in seconds.

§ XI. Skill Composition

When one request needs two skills at once.

Few real requests are atomic. A user says: “Take this PDF, pull out the tables, and turn them into a Word report.” Two skills want a turn — the PDF skill for the read, the docx skill for the write. The agent handles this by simply activating both: their bodies enter the context one after the other, and the agent stitches the procedures together.

User: “Pull tables from this PDF and write a Word report.”
pdf skill · Extract tables from input PDF
docx skill · Compose Word report from data
Agent stitches: extract → transform → render → deliver
Figure 10. Two skills activated for one request. There is no internal coordinator — both bodies enter the context, and the agent treats them as a single, longer instruction set.

This works cleanly when the skills are orthogonal — each responsible for a different phase, a different artifact, with no overlap in authority. It works badly when two skills both claim the same step. Then the agent has to arbitrate, and arbitration is the unstable part of any system.
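Overlap of this kind can be caught before it reaches the agent if each skill’s responsibilities are written down. The sketch below assumes a team-maintained mapping from skill name to the steps it claims; that mapping is hypothetical bookkeeping, not a field in SKILL.md, and `find_overlaps` is an illustrative helper.

```python
from collections import defaultdict


def find_overlaps(claims: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each contested step to the sorted list of skills claiming it."""
    owners: dict[str, list[str]] = defaultdict(list)
    for skill, steps in claims.items():
        for step in steps:
            owners[step].append(skill)
    return {step: sorted(skills) for step, skills in owners.items()
            if len(skills) > 1}


orthogonal = {"pdf": {"extract"}, "docx": {"render"}}
clashing = {"pdf-reader": {"extract"}, "pdf-tools": {"extract", "merge"}}

print(find_overlaps(orthogonal))  # {}
print(find_overlaps(clashing))    # {'extract': ['pdf-reader', 'pdf-tools']}
```

An empty result corresponds to the orthogonal case; any entry names a step where the agent would have to arbitrate.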

Design rules for composability

§ XII. Anti-Patterns Gallery

Four ways skills go wrong, each with a name, a symptom, and a fix.

i. The Kitchen Sink
Symptom: Fires on half of all requests. Once activated, gives generic advice that doesn’t fit the specific task.
Example description: “A helpful skill for working with all kinds of documents.”
Fix: Split it. One skill per artifact: pdf, docx, xlsx, pptx. Each with its own scoped description.

ii. The Personality Skill
Symptom: Either fires on everything or nothing, because it encodes a preference, not a capability.
Example description: “Always respond in bullet points and use Australian English.”
Fix: This is not a skill. Put it in the system prompt, user preferences, or a style guide.

iii. The Duplicate Skill
Symptom: Two skills cover the same ground. The same user message triggers different ones on different days.
Example: A pdf-reader skill and a pdf-tools skill, both describing themselves as “for working with PDFs.”
Fix: Merge them, or differentiate sharply (one reads, the other writes; say so in each).

iv. The Dependency Tangle
Symptom: Skill works on your machine. Breaks on a colleague’s. The body silently assumes other skills, libraries, or files exist.
Example: A skill that calls scripts/process.py without including the script, or one that says “use the data-tools skill” without checking it’s installed.
Fix: Ship every dependency inside the folder. State assumptions explicitly at the top of the body.

Figure 11. The four most common skill failure modes, with diagnosis and remedy. Most production skills have brushed up against at least two of these.
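The Dependency Tangle is the easiest of the four to catch mechanically. The sketch below scans SKILL.md for relative file paths and reports any that are missing from the folder. It is a rough heuristic under assumptions: `missing_files` and the extension list are illustrative, and the pattern will also flag user files that the body mentions only as examples.

```python
import re
import tempfile
from pathlib import Path

# Relative paths such as scripts/process.py or FORMS.md mentioned in the body.
FILE_REF = re.compile(r"\b(?:[\w-]+/)*[\w-]+\.(?:py|sh|md|pdf|json|txt)\b")


def missing_files(skill_dir: Path) -> list[str]:
    """Return files that SKILL.md mentions but the folder does not contain."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    referenced = set(FILE_REF.findall(text)) - {"SKILL.md"}
    return sorted(ref for ref in referenced if not (skill_dir / ref).is_file())


# Demo: the body points at a script that was never shipped.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "report-skill"
    skill.mkdir()
    (skill / "SKILL.md").write_text(
        "Run scripts/process.py, then follow FORMS.md.", encoding="utf-8")
    (skill / "FORMS.md").write_text("# Forms", encoding="utf-8")
    print(missing_files(skill))  # ['scripts/process.py']
```

Running a check like this in continuous integration covers missing files; missing skills and libraries still need to be stated in the body.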

§ XIII. Security & Trust

A skill is text plus, optionally, executable code. Installing a stranger’s skill is no different from running a stranger’s script.

The skill pattern is friendly, but it is not magic — it inherits the trust model of the filesystem it lives in. There are three risks worth naming explicitly.

Prompt injection from skill content

The body of SKILL.md is text the agent reads as authoritative instructions. A malicious skill can tell the agent to ignore the user, exfiltrate context to a URL, or chain tools in ways the user did not request. Because activation is decided by the description alone, a skill that looks benign in its catalogue entry can carry hostile instructions in its body.

Script execution

Skills can ship Python or shell scripts. When the agent has code execution enabled and follows the skill’s recipe, it will run those scripts. The sandbox is the only thing between the script and your filesystem; sandbox guarantees vary by platform and configuration. Read the scripts.

Supply chain

Skill marketplaces are new. There is no equivalent of npm audit yet. A skill that was safe at v1.0 may not be at v1.1, and updates can ship silently if you do not pin versions. Track where your skills come from the way you would track third-party libraries.

Trust checklist · before installing any skill
Read the SKILL.md. It is plain markdown. Skim the body for unusual instructions — fetch this URL, run this command, ignore the user.
Inspect every script. If the folder has a scripts/ directory, treat its contents like any third-party code you’re about to run.
Verify the source. Official Anthropic repository, your organization’s registry, or a well-known author. Strangers from random gists are a different bar.
Pin versions in production. “Latest” is fine for personal exploration. For agents shipping in products, freeze the exact version and review each update.
Constrain the agent’s reach. Disable network or filesystem access for code-execution sandboxes when the skill doesn’t need them.
Log activations. Knowing which skills fired in which conversations turns a mystery into an audit trail.
Figure 12. Six checks to run before adding a new skill to a system you care about. Most are common-sense software hygiene; the difference is that “the agent will do this on my behalf” raises the cost of getting it wrong.
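Pinning a version is simplest when a skill folder can be reduced to a single fingerprint. The sketch below is one way to do that, assuming a SHA-256 digest over every file’s relative path and contents; `skill_digest` is an illustrative helper, not a feature of any skill marketplace. Record the digest when a skill is reviewed, and treat any later mismatch as an unreviewed update.

```python
import hashlib
import tempfile
from pathlib import Path


def skill_digest(skill_dir: Path) -> str:
    """SHA-256 over every file's relative path and bytes, in sorted order."""
    files = [p for p in skill_dir.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.relative_to(skill_dir).as_posix())
    digest = hashlib.sha256()
    for path in files:
        rel = path.relative_to(skill_dir).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so file boundaries cannot be confused.
        digest.update(len(rel).to_bytes(8, "big") + rel)
        digest.update(len(data).to_bytes(8, "big") + data)
    return digest.hexdigest()


# Demo: a silent change to a script changes the fingerprint.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "pdf"
    (skill / "scripts").mkdir(parents=True)
    (skill / "SKILL.md").write_text("# PDF Processing", encoding="utf-8")
    (skill / "scripts" / "merge.py").write_text("print('merge')", encoding="utf-8")
    reviewed = skill_digest(skill)

    (skill / "scripts" / "merge.py").write_text("print('changed')", encoding="utf-8")
    print(skill_digest(skill) == reviewed)  # False
```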

§ XIV. Skills as a Team Artifact

Once a team has more than a few skills, the catalogue becomes infrastructure. The lessons from software engineering port almost without modification.

The pattern that emerges from teams running skills in earnest looks a lot like a regular software lifecycle. The skill folder is the unit. The description is its public API. The body is its implementation. Treat it accordingly.

Version control

Skills are folders of text. Put them in Git. Tag versions. Use pull requests for changes — especially to the description, because description changes affect routing for every user of the skill, immediately, in production.

Code review

Review skill changes the way you review function signatures. A new “do not” clause in the description can make a previously routed user message fall through to nothing. A new pointer to REFERENCE.md can shift the agent’s context budget. Both deserve a second pair of eyes.

Testing

Maintain a fixture of representative user messages, each labeled with the skill it should trigger. Re-run after every change. This is the equivalent of a regression suite, and it catches description drift before users do.
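A fixture like this needs very little machinery. In the sketch below the router is passed in as a function, because how you ask your agent which skill it activated is platform-specific; `run_suite`, `FIXTURE`, and `stub_router` are illustrative names, and the stub is a deliberately naive stand-in for the real agent call.

```python
from typing import Callable, Optional

Router = Callable[[str], Optional[str]]  # user message -> skill name, or None

FIXTURE: list[tuple[str, Optional[str]]] = [
    ("Read this contract.pdf and summarize it", "pdf"),
    ("Build a budget spreadsheet with formulas", "xlsx"),
    ("Make me a 10-slide pitch deck", "pptx"),
    ("What's the capital of France?", None),  # no skill should fire
]


def run_suite(route: Router, fixture=FIXTURE):
    """Return (message, expected, actual) for every mismatch."""
    failures = []
    for message, expected in fixture:
        actual = route(message)
        if actual != expected:
            failures.append((message, expected, actual))
    return failures


def stub_router(message: str) -> Optional[str]:
    """Stand-in for the real agent: picks `pdf` only when '.pdf' appears."""
    return "pdf" if ".pdf" in message.lower() else None


for message, expected, actual in run_suite(stub_router):
    print(f"MISS  expected={expected!r} actual={actual!r}  {message}")
```

With the stub, the spreadsheet and pitch-deck messages are reported as misses. Because a real agent’s routing can vary between runs, it may be worth running each message several times and tracking a pass rate rather than a single pass or fail.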

Deprecation

When a skill is superseded, don’t just delete it. Replace its description with a clear redirect (“Use the X skill instead — this one no longer fires”), then remove it in a later cycle. Sudden deletions break clients you didn’t know existed.

A maturity ladder for skill catalogues
L0 · Ad hoc. Skills live on individual users’ laptops. Nobody knows who has what. Two people may have written the same skill differently.
L1 · Shared repository. Skills live in a Git repo. Naming conventions are agreed. Anyone can add or update, but quality varies.
L2 · Reviewed & versioned. Changes go through pull request. Skills are tagged. A document explains what each skill is for.
L3 · Tested & measured. A regression suite checks trigger accuracy. Activation rates and logs are monitored. Deprecation is a planned process.
Figure 13. The four stages most organizations move through as their skill libraries grow. Most teams stop at L1 and wonder why their skills feel unreliable; L2 is where reliability actually starts.

Two practices buy more reliability than all the rest combined: code review on description changes, and a small regression suite of expected user-message-to-skill matchings. Everything else is polish.

§ XV. Practical Notes & Pitfalls

A short list of the lessons that take a few skills of your own to learn the hard way.

Design

Operations

Common pitfalls

When not to use a skill

§ XVI. References

Primary sources and well-regarded community write-ups, current as of mid-2026.

  1. Anthropic. Agent Skills — Overview. Claude Platform Documentation. platform.claude.com/docs/en/agents-and-tools/agent-skills/overview
  2. Anthropic Courses. Introduction to Agent Skills. Skilljar. anthropic.skilljar.com/introduction-to-agent-skills
  3. anthropics/skills. Official open-source skill repository. GitHub. github.com/anthropics/skills
  4. Lee, H. Claude Agent Skills: A First-Principles Deep Dive. Oct 2025. leehanchung.github.io
  5. SwirlAI Newsletter. Agent Skills: Progressive Disclosure as a System Design Pattern. Mar 2026. newsletter.swirlai.com
  6. Whittaker, P. Progressive Discovery: A Better Mental Model for Agent Skills. dev.to, Apr 2026.
  7. MCPJam. Progressive Disclosure Might Replace MCP. Oct 2025. mcpjam.com
  8. Anthropic Engineering. Equipping agents for the real world with Agent Skills. Oct 2025. anthropic.com/engineering
  9. Simon Willison. Skills: a new way to give Claude long-running expertise. Weblog, Oct 2025. simonwillison.net
  10. OWASP. LLM01: Prompt Injection. OWASP Top 10 for LLM Applications — background on the injection risks discussed in §XIII. genai.owasp.org
⁂ ⁂ ⁂
Set in Cormorant Garamond & Crimson Pro · Printed digitally · MMXXVI