Skills in AI Agents

A short, illustrated monograph on the folder that taught your agent a new trick — its anatomy, its loading rules, and the design choices that decide whether it ever gets used.

By Majid Mazouchi · An interactive primer · Reading time ≈ 14 min · With six interactive figures

The Idea, in One Page
Anatomy of a Skill
Progressive Disclosure
Skills in the Wider Agent Loop
How an Agent Picks a Skill
Writing a Description That Triggers
The Skill Audit
Skills vs. Tools vs. MCP
A Worked Example: the PDF Skill
Build Your First Skill, by Hand
Skill Composition
Anti-Patterns Gallery
Security & Trust
Skills as a Team Artifact
Practical Notes & Pitfalls
References

§ I · The Idea, in One Page

A skill is a folder of instructions that an agent reads only when it needs to. Nothing more mysterious than that.

Imagine you hire a brilliant generalist. They are fluent in a dozen languages, can reason their way through almost any puzzle, and have read more books than any human alive. But on Tuesday morning you ask them to fill out a very specific form, in a very specific way, that your company has used since 1994. They don’t know that form. They could figure it out, but you’d rather not pay them to rediscover it every week.

So you leave a one-page note on their desk: “When someone asks you to fill the 1994 form, here is exactly how to do it. The blank template is in drawer B. The examples are in folder C.” They glance at the note, do the job, and put it back. The next time, they glance again.

That note is a skill. In the language of modern AI agents, a skill is a folder containing a single file called SKILL.md — plus, optionally, scripts, references, and templates — that the agent discovers, loads, and applies on demand. The agent reads only what it needs, when it needs it. The folder can be huge. The context window stays small.

A skill is a self-contained folder of instructions and resources that an AI agent loads on demand to do a specialized task well, without polluting its working memory the rest of the time.

Anthropic introduced this pattern formally in October 2025, and the design has spread fast because it solves a real problem: agents that try to know everything up front end up confused, slow, and expensive. Skills let an agent know where to look instead of knowing everything — which, as any librarian will tell you, is the older and wiser strategy.

§ II · Anatomy of a Skill

Every skill is a directory with at least one required file: SKILL.md. The rest is optional — scripts, reference documents, templates, sample data. The convention is deliberately humble: a skill looks like a tidy folder on your laptop, because that is exactly what it is.

📁 pdf-skill/
├── SKILL.md
├── FORMS.md
├── REFERENCE.md
├── 📁 scripts/
│   ├── fill_form.py
│   └── merge.py
└── 📁 templates/
    └── invoice.pdf

Figure 1. A typical skill folder. Click any item to inspect its role. The only required file is SKILL.md; everything else is supporting material.

The frontmatter is what gets read first

The very top of SKILL.md is a small YAML block called the frontmatter. It carries two fields that matter: a name and a description. These, and only these, are what the agent sees at the start of every conversation. Together they cost roughly 80 tokens per skill, which is why an agent can be aware of hundreds of skills without drowning.

--- SKILL.md ---
---
name: pdf
description: Use this skill whenever the user wants to do anything
  with PDF files — read, extract, fill forms, merge, split, watermark,
  encrypt, or OCR scanned PDFs. Trigger if the user mentions a .pdf
  file or asks to produce one.
---

# PDF Processing

This skill helps you read and create PDFs reliably.
For form filling, see FORMS.md. For OCR or advanced usage, see REFERENCE.md.

## Quick start
...

The frontmatter is a contract: name is how humans refer to the skill, description is how the agent decides whether to open the folder. Everything else in the body is loaded only after that decision is made.
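To make the contract concrete, here is a minimal sketch of how a loader might separate the frontmatter from the body. The function name `split_frontmatter` is invented for illustration, and the parser assumes simple `key: value` pairs with indented continuation lines; it is not a full YAML parser and not necessarily how any real agent runtime reads the file.

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a SKILL.md string into (frontmatter fields, markdown body).

    Assumption: fields are plain `key: value` pairs, and an indented line
    continues the previous value. Real YAML allows much more than this.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block found

    fields: Dict[str, str] = {}
    current_key = None
    end = None
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            end = i
            break
        if line[:1].isspace() and current_key is not None:
            # Indented continuation line: fold it into the previous value.
            fields[current_key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key = key.strip()
            fields[current_key] = value.strip()
    if end is None:
        return {}, text  # unterminated block: treat everything as body

    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return fields, body


SAMPLE = """---
name: pdf
description: Use this skill whenever the user wants to do anything
  with PDF files. Trigger if the user mentions a .pdf file.
---

# PDF Processing
"""

fields, body = split_frontmatter(SAMPLE)
print(fields["name"])         # pdf
print(fields["description"])  # the two description lines folded into one
print(body)                   # # PDF Processing
```

Only `fields` would need to sit in the catalogue; `body` is what gets loaded after the agent decides the skill fits.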

§ III · Progressive Disclosure

The single most important idea: the agent never loads more than it needs.

A skill folder can be enormous. Megabytes of reference docs, dozens of scripts, hundreds of templates. None of that hurts the agent, because none of it gets loaded eagerly. The pattern is called progressive disclosure — sometimes more naturally called progressive discovery — and it works in three layers.

i. Layer 1: Discovery
Always in context · name + description only
At session start the agent sees a catalogue of every available skill, each entry just a name and a one-sentence description. It has no other detail.
Cost: ~80 tokens / skill

ii. Layer 2: Activation
Loaded when the skill is judged relevant
When the agent decides this skill matches the task, it reads the full body of SKILL.md: the procedural knowledge, workflows, and pointers to deeper files.
Cost: ~2k tokens median

iii. Layer 3: On-demand depth
Loaded only if a sub-task demands it
Supporting files (FORMS.md, REFERENCE.md, scripts, templates) are read only if the workflow reaches a step that needs them.
Cost: ∞, effectively unbounded

Figure 2. The three layers of progressive disclosure. Toggle the bottom switch to compare against loading every skill eagerly. Token estimates follow measurements published across Anthropic’s 17 official skills, where discovery cost is ≈ 55–235 tokens and bodies range ≈ 275–8,000 tokens.

This is the architectural payoff: the agent knows that the skill exists for the price of one sentence, and only pays the price of the full instructions when it actually commits to using them. In practice this means an agent can sit on top of a library of hundreds of skills without breaking its context budget — something MCP-style approaches, which load every connected tool eagerly, cannot match.
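The arithmetic behind that payoff is easy to sketch. The helper names below are invented, and the defaults reuse the rough figures quoted in this section (about 80 tokens per catalogue entry, about 2,000 tokens per body); real skills vary, so treat the numbers as illustrative only.

```python
def progressive_cost(n_skills: int, n_active: int = 0,
                     discovery_tokens: int = 80,
                     body_tokens: int = 2000) -> int:
    """Tokens spent on skills when only active bodies are loaded."""
    if not 0 <= n_active <= n_skills:
        raise ValueError("n_active must be between 0 and n_skills")
    return n_skills * discovery_tokens + n_active * body_tokens


def eager_cost(n_skills: int, discovery_tokens: int = 80,
               body_tokens: int = 2000) -> int:
    """Tokens spent if every skill body were loaded up front."""
    return n_skills * (discovery_tokens + body_tokens)


print(progressive_cost(17))              # 1360  (17 skills, idle)
print(progressive_cost(17, n_active=1))  # 3360  (one skill in use)
print(eager_cost(17))                    # 35360 (everything loaded eagerly)
```

Under these assumptions, an idle library of 17 skills costs about 4% of what eager loading would.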

§ IV · Skills in the Wider Agent Loop

Where exactly does a skill sit in an agent’s working memory? The picture is less mysterious than it sounds.

An agent’s context window at any moment is not a single block of text. It is a stack of layers, each contributed by a different actor in the system, and the agent reads all of them together when deciding what to do next. A skill is one of those layers — sometimes invisible, sometimes loud — and seeing the stack as a whole is what makes the rest of this monograph click into place.

Figure 3. The agent’s context window as a stack of layers. Toggle the state to see what changes when a skill activates: a new block of instructions is injected, and everything else stays as it was.

Two things are worth absorbing from this picture. First, the skill catalogue is always present — a small, persistent layer of one-sentence pitches that lets the agent reason about which skills are available. Second, when a skill activates, its body does not replace anything; it is added alongside the rest. The system prompt, the conversation, the tool definitions, the user’s message — all of it remains. The skill is a new voice in the same room.

Think of the context window as a desk. The system prompt is the etiquette guide you keep on the desk. Tool definitions are the labels on the drawers. The skill catalogue is a card index of which folders live in those drawers. When a skill activates, the agent pulls one folder out and lays it open next to everything else. Nothing on the desk gets thrown away — but the open folder now influences the next move.
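The desk picture can be sketched as a list of layers, with activation modelled as an append. The `Layer` type, the layer names, and the placeholder contents are all invented for illustration; how a real runtime orders or formats its context is not specified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    source: str   # who contributed this layer
    content: str  # the text the agent reads


def activate_skill(context: List[Layer], name: str, body: str) -> List[Layer]:
    """Return a new stack with the skill body added; existing layers stay put."""
    return context + [Layer(source=f"skill:{name}", content=body)]


# Placeholder contents, invented for this sketch.
idle = [
    Layer("system_prompt", "You are a careful assistant."),
    Layer("tool_definitions", "web_search(query)"),
    Layer("skill_catalogue", "- pdf: Read, extract, fill, merge, split, OCR PDF files."),
    Layer("conversation", "User: Read this contract.pdf and summarize it"),
]

active = activate_skill(idle, "pdf", "# PDF Processing")

print(len(idle), len(active))         # 4 5
print(active[: len(idle)] == idle)    # True: nothing was replaced
print(active[-1].source)              # skill:pdf
```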

§ V · How an Agent Picks a Skill

No keyword index, no embedding search. The agent reads descriptions and reasons.

When a user’s message arrives, the agent looks at its catalogue — the layer-1 entries — and decides, in plain language, whether any of those one-sentence pitches fit the request. This is pure reasoning, not retrieval. There is no vector store, no fuzzy match. The agent reads the descriptions the way you read the spines on a bookshelf, and picks the one whose label matches the job.

That makes the description not just metadata but the entire interface between user intent and skill activation. A description that is vague, jargony, or written in the first person will simply be passed over — even if the skill behind it is excellent.

Sample tasks:
Read this contract.pdf and summarize it
Build a budget spreadsheet with formulas
Make me a 10-slide pitch deck
Write me a formal client letter
Design a landing page hero
What’s the capital of France?

Skills on the shelf:
pdf: Read, extract, fill, merge, split, OCR PDF files.
docx: Create or edit Word documents (letters, reports, memos).
xlsx: Build or edit spreadsheets with formulas and charts.
pptx: Create slide decks, presentations, and pitch decks.
frontend-design: Design polished web UI (pages, components, landing pages).

Figure 4. Skill discovery is a reasoning step, not a search. The agent reads each description and judges fit. When no skill matches the task — such as a plain factual question — none are loaded, and the agent answers from its general knowledge.
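Since selection is a reasoning step, the only thing code contributes is the text the agent reads. A minimal sketch of that catalogue text, using the five skills from Figure 4, might look like this. The function name and the "Available skills:" heading are assumptions; the matching itself happens inside the model and is deliberately not implemented here.

```python
from typing import Dict


def render_catalogue(skills: Dict[str, str]) -> str:
    """Render layer-1 entries as plain text for the agent to read."""
    lines = ["Available skills:"]
    for name, description in sorted(skills.items()):
        lines.append(f"- {name}: {description}")
    return "\n".join(lines)


SKILLS = {
    "pdf": "Read, extract, fill, merge, split, OCR PDF files.",
    "docx": "Create or edit Word documents (letters, reports, memos).",
    "xlsx": "Build or edit spreadsheets with formulas and charts.",
    "pptx": "Create slide decks, presentations, and pitch decks.",
    "frontend-design": "Design polished web UI (pages, components, landing pages).",
}

catalogue = render_catalogue(SKILLS)
print(catalogue)
```

Everything the agent knows about a skill before activation is one of these lines, which is why the wording of each description carries so much weight.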

§ VI · Writing a Description That Triggers

Most failed skills fail at the front door. The description either over-promises, under-specifies, or speaks to the wrong audience.

The agent picks skills by reading their description. So the description must do three jobs at once: name the thing, name the trigger conditions, and name the edge cases where it should not fire. Write it in the third person — you are addressing the model, not the end user — and include the literal words a user is likely to say.

Figure 5. Two descriptions for the same imaginary skill. The weak one looks fine in isolation but loses the agent’s attention. The strong one is explicit about what it does, who it serves, and what it should refuse.

Rules of thumb for descriptions

Lead with the trigger. Start with “Use this skill when…” or “Use whenever the user…”
Use the user’s words. If users say “deck,” “slides,” or “pitch,” put all three in the description.
Name the file types or artifacts. Extensions, framework names, and product nouns are strong matching signals.
State the boundary. “Do not trigger for X” prevents a skill from being yanked into unrelated work.
Stay under ~1,024 characters. Long enough to be specific, short enough to keep the catalogue cheap.

§ VII · The Skill Audit

A quick rubric for grading any description before you ship it.

You can spend hours wondering why a skill never triggers, or you can run the description through a five-check audit in thirty seconds. The audit below scores a draft on the properties that correlate most strongly with reliable matching.

Figure 6. A live audit of a description. Each check corresponds to a property that experience shows correlates with reliable triggering. None of these guarantees a skill will fire — but a description that fails most of them almost certainly will not.

What the five checks mean

Trigger phrase. Descriptions that explicitly say “Use this skill when…” route more reliably than ones that merely describe what the skill does.
Concrete artifacts. File extensions and named products (.docx, “Excel,” “pitch deck”) give the agent surface evidence to match user phrasing against.
Negative guidance. Explicit “do not” clauses prevent the description from over-firing on adjacent tasks.
Length in range. Too short and the description is unspecific; too long and the catalogue gets expensive. About 40–1,024 characters is the working band.
Third-person voice. Descriptions are read by the agent, not the user. First-person voice (“I help with…”) signals a different audience and can confuse the matcher.
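The five checks can be approximated with a few regular expressions. This is a rough sketch, not the audit behind Figure 6: the function name, the patterns, and the choice to detect concrete artifacts by file extension alone are all assumptions, and a heuristic like this will miss named products such as “Excel.”

```python
import re
from typing import Dict

TRIGGER = re.compile(r"\buse (?:this skill )?(?:when|whenever)\b")
EXTENSION = re.compile(r"\.[a-z0-9]{2,5}\b")           # e.g. .pdf, .docx
NEGATIVE = re.compile(r"\bdo not\b|\bdon['’]t\b")
FIRST_PERSON = re.compile(r"\b(?:i|me|my|we|our)\b")


def audit_description(description: str) -> Dict[str, bool]:
    """Score a draft description against the five checks (True = pass)."""
    text = description.strip()
    lowered = text.lower()
    return {
        "trigger_phrase": bool(TRIGGER.search(lowered)),
        "concrete_artifacts": bool(EXTENSION.search(lowered)),
        "negative_guidance": bool(NEGATIVE.search(lowered)),
        "length_in_range": 40 <= len(text) <= 1024,
        "third_person": not FIRST_PERSON.search(lowered),
    }


strong = (
    "Use this skill whenever the user wants to read, fill, or merge PDF files. "
    "Trigger if the user mentions a .pdf file. Do not use for Word documents."
)
weak = "I help with documents."

print(sum(audit_description(strong).values()))  # 5
print(sum(audit_description(weak).values()))    # 0
```

A passing score does not guarantee the skill will fire; it only rules out the most common reasons it would not.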

§ VIII · Skills vs. Tools vs. MCP

Three overlapping ideas, often confused. Here is the clean separation.

An agent acquires capabilities in several distinct ways, and beginners often blur them. Tools are atomic verbs the agent can invoke — search the web, run a shell command, query a database. MCP servers are external processes that publish a bundle of tools the agent can call over a protocol. Skills are knowledge — instructions, workflows, examples — that teach the agent how to use whatever tools it already has, well, for a specialized task.

	Skills	Tools	MCP
What it is	A folder of instructions & resources teaching the agent how to do something.	A single callable function the agent can invoke (e.g. `web_search`).	A protocol & remote server that exposes one or more tools over a network.
Loaded when	Body loads only after the agent decides the skill fits the task.	Function signature is in context whenever the tool is enabled.	All tools from the server are pulled into context on connection.
Context cost	~80 tokens to know it exists; ~2k when used.	Hundreds of tokens per tool definition, always present.	Cost scales with every tool the server exposes — can be heavy.
Lives where	A local folder, often shared as a zip or via a marketplace.	Defined in the agent runtime or by the platform.	Remote server, hosted by a vendor or self-hosted.
Best for	Repeatable workflows that need procedure, style, or domain context.	A single, well-bounded action the agent invokes.	Bringing live external data or vendor APIs into the agent.
Weakness	Bad descriptions = never triggered. Hard to test in isolation.	Too many tools crowd the context window and degrade selection.	Many tools loaded eagerly; accuracy drops past 2–3 connected servers.

Figure 7. Skills, tools, and MCP are complements, not substitutes. A good agent uses all three: tools to act, MCP to reach external systems, and skills to know how to behave.

Tools are verbs. MCP is a bridge to verbs that live elsewhere. Skills are the playbook that tells the agent which verbs to chain together, in what order, for what kind of job.

§ IX · A Worked Example: the PDF Skill

Walking through one real skill end-to-end clarifies what the other sections describe in the abstract.

Suppose a user says: “Here’s a 40-page PDF. Pull out the tables and email me a summary.” Here is the lifecycle of the agent’s response, with the PDF skill playing its role at each step.

1. Catalogue scan. Agent reads layer-1 descriptions of all installed skills.
2. PDF skill matches. Description names “PDF” and “extract.” The agent selects it.
3. Load SKILL.md. Full body enters context. Quick-start, tool list, pointers.
4. Execute workflow. Runs pdfplumber per the skill’s recipe; extracts tables.
5. Stop & reply. Forms guide never loads. Reference doc never loads. Done.

Figure 8. Lifecycle of a single skill invocation. The deeper files in the folder — FORMS.md, REFERENCE.md — were available the whole time but never touched, because this particular task did not require them.

What is invisible but important: the agent did not blindly call a tool. It read a procedure that recommended a specific library (pdfplumber), explained when to fall back to OCR, and warned about scanned PDFs. That procedural knowledge is the entire point of the skill. Without it, the agent might have tried three different libraries, produced inconsistent output, or asked the user a clarifying question that the skill had already answered in advance.
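For a sense of what such a recipe might boil down to, here is a hedged sketch of table extraction with pdfplumber. It is not the PDF skill’s actual script: the function names are invented, the OCR fallback is left out, and the CSV helper is simply one convenient way to hand the tables on to the next step.

```python
import csv
import io
from typing import List, Optional

Table = List[List[Optional[str]]]


def extract_tables(pdf_path: str) -> List[Table]:
    """Collect every table pdfplumber detects, page by page."""
    # Third-party dependency, imported lazily so table_to_csv works without it.
    import pdfplumber

    tables: List[Table] = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    return tables


def table_to_csv(table: Table) -> str:
    """Serialize one extracted table; empty cells (None) become empty strings."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in table:
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


print(table_to_csv([["Item", "Qty"], ["Widget", None]]))
```

Scanned PDFs generally yield no tables this way, which is the kind of case where a skill’s guidance on falling back to OCR earns its place.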

Skills are how you stop teaching the agent the same thing for the tenth time. You teach it once, in a file, and it learns once, for every future conversation. — Paraphrased from the Anthropic Skills course

§ X · Build Your First Skill, by Hand

The whole pattern is so light that the cleanest way to learn it is to build one.

Below is a small builder. Fill in the three fields and a real SKILL.md assembles itself in the panel on the right. Drop the result into a folder named after your skill, place that folder where your agent looks for skills, and on the next session it will be discoverable.

name: kebab-case, short, unambiguous
description: third person, lead with the trigger
body: the procedure the agent should follow

Figure 9. A live SKILL.md builder. Edit any field and watch the file update. The pattern is intentionally simple: a YAML header for routing, then a markdown body that reads like instructions to a careful colleague.

Where to put the folder

User-wide: ~/.claude/skills/your-skill/ — available to your agent across every project.
Per-project: ./.claude/skills/your-skill/ — only available in this workspace, ships with the repo.
API / SDK: upload the directory through the platform’s skill-management endpoint.

Most skills don’t trigger on the first try. The fix is almost always the description, not the body. When something doesn’t fire, copy the user message that should have triggered it and the description side by side — usually the mismatch becomes obvious in seconds.
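The same three fields can be assembled by a short script instead of by hand. This is a minimal sketch: `write_skill` is an invented helper, and it assumes the description needs no YAML quoting (a description containing a colon followed by a space may need to be wrapped in quotes).

```python
from pathlib import Path


def write_skill(root: Path, name: str, description: str, body: str) -> Path:
    """Create <root>/<name>/SKILL.md from the three fields and return its path."""
    skill_dir = Path(root) / name
    skill_dir.mkdir(parents=True, exist_ok=True)

    # Collapse whitespace so the description stays on one YAML line.
    one_line = " ".join(description.split())
    text = (
        "---\n"
        f"name: {name}\n"
        f"description: {one_line}\n"
        "---\n\n"
        f"{body.strip()}\n"
    )
    path = skill_dir / "SKILL.md"
    path.write_text(text, encoding="utf-8")
    return path
```

Calling `write_skill(Path.home() / ".claude" / "skills", "meeting-notes", ...)` would target the user-wide location listed above; passing `Path(".claude/skills")` would target the per-project one.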

§ XI · Skill Composition

When one request needs two skills at once.

Few real requests are atomic. A user says: “Take this PDF, pull out the tables, and turn them into a Word report.” Two skills want a turn — the PDF skill for the read, the docx skill for the write. The agent handles this by simply activating both: their bodies enter the context one after the other, and the agent stitches the procedures together.

User: “Pull tables from this PDF and write a Word report.”

pdf skill (extract tables from input PDF) → docx skill (compose Word report from data)

Agent stitches: extract → transform → render → deliver

Figure 10. Two skills activated for one request. There is no internal coordinator — both bodies enter the context, and the agent treats them as a single, longer instruction set.

This works cleanly when the skills are orthogonal — each responsible for a different phase, a different artifact, with no overlap in authority. It works badly when two skills both claim the same step. Then the agent has to arbitrate, and arbitration is the unstable part of any system.

Design rules for composability

One artifact per skill. The PDF skill owns PDFs. The docx skill owns Word documents. They never overlap on file ownership.
State your inputs and outputs. A composable skill knows what it consumes and what it produces, and says so in the body.
Avoid global pronouncements. Skills that say “always do X” become parasites on every conversation. Keep advice scoped to the skill’s artifact.
Don’t depend on order. The agent may activate skills in either order. A skill that assumes another has run first is fragile.
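The “one artifact per skill” rule lends itself to a mechanical check. The sketch below assumes a team keeps a hand-maintained mapping from each skill to the artifacts it claims; that mapping is hypothetical and is not part of the SKILL.md format.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_ownership_conflicts(owners: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return each artifact claimed by more than one skill, with its claimants."""
    claimed = defaultdict(list)
    for skill, artifacts in owners.items():
        for artifact in artifacts:
            claimed[artifact].append(skill)
    return {a: sorted(skills) for a, skills in claimed.items() if len(skills) > 1}


# Hypothetical ownership table, maintained by hand.
owners = {
    "pdf": {".pdf"},
    "docx": {".docx"},
    "pdf-tools": {".pdf"},
}

print(find_ownership_conflicts(owners))  # {'.pdf': ['pdf', 'pdf-tools']}
```

An empty result means the skills are orthogonal on file ownership, which is the condition under which composition tends to work cleanly.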

§ XII · Anti-Patterns Gallery

Four ways skills go wrong, each with a name, a symptom, and a fix.

The Kitchen Sink
Symptom: Fires on half of all requests. Once activated, gives generic advice that doesn’t fit the specific task.
Example description: “A helpful skill for working with all kinds of documents.”
Fix: Split it. One skill per artifact: pdf, docx, xlsx, pptx. Each with its own scoped description.

The Personality Skill
Symptom: Either fires on everything or nothing, because it encodes a preference, not a capability.
Example description: “Always respond in bullet points and use Australian English.”
Fix: This is not a skill. Put it in the system prompt, user preferences, or a style guide.

The Duplicate Skill
Symptom: Two skills cover the same ground. The same user message triggers different ones on different days.
Example: A pdf-reader skill and a pdf-tools skill, both describing themselves as “for working with PDFs.”
Fix: Merge them, or differentiate sharply (one reads, the other writes; say so in each).

The Dependency Tangle
Symptom: Skill works on your machine. Breaks on a colleague’s. The body silently assumes other skills, libraries, or files exist.
Example: A skill that calls scripts/process.py without including the script, or one that says “use the data-tools skill” without checking it’s installed.
Fix: Ship every dependency inside the folder. State assumptions explicitly at the top of the body.

Figure 11. The four most common skill failure modes, with diagnosis and remedy. Most production skills have brushed up against at least two of these.
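The Dependency Tangle is the easiest of the four to catch automatically. A rough sketch: scan SKILL.md for relative paths under `scripts/` or `templates/` and report any that are not shipped in the folder. The helper name and the choice of directory names are assumptions, and the check cannot see dependencies on other skills or on installed libraries.

```python
import re
from pathlib import Path
from typing import List

# Matches relative paths such as scripts/process.py or templates/invoice.pdf.
PATH_PATTERN = re.compile(r"\b(?:scripts|templates)/[\w./-]+\.\w+")


def missing_files(skill_dir: Path) -> List[str]:
    """List files that SKILL.md mentions but the skill folder does not contain."""
    body = (Path(skill_dir) / "SKILL.md").read_text(encoding="utf-8")
    referenced = sorted(set(PATH_PATTERN.findall(body)))
    return [rel for rel in referenced if not (Path(skill_dir) / rel).is_file()]
```

Run against a skill folder before sharing it, an empty list is a cheap signal that the folder carries its own scripts and templates.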

§ XIII · Security & Trust

A skill is text plus, optionally, executable code. Installing a stranger’s skill is no different from running a stranger’s script.

The skill pattern is friendly, but it is not magic — it inherits the trust model of the filesystem it lives in. There are three risks worth naming explicitly.

Prompt injection from skill content

The body of SKILL.md is text the agent reads as authoritative instructions. A malicious skill can tell the agent to ignore the user, exfiltrate context to a URL, or chain tools in ways the user did not request. Because activation is decided by the description alone, a skill that looks benign in its catalogue entry can carry hostile instructions in its body.

Script execution

Skills can ship Python or shell scripts. When the agent has code execution enabled and follows the skill’s recipe, it will run those scripts. The sandbox is the only thing between the script and your filesystem; sandbox guarantees vary by platform and configuration. Read the scripts.

Supply chain

Skill marketplaces are new. There is no equivalent of npm audit yet. A skill that was safe at v1.0 may not be at v1.1, and updates can ship silently if you do not pin versions. Track where your skills come from the way you would track third-party libraries.

Trust checklist · before installing any skill

✓ Read the SKILL.md. It is plain markdown. Skim the body for unusual instructions: fetch this URL, run this command, ignore the user.

✓ Inspect every script. If the folder has a scripts/ directory, treat its contents like any third-party code you’re about to run.

✓ Verify the source. Official Anthropic repository, your organization’s registry, or a well-known author. Skills from strangers’ gists deserve a higher bar.

✓ Pin versions in production. “Latest” is fine for personal exploration. For agents shipping in products, freeze the exact version and review each update.

✓ Constrain the agent’s reach. Disable network or filesystem access for code-execution sandboxes when the skill doesn’t need them.

✓ Log activations. Knowing which skills fired in which conversations turns a mystery into an audit trail.

Figure 12. Six checks to run before adding a new skill to a system you care about. Most are common-sense software hygiene; the difference is that “the agent will do this on my behalf” raises the cost of getting it wrong.
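One way to make “pin versions” concrete, absent marketplace tooling, is to record a content hash of the reviewed folder and compare it before each deployment. The sketch below is one possible approach, not an established convention; `skill_digest` is an invented name.

```python
import hashlib
from pathlib import Path


def skill_digest(skill_dir: Path) -> str:
    """Return a SHA-256 hex digest over every file path and file content."""
    skill_dir = Path(skill_dir)
    digest = hashlib.sha256()
    files = sorted(p for p in skill_dir.rglob("*") if p.is_file())
    for path in files:
        rel = path.relative_to(skill_dir).as_posix()
        # Hash the relative path too, so renames change the digest.
        digest.update(rel.encode("utf-8") + b"\0")
        digest.update(path.read_bytes() + b"\0")
    return digest.hexdigest()
```

Store the digest alongside the review record; if a later install produces a different value, something in the folder changed and the skill deserves another read.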

§ XIV · Skills as a Team Artifact

Once a team has more than a few skills, the catalogue becomes infrastructure. The lessons from software engineering port almost without modification.

The pattern that emerges from teams running skills in earnest looks a lot like a regular software lifecycle. The skill folder is the unit. The description is its public API. The body is its implementation. Treat it accordingly.

Version control

Skills are folders of text. Put them in Git. Tag versions. Use pull requests for changes — especially to the description, because description changes affect routing for every user of the skill, immediately, in production.

Code review

Review skill changes the way you review function signatures. A new do not clause in the description can make a previously routed user message fall through to nothing. A new pointer to REFERENCE.md can shift the agent’s context budget. Both deserve a second pair of eyes.

Testing

Maintain a fixture of representative user messages, each labeled with the skill it should trigger. Re-run after every change. This is the equivalent of a regression suite, and it catches description drift before users do.
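A fixture like this needs very little machinery. In the sketch below the messages come from the sample tasks in Figure 4, and the expected labels are this sketch’s reading of which skill each should trigger. `route` stands in for whatever call asks your agent which skill it would activate; that interface is an assumption, and `stub_route` is only a placeholder so the example runs.

```python
from typing import Callable, List, Optional, Tuple

Fixture = List[Tuple[str, Optional[str]]]

FIXTURE: Fixture = [
    ("Read this contract.pdf and summarize it", "pdf"),
    ("Build a budget spreadsheet with formulas", "xlsx"),
    ("Make me a 10-slide pitch deck", "pptx"),
    ("Write me a formal client letter", "docx"),
    ("Design a landing page hero", "frontend-design"),
    ("What’s the capital of France?", None),  # no skill should fire
]


def routing_failures(route: Callable[[str], Optional[str]],
                     fixture: Fixture) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (message, expected, actual) for every mis-routed message."""
    failures = []
    for message, expected in fixture:
        actual = route(message)
        if actual != expected:
            failures.append((message, expected, actual))
    return failures


def stub_route(message: str) -> Optional[str]:
    """Placeholder router: replace with a call into the real agent."""
    return dict(FIXTURE).get(message)


print(routing_failures(stub_route, FIXTURE))  # []
```

Keeping the `None` case in the fixture matters: it catches descriptions that have grown broad enough to fire on requests that need no skill at all.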

Deprecation

When a skill is superseded, don’t just delete it. Replace its description with a clear redirect (“Use the X skill instead — this one no longer fires”), then remove it in a later cycle. Sudden deletions break clients you didn’t know existed.

A maturity ladder for skill catalogues

Ad hoc. Skills live on individual users’ laptops. Nobody knows who has what. Two people may have written the same skill differently.

Shared repository. Skills live in a Git repo. Naming conventions are agreed. Anyone can add or update, but quality varies.

Reviewed & versioned. Changes go through pull request. Skills are tagged. A document explains what each skill is for.

Tested & measured. A regression suite checks trigger accuracy. Activation rates and logs are monitored. Deprecation is a planned process.

Figure 13. The four stages most organizations move through as their skill libraries grow. Most teams stop at L1 and wonder why their skills feel unreliable; L2 is where reliability actually starts.

Two practices buy more reliability than all the rest combined: code review on description changes, and a small regression suite of expected user-message-to-skill matchings. Everything else is polish.

§ XV · Practical Notes & Pitfalls

A short list of the lessons that take a few skills of your own to learn the hard way.

Design

One skill, one capability. If your skill description starts saying “and also,” split it. The matcher gets confused by skills that try to be two things.
Body is the procedure, not the encyclopedia. Keep SKILL.md focused on the workflow. Push examples, edge cases, and deep references into separate files and link to them from the body.
Write deterministic recipes. “Use library X for Y” beats “use a suitable library.” The whole point is to remove ambiguity.
Include negative guidance. “Do not use this skill for spreadsheets” protects the skill from being mis-selected and the matcher from drift.

Operations

Skills live on disk. Typical locations: a user-wide folder (~/.claude/skills/) and per-project folders. The agent enumerates both at startup.
Skills are sharable. A skill is just a directory; zip it, post it, install it. Public marketplaces and curated repositories already exist.
Skills can ship scripts. If the agent has code execution, your skill can carry tested Python or shell scripts the agent runs verbatim — a way to make agent behavior reproducible.
Skills travel across surfaces. The same skill folder can be picked up by Claude Code, the Claude app (with code execution enabled), and API agents.
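Enumerating skills on disk can be sketched in a few lines. The two default roots are the locations named above; the function name is invented, and how a real agent resolves a name that appears in both places is not covered here.

```python
from pathlib import Path
from typing import Iterable, List

DEFAULT_ROOTS = [Path("~/.claude/skills"), Path(".claude/skills")]


def discover_skills(roots: Iterable[Path] = DEFAULT_ROOTS) -> List[Path]:
    """Return every immediate subfolder of each root that contains a SKILL.md."""
    found: List[Path] = []
    for root in roots:
        root = Path(root).expanduser()
        if not root.is_dir():
            continue  # a missing root is normal, e.g. no per-project skills
        for child in sorted(root.iterdir()):
            if (child / "SKILL.md").is_file():
                found.append(child)
    return found


for skill_dir in discover_skills():
    print(skill_dir.name)
```

A folder without SKILL.md is skipped, which matches the rule that SKILL.md is the one required file.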

Common pitfalls

Vague descriptions are the #1 reason a perfectly good skill never gets used. If you suspect mis-routing, fix the description first.
Overstuffed SKILL.md wastes the context budget every time the skill activates. Move detail to sibling files.
Skills that duplicate each other’s coverage compete in the matcher. Audit your skill library the way you would audit a directory of utility functions.
Untested skills drift. When the platform or the model updates, a description that used to trigger reliably may stop. Treat your skill catalogue like code — version it, test it, fix it.

When not to use a skill

If the capability is a one-time hack, just put it in the system prompt.
If the capability is fundamentally about reaching external data (live APIs, vendor systems), use an MCP server or a tool, not a skill.
If the capability is fundamentally an atomic verb, make it a tool.
If the “skill” is really a personal preference (“always reply in bullet points”), put it in user preferences or a style guide, not in a skill.

§ XVI · References

Primary sources and well-regarded community write-ups, current as of mid-2026.

Anthropic. Agent Skills — Overview. Claude Platform Documentation. platform.claude.com/docs/en/agents-and-tools/agent-skills/overview
Anthropic Courses. Introduction to Agent Skills. Skilljar. anthropic.skilljar.com/introduction-to-agent-skills
anthropics/skills. Official open-source skill repository. GitHub. github.com/anthropics/skills
Lee, H. Claude Agent Skills: A First-Principles Deep Dive. Oct 2025. leehanchung.github.io
SwirlAI Newsletter. Agent Skills: Progressive Disclosure as a System Design Pattern. Mar 2026. newsletter.swirlai.com
Whittaker, P. Progressive Discovery: A Better Mental Model for Agent Skills. dev.to, Apr 2026.
MCPJam. Progressive Disclosure Might Replace MCP. Oct 2025. mcpjam.com
Anthropic Engineering. Equipping agents for the real world with Agent Skills. Oct 2025. anthropic.com/engineering
Simon Willison. Skills: a new way to give Claude long-running expertise. Weblog, Oct 2025. simonwillison.net
OWASP. LLM01: Prompt Injection. OWASP Top 10 for LLM Applications — background on the injection risks discussed in §XIII. genai.owasp.org

⁂ ⁂ ⁂

Set in Cormorant Garamond & Crimson Pro · Printed digitally · MMXXVI
