A short, illustrated monograph on the folder that taught your agent a new trick — its anatomy, its loading rules, and the design choices that decide whether it ever gets used.
A skill is a folder of instructions that an agent reads only when it needs to. Nothing more mysterious than that.
Imagine you hire a brilliant generalist. They are fluent in a dozen languages, can reason their way through almost any puzzle, and have read more books than any human alive. But on Tuesday morning you ask them to fill out a very specific form, in a very specific way, that your company has used since 1994. They don’t know that form. They could figure it out, but you’d rather not pay them to rediscover it every week.
So you leave a one-page note on their desk: “When someone asks you to fill the 1994 form, here is exactly how to do it. The blank template is in drawer B. The examples are in folder C.” They glance at the note, do the job, and put it back. The next time, they glance again.
That note is a skill. In the language of modern AI agents, a skill is a folder containing one required file, SKILL.md, plus optional scripts, references, and templates, which the agent discovers, loads, and applies on demand. The agent reads only what it needs, when it needs it. The folder can be huge. The context window stays small.
A skill is a self-contained folder of instructions and resources that an AI agent loads on-demand to do a specialized task well, without polluting its working memory the rest of the time.
Anthropic introduced this pattern formally in October 2025, and the design has spread fast because it solves a real problem: agents that try to know everything up front end up confused, slow, and expensive. Skills let an agent know where to look instead of knowing everything — which, as any librarian will tell you, is the older and wiser strategy.
Every skill is a directory with at least one required file: SKILL.md. The rest is optional — scripts, reference documents, templates, sample data. The convention is deliberately humble: a skill looks like a tidy folder on your laptop, because that is exactly what it is.
The very top of SKILL.md is a small YAML block called the frontmatter. It carries two fields that matter: a name and a description. These, and only these, are what the agent sees at the start of every conversation. Together they cost only a few dozen tokens per skill, which is why an agent can be aware of hundreds of skills without drowning.
SKILL.md

    ---
    name: pdf
    description: Use this skill whenever the user wants to do anything with PDF files — read, extract, fill forms, merge, split, watermark, encrypt, or OCR scanned PDFs. Trigger if the user mentions a .pdf file or asks to produce one.
    ---
    # PDF Processing
    This skill helps you read and create PDFs reliably.
    For form filling, see FORMS.md.
    For OCR or advanced usage, see REFERENCE.md.
    ## Quick start
    ...
The frontmatter is a contract: name is how humans refer to the skill, description is how the agent decides whether to open the folder. Everything else in the body is loaded only after that decision is made.
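To make the contract concrete, here is a minimal sketch of how a loader might separate the frontmatter from the body. The function name `split_frontmatter` is hypothetical, and the parser handles only flat `key: value` lines; a real runtime would more likely use a full YAML parser.

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, body).

    Handles only flat `key: value` pairs, which is all this sketch needs.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    try:
        # Find the closing delimiter that follows the opening one.
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter is missing its closing '---'") from None

    fields: Dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        fields[key.strip()] = value.strip()

    # The two fields the catalogue depends on must both be present.
    for required in ("name", "description"):
        if not fields.get(required):
            raise ValueError(f"frontmatter is missing '{required}'")

    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


SAMPLE = """---
name: pdf
description: Use this skill whenever the user wants to do anything with PDF files.
---
# PDF Processing
This skill helps you read and create PDFs reliably.
"""

fields, body = split_frontmatter(SAMPLE)
print(fields["name"])
print(body.splitlines()[0])
```

Only `fields` would go into the catalogue; `body` stays on disk until the skill is chosen.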
The single most important idea: the agent never loads more than it needs.
A skill folder can be enormous. Megabytes of reference docs, dozens of scripts, hundreds of templates. None of that hurts the agent, because none of it gets loaded eagerly. The pattern is called progressive disclosure — sometimes more naturally called progressive discovery — and it works in three layers.
Layer 1: At session start the agent sees a catalogue of every available skill, each entry just a name and a one-sentence description. It has no other detail.
Layer 2: When the agent decides this skill matches the task, it reads the full body of SKILL.md: the procedural knowledge, workflows, and pointers to deeper files.
Layer 3: Supporting files (FORMS.md, REFERENCE.md, scripts, templates) are read only if the workflow reaches a step that needs them.
This is the architectural payoff: the agent knows that the skill exists for the price of one sentence, and only pays the price of the full instructions when it actually commits to using them. In practice this means an agent can sit on top of a library of hundreds of skills without breaking its context budget — something MCP-style approaches, which load every connected tool eagerly, cannot match.
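The three layers can be sketched as a small loader. Everything here is illustrative: `SkillLibrary`, its method names, and the `context` list standing in for the context window are assumptions, not any runtime's actual API. The sketch also assumes each skill folder is named after its skill.

```python
import tempfile
from pathlib import Path


def parse_skill_md(path: Path):
    """Return (frontmatter dict, body) for a flat key: value frontmatter."""
    text = path.read_text(encoding="utf-8")
    _, header, body = text.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()


class SkillLibrary:
    """Hypothetical loader; `context` stands in for the context window."""

    def __init__(self, root: Path):
        self.root = root
        self.context = []

    def start_session(self) -> None:
        # Layer 1: one catalogue line per skill, nothing else.
        for skill_md in sorted(self.root.glob("*/SKILL.md")):
            meta, _ = parse_skill_md(skill_md)
            self.context.append(f"{meta['name']}: {meta['description']}")

    def activate(self, name: str) -> None:
        # Layer 2: the full body, only once the skill has been chosen.
        _, body = parse_skill_md(self.root / name / "SKILL.md")
        self.context.append(body)

    def open_resource(self, name: str, relative_path: str) -> None:
        # Layer 3: a supporting file, only when a step needs it.
        path = self.root / name / relative_path
        self.context.append(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pdf").mkdir()
    (root / "pdf" / "SKILL.md").write_text(
        "---\nname: pdf\ndescription: Use for any PDF task.\n---\n"
        "# PDF Processing\nFor form filling, see FORMS.md.\n",
        encoding="utf-8",
    )
    (root / "pdf" / "FORMS.md").write_text("Form-filling guide.\n", encoding="utf-8")

    library = SkillLibrary(root)
    library.start_session()
    after_start = list(library.context)
    library.activate("pdf")
    after_activate = list(library.context)

print(after_start)
print(len(after_activate))
```

In this run FORMS.md sits in the folder the whole time but never enters `context`, because `open_resource` is never called.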
Where exactly does a skill sit in an agent’s working memory? The picture is less mysterious than it sounds.
An agent’s context window at any moment is not a single block of text. It is a stack of layers, each contributed by a different actor in the system, and the agent reads all of them together when deciding what to do next. A skill is one of those layers — sometimes invisible, sometimes loud — and seeing the stack as a whole is what makes the rest of this monograph click into place.
Two things are worth absorbing from this picture. First, the skill catalogue is always present — a small, persistent layer of one-sentence pitches that lets the agent reason about which skills are available. Second, when a skill activates, its body does not replace anything; it is added alongside the rest. The system prompt, the conversation, the tool definitions, the user’s message — all of it remains. The skill is a new voice in the same room.
Think of the context window as a desk. The system prompt is the etiquette guide you keep on the desk. Tool definitions are the labels on the drawers. The skill catalogue is a card index of which folders live in those drawers. When a skill activates, the agent pulls one folder out and lays it open next to everything else. Nothing on the desk gets thrown away — but the open folder now influences the next move.
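The desk picture translates into a very small data structure. In the sketch below, the `Layer` type, the function names, and the ordering of the layers are assumptions made for illustration; the point is only that activation appends a layer and removes none.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    source: str  # which actor contributed this layer
    text: str


def build_context(system_prompt: str, tool_definitions: List[str],
                  catalogue: List[str], conversation: List[str]) -> List[Layer]:
    """Assemble the stack the agent reads before any skill is active."""
    return [
        Layer("system prompt", system_prompt),
        Layer("tool definitions", "\n".join(tool_definitions)),
        Layer("skill catalogue", "\n".join(catalogue)),
        Layer("conversation", "\n".join(conversation)),
    ]


def activate_skill(context: List[Layer], name: str, body: str) -> List[Layer]:
    """Return a new stack with the skill body added alongside the rest."""
    return context + [Layer(f"skill body: {name}", body)]


before = build_context(
    system_prompt="Be concise and polite.",
    tool_definitions=["run_shell(command)", "web_search(query)"],
    catalogue=["pdf: Use for any PDF task.", "docx: Use for Word documents."],
    conversation=["user: Pull the tables out of this PDF."],
)
after = activate_skill(before, "pdf", "# PDF Processing\n...")

print([layer.source for layer in after])
```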
No keyword index, no embedding search. The agent reads descriptions and reasons.
When a user’s message arrives, the agent looks at its catalogue — the layer-1 entries — and decides, in plain language, whether any of those one-sentence pitches fit the request. This is pure reasoning, not retrieval. There is no vector store, no fuzzy match. The agent reads the descriptions the way you read the spines on a bookshelf, and picks the one whose label matches the job.
That makes the description not just metadata but the entire interface between user intent and skill activation. A description that is vague, jargony, or written in the first person will simply be passed over — even if the skill behind it is excellent.
Most failed skills fail at the front door. The description either over-promises, under-specifies, or speaks to the wrong audience.
The agent picks skills by reading their description. So the description must do three jobs at once: name the thing, name the trigger conditions, and name the edge cases where it should not fire. Write it in the third person — you are addressing the model, not the end user — and include the literal words a user is likely to say.
A quick rubric for grading any description before you ship it.
You can spend hours wondering why a skill never triggers, or you can run the description through a five-check audit in thirty seconds. The audit scores a draft on the properties that most reliably predict correct matching.
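A rough version of such an audit fits in a few lines. The five checks below are hypothetical heuristics inferred from the advice in this section (name the trigger, write in the third person, include literal user phrasing, state exclusions, be specific); they are not the original rubric, and the regular expressions are deliberately crude.

```python
import re
from typing import Dict, Iterable

FIRST_PERSON = {"i", "i'm", "i'll", "my", "me", "we", "our"}


def audit_description(description: str, user_phrases: Iterable[str]) -> Dict[str, bool]:
    """Score a draft description on five illustrative checks."""
    text = description.lower().replace("\u2019", "'")
    words = re.findall(r"[a-z']+", text)
    return {
        # Says when to fire, e.g. "Use ... when" or "Trigger if".
        "names_trigger": bool(re.search(r"\b(use|trigger)\b.*\b(when|whenever|if)\b", text)),
        # Addresses the model, not the end user.
        "third_person": not any(word in FIRST_PERSON for word in words),
        # Contains at least one phrase a user is likely to type.
        "literal_phrases": any(phrase.lower() in text for phrase in user_phrases),
        # Names at least one case where the skill should not fire.
        "states_exclusions": bool(re.search(r"\b(do not|don't|not for|except)\b", text)),
        # Long enough to say something specific (threshold is arbitrary).
        "specific_enough": len(words) >= 12,
    }


good = audit_description(
    "Use this skill whenever the user wants to read, merge, or split PDF files. "
    "Trigger if the user mentions a .pdf file. Do not use it for Word documents.",
    user_phrases=[".pdf", "merge"],
)
vague = audit_description(
    "A helpful skill for working with all kinds of documents.",
    user_phrases=[".pdf", "merge"],
)

print(sum(good.values()), "of 5 checks passed")
print(sum(vague.values()), "of 5 checks passed")
```

A passing score does not guarantee routing, since the agent still decides by reasoning over the text, but a failing one is a cheap early warning.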
Literal trigger words (.docx, “Excel,” “pitch deck”) give the agent surface evidence to match user phrasing against.
Three overlapping ideas, often confused. Here is the clean separation.
An agent acquires capabilities in several distinct ways, and beginners often blur them. Tools are atomic verbs the agent can invoke — search the web, run a shell command, query a database. MCP servers are external processes that publish a bundle of tools the agent can call over a protocol. Skills are knowledge — instructions, workflows, examples — that teach the agent how to use whatever tools it already has, well, for a specialized task.
| | Skills | Tools | MCP |
|---|---|---|---|
| What it is | A folder of instructions & resources teaching the agent how to do something. | A single callable function the agent can invoke (e.g. web_search). | A protocol & remote server that exposes one or more tools over a network. |
| Loaded when | Body loads only after the agent decides the skill fits the task. | Function signature is in context whenever the tool is enabled. | All tools from the server are pulled into context on connection. |
| Context cost | ~80 tokens to know it exists; ~2k when used. | Hundreds of tokens per tool definition, always present. | Cost scales with every tool the server exposes; can be heavy. |
| Lives where | A local folder, often shared as a zip or via a marketplace. | Defined in the agent runtime or by the platform. | Remote server, hosted by a vendor or self-hosted. |
| Best for | Repeatable workflows that need procedure, style, or domain context. | A single, well-bounded action the agent invokes. | Bringing live external data or vendor APIs into the agent. |
| Weakness | Bad descriptions = never triggered. Hard to test in isolation. | Too many tools crowd the context window and degrade selection. | Many tools loaded eagerly; accuracy drops past 2–3 connected servers. |
Tools are verbs. MCP is a bridge to verbs that live elsewhere. Skills are the playbook that tells the agent which verbs to chain together, in what order, for what kind of job.
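The context-cost row of the table can be turned into back-of-the-envelope arithmetic. The skill figures come from the table; the 300 tokens per tool definition is an assumption standing in for "hundreds of tokens", so treat the comparison as an order-of-magnitude sketch only.

```python
CATALOGUE_TOKENS_PER_SKILL = 80       # from the table: cost to know a skill exists
BODY_TOKENS_PER_ACTIVE_SKILL = 2_000  # from the table: cost when a skill is used
TOKENS_PER_TOOL_DEFINITION = 300      # assumption: "hundreds of tokens" per tool


def skill_context_cost(installed: int, active: int) -> int:
    """Tokens spent on skills: every catalogue entry plus each active body."""
    return (installed * CATALOGUE_TOKENS_PER_SKILL
            + active * BODY_TOKENS_PER_ACTIVE_SKILL)


def tool_context_cost(enabled_tools: int) -> int:
    """Tokens spent on tool definitions, which are always present."""
    return enabled_tools * TOKENS_PER_TOOL_DEFINITION


print(skill_context_cost(installed=200, active=1))  # 200 * 80 + 2000 = 18000
print(tool_context_cost(enabled_tools=200))         # 200 * 300 = 60000
```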
Walking through one real skill end-to-end clarifies what the other sections describe in the abstract.
Suppose a user says: “Here’s a 40-page PDF. Pull out the tables and email me a summary.” Here is the lifecycle of the agent’s response, with the PDF skill playing its role at each step.
Agent reads layer-1 descriptions of all installed skills.
Description names “PDF” and “extract.” The agent selects it.
Full body enters context. Quick-start, tool list, pointers.
Runs pdfplumber per the skill’s recipe; extracts tables.
Forms guide never loads. Reference doc never loads. Done.
The supporting files, FORMS.md and REFERENCE.md, were available the whole time but never touched, because this particular task did not require them.
What is invisible but important: the agent did not blindly call a tool. It read a procedure that recommended a specific library (pdfplumber), explained when to fall back to OCR, and warned about scanned PDFs. That procedural knowledge is the entire point of the skill. Without it, the agent might have tried three different libraries, produced inconsistent output, or asked the user a clarifying question that the skill had already answered in advance.
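The skill's actual recipe is not reproduced here, so the following is only a guess at what the extraction step might look like with pdfplumber. The helper names `extract_tables` and `summarize` are hypothetical, and the email step is left out.

```python
from typing import List, Optional

Table = List[List[Optional[str]]]


def extract_tables(pdf_path: str) -> List[Table]:
    """Collect every table pdfplumber finds, page by page."""
    # Imported here so the module still loads where pdfplumber is not installed.
    import pdfplumber

    tables: List[Table] = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    return tables


def summarize(tables: List[Table]) -> str:
    """Produce a short plain-text summary of the extracted tables."""
    lines = [f"Found {len(tables)} table(s)."]
    for index, table in enumerate(tables, start=1):
        rows = len(table)
        cols = max((len(row) for row in table), default=0)
        lines.append(f"Table {index}: {rows} rows x {cols} columns")
    return "\n".join(lines)


# Demonstrated on an in-memory table so the example runs without a PDF file.
example_summary = summarize([[["Year", "Revenue"], ["2023", "10"]]])
print(example_summary)
```

With a real file, `summarize(extract_tables("report.pdf"))` would chain the two steps; the path is a placeholder.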
Skills are how you stop teaching the agent the same thing for the tenth time. You teach it once, in a file, and it learns once, for every future conversation. — Paraphrased from the Anthropic Skills course
The whole pattern is so light that the cleanest way to learn it is to build one.
Write a SKILL.md with a name, a description, and a body. Drop it into a folder named after your skill, place that folder where your agent looks for skills, and on the next session it will be discoverable.
~/.claude/skills/your-skill/ is available to your agent across every project.
./.claude/skills/your-skill/ is available only in this workspace and ships with the repo.
Most skills don’t trigger on the first try. The fix is almost always the description, not the body. When something doesn’t fire, put the user message that should have triggered it next to the description; the mismatch usually becomes obvious in seconds.
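The build step described in this section can also be scripted. The sketch below assumes the skill is defined by a name, a description, and a body; `build_skill` is a hypothetical helper, and the demo writes to a temporary directory rather than a real skills folder.

```python
import tempfile
from pathlib import Path


def build_skill(root: Path, name: str, description: str, body: str) -> Path:
    """Create root/name/SKILL.md and return its path."""
    for label, value in (("name", name), ("description", description)):
        # Flat frontmatter needs each field on a single line.
        if not value.strip() or "\n" in value:
            raise ValueError(f"{label} must be a non-empty single line")
    # Note: a description containing ': ' would need quoting for a strict YAML parser.
    folder = root / name
    folder.mkdir(parents=True, exist_ok=True)
    skill_md = folder / "SKILL.md"
    skill_md.write_text(
        f"---\nname: {name}\ndescription: {description}\n---\n\n{body.strip()}\n",
        encoding="utf-8",
    )
    return skill_md


with tempfile.TemporaryDirectory() as tmp:
    path = build_skill(
        Path(tmp),
        name="meeting-notes",
        description="Use this skill when the user asks to turn a meeting transcript into minutes.",
        body="# Meeting notes\n\n1. Read the transcript.\n2. List decisions and owners.",
    )
    content = path.read_text(encoding="utf-8")
    relative = path.relative_to(tmp).as_posix()

print(relative)
print(content)
```

Pointing `root` at `Path.home() / ".claude" / "skills"` would install the skill for every project, and `Path(".claude") / "skills"` would scope it to the current workspace.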
When one request needs two skills at once.
Few real requests are atomic. A user says: “Take this PDF, pull out the tables, and turn them into a Word report.” Two skills want a turn — the PDF skill for the read, the docx skill for the write. The agent handles this by simply activating both: their bodies enter the context one after the other, and the agent stitches the procedures together.
This works cleanly when the skills are orthogonal — each responsible for a different phase, a different artifact, with no overlap in authority. It works badly when two skills both claim the same step. Then the agent has to arbitrate, and arbitration is the unstable part of any system.
Four ways skills go wrong, each with a name, a symptom, and a fix.
Symptom: Fires on half of all requests. Once activated, gives generic advice that doesn’t fit the specific task.
Example: “A helpful skill for working with all kinds of documents.”
Fix: Split it. One skill per artifact: pdf, docx, xlsx, pptx. Each with its own scoped description.
Symptom: Either fires on everything or nothing, because it encodes a preference, not a capability.
Example: “Always respond in bullet points and use Australian English.”
Fix: This is not a skill. Put it in the system prompt, user preferences, or a style guide.
Symptom: Two skills cover the same ground. The same user message triggers different ones on different days.
Example: A pdf-reader skill and a pdf-tools skill, both describing themselves as “for working with PDFs.”
Fix: Merge them, or differentiate sharply (one reads, the other writes; say so in each).
Symptom: Skill works on your machine. Breaks on a colleague’s. The body silently assumes other skills, libraries, or files exist.
Example: A skill that calls scripts/process.py without including the script, or one that says “use the data-tools skill” without checking it’s installed.
Fix: Ship every dependency inside the folder. State assumptions explicitly at the top of the body.
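The last failure, a body that points at files the folder does not contain, is easy to catch mechanically. The check below is a sketch under a simplifying assumption: any token in SKILL.md that looks like a relative path with a common extension is treated as a file reference. The function name and the regular expression are illustrative and will miss or over-match in real skills.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Rough pattern for relative file references such as scripts/process.py or FORMS.md.
FILE_REFERENCE = re.compile(r"\b[\w./-]+\.(?:py|sh|md|json|txt)\b")


def missing_files(skill_folder: Path) -> List[str]:
    """List files that SKILL.md mentions but the folder does not contain."""
    text = (skill_folder / "SKILL.md").read_text(encoding="utf-8")
    referenced = sorted(set(FILE_REFERENCE.findall(text)))
    return [
        ref for ref in referenced
        if ref != "SKILL.md" and not (skill_folder / ref).exists()
    ]


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "report"
    folder.mkdir()
    (folder / "SKILL.md").write_text(
        "---\nname: report\ndescription: Use when the user asks for a weekly report.\n---\n"
        "Run scripts/process.py, then see FORMS.md.\n",
        encoding="utf-8",
    )
    (folder / "FORMS.md").write_text("Form guide.\n", encoding="utf-8")
    missing = missing_files(folder)

print(missing)
```

Run as a pre-commit or CI step, a check like this turns "works on my machine" into a failing build instead of a colleague's bug report.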
A skill is text plus, optionally, executable code. Installing a stranger’s skill is no different from running a stranger’s script.
The skill pattern is friendly, but it is not magic — it inherits the trust model of the filesystem it lives in. There are three risks worth naming explicitly.
The body of SKILL.md is text the agent reads as authoritative instructions. A malicious skill can tell the agent to ignore the user, exfiltrate context to a URL, or chain tools in ways the user did not request. Because activation is decided by the description alone, a skill that looks benign in its catalogue entry can carry hostile instructions in its body.
Skills can ship Python or shell scripts. When the agent has code execution enabled and follows the skill’s recipe, it will run those scripts. The sandbox is the only thing between the script and your filesystem; sandbox guarantees vary by platform and configuration. Read the scripts.
Skill marketplaces are new. There is no equivalent of npm audit yet. A skill that was safe at v1.0 may not be at v1.1, and updates can ship silently if you do not pin versions. Track where your skills come from the way you would track third-party libraries.
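One simple way to notice a silent update is to pin a content hash of the whole skill folder and compare it before each use. The sketch below uses SHA-256 over every file's relative path and bytes; `skill_digest` is a hypothetical helper, not part of any marketplace tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def skill_digest(folder: Path) -> str:
    """Hash every file's relative path and contents in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        digest.update(path.relative_to(folder).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between path and contents
        digest.update(path.read_bytes())
        digest.update(b"\0")  # separator between files
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "pdf"
    (folder / "scripts").mkdir(parents=True)
    (folder / "SKILL.md").write_text("---\nname: pdf\n---\n", encoding="utf-8")
    (folder / "scripts" / "extract.py").write_text("print('v1.0')\n", encoding="utf-8")
    pinned = skill_digest(folder)

    # Simulate an update that changes a shipped script.
    (folder / "scripts" / "extract.py").write_text("print('v1.1')\n", encoding="utf-8")
    current = skill_digest(folder)

print("unchanged" if current == pinned else "skill changed since it was pinned")
```

A changed digest does not say whether the change is hostile, only that the scripts deserve another read.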
If a skill ships a scripts/ directory, treat its contents like any third-party code you’re about to run.
Once a team has more than a few skills, the catalogue becomes infrastructure. The lessons from software engineering port almost without modification.
The pattern that emerges from teams running skills in earnest looks a lot like a regular software lifecycle. The skill folder is the unit. The description is its public API. The body is its implementation. Treat it accordingly.
Skills are folders of text. Put them in Git. Tag versions. Use pull requests for changes — especially to the description, because description changes affect routing for every user of the skill, immediately, in production.
Review skill changes the way you review function signatures. A new do not clause in the description can make a previously routed user message fall through to nothing. A new pointer to REFERENCE.md can shift the agent’s context budget. Both deserve a second pair of eyes.
Maintain a fixture of representative user messages, each labeled with the skill it should trigger. Re-run after every change. This is the equivalent of a regression suite, and it catches description drift before users do.
When a skill is superseded, don’t just delete it. Replace its description with a clear redirect (“Use the X skill instead — this one no longer fires”), then remove it in a later cycle. Sudden deletions break clients you didn’t know existed.
Two practices buy more reliability than all the rest combined: code review on description changes, and a small regression suite of expected user-message-to-skill matchings. Everything else is polish.
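The regression suite can be as small as a list of pairs and a loop. In the sketch below, `run_regression` is the harness; `keyword_router` is only a stand-in so the example runs, since real routing is done by the agent reasoning over descriptions, and in practice `route` would wrap a call to your agent.

```python
from typing import Callable, List, Optional, Tuple

Fixture = List[Tuple[str, Optional[str]]]  # (user message, expected skill or None)
Router = Callable[[str], Optional[str]]


def run_regression(fixture: Fixture, route: Router) -> List[str]:
    """Return one line per message that was routed to the wrong skill."""
    failures = []
    for message, expected in fixture:
        actual = route(message)
        if actual != expected:
            failures.append(f"{message!r}: expected {expected}, got {actual}")
    return failures


def keyword_router(message: str) -> Optional[str]:
    """Stand-in router for the demo; not how an agent actually decides."""
    text = message.lower()
    if "pdf" in text:
        return "pdf"
    if "word" in text or ".docx" in text:
        return "docx"
    return None


FIXTURE: Fixture = [
    ("Pull the tables out of this PDF", "pdf"),
    ("Turn these notes into a Word report", "docx"),
    ("What's the weather like today?", None),
]

failures = run_regression(FIXTURE, keyword_router)
print(failures or "all messages routed as expected")
```

Including messages that should trigger nothing matters as much as the positive cases, because an over-eager description fails in exactly that direction.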
A short list of the lessons that take a few skills of your own to learn the hard way.
Keep SKILL.md focused on the workflow. Push examples, edge cases, and deep references into separate files and link to them from the body.
Skills can live in a personal folder (~/.claude/skills/) and per-project folders. The agent enumerates both at startup.
Primary sources and well-regarded community write-ups, current as of mid-2026.