When you onboard a new engineer, you don't hand them a 1,200-page manual and tell them to read it cover to cover before writing a line of code. You give them a short README, a pointer to the runbooks that matter for their first ticket, and a script or two they can just run instead of re-deriving the steps by hand. They pull in the rest — the edge cases, the architecture docs, the historical incident write-ups — only when a specific task actually needs it.
That is, almost exactly, what an Agent Skill is for a coding agent. A skill is a packaged unit of expertise — a procedure, a house style, a domain checklist — that an agent can discover, decide is relevant, and load into its working context only at the moment it's needed. It is the difference between an agent that's smart in general and an agent that knows your team's specific way of doing a code review, a database migration, or a release.
Key takeaways
- A skill is three things: SKILL.md (its frontmatter is always scanned, and its body tells the agent when to reach for the rest), references/ (loaded on demand), and scripts/ (executed, not re-derived).
- Progressive disclosure — reading the frontmatter first, the body second, the references third — is what keeps a library of skills from silently eating the whole context window.
- The lifecycle is discovery → trigger matching → load → execute. Most failing skills break at the second step, because the description was written for a human reader rather than for a matcher.
- The best negative-case test: if you can't say in one sentence when NOT to use a skill, its trigger is too vague, and it will fire (or fail to fire) unpredictably.
- A description is a budget, not free text — a few concrete nouns plus one negative case beats a paragraph, because every installed skill's frontmatter gets scanned on every task whether it matches or not.
- Split into two skills when the trigger conditions differ, not when the topic differs. Keep one skill with several references when it's the same trigger with more depth underneath it.
The anatomy of a skill
Every well-formed skill follows the same three-part shape. It maps almost one-to-one onto how you'd structure documentation for a new hire: an entry point that's always visible, deeper material that's there if you go looking for it, and executable tooling that replaces a page of instructions with a single command.
Figure 1: Skill anatomy
SKILL.md: the entry point
Every skill has exactly one SKILL.md, and its frontmatter is the only part of the skill an agent loads by default, before any task has been matched to it. That frontmatter is a small YAML block (just a name and a description); the rest of the file holds the actual procedure in Markdown, with numbered steps, decision points, and pointers to the files that hold everything else.
```markdown
---
name: database-migration-review
description: Reviews SQL migration files for reversibility, lock duration, and backward compatibility before they merge. Use when a PR touches files under migrations/ or db/changes/.
---

## When to use this skill
Any PR that adds or modifies a file under `migrations/` or `db/changes/`.

## Steps
1. Run `scripts/check_reversibility.py` on the changed files.
2. If the migration adds a NOT NULL column without a default, read `references/backward-compat.md` before approving.
3. Flag any `ALTER TABLE` that isn't wrapped for online DDL; see `references/lock-duration.md` for the checklist.
4. Summarize findings using the template in `references/review-template.md`.
```

Notice what's missing from that file: the actual backward-compat checklist, the lock-duration rules, the review template. Those live one directory down, and the agent has no idea what's in them until step 2, 3, or 4 tells it to go look. That's the whole point.
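Discovery only ever needs the frontmatter. The following is a minimal sketch of that step, assuming the header holds flat `key: value` fields (a real loader would use a proper YAML parser); `read_frontmatter` is a hypothetical helper, not any agent's actual loader. It walks a SKILL.md line by line and stops at the closing `---`, so the body lines are never consumed or parsed.

```python
from typing import Dict, Iterable


# Hypothetical helper for illustration; not part of any agent's loader.
def read_frontmatter(lines: Iterable[str]) -> Dict[str, str]:
    """Parse only the leading '---' block of a SKILL.md.

    Handles flat 'key: value' pairs, which is enough for a name and a
    description; it is not a general YAML parser. Iteration stops at the
    closing '---', so body lines are never consumed or parsed.
    """
    it = iter(lines)
    if next(it, "").strip() != "---":
        return {}  # no frontmatter at the top of the file
    fields: Dict[str, str] = {}
    for raw in it:
        line = raw.strip()
        if line == "---":
            return fields  # closing delimiter: leave the body untouched
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # unterminated block: treat as malformed rather than guess


if __name__ == "__main__":
    skill_md = [
        "---\n",
        "name: database-migration-review\n",
        "description: Reviews SQL migration files for reversibility, lock duration, "
        "and backward compatibility before they merge. "
        "Use when a PR touches files under migrations/ or db/changes/.\n",
        "---\n",
        "## When to use this skill\n",
    ]
    print(read_frontmatter(skill_md)["name"])  # database-migration-review
```

Passing an open file handle works the same way, since the function only needs an iterable of lines.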
references/: knowledge on demand
The references/ folder holds the material that's too detailed to keep in the always-loaded entry point but too important to leave out entirely: checklists, schema notes, prior incident write-ups, style guides. Nothing in here is loaded until `SKILL.md` explicitly points the agent to a specific file for a specific reason. A skill can carry a dozen reference files without spending any context on them until one is actually relevant to the task in front of the agent.
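A minimal sketch of that on-demand behavior, assuming reference files are Markdown documents that the SKILL.md body names by their path under `references/`. `referenced_files` and `ReferenceLoader` are hypothetical helpers for illustration, not part of any agent's API: the first lists what a body points at, the second reads a file only the first time a step asks for it.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical pattern: paths like references/lock-duration.md inside a SKILL.md body.
REF_PATTERN = re.compile(r"references/[\w.-]+\.md")


def referenced_files(skill_body: str) -> List[str]:
    """List the reference files a SKILL.md body points at, in first-mention order."""
    seen: List[str] = []
    for path in REF_PATTERN.findall(skill_body):
        if path not in seen:
            seen.append(path)
    return seen


class ReferenceLoader:
    """Reads a reference file the first time a step asks for it, then caches it."""

    def __init__(self, skill_dir: Path) -> None:
        self.skill_dir = Path(skill_dir)
        self.loaded: Dict[str, str] = {}  # only files actually requested end up here

    def load(self, rel_path: str) -> str:
        if rel_path not in self.loaded:
            self.loaded[rel_path] = (self.skill_dir / rel_path).read_text()
        return self.loaded[rel_path]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        refs = Path(tmp) / "references"
        refs.mkdir()
        for name in ("backward-compat.md", "lock-duration.md", "review-template.md"):
            (refs / name).write_text(f"# {name}\n")
        loader = ReferenceLoader(Path(tmp))
        # Three files exist, but only the one a step names gets read.
        loader.load("references/lock-duration.md")
        print(sorted(loader.loaded))  # ['references/lock-duration.md']
```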
scripts/: code the agent runs, not prose it re-derives
This is the part teams skip most often, and it's the one with the highest leverage. If a step in your procedure is "compute X, then check that it's under threshold Y," don't describe that computation in Markdown and hope the model reproduces it correctly every time — write it as a script, put it in scripts/, and have SKILL.md tell the agent to run it. A script that exits 0 or 1 is an unambiguous signal; three paragraphs explaining how to eyeball a migration for lock risk is not.
Why progressive disclosure saves context
Context is the scarcest resource an agent has. Every token spent on a reference file you didn't need is a token not spent reasoning about the actual task, and past a certain point extra context doesn't just cost money; it can measurably degrade recall and raise the odds that the agent latches onto an irrelevant detail. A skills library with even a modest number of skills, each with a handful of reference files, would blow well past any reasonable context budget if everything loaded at once.
Progressive disclosure is the fix, and it's a strict ordering: frontmatter first, body second, references and scripts only on demand. The frontmatter for ten skills costs a few hundred tokens combined. The full contents of those same ten skills, references included, could easily run into the tens of thousands. Loading everything "just in case" isn't more thorough — it's mostly waste, and it's waste that competes with the task at hand for the model's attention.
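The arithmetic is easy to check with a back-of-the-envelope model. This sketch assumes roughly four characters per token (a common rule of thumb for English prose, not a property of any particular tokenizer) and made-up file sizes; `SkillSize`, `load_everything`, and `progressive` are hypothetical names, and only the shape of the comparison matters.

```python
from dataclasses import dataclass, field
from typing import List, Optional

CHARS_PER_TOKEN = 4  # assumption: crude average for English prose; real tokenizers vary


def tokens(chars: int) -> int:
    """Rough token estimate from a character count, rounded up."""
    return -(-chars // CHARS_PER_TOKEN)


@dataclass
class SkillSize:
    frontmatter_chars: int
    body_chars: int
    reference_chars: List[int] = field(default_factory=list)


def load_everything(skills: List[SkillSize]) -> int:
    """Every frontmatter, body, and reference file, whether relevant or not."""
    return sum(
        tokens(s.frontmatter_chars) + tokens(s.body_chars) + sum(tokens(r) for r in s.reference_chars)
        for s in skills
    )


def progressive(skills: List[SkillSize], matched: Optional[int] = None, refs_used: int = 0) -> int:
    """All frontmatter, plus the body and the first `refs_used` references of one matched skill."""
    total = sum(tokens(s.frontmatter_chars) for s in skills)
    if matched is not None:
        m = skills[matched]
        total += tokens(m.body_chars) + sum(tokens(r) for r in m.reference_chars[:refs_used])
    return total


if __name__ == "__main__":
    # Hypothetical library: ten skills, each with a ~240-char header,
    # a ~6,000-char body, and four ~8,000-char reference files.
    library = [SkillSize(240, 6_000, [8_000] * 4) for _ in range(10)]
    print(load_everything(library))                      # 95600
    print(progressive(library))                          # 600
    print(progressive(library, matched=0, refs_used=2))  # 6100
```

Under those assumed sizes, loading everything costs about 95,600 tokens, against 600 for frontmatter alone and about 6,100 once one skill matches and pulls in two of its references.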
Figure 2: Context budget, load everything vs. progressive disclosure
This is why file size limits exist
Keep SKILL.md to a few hundred lines for exactly this reason. Its frontmatter is the only part paid for on every request, triggered or not, but its body is paid for in full every time the skill does trigger. Push detail down into `references/` and both costs stay small.
The trigger-and-load lifecycle
Once a skill exists, an agent moves through the same four stages every time it considers using it. Understanding this pipeline is the single most useful thing you can do before writing your own skill, because each failure mode below traces back to one specific stage.
Figure 3: Trigger and load lifecycle
- Discovery: the agent scans the frontmatter of every installed skill. This is cheap and happens whether or not any skill ends up relevant.
- Trigger matching: the agent compares the current task against each skill's `description`. Almost everything later in this post is about this step, because the description is the only thing the matcher has to go on.
- Load: once a skill matches, its `SKILL.md` body gets read into context, followed by whichever `references/` files it points to.
- Execute: the agent follows the steps, running `scripts/` where the skill tells it to rather than approximating them, so typically only a script's output, not its source, lands in context.
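In real agents the model itself judges relevance, so the following is only a toy: a hypothetical `match_skills` that scores each description by how many terms it shares with the task. The stopword list and the threshold of 2 are arbitrary assumptions. Even this crude scorer shows why concrete nouns matter: the migration-review description shares four terms with the PR #482 task from the worked example, while a vague "Helps review database changes" description shares one.

```python
import re
from typing import Dict, List, Set

# Illustrative stand-in only: real agents let the model judge relevance.
# Counting shared terms just makes visible what a matcher can compare against.
STOPWORDS = {"a", "an", "and", "any", "before", "file", "files", "for", "in",
             "of", "on", "or", "the", "they", "to", "under", "use", "when", "with"}


def terms(text: str) -> Set[str]:
    """Lowercased alphanumeric words; paths split into segments (db/changes/ -> db, changes)."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def match_skills(task: str, descriptions: Dict[str, str], threshold: int = 2) -> List[str]:
    """Names of skills sharing at least `threshold` terms with the task, strongest first."""
    task_terms = terms(task)
    scored = []
    for name, description in descriptions.items():
        score = len(task_terms & terms(description))
        if score >= threshold:
            scored.append((-score, name))  # negate so sorting puts high scores first
    return [name for _, name in sorted(scored)]


if __name__ == "__main__":
    task = "PR #482 adds a NOT NULL column in db/changes/0047_add_customer_id.up.sql"
    skills = {
        "database-migration-review": (
            "Reviews SQL migration files for reversibility, lock duration, and backward "
            "compatibility before they merge. Use when a PR touches files under "
            "migrations/ or db/changes/."
        ),
        "db-helper": "Helps review database changes before they ship.",  # hypothetical vague skill
    }
    print(match_skills(task, skills))  # ['database-migration-review']
```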
Writing skills that actually get triggered
A skill that never fires is worse than no skill at all — it's maintenance burden with zero payoff. The gap between a skill that gets used and one that quietly sits idle almost always comes down to how it was written, not what it does.
- Write the description for a matcher, not a reader. "Helps with databases" tells a human roughly what a skill is about and tells a matcher almost nothing. "Reviews SQL migration files for reversibility and lock duration before they merge" gives the matcher concrete nouns to compare against the actual task.
- Name it like a role or a noun, not a verb phrase. `database-migration-review` reads clearly next to nine other skill names in a directory listing; `help-with-db-stuff` doesn't, and it invites scope creep because nothing about the name constrains what belongs inside it.
- Make scripts exit with a clear signal. Exit `0` for pass and non-zero for fail, and print the specific reason as one short line (the reversibility script in the worked example below writes its failures to stderr). An agent can act on that reliably; it can't reliably act on a script that always exits `0` and buries the real answer in a log line.
- State the negative case. A line like "Does not apply to read-only reporting queries" in the description does as much work as the positive trigger: it stops the skill from firing on adjacent tasks it wasn't built for.
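On the consuming side, a hypothetical wrapper like `run_check` shows why this convention is easy to act on: it needs nothing but the return code and one stripped line of output. This is a sketch of a harness or CI step under that assumption, not how any particular agent actually invokes skill scripts.

```python
import subprocess
import sys
from typing import List, NamedTuple


class Verdict(NamedTuple):
    passed: bool
    reason: str


# Hypothetical harness-side wrapper; names are illustrative.
def run_check(cmd: List[str]) -> Verdict:
    """Run a skill script and reduce it to its exit code plus one short message."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return Verdict(True, result.stdout.strip())
    # Non-zero exit: the reason should be on stderr; fall back so a failure never reads as blank.
    reason = result.stderr.strip() or result.stdout.strip() or f"exit code {result.returncode}"
    return Verdict(False, reason)


if __name__ == "__main__":
    failing = [sys.executable, "-c",
               "import sys; print('FAIL: 0047.up.sql: no down migration', file=sys.stderr); sys.exit(1)"]
    print(run_check(failing))
```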
Anti-patterns to avoid
- The one-file monolith. Cramming everything (entry point, checklists, edge cases) into a single 900-line `SKILL.md` defeats progressive disclosure entirely. If the file is long enough that you'd scroll to find something in it, the detail belongs in `references/` instead.
- Vague, overlapping triggers. Two skills whose descriptions both plausibly match "working with the API layer" will compete unpredictably, and which one actually fires can depend on wording you don't control. Narrow the descriptions until they don't overlap.
- Docs and skills drifting apart. A skill that duplicates content already living in your team wiki will eventually contradict it. Point `references/` at the source of truth, or generate it from the same place, rather than copy-pasting a snapshot that goes stale.
- Prose where a script belongs. If the procedure says "count the lines changed and flag anything over 500," that's three lines of shell, not three lines of instructions for the model to carry out by hand every single time.
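The 500-line check from that last bullet, written out: a sketch in Python rather than shell to match the reversibility script in the worked example, assuming the diff summary comes from `git diff --numstat` piped on stdin (that format lists added and deleted counts per file, with `-` for binary files, which are skipped here). The script name is hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical scripts/check_diff_size.py: fail when a diff changes too many lines.

Reads `git diff --numstat` output on stdin. Exit 0 at or under the limit, 1 over it.
"""
import sys

LIMIT = 500  # example threshold taken from the anti-pattern above


def lines_changed(numstat: str) -> int:
    """Sum added + deleted lines; binary files report '-' and are skipped."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) >= 3 and parts[0].isdigit() and parts[1].isdigit():
            total += int(parts[0]) + int(parts[1])
    return total


def main() -> int:
    changed = lines_changed(sys.stdin.read())
    if changed > LIMIT:
        print(f"FAIL: {changed} lines changed (limit {LIMIT})", file=sys.stderr)
        return 1
    print(f"PASS: {changed} lines changed (limit {LIMIT})")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

A usage line might look like `git diff --numstat main...HEAD | python3 scripts/check_diff_size.py`, with the exit code carrying the verdict.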
Skills are not a place to hide secrets
Anything in `SKILL.md`, `scripts/`, or `references/` can end up quoted back in a transcript once loaded. Treat a skill directory the same way you'd treat a repo committed to version control: no embedded credentials, no customer data, nothing you wouldn't want printed in a log.
Worked example: a migration-review skill, end to end
The skeleton above is easy to nod along to in the abstract. Here is the same skill built all the way out — a real reference file, a script that actually runs and actually fails sometimes — and then a trace of what happens the first time a PR trips it.
The reference file the script can't replace
A script can tell you whether a down migration exists. It can't tell you whether a given ALTER TABLE holds a lock for the two seconds a small table takes or the twenty minutes a hot one does. That judgment lives in a reference file that SKILL.md sends the agent to after the mechanical check has run.
```markdown
# Lock duration checklist

Before approving an ALTER TABLE on a table with more than ~100k rows:

- Adding a column: only safe without a table rewrite if it has NO default and is nullable. A NOT NULL DEFAULT column forces a full rewrite on Postgres < 11 and on MySQL without INSTANT ADD COLUMN support.
- Adding an index: must use CREATE INDEX CONCURRENTLY (Postgres) or ALGORITHM=INPLACE, LOCK=NONE (MySQL). A plain CREATE INDEX holds a write lock for the duration of the build.
- Dropping a column: safe to run directly, but coordinate with the app deploy: the previous app version must already have stopped reading it.
- Renaming a column: never do it in one migration. Add the new column, backfill, dual-write, then drop the old one in a later migration.

If none of the above applies, say so explicitly in the review rather than leaving the section blank; an empty checklist reads as "not checked," not "not applicable."
```

The script: a computation, not a paragraph
"Check whether a down migration exists and actually does something" is a file-system lookup and a string check, not a judgment call. Writing it as a script means the answer is the same whether the agent is careful or rushed, on turn one or turn forty.
```python
#!/usr/bin/env python3
"""Exit 0 if every *.up.sql passed on argv has a matching *.down.sql
that isn't a no-op. Exit 1 and print each offending file to stderr."""
import re
import sys
from pathlib import Path


def down_path_for(up_path):
    # 0047_add_customer_id.up.sql -> 0047_add_customer_id.down.sql, same directory
    return up_path.with_name(up_path.name.replace(".up.sql", ".down.sql"))


def is_meaningful(sql_text):
    # Drop /* block */ and -- line comments, so a comment-only file counts as empty.
    body = re.sub(r"/\*.*?\*/", "", sql_text, flags=re.DOTALL)
    body = re.sub(r"--.*", "", body)
    # Normalize whitespace and trailing semicolons before comparing to known no-ops.
    normalized = " ".join(body.split()).rstrip(";").strip().lower()
    return normalized not in {"", "select 1"}


def main(argv):
    failures = []
    for raw in argv:
        up_path = Path(raw)
        if not up_path.name.endswith(".up.sql"):
            continue  # only up migrations need a down partner
        down_path = down_path_for(up_path)
        if not down_path.exists():
            failures.append(f"{up_path.name}: no down migration found ({down_path.name})")
        elif not is_meaningful(down_path.read_text()):
            failures.append(f"{down_path.name}: exists but is empty or a no-op")

    if failures:
        for failure in failures:
            print(f"FAIL: {failure}", file=sys.stderr)
        return 1

    print("PASS: every migration has a real down migration", file=sys.stdout)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Running it against a real PR
PR #482 adds a NOT NULL column to orders with no default, in db/changes/0047_add_customer_id.up.sql. There is no matching .down.sql file. Here's what actually happens, mapped onto the four-stage lifecycle from earlier in this post.
- Discovery. The agent scans every installed skill's frontmatter, including `database-migration-review`, as part of picking up the PR.
- Trigger matching. The task touches a file under `db/changes/`, which is exactly the path named in the skill's description, so the skill matches.
- Load. The body of `SKILL.md` loads; step 1 tells the agent to run `scripts/check_reversibility.py` before doing anything else.
- Execute. The script exits 1 with `FAIL: 0047_add_customer_id.up.sql: no down migration found (0047_add_customer_id.down.sql)` on stderr. Because the migration also adds a `NOT NULL` column with no default, step 2 sends the agent to `references/backward-compat.md`, and because it's an `ALTER TABLE`, step 3 sends it to `references/lock-duration.md`.
The review the agent posts is built from the template in step 4, filled in with what the script and the two reference files actually found — not a summary the model reconstructed from memory of what a good migration review usually says:
```markdown
## Migration review: db/changes/0047_add_customer_id.up.sql

- Reversibility: FAIL. No db/changes/0047_add_customer_id.down.sql found.
- Lock duration: FAIL. A NOT NULL column with no default can't be added safely to a table this size (Postgres rejects it outright on a populated table); add it nullable, backfill, and enforce NOT NULL in a follow-up migration instead.
- Backward compatibility: FAIL. The previous app version doesn't set customer_id and will fail its next insert once this lands.

Blocking. See references/lock-duration.md for the two-step column pattern.
```

How this fails in practice
Most of the ways a skill quietly stops earning its keep aren't exotic — they're the same handful of failures showing up on a different skill each time. Here's what each one actually looks like from the outside, and what's usually behind it.
The description reads well but never fires
You wrote a perfectly reasonable sentence, "Helps review database changes before they ship," and the next three PRs that touch `db/changes/` go through without the skill ever loading. The description reads fine to a person skimming a docs page, but a matcher isn't skimming for tone; it's comparing the current task against concrete nouns. "Database changes" doesn't contain the path, the file extension, or the word "migration" that would actually tie it to the task in front of the agent. The fix is almost always mechanical: replace the vague phrase with the literal directory, file pattern, or trigger condition a matcher can compare directly, the same way you'd replace a vague variable name with one that says what it holds.
The wrong skill fires instead
A second skill, written for API-layer reviews, also mentions "changes to the data layer" somewhere in its description. A migration PR comes in, and the API-review skill fires instead of (or alongside) the migration-review skill, because both descriptions plausibly cover the same task and nothing in either one rules the other out. The symptom is a review that checked the wrong things with total confidence. The cause is two descriptions with overlapping territory and no negative case in either. The fix is to add the sentence that says what the skill is not for — "does not apply to read-only reporting queries" earns its place in a description exactly as much as the positive trigger does.
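Overlap like this can be caught before it shows up as a confident wrong review. A rough sketch, assuming plain word overlap (Jaccard similarity over description terms) is a good-enough proxy; the 0.25 threshold, the stopword list, and the skill names are made up for illustration, and a flagged pair is a prompt to add negative cases, not proof of a conflict.

```python
import itertools
import re
from typing import Dict, List, Set, Tuple

# Assumed stopwords; tune for your own descriptions.
STOPWORDS = {"a", "an", "and", "any", "before", "file", "files", "for", "in",
             "of", "on", "or", "the", "they", "to", "under", "use", "when", "with"}


def terms(text: str) -> Set[str]:
    """Lowercased alphanumeric words, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def overlapping_pairs(descriptions: Dict[str, str],
                      min_jaccard: float = 0.25) -> List[Tuple[str, str, float]]:
    """Pairs of skills whose descriptions share a large fraction of their terms."""
    flagged = []
    for (a, desc_a), (b, desc_b) in itertools.combinations(sorted(descriptions.items()), 2):
        ta, tb = terms(desc_a), terms(desc_b)
        union = ta | tb
        if not union:
            continue  # two empty descriptions: nothing to compare
        score = len(ta & tb) / len(union)  # Jaccard similarity
        if score >= min_jaccard:
            flagged.append((a, b, round(score, 2)))
    return flagged


if __name__ == "__main__":
    descs = {
        "api-layer-review": "Reviews changes to the API layer and the data layer.",
        "migration-review": "Reviews changes to the data layer.",
        "seed-data-review": "Reviews fixture files under seeds/.",
    }
    print(overlapping_pairs(descs))  # [('api-layer-review', 'migration-review', 0.8)]
```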
The script's signal gets ignored
The reversibility script above is supposed to exit 1 and write its failures to stderr, yet the agent reports the migration as reviewed and clean. Usually that means the script that actually runs isn't wired the way SKILL.md describes: it prints its verdict to stdout, buried among setup logs, instead of emitting a clean pass/fail line, or it exits 0 unconditionally and leaves the real result to be inferred from output the agent has to parse itself. An unambiguous signal only stays unambiguous if the exit code and a short, specific stderr message are the only things the agent has to read to know what happened. Anything more, and you've reintroduced the same interpretation problem a script was supposed to remove.
References drift from the source of truth
Six months in, `references/lock-duration.md` still treats any defaulted column as a table rewrite, a rule that stopped applying to constant defaults in Postgres 11. The team has since upgraded to Postgres 16, and the skill keeps flagging migrations that are actually fine. Nobody updated the reference file because nobody thought of it as documentation that needed updating; it read as a one-time checklist, written once and forgotten. Point reference files at something that already has an owner and a review cadence (a generated doc, a linked wiki page, a file that's part of the same PR process as the schema itself) rather than a standalone snapshot that only gets touched when the skill misfires loudly enough for someone to notice.
Design decisions: sizing and splitting a skill
How long should a description actually be?
Every installed skill's frontmatter gets scanned on every task, whether that skill ends up relevant or not — so the description is a cost paid in aggregate, not a cost paid only when it matches. Treat it like a function signature, not a paragraph: one or two sentences, built from the concrete nouns that actually appear in the task (file paths, table names, command names), plus one sentence stating the case it deliberately doesn't cover. A description that reads like marketing copy — "comprehensive database change review" — costs the same tokens as a precise one but gives the matcher nothing to compare against. Longer than three or four sentences and you're probably duplicating what belongs in the body of SKILL.md, which only loads once the match has already happened.
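Those rules are mechanical enough to check before a skill ships. The linter below is a hypothetical sketch: the four-sentence ceiling comes from the paragraph above, while the negative-case phrasings and the "contains a slash or a glob" test for concreteness are assumptions standing in for a human review.

```python
import re
from typing import List

# Assumed phrasings for a negative case; extend to match your team's wording.
NEGATIVE_MARKERS = ("does not apply", "do not use", "not for")


def lint_description(description: str, max_sentences: int = 4) -> List[str]:
    """Flag descriptions that are too long, lack a negative case, or name nothing concrete."""
    problems = []
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", description.strip()) if s]
    if len(sentences) > max_sentences:
        problems.append(f"{len(sentences)} sentences: move the detail into the SKILL.md body")
    if not any(marker in description.lower() for marker in NEGATIVE_MARKERS):
        problems.append("no negative case: say what the skill does not cover")
    # Crude concreteness check: a directory ('migrations/') or a glob ('*.sql').
    if not re.search(r"/|\*\.", description):
        problems.append("no path or file pattern for the matcher to compare against")
    return problems


if __name__ == "__main__":
    for desc in (
        "Helps review database changes before they ship.",
        "Reviews SQL migration files before they merge. Use when a PR touches "
        "migrations/ or db/changes/. Does not apply to read-only reporting queries.",
    ):
        print(lint_description(desc))
```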
One skill with two references, or two skills?
The question that actually decides this: do the two pieces of content share the same trigger, or do they trigger on different things? lock-duration.md and backward-compat.md both apply to the exact same event — any file changing under migrations/ or db/changes/ — they're depth under one trigger, so they stay as two references inside one skill. A hypothetical seed-data-review skill, even though it's topically adjacent (still "databases"), triggers on a completely different condition — changes under seeds/, not migrations/ — so it earns its own skill with its own description, rather than becoming a third reference bolted onto a description that would now have to cover two unrelated trigger conditions at once. The rule of thumb: split when the trigger differs, consolidate when only the depth differs.
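The rule of thumb reduces to grouping by trigger. A toy sketch, assuming each piece of content can be tagged with the paths that should trigger it; `plan_skills` and the seed-data file name are hypothetical. Pieces with identical triggers collapse into one skill's `references/`, and a different trigger yields a separate skill.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Trigger = Tuple[str, ...]


def plan_skills(pieces: Iterable[Tuple[str, Iterable[str]]]) -> Dict[Trigger, List[str]]:
    """Group content by trigger condition: one skill per distinct trigger,
    one reference file per piece of content that shares it."""
    plan: Dict[Trigger, List[str]] = defaultdict(list)
    for name, trigger_paths in pieces:
        # Sort so the same set of paths in a different order counts as the same trigger.
        plan[tuple(sorted(trigger_paths))].append(name)
    return dict(plan)


if __name__ == "__main__":
    pieces = [
        ("lock-duration.md", ["migrations/", "db/changes/"]),
        ("backward-compat.md", ["db/changes/", "migrations/"]),
        ("seed-data-checklist.md", ["seeds/"]),  # hypothetical seed-data content
    ]
    for trigger, refs in plan_skills(pieces).items():
        print(trigger, refs)
```

Here the two migration references land in one skill, while the seed-data checklist becomes its own skill because it triggers on a different path.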
Skills in Noddle Deck packs
You don't have to write your first skill from scratch to feel the difference progressive disclosure makes. Every Noddle Deck persona pack ships a curated set of skills alongside slash commands, already structured the way this post describes — entry point, references, scripts, each scoped to a real task for that role. Install one and they land in ~/.claude/skills/noddle-deck-{pack}/, ready for Claude Code to discover on the next session.
```shell
noddle-deck pack install developer
```

From there, opening a skill file in the pack is the fastest way to see the anatomy above applied to something real, and a reasonable template for the next skill you write for your own team.