Vol. I · No. 1
A field manual for the AI-collaborating engineer
A Practitioner's Compendium

The Art of Vibe Coding

A complete and comprehensive guide to building software in collaboration with artificial intelligence — methodically, securely, and with taste.

By Majid Mazouchi

Four Books · Forty-Two Chapters · Field-Tested Practices · Six Reference Appendices
Chapter the First

What Is Vibe Coding?

On the practice of collaborating with an artificial mind to build software.

Vibe coding is the practice of building software in continuous dialogue with an AI assistant: describing intent in natural language, letting the model generate, then steering, refining, and verifying the result. The term was coined informally to describe a mode of work in which the human sets the vibe and the direction while the machine handles much of the keystroke-level production.

It is not, as the name might suggest, an excuse for sloppiness. The best vibe coders are more disciplined, not less, than traditional engineers. The AI is fast but credulous; it will happily produce a confidently wrong answer, hard-code a secret, or grow a file to two thousand lines because it has no instinct for restraint. Your job is to supply that instinct.

Think of yourself as a director rather than a typist. You are still responsible for the architecture, the correctness, and — above all — the consequences. The AI is a remarkably capable collaborator, but it is a collaborator, not an oracle.

A guiding principle: vibe coding rewards the engineer who knows what good looks like.
Chapter the Second

The Vibe Coder Mindset

Attitudes and dispositions that distinguish the practitioner from the dabbler.

Before any technique, there is a posture. Vibe coding is less a set of tricks than a way of relating to the machine — one part patient teacher, one part skeptical reviewer, one part curious experimenter. Internalize these dispositions and the rest of the manual will feel like common sense.

Cardinal Dispositions
  • Treat the AI as a fast junior engineer. Capable, willing, sometimes brilliant — but in need of supervision, context, and review.
  • Stay in the loop on every important decision. Architecture, dependencies, data models, security boundaries: read every line that touches these.
  • Distrust confidence. The model's tone is uncorrelated with its correctness. Verify, especially when it sounds most sure.
  • Reward small wins. Short loops — small task, run, verify, commit — beat heroic mega-prompts every time.
  • Stay curious about why. If the AI fixes a bug, understand the fix before moving on. Magic accumulates as debt.
  • Be willing to start over. A bad direction compounds. Cheap experiments, ruthless deletes.
The fastest vibe coders I know are slow on purpose. They re-read every diff. They commit before they have to. They ask the model to explain its own work. The speed comes from never having to back out of a hole.
Chapter the Third

The Instruments of the Trade

A survey of the major AI-assisted coding tools, and when each is the right choice.

The landscape evolves quickly, but the major categories are stable. A serious practitioner keeps at least two of these in active rotation — one as a primary, one as a sanity check.

i. General-Purpose Agents


Claude Code · Agent
Terminal-native agent with strong reasoning, long-context comprehension, and excellent multi-file edits. Best for substantial refactors and architectural work.
Cursor · IDE
A VS Code fork with deep, low-friction AI integration. Excellent for iterative editing where the model needs to see your cursor and selection.
GitHub Copilot · IDE
Mature, enterprise-friendly inline assistant. With agent mode and Copilot Chat, it is increasingly competitive for end-to-end tasks. Strong in regulated environments.
Windsurf · IDE
Cascade-based agent IDE with strong autonomous editing. Good middle ground between Cursor's interactivity and Claude Code's autonomy.
Codex / OpenAI · CLI
OpenAI's coding agent stack. Useful when you want the GPT family's particular tendencies and reasoning style.
Gemini Code Assist · IDE
Google's offering, with very long context windows and tight Google Cloud integration. Strong for data and infrastructure tasks.

ii. Frontend-Focused Generators


v0 by Vercel · UI
Generates production-grade React + Tailwind + shadcn components from prompts or images. Ideal for spinning up polished UIs quickly.
Lovable · App
Full-stack app builder. Good for going from idea to running prototype in an afternoon, especially for non-engineers and proofs of concept.
A practical rule: use a generator for the first ninety percent of the UI, then move the code into a real IDE with a real agent to handle the last ten percent, where every design decision is one you actually have to live with.
Chapter the Fourth

Plan Before You Code

The discipline of thinking in phases — and why it pays off twice.

The single greatest difference between a productive vibe coding session and a frustrating one is whether you took fifteen minutes to plan. Without a plan the AI will produce something — it always does — but the something will rarely be what you wanted, and the cost of redirection grows with every line written.

A plan does not need to be a document. It can be a numbered list in a scratch file, a CLAUDE.md, or a conversation with the model in which you ask it to help you plan before generating a single line of production code.

Planning Practices
  • Define the MVP — the smallest thing that would be useful and complete.
  • Break the work into phases: data model, core logic, UI, polish, deployment.
  • Work step by step; never ask the AI to build everything at once.
  • Give the model examples — mockups, reference files, sample inputs and outputs.
  • Consider Spec-Driven Development (SDD): write a specification first, then have the AI implement against it.
Example · Planning Prompt · copy and adapt
I want to build a small CLI tool that watches a directory of CSV files and emits a Parquet file whenever any source file changes.

# Before writing any code:
1. Ask me clarifying questions about the data schema, performance requirements, and target platforms.
2. Propose an MVP scope and three subsequent phases.
3. Suggest a minimal tech stack with rationale.
4. Sketch the file/module layout.

# Do not write code yet. We will iterate on the plan first.
Spec-Driven Development is not bureaucracy. A two-paragraph spec, written before the code, will save you two hours of arguing with the model about what you meant.
Chapter the Fifth

Tech Stack & Coding Standards

Why your first ten files set the tone for every file that follows.

AI models are pattern-matching engines. Whatever style they encounter in your codebase early, they will reproduce — amplify, even — in every subsequent file. This means two things: pick a tech stack the model knows well, and review the first few outputs with surgical care.

Stack & Style Practices
  • Pick a popular, well-documented stack rather than a fashionable niche one. The model has read more about React than about your favorite obscure framework.
  • If you have style or architecture preferences, document them in a CLAUDE.md, .cursorrules, or equivalent.
  • Ask the AI to keep code modular; aim for small files and small functions.
  • Regularly request refactoring passes — extract modules, remove dead code, improve names.
  • Adopt useful skills or prompts created by others rather than reinventing every wheel.
Habit to avoid

Accepting the model's default tendency to grow one giant app.py or index.tsx because each addition "fits." By line 600 nothing is testable.

Habit to cultivate

Asking, every few sessions: "Review the project structure. Suggest where files should be split, what is duplicated, and what can be deleted."
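That periodic structure review goes faster with a list of candidates in hand. A minimal sketch, assuming a Python/TypeScript project and an arbitrary 400-line threshold; `find_oversized_files`, the threshold, and the skipped directories are illustrative choices, not a standard. Its output is something you can paste straight into the review prompt.

```python
from pathlib import Path

# Directories holding vendored or generated code, not code you maintain.
SKIP_DIRS = {".git", "node_modules", ".venv", "venv", "__pycache__", "dist", "build"}


def find_oversized_files(root: str, max_lines: int = 400,
                         suffixes: tuple[str, ...] = (".py", ".ts", ".tsx")) -> list[tuple[str, int]]:
    """Return (path, line_count) for source files longer than max_lines, largest first."""
    root_path = Path(root)
    results = []
    for path in root_path.rglob("*"):
        # Check only the parts below root, so the root's own location never matters.
        if any(part in SKIP_DIRS for part in path.relative_to(root_path).parts):
            continue
        if path.is_file() and path.suffix in suffixes:
            # Stream the file line by line instead of reading it all at once.
            with path.open("r", encoding="utf-8", errors="replace") as f:
                count = sum(1 for _ in f)
            if count > max_lines:
                results.append((str(path), count))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Calling `find_oversized_files("src")` every few sessions gives the model a concrete list to split, rather than an invitation to guess.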

Establishing standards early is not pedantry — it is leverage. Every habit you let the model form on day one will appear in every file forever after.
Chapter the Sixth

Security Best Practices

On the things the AI will cheerfully do that you must absolutely prevent.

Of all the categories in this manual, this is the one where a single oversight has the largest blast radius. AI models, left unsupervised, will hard-code secrets, log sensitive data, write unsanitized SQL, and leave authentication as an exercise for the reader. They do these things not from malice but from helpfulness — they want to make the example runnable.

Non-Negotiables
  • Never let credentials touch source code. Use environment variables, secret managers, or a vault — never string literals.
  • Run an explicit security audit on the codebase periodically. Ask the AI: "Audit this project for security issues: hard-coded secrets, injection risks, missing auth checks, unsafe deserialization, CORS misconfiguration."
  • Stop the model the instant you see API_KEY = "sk-..." in a diff. Redirect it to os.environ or equivalent.
  • Treat any code that handles user input as suspect until you have personally reviewed validation and escaping.
  • Be deliberate about dependencies. AI is happy to install whatever it remembers; you must check what is actually pulled in.
Example · Catching a hard-coded secret in flight · stop and redirect
// What the AI wrote:
const client = new OpenAI({ apiKey: "sk-proj-abc123..." });

// What you immediately ask for instead:
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Then: add OPENAI_API_KEY to .env, .env.example with a placeholder,
// and ensure .env is in .gitignore. Verify with `git status`.
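The same redirect on the Python side of the stack, with one addition: fail at startup when the variable is missing, so a misconfigured deploy crashes immediately instead of on the first API call. A minimal sketch; `require_env` and `MissingConfigError` are hypothetical names.

```python
import os


class MissingConfigError(RuntimeError):
    """Raised when a required environment variable is absent or empty."""


def require_env(name: str) -> str:
    """Read a secret from the environment, failing loudly if it is missing."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Name the variable, never the value: this message may end up in logs.
        raise MissingConfigError(f"required environment variable {name} is not set")
    return value


# At application startup:
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```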
Chapter the Seventh

The Art of Prompting

Composing requests that the model can actually fulfill.

If planning is the strategy of vibe coding, prompting is its tactics. A well-formed prompt eliminates a class of failure modes before they occur. A vague one invites the model to fill in your unstated assumptions with its own — usually badly.

Prompting Practices
  • One task at a time. A prompt with five requests will be served at five percent quality each.
  • Be specific. Replace "make it better" with "extract the validation logic into a separate module, add type hints, and write three unit tests."
  • State what NOT to do, especially based on past failures: "Do not add try/except blocks that swallow exceptions. Do not introduce new dependencies."
  • Give materials: mockups, reference files, sample data. Models reason better from concrete inputs than abstract description.
  • Use "act as" framing for unfamiliar domains: "Act as a security reviewer and read this auth flow."
  • Keep your context document current. Update CLAUDE.md as the project's conventions evolve.
  • For hard problems, ask the model to think first: "Before writing any code, brainstorm three approaches and their trade-offs."
Vague prompt

"Improve this function."

The model will pick any of a hundred axes of "improvement" — and probably not the one you cared about.

Surgical prompt

"Reduce this function's cyclomatic complexity by extracting the three validation branches into named helpers. Preserve behavior exactly. Add a unit test for each branch."

The best prompts read like work orders for a careful colleague. The worst read like wishes.
Chapter the Eighth

Managing Context

On knowing what the model knows — and what it has forgotten.

Context is the working memory of the model. It is finite, expensive, and easily polluted. Managing it deliberately is one of the highest-leverage skills in vibe coding. Most session-long frustrations — the model contradicting itself, repeating earlier mistakes, forgetting which file you are in — are context-management failures.

Context Practices
  • Use a long context window when the task genuinely requires it — large refactors, cross-file reasoning — but do not load it speculatively.
  • If the AI fails three times on the same problem, stop. Start a fresh chat. The context is poisoned; more turns make it worse, not better.
  • When switching to an unrelated task, proactively start a new session.
  • Ask the model to use subagents when available — small, focused contexts beat one bloated one.
  • After a substantive session, ask the model to summarize what was learned and update CLAUDE.md.
  • Clear context regularly. It costs tokens to keep stale information, and it costs quality.
Example · A minimal CLAUDE.md scaffold · project root
# Project: Real-time motor diagnostics dashboard

## Stack
- Python 3.11, FastAPI, asyncio
- Postgres 15 with TimescaleDB extension
- React 18 + TypeScript + Vite, Tailwind, shadcn/ui

## Conventions
- All async functions use the suffix _async.
- DB access goes through repositories/, never directly from routes/.
- All endpoints return Pydantic models, never raw dicts.

## Things NOT to do
- Do not introduce ORMs other than the existing SQLAlchemy setup.
- Do not add try/except that swallows exceptions silently.
- Do not write tests using unittest; this project uses pytest.

## How to run
- Backend: `uvicorn app.main:app --reload`
- Frontend: `pnpm dev` in /web
- Tests: `pytest -q` and `pnpm test`
Chapter the Ninth

Debugging With AI

The model is a powerful debugger — provided you do not accept fixes you do not understand.

Debugging is where vibe coding can either save you hours or quietly accumulate technical debt that detonates later. The model is genuinely good at reading stack traces, hypothesizing causes, and proposing fixes. It is also good at making symptoms go away without addressing causes — and that is the trap.

Debugging Practices
  • Paste the error, the relevant code, and the command that produced it. Then let the model reason.
  • If a fix does not work after a turn or two, ask the model to list possible causes rather than guess at more fixes.
  • For elusive bugs, instruct the AI to add logging first, reproduce, then diagnose from the logs.
  • Install and ask the model to use MCP servers (e.g. Playwright for browser issues, filesystem MCPs, database MCPs) for problems that need real-world inspection.
  • Never accept a fix you do not understand. Ask: "Explain in simple terms what was wrong and why this fixes it."
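The "add logging first" step often amounts to wrapping the suspect function. A minimal sketch using the standard `logging` and `functools` modules; `log_calls` is a hypothetical name, and because it logs arguments and return values it belongs in a debugging session, not in production code that may handle secrets or PII.

```python
import functools
import logging

logger = logging.getLogger("debug_trace")


def log_calls(func):
    """Log the arguments, return value, and any exception of each call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("call %s args=%r kwargs=%r", func.__qualname__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback; the error is re-raised, not swallowed.
            logger.exception("raised in %s", func.__qualname__)
            raise
        logger.debug("return %s -> %r", func.__qualname__, result)
        return result
    return wrapper
```

Decorate the suspect with `@log_calls`, set the logger to DEBUG, reproduce the bug, and hand the resulting log to the model instead of another "fix it."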
Example · A diagnostic prompt for a stubborn bug · when "fix it" stops working
Here is the error:
<paste full traceback>

Here is the function involved:
<paste minimal reproduction>

# Do NOT propose a fix yet.
1. List five plausible root causes, from most to least likely.
2. For each, describe one quick experiment that would confirm or refute it.
3. Tell me which experiment to run first and what output to look for.
A fix you do not understand is a bug that has changed its hiding place. Make the model teach you what it learned.
Chapter the Tenth

Mastering Version Control

Git is the safety net beneath the trapeze. Use it lavishly.

The pace of AI-assisted coding makes traditional Git hygiene more important, not less. The model can produce a hundred lines in a minute; if any of those lines are bad, you need to know exactly which working state you were in before. Git is the only mechanism that lets you experiment fearlessly.

Version Control Practices
  • Commit after every working slice — a passing test, a fixed bug, a finished feature. Not every hour; every success.
  • Ask the AI to write the commit message. It is good at this. Specify your style (Conventional Commits, sentence case, < 72 chars on the subject line).
  • Start each significant new feature on a clean branch. Branches are free.
  • When something goes wrong, revert with Git — not by asking the AI to "undo." Git is deterministic; AI memory is not.
  • Delegate routine Git and GitHub CLI tasks to the AI: PR descriptions, changelog drafts, rebase choreography. Review before pushing.
Example · A standing instruction in CLAUDE.md · commit discipline
## Git workflow
After each completed unit of work:
1. Run the tests. Do not commit if anything fails.
2. Stage only the files relevant to this change.
3. Propose a commit message in Conventional Commits format
   (feat:, fix:, refactor:, docs:, chore:, test:).
4. Show me the diff and the message before committing.

Never commit secrets, .env files, or anything in /scratch.
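The message format can be enforced by Git itself through a `commit-msg` hook, which Git runs with the path of the message file as its argument; a non-zero exit aborts the commit. A minimal sketch, assuming the type list from the standing instruction and the under-72-character subject rule from the practices list; the regular expression covers the common `type(scope)!: description` shape, not the full Conventional Commits grammar, and `check_subject` is a hypothetical name.

```python
import re
import sys

ALLOWED_TYPES = ("feat", "fix", "refactor", "docs", "chore", "test")
MAX_SUBJECT = 72  # subject must stay under this many characters

SUBJECT_RE = re.compile(
    r"^(?:" + "|".join(ALLOWED_TYPES) + r")"  # commit type from the allowed list
    r"(?:\([a-z0-9-]+\))?"                    # optional scope, e.g. (api)
    r"!?: \S.*$"                               # optional breaking-change "!", then ": description"
)


def check_subject(message: str) -> list[str]:
    """Return the problems found in the subject (first line) of a commit message."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []
    if not SUBJECT_RE.match(subject):
        problems.append("subject must look like 'type(scope): description' with an allowed type")
    if len(subject) >= MAX_SUBJECT:
        problems.append(f"subject is {len(subject)} chars; keep it under {MAX_SUBJECT}")
    return problems


def main(argv: list[str]) -> int:
    """Hook entry point: argv[1] is the path of the commit message file."""
    with open(argv[1], encoding="utf-8") as f:
        problems = check_subject(f.read())
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0  # non-zero makes Git abort the commit
```

Saved as an executable `.git/hooks/commit-msg` script with a Python shebang that ends in `sys.exit(main(sys.argv))`, it rejects non-conforming messages whether a human or the AI wrote them.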
Chapter the Eleventh

Testing as a Practice

The most underused superpower of AI-assisted coding.

Left to its defaults, an AI agent will write code first and tests later, if at all. This is precisely backwards for the kind of fast, iterative editing that vibe coding encourages. Tests are not bureaucracy — they are the only way to know, at velocity, that you have not just broken something.

Testing Practices
  • Force tests by default. When the AI writes a feature, require that it also writes basic tests in the same turn.
  • Lean toward end-to-end tests for product-level confidence and unit tests for hot logic. Both matter.
  • Consider Test-Driven Development as a vibe-coding amplifier: "Write a failing test for behavior X. Show me. Then implement until it passes."
  • When you find a bug, ask the AI to write a failing test that reproduces it before any fix is applied.
  • Once tests are in place, refactor freely. Tests are what make refactor sessions cheap.
Example · TDD prompt loop · red, green, refactor
# Round 1: Red
"Write a failing pytest test for a function `parse_motor_log(path)` that returns a list of dicts with keys: timestamp, rpm, current_a, current_b, current_c. Do not implement the function yet. Run the test; confirm it fails."

# Round 2: Green
"Now implement parse_motor_log minimally so the test passes. No extra features."

# Round 3: Refactor
"Refactor for readability. Extract helpers. Add type hints. Re-run tests."
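What the first two rounds might produce, compressed into one file. A sketch under an assumption the prompt leaves open: the log is taken to be a CSV file with a header row naming the five columns, timestamps are kept as text, and the numeric columns become floats. The real format should come from your spec; the test is what pins it down.

```python
import csv
from pathlib import Path

FIELDS = ("timestamp", "rpm", "current_a", "current_b", "current_c")


# Round 1 (red): written first; it fails until parse_motor_log exists.
# tmp_path is pytest's built-in temporary-directory fixture.
def test_parse_motor_log_returns_expected_keys(tmp_path: Path) -> None:
    log = tmp_path / "motor.csv"
    log.write_text(
        "timestamp,rpm,current_a,current_b,current_c\n"
        "2024-01-01T00:00:00Z,1500,10.5,10.4,10.6\n"
    )
    rows = parse_motor_log(log)
    assert rows == [{
        "timestamp": "2024-01-01T00:00:00Z",
        "rpm": 1500.0,
        "current_a": 10.5, "current_b": 10.4, "current_c": 10.6,
    }]


# Round 2 (green): the minimal implementation that makes the test pass.
def parse_motor_log(path) -> list[dict]:
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return [
            # Keep the timestamp as text; convert the numeric columns to float.
            {"timestamp": row["timestamp"], **{k: float(row[k]) for k in FIELDS[1:]}}
            for row in reader
        ]
```

Round 3 would then extract the row conversion into a named helper and re-run the same test unchanged; if it still passes, the refactor preserved behavior.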
The model has no instinct to write tests unprompted. Until you make this instinct yours and pass it on through standing instructions, you will accumulate untested code at frightening speed.
Chapter the Twelfth · End of Book the First

Putting It All Together

A worked workflow for a single feature, from intent to deployment.

The chapters above can read like a checklist. In practice, they compose into a rhythm. Here is one such rhythm — not the only one, but a good default for serious work.

i. Open the session deliberately


Start a fresh chat. Confirm the AI has read CLAUDE.md (or equivalent). State the goal of the session in one sentence.

ii. Plan in dialogue


Ask the model to propose an approach, list assumptions, and identify open questions. Resolve the questions. Lock the plan.

iii. Work in small slices


One task at a time. After each: run it, eyeball the diff, commit if it works. If three attempts fail, stop, summarize what you tried, and restart with a fresh context.

iv. Test as you go


Before considering a slice done, require tests. For bugs, require a failing test first.

v. Refactor and audit periodically


Every few sessions: ask for a refactor pass, a security audit, and a context-document update. These are the compound-interest activities.

vi. Close the session deliberately


Ask the model to summarize what was done, what is left, and what the next session should pick up. Save that into a NOTES.md or directly into CLAUDE.md. The next session will start with momentum instead of confusion.

Closing maxim: the vibe coder's edge is not speed; it is the compounding of small disciplines, exercised at every turn.

Read this manual again in a month. The parts that seemed obvious will have become muscle memory; the parts that seemed fussy will be the ones you wish you had adopted earlier. The medium is new, but the underlying craft — taste, discipline, skepticism, curiosity — is the same one engineers have always practiced. The AI does not replace it. It rewards it.

Book the Second

From Prototype to Production

✦ ✦ ✦

The first book taught you to work with the AI. The second teaches you to ship with it.

There is a wide chasm between a working prototype and a serious software product. On the production side of that chasm live the unglamorous arts: architecture decisions that survive the next refactor, dependencies you can audit, errors you can recover from, deploys you can roll back, and documentation a teammate can read at three in the morning when something has broken.

The AI accelerates the prototype side enormously. It does not, by itself, cross the chasm. The chapters that follow are the practices that do.

Chapter the Thirteenth

Architecture & System Design

On using the AI as a sounding board rather than an oracle.

Architecture is the set of decisions that are expensive to change later. Pick the wrong web framework and you swap it in a week; pick the wrong data store and you fight it for a year. These are precisely the decisions where the AI's confident, fluent prose is most dangerous — because the model has no skin in the game when the consequences arrive eighteen months from now.

Treat architectural questions as conversations with the AI, not as requests. Force the model to enumerate options, weigh trade-offs, identify failure modes. Then make the decision yourself, write it down, and version-control that decision so the team — including future you — can remember why.

Architectural Practices
  • Sketch before you code. Ask the AI to propose two or three architectures and compare them on latency, complexity, cost, and team familiarity.
  • State non-functional requirements early. Latency budgets, availability targets, data residency, audit needs — these reshape architecture more than features do.
  • Write Architecture Decision Records. One file per significant decision: context, options, decision, consequences. Keep them in the repo.
  • Make the AI argue against its first answer. "Now critique that design as a sceptical principal engineer. What breaks first under load?"
  • Diagram in code. Mermaid, PlantUML, D2 — text-based diagrams that version-control naturally and survive renames.
  • Postpone distributed-systems complexity until you have measured the actual need. Most products are not yet at the scale that justifies microservices.
Example · Architecture exploration prompt · two or three options, with teeth
I need to design the data layer for a system ingesting ~5M sensor readings per day from ~1000 vehicles. Queries: real-time per-vehicle dashboards, daily fleet aggregates, weekly trend analysis.

Propose three architectures:
A. Single Postgres + TimescaleDB
B. Postgres (hot) + S3 + DuckDB (cold tier)
C. Fully managed columnar store (e.g. ClickHouse Cloud)

For each, give me:
- Estimated cost at this scale and at 10x
- Operational complexity (who pages at 2am)
- Failure modes and recovery procedures
- What changes when we double write volume

Then critique each as a sceptical principal engineer.
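Before sending a prompt like this, it helps to know the rough numbers yourself, so you can tell when the model's estimates are off by an order of magnitude. A back-of-envelope sketch: the volumes come from the prompt, while the 100 bytes per stored reading is an assumption for illustration only (real row size depends on schema, indexes, and compression).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_estimates(readings_per_day: float, vehicles: int,
                     bytes_per_reading: float = 100.0) -> dict:
    """Rough average ingest rate and raw storage for a write-heavy time-series workload."""
    return {
        "writes_per_sec_avg": readings_per_day / SECONDS_PER_DAY,
        "readings_per_vehicle_per_day": readings_per_day / vehicles,
        # Raw size before compression and indexes, in GB (10**9 bytes).
        "raw_gb_per_day": readings_per_day * bytes_per_reading / 1e9,
        "raw_gb_per_year": readings_per_day * bytes_per_reading * 365 / 1e9,
    }


if __name__ == "__main__":
    for scale in (1, 10):
        est = ingest_estimates(5_000_000 * scale, 1000)
        print(f"{scale}x: {est['writes_per_sec_avg']:.0f} writes/s on average, "
              f"{est['raw_gb_per_year']:.0f} GB/year raw")
```

At 5M readings per day the average is under sixty writes per second, which is generally modest for a single Postgres instance; bursts can sit far above the average, so peak rate is the number worth adding to the prompt.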
The AI is a brilliant whiteboard partner. It is not a chief architect. The difference is accountability for what happens after the meeting.
Chapter the Fourteenth

Reviewing AI-Generated Code

The diff discipline that separates a prototype from a product.

The model produces code that looks right. Looking-right is not the same as being-right. The single most important discipline in serious vibe coding — harder than prompting, harder than planning — is to actually read every line the AI proposes, with the same scrutiny you would bring to a stranger's pull request. The cost of skimming compounds invisibly. By the time you notice the bug, it is wearing camouflage from a dozen subsequent edits.

Skim is your enemy. The AI will produce a 200-line diff in ten seconds; if you scan it in thirty, you will catch only what is obviously broken. The subtle problems — a swallowed exception, a fabricated library function, a unit test that asserts nothing meaningful — pass through. These are not edge cases. They are the dominant failure mode of unsupervised AI coding.

Review Practices
  • Read every line before merging. There is no shortcut, and the search for one is how production incidents are born.
  • Keep a personal "tells" list — the recurring bad habits the model exhibits in your codebase. Spot-check for them.
  • Watch specifically for: bare try/except, fabricated function or API names, dead code, magic numbers, over-abstraction, tests that always pass regardless of behavior.
  • Use the AI for second-pass self-review in a fresh context: "Review this diff as a sceptical senior engineer. List concerns by severity."
  • Verify claims. If the model says "this improves performance," run the benchmark. If it says "all tests pass," run them yourself.
  • No exemptions for team work. AI-generated code goes through the same PR process as human-written code.
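The "tests that always pass" tell is easiest to recognize side by side. A sketch with a hypothetical `apply_discount` function: the first test executes the code but asserts nothing that could fail for a wrong answer; the second pins the actual behavior, so it breaks when the behavior does.

```python
def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by percent, never below zero."""
    return max(price * (1 - percent / 100), 0.0)


# Looks like a test, asserts nothing about behavior: it passes even if the math is wrong.
def test_apply_discount_vacuous():
    result = apply_discount(100.0, 10.0)
    assert result is not None
    assert isinstance(result, float)


# Pins behavior: fails if the formula, the zero clamp, or the units change.
def test_apply_discount_behavior():
    assert apply_discount(100.0, 10.0) == 90.0
    assert apply_discount(100.0, 0.0) == 100.0
    assert apply_discount(100.0, 150.0) == 0.0
```

A quick review question for any AI-written test: if the function returned a different number, would this test notice?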
The skim

Glance at the diff. Notice the obvious shape is right. Click merge. Discover three weeks later that a test file was modified to make a failing test pass instead of fixing the bug.

The review

Read every file. Run the tests yourself. Question the surprising changes. Ask the model, in a fresh chat, to find issues with its own diff. Then decide.

A code review that takes ninety seconds is not a code review. It is a vote of confidence in the model. Those votes accumulate into incidents.
Chapter the Fifteenth

Documentation as Infrastructure

The half of the codebase that everyone forgets.

Documentation is the part of a serious software product that is invisible until it is missing — at which point everything else slows down by a factor of three. A teammate joining the project, an on-call engineer at two in the morning, a future version of yourself revisiting code from six months ago: each of these consumers depends on documentation no one was paid to write.

The AI changes this economy entirely. Documentation is now nearly free to produce — provided you ask. A README that gets a new contributor running in ten minutes, an ADR for each significant decision, an API reference generated from the code: minutes of work, not days. What remains is purely a matter of discipline.

Documentation Practices
  • A README that lets a new contributor clone, install, and run within ten minutes. Test this on a fresh machine periodically.
  • Architecture Decision Records (ADRs): one short file per significant decision. Context, options considered, decision, consequences.
  • API documentation generated from code (OpenAPI, docstrings, JSDoc) — not hand-written and drifting out of sync.
  • Runbooks for operational tasks: deploy, rollback, restore from backup, rotate credentials, handle the common incidents.
  • Inline comments for why, not what. Code shows what; comments explain motivation, constraints, and surprises.
  • Have the AI update docs as part of every change. Stale documentation is actively worse than no documentation.
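One way to keep API documentation from drifting is to make the examples in it executable. A minimal sketch using Python's standard `doctest` module; `rpm_to_rad_per_s` is a hypothetical function, and the conversion factor is the standard 2π/60.

```python
import doctest
import math


def rpm_to_rad_per_s(rpm: float) -> float:
    """Convert shaft speed from revolutions per minute to radians per second.

    The examples below are executed by doctest, so they fail loudly if the
    function's behavior drifts away from its documentation.

    >>> round(rpm_to_rad_per_s(60), 6)
    6.283185
    >>> rpm_to_rad_per_s(0)
    0.0
    """
    return rpm * 2 * math.pi / 60


if __name__ == "__main__":
    # Runs every docstring example in this module; silent unless something fails.
    doctest.testmod()
```

Running `python -m doctest yourmodule.py`, or pytest with `--doctest-modules`, in CI turns every stale example into a failing build.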
Example · A minimal ADR template · docs/adr/0007-event-bus.md
# ADR 0007: Use a single Postgres LISTEN/NOTIFY channel for the event bus

## Context
We need pub/sub between three internal services. Volumes are low (< 100 events/sec). We already operate Postgres in production.

## Options considered
1. Redis Streams: new dependency, new ops surface.
2. NATS: feature-rich, but adds a cluster to monitor.
3. Postgres LISTEN/NOTIFY: reuses existing infrastructure.

## Decision
Option 3. Re-evaluate if event volume exceeds 1k/sec sustained.

## Consequences
- No new infrastructure to operate.
- Limited message durability; consumers must be idempotent.
- Schema for events lives in shared types/ package.
The best documentation is the documentation that exists when you need it. Cheap and present beats elegant and absent.
Chapter the Sixteenth

Error Handling & Resilience

On the failure modes the AI will hide if you let it.

The model has a strong default: make the example runnable. This default is at war with serious error handling. Left to its instincts, the AI will wrap risky code in bare try/except blocks that swallow exceptions silently, return None on failure, or — worse — print a friendly message and continue as if nothing had happened. Each of these patterns transforms a recoverable error into an invisible bug.

Resilient code distinguishes among three categories of error: those it can handle locally, those it should propagate, and those that indicate a violated invariant and warrant a controlled crash. The AI will not make these distinctions unless you make them explicit.

Resilience Practices
  • Categorize errors deliberately: handled, propagated, fatal. Different categories deserve different code paths.
  • Never accept a bare except clause that swallows exceptions. Replace immediately with specific exception types and explicit handling.
  • Timeouts on every operation that crosses a process or network boundary. No timeout is itself a bug.
  • Retries with exponential backoff and jitter — and only for operations you know to be idempotent.
  • Circuit breakers for unreliable external services. Fail fast when the downstream is sick.
  • Idempotency keys for state-changing operations called by clients that may retry.
  • Graceful degradation paths. When the recommendation engine is down, show defaults, not a stack trace.
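Retries are where the model most often goes subtly wrong: no cap, no jitter, or retrying calls that are not safe to repeat. A minimal sketch of capped exponential backoff with "full jitter" (a random wait between zero and the exponential bound); `retry_idempotent` is a hypothetical name, the defaults are illustrative, and which exceptions count as transient is a per-call-site decision.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_idempotent(
    op: Callable[[], T],
    *,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError),
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op, retrying transient errors with capped exponential backoff and full jitter.

    Only pass operations that are safe to repeat (idempotent).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error instead of swallowing it
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**attempt)].
            bound = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, bound))
    raise AssertionError("unreachable: the loop always returns or raises")
```

Errors outside `retry_on`, such as a `ValueError` from a genuine bug, propagate on the first attempt. The injectable `sleep` exists so tests can run without waiting.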
Example · Replacing a swallowed exception · before / after
# What the AI produced (don't accept):
try:
    result = fetch_remote_data(vehicle_id)
except Exception:
    result = None  # silent failure, downstream code now lies

# What you ask for instead (logger is a structured logger that accepts
# keyword fields, e.g. structlog; the exception classes are project-specific):
try:
    result = fetch_remote_data(vehicle_id, timeout=5.0)
except RemoteTimeoutError as e:
    logger.warning("remote_timeout", vehicle_id=vehicle_id, err=str(e))
    raise ServiceDegradedError("vehicle data temporarily unavailable") from e
except RemoteAuthError:
    # auth failure is a config bug, not a user-facing condition
    raise
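Circuit breakers are less familiar than retries, so a sketch helps. This is a deliberately minimal, single-threaded version that assumes a consecutive-failure threshold and a fixed cool-down (both illustrative); production libraries add limits on half-open probes, thread safety, and metrics. `CircuitBreaker` and `CircuitOpenError` are hypothetical names, and the `clock` parameter exists only so behavior can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the downstream while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: float | None = None  # None means the circuit is closed

    def call(self, op: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: fail fast without touching the sick downstream.
                raise CircuitOpenError("downstream marked unhealthy; failing fast")
            # Cool-down elapsed (half-open): let this call through as a trial.
        try:
            result = op()
        except Exception:
            self.failures += 1
            # A failed trial, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise  # the caller still sees the original error
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Callers catch `CircuitOpenError` and take the graceful-degradation path, showing defaults rather than waiting on a downstream that is known to be down.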
"Quietly continuing" is the most expensive habit in software. Every silent failure is a future debugging session you have agreed to fund.
Chapter the Seventeenth

Observability

You cannot fix what you cannot see.

A serious software product must be inspectable from outside. When something goes wrong — and something always does — your ability to diagnose without rebuilding, redeploying, or re-running depends entirely on what the system was emitting at the time. Observability is not glamorous. It is what makes two in the morning survivable.

The AI is helpful here but tends to under-instrument: a print statement here, a log line there, no structure, no levels, no correlation IDs. Production-grade observability has to be specified up front, or it will never appear at all.

Observability Practices
  • Structured logs from the start. JSON, with fields, never f-strings shoved into a stream.
  • The four golden signals for any service: latency, traffic, errors, saturation. Instrument all four.
  • Error tracking with full stack traces and request context (Sentry, Rollbar, or equivalent). Stack traces without context are riddles.
  • Distributed tracing for any service-to-service call. Propagate correlation IDs everywhere.
  • Health endpoints that mean something. A 200 from /health should imply the service can actually serve requests, not merely that the process is alive.
  • Add observability as part of each feature, not as cleanup. Otherwise it never happens.
Example · Structured logging pattern · enforce in CLAUDE.md
# Standing instruction:
All log lines use the project's structured logger and include:
- request_id (propagated from request context)
- user_id when authenticated
- the operation name
- relevant identifiers (vehicle_id, order_id, etc.)
- duration_ms for operations of interest

Never use print(). Never log raw exception objects without a message.
Never log secrets, tokens, PII, or full request bodies.
The instrumentation you add when the system is healthy is the instrumentation you have when it is not. There is no retroactive observability.
Chapter the Eighteenth

Performance & Profiling

On measuring before optimizing — and the AI's enthusiasm for the reverse.

The model is eager to optimize. Ask it to make code faster and it will produce a confident diff with no measurements behind it. Some of those diffs are correct. Some are noise. A few are net-negative. Without a baseline, you cannot tell which.

The discipline of performance work is unromantic: measure, hypothesize, change, measure again. The AI can accelerate each step, but it cannot replace the measurement. A "faster" function without a benchmark is a guess in a fluent voice.

Performance Practices
  • Always measure first. Profile before you guess. cProfile, py-spy, perf, browser performance panels, flamegraphs — pick the right tool and use it.
  • Keep a benchmark suite for hot paths. Run it in CI. Catch regressions before users do.
  • Load tests for user-facing systems. Not once. As part of release readiness.
  • The AI is good at micro-optimizations (rewriting tight loops) and weak at systemic ones (restructuring data access to eliminate N+1 queries). Use it accordingly.
  • Beware premature distributed-systems complexity. Many "scaling" problems are inefficient queries on a single node.
  • Refuse "this will be faster" claims without numbers.
Example · A profile-first performance prompt · numbers before opinions
The endpoint /reports/daily is taking 4–7 seconds. I have a py-spy profile attached and the relevant code. Do NOT propose optimizations yet.
1. Read the profile. Identify the top three time sinks by total %.
2. For each, state the likely root cause based on the code.
3. Propose one experiment per root cause that would confirm it without changing behavior (e.g. adding a counter, capturing a query plan, varying input size).
4. Rank the three by expected impact-per-effort.
Only after we agree on root cause will we discuss fixes.
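When no py-spy capture is at hand, the standard library's deterministic profiler can produce a similar "top time sinks by %" table for a single call. The sketch below assumes the slow path can be invoked as a plain Python function and uses `pstats.Stats.get_stats_profile()` (Python 3.9+). The `top_time_sinks` helper and the stand-in request functions are hypothetical.

```python
import cProfile
import pstats


def top_time_sinks(fn, *args, limit=3):
    """Profile one call to fn(*args); return (function, ncalls, % of own time) rows."""
    profiler = cProfile.Profile()
    profiler.runcall(fn, *args)
    profile = pstats.Stats(profiler).get_stats_profile()
    total = profile.total_tt or 1e-9  # guard against an empty or instant profile
    rows = [
        (name, fp.ncalls, round(100 * fp.tottime / total, 1))
        for name, fp in profile.func_profiles.items()
    ]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]


def slow_part():
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def fast_part():
    return 1


def handle_request():
    # Stand-in for the slow endpoint; names here are illustrative only.
    fast_part()
    return slow_part()


for name, ncalls, pct in top_time_sinks(handle_request):
    print(f"{name:<30} calls={ncalls:<6} self={pct}%")
```

The percentages are shares of each function's own time (`tottime`), not cumulative time. A deterministic profiler also adds per-call overhead that a sampling profiler such as py-spy avoids, so treat the ranking as a hypothesis to confirm, exactly as the prompt asks.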
Optimization without measurement is decoration. The system may look improved; only the benchmark knows.
Chapter the Nineteenth

Dependencies & Supply Chain

Every import is a long-term relationship.

The AI loves dependencies. Ask it to parse a CSV and it reaches for pandas; ask for a date and dateutil appears; ask for HTTP retries and a new library is suggested. Each of these is a long-term relationship — with maintenance burden, security exposure, license obligations, and upgrade pain — that has been negotiated on your behalf in a tenth of a second.

A serious software product has a deliberate dependency posture. The standard library is preferred when reasonable. New dependencies are added only after the alternatives have been considered. The supply chain is audited, locked, and monitored.

Supply-Chain Practices
  • Vet every new dependency. Maturity, maintenance pace, license, transitive footprint, install size, last commit date.
  • Lockfiles committed, and their diffs reviewed in every PR. A lockfile diff is not a rubber stamp.
  • Automated vulnerability scanning (Dependabot, Snyk, OSV-Scanner). Treat alerts as work, not noise.
  • License compliance audit, especially for code that ships externally. Some "free" licenses carry obligations.
  • Generate an SBOM (Software Bill of Materials) for production systems. Increasingly required by regulators and customers.
  • Prefer the standard library when the gain from a third-party package is small.
Every dependency is a small bet that someone else will keep maintaining their code. Place those bets sparingly, and only after reading the maintainer's recent activity.
Chapter the Twentieth

Data, Schemas & Migrations

Database changes are forever. Treat them accordingly.

Code can be rolled back. Data, with very few exceptions, cannot. A bad schema change applied in production becomes a permanent feature of your system, and every subsequent decision must accommodate it. This asymmetry deserves a level of caution that the AI, by default, does not bring.

Migrations should be small, reversible, and reviewed by a human who understands the production state. The AI is excellent at writing the migration; it is not the right party to decide whether to apply it.

Data Practices
  • Reversible migrations only. Every up has a down. If down is impossible, the change deserves an ADR.
  • Backward-compatible schema changes via the expand-contract pattern: add new column, dual-write, backfill, switch reads, drop old column. Never all at once.
  • AI proposes; human reviews and applies. Migration application is not an automation candidate for serious systems.
  • Backfills run as separate, idempotent, resumable jobs — never inside the migration itself.
  • Index thoughtfully. Every index speeds reads and slows writes. Both effects last until the index is dropped.
  • Ask the AI for query plans (EXPLAIN ANALYZE) before accepting any "performance fix" on a query.
Example · Expand-contract migration plan · renaming a column safely
# Goal: rename users.full_name → users.display_name without downtime
Phase 1 (Expand):
- Migration adds users.display_name as NULL-allowed.
- Application writes both columns; reads from full_name.
- Deploy. Verify dual-writes in production.
Phase 2 (Backfill):
- Separate job copies full_name → display_name in batches.
- Idempotent, resumable, monitored for rate and errors.
Phase 3 (Cut over reads):
- Application reads from display_name.
- Deploy. Monitor for missing-data errors.
Phase 4 (Contract):
- Application stops writing full_name.
- After a settling period, migration drops full_name.
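Phase 2 is where "idempotent, resumable" has to become concrete. Below is a minimal sketch of the backfill job against SQLite, chosen only so the example runs anywhere; the batching pattern is the point. It assumes the `users` table from the plan has an integer `id` primary key, and `backfill_display_name` and its parameters are hypothetical.

```python
import sqlite3
import time


def backfill_display_name(conn, batch_size=1000, pause_s=0.0):
    """Copy full_name into display_name in small batches.

    Idempotent: only rows still missing display_name are touched, so re-running
    after a crash resumes where it stopped, and a finished run is a no-op.
    Rows whose full_name is NULL are skipped so the loop always terminates.
    """
    copied = 0
    while True:
        with conn:  # one short transaction per batch, committed on success
            cur = conn.execute(
                "UPDATE users SET display_name = full_name "
                "WHERE id IN (SELECT id FROM users "
                "WHERE display_name IS NULL AND full_name IS NOT NULL "
                "ORDER BY id LIMIT ?)",
                (batch_size,),
            )
        if cur.rowcount == 0:
            return copied
        copied += cur.rowcount
        print(f"backfilled {copied} rows so far")  # stand-in for real metrics
        time.sleep(pause_s)  # throttle to protect production load


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT, display_name TEXT)")
conn.executemany("INSERT INTO users (full_name) VALUES (?)",
                 [(f"user{i}",) for i in range(25)] + [(None,)])
conn.commit()
print(backfill_display_name(conn, batch_size=10))  # 25
print(backfill_display_name(conn, batch_size=10))  # 0
```

The second call returning 0 is the idempotence property in action: the job can be killed and restarted at any point without double-writing or skipping rows.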
The most dangerous code in a serious system is a migration written confidently. Slow down; there is no rollback in the sense that applies elsewhere.
Chapter the Twenty-First

CI/CD & Automation

The pipeline is the guard rail of the production system.

Continuous integration is the smallest unit of trust in a serious software product. Every commit is asked: do you compile, do you pass tests, do you satisfy our standards? Code that cannot answer yes does not advance. This is not bureaucracy — it is the only mechanism that makes velocity safe.

The AI writes CI configurations, GitHub Actions workflows, Dockerfiles, and deployment scripts with fluency. What it cannot do is understand your specific operational context — your secret management, your compliance constraints, your existing tooling. Review accordingly.

Automation Practices
  • Every commit triggers lint, type-check, test, and build. Non-negotiable.
  • Pre-commit hooks for fast local feedback. Don't wait for CI to learn that you forgot to format.
  • Reproducible builds: pinned base images, locked dependencies, deterministic outputs.
  • Automated security scanning in CI: secret detection, vulnerability scanning, license checks, SAST.
  • AI can write workflow files; you must understand every step. CI bugs are deployment bugs in disguise.
  • Deploy on green only. A failed CI is a stop sign, not a suggestion.
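The workflow file belongs to the CI provider, but the gate itself is simple enough to sketch. Below is a minimal fail-fast runner, assuming the project uses tools such as `ruff`, `mypy`, `pytest`, and the `build` package (substitute your own). `run_pipeline` and the stage list are hypothetical, and the same function could back a pre-commit hook.

```python
import subprocess
import sys

# Hypothetical stage list; replace the commands with the project's real ones.
STAGES = [
    ("lint", ["ruff", "check", "."]),
    ("type-check", ["mypy", "."]),
    ("test", [sys.executable, "-m", "pytest", "-q"]),
    ("build", [sys.executable, "-m", "build"]),
]


def run_pipeline(stages):
    """Run stages in order; stop at the first failure and return its name."""
    for name, cmd in stages:
        print(f"--> {name}: {' '.join(cmd)}")
        try:
            result = subprocess.run(cmd)
        except FileNotFoundError:
            print(f"stage {name!r}: command not found: {cmd[0]}")
            return name  # a missing tool is a failure, not a skip
        if result.returncode != 0:
            print(f"stage {name!r} failed with exit code {result.returncode}")
            return name  # red: nothing downstream, deploy included, runs
    return None  # green on every stage
```

A script entry point would call `sys.exit(1 if run_pipeline(STAGES) else 0)`, so that the CI job itself goes red whenever any stage does.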
A pipeline that always passes is not a successful pipeline; it is one that is no longer asking enough questions.
Chapter the Twenty-Second

Release Engineering

How serious software gets to users without breaking them.

A deploy is a moment of risk. Even a perfectly written change, applied to a healthy system, can interact with production conditions — load patterns, data shapes, dependency versions — that no test environment fully captures. The serious practice of release engineering treats every deploy as an experiment that could fail, and provides the means to detect and recover quickly when it does.

The AI can help draft feature flag scaffolding, canary configurations, and rollback scripts. The judgment of when to ship, how fast to ramp, and when to halt remains stubbornly human.

Release Practices
  • Feature flags for risky changes. Decouple deploy from release.
  • Canary releases: a small percentage of traffic first, ramp up only on green metrics.
  • Blue/green or rolling deploys for zero-downtime systems.
  • A documented rollback procedure for every deploy, rehearsed at least once before it is needed.
  • Dark launches for performance validation: run new code paths against production traffic without exposing results to users.
  • A post-deploy monitoring window. Stay watching for a defined period before declaring success.
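Feature flags and canaries both reduce to one question per request: is this user in the exposed slice? The sketch below assumes a stable user identifier. Hashing the flag name together with the user ID keeps each user's assignment consistent across requests and independent across flags. The ramp schedule and error-rate tolerance are hypothetical placeholders for whatever "green metrics" means in your system.

```python
import hashlib

RAMP_STEPS = [1, 5, 25, 50, 100]  # hypothetical percent-of-traffic schedule


def in_rollout(flag: str, user_id: str, percent: float) -> bool:
    """Deterministically place a user in a bucket from 0.00 to 99.99 for this flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 100
    return bucket < percent


def next_percent(current: float, canary_error_rate: float,
                 baseline_error_rate: float, tolerance: float = 0.001) -> float:
    """Ramp to the next step only while the canary is no worse than baseline."""
    if canary_error_rate > baseline_error_rate + tolerance:
        return 0  # halt and roll back: the flag goes dark
    later = [step for step in RAMP_STEPS if step > current]
    return later[0] if later else current
```

Because the bucket depends only on flag and user, raising `percent` from 5 to 25 keeps the original 5% exposed and adds more users, rather than reshuffling who sees the change.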
The deploy is not the end of the feature. It is the start of the monitoring window.
Chapter the Twenty-Third

Working With Legacy Code

On the vast majority of real engineering work.

The romantic image of vibe coding — a developer and an AI conjuring a system from a blank repository — accounts for perhaps a tenth of professional software work. The other nine-tenths is brownfield: adding a feature to a fifteen-year-old codebase, fixing a bug in a service no one understands, refactoring a module written by someone who left the company. The AI is enormously useful here, in different ways than in greenfield work.

Use the model first to understand, then to change. Understanding-before-changing is the difference between a careful surgeon and an enthusiastic intern.

Brownfield Practices
  • Explain before edit. "Walk me through this function as if I am a new engineer. What is its contract? Who calls it? What is the riskiest part?"
  • Characterization tests first. Pin existing behavior with tests before refactoring. Then refactor with confidence.
  • Narrow, reversible changes. Resist the urge — and the AI's urge — to "improve" surrounding code.
  • Do not accept unsolicited refactors. If the model widens scope, narrow it back. Diffs grow; reviews shrink.
  • The Boy Scout Rule — leave it cleaner than you found it — applies in moderation. Not every visit warrants a renovation.
  • Document what you learn in the same PR. The next engineer will be grateful.
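Characterization tests pin what the code does today, not what it should do. The sketch below uses a hypothetical inherited function. The pinned table is recorded from current behavior, quirks included, so any refactor that changes an output fails loudly before it ships.

```python
def legacy_format_price(cents):
    # Hypothetical inherited code: nobody remembers why negatives look like this.
    if cents < 0:
        return "-" + legacy_format_price(-cents)
    return "$" + str(cents // 100) + "." + str(cents % 100).zfill(2)


# Recorded by running the current code, not by reading a spec.
PINNED = {0: "$0.00", 5: "$0.05", 1999: "$19.99", -250: "-$2.50"}


def characterization_failures(fn, pinned):
    """Map each input whose output changed to (new output, pinned output)."""
    return {arg: (fn(arg), want) for arg, want in pinned.items() if fn(arg) != want}


def refactored_format_price(cents):
    # A tidy-looking rewrite an assistant might propose.
    return f"${cents / 100:.2f}"


print(characterization_failures(legacy_format_price, PINNED))      # {}
print(characterization_failures(refactored_format_price, PINNED))  # {-250: ('$-2.50', '-$2.50')}
```

Whether "$-2.50" is actually better is a separate product decision. The test's job is to make that decision visible instead of letting it ride along inside a refactor.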
Legacy code is not a problem to be solved. It is a system that has survived. Respect what it has been doing.
Chapter the Twenty-Fourth

Team Collaboration in the AI Era

On working with humans who are also working with machines.

A single developer with an AI assistant is a manageable system. A team of developers, each with their own AI assistant, is a different system entirely. Conventions diverge. Quality variance widens. Review burden shifts. The interpersonal contract of "we will all do approximately the same thing in approximately the same way" can quietly dissolve into incompatible styles, each one internally consistent.

The remedy is not less AI; it is more shared scaffolding. Conventions move into the repository. Prompts become team assets. Review becomes more important, not less.

Team Practices
  • Shared conventions in repository-level config: CLAUDE.md, .cursorrules, .github/copilot-instructions.md. Version-controlled. Reviewed in PRs.
  • Maintain a team prompt library — proven prompts for common tasks. Treat it like documentation.
  • Code ownership does not disappear because the AI wrote the code. The submitter owns it.
  • AI-generated PRs go through the same review as human PRs. No exemptions, no shortcuts.
  • Disclose AI assistance when it materially changes the review burden — large mechanical refactors, generated boilerplate, mass renames.
  • Pair-vibing for high-stakes work: two engineers, one session, four eyes on every diff.
A team's coding style is now downstream of its prompting style. Choose deliberately, write it down, and review it together.
Chapter the Twenty-Fifth

The Long Life of Software

Software outlives its writers. Plan for it.

A serious software product is not the version that ships on day one. It is the system that, three years later, is still running, still being modified, still being depended upon. Most of the cost of software is in this long tail. Most of the AI advantage shows up at the start.

To build software that lives long, build for the people who will maintain it — including future-you, who will have forgotten everything. This requires a kind of discipline that is unrewarded in the short term and indispensable in the long one.

Stewardship Practices
  • Maintenance budget from day one. Treat it as a first-class cost, not an oversight.
  • Model selection as economics. Token costs are a real line item. Use smaller models for routine work; reserve larger ones for hard problems.
  • Deprecation discipline. Sunset features deliberately, with notice, a migration path, and a deadline. Half-deprecated features rot the codebase.
  • Postmortems for significant incidents. Blameless. Written. Shared. Indexed. The library of past failures is one of the most valuable team artifacts.
  • Runbooks for operational tasks. Tested at least annually. A runbook no one has run is a runbook that does not work.
  • Do not build what you cannot operate. A clever system that no one understands is a future incident waiting for its trigger.
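Deprecation with notice, a migration path, and a deadline can be encoded in the code itself rather than in a wiki page. The sketch below uses the standard `warnings` module. The `deprecated` decorator, its parameters, and the removal date are hypothetical. A scheduled CI job that calls `overdue` is one way to stop "half-deprecated" from becoming permanent.

```python
import datetime
import functools
import warnings


def deprecated(*, use_instead: str, remove_after: datetime.date):
    """Mark a function as deprecated, naming its replacement and a deadline."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{fn.__name__} is deprecated; use {use_instead}. "
                f"Scheduled for removal after {remove_after.isoformat()}.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller
            )
            return fn(*args, **kwargs)
        wrapper.deprecation = {"use_instead": use_instead, "remove_after": remove_after}
        return wrapper
    return decorator


def overdue(functions, today: datetime.date):
    """Names of deprecated functions whose removal deadline has passed."""
    return [f.__name__ for f in functions if today > f.deprecation["remove_after"]]


@deprecated(use_instead="export_csv", remove_after=datetime.date(2026, 6, 30))
def export_report(rows):
    # Hypothetical old API kept alive during the migration window.
    return ",".join(str(r) for r in rows)
```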
Software is a story you are writing for someone else to finish. Make it possible for them to read.
Book the Third

Beyond the Code

✦ ✦ ✦

A serious software product does not live alone. It calls other systems and is called by them. It carries legal weight. It serves humans of different abilities, in different languages, under different regulations. It exists inside an ecosystem of tools, teams, and constraints that the code itself never sees.

The third book turns outward — to the wider concerns the practitioner must hold even when they are not, in the moment, typing code. Some of these chapters are about the world around the software. The last and most important is about the practitioner's own limits.

Chapter the Twenty-Sixth

AI Tool Orchestration

On choosing models, calling tools, and composing agents.

Early vibe coding was one human, one model, one chat. The mature practice is a small symphony: a powerful model for hard reasoning, a cheaper one for boilerplate, an agent that can run shell commands, sub-agents that work in parallel on isolated contexts, and MCP servers that grant the assistant access to your databases, file systems, browsers, or calendars. Composition is now its own skill.

The right composition depends on the task. A long refactor wants a capable model with a generous context window and shell access. A quick autocomplete wants a fast, cheap model in your editor. A research task wants a model that can browse, plus sub-agents to parallelize. Match the instrument to the work.

Orchestration Practices
  • Choose the model deliberately for each task. The most capable model is not the right model for everything.
  • Track cost per session. Reserve expensive models for hard problems; let cheaper ones handle the routine.
  • Use sub-agents to parallelize independent work and to isolate noisy context from the main thread.
  • Install MCP servers thoughtfully. Each grants the model real access — to your filesystem, your database, your browser, your calendar. Treat the permission like a production credential.
  • Audit what your agents can do. A development MCP with shell access can do everything you can. That is a feature and a risk.
  • Build skills for recurring workflows. A skill is a reusable bundle of instructions and tools; it pays compounding interest.
  • Prefer fewer, well-chosen tools over many partial ones. Tool sprawl makes the agent's choices worse, not better.
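Tracking cost per session needs nothing more than token counts and a price table. A minimal sketch follows. The tier names, task categories, and per-million-token prices are placeholders invented for illustration, not any provider's actual rates. Substitute your provider's published pricing and the token counts its API reports.

```python
from dataclasses import dataclass, field

# Placeholder prices in USD per million tokens; NOT real provider rates.
PRICES = {
    "small": {"input": 0.25, "output": 1.25},
    "large": {"input": 3.00, "output": 15.00},
}

# Hypothetical task categories that justify the more capable tier.
HARD_TASKS = {"architecture", "debugging", "large-refactor"}


def pick_model(task_kind: str) -> str:
    """Route routine work to the cheap tier and hard problems to the capable one."""
    return "large" if task_kind in HARD_TASKS else "small"


@dataclass
class SessionCost:
    calls: list = field(default_factory=list)

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Price one model call and remember it for the session total."""
        price = PRICES[model]
        cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
        self.calls.append((model, cost))
        return cost

    def total(self) -> float:
        return sum(cost for _, cost in self.calls)

    def by_model(self) -> dict:
        out: dict = {}
        for model, cost in self.calls:
            out[model] = out.get(model, 0.0) + cost
        return out
```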
The most overlooked skill in 2026 is not prompting; it is deciding which model, which tools, and which agents should touch a given task. The composition is the answer.
Chapter the Twenty-Seventh

API & Contract Design

On the parts of your software that other people's software depends on.

APIs are contracts. Once published, they constrain what you can change without breaking your consumers — whether those consumers are other teams, paying customers, or future versions of yourself. The AI will gladly write an API; it has no instinct for the long-term consequences of doing so casually.

Treat every endpoint, every event schema, every gRPC service definition as a versionable artifact with explicit compatibility guarantees. The AI is useful for drafting, testing, and generating client SDKs. It is not the right party to decide what to commit to.

Contract Practices
  • Version explicitly — URL path, header, or media type. Plan for v2 the day you ship v1.
  • Distinguish public, internal, and experimental APIs. Each has different stability promises and review burden.
  • Backward compatibility by default. Additions yes; removals only through deprecation cycles.
  • Contract tests. Pin the schema. Run against every PR. Surface breaking changes as PR failures, not production incidents.
  • Generate clients from the schema, never hand-write them. Drift between server and client is a recurring source of bugs.
  • For event-driven systems, a schema registry (storing Avro, Protobuf, or JSON Schema definitions) is not optional.
  • Document error responses with the same rigor as success cases. Consumers must know what they will see when things go wrong.
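A contract test can be as small as a diff between the published schema and the proposed one. The sketch below works on JSON-Schema-style response objects and assumes schemas are available as dicts. It flags only removed fields and changed types, the most common breaking changes for response payloads, and is no substitute for a dedicated compatibility checker.

```python
def breaking_changes(old: dict, new: dict, path: str = "") -> list[str]:
    """List changes in `new` that would break consumers of an `old` response schema."""
    problems = []
    new_props = new.get("properties", {})
    for name, old_spec in old.get("properties", {}).items():
        where = f"{path}.{name}" if path else name
        if name not in new_props:
            problems.append(f"removed field: {where}")
            continue
        new_spec = new_props[name]
        if old_spec.get("type") != new_spec.get("type"):
            problems.append(f"type changed: {where} {old_spec.get('type')} -> {new_spec.get('type')}")
        elif old_spec.get("type") == "object":
            problems.extend(breaking_changes(old_spec, new_spec, where))  # nested objects
    return problems


V1 = {"type": "object", "properties": {
    "id": {"type": "integer"},
    "name": {"type": "string"},
    "address": {"type": "object", "properties": {"city": {"type": "string"}}},
}}
V2_ADDITIVE = {"type": "object", "properties": {
    **V1["properties"], "email": {"type": "string"},
}}
V2_BREAKING = {"type": "object", "properties": {
    "id": {"type": "string"},
    "address": {"type": "object", "properties": {}},
}}

print(breaking_changes(V1, V2_ADDITIVE))  # []
print(breaking_changes(V1, V2_BREAKING))
# ['type changed: id integer -> string', 'removed field: name', 'removed field: address.city']
```

Run against every PR with the published schema as `old`, a non-empty result becomes the PR failure the practice list asks for.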
Every public API is a promise to the future. Make small promises. Keep them.
Chapter the Twenty-Eighth

Authentication & Authorization

On who can do what, and the AI's enthusiasm for skipping the question.

AuthN — who you are — and AuthZ — what you are allowed to do — are the two questions every real product must answer correctly for every request. The AI, in its eagerness to produce runnable examples, will gladly generate code that skips both: handing the prototype to you with a TODO where the auth check should be, or implementing auth so naively that it would not survive five minutes of adversarial scrutiny.

Authorization in particular is not a feature you add later. The shape of your permission model is one of the most consequential architectural decisions you will make. Get it right early; bolted-on auth is a permanent source of pain.

Auth Practices
  • Choose AuthN deliberately: sessions versus tokens (JWT, opaque), OAuth flows for third parties, passwordless options. Each has consequences.
  • Choose AuthZ deliberately: RBAC (roles), ABAC (attributes), ReBAC (relationships, Zanzibar-style). The model dictates the shape of much else.
  • Never roll your own crypto. Use vetted libraries. The AI will happily implement HMAC from first principles; do not let it.
  • Every endpoint declares its auth requirements. None defaults to open. Make the absence of a declaration a CI failure.
  • Audit authorization decisions: who accessed what, who changed permissions, what was attempted and denied.
  • Test the negative cases. What should be denied is as important as what should be allowed — and harder to remember to verify.
  • Rate-limit auth endpoints aggressively. Brute force is cheap; defense must be cheaper.
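Making "no declaration" a CI failure is easiest when declarations live next to the handler. The sketch below is framework-neutral and assumes a hypothetical `route` decorator and a simple role model. A real framework's router and policy engine would replace both, but the two properties shown carry over: openness is declared explicitly with `PUBLIC`, and undeclared routes fail closed.

```python
PUBLIC = "public"  # openness must be declared explicitly, never implied
ROUTES = {}  # path -> required role, PUBLIC, or None when undeclared


def route(path, *, auth=None):
    """Register a handler and the access rule it declares (None means undeclared)."""
    def decorator(handler):
        ROUTES[path] = auth
        return handler
    return decorator


def is_allowed(path, user_roles):
    required = ROUTES.get(path)
    if required is None:
        return False  # undeclared or unknown routes fail closed
    return required == PUBLIC or required in user_roles


def undeclared_routes():
    """Paths with no auth declaration; a non-empty result should fail CI."""
    return sorted(path for path, rule in ROUTES.items() if rule is None)


@route("/health", auth=PUBLIC)
def health():
    return "ok"


@route("/admin/users", auth="admin")
def list_users():
    return []


@route("/debug/dump")  # forgot to declare: CI should catch this
def debug_dump():
    return "internal state"
```

The negative assertions in a suite like this matter most, because those are the cases that are easy to forget.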
In any sufficiently large codebase, the authorization checks the AI forgot to add are statistically certain to include the one that matters.
Chapter the Twenty-Ninth

Compliance, Privacy & Regulated Software

When the law has opinions about your code.

A serious software product often operates in a regulated environment — GDPR for data handling, HIPAA for health, SOC 2 for enterprise customers, PCI-DSS for payments, ISO 26262 or DO-178C for safety-critical systems. These are not abstract documents. They impose specific requirements on what your code must do, what records you must keep, and what processes you must follow to be allowed to ship.

The AI is unaware of your regulatory context. It will not flag that the PII you just logged is reportable; it will not refuse to write code that stores credit card numbers; it will not insist on the audit trail your regulator expects. That awareness must come from you and from the conventions you encode.

Compliance Practices
  • Map regulatory obligations to specific code requirements. Write them into CLAUDE.md so the AI cannot accidentally violate what it does not know.
  • Data classification: know which fields are PII, PHI, financial, or otherwise restricted. Tag them in code and treat them accordingly.
  • Audit logs for every action that touches regulated data. Immutable, exportable, retained for the period the regulator requires.
  • Data residency: know where your data is allowed to live. Encode it into infrastructure, not into hope.
  • Right to erasure (GDPR Art. 17) requires deletion that actually deletes — backups, caches, derived data, log streams.
  • For safety-critical domains, AI assistance must respect the qualification process: tool qualification per the relevant standard, reviews by certified engineers, traceability between requirements, code, and tests.
  • When uncertain, ask counsel. The AI is not your lawyer, and neither is this manual.
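Tagging restricted fields in code lets tooling act on the classification instead of relying on memory. The sketch below uses dataclass field metadata. The `classified` helper, the classification labels, and the `Patient` record are hypothetical, and a real system would still need the audit trail and retention controls listed above.

```python
from dataclasses import dataclass, field, fields

RESTRICTED = {"pii", "phi", "financial"}


def classified(level):
    """Attach a data classification; restricted fields are also hidden from repr()."""
    return field(metadata={"classification": level}, repr=level not in RESTRICTED)


@dataclass
class Patient:
    patient_id: str = classified("internal")
    name: str = classified("pii")
    diagnosis: str = classified("phi")


def redacted(record):
    """Dict view safe for logs and analytics: restricted values are masked."""
    return {
        f.name: "[REDACTED]" if f.metadata.get("classification") in RESTRICTED
        else getattr(record, f.name)
        for f in fields(record)
    }
```

Hiding restricted fields from `repr()` matters because an object dropped into an f-string or a log call is one of the easiest ways for PII to leak.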
Compliance is not paperwork. It is the shape your code must hold when the consequences of getting it wrong are larger than the engineering team.
Chapter the Thirtieth

Provenance, Licensing & IP

On where the code came from, and who owns it.

AI-generated code occupies a peculiar legal space. The model was trained on millions of repositories under thousands of licenses. The code it produces is, in most jurisdictions and most providers' terms, yours to use — but the exact contours are evolving, uneven across jurisdictions, providers, and use cases, and not always what intuition would suggest.

For serious products, AI output cannot be treated as identical to human output. There are practical, hygienic steps that protect the project without slowing it down meaningfully.

Provenance Practices
  • Read your AI provider's terms of use. Know what is claimed about IP ownership, indemnification, and what data the provider may train on.
  • For high-stakes code — open-source releases, regulated products, licensed software — keep a record of what was AI-assisted. Commit tags or a changelog flag work well enough.
  • Use tools that detect verbatim reproduction from known sources. Some providers offer this; third-party scanners exist.
  • License-incompatible suggestions deserve scrutiny. If your codebase is MIT and the model emits something that closely mirrors a GPL project, investigate.
  • Internal tools vs distributed code have different exposure profiles. Internal tooling: low risk. Distributed binaries: careful audit.
  • For very high-stakes IP — a patentable algorithm, a trade-secret implementation — consider whether AI assistance is appropriate at all. Some prompts can become part of the provider's training corpus unless explicitly excluded.
  • Document the AI tools in use in the project's NOTICE or equivalent. Transparency is rarely the wrong choice.
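A commit trailer is a lightweight way to keep the record of AI assistance next to the change itself. The sketch below assumes a team convention of an `AI-Assisted:` trailer line; the trailer name is hypothetical, and Git itself does not define it. The helpers extract the tools named in a commit message and compute the share of flagged commits.

```python
import re

# Hypothetical team convention: one "AI-Assisted: <tool>" line per tool used.
TRAILER = re.compile(r"^AI-Assisted:[ \t]*(?P<tool>\S.*?)[ \t]*$", re.MULTILINE)


def ai_tools(message: str) -> list[str]:
    """Tools named in AI-Assisted trailers of one commit message."""
    return [m.group("tool") for m in TRAILER.finditer(message)]


def assisted_share(messages: list[str]) -> float:
    """Fraction of commits carrying at least one AI-Assisted trailer."""
    if not messages:
        return 0.0
    return sum(1 for m in messages if ai_tools(m)) / len(messages)
```

The messages could come from `git log --format=%B%x00`, split on the NUL byte. `git interpret-trailers --parse` is a more robust trailer parser if the convention grows.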
In a few years, the question "where did this code come from?" will have a better answer than today's. Until then, leave yourself the breadcrumbs to reconstruct it if you must.
Chapter the Thirty-First

Accessibility & Internationalization

Two domains the AI defaults to neglecting.

A product that does not work for users with screen readers is broken. A product that displays 1,234.56 to a German user who needs 1.234,56 is broken. A product whose interactive elements have no keyboard handlers is broken. These are not edge cases. They are categories of users and contexts that the AI, optimizing for what looks reasonable in a demo, will quietly fail to serve.

Accessibility (a11y) and internationalization (i18n) are not the polish at the end. They are constraints on architecture from the beginning. Retrofitting either is dramatically more expensive than getting it right the first time.

a11y & i18n Practices
  • Semantic HTML first. ARIA where semantics are insufficient. Keyboard handlers on every interactive element.
  • Color contrast meets WCAG AA at minimum. Test with simulators. Verify with real users where possible.
  • Every user-facing string lives in a translation file. No string literals in templates or components.
  • Date, time, number, currency formatting goes through the platform's i18n APIs. Never hand-format.
  • Test with a screen reader. Test with a keyboard only. Test in a right-to-left locale. Test at 200% zoom.
  • The AI can generate ARIA attributes and translation scaffolding; it cannot verify the actual experience. Verification is human work.
  • Make a11y and i18n part of Definition of Done. Otherwise they become "next sprint" forever.
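Keeping strings out of templates only pays off if missing translations are caught. The sketch below uses in-memory catalogs standing in for real translation files (in practice gettext `.po` files, ICU MessageFormat, or a framework's equivalent). The catalog layout, keys, and helpers are hypothetical. The sketch does not cover number or date formatting, which should still go through a locale-aware library.

```python
# Hypothetical catalogs; real projects load these from translation files.
CATALOGS = {
    "en": {"cart.empty": "Your cart is empty", "cart.items": "{count} items in your cart"},
    "de": {"cart.empty": "Ihr Warenkorb ist leer"},
}
SOURCE_LOCALE = "en"


def missing_translations(catalogs, source=SOURCE_LOCALE):
    """Keys present in the source catalog but absent from each other locale."""
    keys = set(catalogs[source])
    report = {}
    for locale, catalog in catalogs.items():
        gaps = sorted(keys - set(catalog))
        if locale != source and gaps:
            report[locale] = gaps
    return report


def translate(key, locale, catalogs=CATALOGS, **params):
    """Look up `key` for `locale`, falling back to the source locale."""
    text = catalogs.get(locale, {}).get(key) or catalogs[SOURCE_LOCALE][key]
    return text.format(**params)
```

A CI step that fails on a non-empty `missing_translations` report keeps gaps from reaching users silently. Plural rules also differ by language, which is one reason to prefer gettext's `ngettext` or ICU plural syntax over a single `{count} items` string.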
Accessible and translatable software is not built by adding features at the end. It is built by holding two extra constraints from the first commit.
Chapter the Thirty-Second

Specialized Domains: ML, Data & Embedded

Three places vibe coding looks different from web work.

Most of this manual has implicitly assumed conventional software: web services, data pipelines, command-line tools. The principles still apply elsewhere — but the texture changes substantially in three domains worth calling out.

i. Machine Learning & Data Science


Notebooks are not engineering artifacts. The AI writes models with fluency but skips the boring rigor that separates a paper-quality result from a deployable one. Track experiments. Version data, not just code. Validate distributions, not just shapes. Productionize beyond the notebook — the same code, run from a script, against a test set, with the same outputs, is the minimum threshold of "real."

ii. Embedded & Safety-Critical Software


AI suggestions face additional scrutiny when memory is bounded, real-time deadlines exist, or certification standards apply (ISO 26262, DO-178C, IEC 62304). Generated code is rarely directly admissible into qualified pipelines without review. Treat the AI as a research assistant for the architect, not as the author of the certified artifact. Static analysis is your friend; so is the hard requirement to read every generated line.

iii. Data Engineering & Pipelines


Pipelines fail differently from services: not with a stack trace, but with quietly wrong numbers in a dashboard. Data quality checks (Great Expectations, dbt tests, custom assertions) are the unit tests of data work. Schema-on-read can be a feature; it can also be a quiet way to lose data. The AI writes transformations quickly; you must specify the assertions that prove they did what was meant.

Cross-Domain Practices
  • For ML: seed everything, version data and models, validate distributions, productionize beyond the notebook.
  • For embedded: respect timing and memory budgets; verify against the target hardware, not the development machine.
  • For safety-critical: AI assists; humans certify. Maintain traceability between requirements, code, and tests.
  • For data: assertions are not optional. Every transformation has an expected output shape and distribution.
  • In all cases: the model has read far more about web work than about your domain. Calibrate trust accordingly.
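A transformation's assertions can start as a few lines of plain Python before graduating to a dedicated tool. The sketch below assumes rows arrive as dicts. The column names, ranges, and drift tolerance are hypothetical, and the drift check is a deliberately crude stand-in for proper distribution tests.

```python
import statistics


def check_batch(rows, *, required, ranges, min_rows=1):
    """Return human-readable violations; an empty list means the batch passes."""
    problems = []
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        for col in required:
            if row.get(col) is None:
                problems.append(f"row {i}: {col} is null")
        for col, (lo, hi) in ranges.items():
            value = row.get(col)
            if value is not None and not lo <= value <= hi:
                problems.append(f"row {i}: {col}={value} outside [{lo}, {hi}]")
    return problems


def mean_drifted(values, reference_mean, tolerance):
    """Crude distribution check: has the mean moved further than `tolerance`?"""
    return abs(statistics.fmean(values) - reference_mean) > tolerance


batch = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": -5.00},  # negative amount: likely an upstream bug
    {"order_id": None, "amount": 42.00},
]
print(check_batch(batch, required=["order_id"], ranges={"amount": (0, 10_000)}))
```

Neither violation would raise an exception in a typical pipeline. Both would surface only as slightly wrong numbers in a dashboard, which is precisely the failure mode described above.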
The further you are from the model's training distribution, the more carefully you must read what it gives you. Out here, "looks right" is the most dangerous compliment a diff can earn.
Chapter the Thirty-Third

Building Your Own Tools & Skills

On the small automations that compound.

The most productive vibe coders eventually start building their own tools — small MCP servers that expose internal APIs to the assistant, reusable skills that encode common workflows, slash-commands that compress a multi-step procedure into a single invocation. This is leverage on leverage. A skill built once saves time on every subsequent session that uses it.

Start with the workflows you find yourself repeating. If you have explained the same project layout to the AI three times this week, that explanation belongs in CLAUDE.md. If you have walked the model through the same testing procedure repeatedly, that procedure belongs in a skill or a script.

Tool-Building Practices
  • Notice repetition. The third time you do something, automate it. The first and second times, just suffer.
  • Build small. A fifty-line MCP server that lists your internal services is more useful than a grand framework you never finish.
  • Document the tools you build. Future you will not remember the contract. Treat your own skills as software for users.
  • Share within the team. A team prompt library plus a team skill set plus a team MCP toolkit is a real productivity multiplier.
  • Do not build what your provider already offers. Use first; build only where the gap is real and persistent.
  • Treat tool permissions seriously. An MCP that can write to your filesystem can also delete from it. The blast radius is yours.
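Limiting the blast radius of a file-writing tool starts with refusing paths outside its sandbox. The sketch below assumes Python 3.9+ for `Path.is_relative_to` and a hypothetical tool that receives relative paths from the model. Resolving first collapses `..` segments and follows symlinks, so escapes are caught as of the time of the check.

```python
from pathlib import Path


def safe_path(root: str, requested: str) -> Path:
    """Resolve `requested` inside `root`, refusing anything that escapes it."""
    root_path = Path(root).resolve()
    candidate = (root_path / requested).resolve()  # collapses ".." and follows symlinks
    if not candidate.is_relative_to(root_path):
        raise PermissionError(f"{requested!r} escapes sandbox {root_path}")
    return candidate


def write_note(root: str, requested: str, text: str) -> Path:
    # Hypothetical tool body: every write goes through safe_path first.
    target = safe_path(root, requested)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text)
    return target
```

An absolute path such as `/etc/passwd` is rejected too, because joining an absolute path onto the root discards the root entirely.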
The boundary between user and developer of an AI workflow is dissolving. The serious practitioner sits on both sides of it on the same afternoon.
Chapter the Thirty-Fourth

The Limits of Vibe Coding

On knowing when the AI is a hindrance — and turning it off.

This manual has spent thirty-three chapters on how to use AI to write software well. It would be incomplete without a chapter on when not to use it at all.

The AI is a hindrance when the problem is novel enough that the model will pattern-match to something superficially similar and steer you wrong. It is a hindrance when the constraint that matters cannot be expressed in code — when you need to argue, persuade, or feel your way to an answer. It is a hindrance when learning is the point — when you need the difficulty of solving the problem yourself in order to be able to solve the next one. It is a hindrance when the cost of a confidently-wrong answer exceeds the cost of being slow.

Mature practitioners notice these moments. They close the chat, open a blank file, and think.

Knowing When to Stop
  • For genuinely novel research-level problems, write a draft yourself before consulting the model. Anchor your thinking before being anchored by its.
  • For learning — especially a new language or framework — struggle first. The AI removes the struggle that makes the learning stick.
  • For high-consequence decisions (architecture, hiring, security posture, regulatory interpretation), use the AI for inputs, not outputs.
  • For creative work where voice matters (technical writing, documentation tone, code style), let your own voice form before borrowing one.
  • When the chat is making you feel rushed, slow down. The pace of suggestions is not the pace of correct decisions.
  • When you notice yourself nodding along without understanding, stop. Close the chat. Re-open the file. Read what is actually there.
The most expensive habit a junior engineer can develop in the AI era is to never have struggled. Struggle is how taste develops. Without taste, all the leverage in the world points in the wrong direction.
A warning, gently given
Book the Fourth

The Practitioner's Life

✦ ✦ ✦

Three books have explored what the work is. The fourth turns to who the worker is — and how they continue, deepen, and survive a long career in software.

These chapters concern the human side of the practice: how engineers are hired and mentored, how juniors develop taste when the AI does the typing, how to work with the people who do not write code, how to think clearly about the ethical weight of what we build, and what shape a healthy career and a healthy team might take in the decades ahead.

The technical books were necessarily prescriptive. This one is necessarily more tentative. The practice it describes is younger and the right answers are less settled. Read it as one practitioner's notes, offered in good faith.

Chapter the Thirty-Fifth

Hiring & Interviewing in the AI Era

On finding people who will still be valuable when the AI is even better.

The technical interview was already an imperfect filter for engineering ability. The AI has made many traditional questions almost useless. A candidate's ability to produce a working solution to a familiar problem now reveals very little about their judgment, taste, or capacity to grow. Worse, the easiest interview questions to write — the ones that test syntax, algorithms, and pattern matching — are precisely the ones the AI can solve in seconds.

The mature practice is to interview for the skills the AI does not replicate: clear thinking under uncertainty, the willingness to question requirements, debugging from incomplete information, communication, taste in design choices. None of these are easy to assess. All of them matter more than they did before.

Interview Practices
  • Replace algorithm trivia with open-ended design discussions. Give a real-ish problem and let the candidate ask questions. What they ask is more revealing than what they answer.
  • Pair-program with the AI explicitly included. Watch how they prompt, what they accept, what they reject, what they verify.
  • Take-homes that require judgment, not implementation effort. The interesting question is what they chose to do, not whether they could type it.
  • Ask candidates to debug code they did not write. The AI will produce buggy code on demand; how do they find and characterize the bug?
  • Reference checks for taste and collaboration, not just competence. "How does this person decide what is good?"
  • Calibrate seniority by judgment shown under uncertainty, not depth of memorized knowledge.
The question "can you write a binary search by hand?" filters for one skill. The question "this binary search is wrong; how do you find out?" filters for ten.
Chapter the Thirty-Sixth

Onboarding & Mentorship

On the first thirty days, and what they should leave behind.

A new engineer joining a team in the AI era faces an unusual challenge: the tools will make them productive almost immediately, before they have built any of the context that would make their work durably correct. They will ship code in their first week. Whether that code is genuinely good or merely runs is a question the team's onboarding determines.

The remedy is not less AI. It is more deliberate scaffolding around it: shared conventions, explicit pair-vibing, structured review, and a clear sequence of small wins that build context before they require it.

Onboarding Practices
  • Week one: read CLAUDE.md, run the project locally, deploy a tiny supervised change. Build muscle memory before responsibility.
  • Pair-vibing as teaching method: the new engineer drives the prompts; the mentor reviews and explains. Reverse the roles as confidence grows.
  • Starter projects that require reading — documentation fixes, small bugs, test improvements. Comprehension before composition.
  • Twice-weekly reviews for the first month, not of correctness but of approach: "Why did you accept this suggestion? What did you verify? What did you skip?"
  • Encourage struggle on at least one problem per week before reaching for the AI. The struggle is the curriculum.
  • Document onboarding as it happens. The new engineer's questions become tomorrow's CLAUDE.md and runbooks.
An engineer who joined a year before the AI got good is structurally different from one who joined a year after. Neither is better; both are different. Mentorship must account for this.
Chapter the Thirty-Seventh

The Junior Engineer's Path

On developing taste when the AI does the typing.

A junior engineer in this era inhabits a peculiar professional moment. The barrier to producing working code is lower than ever. The ceiling on what they can accomplish in a week is dramatically higher. And yet: the path to becoming a senior engineer — the kind whose judgment can be trusted on important decisions — appears to require something the AI has made optional, namely the slow accumulation of taste from a thousand small struggles.

This is real. It is also addressable. Junior engineers who consciously preserve struggle in their practice will arrive at senior judgment on roughly traditional timelines. Junior engineers who outsource all struggle will arrive somewhere else, and the somewhere is not where the senior jobs will be.

A Junior's Curriculum
  • Pick one or two areas per quarter to learn deeply, without AI assistance. Reading, writing, struggling.
  • After every AI-assisted task, write a one-paragraph reflection: what did I learn? what would I have learned if I had done it myself?
  • Read code you did not write — open-source projects, your organization's older codebase. Reading is how taste develops.
  • Develop opinions about good and bad code, then defend them in writing. The opinions matter less than the practice of forming them.
  • Seek out senior engineers to disagree with. Their disagreement is your curriculum.
  • Resist the temptation to ship more, faster, just because you can. Going slowly on the right things is a senior skill rehearsed in junior years.
The senior engineers of 2035 are the junior engineers of today who refused to skip the struggle.
For the long road ahead
Chapter the Thirty-Eighth

Working With Non-Engineers

On PMs, designers, executives, and the AI mythology they have absorbed.

Non-engineers have been told a great deal about what AI can do. Some of it is true. Much of it is wishful, distorted by enthusiastic sales pitches and confident demos. Working effectively with stakeholders now requires gentle, repeated, accurate communication about what is and is not possible — and a willingness to push back on demands premised on the wishful version.

The defining sentence of stakeholder management in this era is "but ChatGPT can do this in five minutes." Sometimes it can. More often it can produce something that looks like the thing in five minutes, which is not the same. Distinguishing these two cases, and explaining the distinction without condescension, is now part of the job.

Stakeholder Practices
  • Be specific about what AI helps with and what it does not. Vague reassurance breeds vague expectations.
  • When a stakeholder says "but the AI can do this", gently ask: what is the AI producing, and what would it take to make that production-ready? The gap is the work.
  • Estimate with production cost in mind, not prototype cost. In my experience the first version is often only around ten percent of the total.
  • Document decisions in writing, especially those involving claimed AI capability. The mythology is fluid; the record is not.
  • Offer demos when realistic; decline when not. A successful demo with unrealistic claims is more expensive than a missed meeting.
  • Build trust through accuracy. Be the engineer whose estimates the stakeholder learns to believe.
The most important career skill in the AI era may be the ability to set expectations that survive contact with reality.
Chapter the Thirty-Ninth

AI Ethics & Responsibility

On the questions the model will not ask for you.

The AI does not ask whether you should build the thing. It builds. This is convenient. It is also a transfer of responsibility you must consciously decline to accept.

Every piece of software embeds choices: what data to collect, what to surveil, what to optimize for, whose interests to serve, what risks to externalize. The AI accelerates the construction of these choices without raising any of them. The engineer remains the only party who can. The reduction in friction makes the asking more important, not less.

This chapter cannot tell you what is right. It can only argue that you must ask.

Two Questions, and What Follows
  • Before building, ask two questions: who is this for, and who might it harm? If you cannot answer the second, you have not understood the system.
  • Bias check: would this code produce different outcomes for different groups? If yes, is that intended? Documented? Auditable?
  • Dark patterns: is the design optimizing for users' interests, or against them? The AI will produce either with equal fluency.
  • Surveillance: what does this collect, and what would happen if that data leaked? Build only what you would be willing to defend.
  • Refusal: there are projects you should not work on. Being able to recognize them, and to walk away, is part of the job.
  • When the ethics get complicated, talk to humans — colleagues, ethicists, lawyers, affected communities. The AI is not equipped for this conversation.
The engineer's signature is on the code. The AI does not change whose code it is.
Chapter the Fortieth

A Personal Practice

On the daily disciplines that compound over decades.

The technical practices in this book are useful. The deeper practices — the ones that sustain a career — are different in kind. They concern attention, reflection, deliberate practice, and the small habits that distinguish an engineer who improves from one who merely accumulates years.

What follows is not prescriptive. It is one practitioner's set of disciplines, offered as a starting point. The right shape of a personal practice is one you will arrive at through experimentation. The wrong shape is the absence of one.

Daily, Weekly, and Slower Rhythms
  • Daily: a brief end-of-day note. What did I learn? What surprised me? What would I do differently?
  • Weekly: review the week's commits. Are you proud of them? Which ones would you rewrite if asked?
  • Monthly: pick one skill to deepen — a language, a concept, a tool. Schedule the deepening time on the calendar, not in "spare" time that never arrives.
  • Quarterly: read one book that has nothing to do with your immediate work. Adjacent fields and far-flung ones both contribute.
  • A decision journal: when you make a significant decision, write down the reasoning. Revisit a year later. Calibrate.
  • A personal prompt library: prompts that work for you, evolved over time. Keep it in version control like any other code.
  • A reading log: books, papers, posts, talks. Keep it searchable, so that a decade of notes remains something you can actually draw on.
The compounding interest of a small daily reflection, sustained for a decade, outperforms almost any other professional investment available to an engineer.
Chapter the Forty-First

The Engineering Career, in Long Perspective

On what to invest in, when investments are uncertain.

A career in software has always involved betting on which skills would matter. The bet is harder now: the AI is reshaping the value of individual skills faster than careers can be planned. Some skills are appreciating; others are quietly depreciating. The challenge is to invest in the appreciating ones without abandoning the depreciating ones too soon.

A general thesis: skills that are easy to articulate but hard to acquire — judgment, taste, communication, debugging from incomplete information, system design under uncertainty — are appreciating. Skills that are easy to acquire but tedious to apply — boilerplate, mechanical refactoring, syntax memorization — are depreciating. The middle band is uncertain, and the prudent practitioner watches it carefully.

Career Investments
  • Invest in judgment, not implementation speed. The latter is converging across the industry; the former remains rare.
  • Generalize within a domain, then specialize within the generalization. T-shaped beats both pure breadth and pure depth.
  • Maintain a portfolio: things you have built, things you have understood, things you have repaired. Your value is the sum of these.
  • Cultivate the meta-skills: writing, speaking, teaching, reviewing. They compound across every role.
  • Stay engaged with the fundamentals — data structures, networking, operating systems, statistics. They are the bedrock against which new layers are evaluated.
  • Plan in five-year intervals, not five-week ones. Career decisions made under sprint pressure age badly.
  • Build relationships across roles. Engineers who work well with PMs, designers, ops, and executives outlast those who only work well with code.
The engineer who, in 1995, learned only HTML had a fine year. The engineer who, in 1995, learned to learn had a fine career.
Chapter the Forty-Second

The Healthy AI-Native Team

On building together with humans and machines.

A team is more than the sum of its engineers. The norms, rituals, and shared understandings that develop within a team can make ordinary engineers excellent and excellent ones great. The opposite is also true. The arrival of pervasive AI has stressed many of these norms, often invisibly. A healthy team in this era is one that has consciously rebuilt its norms for the new conditions.

There is no single template. But there are patterns visible in the teams that are thriving, and patterns visible in those that are quietly degrading. This final chapter sketches both.

Patterns of Healthy Teams
  • Make conventions explicit and version-controlled. CLAUDE.md, .cursorrules, copilot-instructions.md belong in the repo and in PR review.
  • Maintain quality bars even when velocity allows skipping them. The bars are what make velocity safe.
  • Senior engineers spend more time reviewing, mentoring, and designing — less on implementation. The leverage is in the review and the architecture.
  • Junior engineers ship real code with real consequences, supervised but not protected. Protection breeds atrophy.
  • Postmortems for incidents and near-misses. A near-miss not analyzed is an incident postponed.
  • Rituals matter: weekly demos, monthly retrospectives, quarterly reading clubs. Not bureaucracy — connective tissue.
  • The team learns out loud. Decisions get written down. Lessons get shared. Knowledge accumulates as an asset, not a private possession.
A healthy team is recognizable from outside by the way its junior engineers talk about the work. They are curious. They have opinions. They are not afraid to disagree.
✦ ✦ ✦
Coda

The Discipline of Production

Where the four books meet.

The four books of this manual describe the same craft from four angles. Book the First taught the day-to-day rhythms of working with an AI: planning, prompting, debugging, testing, committing. Book the Second described the disciplines that turn those rhythms into a serious product: architecture, documentation, error handling, observability, release engineering, the long life of software. Book the Third turned outward — to APIs, auth, compliance, accessibility, specialized domains, the tools you build for yourself, and the limits beyond which the AI ceases to help. Book the Fourth turned inward — to the practitioner: how engineers are hired and mentored, how juniors develop, how to think clearly about ethical weight, what makes a healthy team and a healthy career.

In practice all four are inseparable. Every prompt is an architectural choice. Every test is documentation. Every deploy is a release decision. Every hire is a vote about what kind of team this will become. Every commit is a small statement about what you believe good engineering looks like. The disciplined practitioner does not pick between "moving fast" and "moving carefully." They notice that careful is the only kind of fast that lasts.

The AI does not change what good engineering is. It changes only the cost of the components — making the easy parts trivial, and exposing more sharply the parts that were never easy. Architecture, judgment, taste, accountability, craft, mentorship, ethics, and the wisdom to know when to set the machine aside: these were always the work. They still are.

The faster the AI gets, the more visible the engineer becomes, for better and for worse.
Closing maxim
Envoi

A Note to the Future Reader

For the practitioner reading this years from now.

This manual was written in mid-2026, when the tools were one shape; by the time you read it they will be another. The specific models will be replaced. The IDE integrations will look different. New categories of tooling will exist that this book does not name. Some of what is written here will read as quaint — the way early notes on version control read today, written when "commit" still felt like a daring verb.

The specifics will date. The principles, I hope, will not.

Software engineering has always rewarded clear thinking, disciplined practice, and humility before the consequences of what we build. The AI did not change that, and the AI's successors will not change it either. They only change the pace at which the consequences arrive, and which parts of the work demand the most attention. The work itself — designing, deciding, building, maintaining, and being accountable for systems that other people depend on — is older than any of our tools and will outlast them.

If you are reading this in a moment when some new instrument has just arrived and the discourse insists that everything has changed: it has not. Read the new instrument carefully. Find what it makes cheap, and what it makes obvious. Bring the same disciplines to bear. Build things that work, that last, and that you are proud to have made.

The tools change. The craft does not.
A parting wish
✦ ✦ ✦
Appendix A

A Prompt Library

Reusable instruments for the daily practice.

A small collection of prompts that have earned their keep. Adapt them to your stack, your project, and your tone. Each one assumes you have already established context in the session — they are tools, not opening lines.

i. Architecture Exploration
For decisions that are expensive to change later
I need to make a design decision: <describe the problem and constraints>.
Propose three distinct approaches. For each, give me:
- A one-paragraph sketch of how it works
- Estimated complexity and cost
- Two failure modes and how each is recovered
- What changes when load grows 10x
Then critique each as a sceptical principal engineer. Recommend one and state the assumption that, if false, would change your recommendation.
ii. Sceptical Self-Review
For diffs before they merge
Review the following diff as a sceptical senior engineer preparing for a code review. List concerns by severity (critical, moderate, minor). Look specifically for:
- Silent error handling (bare except, swallowed exceptions)
- Fabricated APIs or functions that may not exist
- Tests that do not actually assert behavior
- Magic numbers, hardcoded values, secrets
- Over-abstraction and premature generalization
- Missing edge cases (empty input, null, large input)
- Security issues (injection, auth, secrets)
Do not propose fixes yet. I want the list first.
iii. Five-Causes Diagnosis
When "fix it" is no longer working
Here is the error:
<paste full traceback>
Here is the relevant code:
<paste minimal repro>
Here is what I have already tried:
<list attempts>
Do NOT propose a fix.
1. List five plausible root causes, most to least likely.
2. For each, describe one quick experiment that would confirm or refute it without changing application behavior.
3. Tell me which experiment to run first and what specific output would confirm or rule out the cause.
iv. Failing Test First
For bugs, before fixes
I have a bug: <describe symptom and reproduction>.
Step 1: Write a failing test that reproduces the bug in our existing test framework. Do not fix the bug yet. Show me the test and confirm it fails on current code.
Step 2: Only after I confirm, propose the minimal fix. Re-run the test. It must pass without modifying the test itself.
Step 3: Identify any related code paths that may have the same bug. List them; do not fix them yet.
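Steps 1 and 2 of this prompt can be sketched in plain Python. This is a minimal illustration under stated assumptions: `mean` is a hypothetical helper whose bug is a crash on empty input, the agreed behavior is assumed to be raising `ValueError`, and the check is written as a plain function rather than a pytest test so that it runs on its own.

```python
def mean_buggy(values):
    # Hypothetical original implementation: ZeroDivisionError on [].
    return sum(values) / len(values)


def mean(values):
    # Minimal fix: reject empty input explicitly with a clear error.
    if not values:
        raise ValueError("mean() of empty sequence")
    return sum(values) / len(values)


def regression_test(fn):
    """Return True if fn shows the agreed behavior for empty input."""
    try:
        fn([])
    except ValueError:
        return True
    except Exception:
        return False  # wrong error type: the bug is still present
    return False  # silently returning a value is also a failure


# Step 1: the test must fail on the current (buggy) code.
assert regression_test(mean_buggy) is False
# Step 2: after the minimal fix, the same unmodified test passes.
assert regression_test(mean) is True
```

The point is the order of operations: the same unmodified check is seen failing against the old code before it is allowed to pass against the new one.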
v. Security Audit
Periodic, not one-time
Audit this codebase (or specific module) for security issues. Check for:
- Hardcoded secrets, API keys, tokens
- SQL injection, command injection, path traversal
- Missing authentication or authorization on endpoints
- Unsafe deserialization (pickle, eval, etc.)
- CORS misconfiguration, missing CSRF protection
- Sensitive data in logs
- Dependency vulnerabilities (suggest scanning tool)
- Insecure defaults
Group findings by severity. For each, cite the file:line. Suggest the fix but do not apply it yet.
vi. Legacy Walkthrough
Understand before you change
Walk me through this module as if I am a new engineer joining the team.
1. What is its purpose in one sentence?
2. What are its public entry points?
3. Who calls them (search the codebase if needed)?
4. What is the data flow through the module?
5. What invariants does it assume about inputs?
6. What is the riskiest or most surprising part?
7. What would you be afraid to change without tests?
Do not propose any changes. I just want to understand.
vii. Profile-First Performance
Numbers before opinions
A profile (attached) shows <endpoint/function> spending most of its time in <area>. Code is attached.
Do NOT propose optimizations yet.
1. Identify the top three time sinks in the profile.
2. For each, hypothesize the root cause based on the code.
3. Propose one diagnostic experiment per hypothesis.
4. Rank hypotheses by impact-per-effort.
After we agree on the cause, we will discuss the fix. A "faster" change without a measurement is not accepted.
viii. Release Readiness
Before the deploy button
I am about to deploy <change> to production. Generate a release-readiness checklist for THIS specific change:
- What could break, and how would we notice?
- What metrics or logs should be watched post-deploy, and for how long?
- Is a feature flag warranted? If yes, draft the flag config.
- What is the rollback procedure? Step by step.
- Are there data migrations? If yes, are they reversible?
- Who else needs to be informed before, during, after?
Be specific to the change, not generic.
Appendix B

An Anti-Patterns Catalog

Named failure modes of unsupervised vibe coding.

Anti-patterns are easier to avoid once they have names. The twelve below appear repeatedly in real codebases produced under AI assistance. If you can recognize them in a diff — yours or a teammate's — you have already done most of the work of preventing them.

i. The Confident Hallucination

The model invents an API that does not exist.

A function is called with the precise tone of someone who has used it for years. It does not exist. The library does not have that method. The import resolves, but the attribute does not — or worse, it resolves to something that does something else.

Signal: A function or attribute name that looks plausible but does not appear in the library's documentation. Tests that fail with ImportError or AttributeError.

Remedy: Run the code. Read the docs. When in doubt, ask the model for the documentation link, then open it yourself, because links can be invented too. Distrust fluency.
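Running the code is the real check, but a quick existence probe can flag the most blatant inventions before you read further. The sketch below assumes Python and uses only `importlib` and `getattr`; `api_exists` is a hypothetical helper name. It catches names that do not exist at all, not names that exist but do something else, so the documentation still has the final word.

```python
import importlib


def api_exists(module_name: str, dotted_attr: str) -> bool:
    """Hypothetical helper: True if module_name exposes dotted_attr."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return False
    # Walk a dotted path such as "path.join" one attribute at a time.
    for part in dotted_attr.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


print(api_exists("json", "loads"))        # True: a real function
print(api_exists("json", "load_string"))  # False: plausible, but invented
print(api_exists("os", "path.join"))      # True: nested attributes work
```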

ii. The Silent Swallow

A bare except turns an error into a lie.

Code that "handles" exceptions by catching everything and continuing. Downstream code now operates on bad data and produces confidently-wrong results, with no log line to suggest anything went wrong.

Signal: Bare except: or except Exception: with no logging and no re-raise. return None on failure with no documentation that None means error.

Remedy: Catch specific exception types. Log with context. Re-raise unless the failure is genuinely handleable. Distinguish errors you handle from errors you propagate.
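The contrast is easiest to see side by side. A minimal Python sketch, assuming a JSON config loader; `ConfigError` and both function names are hypothetical. The first version turns every failure into `None`; the second catches only the parse error it understands, logs where it happened, and re-raises with the original exception chained.

```python
import json
import logging

logger = logging.getLogger(__name__)


# Anti-pattern: every failure becomes None, and nobody finds out.
def load_config_swallowing(text):
    try:
        return json.loads(text)
    except Exception:
        return None


class ConfigError(Exception):
    """Hypothetical domain error: configuration text could not be parsed."""


# Remedy: catch the specific error, log with context, re-raise.
def load_config(text, source="<unknown>"):
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        logger.error("invalid config in %s at line %d: %s",
                     source, exc.lineno, exc.msg)
        raise ConfigError(f"invalid config in {source}") from exc
```

Chaining with `from exc` keeps the original traceback, so the caller sees both the domain-level error and the underlying cause.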

iii. The Spec Drift

The implementation slowly forgets what it was for.

You start with a clear intent. Each prompt refines slightly. Twenty prompts later, the code does something subtly different from what you originally needed, and neither you nor the model remembers exactly when the drift occurred.

Signal: You catch yourself unable to explain why a particular feature works the way it does. The spec doc, if it exists, no longer matches the code.

Remedy: Write the spec down before coding. Re-read it at session boundaries. When in doubt, compare implementation against spec and reconcile deliberately.

iv. The Whack-a-Mole Fix

Each fix creates the next bug.

You report a bug. The AI proposes a fix. The fix introduces a new symptom. You report that. A new fix. New symptom. After three rounds you are no longer sure whether the original bug is even present.

Signal: The diff grows with every "fix." Tests pass selectively. You cannot explain why the current version works.

Remedy: Stop. Revert to the last known-good commit. Diagnose root cause before any fix. Use the five-causes prompt (Appendix A, iii).

v. The Vibe Sprawl

Every prompt grows the scope.

You asked for a small change. The model also reformatted the file, renamed a helper, "improved" an unrelated function, and added a dependency you did not request. The PR is now five times its intended size and impossible to review.

Signal: Diffs much larger than the task warranted. Changes in files you never mentioned. New dependencies in package.json you did not approve.

Remedy: State the scope explicitly. Add to CLAUDE.md: "Do not modify files outside the task. Do not add dependencies without permission." Reject sprawling PRs.
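Part of the scope check can be automated before review. A sketch under assumptions: the repository uses git, the task's scope is expressed as a list of path prefixes, and `changed_files` and `out_of_scope` are hypothetical helpers rather than any existing tool.

```python
import subprocess


def changed_files(base="HEAD"):
    """Tracked files changed relative to `base`, as reported by git."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def out_of_scope(changed, allowed_prefixes,
                 manifests=("package.json", "requirements.txt", "pyproject.toml")):
    """Hypothetical check: (files outside scope, dependency-manifest edits)."""
    stray = [p for p in changed
             if not any(p.startswith(prefix) for prefix in allowed_prefixes)]
    deps = [p for p in changed if p.rsplit("/", 1)[-1] in manifests]
    return stray, deps
```

Called as `out_of_scope(changed_files(), ["src/billing/"])`, any non-empty result is a question to ask before the diff goes to review, not an automatic rejection.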

vi. The Mock Test

A test that asserts nothing meaningful.

A test exists. It runs. It passes. It does not actually verify that the function does what the function is supposed to do. The most common form: an assertion that the function returns a value (any value), or that it does not throw (regardless of result).

Signal: assert result is not None. assert True. Tests that pass even when the function body is replaced with pass or return None.

Remedy: Mutation testing, or the cheaper version: deliberately break the function and confirm the test catches it. If it does not, the test was theater.

vii. The Phantom Refactor

Code is rewritten that did not need to change.

The model decides, mid-task, that a nearby function would be "cleaner" with different structure. It rewrites. Behavior may or may not be preserved. Either way, you did not ask, and now you must review code you did not intend to touch.

Signal: Diffs in functions or files not mentioned in the prompt. Unexplained moves between modules. Renames in code you did not ask to rename.

Remedy: A standing rule in CLAUDE.md: "Do not refactor adjacent code unless explicitly asked." Revert phantom changes on sight.

viii. The Frankenstein Stack

Too many libraries, none of them necessary.

Six dependencies appear over a single session (a date library, an HTTP library, a logging library, a validation library, and more), each pulled in to handle something the standard library could have handled in three lines. Now you maintain six relationships you did not need.

Signal: Lockfile diffs that surprise you. Dependencies whose purpose you cannot articulate. Transitive dependency counts in the hundreds for a small project.

Remedy: Require permission before adding dependencies. Prefer the standard library when reasonable. Audit the dependency tree periodically.

ix. The Monolithic File

Everything piles into one place.

The AI's path of least resistance is to add to an existing file rather than create a new one. By month three, a single file is 2,000 lines, untestable in isolation, and impossible to review in any single sitting.

Signal: Files growing without bound. Imports that pull in more than you intended. Tests that cannot be run on individual modules.

Remedy: Periodic refactor sessions with an explicit prompt: "Suggest where this file should be split. Propose modules and their interfaces." Then split.
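Before asking the model where to split, it helps to know which files have grown. A minimal sketch in Python using only `pathlib`; the 500-line threshold and the `oversized_files` name are assumptions to adapt, not a recommended limit.

```python
from pathlib import Path


def oversized_files(root, max_lines=500, pattern="*.py"):
    """Hypothetical helper: (line_count, path) for files over max_lines."""
    results = []
    for path in Path(root).rglob(pattern):
        if path.is_file():
            # errors="replace" keeps one badly encoded file from aborting the scan.
            with path.open(encoding="utf-8", errors="replace") as fh:
                count = sum(1 for _ in fh)
            if count > max_lines:
                results.append((count, str(path)))
    return sorted(results, reverse=True)  # largest first
```

Running `oversized_files("src")` and pasting the top few results into the split prompt gives the model a concrete starting list instead of a general instruction.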

x. The Lost Thread

Context is poisoned, and you keep prompting anyway.

Three failed attempts in, the model is now confused. It contradicts itself. It re-introduces bugs it previously fixed. You keep prompting because the next answer feels close. The next answer is not close. The context is corrupt.

Signal: The model contradicts what it said earlier in the same session. Fixes regress. The diff oscillates between near-identical states.

Remedy: Stop. Summarize the desired end state. Open a fresh session with only that summary. The new context will outperform the poisoned one immediately.

xi. The Speed Trap

Velocity that hides accumulating debt.

Features ship at unprecedented pace. Customers are happy. Then, six months later, the team cannot make any change without breaking three things. The debt was invisible until it was unmanageable.

Signal: Pace dropping over time despite team size growing. Refactors that should be cheap turning expensive. Onboarding times getting longer.

Remedy: Make debt visible. Track refactor time. Allocate a fixed percentage of every sprint to repayment. Treat code quality as a leading indicator, not a lagging one.

xii. The Demo Polish

Works perfectly in the happy path. Falls apart elsewhere.

The feature looks great in the demo. The first user with an unusual input — empty, very large, non-ASCII, slow network — encounters a stack trace. The AI optimized for the example you showed it. You did not show it the edge cases.

Signal: Bugs filed within hours of release for inputs nobody tested. "It worked when I tried it" as a recurring response.

Remedy: Property-based testing. Fuzz inputs. Explicit edge-case prompts: "What inputs would break this? Generate ten and test each."
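Property-based testing states what must hold for every input and then generates many inputs to try. The sketch below is a dependency-free stand-in built on a seeded `random.Random`; in Python the usual library for this is Hypothesis. `normalize_whitespace`, the alphabet, and the three properties are illustrative assumptions. The generator deliberately mixes empty, tiny, and large inputs with tabs, newlines, and non-ASCII characters, which are exactly the inputs a demo tends to skip.

```python
import random
import string


def normalize_whitespace(text):
    """Hypothetical function under test: collapse whitespace, trim ends."""
    return " ".join(text.split())


# Letters plus whitespace and non-ASCII characters (é, ß, 中, an emoji).
ALPHABET = string.ascii_letters + " \t\n" + "\u00e9\u00df\u4e2d\U0001F600"


def random_input(rng):
    length = rng.choice([0, 1, 2, 50, 5000])  # empty, tiny, and large inputs
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def check_properties(fn, trials=200, seed=0):
    """Assert three properties of fn on generated inputs; return trials run."""
    rng = random.Random(seed)  # seeded so any failure is reproducible
    for _ in range(trials):
        s = random_input(rng)
        out = fn(s)
        assert fn(out) == out, f"not idempotent for {s!r}"
        assert "  " not in out, f"double space left for {s!r}"
        assert out == out.strip(), f"untrimmed output for {s!r}"
    return trials
```

`check_properties(normalize_whitespace)` passes; handing it an identity function instead fails as soon as a generated input contains a doubled space or leading or trailing whitespace.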

Appendix C

A Production Readiness Checklist

For the day before — and the day of — going to production.

A checklist is not a substitute for judgment. It is a substitute for the parts of your judgment that you would otherwise forget at three in the morning. Adapt this to your context; treat unchecked items as conversations to have, not boxes to dismiss.

i. Security (eight items)


  • No secrets, keys, or tokens hardcoded in source. All credentials loaded from environment or secret manager.
  • All endpoints declare and enforce authentication requirements; none default to open.
  • Authorization checks on every endpoint that touches user-specific or sensitive data, with negative-case tests.
  • Input validation on all external inputs (request bodies, query parameters, file uploads).
  • Output encoding appropriate to context (HTML escaping, SQL parameterization, shell-safe quoting).
  • Rate limiting on authentication, registration, and other abusable endpoints.
  • Dependency vulnerability scan is green; high-severity findings triaged.
  • A security review by someone other than the original author has been performed and signed off.
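Two of these items are easy to show concretely. A minimal Python sketch using only the standard library: credentials come from the environment and fail loudly when absent, and SQL uses placeholders so user input is passed as data. `require_secret`, `find_user`, and the `users` table are hypothetical; in many deployments a secret-manager client would replace `os.environ`.

```python
import os
import sqlite3


def require_secret(name):
    """Hypothetical helper: read a credential; refuse to start without it."""
    value = os.environ.get(name)
    if not value:
        # Fail loudly instead of falling back to a hardcoded default.
        raise RuntimeError(f"required secret {name} is not set")
    return value


def find_user(conn, email):
    # The "?" placeholder makes the driver pass `email` as data, never as SQL.
    # Building this string with an f-string or "+" would be injectable.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("INSERT INTO users (email) VALUES (?)", ("ada@example.com",))
    print(find_user(conn, "ada@example.com"))  # (1, 'ada@example.com')
    print(find_user(conn, "' OR '1'='1"))      # None: matched as a literal string
```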

ii. Reliability & Resilience (seven items)


  • Timeouts set on every network and process-boundary call. No "default infinite" timeouts in production code.
  • Retries with exponential backoff for transient failures; only on idempotent operations.
  • Circuit breakers (or equivalent) on calls to unreliable downstream services.
  • Graceful degradation paths defined for each external dependency that could fail.
  • Resource limits set (memory, CPU, file descriptors, connection pools) and tested under load.
  • Idempotency keys on state-changing operations callable by clients that may retry.
  • Documented behavior when upstream dependencies are slow, not just when they are down.
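Retries and idempotency keys work as a pair: backoff makes retries polite, and the key makes them safe for an operation that is not naturally idempotent. A sketch in Python under explicit assumptions: `ChargeService` is a hypothetical server that stores one response per key, and it is rigged to apply a charge and then lose the response once, which is exactly the case where a naive retry would charge twice. The operation passed to `retry_with_backoff` is assumed to enforce its own timeout.

```python
import random
import time


class TransientError(Exception):
    """A failure worth retrying (timeout, 503, dropped connection)."""


def retry_with_backoff(op, attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep):
    """Call op() until it succeeds, sleeping with capped backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return op()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # jitter spreads out retry storms


class ChargeService:
    """Hypothetical server that deduplicates requests by idempotency key."""

    def __init__(self):
        self.results = {}   # idempotency key -> stored response
        self.charges = 0    # how many times money actually moved
        self.drop_next = 1  # simulate one response lost in transit

    def charge(self, key, amount):
        if key in self.results:
            return self.results[key]  # replay: no second charge
        self.charges += 1
        self.results[key] = {"status": "ok", "amount": amount}
        if self.drop_next:
            self.drop_next -= 1
            raise TransientError("response lost after the charge was applied")
        return self.results[key]


if __name__ == "__main__":
    service = ChargeService()
    # One key per logical operation, reused across every retry of it.
    result = retry_with_backoff(lambda: service.charge("order-42", 100))
    print(result, service.charges)  # {'status': 'ok', 'amount': 100} 1
```

Without the key, the second attempt would have reached `self.charges += 1` again; with it, the retry replays the stored response.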

iii. Observability (seven items)


  • Structured logging (JSON, with fields) throughout. No print statements remaining.
  • The four golden signals — latency, traffic, errors, saturation — instrumented and dashboarded.
  • Error tracking system receives unhandled exceptions with full stack traces and request context.
  • Distributed tracing in place for any service-to-service call; correlation IDs propagated.
  • Health endpoints meaningfully reflect service capability, not merely process liveness.
  • Alerts defined for the conditions that warrant a human; runbooks linked from each alert.
  • Sensitive data (PII, tokens, credentials) explicitly excluded from logs and traces.
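A minimal structured-logging setup needs nothing beyond the standard `logging` module. This sketch (Python; `JsonFormatter`, `get_logger`, the `fields` convention, and the redaction list are assumptions, not a standard) emits one JSON object per record and redacts known-sensitive keys in the formatter, so a stray `token` field is masked before it is written anywhere.

```python
import json
import logging

# Assumed list of keys to mask; adapt to your own data inventory.
SENSITIVE_KEYS = {"password", "token", "authorization", "ssn"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, redacting sensitive fields."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured context arrives via extra={"fields": {...}} (our convention).
        for key, value in getattr(record, "fields", {}).items():
            payload[key] = "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def get_logger(name):
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

A call such as `get_logger("checkout").info("payment accepted", extra={"fields": {"correlation_id": "req-123"}})` then produces a single searchable line carrying the correlation ID.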

iv. Data & Migrations (six items)


  • All migrations applied to a staging copy of production data without errors.
  • All migrations reversible, or accompanied by an ADR explaining why they are not.
  • Backfills (if any) run as separate, idempotent, resumable jobs with progress reporting.
  • Database backup strategy documented and most recent restoration test passed.
  • Indexes reviewed for both query performance and write impact at expected scale.
  • Data retention and deletion policies match regulatory obligations.
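A backfill is idempotent and resumable when each batch selects only the rows that still need work and commits on its own. A sketch in Python with `sqlite3`, assuming a hypothetical `users` table with a non-null `email` column and a newly added `email_domain` column; the function name and batch size are illustrative.

```python
import sqlite3


def backfill_email_domain(conn, batch_size=1000, report=print):
    """Hypothetical backfill: populate users.email_domain in committed batches.

    Idempotent and resumable: every batch selects only rows that are still
    NULL, so a crash or a re-run picks up after the last committed batch.
    Assumes email is NOT NULL; a row whose new value stayed NULL would be
    selected forever, so real backfills need an explicit "skipped" marker.
    """
    total = 0
    while True:
        rows = conn.execute(
            "SELECT id, email FROM users WHERE email_domain IS NULL "
            "ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return total
        conn.executemany(
            "UPDATE users SET email_domain = ? WHERE id = ?",
            [(email.rsplit("@", 1)[-1].lower(), row_id) for row_id, email in rows],
        )
        conn.commit()  # each batch is durable on its own
        total += len(rows)
        report(f"backfilled {total} rows so far")  # progress reporting
```

Running it a second time returns 0 and touches nothing, which is the property that makes it safe to restart after an interruption.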

v. Deployment & Release (seven items)


  • CI pipeline green; lint, type-check, test, build, security scan all passing.
  • Deploy procedure documented, executed at least once in staging, and reversible.
  • Rollback procedure documented and rehearsed within the last quarter.
  • Feature flag (if used) configured, defaults verified, kill switch tested.
  • Canary or staged rollout plan defined; success and abort criteria explicit.
  • Post-deploy monitoring window scheduled; on-call engineer identified.
  • Stakeholders (support, sales, ops) informed of the deploy and any user-facing changes.
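Explicit abort criteria are easiest to honor when they are written down as code before the rollout starts. A sketch in Python; the `Window` type, the thresholds, and the 0.1% error-rate floor are hypothetical values chosen for illustration, to be replaced with whatever your team agrees.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical metrics summary for one side of the comparison."""
    requests: int
    errors: int
    p99_ms: float

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(baseline, canary, min_requests=500,
                   max_error_ratio=1.5, error_floor=0.001, max_p99_ratio=1.2):
    """Return "wait", "abort", or "promote" using pre-agreed thresholds."""
    if canary.requests < min_requests:
        return "wait"  # too little traffic to judge either way
    # The floor keeps a near-zero baseline from making any single error fatal.
    allowed_error_rate = max(baseline.error_rate * max_error_ratio, error_floor)
    if canary.error_rate > allowed_error_rate:
        return "abort"
    if canary.p99_ms > baseline.p99_ms * max_p99_ratio:
        return "abort"
    return "promote"
```

Checking traffic volume first is a design choice; a stricter version might abort early on an absolute error count even before `min_requests` is reached.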

vi. Documentation & Operability (six items)


  • README runs the project from scratch; tested on a clean machine within the last release cycle.
  • ADRs exist for the architecturally significant decisions; linked from the codebase.
  • Runbooks exist for deploy, rollback, restore, secret rotation, common incidents.
  • API documentation generated from code and accurate as of this release.
  • On-call rotation, escalation paths, and contact lists current.
  • Changelog updated; release notes communicated to users where relevant.

vii. Compliance & Privacy (six items)


  • PII inventory current; new fields classified and protected appropriately.
  • Audit logs capture all access to and changes affecting regulated data.
  • Data residency constraints (if any) verified at the infrastructure level.
  • Deletion and export workflows (right to erasure, data portability) implemented and tested.
  • Third-party data processors documented; data processing agreements in place.
  • For safety-critical or certified systems: traceability matrix current, qualified-tool requirements satisfied.
A checklist completed in haste is a checklist no one will trust later. Slow down at the items that matter most; speed up only where the consequences of being wrong are genuinely small.
Appendix D

A Glossary

Terms used throughout the four books.

Brief definitions, in the spirit of a working reference rather than an exhaustive dictionary. Where a term has multiple senses, the definition reflects the sense used in this manual.

ABAC Attribute-Based Access Control
Authorization model where decisions are derived from attributes of users, resources, and context, rather than from fixed roles.
ADR Architecture Decision Record
A short document capturing the context, options considered, decision made, and consequences of a significant architectural choice. Kept in the repository.
Agent
An AI system that can take actions in the world — run shell commands, read and write files, call APIs — on behalf of a user.
ASIL Automotive Safety Integrity Level
Classification of safety requirements in ISO 26262, ranging from QM (no safety relevance) through ASIL A to ASIL D (most stringent).
AuthN
Shorthand for authentication: the process of verifying who a user is.
AuthZ
Shorthand for authorization: determining what an authenticated user is allowed to do.
Backfill
A data operation that retroactively populates new fields or structures with values derived from existing data. Typically run as a separate idempotent job, not inside a migration.
Brownfield
Software work on an existing system. Contrast with greenfield.
Canary Release
A deployment strategy where new code is exposed to a small fraction of traffic first, then ramped up only on green metrics.
Characterization Test
A test written to pin down existing behavior, typically before refactoring code whose behavior must be preserved.
Circuit Breaker
A pattern that prevents repeated calls to a failing dependency by tripping open after a threshold of failures, then attempting to recover after a cool-down period.
CLAUDE.md
By convention, a Markdown file at a repository's root containing project-specific instructions for an AI assistant: the stack, conventions, things not to do, and how to run the project.
Context Window
The maximum amount of text, measured in tokens, that a language model can attend to in a single inference. How much of it a session uses has direct cost and quality implications.
Contract Test
A test that pins the shape and behavior of an API contract between services, surfacing breaking changes at build time rather than in production.
Correlation ID
A unique identifier attached to a request and propagated through every downstream call, enabling traces of distributed work to be reconstructed after the fact.
Dark Launch
Deploying new code that runs in production against real traffic but does not expose its results to users. Useful for performance and correctness validation.
Distributed Tracing
An observability technique that records the path of a single request through multiple services, enabling diagnosis of latency and failure across service boundaries.
DO-178C
Software Considerations in Airborne Systems and Equipment Certification: the dominant standard for safety-critical avionics software.
Expand-Contract
A schema migration pattern: add new structure, dual-write, backfill, switch reads to the new structure, then remove the old. Enables zero-downtime schema changes.
Feature Flag
A runtime toggle that decouples deployment of code from release of behavior. Risky changes can ship dark, be tested in production, and be killed quickly if needed.
Golden Signals
Four key metrics for monitoring any user-facing service, as codified in Google's Site Reliability Engineering book: latency, traffic, errors, and saturation.
Greenfield
Software work starting from nothing. Contrast with brownfield.
IEC 62304
International standard for medical device software lifecycle processes.
Idempotent
An operation that produces the same result whether called once or many times. A critical property for any operation that may be retried, since a retry must not apply its effect twice.
ISO 26262
International standard for functional safety in road vehicles. Defines processes and requirements for safety-critical automotive software.
Lockfile
A file recording the exact versions of all dependencies of a project, direct and transitive, ensuring reproducible installs across environments and over time.
MCP Model Context Protocol
An open protocol that lets AI models connect to external tools and data sources — filesystems, databases, browsers, APIs — through standardized servers.
Mermaid
A text-based diagram syntax that renders to flowcharts, sequence diagrams, and similar diagrams. Because it is plain text, it diffs and version-controls naturally.
N+1 Query
A common performance anti-pattern: one query to fetch a collection, followed by one additional query per item.
Non-Functional Requirements
Requirements about how a system behaves — latency, availability, security, scalability — rather than what it does.
PHI Protected Health Information
Health-related data regulated under HIPAA in the United States and analogous regimes elsewhere.
PII Personally Identifiable Information
Data that can identify an individual, directly or in combination with other data. Regulated under GDPR, CCPA, and many other regimes.
Postmortem
A written analysis of an incident: timeline, root cause, contributing factors, action items. Blameless by convention.
Pre-commit Hook
A script run automatically before a Git commit completes. Used for fast local feedback: formatting, linting, type checks, secret scans.
RBAC Role-Based Access Control
Authorization model where users are assigned roles, and permissions are attached to those roles.
ReBAC Relationship-Based Access Control
Authorization model where permissions are derived from relationships between entities (e.g., "user is editor of document"), in the style of Google's Zanzibar system.
Retry with Backoff
A pattern for handling transient failures: retry the failed operation after a delay, with the delay growing (often exponentially) and a random jitter added to avoid synchronized retries.
Runbook
A documented procedure for an operational task, typically for use during incidents or routine maintenance.
SAST Static Application Security Testing
Automated analysis of source code for security vulnerabilities, run as part of CI.
SBOM Software Bill of Materials
A formal, machine-readable inventory of all components, libraries, and dependencies in a software system. Increasingly required for regulated and government-adjacent software.
SDD Spec-Driven Development
A practice of writing specifications before implementation, against which AI-assisted code generation can be steered and verified.
Skill (in the AI tooling sense)
A reusable bundle of instructions, prompts, and tool configurations that an AI assistant can invoke for a recurring workflow.
SLO Service Level Objective
A target value for a service quality metric (e.g., 99.9% of requests served under 200ms), against which performance is measured.
Sub-agent
A secondary AI instance spawned by a primary one, with an isolated context, to handle parallel or specialized work without polluting the main thread.
TDD Test-Driven Development
Writing a failing test first, then the minimum code to make it pass, then refactoring. Particularly powerful in AI-assisted workflows because it pins behavior explicitly.
Vibe Coding
The practice of building software in continuous dialogue with an AI assistant: describing intent in natural language, then steering, refining, and verifying the result.
WCAG Web Content Accessibility Guidelines
A set of recommendations for making web content accessible. WCAG AA is the typical legal and contractual baseline.
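The N+1 Query entry is easiest to recognize side by side with its fix. The following is a minimal sketch using Python's built-in `sqlite3` and a hypothetical `authors`/`books` schema. The first function issues one query for the authors and one more per author. The second fetches the same data with a single JOIN. `count_queries` uses the connection's `set_trace_callback` hook to count the statements each approach actually executes.

```python
import sqlite3


def make_db(n_authors: int = 10) -> sqlite3.Connection:
    """Hypothetical schema: each author has one book."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE books (id INTEGER PRIMARY KEY,
                            author_id INTEGER NOT NULL REFERENCES authors(id),
                            title TEXT NOT NULL);
        """
    )
    for i in range(1, n_authors + 1):
        conn.execute("INSERT INTO authors (id, name) VALUES (?, ?)", (i, f"Author {i}"))
        conn.execute("INSERT INTO books (author_id, title) VALUES (?, ?)", (i, f"Book {i}"))
    conn.commit()
    return conn


def titles_n_plus_one(conn: sqlite3.Connection) -> dict:
    """Anti-pattern: one query for the collection, then one per item."""
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query(conn: sqlite3.Connection) -> dict:
    """Fix: one JOIN returns the same data in a single statement."""
    result = {}
    rows = conn.execute(
        "SELECT a.name, b.title FROM authors a "
        "LEFT JOIN books b ON b.author_id = a.id ORDER BY a.id, b.id"
    ).fetchall()
    for name, title in rows:
        titles = result.setdefault(name, [])
        if title is not None:  # LEFT JOIN yields NULL for authors with no books
            titles.append(title)
    return result


def count_queries(conn: sqlite3.Connection, fn):
    """Run fn(conn); return (result, number of SQL statements SQLite executed)."""
    executed = []
    conn.set_trace_callback(executed.append)
    try:
        return fn(conn), len(executed)
    finally:
        conn.set_trace_callback(None)
```

With ten authors, both approaches return identical results, but the first runs 11 statements and the second runs 1. Against a networked database, each extra statement is a round trip.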
Appendix E

A Bibliography

Books and works that have shaped the practice described here.

A curated, not exhaustive, list. The works below have stood up to multiple re-readings and continue to repay attention. Few are about AI-assisted development directly; nearly all are about the older disciplines that the new tools amplify rather than replace.

i. Engineering Practice

  • Michael Feathers, Working Effectively with Legacy Code. The indispensable handbook for changing code you do not understand.
  • Andrew Hunt & David Thomas, The Pragmatic Programmer. Sound, durable advice that survives every wave of new tooling.
  • John Ousterhout, A Philosophy of Software Design. A short, opinionated, and unusually clear treatment of complexity.
  • Steve McConnell, Code Complete. Dated in details, evergreen in principles.
  • Martin Kleppmann, Designing Data-Intensive Applications. The reference for how modern data systems actually work.

ii. Systems & Reliability

  • Betsy Beyer et al. (Google), Site Reliability Engineering. The book that named and codified modern operational discipline.
  • Sam Newman, Building Microservices. A careful, sometimes skeptical guide to distributed systems.
  • Michael Nygard, Release It! Stability patterns, anti-patterns, and the operational view of architecture.
  • Martin Fowler, Patterns of Enterprise Application Architecture. The vocabulary much of the industry still uses.

iii. Testing & Quality

  • Gerard Meszaros, xUnit Test Patterns. Exhaustive vocabulary for the patterns and pitfalls of automated testing.
  • Steve Freeman & Nat Pryce, Growing Object-Oriented Software, Guided by Tests. A demonstration of TDD as a design discipline.

iv. Security

  • Michal Zalewski, The Tangled Web. A clear-eyed tour of how web security actually fails.
  • Dafydd Stuttard & Marcus Pinto, The Web Application Hacker's Handbook. Encyclopedic on attack surfaces.

v. Career & Leadership

  • Camille Fournier, The Manager's Path. The most-recommended book for engineers stepping into leadership, and for good reason.
  • Will Larson, An Elegant Puzzle and Staff Engineer. Lucid on technical leadership at multiple levels.
  • Tom DeMarco & Timothy Lister, Peopleware. Old, prescient, still right about most of what matters in teams.

vi. Foundations & Wider Reading

  • Harold Abelson & Gerald Jay Sussman, Structure and Interpretation of Computer Programs. Forty years old; still teaches more about programs than most modern works.
  • Fred Brooks, The Mythical Man-Month. The classic on the human limits of software development.
  • Donald Knuth, The Art of Computer Programming. Not a manual; a monument. Consulted rather than read.

vii. On the AI Era Specifically

  • Anthropic and OpenAI documentation. The most authoritative source on the capabilities of current models is the model providers themselves.
  • The roadmap.sh roadmaps. Living maps of the modern practitioner's terrain. The Vibe Coding roadmap inspired this book.
  • Selected blog posts and conference talks. Far more up-to-date than any book on this fast-moving topic. The serious practitioner curates a personal feed of voices worth following.
A library is not a list of books one has read; it is a list of books one returns to. Build the second kind.
Appendix F

Three Worked Examples

The practice, end to end, in three realistic scenarios.

These three examples sketch the practice in the three modes in which most engineers spend most of their time: greenfield, brownfield, and operational. They are compressed; the actual sessions took longer and contained more dead ends than any narrative can show. Read for the rhythm, not the line count.

Example i · Greenfield
A Small URL Shortener, Zero to Deployed
An afternoon's work, performed with discipline.
Phase 1 · Planning · ≈5 minutes, fresh chat

The prompt: "I want to build a tiny URL shortener as an internal service. Single-team scale, ~100 short links per day. Before any code, propose a minimum viable spec, three implementation choices with trade-offs, and a recommended stack."

The AI returned a sensible plan. I chose a single FastAPI service with SQLite storage, deployed on a small VM. Wrote a one-page ADR documenting why.

Phase 2 · Scaffolding · ≈20 minutes

Created CLAUDE.md with the stack, conventions, "no new dependencies without permission," and the test framework choice. Asked the AI to scaffold the project structure, a Dockerfile, and a basic config module. Reviewed every line, committed.

Phase 3 · Core endpoints · ≈40 minutes

A TDD loop: failing test for POST /shorten, then the implementation. Failing test for GET /{code}, then the implementation. Each pair a separate commit with a clear message proposed by the AI and reviewed by me.
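The following compressed sketch of that test-first loop is for readers who want the shape rather than the story. It leaves FastAPI out deliberately. A hypothetical `Shortener` class with an in-memory dictionary stands in for the service and its storage. The two pairs of tests mirror the POST /shorten and GET /{code} steps. The code alphabet, length, and URL check are illustrative assumptions, not the session's actual choices.

```python
import secrets
import string
import unittest
from typing import Dict, Optional

ALPHABET = string.ascii_letters + string.digits
CODE_LENGTH = 7  # illustrative: 62**7 is about 3.5 trillion possible codes


class Shortener:
    """Hypothetical core logic behind POST /shorten and GET /{code}."""

    def __init__(self) -> None:
        self._links: Dict[str, str] = {}

    def shorten(self, url: str) -> str:
        if not url.startswith(("http://", "https://")):
            raise ValueError("only http(s) URLs can be shortened")
        while True:
            code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))
            if code not in self._links:  # regenerate on the rare collision
                self._links[code] = url
                return code

    def resolve(self, code: str) -> Optional[str]:
        return self._links.get(code)


class ShortenTests(unittest.TestCase):
    # First pair: in the TDD loop, written (and seen to fail) before shorten existed.
    def test_returns_code_of_expected_shape(self):
        code = Shortener().shorten("https://example.com/a")
        self.assertEqual(len(code), CODE_LENGTH)
        self.assertTrue(all(c in ALPHABET for c in code))

    def test_rejects_non_http_urls(self):
        with self.assertRaises(ValueError):
            Shortener().shorten("javascript:alert(1)")


class ResolveTests(unittest.TestCase):
    # Second pair: written before resolve existed.
    def test_round_trip(self):
        s = Shortener()
        code = s.shorten("https://example.com/a")
        self.assertEqual(s.resolve(code), "https://example.com/a")

    def test_unknown_code_returns_none(self):
        self.assertIsNone(Shortener().resolve("nope123"))
```

Saved as a module, the tests run with `python -m unittest`. Each pair maps to one commit in the loop described above.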

Phase 4 · Observability · ≈30 minutes

Added structured logging via the standard library. Added Prometheus metrics for the four golden signals. Added a /health endpoint that actually queries the database. Each step prompted, reviewed, tested, committed.
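A health check that "actually queries the database" matters because a process can be alive while its storage is not. The following is a minimal, framework-free sketch of the handler's core, assuming SQLite and a hypothetical `links` table. `check_health` returns a status code and body that a GET /health route would pass through. When the query fails, it reports 503 rather than returning 200 regardless.

```python
import sqlite3
import time
from typing import Any, Dict, Tuple


def check_health(conn: sqlite3.Connection) -> Tuple[int, Dict[str, Any]]:
    """Hypothetical core of GET /health: returns (http_status, json_body).

    Runs a real query against the table the service depends on (assumed to
    be `links`), so a missing schema or a closed connection shows up as 503
    instead of a liveness-only 200.
    """
    started = time.monotonic()
    try:
        conn.execute("SELECT 1 FROM links LIMIT 1").fetchone()
    except sqlite3.Error as exc:
        return 503, {"status": "unhealthy", "db": type(exc).__name__}
    latency_ms = round((time.monotonic() - started) * 1000, 2)
    return 200, {"status": "ok", "db": "ok", "db_latency_ms": latency_ms}
```

Querying the real table, rather than a bare `SELECT 1`, also catches a missing or unmigrated schema.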

Phase 5 · CI · ≈15 minutes

GitHub Actions: lint, type-check, test, container build. Pre-commit hooks for formatting and secret scanning. The AI drafted the workflow; I read every step.

Phase 6 · Deploy · ≈15 minutes

Single VM behind a reverse proxy. A feature flag wrapping the new endpoints — overkill at this scale, but rehearsing the discipline. Documented rollback procedure. Pushed.

Phase 7 · Post-deploy monitoring · ≈24 hours of casual observation

Watched the dashboards. Caught one bug — a race condition under concurrent writes against SQLite. Wrapped the write path in a transaction. Wrote a short ADR documenting the scale limit and what would trigger a move to Postgres.
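The narrative does not spell out which race it was. The sketch below therefore assumes a common one in URL shorteners: a check-then-insert, where two concurrent writers both see a code as free before either stores it. Opening the transaction with SQLite's `BEGIN IMMEDIATE` takes the write lock before the check. The check and the insert then become atomic with respect to other writers. `create_link` and the `links` schema are hypothetical. The sketch also assumes the connection is opened with `isolation_level=None`, so the function controls the transaction itself.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits


def create_link(conn: sqlite3.Connection, url: str, length: int = 7) -> str:
    """Store url under a fresh, unused code and return the code.

    Assumes conn was opened with isolation_level=None (explicit transactions).
    BEGIN IMMEDIATE takes SQLite's write lock before the existence check, so
    no other writer can insert the same code between our SELECT and INSERT.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        while True:
            code = "".join(secrets.choice(ALPHABET) for _ in range(length))
            taken = conn.execute(
                "SELECT 1 FROM links WHERE code = ?", (code,)
            ).fetchone()
            if taken is None:
                conn.execute(
                    "INSERT INTO links (code, url) VALUES (?, ?)", (code, url)
                )
                break
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")  # leave no half-finished transaction behind
        raise
    return code
```

While one connection holds the lock, other connections attempting to write wait up to their `timeout` (five seconds by default in Python's `sqlite3`) instead of interleaving. This serialization of writers is one reason SQLite has a scale limit worth recording in an ADR.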

Roughly 2.5 hours of work; ~600 lines of code; observable, tested, deployed. The AI wrote perhaps seventy percent of the lines. Every decision — what to build, what to skip, what to commit to — was mine. The discipline made the speed safe.
Example ii · Brownfield
Adding Observability to a Legacy Service
A week of careful, reversible changes.
Phase 1 · Understanding · ≈1 hour

An eight-year-old internal service in Python 3.7. Sparse logging. No metrics. The owning team is small and reluctant to touch it. I used the Legacy Walkthrough prompt (Appendix A, vi). The AI summarized the module structure, identified six entry points, flagged three risky areas. I read along, checking each claim against the actual code.

Phase 2 · Characterization tests · ≈2 hours

For the three risky areas, I wrote tests that pinned existing behavior — including a few behaviors I suspected were bugs but did not yet want to change. These tests would catch regressions during instrumentation. I did not modify any production code in this phase.
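A characterization test asserts what the code does today, not what it ought to do. In the following minimal sketch, `legacy_parse_amount` is a hypothetical stand-in for one of the risky areas, with two quirks of the kind described above. The tests pin the suspected bugs exactly as they behave. The test names record the suspicion, so any later change to that behavior is a deliberate decision rather than an accident of instrumentation.

```python
import unittest


def legacy_parse_amount(text):
    """Hypothetical legacy function under study. Not to be "fixed" in this phase."""
    text = text.strip().replace(",", "")
    if not text:
        return 0               # quirk: blank input silently becomes 0
    return int(float(text))    # quirk: truncates the fraction instead of rounding


class CharacterizeLegacyParseAmount(unittest.TestCase):
    """Pins current behavior, suspected bugs included, before instrumentation."""

    def test_plain_integer(self):
        self.assertEqual(legacy_parse_amount("42"), 42)

    def test_thousands_separator_is_stripped(self):
        self.assertEqual(legacy_parse_amount("1,234"), 1234)

    def test_suspected_bug_blank_input_returns_zero(self):
        # Probably should raise; pinned as-is until the owning team decides.
        self.assertEqual(legacy_parse_amount("   "), 0)

    def test_suspected_bug_fraction_is_truncated(self):
        # 19.99 becomes 19, not 20. Pinned, not fixed.
        self.assertEqual(legacy_parse_amount("19.99"), 19)

    def test_garbage_raises_value_error(self):
        with self.assertRaises(ValueError):
            legacy_parse_amount("abc")
```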

Phase 3 · Design · ≈30 minutes

A short written proposal: structured logging via the standard library, Prometheus metrics through the existing internal library, distributed tracing deferred to a later effort. Sent to the owning team. Got approval with two small comments.

Phase 4 · Instrumentation in small commits · ≈3 hours over two days

Logging first, one commit per logical area. Then metrics. Each commit ran the existing test suite plus the new characterization tests, was deployed to staging, and observed. No batch changes. No "while I'm here" improvements.

Phase 5 · Staging soak · 24 hours

Watched the new dashboards. Caught one instrumentation bug: a counter labeled with a high-cardinality value, which would have created a metric explosion in production. Fixed before any production change.
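The cardinality bug is worth seeing concretely, because it is easy to write and invisible at test scale. Every distinct combination of label values becomes its own time series. Labeling a request counter with the raw path, which embeds IDs, therefore creates one series per ID. The following is a minimal sketch of the usual remedy, using a hypothetical route table. Each path is collapsed to its route template before it becomes a label, and anything unrecognized goes to a single catch-all value.

```python
import re

# Hypothetical route table: raw-path pattern -> bounded label value.
ROUTES = [
    (re.compile(r"^/users/\d+$"), "/users/{id}"),
    (re.compile(r"^/orders/[0-9a-f-]{36}$"), "/orders/{uuid}"),
    (re.compile(r"^/health$"), "/health"),
]


def route_label(path: str) -> str:
    """Collapse a raw request path to its route template for use as a label."""
    for pattern, template in ROUTES:
        if pattern.match(path):
            return template
    return "other"  # unknown paths share one label; never fall back to the raw path


def series_count(paths, label_fn) -> int:
    """How many distinct time series a counter labeled via label_fn would create."""
    return len({label_fn(p) for p in paths})
```

Consider 10,000 requests to distinct `/users/<id>` paths plus a couple of others. Labeling by raw path yields 10,002 series, while `route_label` yields 3. The same check is cheap to run in staging before a new metric ships: count the distinct label values a realistic traffic sample would produce.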

Phase 6 · Production rollout · one week, gradual

Feature-flagged the change with the highest risk of high log volume. Ramped traffic gradually over four days. By the end of the week the service was observable. The owning team was happier than they expected to be.

Going slow was the only way to go fast. The AI was indispensable for understanding the unfamiliar code and for writing the instrumentation itself. The decisions about what to instrument, in what order, with what risk controls, were entirely human.
Example iii · Operational
An Incident, From Page to Postmortem
A Sunday afternoon, compressed into three hours.
Phase 1 · Triage · ≈15 minutes

A page: elevated error rate on the checkout service. Customers seeing failures. I pulled up dashboards: errors spiking, latency normal, traffic normal. I asked the AI: "Given this dashboard pattern — errors up, latency steady, traffic steady — what are the five most plausible causes?" The list: downstream dependency failing, recent deploy with bug, certificate expiry, rate limit reached, data anomaly.

Phase 2 · Hypothesis testing · ≈20 minutes

Recent deploys: a release had gone out two hours before the spike. Strong suspicion. I asked the AI to walk me through the diff. It identified a subtle change in error handling that converted a previously-retried error into a propagated one. Plausible cause located.

Phase 3 · Mitigation · ≈5 minutes

The documented rollback procedure (Chapter XXII). Rolled back. Error rate returned to baseline within ninety seconds. Stable.

Phase 4 · Root cause confirmation · ≈30 minutes

With production stable, dug into why the change had the observed effect. Confirmed: a third-party API had been returning intermittent 503s for weeks, previously absorbed by retry logic, now exposed by the change. The deploy had not introduced a new bug — it had stripped the protection that hid an existing one.

Phase 5 · Real fix · ≈60 minutes

Wrote a failing test that reproduced the bug under simulated upstream flakiness. Implemented the fix: retry with backoff plus a circuit breaker on the third-party dependency. Verified the test passed. Verified the original change's intent — better visibility into a specific error path — was preserved. Shipped on Monday morning, not Sunday night.
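The following minimal sketch shows the shape of that fix; it is not the code that shipped. `TransientError`, `CircuitBreaker`, `call_with_retry`, and every threshold are hypothetical, and a production service would more likely use a maintained resilience library. Retries use capped exponential backoff with full jitter. The breaker opens after a run of consecutive failures and, once the cool-down has passed, lets a trial call through. For brevity, the breaker is not thread-safe.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical: a retryable upstream failure, such as an HTTP 503."""


class CircuitOpenError(Exception):
    """Raised without calling upstream while the breaker is open."""


class CircuitBreaker:
    """Minimal breaker: open after N consecutive failures, try again after a cool-down."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.cooldown_s:
            raise CircuitOpenError("upstream circuit is open")
        # Closed, or cool-down elapsed (half-open): let the call through.
        try:
            result = fn()
        except TransientError:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # any success closes the circuit
        return result


def call_with_retry(fn: Callable[[], T], attempts: int = 4, base_delay_s: float = 0.1,
                    max_delay_s: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry TransientError with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last failure
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, delay))  # jitter spreads out synchronized retries
    raise AssertionError("unreachable")
```

The two compose as `call_with_retry(lambda: breaker.call(fetch_quote))`, where `fetch_quote` stands in for the hypothetical third-party call. In that arrangement, an open circuit raises `CircuitOpenError`, which the retry loop deliberately does not catch. Callers fail fast instead of hammering a dependency that is already down.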

Phase 6 · Postmortem · ≈1 hour, with AI assistance

Used a standard postmortem template. The AI helped draft the timeline from logs. It also helped articulate three action items: monitor third-party error rates explicitly, add a contract test for the retry semantics, add a runbook for this specific failure mode. I reviewed and edited every sentence. Shared with the team Monday.

The AI was a force multiplier at every phase — hypothesis generation, code understanding, test writing, postmortem drafting. But every decision (rollback versus fix-forward; what to test; what action items to commit to) was mine. The on-call engineer's judgment was the limiting reagent. The AI made each step faster without changing what the steps were.
✦ ✦ ✦