A Field Guide to Modern Software Concepts

Chapter the First

Schema

A schema is a blueprint. It does not contain the data; it describes the shape the data must take.

Imagine you are building a paper form for new library patrons. Before any patron arrives, you decide which boxes the form will have — name (text), age (a number), has_card (yes or no). That blueprint is the schema. The forms patrons fill out are the data. The blueprint exists once; the filled forms exist by the thousand.

In software, a schema describes the structure, the types of each field, and the rules (which fields are required, what range a number may fall in). It is enforced by databases, by APIs, and by validation libraries — each rejecting any record that does not fit.

Figure 1.1 — The schema as blueprint

User schema

id      int     · required
name    string  · required
email   string  · email format
age     int     · 0–120
active  bool

A schema entry has three parts: a name, a type (what kind of value), and optional constraints (required, range, format).

Use cases

Database tables. SQL columns each declare a type and nullable flag — that is the table's schema.
API contracts. OpenAPI / Swagger documents are schemas describing what each endpoint expects and returns.
Form validation. Validation libraries (Zod and Yup in JavaScript/TypeScript, Pydantic in Python) check input against a declared schema before it is accepted.
Message buses. Kafka with Avro/Protobuf uses schemas so producers and consumers agree on payload shape.

In your TargetLink/AUTOSAR work, the DataType definitions and PlatformTypes are exactly schemas — they constrain what bit-width, sign, and range a signal may carry, and the build chain refuses code that violates them.

References

Date, C. J. An Introduction to Database Systems, 8th ed. — chapter on relational schemas.
OpenAPI Initiative. openapis.org — the de facto API schema specification.

Chapter the Second

JSON

A simple, plain-text language for writing down structured information that humans can read and any computer can parse.

JSON (JavaScript Object Notation) uses six building blocks: strings, numbers, booleans (true / false), null, arrays (in [ ]), and objects (in { }, holding key–value pairs). That is the entire grammar. Despite its tiny vocabulary, almost every web service in the world speaks it.

JSON has no comments, no dates, no decimals-with-units, no functions. Its great virtue is precisely this poverty: every language can read it the same way.

Figure 2.1 — A JSON document, expandable

Use cases

Configuration files. package.json, tsconfig.json, VS Code settings.
API payloads. The body of nearly every REST request and response is JSON.
Logging. Structured logs (one JSON line per event) flow into Splunk, Datadog, ELK.
LLM tool calling. Function arguments are passed as JSON.

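JSON cannot represent NaN or Infinity, and it does not allow trailing commas. If your float pipeline emits non-finite values, the outcome depends on the serializer: it may throw, silently write null (JavaScript's JSON.stringify), or emit a bare NaN token that strict parsers reject (Python's json.dumps by default). For embedded telemetry, prefer protobuf/MessagePack: JSON is neither size- nor precision-friendly.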
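The non-finite float problem can be reproduced with Python's standard json module. This is a minimal sketch; the `reading` record and the `sanitize` helper are illustrative names, not part of any library.

```python
import json
import math

reading = {"sensor": "temp_1", "value": float("nan")}

# Default behaviour: Python writes the bare token NaN, which is not valid JSON.
lenient = json.dumps(reading)

# Strict behaviour: allow_nan=False turns the problem into an error at write time.
try:
    json.dumps(reading, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

def sanitize(value):
    """Replace non-finite floats with None so the output is standard JSON."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {k: sanitize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    return value

safe = json.dumps(sanitize(reading), allow_nan=False)
print(lenient)
print(safe)
```

Whether null is an acceptable stand-in for a missing measurement is a design decision; the point is to make the choice explicit instead of leaving it to the serializer.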

References

RFC 8259 — The JavaScript Object Notation Data Interchange Format.
Crockford, D. json.org — original informal reference.

Chapter the Third

API

An Application Programming Interface is a contract: send this kind of request, receive that kind of reply. Everything else is hidden.

If a restaurant kitchen is a system, the menu is its API. You do not enter the kitchen. You read the menu (the contract), you place an order (a request) at the agreed counter (the endpoint), and a meal (the response) comes out. The chef may switch ingredients, hire new cooks, or rebuild the stove. As long as the menu is honored, you do not care.

Modern web APIs typically use HTTP with verbs (GET, POST, PUT, DELETE), a URL identifying the resource, optional headers (auth, content-type), and a JSON body. The reply is a status code (200 OK, 404 not found, 500 error) plus a body.
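The request/response contract can be sketched without any network at all. The handler below is a hypothetical in-memory stand-in for a server: the data, the `handle` function, and the error bodies are invented for illustration, and the paths mirror the ones used in Figure 3.1.

```python
import json

# Hypothetical in-memory data standing in for a real backend.
USERS = {42: {"id": 42, "name": "Ada"}}
POSTS = [{"id": 1, "title": "Hello"}, {"id": 2, "title": "World"}]

def handle(method: str, path: str) -> tuple[int, str]:
    """Return (status_code, json_body) for a request, like a tiny web API."""
    if method != "GET":
        return 405, json.dumps({"error": "method not allowed"})
    parts = path.strip("/").split("/")
    if parts == ["api", "posts"]:
        return 200, json.dumps(POSTS)
    if len(parts) == 3 and parts[:2] == ["api", "users"] and parts[2].isdigit():
        user = USERS.get(int(parts[2]))
        if user is None:
            return 404, json.dumps({"error": "user not found"})
        return 200, json.dumps(user)
    return 404, json.dumps({"error": "no such endpoint"})

for path in ("/api/users/42", "/api/users/999", "/api/posts"):
    status, body = handle("GET", path)
    print(path, status, body)
```

The caller sees only the verb, the path, the status code, and the body. How `USERS` is stored could change completely without the caller noticing, which is the point of the menu analogy.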

Figure 3.1 — A simulated API call

Request

GET /api/users/42 HTTP/1.1
Host: example.com
Accept: application/json

Other requests to try: /api/users/999 (not found) and /api/posts (a list).

Use cases

Calling a weather service. GET https://api.weather.com/v1/now?lat=… returns JSON.
Internal microservices. A checkout service calls the inventory service via API instead of sharing a database.
LLM access. Chat completions (OpenAI, Anthropic, Mercedes' GenAI Nexus) are exposed as HTTP APIs.
Hardware bridges. A vehicle's CAN-to-cloud gateway exposes an API that mobile apps consume.

"API" is also used for language APIs (the public functions of a library) and OS APIs (system calls). The contract idea is the same; only the transport differs.

References

Fielding, R. — Architectural Styles and the Design of Network-based Software Architectures, 2000 (the REST dissertation).
MDN Web Docs — HTTP overview.

Chapter the Fourth

Parser & Parsing

Parsing is the act of turning a flat stream of characters into a structured tree the computer can act on.

When you read the sentence "The cat sat on the mat," your brain unconsciously identifies the subject, the verb, and the prepositional phrase. A parser does the same for code or data: it takes raw text and recovers the structure hidden inside it.

Parsing usually happens in two passes. First the lexer (or tokenizer) chops the text into atomic tokens — numbers, identifiers, operators. Then the parser arranges those tokens according to grammar rules into an Abstract Syntax Tree (AST) — the structured form on which evaluators, compilers, and linters then operate.
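The two passes can be sketched for arithmetic expressions. This is a small recursive-descent example under simplifying assumptions (non-negative numbers, the four basic operators, parentheses); the nested-tuple AST shape is a choice made here for brevity, not a standard.

```python
import operator
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+\.?\d*)|(.))")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def tokenize(text):
    """Lexer: split the text into ('num', value) and ('op', char) tokens."""
    tokens = []
    for number, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("num", float(number)))
        elif op.strip():
            if op not in "+-*/()":
                raise SyntaxError(f"unexpected character {op!r}")
            tokens.append(("op", op))
    return tokens

def parse(tokens):
    """Parser: build a nested-tuple AST.

    expr := term (('+'|'-') term)*
    term := factor (('*'|'/') factor)*
    factor := NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():
        kind, value = advance()
        if kind == "num":
            return ("num", value)
        if (kind, value) == ("op", "("):
            node = expr()
            if advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    def term():
        node = factor()
        while peek() in (("op", "*"), ("op", "/")):
            _, op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in (("op", "+"), ("op", "-")):
            _, op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

def evaluate(node):
    """Walk the AST and compute its value."""
    if node[0] == "num":
        return node[1]
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))

print(evaluate(parse(tokenize("2 * (3 + 4)"))))
```

Operator precedence comes entirely from the grammar: `term` binds tighter than `expr`, so multiplication ends up deeper in the tree than addition.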

Figure 4.1 — Lexing and parsing an arithmetic expression

1 · Tokens (output of the lexer)

2 · Abstract Syntax Tree

3 · Evaluated result

Use cases

Compilers and interpreters. Every programming language has a parser at its front-end.
Data formats. JSON.parse(...), XML parsers, CSV parsers — all turn text into objects.
Configuration languages. YAML, TOML, INI files are parsed into config objects at boot.
HTML / DOM. The browser parses your HTML into a tree before rendering.

Parsing untrusted input is a frequent attack surface. Prefer battle-tested libraries; never write a JSON or YAML parser by hand for production.

References

Aho, Lam, Sethi, Ullman — Compilers: Principles, Techniques, and Tools ("the dragon book"), 2nd ed.
Crafting Interpreters — craftinginterpreters.com, an excellent free book by Bob Nystrom.

Chapter the Fifth

Regular Expressions

A tiny language whose only purpose is to describe text patterns: "a digit followed by two letters", "an email address", "anything between two quotes".

A regular expression (regex) is a string in which most characters mean themselves but a few have superpowers: . matches any character (by default, any except a newline), * means "zero or more of the previous", + means "one or more", \d matches digits, [a-z] matches a range, ^ and $ anchor to start and end, and parentheses capture groups for later use.

Regex is dense. A pattern that takes ten minutes to write may take an hour to read. But for jobs like extracting all phone numbers from a document, no other tool is so concise.
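A few of these building blocks in action, using Python's re module. The sample text and the seven-digit phone format are made up for the example; real phone-number patterns are considerably messier.

```python
import re

text = "Call 555-0142 or 555-0199 before 9am; ref AB12, order 7XY."

# \d{3}-\d{4}: three digits, a hyphen, four digits.
phones = re.findall(r"\d{3}-\d{4}", text)

# \b\d[A-Za-z]{2}\b: "a digit followed by two letters", as a whole word.
digit_two_letters = re.findall(r"\b\d[A-Za-z]{2}\b", text)

# Parentheses capture groups; ^ and $ anchor the match to the whole string.
match = re.match(r"^(\d{3})-(\d{4})$", "555-0142")
prefix, line = match.groups()

print(phones)             # ['555-0142', '555-0199']
print(digit_two_letters)  # ['9am', '7XY']
print(prefix, line)       # 555 0142
```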

Figure 5.1 — Live regex tester

Preset patterns: phone numbers, emails, URLs, capitalised words, hashtags / refs

Use cases

Form validation. "Is this string a valid postcode?"
Log mining. Pull out every IP address in a 50 MB log file.
Find & replace. IDE search with regex turns repetitive edits into one-liners.
Bulk renaming. Strip prefixes, normalise dates, reorder filename parts.

Two cautions. One: never parse HTML with regex; HTML is not a regular language. Two: patterns with nested quantifiers, such as (a+)+b, can cause catastrophic backtracking and hang a server. Test on adversarial input.

References

Friedl, J. — Mastering Regular Expressions, 3rd ed., O'Reilly.
regex101.com — interactive tester with explanations.

Chapter the Sixth

JSON Mode

A switch on a Large Language Model that forces its reply to be a single, syntactically valid JSON document — never prose, never markdown, never apology.

Out of the box, an LLM is a free-form storyteller. Ask it for a recipe and you may receive a friendly preamble ("Sure! Here's a great recipe…"), then the recipe in markdown, then a closing remark. Useful to a human; ruinous to a program that tries to JSON.parse() the reply.

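JSON mode changes the decoding rule of the model. In implementations based on constrained decoding, the sampler is restricted at every token step so that the cumulative output remains valid JSON. The model can no longer wander into prose, and every brace it opens must eventually be closed. Providers differ in how strictly they enforce this, and a reply cut off by the token limit can still end mid-document.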

Figure 6.1 — The same prompt, two output modes

Prompt: "Extract the order: I need 3 lattes and 2 muffins for table 7."

Notice: the OFF response cannot be parsed by a program. The ON response goes straight into your downstream code.
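The difference is easy to see from the consuming program's side. Both replies below are hand-written stand-ins for what a model might return for the latte prompt; the field names `table`, `items`, `name`, and `qty` are assumptions made for this sketch.

```python
import json

# Hypothetical replies, written by hand to mimic the two modes.
reply_off = "Sure! Here is the order:\n- 3 lattes\n- 2 muffins\nfor table 7."
reply_on = '{"table": 7, "items": [{"name": "latte", "qty": 3}, {"name": "muffin", "qty": 2}]}'

def try_parse(reply):
    """Return the parsed object, or None if the reply is not valid JSON."""
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        return None

print(try_parse(reply_off))   # None: prose cannot be deserialized
order = try_parse(reply_on)
total_items = sum(item["qty"] for item in order["items"])
print(order["table"], total_items)
```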

Use cases

Function / tool calling. The model must produce arguments your code can deserialize.
Structured extraction. Pull entities, quantities, dates from free text into rows for a database.
Pipelines and agents. One model's output is another step's input — JSON keeps the contract.

JSON mode guarantees syntactic validity, not semantic correctness. The model may still emit a string where you wanted a number, or invent a field. Pair JSON mode with a JSON Schema (next chapter) for full safety.

References

OpenAI — Structured Outputs guide.
Anthropic — Tool use documentation.

Chapter the Seventh

JSON Schema

A JSON document whose only job is to describe — and validate — other JSON documents.

If JSON is a written record, a JSON Schema is the official template the record must conform to. The schema declares the expected type of every field ("string", "integer", etc.), which fields are required, what format a string must obey (email, URI, date), what range a number must lie in, and even how items inside an array should look.

A validator reads the schema, reads the data, and reports every place where data and schema disagree. This same schema then drives form generators, code generators, OpenAPI documentation, and — most importantly — LLM structured-output enforcers.
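To make the validator's job concrete, here is a deliberately tiny one that understands only a handful of JSON Schema keywords (type, required, properties, items, minimum, maximum). It is a sketch for illustration, not a conformant implementation; production code should use an established validation library.

```python
TYPE_MAP = {"string": str, "integer": int, "number": (int, float),
            "boolean": bool, "object": dict, "array": list}

def validate(data, schema, path="$"):
    """Return a list of error strings; an empty list means the data conforms."""
    errors = []
    expected = schema.get("type")
    if expected:
        ok = isinstance(data, TYPE_MAP[expected])
        # In Python, bool is a subclass of int, so exclude it for numeric types.
        if expected in ("integer", "number") and isinstance(data, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    if isinstance(data, (int, float)) and not isinstance(data, bool):
        if "minimum" in schema and data < schema["minimum"]:
            errors.append(f"{path}: below minimum {schema['minimum']}")
        if "maximum" in schema and data > schema["maximum"]:
            errors.append(f"{path}: above maximum {schema['maximum']}")
    if isinstance(data, dict):
        for key in schema.get("required", []):
            if key not in data:
                errors.append(f"{path}: missing required field '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in data:
                errors.extend(validate(data[key], sub, f"{path}.{key}"))
    if isinstance(data, list) and "items" in schema:
        for i, item in enumerate(data):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors

user_schema = {
    "type": "object",
    "required": ["id", "name"],
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0, "maximum": 120},
    },
}

good = {"id": 1, "name": "Ada", "age": 36}
bad = {"name": 42, "age": 150}
print(validate(good, user_schema))
for problem in validate(bad, user_schema):
    print(problem)
```

Collecting every disagreement, and not stopping at the first one, matches the "reports every place" behaviour described above and makes error messages far more useful to the caller.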

Figure 7.1 — Validate JSON against a schema

Panels: Schema, Data

Use cases

API gateways. Reject malformed requests at the edge before they touch business logic.
LLM structured output. Constrain a model's tokens so its JSON also satisfies a schema.
Code generation. A schema becomes a TypeScript type, a Pydantic class, or a C struct.
UI auto-forms. Tools render a form straight from the schema, no boilerplate.

JSON Schema validates structure — not the meaning. A schema can guarantee age is an integer between 0 and 120. It cannot guarantee that the integer is the actual age of the person.

References

json-schema.org — the official specification and learning materials.
Pydantic docs — docs.pydantic.dev — schema-driven validation in Python.

Chapter the Eighth

ReAct

A pattern for LLM agents that interleaves reasoning and acting: think a step, take an action, observe the result, think again.

An LLM by itself only knows what is in its weights. Ask it for today's weather and it will guess. ReAct (Yao et al., 2022) lets the model break out of its head: at each turn it may produce a Thought (private reasoning), an Action (a call to a tool — search, calculator, database, API), and then read an Observation from the tool. It loops until it can give a final Answer.

This single cycle — Thought → Action → Observation → Thought → … → Answer — is the engine behind most modern AI agents, including the one that wrote this page.
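The loop itself is short. In the sketch below a scripted function stands in for the LLM, and the two tools are toy stand-ins; the population figure is a placeholder for illustration, not an authoritative statistic. All names (`scripted_model`, `react`, `TOOLS`) are hypothetical.

```python
def lookup_population(country):
    # Placeholder figure for illustration only, not an authoritative statistic.
    return {"Japan": 125_000_000}[country]

def multiply(a, b):
    return a * b

TOOLS = {"lookup_population": lookup_population, "multiply": multiply}

def scripted_model(history):
    """Stand-in for an LLM: picks the next step from the observations so far."""
    observations = [step["observation"] for step in history]
    if len(observations) == 0:
        return {"thought": "I need Japan's population.",
                "action": ("lookup_population", ("Japan",))}
    if len(observations) == 1:
        return {"thought": "Now double it.",
                "action": ("multiply", (observations[0], 2))}
    return {"thought": "I have the result.", "answer": observations[-1]}

def react(model, max_steps=5):
    """Run Thought -> Action -> Observation until an answer or the step cap."""
    history = []
    for _ in range(max_steps):
        step = model(history)
        if "answer" in step:
            return step["answer"], history
        name, args = step["action"]
        step["observation"] = TOOLS[name](*args)
        history.append(step)  # keep a log of every step
    raise RuntimeError("step cap reached without an answer")

answer, trace = react(scripted_model)
for step in trace:
    print(step["thought"], "->", step["action"][0], "->", step["observation"])
print("Answer:", answer)
```

The `max_steps` cap and the `history` log are what keep a confused agent from looping forever or failing invisibly.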

Figure 8.1 — A ReAct trace, played step by step

User: "What is the population of Japan multiplied by 2?"

Use cases

Research agents. Search → read → search → summarise.
Code assistants. Plan → run terminal → read output → fix → repeat.
Customer support bots. Look up account, check policy, draft reply.
Diagnostic copilots. In Vehicle Health Management, an LLM can reason over DTCs, query a fault database, then propose a root cause — a natural ReAct loop.

ReAct is often more reliable than pure chain-of-thought on tasks that depend on external facts, because each Action grounds the reasoning in a real-world observation. But it can also fail loudly: a confused agent will repeat the same wrong action many times. Always cap the loop and log every step.

References

Yao, S. et al. — ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023. arXiv:2210.03629
LangChain ReAct docs — python.langchain.com

Chapter the Ninth

Tailwind

A CSS framework that gives you thousands of tiny, single-purpose classes — and asks you to compose them, in your HTML, into any design you like.

Many CSS frameworks and component libraries (Bootstrap, Material UI) ship pre-made components (a "Card", a "Button"), each with its own opinions. Tailwind CSS does the opposite. It ships utility classes: p-4 sets padding to 1rem, text-xl sets a 1.25rem font size, rounded-lg sets a large border radius, bg-blue-500 sets a medium-blue background. You build the component yourself by stringing utilities together, in the markup, exactly where the design lives.

The promise: no more renaming CSS classes, no more "button-primary-large-disabled-rounded" spaghetti. The cost: HTML can grow visually noisy. Most teams accept the trade-off.

Figure 9.1 — Compose utilities to style a card

background: bg-white, bg-amber-100, bg-slate-800
text colour: text-slate-900, text-rose-600, text-amber-50
padding: p-2, p-6, p-12
radius: none, md, 2xl, full
shadow: none, md, 2xl
font size: sm, lg, 2xl

Use cases

Rapid prototyping. Designs ship in hours, not days; redesign without renaming.
Design systems. Custom themes via the Tailwind config file (one file, all tokens).
shadcn/ui & co. Several modern component libraries style their components with Tailwind utilities.

For complex, repeated patterns, extract to a component (in React/Vue/Svelte) — do not paste the same 30-class string fifty times. Tailwind's strength is in utility composition, not utility duplication.

References

tailwindcss.com — official documentation.
Wathan, A. & Schoger, S. — Refactoring UI, an excellent companion book co-written by Tailwind's creator.

Chapter the Tenth

BM25 Keyword Search

A scoring formula from the 1990s that, despite its age, remains a strong baseline and is often hard to beat when the user types a few keywords and expects exact matches.

The intuition is simple. A document deserves a high score for a query if (1) the query words appear in it, (2) the words are rare in the corpus (so they are informative — "the" is useless, "axial-flux" is golden), and (3) the document is not too long (a long document mentions everything; a short, focused one mentioning your terms is more likely to be on-topic).

BM25 — Best Match 25 — combines these three signals with two tuning knobs (k₁ for term saturation and b for length normalization). It is the default ranker in Elasticsearch, Lucene, and OpenSearch, and the keyword half of nearly every modern hybrid retrieval pipeline.
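The three signals and the two knobs fit in a few lines. This sketch uses whitespace tokenization and the non-negative IDF variant popularised by Lucene; the defaults k1 = 1.2 and b = 0.75 are common choices, and the three sample sentences are invented.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document for the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Signal 2: rare terms get a high IDF.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Signal 3: b scales the penalty for documents longer than average.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            # Signal 1: term frequency, saturating as controlled by k1.
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "the axial flux motor is compact",
    "the vehicle health report lists the diagnostic codes",
    "the motor and the vehicle share the same bus",
]
scores = bm25_scores("axial flux", docs)
print(scores)
print(bm25_scores("the", docs))
```

Querying for "the" yields low scores everywhere because the term appears in every document, which is exactly the rarity signal at work.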

Figure 10.1 — A miniature BM25 index over five sentences

Sample query terms: axial, flux, machine, motor, vehicle, health, diagnostic, the (stop word)

Use cases

Site search. Documentation portals, support knowledge bases, e-commerce.
Hybrid RAG. BM25 retrieves keyword-precise hits; vector search retrieves semantic hits; results are fused (RRF).
Code search. Identifiers and rare symbols favour exact-keyword retrieval.

BM25 cannot tell that "car" and "automobile" are related — that is the job of embeddings. For systems where users phrase questions in their own words, combine BM25 with a vector retriever rather than choosing between them.

References

Robertson, S. & Zaragoza, H. — The Probabilistic Relevance Framework: BM25 and Beyond, 2009.
Elastic docs — BM25 similarity.

Chapter the Eleventh

Tokens & Context Window

A language model does not see words. It sees tokens — small chunks of letters — and it can only fit a limited number of them in its working memory.

Before any text reaches a transformer, a tokenizer breaks it into pieces. Common words become a single token ("the"); rare or invented words split into several ("unhappiness" → "un" + "happi" + "ness"). Punctuation, spaces, and emojis each cost something. A useful rule of thumb in English: about 4 characters per token, or roughly 0.75 words per token (about 1.3 tokens per word).

The context window is the maximum number of tokens a model can attend to at once: prompt plus history plus generated reply. Older models held 4k; modern frontier models hold 200k–1M. Exceed it and something has to give: the API rejects the request, or the application drops the oldest tokens, which fall off the back of a moving train.
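A rough budget check can be built on the 4-characters-per-token heuristic. This is an estimate for English text only; real counts come from the model's own tokenizer. The function names and the drop-oldest-first policy are assumptions made for this sketch.

```python
def estimate_tokens(text: str) -> int:
    """Rough English-only estimate: about 4 characters per token."""
    return max(1, round(len(text) / 4)) if text else 0

def fits_in_window(prompt: str, history: list[str], reply_budget: int,
                   window: int = 4096) -> bool:
    """Check that prompt + history + reserved reply tokens stay within the window."""
    used = estimate_tokens(prompt) + sum(estimate_tokens(m) for m in history)
    return used + reply_budget <= window

def trim_history(prompt, history, reply_budget, window=4096):
    """Drop the oldest messages until everything fits (one possible policy)."""
    kept = list(history)
    while kept and not fits_in_window(prompt, kept, reply_budget, window):
        kept.pop(0)
    return kept

history = ["x" * 4000, "y" * 4000, "z" * 400]   # roughly 1000, 1000, 100 tokens
kept = trim_history("q" * 40, history, reply_budget=300, window=1500)
print(len(kept), "of", len(history), "messages kept")
```

Reserving `reply_budget` up front matters because the generated reply counts against the same window as the prompt.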

Figure 11.1 — A simplified tokenizer

(this is a heuristic split; real BPE tokenizers are learnt)

Use cases

Cost & speed. Most APIs price per token. A 100-page PDF in your prompt is not free.
Truncation strategy. When a chat exceeds the window, decide what to drop, summarise, or move to retrieval.
Long-context tradeoffs. Even with a 200k window, performance degrades on the middle of the prompt ("lost in the middle").

Token counts are language-dependent. The same paragraph in Japanese or German often costs 1.5–2× more tokens than in English. For your GenAI Nexus integrations, instrument the token counter early; cost surprises always come from there.

References

Sennrich, R. et al. — Neural Machine Translation of Rare Words with Subword Units, ACL 2016 (BPE).
OpenAI — interactive tokenizer.

Chapter the Twelfth

Embeddings

A way to turn any piece of text into a list of numbers — a vector — such that texts with related meaning land near each other in space.

Imagine assigning every English word a coordinate in a high-dimensional map. Words that play similar roles in similar contexts (king, queen, monarch) get neighbouring coordinates; unrelated words (queen, asphalt) land far apart. That assignment is an embedding. Modern embeddings typically have hundreds to a few thousand dimensions (768 and 1536 are common sizes), but the idea is the same: distance encodes meaning.

The clever bit is that the same trick works for sentences, paragraphs, even images. Once you have vectors, you can ask the question every search engine secretly wants to ask: find me the things most similar to this.

Figure 12.1 — A 2-D embedding map

Use cases

Semantic search. Find documents whose meaning matches a query, even when no keyword overlaps.
Clustering. Group customer feedback, support tickets, or research papers automatically.
Recommendation. "Other things similar to what you liked."
Deduplication. Catch near-duplicate posts that lexical hashing would miss.

Embeddings inherit their model's biases. If the training corpus encodes a stereotype, the vector space encodes it too. Audit before deploying in hiring, lending, or moderation.

References

Mikolov, T. et al. — word2vec, NeurIPS 2013.
Reimers, N. & Gurevych, I. — Sentence-BERT, EMNLP 2019.

Chapter the Thirteenth

Vector Search & Cosine Similarity

Once everything is a vector, "search" becomes "find the vectors closest to mine" — usually measured by the cosine of the angle between them.

Cosine similarity ignores how long the vectors are and asks only: do they point in the same direction? Two vectors pointing the same way score 1; perpendicular ones score 0; opposite ones score −1. For text embeddings (which are usually L2-normalised), cosine and dot-product give the same ranking.

A naive search compares the query to every vector — fine for thousands, painful for millions. Production systems use approximate nearest-neighbour indexes (HNSW, IVF, ScaNN) that trade a sliver of recall for orders-of-magnitude speed.
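The naive version is worth seeing once, because every approximate index is an optimisation of it. The 3-dimensional vectors below are invented for illustration; real embeddings come from a model and are far larger.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors (1 = same direction, -1 = opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Brute-force search: score every stored vector, return the k best ids."""
    ranked = sorted(index, key=lambda doc_id: cosine(query, index[doc_id]), reverse=True)
    return ranked[:k]

# Toy vectors, made up for this example.
index = {
    "self-driving cars": [0.9, 0.1, 0.0],
    "motor controller":  [0.7, 0.3, 0.1],
    "healthy recipe":    [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]
print(top_k(query, index))
```

The cost is one similarity computation per stored vector per query, which is why this approach stops scaling at millions of vectors.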

Figure 13.1 — Cosine-ranked retrieval over a tiny corpus

Sample queries: cars drive themselves, train a neural network, healthy recipe, motor controller

Use cases

RAG retrievers. Pull the top-k chunks before answering.
Image / audio search. Same machinery, different modality.
Anomaly detection. Anything far from every cluster centroid is suspicious.

Embedding similarity cannot reliably tell synonyms from opposites: "love" and "hate" appear in similar emotional contexts, so their vectors often land close together. Combine with metadata filters and (often) a reranker.

References

Malkov, Y. & Yashunin, D. — Efficient and robust approximate nearest neighbor search using HNSW, 2018.
Pinecone, Weaviate, Qdrant, FAISS — open and managed vector DB documentation.

Chapter the Fourteenth

Chunking

Documents are too long to embed whole and too long to feed to a model whole. Chunking is the unglamorous craft of slicing them into the right-sized pieces.

You cannot embed a 200-page PDF as one vector — you would lose all locality. So you cut. Naive cuts at fixed character counts can split a sentence in half and bury the answer across two chunks. Better cuts respect structure: paragraph boundaries, sentence boundaries, headings. Better still, an overlap of a few sentences between adjacent chunks ensures context near the seam is never lost.

Three knobs matter: chunk size (typically 200–800 tokens), overlap (10–20%), and splitter strategy (recursive by paragraph → sentence → word).
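The first two knobs can be sketched with a fixed-size splitter. It counts whitespace-separated words as a stand-in for tokens and ignores sentence boundaries, so it illustrates size and overlap only, not the recursive strategy; `chunk_words` is a name chosen for this example.

```python
def chunk_words(text, size=200, overlap=30):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, size=4, overlap=1):
    print(chunk)
```

With size 4 and overlap 1, each chunk starts on the last word of the previous one, so a fact that straddles a seam appears intact in at least one chunk as long as it is shorter than the overlap.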

Figure 14.1 — Slide the size and overlap to see how chunks change

Initial settings: chunk size 120, overlap 20

Use cases

RAG ingestion pipelines. Every PDF, Confluence page, ticket gets chunked before embedding.
Code search. Chunk by function or class, not by line count.
Summarisation. Map-reduce style: summarise each chunk, then summarise the summaries.

There is no universally best chunk size. Test on your queries, your model, and your evaluation set. A common antipattern is over-chunking: 50-token slivers that lose all context.

References

LangChain text splitters — documentation.
Pinecone — chunking strategies.

Chapter the Fifteenth

Retrieval-Augmented Generation

Instead of asking a model to answer from memory, fetch the relevant facts first and paste them into the prompt. The model becomes an open-book student.

An LLM trained last year does not know your codebase, your customers, or yesterday's meeting notes. RAG bridges the gap. At query time the system retrieves the most relevant chunks from a vector index (and often a keyword index), augments the prompt with them, and asks the model to answer using that context. The model's job changes from recall to read-and-respond.

Done well, RAG cuts hallucination, gives citations, and lets you update knowledge without retraining. Done poorly, it serves wrong chunks confidently.
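The retrieve, augment, generate sequence can be sketched end to end. The retriever here is a toy word-overlap ranker standing in for a vector or keyword index, `generate` is a placeholder for a real model call, and the three chunks are invented.

```python
def retrieve(query, chunks, k=2):
    """Toy retriever: rank chunks by how many query words they share."""
    q = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return ranked[:k]

def augment(query, context):
    """Build a prompt that asks the model to answer from the numbered context only."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return ("Answer using only the context below and cite sources by number.\n\n"
            f"Context:\n{numbered}\n\nQuestion: {query}")

def generate(prompt):
    """Placeholder for a real LLM call; here it only reports the prompt size."""
    return f"(model reply to a {len(prompt)}-character prompt)"

chunks = [
    "The torque limit is set in the controller configuration.",
    "Lunch menus rotate weekly.",
    "Torque limit violations raise a fault code.",
]
query = "how is the torque limit configured"
context = retrieve(query, chunks)
prompt = augment(query, context)
print(prompt)
print(generate(prompt))
```

Numbering the context chunks in the prompt is what makes citations possible: the model can refer to [1] or [2], and the application can map those back to source documents.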

Figure 15.1 — A RAG pipeline, animated

1 · query → 2 · retrieve → 3 · augment → 4 · generate

Use cases

Internal knowledge assistants. Your WarpDrive RAG assistant is exactly this pattern.
Customer support. Answer from the product manual, not from training data.
Code-aware copilots. Retrieve the actual repository before suggesting a change.
Diagnostic copilots. Pull recent fault logs into the context before reasoning about root cause.

RAG is only as good as its retriever. Spend at least as much time on retrieval evaluation (precision@k, recall@k) as on prompt engineering — the prompt cannot fix bad chunks.

References

Lewis, P. et al. — Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, NeurIPS 2020. arXiv:2005.11401
Anthropic — Contextual Retrieval blog post.

Chapter the Sixteenth

Reranking

A two-stage strategy: cast a wide net cheaply, then re-sort the catch carefully. Most production search uses both.

Vector search is fast but coarse. It returns 100 plausible chunks in milliseconds. A reranker — typically a smaller cross-encoder model that reads query and candidate together — then assigns each one a precise relevance score. You discard the bottom 95 and keep the top 5.

Why two stages? A cross-encoder is too slow to run over a million documents, but a vector index is too crude to put the truly best result on top. The combination delivers both recall and precision.
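The shape of the two-stage pipeline, with both scorers replaced by cheap stand-ins: `cheap_score` plays the role of vector search, and `careful_score` plays the role of a cross-encoder by rewarding query words that appear adjacent and in order. Neither is a real model; the names and documents are hypothetical.

```python
def cheap_score(query, doc):
    """Stage 1 stand-in for vector search: fraction of query words present."""
    q = query.lower().split()
    words = set(doc.lower().split())
    return sum(w in words for w in q) / len(q)

def careful_score(query, doc):
    """Stage 2 stand-in for a cross-encoder: counts query word pairs that appear adjacent and in order."""
    q = query.lower().split()
    d = doc.lower().split()
    pairs = set(zip(d, d[1:]))
    return sum(pair in pairs for pair in zip(q, q[1:]))

def search(query, docs, candidates=3, keep=1):
    # Stage 1: cheap scoring over the whole corpus, keep a wide candidate set.
    shortlist = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)[:candidates]
    # Stage 2: expensive scoring only on the shortlist.
    return sorted(shortlist, key=lambda d: careful_score(query, d), reverse=True)[:keep]

docs = [
    "limit the torque request and enforce it",
    "enforce torque limit in the controller",
    "paint the housing blue",
]
print(search("enforce torque limit", docs))
```

Stage 1 cannot separate the first two documents because both contain all three query words; only the scorer that looks at the words in order puts the better match on top.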

Figure 16.1 — Click "Rerank" to watch the order shuffle

Query: "how to enforce torque limit safely"

Use cases

RAG quality boost. Often the single largest improvement to retrieval is adding a reranker.
Web search. Large web search engines are commonly described as cascades of progressively pricier rankers.
Recommendation feeds. Candidate generation → ranking → re-ranking with business rules.

Cross-encoder rerankers (Cohere Rerank, BGE-rerank) read query + chunk together and are usually more accurate than embedding similarity, but they cost roughly one model forward pass per candidate. Cap the candidate set.

References

Nogueira, R. & Cho, K. — Passage Re-ranking with BERT, 2019.
Cohere — Rerank documentation.

Chapter the Seventeenth

Hybrid Search & Reciprocal Rank Fusion

BM25 is great at exact matches. Vector search is great at meaning. Combine them with a single elegant formula and you outperform either alone.

Hybrid search runs both retrievers in parallel, then merges the lists. The simplest, most robust merger is Reciprocal Rank Fusion (RRF):

score(d) = Σ_r 1 / (k + rank_r(d))

For each retriever r, you take 1 over (a small constant k, typically 60, plus the rank of the document in that retriever's list). Sum across retrievers. Documents ranked highly by either method bubble up; documents ignored by both stay low. No score normalisation needed.
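The formula translates almost directly into code. Ranks start at 1, and a document missing from a list simply contributes nothing for that retriever. The document ids and the two rankings are made up for the example.

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Fuse ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["d3", "d1", "d7"]
vector_ranking = ["d1", "d5", "d3"]
fused = rrf([bm25_ranking, vector_ranking])
print(fused)  # ['d1', 'd3', 'd5', 'd7']
```

Here d1 (ranks 2 and 1) edges out d3 (ranks 1 and 3), and both beat the documents that only one retriever found.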

Figure 17.1 — Two rankers fused into one

Query: "axial flux NVH harmonic injection" · k = 60

Panels: BM25, Vector, Fused (RRF)

Use cases

Enterprise search. Mix of jargon (BM25 wins) and natural-language questions (vectors win).
Code search. Identifier names need exact match; intent needs semantics.
RAG over technical documentation. Hybrid + reranker is the modern default.

RRF is rank-based, so absolute scores from heterogeneous retrievers do not need to be on the same scale, which is a key practical advantage. Other fusion methods (linear combination, learning-to-rank) require careful score calibration.

References

Cormack, G. et al. — Reciprocal rank fusion outperforms Condorcet and individual rank learning methods, SIGIR 2009.
Elasticsearch — RRF reference.

Chapter the Eighteenth

Function (Tool) Calling

Hand the model a list of functions it may call, with their JSON schemas, and let it decide which one — and with what arguments — best answers the user.

Function calling (also called tool use) is the structural foundation of every modern AI agent. You describe each tool with a name, a description, and a JSON schema for its parameters. The model, when it judges a tool call necessary, emits a JSON blob naming the tool and supplying its arguments. Your code receives the blob, runs the function, returns the result, and the model continues.
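The receiving side of that exchange might look like the sketch below. The blob shape (`name` plus `arguments`) is an assumption for illustration, since each vendor defines its own envelope; `get_weather` is a stub and the argument check covers only required fields, unknown fields, and string types.

```python
import json

def get_weather(city):
    """Hypothetical tool; a real one would call a weather service."""
    return {"city": city, "forecast": "unknown (stub)"}

TOOLS = {
    "get_weather": {
        "function": get_weather,
        "description": "Look up the current weather for a city.",
        "parameters": {"type": "object", "required": ["city"],
                       "properties": {"city": {"type": "string"}}},
    },
}

def run_tool_call(blob: str):
    """Parse a model-emitted tool call, check its arguments, then execute it."""
    call = json.loads(blob)
    tool = TOOLS.get(call.get("name"))
    if tool is None:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    args = call.get("arguments", {})
    params = tool["parameters"]
    missing = [p for p in params["required"] if p not in args]
    unknown = [a for a in args if a not in params["properties"]]
    if missing or unknown:
        raise ValueError(f"bad arguments: missing={missing}, unknown={unknown}")
    for name, value in args.items():
        if params["properties"][name]["type"] == "string" and not isinstance(value, str):
            raise ValueError(f"argument {name!r} must be a string")
    return tool["function"](**args)

# A hand-written blob standing in for what a model might emit.
model_output = '{"name": "get_weather", "arguments": {"city": "Stuttgart"}}'
result = run_tool_call(model_output)
print(result)
```

Looking the tool up in a fixed registry, instead of evaluating whatever name the model produced, keeps the model confined to the functions you chose to expose.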

This is more disciplined than JSON mode: JSON mode controls only the format of the reply, while function calling also determines which function is named and which arguments it receives.

Figure 18.1 — Watch the model pick a tool and fill its arguments

Available tools

get_weather(city)
search_orders(user, date_range)
convert_currency(amount, from, to)
send_email(to, subject, body)

Sample requests: currency conversion, order search, email, no tool needed

Use cases

Agents. Every multi-step agent is a loop of tool calls.
Voice assistants. Translate "set a 10-minute timer" into setTimer({minutes:10}).
Database access. Constrain an LLM's interaction with your DB to a small, audited set of read tools.

Always validate the model's tool arguments against your real schema before executing — even "structured" output can hallucinate. Treat tool calls as untrusted input from a junior intern.

References

Anthropic — Tool use guide.
OpenAI — Function calling guide.

Chapter the Nineteenth

Chain of Thought

Ask a model to "think step by step" before answering, and on hard problems its accuracy jumps — sometimes dramatically.

Chain-of-thought (CoT) prompting nudges a model to produce its intermediate reasoning out loud — list assumptions, do arithmetic, work the problem in stages — before stating the final answer. Wei et al. (2022) showed this single change can lift performance on math and reasoning benchmarks by tens of percentage points, especially in larger models.

CoT differs from ReAct: CoT is reasoning only (no tool calls), while ReAct interleaves reasoning with actions. Modern reasoning models (o1, o3, Claude with extended thinking) are trained to reason step by step before answering, so you no longer have to ask.

Figure 19.1 — The same word problem, with and without CoT

Question: "A motor draws 12 A for 8 hours and 3 A for 16 hours each day. What is its average daily current?"
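Worked step by step, the question is a time-weighted average. The short script below spells out the intermediate values that a chain of thought would state, next to the tempting shortcut of averaging the two currents directly.

```python
# (current in amperes, duration in hours) for each part of the day
segments = [(12, 8), (3, 16)]

# Step 1: charge drawn in each segment, in ampere-hours.
amp_hours = sum(current * hours for current, hours in segments)   # 96 + 48 = 144

# Step 2: total time covered.
total_hours = sum(hours for _, hours in segments)                 # 8 + 16 = 24

# Step 3: time-weighted average current.
average = amp_hours / total_hours                                 # 144 / 24 = 6.0

# The shortcut ignores how long each current flows and gets a different number.
naive = sum(current for current, _ in segments) / len(segments)   # (12 + 3) / 2 = 7.5

print(average, naive)
```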

Use cases

Math & arithmetic. The original killer app — multi-step word problems.
Code reasoning. "Trace through this function with input X" works better than "what does it output."
Decision rationales. Make the model's choice auditable; reasoning becomes an artifact.

CoT can also hurt on simple lookup tasks (a "let me think" preamble for "what is the capital of France?" wastes tokens). Modern systems route CoT only when the question warrants it.

References

Wei, J. et al. — Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS 2022. arXiv:2201.11903
Kojima, T. et al. — Large Language Models are Zero-Shot Reasoners ("Let's think step by step"), NeurIPS 2022.

Chapter the Twentieth

Model Context Protocol

An open standard — championed by Anthropic — that lets any AI client talk to any tool or data source through a single, uniform protocol. USB-C for LLMs.

Without a standard, every AI app re-implements its own tool wiring: a custom GitHub plugin, a custom Slack plugin, a custom database plugin. MCP (Model Context Protocol) defines a small client–server protocol where AI applications (the host) speak to MCP servers that each expose tools, resources, and prompts. Build the server once; every MCP-aware host can use it.

The protocol carries three primitives. Tools are functions the model can invoke. Resources are read-only data the host may inject into context. Prompts are reusable templates servers offer to clients.

Figure 20.1 — Host, servers, and the world they expose

The host knows nothing about GitHub, files, or Postgres directly. It only knows MCP. Each server translates between MCP and its native protocol.

Use cases

IDE integrations. Cursor, Claude Code, Zed all consume MCP servers.
Internal tooling. Expose your knowledge base or telemetry as an MCP server; every assistant can use it.
Reusable agents. An agent built against MCP works with any compliant host.

Locally installed MCP servers typically run with the same permissions as the user who launched the host. Treat installing one like installing a browser extension: review the source, prefer official servers, and constrain the tools each server exposes.

References

Anthropic — modelcontextprotocol.io — the spec, SDKs, and server registry.
Anthropic — Introducing the Model Context Protocol, Nov 2024.

Chapter the Twenty-First

HTTP Verbs & Status Codes

A handful of verbs describe what you want to do, and a three-digit status code describes what happened. Most of the web is built on these.

The verb signals intent: GET reads, POST creates, PUT replaces, PATCH partially updates, DELETE removes. The server reads the verb and the URL, does its work, and returns a status code grouped by family: 2xx success, 3xx redirection, 4xx client error (your fault), 5xx server error (their fault).

Verbs and codes carry a contract beyond their literal action. GET is meant to be safe and cacheable; repeating it must not change state. PUT and DELETE are meant to be idempotent (chapter 27).
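These conventions are what let generic client code make decisions without understanding the payload. The sketch below uses Python's standard http.HTTPStatus for reason phrases; the `family` and `may_retry` helpers and the retry policy itself are assumptions made for illustration.

```python
from http import HTTPStatus

# Safe methods do not change server state; idempotent ones can be repeated safely.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}

FAMILIES = {1: "informational", 2: "success", 3: "redirection",
            4: "client error", 5: "server error"}

def family(code: int) -> str:
    """Map a status code to its family using the first digit."""
    return FAMILIES[code // 100]

def may_retry(method: str, code: int) -> bool:
    """Retry policy sketch: only repeat idempotent requests, and only on server errors."""
    return method.upper() in IDEMPOTENT_METHODS and family(code) == "server error"

for code in (200, 301, 404, 503):
    print(code, HTTPStatus(code).phrase, "->", family(code))

print(may_retry("PUT", 503), may_retry("POST", 503), may_retry("GET", 404))
```

A POST is not retried automatically because repeating it might create a second resource, and a 4xx is not retried because sending the same faulty request again will fail the same way.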

Figure 21.1 — Click any cell for an explanation

Use cases

REST API design. Map resources to URLs and operations to verbs.
Debugging. A 401 vs 403 vs 404 reveals different bugs.
Monitoring. Alert on 5xx rate, not on raw error count.

Beware the lazy 200 OK with {"error": "..."} in the body. Status codes exist precisely so HTTP infrastructure (proxies, retries, browser dev tools) can reason about success without parsing JSON.

References

RFC 9110 — HTTP Semantics.
MDN — Status code reference.

Chapter the Twenty-Second

OAuth & JWT

Two patterns for proving identity to an API without re-typing a password on every request. OAuth is the dance; JWT is the badge it issues.

OAuth 2.0 is the authorization protocol underneath every "Sign in with Google" button (the sign-in part itself is OpenID Connect, a thin identity layer on top). The user authenticates once with the identity provider; the application receives a short-lived access token instead of the user's password. Subsequent API calls present the token in an Authorization: Bearer header.

That token is often a JWT (JSON Web Token): three Base64url-encoded parts separated by dots. A header names the signing algorithm, a payload carries claims about the user, and a signature lets the server verify the token without a database lookup. Anyone can read the payload; only a holder of the signing key could have produced the signature.
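The three dot-separated parts can be built and checked with nothing but Python's standard library. This is a sketch for illustration, assuming the HS256 algorithm (HMAC-SHA256 with a shared secret); the function names and the demo secret are hypothetical, and production code would normally rely on a vetted JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, the form JWT segments use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify_jwt(token: str, secret: bytes, now=None) -> dict:
    """Return the claims if signature and expiry check out; raise ValueError otherwise."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":  # pin the algorithm; never accept "none"
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("exp", 0) < (time.time() if now is None else now):
        raise ValueError("token expired or missing exp")
    return claims


token = sign_jwt({"sub": "user-42", "exp": int(time.time()) + 300}, b"demo-secret")
print(verify_jwt(token, b"demo-secret")["sub"])
```

The verifier checks the signature before it trusts anything in the payload, and it treats a token without an expiry as invalid.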

Figure 22.1 — Decode a JWT (interactive; panels: header, payload)

The signature ensures nobody tampered with the payload. The contents are not encrypted — never put secrets in a JWT.

Use cases

API authentication. Stateless services validate JWTs without round-trips to a session store.
Single sign-on. One login, many apps, via OpenID Connect on top of OAuth.
Service-to-service. Machine identities use OAuth client-credentials with short tokens.

Two bugs recur in JWT incidents: failing to verify the signature properly (for example, trusting the token's own alg header and accepting "alg": "none") and giving tokens long lifetimes. Verify always; expire fast; refresh.

References

RFC 6749 — OAuth 2.0; RFC 7519 — JSON Web Token.
jwt.io — full-featured online debugger.

Chapter the Twenty-Third

WebSocket vs Polling

Two ways to keep a client up to date: ask the server repeatedly ("polling"), or open one persistent line and let the server speak whenever it has news.

Polling is HTTP business as usual: every few seconds the client sends "any updates?" and the server answers yes or no. Simple, firewall-friendly, but wasteful when nothing changes — and laggy by definition (you only learn at the next poll).

WebSockets start as an ordinary HTTP request that upgrades to a single long-lived TCP connection. Either party may then push a message at any time, with no per-message HTTP overhead. The cost: stateful connections, harder load-balancing, harder horizontal scaling.

Figure 23.1 — Compare network chatter side by side (animation)

Polling: regular requests · WebSocket: events on demand

Use cases

Chat & collaboration. Slack and Figma push live updates over WebSockets; collaborative editors such as Google Docs rely on a comparable persistent push channel.
Live dashboards. Trading apps, observability tools.
Polling still wins for simple status checks and long-tail clients behind hostile firewalls.

A middle ground is Server-Sent Events (SSE): one-way push from server to client over plain HTTP. Simpler than WebSockets and increasingly used to stream LLM tokens.

References

RFC 6455 — The WebSocket Protocol.
MDN — Server-Sent Events.

Chapter the Twenty-Fourth

Webhooks

Instead of polling someone else's API every minute, give them a URL of yours; they'll POST to it whenever something interesting happens. Reverse APIs.

A webhook is a normal HTTP endpoint you host. You register its URL with a third-party service (Stripe, GitHub, Slack), and from then on the service pushes events to you the moment they occur. Your endpoint receives a JSON payload, returns 200 OK, and gets back to its life.

Webhooks invert the usual control flow: you become the server-of-events. They are the lightest possible event bus across the public internet.

Figure 24.1 — Trigger an event and watch it arrive (interactive)

Stripe (payment provider) → POST /hook with a JSON body → Your server (your-app.com/webhooks/stripe)

Use cases

Payments. Stripe / PayPal notify you when a charge succeeds, fails, or is disputed.
VCS. GitHub posts to your CI when a branch is pushed.
Messaging. Slack posts to your bot endpoint when a user mentions it.

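Always verify webhook signatures (typically an HMAC over the raw request body) before trusting the payload; otherwise anyone who learns the URL can forge events. Where the provider also signs a timestamp, reject stale ones to block replays. And design your handler to be idempotent, because providers retry when they do not receive a timely 2xx response.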
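A receiver that follows this advice might look like the sketch below. The signing scheme (a hex HMAC-SHA256 of the raw body) and the event fields `id` and `type` are assumptions made for illustration; each real provider documents its own header format, and many also sign a timestamp.

```python
import hashlib
import hmac
import json

processed_event_ids = set()   # stand-in for a durable store (database, Redis)
handled = []                  # records the work actually done, for illustration


def sign(body: bytes, secret: bytes) -> str:
    """What the (hypothetical) provider computes before sending."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def handle_webhook(body: bytes, signature: str, secret: bytes) -> int:
    """Return the HTTP status code the endpoint would answer with."""
    expected = sign(body, secret).encode("ascii")
    if not hmac.compare_digest(expected, signature.encode("utf-8")):
        return 401                      # forged or corrupted: do not even parse it
    event = json.loads(body)
    if event["id"] in processed_event_ids:
        return 200                      # duplicate delivery: acknowledge, skip the work
    processed_event_ids.add(event["id"])
    handled.append(event["type"])       # real work would happen here
    return 200
```

The signature is checked against the raw bytes before any JSON parsing, and a repeated event id is acknowledged without being processed twice.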

References

Stripe — Webhooks documentation (the canonical implementation).
Svix — Standard Webhooks draft spec.

Chapter the Twenty-Fifth

CORS

A browser security rule that, by default, forbids JavaScript on one website from calling APIs on another. The server must explicitly opt in.

The browser enforces a "same-origin policy" that protects you from a malicious page secretly making authenticated calls to your bank in your name and reading the answers. An origin is the scheme, host, and port together. JavaScript loaded from https://foo.com may freely call https://foo.com, but to read a response from https://bar.com it needs bar.com's permission. Cross-Origin Resource Sharing (CORS) is how that permission is expressed: response headers such as Access-Control-Allow-Origin.

The notorious "CORS error" you see in the console is not a bug in the browser; it is the browser doing exactly the job it is paid for. The fix belongs on the server you are calling, not in your front-end code.
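The browser's decision can be approximated in a few lines. This is a deliberately simplified sketch: it ignores preflight requests and credentials, and the function names are illustrative rather than any browser API.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """An origin is the (scheme, host, port) triple."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))


def browser_allows_read(page_url: str, api_url: str, allow_origin=None) -> bool:
    """Would script on page_url be allowed to read a response from api_url?

    allow_origin is the value of the Access-Control-Allow-Origin response
    header, or None when the server sent no such header.
    """
    if origin(page_url) == origin(api_url):
        return True                       # same origin: no permission needed
    if allow_origin is None:
        return False                      # cross-origin and the server stayed silent
    if allow_origin == "*":
        return True                       # public, non-credentialed access
    return origin(allow_origin) == origin(page_url)


print(browser_allows_read("https://app.foo.com/", "https://api.foo.com/v1/items"))
print(browser_allows_read("https://app.foo.com/", "https://api.foo.com/v1/items",
                          allow_origin="https://app.foo.com"))
```

The first call prints False and the second True: the same request succeeds only once the API names the page's origin in its response header.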

Figure 25.1 — Will the browser allow this request? (interactive; inputs: page origin, API host, origins the server allows)

Use cases

Frontend / backend split. Your SPA at app.foo.com calls your API at api.foo.com — explicit allow-list needed.
Public APIs. Set Access-Control-Allow-Origin: * for read-only, non-credentialed endpoints.

CORS does not protect your server from anyone. It protects users' browsers from malicious cross-site requests. Server-side requests (curl, Postman, your own backend) ignore it entirely, so CORS is no substitute for authentication.

References

MDN — Cross-Origin Resource Sharing.
WHATWG — Fetch standard.

Chapter the Twenty-Sixth

Caching

Storing the answer to a question so the next person asking the same question gets it instantly. Most performance work, in the end, is the right cache in the right place.

A request travels through layers, and at every layer something might already have the answer. The browser's memory cache, the disk cache, a CDN edge node, an application cache (Redis), the database's own buffer pool. A hit at any layer skips the rest. A miss falls through to the next.

Caching is governed by two hard problems, both quoted endlessly: invalidation (when does cached data become stale?) and naming (what is the right key?). Get either wrong and users see yesterday's prices.
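Both problems show up even in the smallest cache. The sketch below is one possible cache-aside layer with time-based expiry; the class name `TTLCache` and its methods are illustrative assumptions, not a particular library's API.

```python
import time


class TTLCache:
    """Cache-aside with expiry: serve a fresh hit, otherwise load and remember."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so tests need not sleep
        self._store = {}            # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]         # hit: the slower layer is never touched
        value = loader(key)         # miss or stale: fall through to the next layer
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        """Explicit invalidation, for when the source of truth changes."""
        self._store.pop(key, None)


def slow_price_lookup(sku):
    return {"sku": sku, "price_cents": 1999}   # stands in for a database query


prices = TTLCache(ttl_seconds=60)
print(prices.get_or_load("sku-1", slow_price_lookup))
```

The TTL bounds how stale an answer can be, and `invalidate` is the hook a writer would call when the price changes. The choice of key decides which callers share an answer.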

Figure 26.1 — Watch a request fall through the cache layers (interactive)

Browser → CDN → App / Redis → Database

Use cases

HTTP caching. Cache-Control, ETag, Last-Modified headers.
Application caching. Redis, Memcached for hot keys.
Database caching. Materialized views, query result caches.
LLM prompt caching. Anthropic and OpenAI both let you cache long fixed prompt prefixes.

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. Believe it. The bug you cannot reproduce is usually a cache.

References

RFC 9111 — HTTP Caching.
Anthropic — Prompt caching.

Chapter the Twenty-Seventh

Idempotency

An operation is idempotent if doing it twice has the same effect as doing it once. The internet is unreliable; your operations had better be.

Networks drop packets. Clients retry. Without care, a single "charge $50" intent turns into two charges. The cure is the idempotency key: the client invents a unique ID per intent and sends it with every retry. The server records the result against the key, and a second arrival with the same key returns the same result without doing the work again.

GET, PUT, and DELETE are idempotent by design (re-reading, re-replacing, re-deleting all converge). POST is not, which is why payment APIs typically accept an Idempotency-Key header to make it so.
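The server side of an idempotency key can be sketched as follows. The `PaymentService` class and its in-memory dictionaries are hypothetical stand-ins; a real service would persist results and make the check-and-record step atomic, for instance with a unique database constraint.

```python
import uuid


class PaymentService:
    def __init__(self):
        self.charges = []       # the side effects that must not be duplicated
        self._results = {}      # (user_id, idempotency_key) -> stored result

    def charge(self, user_id, amount_cents, idempotency_key):
        cache_key = (user_id, idempotency_key)
        if cache_key in self._results:
            return self._results[cache_key]     # replay: same answer, no new work
        charge_id = f"ch_{len(self.charges) + 1}"
        self.charges.append((charge_id, user_id, amount_cents))
        result = {"charge_id": charge_id, "amount_cents": amount_cents}
        self._results[cache_key] = result
        return result


service = PaymentService()
key = str(uuid.uuid4())          # the client invents one key per intent
first = service.charge("user-1", 5000, key)
retry = service.charge("user-1", 5000, key)   # network hiccup, client retries
print(first == retry, len(service.charges))
```

This prints `True 1`: the retry returns the recorded result and the customer is charged once.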

Figure 27.1 — Press the button many times. Watch what happens. (interactive; toggle: use idempotency key; readouts: charges, total, current key)

Use cases

Payments. Always idempotent: Stripe, Adyen, and Square all support idempotency keys.
Webhook receivers. Providers retry on 5xx; deduplicate by event id.
Background jobs. Workers may re-execute after a crash; design for that.

An idempotency key has a TTL. Keep it long enough that legitimate retries match (24h is common) but short enough that the key store does not grow without bound. And scope each key to the operation and the user, never globally, so one client's key can never return another client's result.

References

Stripe — Idempotent requests.
Brandur Leach — Designing robust and predictable APIs with idempotency.

Chapter the Twenty-Eighth

Rate Limiting & Exponential Backoff

Servers protect themselves by capping how often any one client may call. Polite clients, when refused, wait — longer each time — before trying again.

Nearly every public API rate-limits, for example 100 requests per minute or 10,000 tokens per second. Exceed the limit and the server returns 429 Too Many Requests, often with a Retry-After header. A client that hammers harder will only be banned faster.

The standard polite response is exponential backoff with jitter: wait b · 2ⁿ seconds before retry n (b is the base delay, n starts at 0, and the wait is usually capped), plus a small random offset so that retries from a thousand clients do not collide in lockstep (the "thundering herd"). It's the universal manners of distributed systems.
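The formula translates almost directly into code. The sketch below assumes a base delay of one second, a 30-second cap, and up to half a second of jitter; those defaults, the `RateLimited` exception, and the function names are illustrative choices rather than any SDK's API.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error a client raises when it receives HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after   # seconds, if the server sent Retry-After


def backoff_delays(base=1.0, retries=5, cap=30.0, jitter=0.5, rng=random.random):
    """Yield the wait before retry n = 0, 1, 2, ...: min(cap, base * 2**n) plus jitter."""
    for n in range(retries):
        yield min(cap, base * 2 ** n) + rng() * jitter


def call_with_backoff(operation, sleep=time.sleep, **backoff_options):
    """Run operation, backing off after each RateLimited error."""
    for delay in backoff_delays(**backoff_options):
        try:
            return operation()
        except RateLimited as exc:
            # Prefer the server's own hint when it gives one.
            sleep(delay if exc.retry_after is None else exc.retry_after)
    return operation()   # last attempt: let any error propagate
```

With jitter switched off the waits are 1, 2, 4, 8, and 16 seconds; the random term is what keeps a crowd of clients from retrying in unison.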

Figure 28.1 — A burst, a 429, and the backoff that follows (animation)

Limit: 5 requests per 10 seconds. Burst exceeds limit, server replies 429, client backs off (1s, 2s, 4s) before resuming.

Use cases

API consumption. Every SDK worth using implements backoff for you.
Server protection. Token-bucket / leaky-bucket algorithms at the edge.
LLM APIs. Tokens-per-minute limits are common; respect Retry-After.
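On the server side, the token-bucket algorithm named in the list above can be sketched in a few lines. The class below is an illustrative single-process version with an injectable clock; its names are assumptions, and a production limiter would keep its state in a store shared by all edge nodes.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)   # start full
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Top up for the time elapsed since the last call, never past capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                    # the caller would answer 429 here


# 5 requests per 10 seconds: capacity 5, refilled at 0.5 tokens per second.
bucket = TokenBucket(capacity=5, refill_per_second=0.5)
print([bucket.allow() for _ in range(6)])
```

A burst of six immediate calls is admitted five times and refused once, after which one further request becomes available every two seconds.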

Always add jitter. Without it, clients that were rate-limited at the same moment all retry at the same moment, again and again. With jitter, retries spread out and the server recovers smoothly.

References

AWS Architecture Blog — Exponential Backoff and Jitter.
RFC 6585 — Additional HTTP Status Codes (introduces 429).

Chapter the Twenty-Ninth

Async / Await & the Event Loop

JavaScript runs on a single thread. Async is the bookkeeping that lets it feel like several at once: start a slow task, do other work, resume when the answer is ready.

The runtime maintains a call stack (currently executing functions), a callback queue (tasks waiting their turn — timer fires, network responses), and a microtask queue (Promise resolutions, almost-immediate). The event loop is the rule: when the stack is empty, drain all microtasks, then take one task from the queue, then repeat.

async/await is sugar over Promises. await suspends the function, frees the stack, and the runtime resumes the function once the awaited Promise settles — typically as a microtask.
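The suspend-and-resume behaviour is not unique to JavaScript. As a rough analogue, the sketch below uses Python's asyncio, which is also single-threaded and cooperative, though its scheduling details differ from the microtask and task queues described here. The coroutine name `fetch` and the delays are illustrative; no real network call is made.

```python
import asyncio


async def fetch(name, delay, log):
    log.append(f"start {name}")
    await asyncio.sleep(delay)     # suspends this coroutine; the loop runs others
    log.append(f"done {name}")
    return name.upper()


async def main():
    log = []
    # Both coroutines are in flight at once, on a single thread.
    results = await asyncio.gather(fetch("a", 0.05, log), fetch("b", 0.01, log))
    return results, log


results, log = asyncio.run(main())
print(results)
print(log)
```

Both tasks start before either finishes, and the shorter wait completes first, yet `gather` still returns results in the order the tasks were passed in.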

Figure 29.1 — Step through a small async program (interactive)

console.log("A");
setTimeout(() => console.log("B"), 0);
Promise.resolve().then(() => console.log("C"));
console.log("D");

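Panels: call stack, microtask queue, task queue, output.

The program prints A, D, C, B: the two synchronous logs run first, then the Promise callback (a microtask), then the timer callback (a task).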

Use cases

Network I/O. Fetch many URLs in parallel without blocking the UI.
Streams. LLM token streaming uses async iteration.
UI responsiveness. Heavy work goes to Web Workers; main thread handles events.

"Async" is not the same as "parallel." JavaScript with async/await still runs on one thread. True parallelism in browsers requires Web Workers; in Node, worker threads or separate processes.

References

MDN — The event loop.
Philip Roberts — "What the heck is the event loop anyway?" (talk, JSConf EU 2014).

Chapter the Thirtieth

Race Conditions

When two things happen at almost the same time and the order of their tiny inner steps decides the outcome — sometimes correctly, sometimes catastrophically.

The classic example: two threads each read a counter (it says 5), each add 1, each write back. Done concurrently, the counter ends at 6, not 7 — one increment is lost. The bug is invisible until the day it bites in production.

Cures come in three families. Locks serialize access (mutex, semaphore). Atomic operations bundle read-modify-write into one indivisible step (compare-and-swap). Avoid sharing: actors, message queues, immutable data — no two threads touch the same memory.
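The lost update and the first cure can both be shown in a short sketch. The first half replays the bad interleaving by hand, so the outcome is deterministic; the second half uses a lock from Python's standard threading module. The class and function names are illustrative.

```python
import threading

# 1) Deterministic replay of the bad interleaving described above.
counter = 5
read_a = counter        # Thread A reads 5
read_b = counter        # Thread B reads 5, before A has written
counter = read_a + 1    # A writes 6
counter = read_b + 1    # B writes 6 as well: A's increment is lost
lost_update_result = counter


# 2) The lock cure: read-modify-write happens inside one critical section.
class SafeCounter:
    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1


def hammer(target, times):
    for _ in range(times):
        target.increment()


safe = SafeCounter(5)
workers = [threading.Thread(target=hammer, args=(safe, 10_000)) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(lost_update_result, safe.value)
```

The replay ends at 6 rather than 7, while the locked counter reaches 20005 after two threads add 10,000 each to a starting value of 5.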

Figure 30.1 — Replay the race; observe the lost update (animation)

Starting from counter = 0, Thread A and Thread B each run three steps: read counter, add 1, write back.

Use cases (well, occurrences)

Double charging. Two clicks while the request is in flight create two orders.
Inventory oversell. Two customers buy the last unit because both checks passed before either sale committed.
UI flicker. Two async fetches resolve out of order; stale data overwrites fresh.

The hardest race conditions are those whose probability scales with load. They pass tests at 1 req/s and corrupt the database at 1000 req/s. Treat any "intermittent" production bug as a race until proven otherwise.

References

Herlihy & Shavit — The Art of Multiprocessor Programming, 2nd ed.
Kleppmann, M. — Designing Data-Intensive Applications, ch. 7 (transactions).

Chapter the Thirty-First

Debounce vs Throttle

Two ways of taming a fire-hose of events into something a server (or a search-as-you-type box) can stomach.

Debounce: wait until the user stops, then fire once. Perfect for search-as-you-type — no point querying after every keystroke when one's coming a millisecond later.

Throttle: fire at most once every N milliseconds, no matter how often events arrive. Perfect for window resize or scroll handlers — you want updates during the action, just not 200 of them per second.
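The difference is easiest to see on a list of event timestamps. The sketch below simulates both strategies as pure functions instead of wiring real timers; it assumes a trailing-edge debounce and a leading-edge throttle, and the function names are illustrative.

```python
def debounce(timestamps_ms, wait_ms):
    """Times at which a trailing-edge debounced handler would fire."""
    fired = []
    for i, t in enumerate(timestamps_ms):
        is_last = i == len(timestamps_ms) - 1
        # The timer set by this event survives only if no event resets it in time.
        if is_last or timestamps_ms[i + 1] - t >= wait_ms:
            fired.append(t + wait_ms)
    return fired


def throttle(timestamps_ms, interval_ms):
    """Times at which a leading-edge throttled handler would fire."""
    fired = []
    for t in timestamps_ms:
        if not fired or t - fired[-1] >= interval_ms:
            fired.append(t)
    return fired


keystrokes = [0, 100, 200, 250, 900, 950]   # milliseconds
print(debounce(keystrokes, 300))   # [550, 1250]
print(throttle(keystrokes, 300))   # [0, 900]
```

Debounce stays silent during each burst and fires once after it ends; throttle fires at the start of each burst and then ignores events until the interval has passed. A leading-only throttle like this one drops the final event of a burst, which is why library versions usually offer a trailing option.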

Figure 31.1 — Type fast and watch the counters diverge (interactive; counters: raw events, debounced (300ms), throttled (300ms))

Debounced: only the last keystroke in a quiet window fires. Throttled: at most one fire per 300ms, regardless.

Use cases

Search-as-you-type. Debounce by 200–400ms.
Scroll / resize. Throttle at 16ms (60fps) or 50ms.
Form auto-save. Debounce to commit only when typing stops.
Telemetry. Throttle to keep volume bounded.

Most utility libraries (lodash, underscore) ship both. If you're writing your own, get the trailing edge right — most users want the last event delivered, not silently swallowed.

References

Lodash — debounce & throttle.
CSS-Tricks — Debouncing and throttling explained.

Chapter the Thirty-Second

YAML & TOML

Two cousins of JSON optimised for humans writing configuration: YAML for indented prose, TOML for clean sectioned files.

YAML ("YAML Ain't Markup Language") uses indentation, hyphens for lists, and colons for key-value pairs. It supports comments, multiline strings, and references — at the cost of subtle whitespace bugs and famously inconsistent boolean parsing (yes, NO, on all once meant booleans). It dominates Kubernetes, GitHub Actions, Ansible, and ML configs.

TOML ("Tom's Obvious, Minimal Language") trades indentation for explicit [sections] and quoted strings. Less expressive, far less ambiguous. The Rust ecosystem (Cargo.toml) and Python's pyproject.toml made it ubiquitous.

Figure 32.1 — The same configuration, three formats (interactive)

Use cases

YAML. Container orchestration, CI/CD pipelines, infrastructure-as-code, ML hyperparameter files.
TOML. Build configs (Cargo.toml, pyproject.toml), simple application settings.
Neither. Exchange between systems → use JSON. Schemas → use JSON Schema.

YAML's "Norway problem": country: NO can be parsed as the boolean false. Modern parsers (YAML 1.2) fixed this, but many libraries default to 1.1. Always quote string values you do not control.

References

yaml.org · toml.io.
Noyes, P. — The Norway Problem, hitchdev.com.

Chapter the Thirty-Third

Markdown

A way to write formatted text using only the punctuation already on your keyboard. Asterisks become bold, hashes become headings, the result reads almost as well as plain prose.

John Gruber and Aaron Swartz invented Markdown in 2004 as a writing format that compiled to HTML. The genius: the source reads naturally, even unrendered. **bold** looks like emphasis even before it becomes bold. # Heading looks like the heading it represents.

Today, almost every README, every chat client, every AI prompt uses Markdown. Variants (CommonMark, GitHub-Flavored Markdown) standardised the messy edge cases.

Figure 33.1 — Type Markdown, see HTML (interactive)

Use cases

READMEs. The lingua franca of every code repository.
Documentation. Static-site generators (MkDocs, Hugo, Docusaurus) consume Markdown.
LLM I/O. Most assistants output Markdown by default; chat UIs render it live.
Notes & second brains. Obsidian, Notion, Bear, Logseq.

There is no single Markdown — there are dozens of dialects. For interoperable docs, target CommonMark with explicit GFM extensions (tables, task lists, fenced code).

References

Gruber, J. — Original Markdown spec, 2004.
CommonMark · GitHub-Flavored Markdown.

Chapter the Thirty-Fourth

SQL JOINs

A way to combine two tables along a shared column. Four flavours decide what happens to rows that don't match.

You have a users table and an orders table. Each order has a user_id. To list every user with their orders, you JOIN on the matching id. The interesting question is: what do you do with users who have no orders, or orders whose user was deleted?

INNER JOIN keeps only matched rows. LEFT JOIN keeps every row from the left table, padding with NULL where the right is missing. RIGHT JOIN is its mirror. FULL OUTER JOIN keeps every row from both, padding both sides.
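The users-and-orders example can be run against an in-memory SQLite database from Python's standard library. The table contents below are invented for illustration, and the sketch sticks to INNER and LEFT joins because older SQLite versions do not support RIGHT or FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total INTEGER);
    INSERT INTO users  VALUES (1, 'Ada'), (2, 'Bo');
    INSERT INTO orders VALUES (10, 1, 50), (11, 99, 20);  -- user 99 was deleted
""")

# INNER JOIN: only users that have at least one matching order.
inner = conn.execute(
    "SELECT u.name, o.total FROM users u "
    "INNER JOIN orders o ON o.user_id = u.id ORDER BY u.id"
).fetchall()

# LEFT JOIN: every user, with NULL (None in Python) where no order matches.
left = conn.execute(
    "SELECT u.name, o.total FROM users u "
    "LEFT JOIN orders o ON o.user_id = u.id ORDER BY u.id"
).fetchall()

# Orphan hunt: orders whose user no longer exists.
orphans = conn.execute(
    "SELECT o.id FROM orders o "
    "LEFT JOIN users u ON u.id = o.user_id WHERE u.id IS NULL"
).fetchall()

print(inner)    # [('Ada', 50)]
print(left)     # [('Ada', 50), ('Bo', None)]
print(orphans)  # [(11,)]
```

Bo has no orders, so he disappears from the inner join and appears with a NULL total in the left join; order 11 belongs to nobody and surfaces in the orphan query.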

Figure 34.1 — Pick a join; see the rows it returns (interactive; options: INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN)

Use cases

Reports. "All customers and their lifetime spend, including those with zero" → LEFT JOIN.
Data integrity. Find orphans with FULL OUTER JOIN + WHERE x.id IS NULL OR y.id IS NULL.
Analytics. Funnels, cohorts, retention all built from chained joins.

Joins are expensive on large tables without indexes. Index the columns you join on; otherwise the database may have to scan a whole table for every lookup. The most common slow query in your career will be a missing index on a join key.

References

Date, C. J. — SQL and Relational Theory, 3rd ed.
Use The Index, Luke! — use-the-index-luke.com for join performance.

Chapter the Thirty-Fifth

Hashing vs Encryption

Two operations that look superficially similar — both take input and produce gibberish — but differ on a fundamental axis: can you go back?

A hash is a one-way fingerprint. Given the input, you always get the same fixed-size output; given the output alone, you cannot recover the input. A single bit changed in the input produces a wholly different hash. Use cases: password storage (with salt), file integrity, content-addressed storage (Git, IPFS).

Encryption is reversible, given the right key. The output looks random, but the legitimate holder of the key can decrypt back to the original. Use cases: transmitting secrets (TLS), storing sensitive data at rest, end-to-end messaging.

The bug behind a thousand breaches: confusing them. Storing a password "encrypted" means someone with the key can read every password. Storing a password "hashed" (with bcrypt/argon2) means even the database admin cannot.
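The hashing half can be tried with Python's standard library alone. The sketch below shows a SHA-256 fingerprint and salted password hashing. It uses PBKDF2 as a stand-in because bcrypt and argon2 need third-party packages; the function names and the iteration count are illustrative assumptions, and the encryption half is omitted since the standard library ships no AES.

```python
import hashlib
import hmac
import os


def fingerprint(data: bytes) -> str:
    """Fixed-size, one-way digest of arbitrary input."""
    return hashlib.sha256(data).hexdigest()


def hash_password(password: str, salt=None, iterations=600_000):
    """Return (salt, digest). The salt is random per password and stored beside the digest."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def check_password(password: str, salt: bytes, digest: bytes, iterations=600_000) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt, iterations)
    return hmac.compare_digest(candidate, digest)


print(fingerprint(b"hello"))
print(fingerprint(b"hellp"))   # one character changed, wholly different digest
```

Verification never decrypts anything: the server hashes the login attempt again and compares digests, so the stored value reveals nothing readable even to someone with full database access.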

Figure 35.1 — Type a value; observe both transformations (interactive)

SHA-256 hash. One-way. Same input → same hash, always. There is no "un-SHA-256". Try changing one character.

Encrypt (AES-GCM, key in browser). Reversible with the key. Each run produces different ciphertext (random IV) but decrypts to the same plaintext.

Use cases

Hashing. Password storage (bcrypt/argon2), file checksums, content-addressed git commits, JWT signatures.
Encryption. TLS in flight, disk encryption at rest, end-to-end messaging (Signal, WhatsApp), API secrets.
Both together. Hash and sign for integrity, encrypt for confidentiality; modern protocols use them in concert.

Never invent crypto. Use a vetted library and a vetted construction (libsodium, the platform's WebCrypto, OpenSSL with sane defaults). And never use MD5 or SHA-1 for anything new — both are broken.

References

Ferguson, Schneier & Kohno — Cryptography Engineering, 2010.
OWASP — Password Storage Cheat Sheet.

Chapter the Thirty-Sixth

Base64

A way to package binary data — images, hashes, encrypted blobs, anything — into ordinary text so it survives email, JSON, URLs, and every other channel that only speaks ASCII.

Base64 takes raw bytes and re-expresses them using only sixty-four printable characters: A–Z, a–z, 0–9, plus + and /. The trick is arithmetic: every three bytes (24 bits) splits cleanly into four six-bit groups, each of which indexes into the 64-character alphabet. The output is four characters for every three input bytes, rounded up: 4·⌈n/3⌉ characters for n bytes, padded at the end with = when the byte count is not a multiple of three.

A URL-safe variant swaps + for - and / for _ so the result can travel inside URLs and filenames without further escaping. JWTs (chapter 22) use the URL-safe form; classic data URIs use the standard form. Both decode the same bytes back.
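The six-bit arithmetic is short enough to write out by hand. The encoder below is a teaching sketch checked against Python's standard `base64` module, which is what real code should call; the function names `encode` and `to_url_safe` are illustrative.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode(data: bytes) -> str:
    """Standard Base64: 3 bytes in, 4 characters out, '=' padding at the end."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to three bytes into one 24-bit number, zero-filled on the right.
        n = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        keep = len(chunk) + 1            # 1 byte -> 2 chars, 2 -> 3, 3 -> 4
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


def to_url_safe(text: str) -> str:
    """Swap + for - and / for _ to get the URL-safe alphabet."""
    return text.translate(str.maketrans("+/", "-_"))


for sample in (b"M", b"Hi", b"Cat"):
    print(sample, "->", encode(sample))
print(to_url_safe(encode(b"\xfb\xff")), base64.urlsafe_b64encode(b"\xfb\xff").decode())
```

The three samples encode to `TQ==`, `SGk=`, and `Q2F0`, showing the two, one, and zero padding characters that follow from the byte count.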

It bears repeating, because the misconception is endemic: Base64 is encoding, not encryption. There is no key. Anyone with a browser console can decode it.

Figure 36.1 — Watch three bytes become four characters (interactive; selector: variant; sample inputs: one byte "M", two bytes "Hi", three bytes "Cat", longer text, unicode "café")

Use cases

JWT segments. Each header.payload.signature piece is URL-safe Base64 (chapter 22).
Data URIs. <img src="data:image/png;base64,iVBOR…"> embeds an image directly in the page source.
MIME email. Attachments are Base64-encoded so they survive 7-bit SMTP servers without corruption.
Binary in JSON. A signed payload, a hash, an encrypted blob — anywhere bytes need to live inside a text-only protocol.
Basic auth. The HTTP Authorization: Basic dXNlcjpwYXNz header is just user:pass Base64-encoded — which is exactly why HTTPS is non-negotiable for it.

If you ever see a "secured" credential or API key Base64-encoded in a config file or log, that is a security bug, not a security feature. Treat Base64 as readable text, because it is readable text.

References

RFC 4648 — The Base16, Base32, and Base64 Data Encodings.
MDN — Base64 glossary entry.