❦ ❦ ❦

A Field Guide to
Modern Software Concepts

an interactive monograph in five volumes & thirty-six chapters
edition ii.i  ·  with figures, demonstrations & references

By Majid Mazouchi

Volume I
Foundations
The vocabulary that shows up in every job: structured data, contracts, parsers, patterns, and how language models present themselves.
Chapter the First

Schema

A schema is a blueprint. It does not contain the data; it describes the shape the data must take.

Imagine you are building a paper form for new library patrons. Before any patron arrives, you decide which boxes the form will have — name (text), age (a number), has_card (yes or no). That blueprint is the schema. The forms patrons fill out are the data. The blueprint exists once; the filled forms exist by the thousand.

In software, a schema describes the structure, the types of each field, and the rules (which fields are required, what range a number may fall in). It is enforced by databases, by APIs, and by validation libraries — each rejecting any record that does not fit.
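A minimal sketch of the library-form blueprint in Python. The `Patron` class and the `is_valid` helper are hypothetical names chosen for illustration; the 0 to 120 age range is borrowed from the User schema in Figure 1.1, not from the form example itself.

```python
from dataclasses import dataclass


@dataclass
class Patron:
    """Hypothetical schema for the library form: field names and types."""
    name: str
    age: int
    has_card: bool

    def __post_init__(self) -> None:
        # Dataclasses do not enforce type hints at runtime, so check explicitly.
        if not isinstance(self.name, str) or not self.name:
            raise ValueError("name must be a non-empty string")
        # bool is a subclass of int in Python, so exclude it deliberately.
        if not isinstance(self.age, int) or isinstance(self.age, bool):
            raise ValueError("age must be an integer")
        if not 0 <= self.age <= 120:  # illustrative range rule
            raise ValueError("age must be between 0 and 120")
        if not isinstance(self.has_card, bool):
            raise ValueError("has_card must be true or false")


def is_valid(record: dict) -> bool:
    """Return True if the record fits the Patron schema, else False."""
    try:
        Patron(**record)  # missing or extra fields raise TypeError
        return True
    except (TypeError, ValueError):
        return False
```

The class is the blueprint and exists once; each dictionary passed to `is_valid` plays the role of a filled-in form.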

Figure 1.1  —  The schema as blueprint (interactive)

User schema

id: int · required
name: string · required
email: string · email format
age: int · 0–120
active: bool

A schema entry has three parts: a name, a type (what kind of value), and optional constraints (required, range, format).

Use cases

In your TargetLink/AUTOSAR work, the DataType definitions and PlatformTypes are exactly schemas — they constrain what bit-width, sign, and range a signal may carry, and the build chain refuses code that violates them.

References

  1. Date, C. J. An Introduction to Database Systems, 8th ed. — chapter on relational schemas.
  2. OpenAPI Initiative. openapis.org — the de facto API schema specification.
Chapter the Second

JSON

A simple, plain-text language for writing down structured information that humans can read and any computer can parse.

JSON (JavaScript Object Notation) uses six building blocks: strings, numbers, booleans (true / false), null, arrays (in [ ]), and objects (in { }, holding key–value pairs). That is the entire grammar. Despite its tiny vocabulary, almost every web service in the world speaks it.

JSON has no comments, no dates, no decimals-with-units, no functions. Its great virtue is precisely this poverty: every language can read it the same way.

Figure 2.1  —  A JSON document, expandable (interactive)

Use cases

JSON cannot represent NaN, Infinity, or trailing commas. If your float pipeline emits these, serialization will fail silently or throw. For embedded telemetry, prefer protobuf/MessagePack — JSON is not size- or precision-friendly.
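Both failure modes can be seen with Python's standard `json` module. This is a small sketch; `to_strict_json` is a hypothetical helper name, and the `temp` field is made up for illustration.

```python
import json
import math


def to_strict_json(value) -> str:
    """Serialize, raising ValueError on NaN or Infinity instead of emitting them."""
    return json.dumps(value, allow_nan=False)


# Silent failure: by default Python writes the bare token NaN,
# which is not valid JSON and which strict parsers elsewhere will reject.
lenient = json.dumps({"temp": math.nan})

# Loud failure: with allow_nan=False the serializer refuses instead.
try:
    to_strict_json({"temp": math.inf})
    strict_failed = False
except ValueError:
    strict_failed = True
```

Failing loudly at the producer is usually easier to debug than a rejected payload at some distant consumer.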

References

  1. RFC 8259 — The JavaScript Object Notation Data Interchange Format.
  2. Crockford, D. json.org — original informal reference.
Chapter the Third

API

An Application Programming Interface is a contract: send this kind of request, receive that kind of reply. Everything else is hidden.

If a restaurant kitchen is a system, the menu is its API. You do not enter the kitchen. You read the menu (the contract), you place an order (a request) at the agreed counter (the endpoint), and a meal (the response) comes out. The chef may switch ingredients, hire new cooks, or rebuild the stove. As long as the menu is honored, you do not care.

Modern web APIs typically use HTTP with verbs (GET, POST, PUT, DELETE), a URL identifying the resource, optional headers (auth, content-type), and a JSON body. The reply is a status code (200 OK, 404 not found, 500 error) plus a body.

Figure 3.1  —  A simulated API call (interactive)

Request
GET /api/users/42 HTTP/1.1
Host: example.com
Accept: application/json

Try also /api/users/999 (not found) and /api/posts (a list).

Use cases

"API" is also used for language APIs (the public functions of a library) and OS APIs (system calls). The contract idea is the same; only the transport differs.

References

  1. Fielding, R. — Architectural Styles and the Design of Network-based Software Architectures, 2000 (the REST dissertation).
  2. MDN Web Docs — HTTP overview.
Chapter the Fourth

Parser & Parsing

Parsing is the act of turning a flat stream of characters into a structured tree the computer can act on.

When you read the sentence "The cat sat on the mat," your brain unconsciously identifies the subject, the verb, and the prepositional phrase. A parser does the same for code or data: it takes raw text and recovers the structure hidden inside it.

Parsing usually happens in two passes. First the lexer (or tokenizer) chops the text into atomic tokens — numbers, identifiers, operators. Then the parser arranges those tokens according to grammar rules into an Abstract Syntax Tree (AST) — the structured form on which evaluators, compilers, and linters then operate.
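The two passes can be sketched for arithmetic expressions in a few dozen lines. This is an illustrative toy, not a production parser: the function names `lex`, `parse`, and `evaluate` are assumptions, the AST is represented as nested tuples, and only numbers, the four basic operators, and parentheses are handled.

```python
import operator
import re


def lex(text: str) -> list:
    """Pass 1: chop text into tokens (floats for numbers, strings for operators)."""
    tokens = []
    for piece in re.findall(r"\d+(?:\.\d+)?|\S", text):
        if piece[0].isdigit():
            tokens.append(float(piece))
        elif piece in "+-*/()":
            tokens.append(piece)
        else:
            raise SyntaxError(f"unexpected character {piece!r}")
    return tokens


def parse(tokens: list):
    """Pass 2: build an AST of nested tuples (op, left, right); numbers are leaves."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():    # expr := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():    # term := factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():  # factor := NUMBER | '(' expr ')'
        tok = peek()
        if isinstance(tok, float):
            return advance()
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def evaluate(node) -> float:
    """Walk the AST and compute its value."""
    if isinstance(node, float):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))
```

Operator precedence falls out of the grammar: because `term` sits below `expr`, multiplication binds tighter than addition without any special casing.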

Figure 4.1  —  Lexing and parsing an arithmetic expression (interactive)
Panels: 1 · Tokens (output of the lexer); 2 · Abstract Syntax Tree; 3 · Evaluated result

Use cases

Parsing untrusted input is a frequent attack surface. Prefer battle-tested libraries; never write a JSON or YAML parser by hand for production.

References

  1. Aho, Lam, Sethi, Ullman — Compilers: Principles, Techniques, and Tools ("the dragon book"), 2nd ed.
  2. Crafting Interpreters — craftinginterpreters.com, an excellent free book by Bob Nystrom.
Chapter the Fifth

Regular Expressions

A tiny language whose only purpose is to describe text patterns: "a digit followed by two letters", "an email address", "anything between two quotes".

A regular expression — regex — is a string in which most characters mean themselves but a few have superpowers: . matches any character, * means "zero or more of the previous", + means "one or more", \d matches digits, [a-z] matches a range, ^ and $ anchor to start and end, and parentheses capture groups for later use.

Regex is dense. A pattern that takes ten minutes to write may take an hour to read. But for jobs like extracting all phone numbers from a document, no other tool is so concise.
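As a small illustration of the phone-number job, here is a sketch using Python's `re` module. The pattern assumes one specific hyphenated format (three digits, three digits, four digits); real phone formats vary far more widely, and the sample text is invented.

```python
import re

# Assumed format for illustration only: 555-010-1234.
# \b anchors at word boundaries; each (...) captures a group for later use.
PHONE_RE = re.compile(r"\b(\d{3})-(\d{3})-(\d{4})\b")

text = "Call 555-010-1234 before noon, or 555-010-9876 after. Ref 12-34."

numbers = [m.group(0) for m in PHONE_RE.finditer(text)]     # whole matches
area_codes = [m.group(1) for m in PHONE_RE.finditer(text)]  # first group only
```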

Figure 5.1  —  Live regex tester (interactive)
Presets: phone numbers, emails, URLs, capitalised words, hashtags / refs

Use cases

Two cautions. One: never parse HTML with regex, because HTML is not a regular language. Two: certain patterns with nested quantifiers, such as (a+)+b, cause catastrophic backtracking and can hang a server. Test on adversarial input.

References

  1. Friedl, J. — Mastering Regular Expressions, 3rd ed., O'Reilly.
  2. regex101.com — interactive tester with explanations.
Chapter the Sixth

JSON Mode

A switch on a Large Language Model that forces its reply to be a single, syntactically valid JSON document — never prose, never markdown, never apology.

Out of the box, an LLM is a free-form storyteller. Ask it for a recipe and you may receive a friendly preamble ("Sure! Here's a great recipe…"), then the recipe in markdown, then a closing remark. Useful to a human; ruinous to a program that tries to JSON.parse() the reply.

JSON mode changes the decoding rule of the model. At every token step, the sampler is constrained so the cumulative output remains valid JSON. The model can no longer wander into prose. It must close every brace it opens.

Figure 6.1  —  The same prompt, two output modes (interactive)

Prompt: "Extract the order: I need 3 lattes and 2 muffins for table 7."

Notice: the OFF response cannot be parsed by a program. The ON response goes straight into your downstream code.

Use cases

JSON mode guarantees syntactic validity, not semantic correctness. The model may still emit a string where you wanted a number, or invent a field. Pair JSON mode with a JSON Schema (next chapter) for full safety.

References

  1. OpenAI — Structured Outputs guide.
  2. Anthropic — Tool use documentation.
Chapter the Seventh

JSON Schema

A JSON document whose only job is to describe — and validate — other JSON documents.

If JSON is a written record, a JSON Schema is the official template the record must conform to. The schema declares the expected type of every field ("string", "integer", etc.), which fields are required, what format a string must obey (email, URI, date), what range a number must lie in, and even how items inside an array should look.

A validator reads the schema, reads the data, and reports every place where data and schema disagree. This same schema then drives form generators, code generators, OpenAPI documentation, and — most importantly — LLM structured-output enforcers.
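To make the validator's job concrete, here is a hand-rolled sketch that understands only a small subset of JSON Schema keywords (type, required, properties, items, minimum, maximum). The `validate` function and `user_schema` are illustrative assumptions; a real project would normally rely on an established validation library rather than code like this.

```python
def validate(data, schema: dict, path: str = "$") -> list:
    """Return a list of error strings; an empty list means the data conforms."""
    type_checks = {
        "object": lambda v: isinstance(v, dict),
        "array": lambda v: isinstance(v, list),
        "string": lambda v: isinstance(v, str),
        "boolean": lambda v: isinstance(v, bool),
        # bool is a subclass of int in Python, so exclude it from numbers.
        "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    }
    expected = schema.get("type")
    if expected in type_checks and not type_checks[expected](data):
        return [f"{path}: expected {expected}"]

    errors = []
    if isinstance(data, dict):
        for key in schema.get("required", []):
            if key not in data:
                errors.append(f"{path}.{key}: required field is missing")
        for key, subschema in schema.get("properties", {}).items():
            if key in data:
                errors.extend(validate(data[key], subschema, f"{path}.{key}"))
    if isinstance(data, list) and "items" in schema:
        for i, item in enumerate(data):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    if isinstance(data, (int, float)) and not isinstance(data, bool):
        if "minimum" in schema and data < schema["minimum"]:
            errors.append(f"{path}: below minimum {schema['minimum']}")
        if "maximum" in schema and data > schema["maximum"]:
            errors.append(f"{path}: above maximum {schema['maximum']}")
    return errors


# Hypothetical schema for illustration.
user_schema = {
    "type": "object",
    "required": ["id", "name"],
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0, "maximum": 120},
    },
}
```

Note that the function reports every disagreement it finds rather than stopping at the first, which matches how validators are described above.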

Figure 7.1  —  Validate JSON against a schema (interactive)
Panels: Schema, Data

Use cases

JSON Schema validates structure — not the meaning. A schema can guarantee age is an integer between 0 and 120. It cannot guarantee that the integer is the actual age of the person.

References

  1. json-schema.org — the official specification and learning materials.
  2. Pydantic docs — docs.pydantic.dev — schema-driven validation in Python.
Chapter the Eighth

ReAct

A pattern for LLM agents that interleaves reasoning and acting: think a step, take an action, observe the result, think again.

An LLM by itself only knows what is in its weights. Ask it for today's weather and it will guess. ReAct (Yao et al., 2022) lets the model break out of its head: at each turn it may produce a Thought (private reasoning), an Action (a call to a tool — search, calculator, database, API), and then read an Observation from the tool. It loops until it can give a final Answer.

This single cycle — Thought → Action → Observation → Thought → … → Answer — is the engine behind most modern AI agents, including the one that wrote this page.
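The cycle can be sketched as a short loop. Everything here is a stand-in: `scripted_model` replaces a real LLM with fixed rules, the two tools are hypothetical, and the population figure is a rounded placeholder for illustration rather than an authoritative statistic.

```python
def lookup_population(country: str) -> str:
    # Hypothetical tool with a hard-coded, rounded figure for illustration only.
    return {"Japan": "125000000"}.get(country, "unknown")


def calculator(expression: str) -> str:
    # Hypothetical tool: multiplies two integers written as "a * b".
    left, right = expression.split("*")
    return str(int(left) * int(right))


TOOLS = {"lookup_population": lookup_population, "calculator": calculator}


def scripted_model(history: list) -> dict:
    """Stand-in for an LLM: picks the next step from the observations so far."""
    observations = [step["observation"] for step in history]
    if not observations:
        return {"thought": "I need Japan's population.",
                "action": "lookup_population", "input": "Japan"}
    if len(observations) == 1:
        return {"thought": "Now multiply it by 2.",
                "action": "calculator", "input": f"{observations[0]} * 2"}
    return {"thought": "I have the result.", "answer": observations[-1]}


def react(model, max_steps: int = 5):
    """Run Thought -> Action -> Observation until an Answer or the step cap."""
    history = []
    for _ in range(max_steps):
        step = model(history)
        if "answer" in step:
            return step["answer"], history
        step["observation"] = TOOLS[step["action"]](step["input"])
        history.append(step)  # keep a log of every step
    return None, history      # cap reached without an answer
```

The `max_steps` cap and the returned `history` are the two safeguards that matter most in practice: one stops a confused agent from looping forever, the other lets you see what it did.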

Figure 8.1  —  A ReAct trace, played step by step (interactive)

User: "What is the population of Japan multiplied by 2?"

Use cases

ReAct is more reliable than pure chain-of-thought because each Action grounds the reasoning in a real-world observation. But it can also fail loudly — a confused agent will repeat the same wrong action many times. Always cap the loop and log every step.

References

  1. Yao, S. et al. — ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023. arXiv:2210.03629
  2. LangChain ReAct docs — python.langchain.com
Chapter the Ninth

Tailwind

A CSS framework that gives you thousands of tiny, single-purpose classes — and asks you to compose them, in your HTML, into any design you like.

Most CSS frameworks (Bootstrap, Material UI) ship pre-made components — a "Card", a "Button" — each with its own opinions. Tailwind CSS does the opposite. It ships utility classes: p-4 means padding 1 rem, text-xl means large font, rounded-lg means large border radius, bg-blue-500 means medium-blue background. You build the component yourself by stringing utilities together, in the markup, exactly where the design lives.

The promise: no more renaming CSS classes, no more "button-primary-large-disabled-rounded" spaghetti. The cost: HTML can grow visually noisy. Most teams accept the trade-off.

Figure 9.1  —  Compose utilities to style a card (interactive)
Toggle groups: background (bg-white, bg-amber-100, bg-slate-800); text colour (text-slate-900, text-rose-600, text-amber-50); padding (p-2, p-6, p-12); and three further groups with options none / md / 2xl / full, none / md / 2xl, and sm / lg / 2xl.

Use cases

For complex, repeated patterns, extract to a component (in React/Vue/Svelte) — do not paste the same 30-class string fifty times. Tailwind's strength is in utility composition, not utility duplication.

References

  1. tailwindcss.com — official documentation.
  2. Wathan, A. — Refactoring UI, an excellent companion book by Tailwind's creator.
Chapter the Tenth

BM25 Keyword Search

A scoring formula from the 1990s that, despite its age, still beats almost every neural search method when the user types a few keywords and expects exact matches.

The intuition is simple. A document deserves a high score for a query if (1) the query words appear in it, (2) the words are rare in the corpus (so they are informative — "the" is useless, "axial-flux" is golden), and (3) the document is not too long (a long document mentions everything; a short, focused one mentioning your terms is more likely to be on-topic).

BM25 — Best Match 25 — combines these three signals with two tuning knobs (k₁ for term saturation and b for length normalization). It is the default ranker in Elasticsearch, Lucene, and OpenSearch, and the keyword half of nearly every modern hybrid retrieval pipeline.
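A compact sketch of the scoring formula, assuming whitespace tokenization, the common defaults k₁ = 1.2 and b = 0.75, and the "+1 inside the logarithm" form of inverse document frequency. The function name and the three sample sentences are illustrative; production engines add analyzers, stop-word handling, and index structures on top.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list, k1: float = 1.2, b: float = 0.75) -> list:
    """Return one BM25 score per document for a whitespace-tokenized query."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue  # signal 1: the term must appear in the document
            # Signal 2: rare terms get a larger weight.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Signal 3: long documents are penalized via b; k1 caps term saturation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "the axial flux motor is compact",
    "the vehicle health diagnostic ran overnight",
    "the motor controller limits torque in the vehicle",
]
```

With this corpus, a query for "axial" scores the first sentence well above what a query for "the" does, because "the" appears in every document and so carries almost no information.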

Figure 10.1  —  A miniature BM25 index over five sentences (interactive)
Sample query terms: axial, flux, machine, motor, vehicle, health, diagnostic, the (stop word)

    Use cases

    BM25 cannot tell that "car" and "automobile" are related — that is the job of embeddings. For systems where users phrase questions in their own words, combine BM25 with a vector retriever rather than choosing between them.

    References

    1. Robertson, S. & Zaragoza, H. — The Probabilistic Relevance Framework: BM25 and Beyond, 2009.
    2. Elastic docs — BM25 similarity.
    Volume II
    AI & Retrieval
    How modern language models find, structure, and act on knowledge — from tokens at the bottom to agent protocols at the top.
    Chapter the Eleventh

    Tokens & Context Window

    A language model does not see words. It sees tokens — small chunks of letters — and it can only fit a limited number of them in its working memory.

    Before any text reaches a transformer, a tokenizer breaks it into pieces. Common words become a single token ("the"); rare or invented words split into several ("unhappiness" → "un" + "happi" + "ness"). Punctuation, spaces, and emojis each cost something. A useful rule of thumb in English: about 4 characters per token, or roughly 0.75 words per token.

    The context window is the maximum number of tokens a model can attend to at once: prompt plus history plus generated reply. Older models held 4k; modern frontier models hold 200k–1M. Exceed it and the oldest tokens fall off the back of a moving train.

    Figure 11.1  —  A simplified tokenizer (interactive)
    The demonstration uses a heuristic split; real BPE tokenizers are learnt.

    Use cases

    Token counts are language-dependent. The same paragraph in Japanese or German often costs 1.5–2× more tokens than in English. For your GenAI Nexus integrations, instrument the token counter early; cost surprises always come from there.

    References

    1. Sennrich, R. et al. — Neural Machine Translation of Rare Words with Subword Units, ACL 2016 (BPE).
    2. OpenAI — interactive tokenizer.
    Chapter the Twelfth

    Embeddings

    A way to turn any piece of text into a list of numbers — a vector — such that texts with related meaning land near each other in space.

    Imagine assigning every English word a coordinate in a high-dimensional map. Words that play similar roles in similar contexts (king, queen, monarch) get neighbouring coordinates; unrelated words (queen, asphalt) land far apart. That assignment is an embedding. Modern embeddings live in 768- or 1536-dimensional space, but the idea is the same: distance encodes meaning.

    The clever bit is that the same trick works for sentences, paragraphs, even images. Once you have vectors, you can ask the question every search engine secretly wants to ask: find me the things most similar to this.

    Figure 12.1  —  A 2-D embedding map · click a word to see its neighbours (interactive)

    Use cases

    Embeddings inherit their model's biases. If the training corpus encodes a stereotype, the vector space encodes it too. Audit before deploying in hiring, lending, or moderation.

    References

    1. Mikolov, T. et al. — word2vec, NeurIPS 2013.
    2. Reimers, N. & Gurevych, I. — Sentence-BERT, EMNLP 2019.
    Chapter the Thirteenth

    Vector Search & Cosine Similarity

    Once everything is a vector, "search" becomes "find the vectors closest to mine" — usually measured by the cosine of the angle between them.

    Cosine similarity ignores how long the vectors are and asks only: do they point in the same direction? Two vectors pointing the same way score 1; perpendicular ones score 0; opposite ones score −1. For text embeddings (which are usually L2-normalised), cosine and dot-product give the same ranking.
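A minimal sketch of the measure and of the naive search built on it, using plain Python lists as vectors. The function names are illustrative, and the brute-force `top_k` is exactly the compare-against-everything approach that does not scale to millions of vectors.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query: list, vectors: list, k: int = 3) -> list:
    """Brute-force search: score every vector, return the k best (index, score) pairs."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Scaling a vector changes its length but not its direction, so `[1, 0]` and `[2, 0]` still score 1.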

    A naive search compares the query to every vector — fine for thousands, painful for millions. Production systems use approximate nearest-neighbour indexes (HNSW, IVF, ScaNN) that trade a sliver of recall for orders-of-magnitude speed.

    Figure 13.1  —  Cosine-ranked retrieval over a tiny corpus (interactive)
    Sample queries: cars drive themselves; train a neural network; healthy recipe; motor controller

      Use cases

      Cosine cannot tell synonyms from opposites reliably — both "love" and "hate" appear in similar emotional contexts. Combine with metadata filters and (often) a reranker.

      References

      1. Malkov, Y. & Yashunin, D. — Efficient and robust approximate nearest neighbor search using HNSW, 2018.
      2. Pinecone, Weaviate, Qdrant, FAISS — open and managed vector DB documentation.
      Chapter the Fourteenth

      Chunking

      Documents are too long to embed whole and too long to feed to a model whole. Chunking is the unglamorous craft of slicing them into the right-sized pieces.

      You cannot embed a 200-page PDF as one vector — you would lose all locality. So you cut. Naive cuts at fixed character counts can split a sentence in half and bury the answer across two chunks. Better cuts respect structure: paragraph boundaries, sentence boundaries, headings. Better still, an overlap of a few sentences between adjacent chunks ensures context near the seam is never lost.

      Three knobs matter: chunk size (typically 200–800 tokens), overlap (10–20%), and splitter strategy (recursive by paragraph → sentence → word).
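The first two knobs can be sketched with a fixed-size splitter. This example counts whitespace-separated words as a rough stand-in for tokens and does not implement the recursive paragraph-to-sentence strategy; `chunk_words` and its default values are assumptions for illustration.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 30) -> list:
    """Split text into chunks of `size` words; adjacent chunks share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

For a ten-word text with `size=4` and `overlap=1`, each chunk begins with the last word of the previous one, which is the seam-protecting behaviour described above.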

      Figure 14.1  —  Slide the size and overlap to see how chunks change (interactive)

      Use cases

      There is no universally best chunk size. Test on your queries, your model, and your evaluation set. A common antipattern is over-chunking: 50-token slivers that lose all context.

      References

      1. LangChain text splitters — documentation.
      2. Pinecone — chunking strategies.
      Chapter the Fifteenth

      Retrieval-Augmented Generation

      Instead of asking a model to answer from memory, fetch the relevant facts first and paste them into the prompt. The model becomes an open-book student.

      An LLM trained last year does not know your codebase, your customers, or yesterday's meeting notes. RAG bridges the gap. At query time the system retrieves the most relevant chunks from a vector index (and often a keyword index), augments the prompt with them, and asks the model to answer using that context. The model's job changes from recall to read-and-respond.

      Done well, RAG cuts hallucination, gives citations, and lets you update knowledge without retraining. Done poorly, it serves wrong chunks confidently.

      Figure 15.1  —  A RAG pipeline, animated (interactive)
      1 · query
      2 · retrieve
      3 · augment
      4 · generate

      Use cases

      RAG is only as good as its retriever. Spend at least as much time on retrieval evaluation (precision@k, recall@k) as on prompt engineering — the prompt cannot fix bad chunks.

      References

      1. Lewis, P. et al. — Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, NeurIPS 2020. arXiv:2005.11401
      2. Anthropic — Contextual Retrieval blog post.
      Chapter the Sixteenth

      Reranking

      A two-stage strategy: cast a wide net cheaply, then re-sort the catch carefully. Most production search uses both.

      Vector search is fast but coarse. It returns 100 plausible chunks in milliseconds. A reranker — typically a smaller cross-encoder model that reads query and candidate together — then assigns each one a precise relevance score. You discard the bottom 95 and keep the top 5.

      Why two stages? A cross-encoder is too slow to run over a million documents, but a vector index is too crude to put the truly best result on top. The combination delivers both recall and precision.

      Figure 16.1  —  Click "Rerank" to watch the order shuffle (interactive)

      Query: "how to enforce torque limit safely"

        Use cases

        Cross-encoder rerankers (Cohere Rerank, BGE-rerank) read query + chunk together and are more accurate than embedding similarity, but they cost roughly one model call per candidate. Cap the candidate set.

        References

        1. Nogueira, R. & Cho, K. — Passage Re-ranking with BERT, 2019.
        2. Cohere — Rerank documentation.
        Chapter the Seventeenth

        Hybrid Search & Reciprocal Rank Fusion

        BM25 is great at exact matches. Vector search is great at meaning. Combine them with a single elegant formula and you outperform either alone.

        Hybrid search runs both retrievers in parallel, then merges the lists. The simplest, most robust merger is Reciprocal Rank Fusion (RRF):

        $$\mathrm{score}(d) = \sum_{r} \frac{1}{k + \mathrm{rank}_r(d)}$$

        For each retriever r, you take 1 over (a small constant k, typically 60, plus the rank of the document in that retriever's list). Sum across retrievers. Documents ranked highly by either method bubble up; documents ignored by both stay low. No score normalisation needed.
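The formula translates almost line for line into code. A sketch, assuming each retriever returns a list of document ids ordered best first and that ranks start at 1; the function name and the alphabetical tie-break are illustrative choices.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of document ids into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Given a keyword list `["a", "b", "c"]` and a vector list `["b", "d", "a"]`, document "b" wins: it is ranked second and first, which sums higher than "a" at first and third.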

        Figure 17.1  —  Two rankers fused into one (interactive)

        Query: "axial flux NVH harmonic injection"  ·  k = 60

        Columns: BM25, Vector, Fused (RRF)

              Use cases

              RRF is rank-based, so absolute scores from heterogeneous retrievers do not need to be on the same scale — a key practical advantage. Other fusion methods (linear combination, learned-to-rank) require careful score calibration.

              References

              1. Cormack, G. et al. — Reciprocal rank fusion outperforms Condorcet and individual rank learning methods, SIGIR 2009.
              2. Elasticsearch — RRF reference.
              Chapter the Eighteenth

              Function (Tool) Calling

              Hand the model a list of functions it may call, with their JSON schemas, and let it decide which one — and with what arguments — best answers the user.

              Function calling (also called tool use) is the structural foundation of every modern AI agent. You describe each tool with a name, a description, and a JSON schema for its parameters. The model, when it judges a tool call necessary, emits a JSON blob naming the tool and supplying its arguments. Your code receives the blob, runs the function, returns the result, and the model continues.

              This is more disciplined than JSON mode: JSON mode controls format, function calling controls which function.
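The receiving side of that exchange might look like the sketch below. The blob shape (`name` plus `arguments`) is an assumption for illustration, since each provider defines its own format; the two tool functions are stubs, and the type table is a simplified stand-in for a full JSON schema.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical stub; a real tool would call a weather service.
    return f"(weather report for {city})"


def search_orders(user: str, date_range: str) -> str:
    # Hypothetical stub; a real tool would query an order database.
    return f"(orders for {user} in {date_range})"


# Tool registry: the function to run and the expected type of each argument.
TOOLS = {
    "get_weather": {"function": get_weather, "parameters": {"city": str}},
    "search_orders": {"function": search_orders,
                      "parameters": {"user": str, "date_range": str}},
}


def dispatch(tool_call_json: str) -> str:
    """Validate a model-emitted tool call, then run the matching function."""
    call = json.loads(tool_call_json)
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name, args = call.get("name"), call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    expected = TOOLS[name]["parameters"]
    if not isinstance(args, dict) or set(args) != set(expected):
        raise ValueError(f"{name} expects arguments {sorted(expected)}")
    for key, expected_type in expected.items():
        if not isinstance(args[key], expected_type):
            raise ValueError(f"{name}.{key} must be {expected_type.__name__}")
    return TOOLS[name]["function"](**args)
```

Nothing runs until the name and every argument have been checked, so a hallucinated tool or a misspelled parameter is rejected before it reaches real code.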

              Figure 18.1  —  Watch the model pick a tool and fill its arguments (interactive)

              Available tools: get_weather(city), search_orders(user, date_range), convert_currency(amount, from, to), send_email(to, subject, body)

              Sample requests: currency conversion, order search, email, no tool needed

              Use cases

              Always validate the model's tool arguments against your real schema before executing — even "structured" output can hallucinate. Treat tool calls as untrusted input from a junior intern.

              References

              1. Anthropic — Tool use guide.
              2. OpenAI — Function calling guide.
              Chapter the Nineteenth

              Chain of Thought

              Ask a model to "think step by step" before answering, and on hard problems its accuracy jumps — sometimes dramatically.

              Chain-of-thought (CoT) prompting nudges a model to produce its intermediate reasoning out loud — list assumptions, do arithmetic, work the problem in stages — before stating the final answer. Wei et al. (2022) showed this single change can lift performance on math and reasoning benchmarks by tens of percentage points, especially in larger models.

              CoT differs from ReAct: CoT is reasoning only (no tool calls), while ReAct interleaves reasoning with actions. Modern reasoning models (o1, o3, Claude with extended thinking) bake CoT into their decoding so you no longer have to ask.

              Figure 19.1  —  The same word problem, with and without CoT (interactive)

              Question: "A motor draws 12 A for 8 hours and 3 A for 16 hours each day. What is its average daily current?"
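Worked step by step, the question reduces to a time-weighted average. A sketch of the intermediate arithmetic, assuming the two intervals together cover the full 24-hour day (8 h + 16 h):

```latex
\bar{I}
  = \frac{12\,\mathrm{A} \times 8\,\mathrm{h} + 3\,\mathrm{A} \times 16\,\mathrm{h}}{24\,\mathrm{h}}
  = \frac{96 + 48}{24}\,\mathrm{A}
  = 6\,\mathrm{A}
```

Averaging the two readings directly would give 7.5 A, which ignores how long each one lasts; writing out the steps is what tends to expose that kind of slip.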

              Use cases

              CoT can also hurt on simple lookup tasks (a "let me think" preamble for "what is the capital of France?" wastes tokens). Modern systems route CoT only when the question warrants it.

              References

              1. Wei, J. et al. — Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS 2022. arXiv:2201.11903
              2. Kojima, T. et al. — Large Language Models are Zero-Shot Reasoners ("Let's think step by step"), NeurIPS 2022.
              Chapter the Twentieth

              Model Context Protocol

              An open standard — championed by Anthropic — that lets any AI client talk to any tool or data source through a single, uniform protocol. USB-C for LLMs.

              Without a standard, every AI app re-implements its own tool wiring: a custom GitHub plugin, a custom Slack plugin, a custom database plugin. MCP (Model Context Protocol) defines a small client–server protocol where AI applications (the host) speak to MCP servers that each expose tools, resources, and prompts. Build the server once; every MCP-aware host can use it.

              The protocol carries three primitives. Tools are functions the model can invoke. Resources are read-only data the host may inject into context. Prompts are reusable templates servers offer to clients.

              Figure 20.1  —  Host, servers, and the world they expose (diagram)
              The AI Host (Claude · Cursor · IDE) connects over MCP to three servers: a GitHub server (tools · resources) in front of github.com, a Filesystem server (read · write · grep) in front of local files, and a Database server (query · schema) in front of Postgres.

              The host knows nothing about GitHub, files, or Postgres directly. It only knows MCP. Each server translates between MCP and its native protocol.

              Use cases

              MCP servers run with the host's permissions. Treat installing one like installing a browser extension — review the source, prefer official servers, and constrain the tools each server exposes.

              References

              1. Anthropic — modelcontextprotocol.io — the spec, SDKs, and server registry.
              2. Anthropic — Introducing the Model Context Protocol, Nov 2024.
              Volume III
              Web Plumbing
              The unglamorous infrastructure that keeps networked applications honest: verbs, status codes, identity, push and pull, caches, retries.
              Chapter the Twenty-First

              HTTP Verbs & Status Codes

              A handful of verbs describe what you want to do, and a three-digit status code describes what happened. Most of the web is built on these.

              The verb signals intent: GET reads, POST creates, PUT replaces, PATCH partially updates, DELETE removes. The server reads the verb and the URL, does its work, and returns a status code grouped by family: 2xx success, 3xx redirection, 4xx client error (your fault), 5xx server error (their fault).

              Verbs and codes carry a contract beyond their literal action. GET is meant to be safe and cacheable; repeating it must not change state. PUT and DELETE are meant to be idempotent (chapter 27).

              Figure 21.1  —  Click any cell for an explanation (interactive)

              Use cases

              Beware the lazy 200 OK with {"error": "..."} in the body. Status codes exist precisely so HTTP infrastructure (proxies, retries, browser dev tools) can reason about success without parsing JSON.

              References

              1. RFC 9110 — HTTP Semantics.
              2. MDN — Status code reference.
              Chapter the Twenty-Second

              OAuth & JWT

              Two patterns for proving identity to an API without re-typing a password on every request. OAuth is the dance; JWT is the badge it issues.

              OAuth 2.0 is the protocol behind every "Sign in with Google" button. The user authenticates once with the identity provider; the application receives a short-lived access token instead of the user's password. Subsequent API calls present the token as a Bearer header.

              That token is often a JWT (JSON Web Token), which consists of three Base64url-encoded parts separated by dots: a header (the algorithm), a payload (claims about the user), and a signature the server can verify without a database lookup. Anyone can read the payload; only a holder of the signing key could have produced the signature.

              Figure 22.1  —  Decode a JWT (interactive)
              Panels: header, payload

              The signature ensures nobody tampered with the payload. The contents are not encrypted — never put secrets in a JWT.
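The three-part structure and the signature check can be sketched with the standard library alone. This example assumes the HS256 algorithm (HMAC with SHA-256 and a shared secret) and deliberately omits expiry and other claim checks; the helper names are illustrative, and production code would normally use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(part: str) -> bytes:
    # JWTs strip Base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature for the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(obj, separators=(",", ":")).encode("utf-8"))
        for obj in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Return the payload if the signature checks out; raise ValueError otherwise."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":  # refuse "none" and unexpected algorithms
        raise ValueError("unsupported algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))
```

Decoding the middle part needs no secret at all, which is why the payload must never carry anything confidential; only the final comparison depends on the key.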

              Use cases

              Two bugs cause most JWT incidents: forgetting to verify the signature ("alg=none" attack) and giving tokens long lifetimes. Verify always; expire fast; refresh.

              References

              1. RFC 6749 — OAuth 2.0; RFC 7519 — JSON Web Token.
              2. jwt.io — full-featured online debugger.
              Chapter the Twenty-Third

              WebSocket vs Polling

              Two ways to keep a client up to date: ask the server repeatedly ("polling"), or open one persistent line and let the server speak whenever it has news.

              Polling is HTTP business as usual: every few seconds the client sends "any updates?" and the server answers yes or no. Simple, firewall-friendly, but wasteful when nothing changes — and laggy by definition (you only learn at the next poll).

              WebSockets open a single TCP connection that stays alive. Either party may push a message at any time, with no per-message HTTP overhead. The cost: stateful connections, harder load-balancing, harder horizontal scaling.

              Figure 23.1  —  Compare network chatter side by side (animation)
              Polling: regular requests · WebSocket: events on demand

              Use cases

              A middle ground is Server-Sent Events (SSE): one-way push from server to client over plain HTTP. Simpler than WebSockets and increasingly used to stream LLM tokens.

              References

              1. RFC 6455 — The WebSocket Protocol.
              2. MDN — Server-Sent Events.
              Chapter the Twenty-Fourth

              Webhooks

              Instead of polling someone else's API every minute, give them a URL of yours; they'll POST to it whenever something interesting happens. Reverse APIs.

              A webhook is a normal HTTP endpoint you host. You register its URL with a third-party service (Stripe, GitHub, Slack), and from then on the service pushes events to you the moment they occur. Your endpoint receives a JSON payload, returns 200 OK, and gets back to its life.

              Webhooks invert the usual control flow: you become the server-of-events. They are the lightest possible event bus across the public internet.

              Figure 24.1  —  Trigger an event and watch it arrive (interactive)
              Stripe (payment provider) → POST /hook with a JSON body → Your server (your-app.com/webhooks/stripe)

              Use cases

              Always verify webhook signatures (HMAC over the body) before trusting the payload. Anyone who learns the URL can otherwise replay or forge events. And design your handler to be idempotent — providers retry on 5xx.
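Both habits can be sketched in a few lines. The signing scheme here (hex-encoded HMAC-SHA256 over the raw body) is a simplified assumption; real providers each document their own header format and often mix a timestamp into the signed data. The in-memory set stands in for whatever durable store a real handler would use.

```python
import hashlib
import hmac


def sign_body(body: bytes, secret: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_authentic(body: bytes, signature_header: str, secret: bytes) -> bool:
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of a forged signature happened to be correct.
    return hmac.compare_digest(sign_body(body, secret), signature_header)


_seen_event_ids = set()  # illustrative; use a database or cache in practice


def handle_event(event_id: str) -> bool:
    """Idempotent handler: return True only the first time an event id is seen."""
    if event_id in _seen_event_ids:
        return False  # a retry of something already processed
    _seen_event_ids.add(event_id)
    return True
```

Verify against the raw bytes exactly as received; re-serializing the parsed JSON can change whitespace or key order and break the comparison.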

              References

              1. Stripe — Webhooks documentation (the canonical implementation).
              2. Svix — Standard Webhooks draft spec.
              Chapter the Twenty-Fifth

              CORS

              A browser security rule that, by default, forbids JavaScript on one website from calling APIs on another. The server must explicitly opt in.

              The browser enforces a "same-origin policy" that protects you from a malicious page secretly making authenticated calls to your bank in your name: JavaScript loaded from foo.com may freely call foo.com, but it may not read responses from bar.com without bar.com's permission. Cross-Origin Resource Sharing (CORS) is how bar.com grants that permission, through response headers like Access-Control-Allow-Origin.

              The notorious "CORS error" you see in the console is not a bug in the browser; it is the browser doing exactly the job it is paid for. The fix is on the server you are calling, not in your front-end code.
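
The two halves of that handshake can be sketched in a few lines. The allowlist, the function names, and the simplified browser check are assumptions for illustration; a real browser also handles preflight requests, credentials, and several more headers.

```python
from typing import Dict, Optional

# Hypothetical allowlist kept by the server being called (bar.com in the text).
ALLOWED_ORIGINS = {"https://foo.com", "https://app.foo.com"}


def cors_headers(request_origin: Optional[str]) -> Dict[str, str]:
    """Response headers the server adds for a given Origin request header."""
    if request_origin in ALLOWED_ORIGINS:
        return {
            "Access-Control-Allow-Origin": request_origin,
            "Vary": "Origin",  # tell caches that the response depends on Origin
        }
    return {}  # no opt-in: the browser will block the calling script


def browser_allows(page_origin: str, response_headers: Dict[str, str]) -> bool:
    """Simplified version of the browser's check on a cross-origin response."""
    allowed = response_headers.get("Access-Control-Allow-Origin")
    return allowed == "*" or allowed == page_origin
```

Note where the decision lives: the server only states its policy in headers, and the browser is the party that enforces it.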

              Figure 25.1  —  Will the browser allow this request? (interactive)

              Use cases

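              CORS does not protect your server from anyone; it is enforced only by browsers, for the benefit of their users. Requests made outside a browser (curl, Postman, your own backend) ignore it entirely and carry none of the user's cookies, so the server must still authenticate and authorize every request itself.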

              References

              1. MDN — Cross-Origin Resource Sharing.
              2. WHATWG — Fetch standard.
              Chapter the Twenty-Sixth

              Caching

              Storing the answer to a question so the next person asking the same question gets it instantly. Most performance work, in the end, is the right cache in the right place.

              A request travels through layers, and at every layer something might already have the answer. The browser's memory cache, the disk cache, a CDN edge node, an application cache (Redis), the database's own buffer pool. A hit at any layer skips the rest. A miss falls through to the next.
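
The fall-through can be modelled in one process. This is a toy sketch: the class name and layer names are hypothetical, and real layers live on different machines with their own expiry rules.

```python
class LayeredCache:
    """Ordered cache layers in front of a slow origin (for example, a database)."""

    def __init__(self, layer_names, origin):
        # Each layer is a (name, dict) pair, nearest to the user first.
        self.layers = [(name, {}) for name in layer_names]
        self.origin = origin

    def get(self, key):
        """Return (value, name of the layer that answered)."""
        for index, (name, store) in enumerate(self.layers):
            if key in store:
                value = store[key]
                self._fill(key, value, upto=index)  # warm the nearer layers
                return value, name
        value = self.origin(key)  # every layer missed
        self._fill(key, value, upto=len(self.layers))
        return value, "origin"

    def _fill(self, key, value, upto):
        """Copy the value into the first `upto` layers."""
        for _, store in self.layers[:upto]:
            store[key] = value

    def invalidate(self, key):
        """Drop a key from every layer, e.g. after the underlying row changes."""
        for _, store in self.layers:
            store.pop(key, None)
```

A hit at any layer returns immediately and warms the layers in front of it; only a miss everywhere reaches the origin.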

              Caching is governed by two hard problems, both quoted endlessly: invalidation (when does cached data become stale?) and naming (what is the right key?). Get either wrong and users see yesterday's prices.

              Figure 26.1  —  Watch a request fall through the cache layers (interactive): Browser, CDN, App / Redis, Database.

              Use cases

              "There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. Believe it. The bug you cannot reproduce is usually a cache.

              References

              1. RFC 9111 — HTTP Caching.
              2. Anthropic — Prompt caching.
              Chapter the Twenty-Seventh

              Idempotency

              An operation is idempotent if doing it twice has the same effect as doing it once. The internet is unreliable; your operations had better be.

              Networks drop packets. Clients retry. Without care, a single "charge $50" intent turns into two charges. The cure is the idempotency key: the client invents a unique ID per intent and sends it with every retry. The server records the result against the key, and a second arrival with the same key returns the same result without doing the work again.
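
The server side of that bargain fits in a few lines. This is a sketch under assumptions: the class and field names are invented for illustration, results are kept in memory, and expiry of old keys is left out.

```python
import uuid


class PaymentServer:
    """Toy server that performs each (user, idempotency key) intent at most once."""

    def __init__(self):
        self.charges = []   # every charge actually performed
        self._results = {}  # (user, idempotency key) -> stored response

    def charge(self, user, amount_cents, idempotency_key):
        cache_key = (user, idempotency_key)
        if cache_key in self._results:
            # A retry of an intent already handled: replay the stored answer.
            return self._results[cache_key]
        charge_id = "ch_%d" % (len(self.charges) + 1)
        self.charges.append((user, amount_cents))
        response = {"charge_id": charge_id, "amount_cents": amount_cents}
        self._results[cache_key] = response
        return response


server = PaymentServer()
key = str(uuid.uuid4())                     # the client invents one key per intent
first = server.charge("alice", 5000, key)
retry = server.charge("alice", 5000, key)   # same key: no second charge
```

The client must reuse the same key for every retry of one intent and mint a fresh key for each new intent; the server cannot tell the two cases apart any other way.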

              GET, PUT, and DELETE are naturally idempotent (re-reading, re-replacing, re-deleting all converge). POST is not, which is why payment APIs such as Stripe accept an Idempotency-Key header to make it so.

              Figure 27.1  —  Press the button many times. Watch what happens. (interactive; counters: charges, total, current key)

              Use cases

              An idempotency key has a TTL. Keep it long enough that legitimate retries match (24h is common) but short enough that stored keys and results do not pile up forever. And key the cache on the operation and the user, never globally.

              References

              1. Stripe — Idempotent requests.
              2. Brandur Leach — Designing robust and predictable APIs with idempotency.
              Chapter the Twenty-Eighth

              Rate Limiting & Exponential Backoff

              Servers protect themselves by capping how often any one client may call. Polite clients, when refused, wait — longer each time — before trying again.

              Every public API rate-limits: 100 requests per minute, 10,000 tokens per second. Exceed it and the server returns 429 Too Many Requests, often with a Retry-After header. A client that hammers harder will only be banned faster.

              The standard polite response is exponential backoff with jitter: wait $b \cdot 2^{n}$ seconds before retry n, plus a small random offset to avoid synchronized retries from a thousand clients colliding (the "thundering herd"). It's the universal manners of distributed systems.
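
A sketch of that schedule, with retries counted from n = 0 so that a base of one second gives 1, 2, 4. The function names and the (status, result) convention are assumptions for illustration. The sleep function is passed in so the logic can be exercised without actually waiting, and a production client would also honour a Retry-After header when the server sends one.

```python
import random
import time


def backoff_delay(base, attempt, max_jitter=0.0):
    """base * 2**attempt seconds, plus a random offset in [0, max_jitter]."""
    return base * 2 ** attempt + random.uniform(0.0, max_jitter)


def call_with_backoff(operation, base=1.0, max_retries=5, max_jitter=0.5,
                      sleep=time.sleep):
    """Call operation() until it stops answering 429 or retries run out.

    operation must return a (status_code, result) pair.
    """
    status, result = operation()
    for attempt in range(max_retries):
        if status != 429:
            break
        sleep(backoff_delay(base, attempt, max_jitter))
        status, result = operation()
    return status, result
```

With max_jitter left at zero the delays are exactly 1, 2, 4, 8 seconds for a base of one; the jitter term is what keeps a crowd of clients from retrying in lockstep.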

              Figure 28.1  —  A burst, a 429, and the backoff that follows (animation)

              Limit: 5 requests per 10 seconds. Burst exceeds limit, server replies 429, client backs off (1s, 2s, 4s) before resuming.

              Use cases

              Always add jitter. Without it, clients that were rate-limited at the same moment all retry at the same moment, again and again. With jitter, retries spread out and the server recovers smoothly.

              References

              1. AWS Architecture Blog — Exponential Backoff and Jitter.
              2. RFC 6585 — Additional HTTP Status Codes (introduces 429).
              Volume IV
              Async & Concurrency
              What happens when more than one thing is in flight at once — the most reliable source of bugs in any codebase.
              Chapter the Twenty-Ninth

              Async / Await & the Event Loop

              JavaScript runs on a single thread. Async is the bookkeeping that lets it feel like several at once: start a slow task, do other work, resume when the answer is ready.

              The runtime maintains a call stack (currently executing functions), a callback queue (tasks waiting their turn — timer fires, network responses), and a microtask queue (Promise resolutions, almost-immediate). The event loop is the rule: when the stack is empty, drain all microtasks, then take one task from the queue, then repeat.

              async/await is sugar over Promises. await suspends the function, frees the stack, and the runtime resumes the function once the awaited Promise settles — typically as a microtask.
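
The same suspend-and-resume bookkeeping can be watched from Python, whose asyncio library is used here only as a stand-in: it also runs coroutines on a single thread, but it has no separate microtask queue, so the ordering rules above are specific to JavaScript. The worker names are illustrative.

```python
import asyncio
import threading


async def worker(name, log):
    """Record two steps, handing control back to the event loop between them."""
    for step in range(2):
        log.append((name, step, threading.get_ident()))
        await asyncio.sleep(0)  # suspend here; the loop may run another task


async def main():
    log = []
    # Both workers are in flight at once, yet they share one thread.
    await asyncio.gather(worker("a", log), worker("b", log))
    return log


log = asyncio.run(main())
```

Every entry in the log carries the same thread ID: the two workers take turns at each await rather than running in parallel.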

              Figure 29.1  —  Step through a small async program (interactive)
              console.log("A");
              setTimeout(() => console.log("B"), 0);
              Promise.resolve().then(() => console.log("C"));
              console.log("D");
              Panels: call stack, microtask queue, task queue, output.

              Use cases

              "Async" is not the same as "parallel." JavaScript with async/await still runs on one thread. True parallelism in browsers requires Web Workers; in Node, worker threads or separate processes.

              References

              1. MDN — The event loop.
              2. Philip Roberts — "What the heck is the event loop anyway?" (talk).
              Chapter the Thirtieth

              Race Conditions

              When two things happen at almost the same time and the order of their tiny inner steps decides the outcome — sometimes correctly, sometimes catastrophically.

              The classic example: two threads each read a counter (it says 5), each add 1, each write back. Done concurrently, the counter ends at 6, not 7 — one increment is lost. The bug is invisible until the day it bites in production.

              Cures come in three families. Locks serialize access (mutex, semaphore). Atomic operations bundle read-modify-write into one indivisible step (compare-and-swap). Avoid sharing: actors, message queues, immutable data — no two threads touch the same memory.
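
A sketch of the lost update and of the first cure. The unlucky interleaving is replayed by hand so that it happens every time instead of once a month; the lock-based counter then shows the repair with real threads. All names are illustrative.

```python
import threading


def lost_update(start):
    """Replay the bad interleaving deterministically, one step per line."""
    counter = start
    a_read = counter       # Thread A reads the counter
    b_read = counter       # Thread B reads it before A has written
    counter = a_read + 1   # A writes back
    counter = b_read + 1   # B writes back, overwriting A's increment
    return counter


class SafeCounter:
    """A counter whose read-modify-write is serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:   # only one thread at a time gets past this line
            self.value += 1


def run_threads(counter, threads=4, increments=10_000):
    """Hammer the counter from several threads and return the final value."""
    def work():
        for _ in range(increments):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value
```

Starting from 5, lost_update returns 6 rather than 7, exactly the case described above, while the locked counter ends at threads times increments.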

              Figure 30.1  —  Replay the race; observe the lost update (animation): starting from counter = 0, Thread A and Thread B each read the counter, add 1, and write back.

              Use cases (well, occurrences)

              The hardest race conditions are those whose probability scales with load. They pass tests at 1 req/s and corrupt the database at 1000 req/s. Treat any "intermittent" production bug as a race until proven otherwise.

              References

              1. Herlihy & Shavit — The Art of Multiprocessor Programming, 2nd ed.
              2. Kleppmann, M. — Designing Data-Intensive Applications, ch. 7 (transactions).
              Chapter the Thirty-First

              Debounce vs Throttle

              Two ways of taming a fire-hose of events into something a server (or a search-as-you-type box) can stomach.

              Debounce: wait until the user stops, then fire once. Perfect for search-as-you-type — no point querying after every keystroke when one's coming a millisecond later.

              Throttle: fire at most once every N milliseconds, no matter how often events arrive. Perfect for window resize or scroll handlers — you want updates during the action, just not 200 of them per second.
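
The difference is easiest to see on a list of timestamps. The sketch below does not wrap real event handlers; it only computes when each strategy would fire for a given stream of event times in milliseconds. The function names are illustrative, and the throttle shown is the simplest leading-edge variant, which drops events inside the interval rather than delivering the last one late.

```python
def debounce_fires(event_times, wait):
    """Trailing-edge debounce: fire `wait` ms after an event that is
    followed by at least `wait` ms of silence."""
    fires = []
    for i, t in enumerate(event_times):
        is_last = i == len(event_times) - 1
        if is_last or event_times[i + 1] - t >= wait:
            fires.append(t + wait)
    return fires


def throttle_fires(event_times, interval):
    """Leading-edge throttle: fire on an event only if at least
    `interval` ms have passed since the previous fire."""
    fires = []
    for t in event_times:
        if not fires or t - fires[-1] >= interval:
            fires.append(t)
    return fires


# One event every 100 ms for a second, e.g. steady typing or scrolling.
events = list(range(0, 1001, 100))
debounced = debounce_fires(events, 300)   # [1300]
throttled = throttle_fires(events, 300)   # [0, 300, 600, 900]
```

Eleven raw events become one debounced call after the burst ends, or four throttled calls spread across it.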

              Figure 31.1  —  Type fast and watch the counters diverge (interactive; counters: raw events, debounced (300ms), throttled (300ms))

              Debounced: only the last keystroke in a quiet window fires. Throttled: at most one fire per 300ms, regardless.

              Use cases

              Most utility libraries (lodash, underscore) ship both. If you're writing your own, get the trailing edge right — most users want the last event delivered, not silently swallowed.

              References

              1. Lodash — debounce & throttle.
              2. CSS-Tricks — Debouncing and throttling explained.
              Volume V
              Data & Formats
              Side-companions to JSON: alternative serialisations, the lingua franca of prose, the algebra of relational queries, and the difference between a fingerprint and a vault.
              Chapter the Thirty-Second

              YAML & TOML

              Two cousins of JSON optimised for humans writing configuration: YAML for indented prose, TOML for clean sectioned files.

              YAML ("YAML Ain't Markup Language") uses indentation, hyphens for lists, and colons for key-value pairs. It supports comments, multiline strings, and references (anchors and aliases), at the cost of subtle whitespace bugs and famously inconsistent boolean parsing (yes, NO, and on all once meant booleans). It dominates Kubernetes, GitHub Actions, Ansible, and ML configs.

              TOML ("Tom's Obvious, Minimal Language") trades indentation for explicit [sections] and quoted strings. Less expressive, far less ambiguous. The Rust ecosystem (Cargo.toml) and Python's pyproject.toml made it ubiquitous.

              Figure 32.1  —  The same configuration, three formats (interactive)

              Use cases

              YAML's "Norway problem": country: NO can be parsed as the boolean false. Modern parsers (YAML 1.2) fixed this, but many libraries default to 1.1. Always quote string values you do not control.

              References

              1. yaml.org  ·  toml.io.
              2. O'Connor, C. — The Norway Problem (StrictYAML documentation), hitchdev.com.
              Chapter the Thirty-Third

              Markdown

              A way to write formatted text using only the punctuation already on your keyboard. Asterisks become bold, hashes become headings, the result reads almost as well as plain prose.

              John Gruber and Aaron Swartz invented Markdown in 2004 as a writing format that compiled to HTML. The genius: the source reads naturally, even unrendered. **bold** looks like emphasis even before it becomes bold. # Heading looks like the heading it represents.

              Today, almost every README, every chat client, every AI prompt uses Markdown. Variants (CommonMark, GitHub-Flavored Markdown) standardised the messy edge cases.

              Figure 33.1  —  Type Markdown, see HTML (interactive)

              Use cases

              There is no single Markdown — there are dozens of dialects. For interoperable docs, target CommonMark with explicit GFM extensions (tables, task lists, fenced code).

              References

              1. Gruber, J. — Original Markdown spec, 2004.
              2. CommonMark  ·  GitHub-Flavored Markdown.
              Chapter the Thirty-Fourth

              SQL JOINs

              A way to combine two tables along a shared column. Four flavours decide what happens to rows that don't match.

              You have a users table and an orders table. Each order has a user_id. To list every user with their orders, you JOIN on the matching id. The interesting question is: what do you do with users who have no orders, or orders whose user was deleted?

              INNER JOIN keeps only matched rows. LEFT JOIN keeps every row from the left table, padding with NULL where the right is missing. RIGHT JOIN is its mirror. FULL OUTER JOIN keeps every row from both, padding both sides.
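
A small sketch with an in-memory SQLite database makes the padding visible. The sample rows are invented for illustration. Older SQLite builds lack RIGHT and FULL OUTER JOIN, so the mirror case is written here as a LEFT JOIN with the tables swapped, which returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);
    -- Grace and Linus have no orders; order 12 points at a deleted user.
    INSERT INTO users  VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'pen'), (12, 99, 'lamp');
""")

# Matched rows only.
inner = conn.execute("""
    SELECT users.name, orders.item
    FROM users INNER JOIN orders ON orders.user_id = users.id
    ORDER BY orders.id
""").fetchall()

# Every user, with NULL (None in Python) where no order exists.
left = conn.execute("""
    SELECT users.name, orders.item
    FROM users LEFT JOIN orders ON orders.user_id = users.id
    ORDER BY users.id, orders.id
""").fetchall()

# Every order, with NULL where the user is gone (the RIGHT JOIN result).
right_like = conn.execute("""
    SELECT users.name, orders.item
    FROM orders LEFT JOIN users ON orders.user_id = users.id
    ORDER BY orders.id
""").fetchall()
```

The orphaned lamp order appears only in the third result, and the users without orders appear only in the second.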

              Figure 34.1  —  Pick a join; see the rows it returns (interactive): INNER JOIN, LEFT JOIN, RIGHT JOIN, or FULL OUTER over the users and orders tables.

              Use cases

              Joins are expensive on large tables without indexes. Index the columns you join on; otherwise the database may have to scan an entire table for every lookup. The most common slow query in your career will be a missing index on a join key.

              References

              1. Date, C. J. — SQL and Relational Theory, 3rd ed.
              2. Use The Index, Luke! — use-the-index-luke.com for join performance.
              Chapter the Thirty-Fifth

              Hashing vs Encryption

              Two operations that look superficially similar — both take input and produce gibberish — but differ on a fundamental axis: can you go back?

              A hash is a one-way fingerprint. Given the input, you always get the same fixed-size output; given the output alone, you cannot recover the input. A single bit changed in the input produces a wholly different hash. Use cases: password storage (with salt), file integrity, content-addressed storage (Git, IPFS).

              Encryption is reversible, given the right key. The output looks random, but the legitimate holder of the key can decrypt back to the original. Use cases: transmitting secrets (TLS) and storing sensitive data at rest. (Signing a message is a different operation: it proves who wrote the message but does not hide its contents.)

              The bug behind a thousand breaches: confusing them. Storing a password "encrypted" means someone with the key can read every password. Storing a password "hashed" (with bcrypt/argon2) means even the database admin cannot.
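
The hashing half can be sketched with Python's standard library alone. Two assumptions to flag: PBKDF2 is used as a stand-in because it ships with Python, whereas bcrypt and argon2 need third-party packages; and the iteration count is an illustrative figure, not a recommendation. Encryption is left out because the standard library has no AES.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def fingerprint(data: bytes) -> str:
    """SHA-256 fingerprint: same input, same 64 hex characters, every time."""
    return hashlib.sha256(data).hexdigest()


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = 100_000) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is drawn if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = 100_000) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

The database stores only the salt and the digest. Checking a login means hashing the candidate again, never decrypting anything, because there is nothing to decrypt.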

              Figure 35.1  —  Type a value; observe both transformations (interactive)
              SHA-256 hash
              One-way. Same input → same hash, always. There is no "un-SHA-256". Try changing one character.
              Encrypt (AES-GCM, key in browser)
              Reversible with the key. Each run produces different ciphertext (random IV) — but decrypts to the same plaintext.

              Use cases

              Never invent crypto. Use a vetted library and a vetted construction (libsodium, the platform's WebCrypto, OpenSSL with sane defaults). And never use MD5 or SHA-1 for anything new: practical collision attacks exist for both.

              References

              1. Ferguson, Schneier & Kohno — Cryptography Engineering, 2010.
              2. OWASP — Password Storage Cheat Sheet.
              Chapter the Thirty-Sixth

              Base64

              A way to package binary data — images, hashes, encrypted blobs, anything — into ordinary text so it survives email, JSON, URLs, and every other channel that only speaks ASCII.

              Base64 takes raw bytes and re-expresses them using only sixty-four printable characters: A–Z, a–z, 0–9, plus + and /. The trick is arithmetic: every three bytes (24 bits) splits cleanly into four six-bit groups, each of which indexes into the 64-character alphabet. The output is 4/3 the length of the input, rounded up to a whole four-character group: the final group is padded with = when the byte count is not a multiple of three.

              A URL-safe variant swaps + for - and / for _ so the result can travel inside URLs and filenames without further escaping. JWTs (chapter 22) use the URL-safe form; classic data URIs use the standard form. Both decode the same bytes back.
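
The arithmetic is short enough to write out by hand. The sketch below is for illustration only, since Python's base64 module already does the job; the function name is an assumption, and the hand-rolled result is meant to be checked against base64.b64encode.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_base64(data: bytes) -> str:
    """Standard Base64 with = padding, three bytes at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        padding = 3 - len(chunk)  # 0, 1, or 2 missing bytes
        # Pack the chunk (zero-filled if short) into one 24-bit integer.
        block = int.from_bytes(chunk + b"\x00" * padding, "big")
        # Slice the 24 bits into four 6-bit indexes into the alphabet.
        chars = [ALPHABET[(block >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if padding:
            chars[-padding:] = "=" * padding  # mark the missing bytes
        out.extend(chars)
    return "".join(out)


cat = encode_base64(b"Cat")   # "Q2F0": three bytes, four characters
one = encode_base64(b"M")     # "TQ==": one byte, two characters plus padding
two = encode_base64(b"Hi")    # "SGk=": two bytes, three characters plus padding

# The URL-safe variant changes only the last two alphabet characters.
standard = base64.b64encode(b"\xfb\xff")          # b"+/8="
url_safe = base64.urlsafe_b64encode(b"\xfb\xff")  # b"-_8="
```

Decoding either variant returns the same two bytes, which is the whole point: only the spelling differs.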

              It bears repeating, because the misconception is endemic: Base64 is encoding, not encryption. There is no key. Anyone with a browser console can decode it.

              Figure 36.1  —  Watch three bytes become four characters (interactive; choose a variant and a sample input: one byte "M", two bytes "Hi", three bytes "Cat", longer text, or unicode "café")

              Use cases

              If you ever see a "secured" credential or API key Base64-encoded in a config file or log, that is a security bug, not a security feature. Treat Base64 as readable text, because it is readable text.

              References

              1. RFC 4648 — The Base16, Base32, and Base64 Data Encodings.
              2. MDN — Base64 glossary entry.