Build a Service Mesh (Envoy / Istio style)
13 scenes · ~91 min · build the primitive
Every microservice request crosses two proxies, the caller's sidecar and the callee's. This curriculum covers what those proxies do: routing, load balancing, timeouts and retry budgets, circuit breakers, outlier detection, token-bucket rate limits, mTLS with workload identity, and a control plane that streams config to all of them. Build it in the order the production problems show up, and feel why Envoy plus a control plane has eaten the east-west (service-to-service) world.
- 01 · Fifty services, fifty broken retry policies (~7 min)
  Every team picks its own retry, timeout, breaker, and mTLS library. One slow dependency turns into a fleet-wide outage.
- 02 · The sidecar — one proxy per pod (~7 min)
  Put a small proxy next to every service. The app talks to localhost; cross-service traffic flows sidecar to sidecar. The fleet of sidecars is the service mesh.
- 03 · L4 vs L7 — bytes or requests (~7 min)
  An L4 proxy forwards opaque TCP bytes; an L7 proxy parses HTTP and can act on path, method, and headers. The mesh is L7 for everything that follows.
- 04 · Listener and route — bind and match (~7 min)
  A listener accepts connections on a port; an ordered route table picks a destination on the first match. Reorder the rules and the same request lands somewhere else.
- 05 · Cluster and load balancing — pick one of many (~7 min)
  Behind one destination name is a cluster of replicas; round-robin, least-request, or ring-hash decides who serves each request. Round-robin is the wrong default under heterogeneous latency, because it keeps handing slow replicas their full share.
- 06 · Timeout and retry budget — bounded patience (~7 min)
  Naive retries multiply across hops: three attempts (one try plus two retries) at each of five hops is 3^5 = 243x the load on a failing backend. A retry budget caps total retries as a fraction of normal traffic, so retries can't become the outage.
- 07 · Circuit breaker — the state machine (~7 min)
  Closed → open → half-open. Fast-fail to a known-broken dependency and periodically probe for recovery, so callers stop wasting resources on guaranteed failures.
- 07a · Outlier detection — eject one bad replica (~7 min)
  Don't trip the whole cluster; pull just the misbehaving replica from the pool. Passive detection (counting real 5xx responses) catches what active /healthz probes miss.
- 08 · Rate limiting — the token bucket (~7 min)
  Per-client token bucket: each request takes a token, and an empty bucket returns 429. A local limit is cheap but drifts across sidecars; a global limit stays exact by checking a central coordinator.
- 09 · mTLS — identity for both sides (~7 min)
  TLS proves the server; mTLS proves both. The control plane mints short-lived certs that carry a stable workload identity, because pod IPs are not identities.
- 10 · Control plane and data plane — config over gRPC (~7 min)
  Sidecars (the data plane) handle traffic; the control plane (Istiod) streams listener, route, cluster, and cert config via xDS. Kill the control plane and traffic keeps flowing on the last config each sidecar received.
- 11 · Trace and span — stitching one user request (~7 min)
  Every sidecar emits a span tagged with the trace id from `traceparent`. If one service forgets to copy that header onto its outbound calls, the trace breaks silently, while RED (rate, errors, duration) metrics keep flowing regardless.
- 12 · Design canvas — configure the mesh (~7 min)
  Four workloads, every knob from the prior scenes. Each verifier note cites the scene that earned it.
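Scene 04's first-match rule is easy to see in a few lines. The sketch below is a hypothetical route table: `Route`, `pick_cluster`, and the cluster names are illustrative and are not Envoy's config schema. It matches on a path prefix plus an optional method, walks the rules in order, and returns the first hit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    """One hypothetical route rule: a path prefix, an optional HTTP method,
    and the destination cluster name."""
    prefix: str
    cluster: str
    method: Optional[str] = None

    def matches(self, method: str, path: str) -> bool:
        if self.method is not None and self.method != method:
            return False
        return path.startswith(self.prefix)


def pick_cluster(routes: List[Route], method: str, path: str) -> Optional[str]:
    # Walk the table top to bottom; the first matching rule wins.
    for route in routes:
        if route.matches(method, path):
            return route.cluster
    return None  # no rule matched; a real proxy would answer 404


table = [
    Route(prefix="/api/orders", cluster="orders-v2", method="POST"),
    Route(prefix="/api", cluster="api-default"),
    Route(prefix="/", cluster="frontend"),
]
```

`pick_cluster(table, "POST", "/api/orders/7")` returns `"orders-v2"`. Move the `/api` rule to the top and the same request returns `"api-default"`, because the broader prefix now matches first.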
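Scene 05's claim about round-robin under heterogeneous latency can be checked with a toy simulation. This is a hypothetical sketch: `RoundRobin`, `LeastRequest`, and `simulate` are illustrative names, and this least-request picker scans every replica, whereas Envoy's least-request balancer samples a small number of random hosts (two by default) and picks the less loaded one. Replica `"b"` is slow: its requests never finish during the simulated window, while `"a"` and `"c"` finish within each tick.

```python
import itertools
from typing import Dict, List


class RoundRobin:
    def __init__(self, replicas: List[str]):
        self._cycle = itertools.cycle(replicas)

    def pick(self) -> str:
        return next(self._cycle)

    def done(self, replica: str) -> None:
        pass  # round-robin ignores how long requests take


class LeastRequest:
    def __init__(self, replicas: List[str]):
        self.in_flight: Dict[str, int] = {r: 0 for r in replicas}

    def pick(self) -> str:
        # Fewest outstanding requests wins; ties go to the earliest replica.
        replica = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[replica] += 1
        return replica

    def done(self, replica: str) -> None:
        self.in_flight[replica] -= 1


def simulate(balancer, ticks: int = 3, per_tick: int = 3, slow: str = "b") -> List[str]:
    """Send `per_tick` concurrent requests per tick. Fast replicas finish
    by the end of the tick; the slow replica never finishes in this window."""
    sent: List[str] = []
    for _ in range(ticks):
        batch = [balancer.pick() for _ in range(per_tick)]
        sent.extend(batch)
        for replica in batch:
            if replica != slow:
                balancer.done(replica)
    return sent
```

Over nine requests, round-robin sends three to the slow replica no matter how backed up it is. Least-request sends it one and then routes around it, because its in-flight count never drops.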
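Scene 06's two ideas, multiplicative retry amplification and a budget that caps it, fit in one short sketch. `worst_case_attempts` and `RetryBudget` are hypothetical names. The budget here counts retries against total requests seen and never decays, which is a simplification: Envoy's `retry_budget` instead limits concurrently active retries to a percentage of active requests, with a minimum floor. The 20 percent and 3 defaults below are chosen to echo Envoy's documented `budget_percent` and `min_retry_concurrency` defaults.

```python
def worst_case_attempts(attempts_per_hop: int, hops: int) -> int:
    # Every hop retries each failed call it makes, so attempts multiply hop by hop.
    return attempts_per_hop ** hops


class RetryBudget:
    """Hypothetical retry budget: retries may not exceed `budget_percent`
    of requests seen so far, with a small floor so low-traffic callers can
    still retry. Counts never decay in this sketch."""

    def __init__(self, budget_percent: int = 20, min_retries: int = 3):
        self.budget_percent = budget_percent
        self.min_retries = min_retries
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        self.requests += 1

    def try_retry(self) -> bool:
        # Integer arithmetic avoids float rounding in the percentage.
        allowed = max(self.min_retries, self.requests * self.budget_percent // 100)
        if self.retries < allowed:
            self.retries += 1
            return True
        return False  # budget exhausted: fail fast instead of retrying
```

`worst_case_attempts(3, 5)` gives the 243x figure. With the budget, after 100 requests the caller gets at most 20 retries however many failures pile up, so a failing backend sees at most about 1.2x normal load from this caller instead of a multiplicative storm.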
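Scene 07's state machine can be sketched as a small class. This is the classic closed/open/half-open pattern under stated assumptions: the breaker trips after a number of consecutive failures, and after a cooldown exactly one caller is let through as a probe. `CircuitBreaker` and its parameters are hypothetical. Note that Envoy's own `circuit_breakers` setting is a set of concurrency thresholds (max connections, pending requests, requests, retries) rather than this state machine. The injectable `clock` makes the timing testable.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Hypothetical closed -> open -> half-open breaker. Trips after
    `failure_threshold` consecutive failures and allows one probe once
    `reset_timeout` seconds have passed."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"
                return True  # this caller becomes the recovery probe
            return False  # fast-fail: dependency is known to be broken
        if self.state == "half-open":
            return False  # a probe is already in flight; keep failing fast
        return True  # closed: traffic flows normally

    def record_success(self) -> None:
        # Any success, including a successful probe, closes the breaker.
        self.state = "closed"
        self.failures = 0

    def record_failure(self) -> None:
        if self.state == "half-open":
            self._trip()  # the probe failed: reopen and restart the timer
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
```

A caller wraps each outbound request as: check `allow_request()`, send, then report `record_success()` or `record_failure()`. Rejected calls return immediately instead of waiting out a timeout against a dead dependency.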
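Scene 07a's passive outlier detection can be sketched as a per-replica streak counter. This is a hypothetical sketch: `OutlierDetector` and its parameters are illustrative. The defaults (5 consecutive 5xx, 30 s base ejection, 10 percent max ejection) and the rule that ejection time grows with the number of past ejections are modeled on Envoy's documented outlier detection behavior. The "always allow at least one ejection" floor is this sketch's own choice.

```python
from typing import Dict, Iterable, List


class OutlierDetector:
    """Hypothetical passive outlier detector: eject a replica after
    `consecutive_5xx` 5xx responses in a row, for base_ejection_s times the
    number of times it has been ejected, never ejecting more than
    max_ejection_percent of the pool (this sketch always allows one)."""

    def __init__(self, replicas: Iterable[str], consecutive_5xx: int = 5,
                 base_ejection_s: float = 30.0, max_ejection_percent: int = 10):
        self.replicas: List[str] = list(replicas)
        self.consecutive_5xx = consecutive_5xx
        self.base_ejection_s = base_ejection_s
        self.max_ejection_percent = max_ejection_percent
        self.errors: Dict[str, int] = {r: 0 for r in self.replicas}
        self.times_ejected: Dict[str, int] = {r: 0 for r in self.replicas}
        self.ejected_until: Dict[str, float] = {r: 0.0 for r in self.replicas}

    def record(self, replica: str, status: int, now: float) -> None:
        # Passive detection: learn from real responses, not /healthz probes.
        if 500 <= status <= 599:
            self.errors[replica] += 1
            if self.errors[replica] >= self.consecutive_5xx:
                self._maybe_eject(replica, now)
        else:
            self.errors[replica] = 0  # any non-5xx response breaks the streak

    def _maybe_eject(self, replica: str, now: float) -> None:
        if self.ejected_until[replica] > now:
            return  # already out of the pool
        currently_out = sum(1 for r in self.replicas if self.ejected_until[r] > now)
        limit = max(1, len(self.replicas) * self.max_ejection_percent // 100)
        if currently_out >= limit:
            return  # protect capacity: keep the rest of the pool serving
        self.times_ejected[replica] += 1
        self.ejected_until[replica] = now + self.base_ejection_s * self.times_ejected[replica]
        self.errors[replica] = 0

    def healthy(self, now: float) -> List[str]:
        # The load balancer should only pick from this list.
        return [r for r in self.replicas if self.ejected_until[r] <= now]
```

Only the failing replica leaves the pool; the cluster as a whole keeps serving, which is the contrast with tripping a breaker for the entire destination.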
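Scene 08's per-client token bucket can be sketched with explicit timestamps so the refill math is visible. `TokenBucket` and `LocalRateLimiter` are hypothetical names. This is a purely local limiter: each sidecar keeps its own buckets, so a client whose traffic is spread across N sidecars can get up to N times the intended limit, which is the drift the scene mentions. Envoy's local rate limit filter is also configured as a token bucket, but its config fields differ from these parameters.

```python
from typing import Dict


class TokenBucket:
    """Holds up to `capacity` tokens and refills `refill_per_s` per second."""

    def __init__(self, capacity: int, refill_per_s: float, now: float):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.tokens = float(capacity)  # start full so bursts up to capacity pass
        self.last = now

    def take(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_s)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class LocalRateLimiter:
    """One bucket per client id, kept in this sidecar's memory only."""

    def __init__(self, capacity: int, refill_per_s: float):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.buckets: Dict[str, TokenBucket] = {}

    def check(self, client: str, now: float) -> int:
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_s, now)
            self.buckets[client] = bucket
        return 200 if bucket.take(now) else 429  # empty bucket -> 429
```

With a capacity of 2 and a refill of 1 token per second, a client's third request in the same instant gets 429, while another client is unaffected because it has its own bucket. A global limiter would replace the in-memory dictionary with a call to a shared coordinator, trading a network round trip for an exact count.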
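Scene 11's "header rebuild" is work the application has to do. A sidecar sees the inbound request and the outbound calls it triggers as unrelated connections, so only the app (or its tracing library) can carry the trace id across. The sketch below, with the hypothetical function `outbound_traceparent`, builds the outbound header under the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`, lowercase hex). It keeps the inbound trace id, mints a fresh 8-byte id for the new span, and starts a new sampled trace if the inbound header is missing or malformed. It skips some spec checks, such as rejecting all-zero ids.

```python
import re
import secrets
from typing import Optional

# W3C Trace Context, version 00: version-traceid-parentid-flags (lowercase hex).
TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def outbound_traceparent(inbound: Optional[str]) -> str:
    """Header value for an outbound call made while handling `inbound`.
    Keeps the trace id, mints a new span id for this call, and starts a new
    trace (flagged sampled, 01) if the inbound header is missing or malformed."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex chars
    match = TRACEPARENT.fullmatch(inbound or "")
    if match is None:
        return f"00-{secrets.token_hex(16)}-{span_id}-01"
    trace_id, _parent_id, flags = match.groups()
    return f"00-{trace_id}-{span_id}-{flags}"
```

If a service calls downstream without doing this, the downstream sidecar starts a new trace id, so the user request splits into two unrelated traces. Nothing errors, and request rate, error, and latency metrics look normal, which is why the break goes unnoticed.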