#13 Notification System
Push, email, SMS. Idempotent. Failover.
Build a multi-channel transactional notification platform: push (APNs/FCM), email (SES/SendGrid), and SMS (Twilio/Sinch). Producers (your own services or merchant integrations) POST a notification request with an Idempotency-Key. The platform fans out to the user's preferred channels, honoring the user's preferences and quiet hours; retries on transient provider failures; dedupes end-to-end so that no duplicate reaches the recipient; and survives a full-region failover without sending the same message twice. This is the system that powers the "your driver is arriving" SMS, the "order shipped" email, the "2FA code" push, and the merchant-webhook fan-out that other systems lean on. Three things define it: (1) end-to-end idempotency under at-least-once Kafka delivery and retried HTTP requests, (2) per-channel bulkheading so that one provider's outage doesn't cascade, and (3) active-region fencing so that failover never duplicates.
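As a rough illustration of the end-to-end idempotency described above, the sketch below claims each (Idempotency-Key, channel) pair before calling the provider, so a redelivered Kafka message or a retried HTTP request becomes a no-op. Every name here (`DedupeStore`, `deliver`, `TransientProviderError`) is hypothetical, the in-memory dict stands in for a shared store with an atomic put-if-absent, and the 24-hour TTL is an arbitrary assumption.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


class TransientProviderError(Exception):
    """Hypothetical marker for a retryable provider failure (timeout, 5xx)."""


@dataclass
class DedupeStore:
    """In-memory stand-in for a shared store with atomic put-if-absent."""
    ttl_seconds: float = 24 * 3600  # assumed TTL, not a recommendation
    _expiry: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def claim(self, key: str, channel: str, now: float) -> bool:
        """Return True only for the first live claim of (key, channel)."""
        slot = (key, channel)
        expires_at = self._expiry.get(slot)
        if expires_at is not None and expires_at > now:
            return False  # already claimed and not yet expired
        self._expiry[slot] = now + self.ttl_seconds
        return True

    def release(self, key: str, channel: str) -> None:
        """Drop a claim so that a later retry may send."""
        self._expiry.pop((key, channel), None)


def deliver(store: DedupeStore, key: str, channel: str,
            send: Callable[[], None], now: Optional[float] = None) -> str:
    """Send at most once per (key, channel) while the claim is live."""
    now = time.time() if now is None else now
    if not store.claim(key, channel, now):
        return "duplicate"
    try:
        send()
    except TransientProviderError:
        # Assumes the provider did NOT accept the message. If the failure
        # is ambiguous (e.g. a timeout after acceptance), releasing the
        # claim can still produce a duplicate on retry.
        store.release(key, channel)
        return "retry"
    return "sent"
```

Note that a replay arriving after the claim has expired is sent again in this sketch, which is one reason the dedupe TTL would need to outlive the longest retry or DLQ replay window.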
Reading: Uber — Real-time Push Platform (uber.com/blog/real-time-push-platform) · LinkedIn — Air Traffic Controller: Member-First Notifications (2016) · LinkedIn — Incremental cooperative rebalancing in Kafka (KIP-429) · Stripe — Idempotency Keys + Webhooks docs · Brandur — Implementing Stripe-like Idempotency Keys in Postgres · Slack — Scaling Slack's Job Queue (slack.engineering) · Slack — Outage on January 4th 2021 retro · Cloudflare — Outage on June 21 2022 retro · AWS Builders' Library — Timeouts, retries, backoff with jitter · AWS re:Invent ARC403 — Idempotency at Scale (2021) · AWS re:Invent DAT328 — DynamoDB Global Tables (2022) · Netflix — Active-Active for Multi-Regional Resiliency (2013) · Twilio — A2P 10DLC Registration docs · Apple — APNs HTTP/2 provider API docs · Firebase — FCM scaling guide · Campbell & Majors — Database Reliability Engineering (ch. 9) · Beyer et al. — SRE Workbook (Managing Load + Cascading Failures)
end-to-end idempotency
per-channel bulkheading
active-region fencing tokens
vendor failover
DLQ + dedupe TTL ordering
cooperative consumer-group rebalance
TCPA opt-out semantics
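Of the topics above, active-region fencing tokens are perhaps the least self-explanatory, so here is a minimal sketch of one way they could work. A coordinator hands out a strictly increasing epoch on each promotion, and a gate in front of the provider call rejects any sender still holding an older epoch. `RegionLease` and `SendGate` are hypothetical names, and a real deployment would need the epoch and the gate state in a strongly consistent store rather than in process memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionLease:
    """Hypothetical coordinator: issues a new, larger epoch per promotion."""
    epoch: int = 0
    active_region: Optional[str] = None

    def promote(self, region: str) -> int:
        """Make `region` active and return its fencing token (epoch)."""
        self.epoch += 1
        self.active_region = region
        return self.epoch


@dataclass
class SendGate:
    """Hypothetical last-hop check placed just before the provider call."""
    highest_epoch_seen: int = 0

    def admit(self, sender_epoch: int) -> bool:
        """Reject senders whose token is older than the newest one seen."""
        if sender_epoch < self.highest_epoch_seen:
            return False  # stale region: it was fenced off by a failover
        self.highest_epoch_seen = sender_epoch
        return True


lease = RegionLease()
gate = SendGate()

east_token = lease.promote("us-east")   # us-east becomes active
assert gate.admit(east_token)           # its sends are admitted

west_token = lease.promote("us-west")   # failover: us-west takes over
assert gate.admit(west_token)           # the new region is admitted
assert not gate.admit(east_token)       # the old region is now fenced
```

The point of the design is that the old region does not need to know it has been demoted: its stale token is enough for the gate to drop its sends, even if it is partitioned from the coordinator.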