Build a distributed logging stack (ELK / Loki) (12 scenes)
Scene 04 · String vs map — fields and labels
A log line is either a string parsed at read time or a typed map built at write time, and the two systems we'll meet attach different names to the same idea: fields in ELK, labels in Loki.
Previously
The agent survived the backend outage by spilling to disk — but every line it spilled was still just a string. Before we ask the backend to index this stuff, we have to decide whether a log line is text or a typed record.
Diagram
Left half (unstructured): the raw log line as a STRING; a regex cursor scrubs left-to-right looking for `user_id=42`, and a clock badge ticks the read-side cost — every query pays it again. Right half (structured): the same line as a JSON MAP with named keys; `user_id` resolves via an O(1) hash lookup, no read clock — but a small write-side clock badge appears, because the JSON had to be marshalled at emit time. Below (revealed at slider position 2): a 4-row, 3-column Rosetta-stone panel. Column 1 is the concept; column 2 is what ELK calls it; column 3 is what Loki calls it. Row 2 ("body") greys out the Loki cell with a `not in index` tag — a tease, not the topic.
One line, emitted twice. On the LEFT, it lands as an opaque string; the regex cursor scrubs across it looking for `user_id=42`, and the read-clock ticks — every character is work. On the RIGHT, the same line is a JSON map; `user_id` is a key, the hash lookup is instant, and the read-clock stays silent. Watch the asymmetry before we touch a slider.
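To make the asymmetry concrete, here is a minimal sketch of one record emitted both ways, assuming only Python's standard `re` and `json` modules. The record values (`42`, `/checkout`, `500`) are illustrative and do not come from the scene. The `json.loads` call stands in for whatever decoding a backend would do once at ingest.

```python
import json
import re

# Illustrative record; the values are assumptions, not taken from the scene.
record = {"user_id": 42, "path": "/checkout", "status": 500}

# LEFT: the record flattened into one opaque string.
as_string = f"user_id={record['user_id']} path={record['path']} status={record['status']}"

# RIGHT: the same record marshalled as a JSON map (the write-side cost).
as_json = json.dumps(record)

# Reading the string means scanning it with a regex, and the value comes back as text.
match = re.search(r"user_id=(\d+)", as_string)
from_string = match.group(1)

# Reading the map is a key lookup, and the value keeps its type.
from_map = json.loads(as_json)["user_id"]

print(repr(from_string), repr(from_map))
```

The regex path returns the string `'42'` while the map path returns the integer `42`, which is the "typed" part of "typed map".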
Implementation
App.emit_unstructured
the line is built as a STRING — work deferred to read time
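import logging
import re

log = logging.getLogger(__name__)  # assumed: a plain stdlib logger

def handle_request(uid, path, code):
    # printf-style: format the values into one opaque string
    line = f"user_id={uid} path={path} status={code}"
    log.info(line)

# later, every query pays the parse cost again:
def query(term):
    for line in scan_log_files():  # placeholder: yields stored lines one at a time
        if re.search(term, line):  # regex scan over the whole line
            yield line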
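The read-side scan can be tried end to end with a small in-memory stand-in for the log files. This is a sketch under assumptions: `LOG_LINES` and the two-argument `query` are hypothetical, introduced only so the example runs on its own. It also shows a hazard of searching strings: a bare `user_id=42` term matches `user_id=420` as well, so the pattern has to carry the field boundaries itself.

```python
import re

# Hypothetical stand-in for the stored log files.
LOG_LINES = [
    "user_id=42 path=/checkout status=500",
    "user_id=420 path=/cart status=200",
    "user_id=7 path=/checkout status=200",
]

def query(term, lines):
    """Yield every line matching the regex `term`; cost grows with total log size."""
    pattern = re.compile(term)
    for line in lines:
        if pattern.search(line):
            yield line

# A naive term over-matches because the string has no notion of fields.
naive = list(query(r"user_id=42", LOG_LINES))

# Word boundaries recover the field edge by hand.
anchored = list(query(r"\buser_id=42\b", LOG_LINES))

print(len(naive), len(anchored))
```

Every call walks every line again, which is the cost the read-clock in the diagram is counting.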
App.emit_structured
slog/zap-style: the line is a typed MAP — paid once at write
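def handle_request(uid, path, code):
    # named slots, typed values; JSON-marshalled at emit
    # `log` is a slog/zap-style structured logger that takes key/value pairs
    log.info(
        "checkout_failed",
        "user_id", uid,
        "path", path,
        "status", code,
    )

# later, every query is an average-case O(1) hash lookup on the key;
# `index` maps key -> value -> matching lines and is filled at write time
def query(key, value):
    return index[key].get(value, [])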
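One way the `index` behind that lookup could be filled is sketched below, assuming a toy in-memory store. The names `emit`, `stored`, and the keyword-argument call style are hypothetical and chosen for a self-contained Python example; neither ELK nor Loki is claimed to work this way internally.

```python
import json
from collections import defaultdict

stored = []                                     # marshalled lines, in arrival order
index = defaultdict(lambda: defaultdict(list))  # key -> value -> positions in `stored`

def emit(event, **fields):
    """Marshal the record once and index every key/value pair at write time."""
    record = {"event": event, **fields}
    stored.append(json.dumps(record))
    position = len(stored) - 1
    for key, value in record.items():
        index[key][value].append(position)

def query(key, value):
    """Two hash lookups, then decode only the matching lines."""
    positions = index.get(key, {}).get(value, [])
    return [json.loads(stored[i]) for i in positions]

emit("checkout_failed", user_id=42, path="/checkout", status=500)
emit("page_view", user_id=420, path="/cart", status=200)

hits = query("user_id", 42)
print(hits)
```

Because values are typed, `query("user_id", 42)` finds the first record while `query("user_id", "42")` finds nothing, and the `user_id=420` record never matches by accident.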