Notes on data & AI engineering
Practical, no-fluff writing on the things I build for clients — retrieval-augmented generation, event tracking, experimentation, MLOps, and shipping AI features that hold up in production.
CUPED explained: faster A/B tests with variance reduction
A practical explanation of CUPED for A/B testing: how variance reduction works, when it helps, and what to check before trusting the results.
Dashboard design for executives: clarity before charts
How to design executive dashboards that leaders actually use: decision framing, metric hierarchy, context, drill-downs, speed, and trust.
An AI workflow automation playbook for operations teams
How to find, scope, and ship reliable AI workflow automations for operations: intake, triage, enrichment, routing, reporting, human review, and observability.
Marketing attribution with first-party data
How to build practical marketing attribution with first-party events, UTMs, ad platform data, CRM stages, revenue, and transparent assumptions.
A dbt analytics engineering checklist for trustworthy metrics
A dbt checklist for analytics engineering: sources, staging models, marts, tests, documentation, naming, performance, and dashboard ownership.
How to choose a vector database for RAG
A practical guide to choosing a vector database for RAG: pgvector, Pinecone, Weaviate, Qdrant, filtering, hybrid search, scale, and operations.
Product analytics metrics that actually matter
How to choose product analytics metrics that support decisions: activation, retention, adoption, conversion, guardrails, and north-star metrics.
Server-side tracking explained for analytics and attribution
What server-side tracking is, when it helps, when it adds unnecessary complexity, and how to design it for cleaner analytics and attribution.
A RAG evaluation checklist for production AI systems
A practical checklist for evaluating RAG systems: retrieval relevance, source coverage, grounded answers, citations, abstention, latency, and feedback loops.
LLM evaluation: what to measure before an AI feature ships
A production-focused guide to LLM evaluation: golden datasets, groundedness, retrieval quality, refusal behavior, latency, cost, and regression tests.
How to reduce LLM hallucinations in production
Practical techniques to reduce LLM hallucinations: retrieval grounding, citations, evaluation harnesses, output guardrails, and knowing when to make the model say 'I don't know'.
5 A/B testing mistakes that quietly ruin your results
Peeking, sample-ratio mismatch, underpowered tests, ignored guardrails, and multiple comparisons — the common A/B testing mistakes that lead to confident but wrong decisions.
From notebook to production: an MLOps checklist
A practical MLOps checklist for shipping machine learning models to production: reproducible training, deployment, monitoring, evaluation, retraining, and cost control.
What is RAG? A practical guide to Retrieval-Augmented Generation
A plain-English guide to Retrieval-Augmented Generation (RAG): what it is, how the pipeline works, where it beats fine-tuning, and how to keep answers grounded and accurate.
How to design an event tracking plan that scales
A practical framework for designing an event tracking plan: naming conventions, schema versioning, governance, and validation that keep your analytics clean as you grow.