Blog

Notes on data & AI engineering

Practical, no-fluff writing on the things I build for clients — retrieval-augmented generation, event tracking, experimentation, MLOps, and shipping AI features that hold up in production.

Experimentation · June 4, 2026 · 6 min read

CUPED explained: faster A/B tests with variance reduction

A practical explanation of CUPED for A/B testing: how variance reduction works, when it helps, and what to watch before trusting the results.

Data Visualization · May 28, 2026 · 6 min read

Dashboard design for executives: clarity before charts

How to design executive dashboards that leaders actually use: decision framing, metric hierarchy, context, drill-downs, speed, and trust.

AI Automation · May 21, 2026 · 7 min read

An AI workflow automation playbook for operations teams

How to find, scope, and ship reliable AI workflow automations for operations: intake, triage, enrichment, routing, reporting, human review, and observability.

Analytics · May 14, 2026 · 7 min read

Marketing attribution with first-party data

How to build practical marketing attribution with first-party events, UTMs, ad platform data, CRM stages, revenue, and transparent assumptions.

Analytics Engineering · May 7, 2026 · 6 min read

A dbt analytics engineering checklist for trustworthy metrics

A dbt checklist for analytics engineering: sources, staging models, marts, tests, documentation, naming, performance, and dashboard ownership.

LLM Engineering · April 30, 2026 · 7 min read

How to choose a vector database for RAG

A practical guide to choosing a vector database for RAG: pgvector, Pinecone, Weaviate, Qdrant, filtering, hybrid search, scale, and operations.

Analytics · April 23, 2026 · 6 min read

Product analytics metrics that actually matter

How to choose product analytics metrics that support decisions: activation, retention, adoption, conversion, guardrails, and north-star metrics.

Analytics · April 16, 2026 · 6 min read

Server-side tracking explained for analytics and attribution

What server-side tracking is, when it helps, when it adds unnecessary complexity, and how to design it for cleaner analytics and attribution.

LLM Engineering · April 9, 2026 · 6 min read

A RAG evaluation checklist for production AI systems

A practical checklist for evaluating RAG systems: retrieval relevance, source coverage, grounded answers, citations, abstention, latency, and feedback loops.

LLM Engineering · April 2, 2026 · 7 min read

LLM evaluation: what to measure before an AI feature ships

A production-focused guide to LLM evaluation: golden datasets, groundedness, retrieval quality, refusal behavior, latency, cost, and regression tests.

LLM Engineering · March 18, 2026 · 5 min read

How to reduce LLM hallucinations in production

Practical techniques to reduce LLM hallucinations: retrieval grounding, citations, evaluation harnesses, output guardrails, and knowing when to make the model say 'I don't know'.

Experimentation · March 5, 2026 · 6 min read

5 A/B testing mistakes that quietly ruin your results

Peeking, sample-ratio mismatch, underpowered tests, ignored guardrails, and multiple comparisons — the common A/B testing mistakes that lead to confident but wrong decisions.

MLOps · February 26, 2026 · 6 min read

From notebook to production: an MLOps checklist

A practical MLOps checklist for shipping machine learning models to production: reproducible training, deployment, monitoring, evaluation, retraining, and cost control.

LLM Engineering · February 10, 2026 · 7 min read

What is RAG? A practical guide to Retrieval-Augmented Generation

A plain-English guide to Retrieval-Augmented Generation (RAG): what it is, how the pipeline works, where it beats fine-tuning, and how to keep answers grounded and accurate.

Analytics · January 22, 2026 · 6 min read

How to design an event tracking plan that scales

A practical framework for designing an event tracking plan: naming conventions, schema versioning, governance, and validation that keep your analytics clean as you grow.