Frameworks

Articles

Pinterest

Airbnb

Netflix

Twitter

Stripe

Etsy

Uber

LinkedIn

Stitch Fix

Booking.com

Spotify

Papers

Microsoft - almost all available from their excellent website

Best papers

Other good references

LinkedIn

Google

Netflix

Facebook

DoorDash

Other

Good ideas for A/B test experiments from the papers

  • Combined experiments: assignment is per user, with an x% chance of seeing the treatment
  • Metric definitions as discussed in the Microsoft paper
  • Learning experiments where you intentionally degrade the experience to see how it affects the baseline
  • Digging deeper into adoption and retention
  • Surprising results should be replicated
  • The risk of focusing on small changes is incrementalism; aim for a mix of small, high-ROI changes and big bets toward audacious goals
  • Changes rarely have a big impact on key metrics; as a corollary, only ~10% of experiments show any positive result
  • Metric improvements should be diluted by segment size when projected to the overall metric (mobile -> overall MW); see the dilution sketch after this list
  • Treat borderline-significant results as tentative and rerun the experiment to verify them
  • If a result looks too good (e.g. 8 sd from the mean), check it again even if it is statistically significant
  • Best to verify amazing A/B test results yourself, as many explanations for them turn out to be wrong
  • Latency's effects can be quantified by artificially slowing down a site (see the slowdown sketch after this list)
  • Could we run a speed-up experiment by returning the response immediately and delaying loading of the social column?
  • Reducing abandonment is hard; shifting clicks is easy
  • Use the delta method instead of the bootstrap when data sizes are large (see Casella & Berger, Statistical Inference, or Wasserman, All of Statistics); a sketch follows this list
  • ANOVA as an alternative to t-tests when comparing the means of more than two samples (example after this list)
  • Check experiment groups religiously for equal sizes and variances via A/A tests (see the sample-ratio check after this list)
  • Check for browser-specific bugs
  • Filter out users who never reached the page with the treatment; this lowers variance and improves power (aka triggering; sketch after this list)
  • Check for server-related caching issues
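
A minimal sketch of the dilution point above: a lift measured only on one segment translates to roughly segment share times segment lift on the overall metric. The function name and numbers are illustrative, not taken from the papers.

```python
def diluted_lift(segment_lift: float, segment_share: float) -> float:
    """First-order dilution: a lift observed only on one segment contributes
    roughly segment_share * segment_lift to the overall metric."""
    return segment_lift * segment_share

# e.g. a 2% win on mobile, where mobile is 30% of the metric's volume,
# is roughly a 0.6% improvement overall
print(diluted_lift(0.02, 0.30))  # 0.006
```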
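
For the slowdown experiment, one way to add artificial latency is a request hook that delays responses for a stable bucket of users. This is a sketch using Flask; the cookie name, delay, and bucketing scheme are assumptions, not from the papers.

```python
import hashlib
import time

from flask import Flask, request

app = Flask(__name__)
SLOWDOWN_SECONDS = 0.25  # hypothetical added delay for the treatment group


def in_slowdown_group(user_id: str) -> bool:
    # Stable 50/50 assignment derived from a hash of the user id (illustrative bucketing)
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % 2 == 0


@app.before_request
def maybe_slow_down():
    user_id = request.cookies.get("uid", "")
    if user_id and in_slowdown_group(user_id):
        time.sleep(SLOWDOWN_SECONDS)  # then measure the metric impact of this delay
```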
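
A sketch of the delta method for the variance of a ratio metric (e.g. clicks per pageview aggregated per user), assuming numerator and denominator arrays aligned by user. This is a first-order Taylor expansion, not the papers' exact implementation.

```python
import numpy as np

def delta_method_ratio_var(numerator: np.ndarray, denominator: np.ndarray) -> float:
    """Approximate Var(mean(numerator) / mean(denominator)) via the delta method."""
    n = len(numerator)
    mu_x = denominator.mean()
    mu_y = numerator.mean()
    r = mu_y / mu_x                       # the ratio metric itself
    var_y = numerator.var(ddof=1) / n     # variance of the numerator mean
    var_x = denominator.var(ddof=1) / n   # variance of the denominator mean
    cov_xy = np.cov(numerator, denominator)[0, 1] / n
    # First-order Taylor expansion of Var(Y/X) around (mu_y, mu_x)
    return (var_y - 2 * r * cov_xy + r ** 2 * var_x) / mu_x ** 2
```

The per-group variances from this function can then feed a standard z-test on the difference between the two groups' ratio metrics.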
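
For the ANOVA point, a one-way F-test compares the means of several variants at once instead of running multiple pairwise t-tests. The data below is simulated purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated per-user metric values for a control and two treatment variants
control   = rng.normal(10.0, 2.0, size=1_000)
variant_a = rng.normal(10.1, 2.0, size=1_000)
variant_b = rng.normal(10.3, 2.0, size=1_000)

# One-way ANOVA: tests whether any group mean differs from the others
f_stat, p_value = stats.f_oneway(control, variant_a, variant_b)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```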
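
One concrete check for unequal group sizes is a sample-ratio-mismatch test: compare the observed split against the intended one with a chi-square goodness-of-fit test. The 50/50 split, user counts, and alpha threshold below are assumptions for illustration.

```python
from scipy import stats

def srm_check(control_users: int, treatment_users: int,
              expected_control_share: float = 0.5, alpha: float = 0.001):
    """Chi-square goodness-of-fit test for a suspicious traffic split."""
    total = control_users + treatment_users
    expected = [total * expected_control_share, total * (1 - expected_control_share)]
    _, p_value = stats.chisquare([control_users, treatment_users], f_exp=expected)
    return p_value, p_value < alpha  # True -> investigate before trusting results

p, mismatch = srm_check(50_210, 49_340)
print(f"SRM p-value: {p:.4g}, investigate: {mismatch}")
```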
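
A minimal sketch of triggered analysis: restrict the analysis to users who actually reached the treated page, since everyone else contributes variance but no signal. The DataFrame column names are illustrative.

```python
import pandas as pd

def triggered_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise the metric per variant, using only triggered users.

    Assumes columns: user_id, variant, reached_treated_page (bool), metric.
    """
    triggered = df[df["reached_treated_page"]]
    return triggered.groupby("variant")["metric"].agg(["count", "mean", "sem"])
```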