Frameworks
Articles
Pinterest
Airbnb
Netflix
Stripe
Etsy
Uber
LinkedIn
Stitch Fix
Booking.com
Spotify
Papers
Microsoft - almost all available from their excellent website
Best papers
Other good references
LinkedIn
Google
Netflix
Facebook
DoorDash
Other
Good ideas for A/B test experiments from the papers
- Combined experiments: randomize per user, with each user having an x% chance of seeing the treatment (see the bucketing sketch after this list)
- Metrics definitions as discussed in the Microsoft paper
- Learning experiments where you intentionally degrade the experience to see how it affects the baseline
- Digging deeper into adoption and retention
- Surprising results should be replicated
- The risk of focusing only on small changes is incrementalism; pursue some high-ROI incremental changes, but also place big bets on audacious goals
- Changes rarely have a big impact on key metrics; corollary: only ~10% of experiments show any positive result
- Metric improvements measured on a segment should be diluted by that segment's share of overall traffic (mobile -> overall MW); see the worked dilution example after this list
- Treat borderline-significant results as tentative and rerun the experiments to verify them
- If a result looks too good (e.g. 8 standard deviations from the mean), check it again even though it is statistically significant
- Best to verify surprising results yourself, since many explanations for amazing A/B test results turn out to be wrong
- Latency's effect can be quantified by artificially slowing the site down (see the delay-injection sketch after this list)
- Could we run a speed-up experiment by returning the response immediately and lazily loading the social column?
- Reducing abandonment is hard, shifting clicks is easy
- Use the delta method instead of the bootstrap when data sizes are large (see Casella & Berger, Statistical Inference, or Wasserman, All of Statistics); a sketch for ratio metrics follows this list
- Use ANOVA as an alternative to t-tests when comparing the means of more than 2 samples (example after this list)
- Check experiment groups religiously for equal sizes and variances via A/A tests (see the A/A check sketch after this list)
- Check for browser-specific bugs
- Filter out users who never even reached the page the treatment is on; this lowers variance and improves power (aka triggering, sketched after this list)
- Check for server-related caching issues
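
The per-user bucketing idea above can be sketched roughly as follows. This is a minimal illustration, assuming a SHA-256 hash of the user id salted with a hypothetical experiment name and a 10% allocation; none of these specifics come from the papers.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, treatment_pct: float = 10.0) -> str:
    """Deterministically assign a user to control or treatment.

    Hashing user_id together with the experiment name gives a stable,
    roughly uniform bucket in [0, 100), so the same user always sees the
    same variant and different experiments are bucketed independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10000 / 100.0  # value in [0, 100)
    return "treatment" if bucket < treatment_pct else "control"

# Example: a hypothetical experiment where 10% of users see the treatment.
print(assign_variant("user-42", "new-checkout-flow"))
```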
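
A worked version of the dilution point: a lift measured on a segment only moves the overall metric in proportion to that segment's traffic share. The 5% lift and 20% share are made-up numbers for illustration.

```python
# Hypothetical numbers: a 5% lift measured on mobile users,
# where mobile accounts for 20% of overall traffic.
segment_lift = 0.05
segment_share = 0.20

# The overall metric only improves by the lift diluted to the
# segment's share of traffic: 0.05 * 0.20 = 1%.
overall_lift = segment_lift * segment_share
print(f"Expected overall lift: {overall_lift:.1%}")
```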
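
A rough sketch of how a slowdown experiment might inject artificial latency. The 100 ms delay and the per-request coin flip are assumptions for illustration; a real system would use stable per-user bucketing as in the sketch above.

```python
import random
import time

ARTIFICIAL_DELAY_MS = 100  # hypothetical extra latency for the treatment group

def handle_request(user_id: str, render_page) -> str:
    """Serve a page, artificially delaying it for the latency treatment group.

    Comparing key metrics between the delayed and undelayed groups gives a
    causal estimate of what latency costs, without waiting for a real regression.
    """
    if random.random() < 0.5:  # illustrative 50/50 split per request
        time.sleep(ARTIFICIAL_DELAY_MS / 1000.0)
    return render_page(user_id)
```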
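
A sketch of the delta method for a ratio metric such as clicks per pageview, where per-user sums are the analysis unit. The simulated data is purely illustrative; see the references above for the derivation.

```python
import numpy as np

def delta_method_ratio_var(x: np.ndarray, y: np.ndarray) -> float:
    """Approximate Var(sum(x) / sum(y)) for per-user numerators x and denominators y.

    First-order Taylor (delta method) expansion of the ratio of means; far
    cheaper than bootstrapping when the number of users is large.
    """
    n = len(x)
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(ddof=1), y.var(ddof=1)
    cov_xy = np.cov(x, y, ddof=1)[0, 1]
    return (var_x / mean_y**2
            - 2 * mean_x * cov_xy / mean_y**3
            + mean_x**2 * var_y / mean_y**4) / n

# Simulated per-user pageviews and clicks.
rng = np.random.default_rng(0)
pageviews = rng.poisson(10, size=100_000) + 1
clicks = rng.binomial(pageviews, 0.1)
ctr = clicks.sum() / pageviews.sum()
se = np.sqrt(delta_method_ratio_var(clicks, pageviews))
print(f"CTR = {ctr:.4f} +/- {1.96 * se:.4f}")
```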
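
A small ANOVA example for comparing more than two variants at once; the simulated metric values are hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
control = rng.normal(1.00, 0.5, size=5_000)
variant_a = rng.normal(1.02, 0.5, size=5_000)
variant_b = rng.normal(1.01, 0.5, size=5_000)

# One-way ANOVA tests whether any group mean differs, avoiding the inflated
# false-positive rate of running every pairwise t-test.
f_stat, p_value = f_oneway(control, variant_a, variant_b)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```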
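
The A/A checks can be partly automated. A sketch assuming a 50/50 split, using a chi-square test for sample ratio mismatch and Levene's test for unequal variances; the counts and simulated metric are made up.

```python
import numpy as np
from scipy.stats import chisquare, levene

# Observed user counts in an A/A test that was supposed to be split 50/50.
observed = np.array([100_480, 99_520])
expected = np.full(2, observed.sum() / 2)

# Sample ratio mismatch: a tiny p-value means the split itself is broken
# (bad bucketing, uneven logging loss, uneven bot filtering, ...).
srm_stat, srm_p = chisquare(observed, expected)
print(f"Sample ratio mismatch p-value: {srm_p:.4f}")

# Variance check on the metric between the two A/A groups.
rng = np.random.default_rng(2)
group_a = rng.normal(1.0, 0.5, size=10_000)
group_b = rng.normal(1.0, 0.5, size=10_000)
_, var_p = levene(group_a, group_b)
print(f"Equal-variance p-value: {var_p:.4f}")
```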
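
Finally, a sketch of triggering: restrict the analysis to users who actually reached the treated page. The column names, the 30% trigger rate, and the 0.05 effect are assumptions for illustration; the point is that the triggered analysis detects the same effect with a much smaller p-value.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
n = 50_000
df = pd.DataFrame({
    "variant": rng.choice(["control", "treatment"], size=n),
    "reached_page": rng.random(n) < 0.3,  # only 30% ever see the treated page
    "metric": rng.normal(1.0, 0.5, size=n),
})
# The treatment can only affect users who reached the page.
df.loc[df["reached_page"] & (df["variant"] == "treatment"), "metric"] += 0.05

def p_value(frame: pd.DataFrame) -> float:
    a = frame.loc[frame["variant"] == "control", "metric"]
    b = frame.loc[frame["variant"] == "treatment", "metric"]
    return ttest_ind(a, b).pvalue

print("all users:      p =", p_value(df))                      # diluted effect
print("triggered only: p =", p_value(df[df["reached_page"]]))  # same effect, more power
```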