
Machine Learning, RecSys, LLMs, Engineering.

Eugene Yan

Building machine learning systems @ Amazon. Writing about ML, RecSys, and LLMs @ eugeneyan.com. Join 6,000+ readers!

Featured Post

Experimenting with LLMs to Research, Reflect, and Plan

Hi friends, I've been playing with LLMs the past couple of weekends to build a simple Discord bot that can summarize URLs, explain content to a 5-year-old, run SQL & Google search queries, and provide advice based on my personal board of mentors. It's been epic fun and I can't wait to share my experience and lessons with you below. As always, feedback welcome!

about 1 year ago • 11 min read

Hey friends, This week, I share how I built Tara, a simple AI coach that I can talk to. I was initially skeptical of voice as a modality—Siri and other voice assistants didn't work so well for me. But after building Tara, I'm fully convinced. The post includes a phone line to Tara. Enjoy! … I suffer from monkey mind, chronic imposter...

about 1 month ago • 3 min read

Hey friends, I've been thinking a lot about evals lately, and trying dozens of them to understand which correlate best with actual use cases. In this write-up, I share an opinionated take on what doesn't really work and what does, focusing on classification, summarization, translation, copyright regurgitation, and toxicity. I hope this saves you time figuring out your evals!

about 1 month ago • 21 min read

Hey friends, This week we discuss how to overcome the bottleneck of human data and annotations—synthetic data. We'll see how we can apply distillation and self-improvement across the three stages of model training (pretraining, instruction-tuning, preference-tuning). Enjoy! … It is increasingly viable to use synthetic data for pretraining,...

3 months ago • 24 min read

Hey friends, A short post of my 2023 in review. I'll go through the goals I set in 2022, some highlights for the year (e.g., diving into language modeling), goals for 2024, and some stats from 2023. If you have a review of your own, please reply with it—I'd love to read it. … 2023 was a peaceful year of small, steady steps. There...

5 months ago • 6 min read

Hey friends, I've been digging into various push notification systems from companies like Duolingo, Twitter, Pinterest, and LinkedIn, and it's been fascinating. On the surface, push seems similar to regular recsys. But as we dig deeper, we see that the user experience and implementation differ a fair bit. In this piece, we'll discuss how push differs from regular recsys, how we choose which items to push (or not), and how often to push. Hope you're having a great vacation and a good...

5 months ago • 13 min read

Hey friends, This week we dig into an interesting finding where finetuning on Wikipedia data actually helps with detecting hallucinations on news data. We'll go through it stage by stage and evaluate the model after each step to get a better understanding of what happens. I hope you'll find this as exciting to read as it was for me to run the experiments. P.S., The recording for my ai.engineer talk was just released! You can watch it here.

6 months ago • 7 min read

Hey friends, It's been a while since my last email. That's because I was busy preparing for my talk at the inaugural AI Engineer Summit in San Francisco. Here are the slides and transcript. While there, I got the chance to chat with folks at the cutting edge of building and deploying LLM products and learned a few interesting insights. For one, evals and cost are the biggest challenges in deployment, while guardrails and caching are less prioritized. Also, code assistants will likely become more...

7 months ago • 6 min read

Hey friends, This week we dive into evals for abstractive summarization. Yea, summarization sounds like a boring use case. But I think it's pretty versatile and can be used to summarize world events, your week's eating and exercise habits, and journaling activity. Nonetheless, I've learned that it's pretty challenging to evaluate the quality of a summary. Enjoy!

8 months ago • 16 min read

Hey friends, This week, we've a short post that's mostly a follow-up on my previous write-up on LLM patterns. Since the last post, I've gotten questions on which patterns to apply to different problems, so this post tries to clarify that. Hope you find it helpful!

9 months ago • 9 min read

Hey friends, It's been a while since my last email and that's because today's post (Design Patterns for LLM Systems & Products) took waaay longer than I expected. What I had imagined to be a 3,000-word write-up grew to 12,000+ words—as I researched more into these patterns, there was just more and more to dig into and write about. Thus, because today's piece is so long, I've only included the introduction section, with a link to the full post. Enjoy!

10 months ago • 1 min read