Hi, I’m Eugene Yan. I work at the intersection of consumer data & tech to build machine learning systems to help customers. I also write about how to be effective in data science, learning, and career.
Currently, I’m an Applied Scientist at Amazon helping users read more, and get more out of reading. We build book recommendation systems and contribute to efforts in discovery (e.g., search). Previously, I led the data science team at Lazada (acquired by Alibaba in 2016) and worked on e-commerce ML systems (e.g., ranking, automation, fraud detection).
What does an average day look like?
First, a disclaimer: Even for people with the same title (i.e., applied scientist), the average day will look different. It will also vary with the project’s lifecycle, such as research, prototyping, development, and maintenance. My role mostly involves prototyping and development.
My day usually has these buckets of activities:
- Stand-up: The team checks in on what’s being worked on, what’s blocked, and who needs help.
- Data science/coding: This includes (i) literature research, (ii) exploring and preparing data, (iii) running offline experiments, (iv) building prototypes (and giving demos), (v) writing and reviewing production code, and (vi) launching A/B tests.
- Writing: I write documents (e.g., one-pagers, design docs) to share ideas and get feedback. I also document my methodology, decisions, and experiment results for future reference. Putting ideas and findings in documents helps them scale, since anyone can read them later.
- Reading: Reading papers helps me to be a more effective data scientist. Thus, I try to read at least an hour a day. The content includes internal/external articles and papers. (I have a bias towards papers on applied machine learning.)
- Meetings: Not the most enjoyable activity for me. Nonetheless, meetings are essential for coordination and communication. A 30-min meeting beats days (or weeks) of email back and forth.
What does a non-average day look like?
I’m struggling with this question as most days don’t seem average. Nonetheless, here are some exceptional events that may come up:
- High-severity incidents: These include critical system failures, sometimes with customer-facing impact. Fortunately, they seldom happen on our team (i.e., fewer than a handful of times a year).
- Migrating legacy systems: All code eventually becomes legacy code. In a previous role, I was involved in a massive migration (migrating cloud providers, data and machine learning systems) that required us to drop everything and solely focus on migration.
- Attending conferences: Many excellent conferences are online now, which is great. Attending them means balancing work with the conference.
What’s your favourite part about the job?
I really enjoy working with data. Through data (e.g., search logs, clickstreams, transactions), we understand our customers and how they interact with our platform and products. The data reveals interesting patterns in human behaviour. For example, consumption changes with life stages (e.g., becoming a parent) and socio-economic events (e.g., COVID-19, work from home). By understanding our customers better, we can serve them better.
I also enjoy applying machine learning. While data helps us understand customers, there’s far too much for a person (or even a large team) to process. Machine learning (algorithms) can help with this. For example, machine learning helps (i) automatically classify products and audit product reviews, (ii) identify fraudulent sellers and products, and (iii) recommend products to customers given their historical preferences. Data and machine learning help to write software 2.0.
Another aspect I enjoy is the amount of leverage working in a consumer tech company (e.g., Lazada, Amazon) provides. Our team can build and deploy machine learning systems to help customers around the world. It scales well too. Most of the system doesn’t need to change from country to country. Some necessary changes include using local data and adapting to local regulations (e.g., privacy). I get a huge kick from seeing customers benefit from our work (we see this through metrics and anecdotes).
What’s your least favourite part about being an Applied Scientist?
I’m still learning how to manage this, but sometimes I spend more time than I would like writing documents and sitting in meetings. Nonetheless, both are essential for socialising ideas and getting buy-in and feedback. I just wish I were more effective and faster at them.
Occasionally, stakeholders suggest solutions that are far more complex than they need to be. I blame the overhyping of technology and machine learning in the media. When this happens, our team patiently tries to understand their perspective and educate them. Nonetheless, it takes considerable time and effort and distracts us from work that helps customers.
Lastly, because my work revolves around data, I’m also constrained by access to high-quality data. Delays happen now and then. Sometimes, it’s a minor lack of permissions which takes a few hours to a few days to resolve. Other times, we find that our system isn’t tracking a specific field and we need to update our trackers and wait a few months, or backfill the data.
You’ve worked in both Singapore and the United States – are there any cultural differences you’ve noticed in the workplace that would necessarily impact your day to day?
While there are some cultural differences, they don’t affect the day-to-day. For example (and this is likely a stereotype), Asians are more reserved while Americans are more outgoing, but something like this doesn’t affect my work. I think the organisation and team culture matter a lot more, and these are independent of the country. Before deciding to join a team, it’s important to interact with them to get a feel for the culture.
Do you think more people in tech would benefit from having a humanities background? You studied Psychology as an undergrad. What are some ways in which that has helped you in Data Science?
A humanities background is associated with certain traits: open-mindedness, critical thinking, better problem framing, research skills, and the ability to communicate with laypeople. I think such traits would benefit everyone, not just tech folks. While a humanities degree helps cultivate these traits, there are plenty of other ways to develop them, such as working on diverse, challenging problems, having good role models, and gaining work experience.
Other than the traits mentioned above, my Psychology degree taught me how to analyse qualitative and quantitative data. It also taught me about statistics (and how to be skeptical of it). In addition, I learned about how people perceive, think, and behave; this helps when I’m building customer-facing machine learning features.
How is working at a big company like Amazon different to working at a startup like Lazada? Do you enjoy one more than the other?
I’m still fairly new at Amazon. Still, I think Amazon is more like a group of start-ups than a big company. For example, each AWS service seems to operate like a start-up. In that sense, my experience so far has been similar to working at Lazada. We’re constantly experimenting, shipping, and getting feedback from customers. That said, being a global company, Amazon does provide slightly more leverage (see “What’s your favourite part about the job?”).
I enjoy, and work best in, a role that’s between commando and soldier. My roles at both Lazada and Amazon have allowed me to do this, which plays to my strengths.
You also have a master’s in CS. Did you ever consider going down the Software Engineering route?
Nope. I really enjoy working with data and machine learning to build useful systems and products for customers. The Masters in CS was essential to improve my understanding of the fundamentals so I could be a better data scientist, especially when developing and maintaining production systems.
Something I’ve noticed is that you’ve given a few talks and like to speak at meetups. How much do you think strong communication plays a role in being a successful data scientist? How can younger professionals cultivate that skill?
Communication is one of the most important skills, if not the most important, for an effective data scientist. Initially, I didn’t think this way. But when I asked several mentors what the most important skill for a data scientist was, guess what: it was communication. Thus, I focused on improving my communication and saw gains in my effectiveness within a year. (Ahmed wrote a great thread summarising my views.)
I think the best way to improve communication is through practice. At the start, it’s useful to read about the fundamentals of good writing and speaking—this arms us with knowledge from the experts. But to really get better, we need to practice.
How can we practice? Offer to write documents at work. This can be in the form of proposals, design documents, or internal newsletters. Or write about personal projects or what we learn on a blog. To practice speaking, offer to share at meet-ups or conferences about work-related or personal projects. With everything online now, it’s much easier.