Alexandr Yarats, Head of Search at Perplexity – Interview Series


Alexandr Yarats is the Head of Search at Perplexity AI. He began his career at Yandex in 2017, concurrently studying at the Yandex School of Data Analysis. The initial years were intense yet rewarding, and he grew into the role of Engineering Team Lead. Driven by his aspiration to work at a tech giant, he joined Google in 2022 as a Senior Software Engineer on the Google Assistant team (later Google Bard). He then moved to Perplexity as the Head of Search.

Perplexity AI is an AI-chatbot-powered research and conversational search engine that answers queries using natural-language predictive text. Launched in 2022, Perplexity generates answers from sources across the web and cites links within the text of its responses.

What initially got you interested in machine learning?

My interest in machine learning (ML) developed gradually. During my school years, I spent a lot of time studying math, probability theory, and statistics, and got an opportunity to play with classical machine learning algorithms such as linear regression and KNN. It was fascinating to see how you can build a predictive function directly from data and then apply it to unseen examples. This interest led me to the Yandex School of Data Analysis, a highly competitive machine learning master’s program in Russia (only 200 people are accepted each year). There, I learned about more advanced machine learning algorithms and built up my intuition. The most crucial moment came when I learned about neural networks and deep learning: it became very clear to me that this was something I wanted to pursue over the next couple of decades.

You previously worked at Google as a Senior Software Engineer for a year. What were some of your key takeaways from this experience?

Before joining Google, I spent over four years at Yandex, right after graduating from the Yandex School of Data Analysis. There, I led a team that developed various machine learning methods for Yandex Taxi (an analog to Uber in Russia). I joined this group at its inception and had the chance to work in a close-knit and fast-paced team that rapidly grew over four years, both in headcount (from 30 to 500 people) and market cap (it became the largest taxi service provider in Russia, surpassing Uber and others).

Throughout this time, I had the privilege to build many things from scratch and launch several projects from zero to one. One of the final projects I worked on there was building chatbots for service support. There, I got a first glimpse of the power of large language models and was fascinated by how important they could be in the future. This realization led me to Google, where I joined the Google Assistant team, which was later renamed Google Bard (one of the competitors of Perplexity).

At Google, I had the opportunity to learn what world-class infrastructure looks like, how Search and LLMs work, and how they interact with each other to provide factual and accurate answers. This was a great learning experience, but over time I grew frustrated with the slow pace at Google and the feeling that nothing ever got done. I wanted to find a company that worked on search and LLMs and moved as fast as, or even faster than, my team at Yandex had. Fortunately, this happened organically.

Internally at Google, I started seeing screenshots of Perplexity and tasks that required comparing Google Assistant against Perplexity. This piqued my interest in the company, and after several weeks of research, I was convinced that I wanted to work there, so I reached out to the team and offered my services.

Can you define your current role and responsibilities at Perplexity?

I’m currently serving as the head of the search team and am responsible for building the internal retrieval system that powers Perplexity. Our search team builds the web crawling system, retrieval engine, and ranking algorithms. These challenges let me draw on the experience I gained at Google (working on Search and LLMs) as well as at Yandex. At the same time, Perplexity’s product presents unique opportunities to redesign and reengineer how a retrieval system should look in a world with very powerful LLMs. For instance, it is no longer important to optimize ranking algorithms to increase the probability of a click; instead, we focus on improving the helpfulness and factuality of our answers. This is a fundamental distinction between an answer engine and a search engine. My team and I are trying to build something that goes beyond the traditional 10 blue links, and I can’t think of anything more exciting to work on right now.

Can you elaborate on Perplexity’s transition from developing a text-to-SQL tool to building AI-powered search?

We initially built a text-to-SQL engine: a specialized answer engine for situations where you need a quick answer over your structured data (e.g., a spreadsheet or table). Working on the text-to-SQL project gave us a much deeper understanding of LLMs and RAG, and led us to a key realization: this technology is far more powerful and general than we originally thought. We quickly realized that we could go well beyond well-structured data sources and tackle unstructured data as well.
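As a rough illustration of the pattern (my own sketch, not Perplexity’s actual engine; `complete` is a hypothetical stand-in for whatever LLM completion API you use), a text-to-SQL answer engine boils down to translating a question into a query over the user’s schema and executing it:

```python
import sqlite3

def complete(prompt: str) -> str:
    """Hypothetical LLM call; swap in a real completion client."""
    raise NotImplementedError

def answer_from_table(db: sqlite3.Connection, schema: str, question: str):
    # Ask the model to translate the question into SQL over the given schema.
    prompt = (
        "Given this SQLite schema:\n"
        f"{schema}\n\n"
        f"Write one SQL query that answers: {question}\n"
        "Return only the SQL."
    )
    sql = complete(prompt)
    # Execute the generated query against the user's structured data.
    return db.execute(sql).fetchall()
```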

What were the key challenges and insights during this shift?

The key challenges during this transition were shifting the company from B2B to B2C and rebuilding our infrastructure stack to support unstructured search. Very quickly during this migration, we realized that it is much more enjoyable to work on a customer-facing product: you receive a constant stream of feedback and engagement, something we saw little of while building a text-to-SQL engine focused on enterprise solutions.

Retrieval-augmented generation (RAG) seems to be a cornerstone of Perplexity’s search capabilities. Could you explain how Perplexity utilizes RAG differently compared to other platforms, and how this impacts search result accuracy?

RAG is a general concept for providing external knowledge to an LLM. While the idea might seem simple at first glance, building such a system to serve tens of millions of users efficiently and accurately is a significant challenge. We had to engineer the system in-house from scratch and build many custom components that proved critical for achieving the last bits of accuracy and performance. We engineered the system so that tens of LLMs (ranging from big to small) work in parallel to handle a single user request quickly and cost-efficiently. We also built training and inference infrastructure that allows us to train LLMs together with search end-to-end, so they are tightly integrated. This significantly reduces hallucinations and improves the helpfulness of our answers.
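For readers unfamiliar with the pattern, here is a minimal RAG sketch (illustrative only; `search` and `complete` are hypothetical stand-ins for a retrieval engine and an LLM API, not Perplexity’s internal components):

```python
from typing import Dict, List

def search(query: str, k: int = 5) -> List[Dict[str, str]]:
    """Hypothetical retrieval call returning [{"url": ..., "snippet": ...}, ...]."""
    raise NotImplementedError

def complete(prompt: str) -> str:
    """Hypothetical LLM completion call."""
    raise NotImplementedError

def answer(query: str) -> str:
    docs = search(query)
    # Number each source so the model can cite it inline as [1], [2], ...
    context = "\n".join(f"[{i}] {d['snippet']}" for i, d in enumerate(docs, 1))
    prompt = (
        "Answer the question using only the sources below, citing them inline as [n].\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return complete(prompt)
```

The production system described above differs substantially — many models running in parallel, retrieval and LLMs trained together end-to-end — but the retrieve-then-generate skeleton is the same.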

With limited resources compared to Google, how does Perplexity manage its web crawling and indexing strategies to stay competitive and ensure up-to-date information?

Building an index as extensive as Google’s requires considerable time and resources. Instead, we focus on the topics our users most frequently ask about on Perplexity. It turns out that the majority of our users treat Perplexity as a work/research assistant, and many queries target high-quality, trusted, and helpful parts of the web. Query traffic follows a power-law distribution, so an 80/20 approach achieves significant results. Based on these insights, we were able to build a much more compact index optimized for quality and truthfulness. Currently, we spend less time chasing the tail of the distribution, but as we scale our infrastructure, we will pursue the tail as well.
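A toy model shows why the 80/20 approach pays off (this is my own back-of-the-envelope illustration, not Perplexity’s data): under a Zipf-like power law, the head of the query distribution accounts for the bulk of traffic.

```python
def zipf_coverage(num_queries: int, head_fraction: float, s: float = 1.0) -> float:
    """Fraction of total query traffic covered by the top `head_fraction` of queries,
    assuming query frequency falls off as 1 / rank**s (a Zipf-like power law)."""
    weights = [1.0 / rank ** s for rank in range(1, num_queries + 1)]
    head = int(num_queries * head_fraction)
    return sum(weights[:head]) / sum(weights)

# With one million distinct queries, indexing for the top 20% covers ~89% of traffic.
print(f"{zipf_coverage(1_000_000, 0.20):.1%}")
```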

How do large language models (LLMs) enhance Perplexity’s search capabilities, and what makes them particularly effective in parsing and presenting information from the web?

We use LLMs everywhere, both for real-time and offline processing. LLMs allow us to focus on the most important and relevant parts of web pages. They surpass anything that came before at maximizing the signal-to-noise ratio, which makes many problems tractable for a small team that previously were not. In general, this is perhaps the most important property of LLMs: they enable a very small team to do sophisticated things.
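As a minimal sketch of what LLM-based extraction can look like (my illustration; Perplexity has not published its pipeline, and `complete` is a hypothetical stand-in for an LLM call):

```python
def complete(prompt: str) -> str:
    """Hypothetical LLM completion call; substitute your provider's client."""
    raise NotImplementedError

def extract_relevant(page_text: str, query: str, max_chars: int = 8000) -> str:
    """Keep only the passages of a raw page that matter for the query,
    discarding the navigation and boilerplate that dominate raw HTML text."""
    prompt = (
        "From the page text below, copy out verbatim only the passages that "
        f"help answer: {query!r}. Drop navigation, ads, and boilerplate.\n\n"
        f"{page_text[:max_chars]}"
    )
    return complete(prompt)
```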

Looking ahead, what are the main technological or market challenges Perplexity anticipates?

As we look ahead, the most important technological challenges for us will be centered around continuing to improve the helpfulness and accuracy of our answers. We aim to increase the scope and complexity of the types of queries and questions we can answer reliably. Along with this, we care a lot about the speed and serving efficiency of our system and will be focusing heavily on driving serving costs down as much as possible without compromising the quality of our product.

In your opinion, why is Perplexity’s approach to search superior to Google’s approach of ranking websites according to backlinks and other proven search engine ranking metrics?

We are optimizing a completely different ranking metric than classical search engines. Our ranking objective is designed to natively combine the retrieval system and LLMs. This approach is quite different from that of classical search engines, which optimize the probability of a click or ad impression.

Thank you for the great interview; readers who wish to learn more should visit Perplexity AI.
