Why Vector Databases Are Essential for Recommendation Systems
For modern mobile and web applications, one of the key KPIs is users' retention time. To boost retention, many services try to build recommendation systems. In content and e-commerce services in particular, the quality of the recommendation system is directly linked to sales. By analyzing user behavior to discover interests, and by continuously showing items that users might like even when they have not searched for them, services increase retention time and drive purchases. It is commonly reported that 35% of Amazon's sales and over 70% of Netflix's video views come from recommendations.
However, recommendation systems are as challenging to implement as search. They require substantial investment, including hiring strong engineering talent and combining numerous solutions, so high-quality recommendation systems are often considered the exclusive domain of well-funded companies. At their core, search and recommendation are closely related, because both come down to finding similar targets: search looks for the results that best match a query, while recommendation looks for the results that best match what is known about the user.
A recommendation system, however, needs more information than search does. For OTT services like Netflix, this includes viewing history, content preferred by users with similar tastes, favorite actors, and genres. For e-commerce platforms like Coupang, it includes purchase history, browsing history, search keywords, customer age and gender, and purchase cycles. By aggregating this data and measuring similarity, the system recommends items that customers are likely to be interested in.
To compare such large and varied information on a common basis, it is common to convert the different types of data into vector embeddings and measure the similarity between those embeddings. A representative example is Airbnb, which raises booking rates by personalizing accommodation results for users in real time. Airbnb has reported that search ranking and similar-listing recommendations together account for 99% of its booking conversions. It built listing embeddings for real-time personalization, and the resulting embedding-based algorithms increased click-through rates on recommendations by 21%.
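The idea of comparing embeddings can be sketched in a few lines. The example below is a minimal illustration, not Airbnb's method: the listing names, the three-dimensional vectors, and the `recommend` helper are all hypothetical, and cosine similarity is just one common choice of similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def recommend(user_vector, item_vectors, top_k=2):
    """Rank items by similarity to the user vector, most similar first."""
    scored = [
        (item_id, cosine_similarity(user_vector, vector))
        for item_id, vector in item_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings; real ones come from a trained model
# and usually have tens to thousands of dimensions.
listings = {
    "beach_house": [0.9, 0.1, 0.0],
    "city_loft": [0.1, 0.9, 0.2],
    "seaside_cabin": [0.8, 0.2, 0.1],
}
user = [1.0, 0.0, 0.0]  # assumed to summarize this user's recent activity

top = recommend(user, listings)
print([item_id for item_id, _ in top])
```

With these made-up vectors, the two coastal listings rank ahead of the city loft because their vectors point in nearly the same direction as the user's.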
Because vast amounts of data must be converted into embeddings and compared, and above all because recommendations must reach users immediately, the system needs real-time capabilities. Generating embeddings is not enough: the system also needs processes for storing vector embeddings, searching them once stored, and updating them when the underlying data changes. In one-off studies or research, strict management of vector embeddings may not be crucial, but a service for end users must manage new products and customer information in a trackable form as they arrive. This is why a vector database is needed.
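The store, search, and update cycle described above can be sketched with a toy in-memory class. This is an assumption-laden illustration rather than the API of any real vector database: `InMemoryVectorStore` and its method names are hypothetical, and the brute-force scan stands in for the approximate nearest-neighbor indexes that production systems typically use.

```python
import math


class InMemoryVectorStore:
    """Toy vector store: exact nearest-neighbor search by Euclidean distance."""

    def __init__(self, dim):
        self.dim = dim
        self._vectors = {}  # item_id -> embedding

    def _check(self, vector):
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")

    def upsert(self, item_id, vector):
        """Insert a new embedding or overwrite the existing one for item_id."""
        self._check(vector)
        self._vectors[item_id] = list(vector)

    def delete(self, item_id):
        """Remove an item; a missing id is ignored."""
        self._vectors.pop(item_id, None)

    def search(self, query, top_k=3):
        """Return up to top_k (item_id, distance) pairs, nearest first."""
        self._check(query)
        scored = [
            (item_id, math.dist(query, vector))
            for item_id, vector in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1])
        return scored[:top_k]


store = InMemoryVectorStore(dim=2)
store.upsert("a", [0.0, 0.0])
store.upsert("b", [1.0, 1.0])
store.upsert("c", [5.0, 5.0])

query = [0.9, 0.9]
first = store.search(query, top_k=1)[0][0]    # nearest item before any change

store.upsert("c", [0.9, 0.8])                 # item "c" changed, so re-embed it
second = store.search(query, top_k=1)[0][0]   # the update is visible at once

store.delete("c")                             # item "c" was withdrawn
third = store.search(query, top_k=1)[0][0]
print(first, second, third)
```

The point of the sketch is that an update or deletion affects the very next search, which is the behavior an end-user service needs when products and customer data change continuously.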
Read more
Why We Need Vector Search
The mobile applications and web services we use all have search functions. Most are built on the basic text search provided by databases or on the full-text search provided by search engines like Elasticsearch. Full-text search is a traditional method for searching text data that focuses on finding specific keywords, words, or phrases in documents, web pages, databases, and other sources. The user typically enters keywords or a short sentence, and the search returns documents that match those keywords, but it does not consider context or semantic similarity.
By Tim Yang|2023-09-14
Vector Embedding - Representing All Data
Vector embedding is a technique that converts various forms of data (documents, images, audio, video, etc.) into arrays of numbers so that similarity can be measured. For example, a color can be represented as a three-dimensional vector in RGB format. By calculating the distance between vector embeddings, we can determine how similar the underlying data is. This plays an important role in natural language processing, recommendation algorithms, and more. Transformer models can convert many kinds of data into vectors, which makes it possible to measure similarity across different types of data. For instance, the text "cat" and a picture of a cat can be compared in the same vector space.
By Tim Yang|2023-09-13