Why Did OpenAI Acquire Rockset?
Introduction
On June 21, 2024, OpenAI announced that it had acquired the database startup Rockset. According to OpenAI, the goal of the acquisition is to improve its search (retrieval) infrastructure and make AI more useful. What specific advantages led OpenAI to acquire Rockset?
Rockset, a Leader in Real-Time Data Analysis
First, let's look at what kind of company Rockset is. Rockset is a database startup founded in 2016 by Venkat Venkataramani and Dhruba Borthakur. CEO Venkataramani worked at Facebook and Oracle before founding Rockset, and CTO Borthakur was one of the original developers of RocksDB at Facebook.
RocksDB, developed at Facebook and released as open source, is an embeddable key-value store optimized for fast storage and high write rates, making it well suited to real-time data processing. Rockset builds on RocksDB code, and Aeca also uses RocksDB.
In Korea, Rockset is relatively little known compared with databases such as Snowflake and MongoDB, but it has grown into a unicorn, raising a total of $109 million from prominent investors including Greylock Partners, Sequoia Capital, and Glynn Capital.
Rockset's core value proposition is what it calls the world's fastest real-time search and analytics. It ingests data from sources such as Kafka, MongoDB, DynamoDB, and S3, indexes it in real time, stores it in Rockset, and lets users run real-time search, filtering, and vector search with SQL. Technically, it is an OLAP database built on a document store, specialized for SQL analytics on NoSQL (semi-structured) data.
Real-time analysis and search are valuable in many fields. In live commerce, for example, a service must recommend the most relevant videos by tracking, in real time, which videos viewers are watching, what they comment, and which products they browse. For such businesses, real-time analysis of user data is a key element of the growth strategy.
However, processing enormous volumes of data that change from moment to moment means continuously applying updates and reindexing the changed data for search, which costs a great deal of time and money. Operating Elasticsearch in particular requires substantial resources for cluster management, and many developers find it difficult. These burdensome tasks of real-time search and analytics are exactly the problems Rockset set out to solve.
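To make the reindexing cost concrete, here is a minimal sketch of incremental indexing: when a document changes, only that document's postings are removed and rebuilt, rather than reindexing the whole collection. The `IncrementalIndex` class, `tokenize`, and the sample documents are hypothetical illustrations, not Rockset's or Elasticsearch's API; a production engine would also need persistence, concurrency control, and sharding.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class IncrementalIndex:
    """Hypothetical toy inverted index that reindexes only documents that change."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)  # term -> doc ids
        self.doc_terms: dict[str, set[str]] = {}  # doc id -> terms currently indexed

    def upsert(self, doc_id: str, text: str) -> None:
        # Remove the stale postings for this document, then index its new version.
        self.delete(doc_id)
        terms = tokenize(text)
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def delete(self, doc_id: str) -> None:
        # Touch only the terms this document contributed; other documents are untouched.
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, query: str) -> set[str]:
        # Return ids of documents containing every query term (AND semantics).
        terms = tokenize(query)
        if not terms:
            return set()
        return set.intersection(*[self.postings.get(t, set()) for t in terms])


index = IncrementalIndex()
index.upsert("v1", "Live sneaker drop with limited stock")
index.upsert("v2", "Cooking show featuring a chef knife")
print(index.search("sneaker"))  # {'v1'}
# v1 changes: only v1's postings are rebuilt; v2 is left as-is.
index.upsert("v1", "Live jacket sale")
print(index.search("sneaker"))  # set()
print(index.search("live sale"))  # {'v1'}
```

The design choice here is to keep a reverse map (`doc_terms`) so an update can find and drop exactly the postings it invalidates; without it, every change would force a scan of the whole index.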
OpenAI needed a database like Rockset's that can instantly process the enormous volumes of data generated in real time. At first glance, a vector database, often described as the external memory of an LLM, might seem the more natural choice. So why did OpenAI choose Rockset?
Data Infrastructure for RAG
When you enter a prompt containing words like "recent" or "latest" into ChatGPT running GPT-4o, the LLM doesn't answer immediately. It first performs a search and then answers based on the search results. Grounding answers in retrieved results, so that the model can respond about recent events or specialized domains it was not trained on, is called Retrieval-Augmented Generation (RAG). Before generation, the system must quickly search vast amounts of data to supply context to the LLM. Rockset's fast search can find the needed information within enormous datasets, minimizing latency and improving RAG response times.
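The retrieve-then-generate flow of RAG can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: retrieval here is naive keyword overlap rather than the full-text or vector search a system like Rockset would provide, the `Doc`, `retrieve`, and `build_prompt` names are hypothetical, and the final call to an LLM is deliberately left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str


def terms(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[Doc], k: int = 2) -> list[Doc]:
    """Rank docs by the number of query terms they share; keep the top k with any match."""
    q = terms(query)
    scored = [(len(q & terms(d.text)), d) for d in docs]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [d for _, d in scored[:k]]


def build_prompt(query: str, context: list[Doc]) -> str:
    """Place retrieved passages ahead of the question so the LLM can ground its answer."""
    lines = [f"[{d.doc_id}] {d.text}" for d in context]
    return (
        "Answer using only the context below.\n"
        "Context:\n" + "\n".join(lines) + f"\n\nQuestion: {query}\nAnswer:"
    )


docs = [
    Doc("news-1", "OpenAI announced the acquisition of Rockset in June 2024"),
    Doc("news-2", "A new smartphone launched this week"),
    Doc("wiki-1", "RocksDB is an open source key-value store"),
]
question = "Who did OpenAI acquire?"
prompt = build_prompt(question, retrieve(question, docs))
print(prompt)
# In a real system, `prompt` would now be sent to an LLM (not shown here).
```

The retrieval step is where latency accumulates in practice: every answer waits on it, which is why a fast search backend matters for RAG.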
In addition, Rockset's scalable architecture makes it easy to integrate new types of data and continuously ingest data from many sources, so teams can quickly build backend infrastructure optimized for AI applications such as RAG.
In contrast, most vector databases focus only on storing and searching vectors. Real-world data, however, is not made up of vectors alone: it includes text metadata and structured fields, binary data such as images, and semi-structured data such as JSON, and these must be combined to extract the right information. In response to this demand, vector databases are adding metadata filtering, but from the standpoint of database and search-engine design, bolting on such features tends to be more of a burden than a benefit. In this respect, Aeca is heading in a direction similar to Rockset's. We can presume that these factors led OpenAI to choose Rockset rather than a vector database company.
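The kind of data fusion described here can be illustrated with a minimal hybrid query: structured metadata filters narrow the candidate set, and vector similarity ranks what remains. This is a hypothetical sketch, not Rockset's or Aeca's implementation; the `Record` type, the `hybrid_search` function, and the toy two-dimensional embeddings are assumptions for illustration, and real systems would use indexes rather than the linear scan shown here.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Record:
    record_id: str
    embedding: list[float]  # vector, e.g. produced by an embedding model (toy values here)
    metadata: dict[str, Any] = field(default_factory=dict)  # structured, JSON-like fields


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(records: list[Record], query_vec: list[float],
                  filters: dict[str, Callable[[Any], bool]], k: int = 3) -> list[str]:
    """Keep records whose metadata satisfies every filter, then rank them by similarity."""
    candidates = [
        r for r in records
        if all(key in r.metadata and pred(r.metadata[key]) for key, pred in filters.items())
    ]
    candidates.sort(key=lambda r: cosine(r.embedding, query_vec), reverse=True)
    return [r.record_id for r in candidates[:k]]


records = [
    Record("p1", [0.9, 0.1], {"category": "shoes", "price": 80}),
    Record("p2", [0.8, 0.3], {"category": "shoes", "price": 150}),
    Record("p3", [0.1, 0.9], {"category": "shoes", "price": 60}),
    Record("p4", [0.95, 0.05], {"category": "hats", "price": 30}),
]
print(hybrid_search(records, [1.0, 0.0],
                    {"category": lambda c: c == "shoes", "price": lambda p: p < 100}))
# ['p1', 'p3']
```

Note that `p4` is the closest vector to the query but is excluded by the category filter, and `p2` by the price filter: a pure vector search would have ranked both highly, which is the gap structured filtering closes.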
Enhancing AI Model Performance
The ability to process vast amounts of data quickly and accurately is itself a competitive advantage for AI models, whose performance depends on how much data they are trained on and how suitable that data is. OpenAI can use Rockset's database to index and query data streaming in from many sources in real time and quickly refine it into data optimized for training. In other words, OpenAI can train the models behind ChatGPT faster and more efficiently while keeping them up to date. The acquisition thus further strengthens OpenAI's competitiveness in its core business, LLM development.
OpenAI's Market Expansion
With Rockset, OpenAI can also expand into markets that LLMs alone cannot reach. In domains that depend heavily on up-to-the-moment information, such as financial trading systems where transactions occur continuously, security applications like real-time anomaly detection, and live commerce where tracking customer behavior matters, integrating databases with AI models is essential. In particular, combining real-time data analysis with AI models should play a crucial role in strengthening business intelligence capabilities and creating new value, which is expected to give OpenAI the competitiveness it needs to win new customers in the B2B market.
Conclusion
We have looked at what kind of company Rockset is and why OpenAI acquired it. Even as LLMs are trained on ever larger amounts of data, databases and search engines continue to play a core role, and their importance may well become greater than it was in the past.