An AI Database for RAG (Retrieval Augmented Generation)
Intro
The Aeca Search Demo on the Aeca website showcases Retrieval Augmented Generation (RAG). The sample data consists of approximately 380 million embeddings built from the entire Korean and English Wikipedia datasets, about 1.1 TB in total. Newly created Wikipedia articles and edits to existing documents are written to the Aeca database in real time, so the demo always serves the most up-to-date data.
Despite the large data size, the demo is served not from the cloud but from a basic-spec Mac mini home server over a household Wi-Fi connection. A single Aeca instance handles all of the data, with no distributed processing such as sharding, so it runs at very low server cost.
Understanding Retrieval Augmented Generation (RAG)
The RAG model appears similar to GPT on the surface, but instead of the LLM answering solely from what it has learned, a retrieval model searches data in external storage and a generation model produces answers grounded in that retrieved data. Because answers are based on retrieved content rather than learned content, RAG improves reliability and can handle real-time information. RAG consists of two stages (a minimal code sketch follows the list below):
- Retrieval: Returns the most appropriate search results for the user's question from data stored in a database or search engine.
- Generation: Generates new sentences, answers, summaries, etc., using the retrieved information.
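As a minimal sketch of this two-stage flow, the snippet below uses toy stand-ins: a naive keyword scan plays the retriever, and the "generator" merely assembles the prompt a real LLM would receive. Nothing here refers to a real API.

```python
# A minimal RAG sketch with toy stand-ins: the retriever is a naive
# keyword scan over an in-memory list, and the generator only builds
# the grounded prompt a real LLM would be given.
DOCS = [
    "RAG grounds LLM answers in documents retrieved from storage.",
    "Hybrid Search combines FTS and Vector Search scores.",
    "Redis is often added as a cache in multi-product stacks.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    # Retrieval: rank stored documents by keyword overlap with the query.
    terms = set(query.lower().split())
    ranked = sorted(DOCS, key=lambda d: -len(terms & set(d.lower().split())))
    return ranked[:top_k]

def generate(query: str, context: list[str]) -> str:
    # Generation: a real system would send this prompt to an LLM;
    # here we simply return the grounded prompt itself.
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer '{query}' using only this context:\n{joined}"

print(generate("how does rag work", retrieve("how does rag work")))
```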
In particular, proprietary company data is hard to train on due to security concerns, but with RAG you can answer questions about content the model never learned. And because the retrieval model searches and delivers continuously updated information such as news and social media, responses stay current.
The Core of RAG: The Retrieval Model
The core of RAG is retrieval. When a user asks a question, you need to find the content most appropriate to the user's intent and deliver it as context to the LLM. Building a retrieval model that faithfully provides search results "matching the user's intent" is not simple.
With Full-Text Search (FTS) alone, it is difficult to capture context. A representative issue is synonym handling: a search for "web development" may fail to return a frontend developer job description even though the two phrases mean much the same thing, as the example below shows. Conversely, irrelevant documents may rank highly simply because they contain the same words.
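A small illustration of that gap, using invented strings: the query and the document describe the same work, yet pure term matching finds no overlap.

```python
import re

# The query and the document describe the same role but share no
# keywords, so pure term matching finds nothing. (Invented strings.)
doc = "We are hiring a frontend developer (React, TypeScript)."
query = "web development"

tokenize = lambda text: set(re.findall(r"[a-z]+", text.lower()))
print(tokenize(query) & tokenize(doc))  # set() -> no match, no result
```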
To compensate for these limitations, Vector Search is used. An embedding model converts text (sentences or paragraphs) into vector embeddings, arrays of numbers, and the distance between embeddings is computed: the closer two embeddings are, the more similar their texts are judged to be. The drawback of Vector Search is that even small parameter differences can significantly skew the results.
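As a sketch of distance-based similarity, the snippet below computes cosine similarity over toy three-dimensional vectors; the values are invented, and real embeddings have far more dimensions.

```python
import math

# Cosine similarity between two embeddings: values near 1.0 mean the
# texts are judged similar. The 3-dimensional vectors are toy values.
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

web_dev  = [0.90, 0.20, 0.10]  # stand-in for embed("web development")
frontend = [0.85, 0.25, 0.15]  # stand-in for embed("frontend developer")
cooking  = [0.10, 0.90, 0.40]  # stand-in for embed("pasta recipes")

print(cosine_similarity(web_dev, frontend))  # high -> similar meaning
print(cosine_similarity(web_dev, cooking))   # low  -> unrelated
```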
In short, FTS or Vector Search alone struggles to produce search results that account for context. Hybrid Search, which computes a final score by weighting the results of both FTS and Vector Search, has therefore emerged as the methodology of choice for retrieval models.
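A minimal sketch of that weighted fusion, under invented scores: each ranker's output is min-max normalized, then blended with a single weight. Other fusion schemes, such as reciprocal rank fusion, are also common.

```python
# Weighted hybrid scoring with invented scores: normalize each ranker's
# output to [0, 1], then blend with a single tunable weight.
def normalize(scores: dict[str, float]) -> dict[str, float]:
    hi, lo = max(scores.values()), min(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_score(fts: dict[str, float], vec: dict[str, float],
                 alpha: float = 0.5) -> dict[str, float]:
    # alpha weights FTS; (1 - alpha) weights Vector Search.
    fts_n, vec_n = normalize(fts), normalize(vec)
    return {d: alpha * fts_n.get(d, 0.0) + (1 - alpha) * vec_n.get(d, 0.0)
            for d in fts.keys() | vec.keys()}

fts_scores = {"doc1": 12.3, "doc2": 4.1}   # e.g. BM25 scores
vec_scores = {"doc2": 0.91, "doc3": 0.85}  # e.g. cosine similarities
print(sorted(hybrid_score(fts_scores, vec_scores).items(),
             key=lambda kv: -kv[1]))
```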
Implementing RAG at this level inevitably makes the infrastructure complex. You need to synchronize the database holding the original data with a Full-Text Search engine and a Vector Database, and if real-time processing is required, extra work is needed to make each transaction searchable the moment it occurs. For smooth service you may also need a cache database such as Redis. This is not only costly but time-consuming, and after deployment numerous DevOps tasks remain for stable operation and monitoring.
The requirements for the retrieval model in RAG are as follows:
- Real-Time: Processes data addition, modification, and deletion in real time.
- Transactions: Maintains the reliability and consistency of the searched data (manages data effectively and tracks changes, allowing rollback or recovery to a previous state when necessary).
- Low Latency: Searches large amounts of data in storage without delay.
- Semantic Search: Searches for information similar to the user's intent by considering context, not just simple keyword matching.
- Scalability: Effectively manages load caused by an increase in the number of users or during peak times.
Implementing RAG with Just One AI Database
What if you could build RAG, which otherwise requires such a diverse and complex tech stack, with just one product? Unifying the tech stack yields dramatic cost and time savings, which translate directly into a more competitive service.
Aeca is a product that provides the tech stack needed for AI software development within a single database.
With Aeca, you can implement RAG easily and quickly without juggling multiple products. It is an OLTP database that also provides OLAP, Full-Text Search, Vector Search, and Time to Live. Because original data, embedding data, and the rest live in a single store, with nothing to distribute and synchronize across systems, development moves dramatically faster.
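As a rough illustration of the single-store idea, a document record might bundle everything together; the field names below are invented, not an actual Aeca schema.

```python
# One record holds the original text, its embedding, and metadata, so
# there is nothing to synchronize across separate systems.
# (Field names are illustrative only, not an actual Aeca schema.)
record = {
    "doc_id": "wiki-12345",
    "title": "Espresso",
    "text": "Espresso is a coffee-brewing method of Italian origin ...",
    "embedding": [0.12, -0.03, 0.54],  # produced by an embedding model
    "updated_at": "2023-09-16T00:00:00Z",
}
```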
The retrieval model of a RAG system needs continuous improvement as the search targets and environment change. Unlike conventional, inefficient Hybrid Search that runs FTS and Vector Search separately and then aggregates the results, Aeca serves FTS and Vector Search in a single query. That lets you tune search results quickly by changing weights in real time, as the sketch below illustrates.
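The generic sketch below (not Aeca's actual API) shows why exposing the fusion weight in the query matters: moving one parameter re-ranks results immediately, with no re-indexing or pipeline changes. All scores and names are invented.

```python
# When fusion happens in one query, changing a single weight re-ranks
# results instantly. (Generic illustration, not Aeca's actual API.)
def fuse(fts: dict[str, float], vec: dict[str, float], alpha: float):
    # alpha = weight of the FTS score; (1 - alpha) goes to Vector Search.
    docs = fts.keys() | vec.keys()
    scores = {d: alpha * fts.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0)
              for d in docs}
    return sorted(scores, key=scores.get, reverse=True)

fts = {"doc1": 0.90, "doc2": 0.30}  # normalized FTS scores
vec = {"doc2": 0.95, "doc3": 0.80}  # normalized vector similarities

for alpha in (0.2, 0.5, 0.8):
    print(alpha, fuse(fts, vec, alpha))  # ranking shifts as alpha moves
```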
Beyond adjusting the weights between FTS and Vector Search, you can also improve Vector Search quality by weighting the embedding models it uses. Whereas existing products require a separate collection for each embedding model, Aeca integrates multiple embedding models into a single collection and lets you assign scoring weights across them, which can dramatically improve search quality.
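The following sketch shows one way such cross-model scoring could work, with invented model names, weights, and scores: each document carries a similarity score per embedding model, and the final vector score is their weighted sum.

```python
# Weighting across embedding models stored in one collection.
# Model names, weights, and scores are invented for illustration.
model_weights = {"multilingual-model": 0.7, "english-model": 0.3}

# One similarity score per embedding model, per document.
doc_scores = {
    "doc1": {"multilingual-model": 0.92, "english-model": 0.75},
    "doc2": {"multilingual-model": 0.70, "english-model": 0.95},
}

combined = {
    doc: sum(model_weights[m] * s for m, s in scores.items())
    for doc, scores in doc_scores.items()
}
print(sorted(combined.items(), key=lambda kv: -kv[1]))  # final scores
```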
Conclusion
RAG is gaining attention as a way to compensate for the limitations of LLMs, but building the retrieval model needed for effective RAG takes significant cost and time.
Aeca is an 'AI database' that provides the entire required tech stack (Real-Time, Transactions, FTS, Vector Search, Cache, and more), letting you develop RAG easily and simply with a single product.