Why We Need Vector Search
Most mobile applications and web services we use have a search function, and most of those are built on the basic text search provided by databases or on the full-text search provided by search engines such as Elasticsearch. Full-Text Search is a traditional method for searching text data that focuses on finding specific keywords, words, or phrases in documents, web pages, databases, and more. Users typically enter keywords or short sentences, and the engine returns documents that match those keywords; it does not consider context or semantic similarity.
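Full-text engines such as Elasticsearch are built around an inverted index that maps each term to the documents containing it. The sketch below is a deliberately minimal Python illustration of that idea, not how Elasticsearch actually works internally (there is no stemming, scoring, or synonym handling). The function names and sample documents are hypothetical. Note that a query for "car" never returns the document that only says "vehicle".

```python
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Lowercase, split on whitespace, and strip simple punctuation.
    return [t.strip(".,!?\"'") for t in text.lower().split()]

def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map each term to the set of document IDs that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def keyword_search(index: dict[str, set[str]], query: str) -> set[str]:
    # AND semantics: a document matches only if it contains every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample documents.
docs = {
    "d1": "Compact car with great mileage",
    "d2": "Affordable family vehicle for sale",
}
index = build_inverted_index(docs)
print(keyword_search(index, "car"))      # {'d1'}
print(keyword_search(index, "vehicle"))  # {'d2'}
```

Matching is purely lexical: unless a synonym rule is configured, "car" and "vehicle" are unrelated strings to this index.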
However, these same characteristics limit Full-Text Search. You have probably wanted to find something but struggled because you didn't know its exact name: you could describe its features or what it does, yet you kept retyping queries in the search bar because you didn't know what it was called. Full-Text Search does offer synonym processing, similar-word search, and typo correction, but these features only go so far.
For example, let's look at the following five sentences:
- Happens maybe once in a hundred years
- A 0.001% chance
- Zero chance of success
- Not possible even if you die and come back to life
- Getting struck by lightning consecutively
All five sentences mean "almost no possibility," yet no single word appears in all of them. Even a Full-Text Search engine with similar-word and synonym processing would find it nearly impossible to retrieve all five sentences with one query. In other words, Full-Text Search relies only on the words or phrases in the text and cannot account for semantic similarity.
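The claim about shared words is easy to check mechanically. This short Python sketch splits each sentence on whitespace (a simplification; real text analyzers also stem words and drop stop words) and intersects the resulting word sets.

```python
sentences = [
    "Happens maybe once in a hundred years",
    "A 0.001% chance",
    "Zero chance of success",
    "Not possible even if you die and come back to life",
    "Getting struck by lightning consecutively",
]

def words(text: str) -> set[str]:
    # Lowercased whitespace tokens: roughly what a basic keyword matcher sees.
    return set(text.lower().split())

# Words that appear in every one of the five sentences.
common = set.intersection(*(words(s) for s in sentences))
print(common)  # set()

# A keyword query for "possibility" matches none of the sentences.
matches = [s for s in sentences if "possibility" in words(s)]
print(matches)  # []
```

The closest overlap is "chance," which appears in only two of the five sentences.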
Another limitation is that the search target is confined to text. For example, typing "puppy" will not find a photo of a puppy if the image file has no accompanying description. The images we find by typing a word into Naver or Google are typically images embedded in blogs, news articles, and other documents whose text mentions that word.
As machine learning has advanced, it has become possible to convert many kinds of data, such as images, video, audio, and documents (words, paragraphs), into a common form called vector embeddings. By measuring the distance between these vectors, we can assess how similar the original data are, which has given rise to a new search technology called vector search.
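The text does not name a specific distance measure. Euclidean distance and cosine similarity are two common choices, sketched below in plain Python as an illustration; which metric a particular vector search system uses is an assumption here, not something stated above.

```python
import math

def euclidean_distance(a: list[float], b: list[float]) -> float:
    # Straight-line distance between two points: smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine of the angle between two vectors: 1.0 means the same direction,
    # 0.0 means orthogonal (unrelated).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

a = [1.0, 0.0]
b = [0.0, 1.0]
print(euclidean_distance(a, b))  # 1.4142135623730951
print(cosine_similarity(a, b))   # 0.0
```

Cosine similarity ignores vector length and compares only direction, which is why it is a frequent default for text embeddings; Euclidean distance is sensitive to magnitude as well.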
Vector search overcomes these limitations of Full-Text Search by searching on meaning. If we represent the sentences that express "almost no possibility" as vectors, they end up close to one another: projected into two dimensions, the five sentences would form a tight cluster, while sentences with unrelated meanings would sit farther away.
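To make the idea concrete, here is a toy Python sketch using hand-picked, hypothetical 2-D coordinates (not output from any real embedding model) and a brute-force nearest-neighbour search over them. Real systems use model-generated embeddings with far more dimensions and usually an approximate index rather than a full scan.

```python
import math

# Hypothetical 2-D coordinates chosen by hand for illustration only;
# a real embedding model outputs hundreds or thousands of dimensions.
corpus = {
    "Happens maybe once in a hundred years": (0.90, 0.12),
    "A 0.001% chance": (0.88, 0.10),
    "Zero chance of success": (0.92, 0.08),
    "Not possible even if you die and come back to life": (0.86, 0.14),
    "Getting struck by lightning consecutively": (0.89, 0.15),
    "The store opens at 9 a.m.": (0.10, 0.95),  # unrelated meaning
}

def cosine(a, b):
    # Cosine similarity between two 2-D vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def vector_search(query_vec, k=5):
    # Brute-force k-nearest-neighbour search: score every item, keep the top k.
    scored = sorted(corpus.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in scored[:k]]

# Hypothetical embedding of the query "almost no possibility".
query = (0.90, 0.11)
for text in vector_search(query):
    print(text)
```

All five "almost no possibility" sentences come back even though the query shares no words with them; the unrelated sentence scores lowest because its vector points in a different direction.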
Using vector search offers several advantages:
- Consideration of Semantic Similarity: Full-Text Search relies on exact matches of the search term and does not consider context. Vector search, by contrast, represents data in a vector space that captures semantic similarity, so it can find not only documents containing the search term but also semantically similar documents that do not contain it.
- Easy Synonym Handling: In Full-Text Search, for example, a query for "car" returns documents containing "vehicle" only if you configure synonym processing separately. Vector search handles synonyms naturally because words with similar meanings are represented by nearby vectors.
- Searching Various Data Types: Full-Text Search applies only to text and is hard to use with non-text unstructured data such as images, audio, and video. Vector search can handle these data types by converting them into vector embeddings, which makes it effective for searching and analyzing many kinds of data.
- Handling Complex Data: Modern data is often complex, with many attributes (dimensions). Full-Text Search, which matches individual terms, struggles with such multidimensional data, whereas vector search encodes it as vectors, making complex data structures easier to handle.
- Improved User Experience Through Enhanced Search Quality: Since vector search considers semantic similarity, customers can quickly find information that fits their intent even if they don't input exact search terms.
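The first two advantages, semantic similarity and synonym handling, can be contrasted in a small Python sketch. The document embeddings, the query vector for "car", and the 0.9 similarity threshold are all hypothetical values picked for illustration; in practice an embedding model would produce the vectors and the threshold would be tuned.

```python
import math

# Hypothetical document embeddings, hand-picked for illustration only.
docs = {
    "d1": ("Affordable family vehicle for sale", (0.85, 0.15, 0.10)),
    "d2": ("Fresh bananas delivered daily", (0.05, 0.90, 0.20)),
}

def cosine(a, b):
    # Cosine similarity: close to 1.0 when vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def keyword_match(query):
    # Full-text style: the literal query word must appear in the document.
    return [d for d, (text, _) in docs.items()
            if query.lower() in text.lower().split()]

def semantic_match(query_vec, threshold=0.9):
    # Vector style: keep documents whose embedding is close to the query's.
    return [d for d, (_, vec) in docs.items()
            if cosine(query_vec, vec) >= threshold]

# Hypothetical embedding of the query "car".
car_vec = (0.90, 0.10, 0.05)
print(keyword_match("car"))     # []
print(semantic_match(car_vec))  # ['d1']
```

The keyword matcher finds nothing because "car" never appears literally, while the vector comparison surfaces the "vehicle" listing without any synonym dictionary.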
In summary, vector search is a powerful tool that overcomes the limitations of Full-Text Search and delivers better search results by considering the meaning and similarity of data. It handles diverse data types and complex data structures effectively, helping customers find information more accurately and quickly, and it has become an essential technology in modern data analysis and search systems.