Three recommendations for vector search based on Elasticsearch

Background

We tested three schemes:

Elasticsearch + Elastiknn: Elasticsearch with a plug-in. We measured latency and throughput for both exact and LSH search. Link: Elasticsearch vector search plug-in research and testing
Elasticsearch with its native vector search capability. Only exact search was measured for latency and throughput. Link: Elasticsearch official vector search stress test
OpenSearch with its native vector search capability. We measured latency and throughput for both exact and HNSW search. Link: OpenSearch vector search stress test
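For readers who want to reproduce this kind of measurement, here is a minimal sketch of a load-test loop that reports average latency and throughput. It is not the harness used in the linked tests: `run_load_test` and `fake_search` are hypothetical names, and `fake_search` only sleeps as a stand-in for a real search request.

```python
import threading
import time
from typing import Callable, Dict, List


def run_load_test(query_fn: Callable[[], None], workers: int = 10,
                  duration_s: float = 10.0) -> Dict[str, float]:
    """Call query_fn from `workers` threads for roughly `duration_s` seconds."""
    latencies: List[float] = []
    lock = threading.Lock()
    deadline = time.perf_counter() + duration_s

    def worker() -> None:
        local: List[float] = []
        while time.perf_counter() < deadline:
            start = time.perf_counter()
            query_fn()
            local.append(time.perf_counter() - start)
        # Merge per-thread samples once, to keep lock contention low.
        with lock:
            latencies.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    started = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - started

    count = len(latencies)
    return {
        "requests": count,
        "avg_latency_ms": 1000.0 * sum(latencies) / count if count else 0.0,
        "throughput_qps": count / elapsed if elapsed > 0 else 0.0,
    }


def fake_search() -> None:
    """Hypothetical stand-in for one vector search request."""
    time.sleep(0.005)
```

Calling `run_load_test(fake_search)` with the defaults runs 10 concurrent workers for 10 seconds; in a real test, `fake_search` would be replaced by a function that sends one search request to the cluster.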

Third-party overview of the different plug-ins:

2022-02-24T09:34:37.png

Conclusion

With 10 concurrent clients running for 10 seconds, the following conclusions can be drawn:

LSH has the lowest recall of the three algorithms, only 85%, so we can basically rule it out for business use. The LVSPQ plug-in is basically no longer maintained by its open source community, so it is not considered either;
In ES + Elastiknn mode, exact search averages 10 ms, which is low. The plug-in itself optimizes vector computation, so it can be applied in business;
With the native ES implementation, exact search averages 70 ms. That is usable and acceptable for business that is not latency sensitive. Why does ES 7.14.1 not optimize vector computation the way the plug-in does? Because 7.14.1 depends on Lucene 8.9.0, and vector optimization is Lucene's responsibility. This Lucene version does not support vector search well, but according to the Lucene 9 development record, Lucene 9 will support vectors better. After an upgrade to Lucene 9, exact search should reach the same latency as the plug-in, and ES 8.0 will also support HNSW natively. ES 8+ will be friendlier to vector search and offer more related features;
The OpenSearch k-NN scheme is the closest to the Alibaba Cloud and Baidu Cloud offerings. Judging from their respective documentation, it is reasonable to believe that Alibaba and Baidu borrowed the OpenSearch k-NN code for their implementations, so I treat OpenSearch k-NN as equivalent to the Alibaba Cloud or Baidu Cloud scheme. OpenSearch is based on ES 7.10.2, and its Lucene version is also 8.9.0. Its exact search is slightly slower than that of ES 7.14, averaging 90 ms;
The OpenSearch k-NN HNSW scheme is excellent. With 1 million 256-dimensional vectors, average latency can reach 10 ms, and third-party tests report recall of 95%+. For business with large data volumes that is sensitive to both latency and accuracy, HNSW is the best scheme. Its drawback is also obvious: it consumes a lot of memory.
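Two of the quantities in the list above can be made concrete with a short sketch: recall (how many of the true nearest neighbors an approximate search returns) and the memory cost of HNSW. The function names are hypothetical. The memory formula is the rule of thumb given in the OpenSearch k-NN documentation, `1.1 * (4 * dimension + 8 * M)` bytes per vector, and `m=16` is an assumed graph parameter, not a value taken from the tests above.

```python
from typing import Sequence


def recall_at_k(exact_ids: Sequence[str], approx_ids: Sequence[str]) -> float:
    """Fraction of the exact top-k neighbors that the approximate search also returned."""
    if not exact_ids:
        return 0.0
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)


def hnsw_memory_bytes(num_vectors: int, dimension: int, m: int = 16) -> float:
    """Estimate HNSW memory using the OpenSearch k-NN rule of thumb:
    1.1 * (4 * dimension + 8 * m) bytes per vector (m is an assumed value here)."""
    return 1.1 * (4 * dimension + 8 * m) * num_vectors
```

Under these assumptions, `hnsw_memory_bytes(1_000_000, 256)` comes to about 1.27 GB for the 1 million 256-dimensional vectors mentioned above, and a recall of 85% means that, on average, 15 of every 100 true neighbors are missing from the approximate result.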

In our business scenario, similarity search covers the current database, the historical database and the fixed database. Whichever database is involved, the data volume is relatively small, ranging from 0 to 100,000 records.

Elasticsearch + Elastiknn + exact can meet the current needs, but updates for new versions have basically stopped;
OpenSearch + k-NN + exact is slightly inferior to Elasticsearch + exact, and OpenSearch lags behind Elasticsearch in both release cadence and community activity, so it is not considered here;
OpenSearch + k-NN + HNSW is better suited to business scenarios with large amounts of data, so it is not a fit here;
The advantage of Elasticsearch + exact is that later version upgrades benefit directly from community iterations, especially version 8.0, which will bring more kNN-related optimizations and features.
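To illustrate what native exact search looks like in Elasticsearch 7.x, here is a minimal sketch that builds a `script_score` request body using the built-in `cosineSimilarity` function on a `dense_vector` field. The helper name `exact_cosine_query` and the field name `embedding` are hypothetical; the queries used in the linked tests may differ.

```python
from typing import Any, Dict, List


def exact_cosine_query(field: str, query_vector: List[float],
                       size: int = 10) -> Dict[str, Any]:
    """Build a brute-force (exact) cosine similarity search body for a dense_vector field."""
    return {
        "size": size,
        "query": {
            "script_score": {
                # Score every document; a filter query here would narrow the scan.
                "query": {"match_all": {}},
                "script": {
                    # +1.0 keeps scores non-negative, which Elasticsearch requires.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


body = exact_cosine_query("embedding", [0.1, 0.2, 0.3], size=5)
```

The resulting dictionary can be sent as the JSON body of a `_search` request. Because every matching document is scored, latency grows with the number of documents, which is acceptable for the small data volumes described above.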

Proposal

1. At this stage, Elasticsearch 7.14.1 + Elastiknn 7.14.1.2 + exact is recommended;
2. Moving to Elasticsearch 8+ can serve as an optimization for the related business services in later upgrade iterations;
3. OpenSearch is not considered because its community maintenance lags behind and is relatively passive.

The following is the part of the Elasticsearch 8.0 documentation about kNN support:
2022-02-24T09:38:47.png
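As a rough illustration of what an upgrade to 8.0 might involve, the sketch below builds an index mapping and a request body for approximate kNN search. It assumes the `dense_vector` options (`index`, `similarity`) and the `_knn_search` request shape described in the Elasticsearch 8.0 reference; the helper names, the field name `embedding` and the default values are hypothetical.

```python
from typing import Any, Dict, List


def knn_index_mapping(field: str, dims: int,
                      similarity: str = "cosine") -> Dict[str, Any]:
    """Mapping for a dense_vector field indexed for approximate kNN (HNSW) search."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,  # build the HNSW graph at index time
                    "similarity": similarity,
                }
            }
        }
    }


def knn_search_body(field: str, query_vector: List[float], k: int = 10,
                    num_candidates: int = 100) -> Dict[str, Any]:
    """Request body for the 8.0 _knn_search endpoint (assumed shape)."""
    return {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            # Candidates examined per shard; higher values trade latency for recall.
            "num_candidates": num_candidates,
        }
    }


mapping = knn_index_mapping("embedding", dims=256)
search = knn_search_body("embedding", [0.0] * 256, k=5)
```

The mapping would be supplied when creating the index, and the search body sent to the index's `_knn_search` endpoint.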

Links

Alibaba Cloud plug-in: https://help.aliyun.com/document_detail/145062.html
Baidu Cloud plug-in: https://cloud.baidu.com/doc/BES/s/Rke3o8qos
Third-party plug-in: https://elastiknn.com/
Official support: https://www.elastic.co/guide/en/elasticsearch/reference/master/dense-vector.html
OpenSearch: https://opensearch.org/docs/latest/search-plugins/knn/index/

Further reading:

https://www.sofastack.tech/blog/antfin-zsearch-vector-search/

This article was written by Chakhsu Lau and is licensed under the Creative Commons Attribution 4.0 International License.
Unless marked as reprinted or attributed to another source, all articles on this website are original or translated by this website. Please credit the source when reprinting.