background
- Elasticsearch + Elastiknn: Elasticsearch with a third-party plug-in. The latency and throughput of exact search and LSH search were tested separately. Link: Elasticsearch vector search plug-in research and testing
- Elasticsearch (native): Elasticsearch has its own vector search capability. Only exact search was tested for latency and throughput. Link: Elasticsearch official vector search stress test
- OpenSearch (native): OpenSearch has its own vector search capability. The latency and throughput of exact search and HNSW search were tested separately. Link: OpenSearch vector search stress test
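For context, "exact" search in native Elasticsearch 7.x is a brute-force scan in which every matching document is scored by a script. The sketch below only builds such a request body; the field name `embedding`, the helper name, and the use of cosine similarity are assumptions for illustration, not details taken from the tests above.

```python
import json


def build_exact_query(query_vector, field="embedding", size=10):
    """Build a brute-force (exact) similarity search body for Elasticsearch 7.x.

    `field` is a hypothetical dense_vector field name.
    """
    return {
        "size": size,
        "query": {
            "script_score": {
                # Score every document; narrow this query to scan fewer vectors.
                "query": {"match_all": {}},
                "script": {
                    # cosineSimilarity is in [-1, 1]; adding 1.0 keeps scores non-negative.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
    }


body = build_exact_query([0.1] * 256)
print(json.dumps(body["query"]["script_score"]["script"]["source"]))
```

Because every candidate document is scored, latency grows with the number of documents matched by the inner query, which is why exact search is only attractive at moderate data volumes.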
conclusion
- LSH has the lowest recall of the three approaches, only 85%, so it can essentially be ruled out for business use. The LVSPQ plug-in is essentially unmaintained by its open-source community and is not considered.
- Elasticsearch + Elastiknn: exact search averages 10 ms, which is low. The plug-in applies its own vector optimizations and can be used in business.
- Native Elasticsearch: exact search averages 70 ms, which is usable and acceptable for business that is not latency sensitive. Why does ES 7.14.1 not optimize vector operations the way the plug-in does? Because 7.14.1 depends on Lucene 8.9.0, and vector optimization is left to Lucene. That Lucene version has limited support for vector search, but according to the Lucene 9 development record, Lucene 9 will support vectors better. After an upgrade to Lucene 9, exact search should reach latency comparable to the plug-in, and ES 8.0 will also support HNSW natively. ES 8 and later will be friendlier to vector search and offer more related features.
- OpenSearch k-NN: this scheme is the closest to those of Alibaba Cloud and Baidu Cloud. Comparing their documentation, it is reasonable to believe that Alibaba and Baidu borrowed the OpenSearch k-NN code for their implementations, so I treat OpenSearch k-NN as equivalent to the Alibaba Cloud and Baidu Cloud schemes. OpenSearch is based on ES 7.10.2, and its Lucene is likewise 8.9.0. Its exact search is slightly slower than that of ES 7.14, averaging 90 ms.
- OpenSearch k-NN with HNSW: this scheme is excellent. With 1 million 256-dimensional vectors, the average latency reaches 10 ms, and a third-party test reports recall above 95%. For business with large data volumes that is sensitive to both latency and accuracy, HNSW is the best scheme. Its drawback is equally obvious: it consumes a lot of memory.
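The recall figures quoted above (85% for LSH, 95%+ for HNSW) compare an approximate result list against the exact one. A minimal sketch of that metric, assuming each search returns document IDs ordered best first; the function name and the sample IDs are made up for illustration:

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the true top-k neighbours that the approximate search also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


# Illustrative IDs only: the approximate search missed two of the ten true neighbours.
exact = [7, 3, 9, 1, 4, 8, 2, 6, 5, 0]
approx = [7, 3, 9, 1, 4, 8, 2, 6, 11, 12]
print(recall_at_k(exact, approx, 10))  # 0.8
```

In practice the value is averaged over many query vectors, with the exact results taken from a brute-force search over the same index.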

- Elasticsearch + Elastiknn + exact meets current needs, but the plug-in has essentially stopped receiving new version updates.
- OpenSearch + k-NN + exact is slightly inferior to Elasticsearch + exact, and because OpenSearch lags behind Elasticsearch in both version iteration and community activity, it is not considered here.
- OpenSearch + k-NN + HNSW is better suited to business scenarios with large data volumes, which is not the case here.
- Elasticsearch + exact has the advantage that later version upgrades benefit directly from community iteration, especially version 8.0, which will bring more k-NN related optimizations and features.
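The memory cost of the HNSW option weighed above can be estimated with the rule of thumb given in the OpenSearch k-NN documentation: roughly 1.1 * (4 * dimension + 8 * M) bytes per vector. The sketch below applies it to 1 million 256-dimensional vectors; the value M = 16 is an assumption, since the tests above do not state the HNSW parameters used.

```python
def hnsw_memory_bytes(num_vectors, dimension, m=16):
    """Rule-of-thumb native memory for an HNSW graph, per the OpenSearch k-NN docs.

    `m` is the HNSW M parameter (links per node); 16 is an assumed value.
    """
    return 1.1 * (4 * dimension + 8 * m) * num_vectors


est = hnsw_memory_bytes(1_000_000, 256, m=16)
print(f"{est / 1e9:.2f} GB")  # 1.27 GB
```

This covers only the graph and raw vectors held in memory for search, so actual node sizing needs additional headroom for the JVM heap and the operating system.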
proposal
link
- Alibaba Cloud plug-in: https://help.aliyun.com/document_detail/145062.html
- Baidu Cloud plug-in: https://cloud.baidu.com/doc/BES/s/Rke3o8qos
- Third-party plug-in: https://elastiknn.com/
- Official support: https://www.elastic.co/guide/en/elasticsearch/reference/master/dense-vector.html
- OpenSearch: https://opensearch.org/docs/latest/search-plugins/knn/index/