An index is similar to a database in a relational database system. It is the general term for the unit in which data is stored. That is only the application's point of view, however: inside ES, each index is effectively a namespace that points to one or more shards. For convenience, the word index is also used to mean storing data and making it searchable.
Elasticsearch is built on Lucene, which in turn is built on the inverted index, a data structure designed to serve search well. An inverted index extracts the unique tokens from the data and then records which documents contain each token. See http://en.wikipedia.org/wiki/Inverted_index to learn more.
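
To make the token-to-documents mapping concrete, here is a minimal sketch of an inverted index in Python. The whitespace tokenizer and the names `tokenize` and `build_inverted_index` are assumptions made for illustration. Lucene's real analyzers and on-disk format are far more elaborate.

```python
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase tokens on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map each unique token to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}

# Hypothetical sample documents.
docs = {
    1: "quick brown fox",
    2: "lazy brown dog",
    3: "quick dog",
}
index = build_inverted_index(docs)
print(index["brown"])  # [1, 2]
print(index["quick"])  # [1, 3]
```

Looking up a token is then a single dictionary access, which is why this structure suits search better than scanning every document.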
A shard is an instance of Lucene, and is a fully functional search engine in its own right. An index can consist of a single shard, but in most cases it consists of multiple shards, which allows the index to keep growing and to be spread across different machines.
A primary shard is the entry point for documents: each document is written to its primary shard first.
A replica shard is a copy of a primary shard. It provides failover when the node holding the primary shard fails. It also increases read throughput, because searches can be served by either the primary or its replicas.
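
As a rough illustration of how primaries and replicas add up, the sketch below builds a settings body of the kind that can be sent when creating an index, and counts the resulting shard copies. `number_of_shards` and `number_of_replicas` are index settings in Elasticsearch. The values 3 and 1 and the helper `total_shard_copies` are assumptions chosen for the example.

```python
import json

# Hypothetical values chosen for illustration.
number_of_shards = 3     # primary shards in the index
number_of_replicas = 1   # replica copies kept for each primary

settings_body = {
    "settings": {
        "index": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }
}

def total_shard_copies(primaries, replicas):
    """Each primary exists once, plus one extra copy per configured replica."""
    return primaries * (1 + replicas)

print(json.dumps(settings_body, indent=2))
print(total_shard_copies(number_of_shards, number_of_replicas))  # 6
```

With these numbers the cluster holds six shards in total: three primaries and three replicas.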
A shard contains multiple segments, and each segment is an inverted index. Within a shard, every search queries each segment in turn, then merges the per-segment results into the shard's final result and returns it.
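
The per-segment search and merge can be sketched as follows. Each segment is modeled as a plain dictionary from token to document IDs. The names `search_segment` and `search_shard` are hypothetical, and real Lucene also scores and ranks hits, which this sketch leaves out.

```python
def search_segment(segment, token):
    """Look up one token in a single segment (a dict: token -> list of doc IDs)."""
    return segment.get(token, [])

def search_shard(segments, token):
    """Query every segment in turn and merge the hits into one shard-level result."""
    hits = set()
    for segment in segments:
        hits.update(search_segment(segment, token))
    return sorted(hits)

# Hypothetical shard made of three small segments.
segments = [
    {"brown": [1, 2], "quick": [1]},
    {"quick": [3], "dog": [3]},
    {"brown": [4]},
]
print(search_shard(segments, "brown"))  # [1, 2, 4]
```

Because every segment is visited, the cost of a search grows with the number of segments in the shard.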
Each time new documents are written, Elasticsearch collects them in memory (and, for safety, in the transaction log). Every 1s by default it writes them out as a new segment and refreshes so that the segment becomes searchable.
This makes the data in the new segment visible to search, but the segment has not yet been synced to disk, so the data could still be lost.
Then, at intervals, Elasticsearch performs a flush. This means it performs a commit that syncs the segments to disk and then clears the transaction log, which is no longer needed once its contents have been written to disk.
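
The difference between a refresh and a flush is easier to see in a toy model. The `ToyShard` class below is purely illustrative and is not Elasticsearch code: the in-memory buffer, the transaction log and the `synced` flag are simplifications, and having `flush` write out any buffered documents first is an assumption of this sketch.

```python
class ToyShard:
    """A toy model of one shard's write path: buffer, transaction log, segments."""

    def __init__(self):
        self.buffer = []     # newly written documents, not yet searchable
        self.translog = []   # transaction log kept for recovery
        self.segments = []   # searchable segments: {"docs": [...], "synced": bool}

    def index(self, doc):
        """Accept a document: hold it in memory and record it in the translog."""
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        """Turn buffered documents into a new searchable, but unsynced, segment."""
        if self.buffer:
            self.segments.append({"docs": list(self.buffer), "synced": False})
            self.buffer.clear()

    def flush(self):
        """Commit: sync all segments to disk, then clear the translog."""
        self.refresh()  # simplification: write out anything still buffered
        for segment in self.segments:
            segment["synced"] = True
        self.translog.clear()

    def searchable_docs(self):
        """Return every document visible to search."""
        return [doc for segment in self.segments for doc in segment["docs"]]

shard = ToyShard()
shard.index("doc-1")
print(shard.searchable_docs())  # []  (only in the buffer and translog so far)
shard.refresh()
print(shard.searchable_docs())  # ['doc-1']  (searchable, but not yet synced)
shard.flush()
print(shard.translog)           # []  (cleared after the commit)
```

Between the refresh and the flush, the translog is the only durable record of `doc-1`, which is why it is cleared only after the commit.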
The more segments there are, the longer each search takes. Elasticsearch therefore runs a merge operation in the background, combining many small segments of similar size into one larger segment, after which the old segments are deleted. Whenever a large number of small segments accumulates again, the process repeats.
A segment is the smallest unit of storage and is immutable. When you update a document, Elasticsearch actually marks the old document as deleted and indexes the new version as a new document. The merge process also purges the documents marked as deleted from the segments.
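
The update-as-delete-plus-add behavior and the cleanup done by a merge can be sketched like this. Segments are modeled as dictionaries from an internal ID to a `(key, body)` pair and are never modified once created. The function names and the separate `deleted` set are assumptions for illustration, not Lucene's actual data structures.

```python
def update_document(segments, deleted, doc_key, new_body, next_id):
    """Mark live copies of doc_key as deleted, then add the new version in a new segment."""
    for segment in segments:
        for internal_id, (key, _body) in segment.items():
            if key == doc_key:
                deleted.add(internal_id)  # existing segments stay untouched
    segments.append({next_id: (doc_key, new_body)})

def merge_segments(segments, deleted):
    """Combine segments into one larger segment, dropping entries marked as deleted."""
    merged = {}
    for segment in segments:
        for internal_id, entry in segment.items():
            if internal_id not in deleted:
                merged[internal_id] = entry
    return merged

# Hypothetical shard state: two small segments, nothing deleted yet.
segments = [{0: ("a", "v1"), 1: ("b", "v1")}, {2: ("c", "v1")}]
deleted = set()

update_document(segments, deleted, "a", "v2", next_id=3)
print(deleted)  # {0}

merged = merge_segments(segments, deleted)
print(merged)   # {1: ('b', 'v1'), 2: ('c', 'v1'), 3: ('a', 'v2')}
```

Until the merge runs, the old version of `a` still occupies space in its original segment and is only filtered out through the deletion marker.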
It's easier to understand when you look at the picture above.
This article was written by Chakhsu Lau and is licensed under the Creative Commons Attribution 4.0 International License.
Unless marked as reprinted or attributed to a source, all articles are original or translated by the author. Please credit the author when reprinting.
Last edited on Jan 15, 2019 at 11:43 PM