Understanding the Elasticsearch data persistence model




An Elasticsearch index is similar to a database in a relational system: it is the general term for a unit of data storage. But that is only the application's view. Internally, each index is implemented as a namespace that points to one or more shards.

The term index was chosen because it better expresses what this unit is for: storing and searching data.

Inverted index

Elasticsearch is built on Lucene, and Lucene is built on the inverted index, a structure well suited to search. An inverted index extracts the unique terms (or tokens) from the data and then records, for each term, which documents contain it.

See http://en.wikipedia.org/wiki/Inverted_index to learn more.
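The idea can be sketched in a few lines of Python (a toy tokenizer and made-up documents, not Lucene's actual implementation): for each unique term, record the IDs of the documents that contain it.

```python
# Minimal sketch of an inverted index: term -> sorted list of doc ids.
from collections import defaultdict

def build_inverted_index(docs):
    """docs: mapping of doc_id -> text."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # trivial whitespace tokenizer
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "Elasticsearch is built on Lucene",
    2: "Lucene uses an inverted index",
}
index = build_inverted_index(docs)
print(index["lucene"])  # [1, 2] - both documents contain "lucene"
```

Looking up a term is now a dictionary access rather than a scan over every document, which is what makes the structure attractive for search.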


A shard is a single instance of Lucene, a fully functional search engine in its own right. An index may consist of a single shard, but in most cases it consists of several, which lets the index keep growing and be distributed across different machines.

A primary shard is the main entry point for documents; a replica shard is a copy of a primary shard. Replicas provide failover when the node holding the primary shard goes down, and they also increase read (search) throughput.
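For illustration, the number of primary shards and replicas is set in the index settings when the index is created (the index name `my_index` here is hypothetical); the body of a `PUT /my_index` request would look like:

```json
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
```

With these settings the index has three primary shards, each with one replica, for six shards in total.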


Each shard contains multiple segments, and each segment is an inverted index. Within a shard, a search queries each segment in turn, merges the per-segment results into the shard's result, and returns it.
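That per-shard search loop can be sketched as follows (the segment contents and function name are invented for illustration):

```python
# Sketch: a shard queries each of its segments in turn and merges
# the per-segment hits into a single shard-level result.
segments = [
    {"quick": [1, 4], "fox": [4]},  # segment 1: term -> doc ids
    {"quick": [7], "lazy": [9]},    # segment 2
]

def search_shard(segments, term):
    hits = []
    for segment in segments:        # search every segment in turn
        hits.extend(segment.get(term, []))
    return sorted(hits)             # merge into the shard's final result

print(search_shard(segments, "quick"))  # [1, 4, 7]
```

This is also why the number of segments matters: every query pays a cost per segment before the results are merged.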

Every time new documents are written, Elasticsearch collects them in an in-memory buffer (and, for durability, in the transaction log), then writes the buffered documents out as a new segment roughly every second and performs a refresh to make them searchable.

This makes the data in the new segment visible to search, but the segment has not yet been synced to disk, so the data could still be lost.

Then, from time to time, Elasticsearch performs a flush. This means a commit is executed to sync the segments to disk, after which the transaction log is cleared: its contents are now safely on disk and no longer needed.
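The buffer/refresh/flush lifecycle described above can be modeled with a toy class (all names here are invented for the sketch; this is not the actual Elasticsearch code path):

```python
# Toy model of the write path: new documents go to an in-memory buffer
# and the transaction log; refresh turns the buffer into a searchable
# segment; flush commits segments to disk and clears the translog.
class Shard:
    def __init__(self):
        self.buffer = []    # in memory, not yet searchable
        self.translog = []  # for durability until the next flush
        self.segments = []  # searchable segments

    def write(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):      # runs every ~1s by default
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):        # commit: sync segments, clear the translog
        self.refresh()
        self.translog.clear()

shard = Shard()
shard.write("doc1")
shard.refresh()             # "doc1" becomes searchable
shard.write("doc2")
shard.flush()               # everything durable, translog now empty
print(len(shard.segments), len(shard.translog))  # 2 0
```

The key distinction the model captures: refresh governs *visibility* (searchability), while flush governs *durability* (safe on disk).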

The more segments there are, the longer each search takes. Elasticsearch therefore performs merge operations in the background, combining many small segments of similar size into a larger segment. The old segments are then deleted, and as small segments accumulate again, the process repeats.

A segment is immutable, the smallest indivisible unit. When you update a document, the old version is merely marked as deleted and the new version is indexed as a new document. The merge process also purges the documents marked as deleted from the segments.
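A merge can be sketched in a few lines (segments as lists of doc ids, a made-up `merge_segments` helper): small segments are concatenated into one larger segment, and documents marked as deleted are dropped along the way.

```python
# Sketch of a segment merge: combine small segments into one larger
# segment, skipping any document that was marked as deleted.
def merge_segments(segments, deleted):
    """segments: list of lists of doc ids; deleted: set of doc ids."""
    return [doc for seg in segments for doc in seg if doc not in deleted]

small = [[1, 2], [3], [4, 5]]
# doc 3 was "updated": the old copy is marked deleted and the new
# version is indexed as doc 6 in a fresh segment
merged = merge_segments(small + [[6]], deleted={3})
print(merged)  # [1, 2, 4, 5, 6]
```

After the merge, only the single large segment remains, and the deleted document no longer occupies space or shows up in searches.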


(The original article includes a diagram of this write/refresh/flush/merge lifecycle, which makes it easier to follow.)
