“Hot-Warm” Architecture in Elasticsearch 5.x

When using elasticsearch for larger time data analytics use cases, we recommend using time-based indices and a tiered architecture with 3 different types of nodes (Master, Hot-Node and Warm-Node), which we refer to as the "Hot-Warm" architecture. Each node has their own characteristics, which are described below.

Master nodes

We recommend running 3 dedicated master nodes per cluster to provide the greatest resilience. When employing these, you should also have the discovery.zen.minimum_master_nodes setting at 2, which prevents the possibility of a "split-brain" scenario. Utilizing dedicated master nodes, responsible only for handling cluster management and state enhances overall stability. Because they do not contain data nor participate in search and indexing operations they do not experience the same demands on the JVM that may occur during heavy indexing or long, expensive searches. And therefore are not as likely to be affected by long garbage collection pauses. For this reason they can be provisioned with CPU, RAM and Disk configurations much lower than those that would be required for a data node.

Hot nodes

This specialized data node performs all indexing within the cluster. They also hold the most recent indices since these generally tend to be queried most frequently. As indexing is a CPU and IO intensive operation, these servers need to be powerful and backed by attached SSD storage. We recommend running a minimum of 3 Hot nodes for high availability. Depending on the amount of recent data you wish to collect and query though, you may well need to increase this number to achieve your performance goals.

Warm nodes

This type of data nodes are designed to handle a large amount of read-only indices that are not as likely to be queried frequently. As these indices are read-only, warm nodes tend to utilize large attached disks (usually spinning disks) instead of SSDs. As with hot nodes, we recommend a minimum of 3 Warm nodes for high availability. And as before, with the caveat that larger amounts of data may require additional nodes to meet performance requirements. Also note that CPU and memory configurations will often need to mirror those of your hot nodes. This can only be determined by testing with queries similar to what you would experience in a production situation.

The Elasticsearch cluster needs to know which servers contain the hot nodes and which server contain the warm nodes. This can be achieved by assigning arbitrary attributes to each server.

For instance, you could tag the node with node.attr.box_type: hot in elasticsearch.yml, or you could start a node using ./bin/elasticsearch -Enode.attr.box_type=hot

The nodes on the warm zone are "tagged" with node.attr.box_type: warm in elasticsearch.yml or you could start a node using ./bin/elasticsearch -Enode.attr.box_type=warm

The box_type attribute is completely arbitrary and you could name it whatever you like. These arbitrary values will be used to tell Elasticsearch where to allocate an index.

We can ensure today's index is on the hot nodes that utilize the SSD's by creating it with the following settings:

PUT /logs_2016-12-26
{
  "settings": {
    "index.routing.allocation.require.box_type": "hot"
  }
}

After few days if the index no longer needs to be on the most performant hardware, we can move it to the nodes tagged as warm by updating its index settings:

PUT /logs_2016-12-26/_settings 
{ 
  "settings": { 
    "index.routing.allocation.require.box_type": "warm"
  } 
}

Now how can we achieve that using logstash or beats:

If the index template is being managed at the logstash or beats level the index templates should be updated to include allocation filtering. The "index.routing.allocation.require.box_type" : "hot" setting will cause any new indices to be created on the hot nodes.

Example:

{
  "template" : "indexname-*",
  "version" : 50001,
  "settings" : {
             "index.routing.allocation.require.box_type": "hot"
 ...

Another strategy is to add a generic template for any index in the cluster, "template": "*" that creates new indices in hot nodes.

Example:

{
  "template" : "*",
  "version" : 50001,
  "settings" : {
           "index.routing.allocation.require.box_type": "hot"
 ...

When you have determined an index is not undergoing writes and is not being searched frequently it can be migrated from the hot nodes to the warm nodes. This can be accomplished by simply updating its index settings: "index.routing.allocation.require.box_type" : "warm". Elasticsearch will automatically migrate the indices over to the warm nodes.

Finally we can also enable better compression on all the warm data nodes by setting index.codec: best_compression in the elasticsearch.yml. When data is moved to warm nodes we can call the _forcemerge API in order to merge segments: not only will it save memory, disk-space and file handles by having fewer segments, but it will also have the side-effect of rewriting the index using this new best_compression codec.

It would be a bad idea to force merge the index while it was still allocated to the strong boxes, as the optimization process will swamp the I/O on those nodes and impact the indexing speed of today's logs. But the medium boxes aren't doing much, so it is safe to force merge them.

Now that we have seen how to manually change the shard allocation of an index, let's look at how to use one of our tools called Curator to automate the process.

In the example below we are using curator 4.2 to move the indexes from hot nodes to warm nodes after 3 days:

actions:
  1:
    action: allocation
    description: "Apply shard allocation filtering rules to the specified indices"
    options:
      key: box_type
      value: warm
      allocation_type: require
      wait_for_completion: true
      timeout_override:
      continue_if_exception: false
      disable_action: false
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 3

Finally we can use curator to force merge the index. Ensure you wait long enough for the reallocation to finish before running optimize. You can do that by setting wait_for_completion in action 1 or change the unit_count to select indices older than 4 days in the action 2, so they've had a chance to fully migrate before a force merge.

  2:
    action: forcemerge
    description: "Perform a forceMerge on selected indices to 'max_num_segments' per shard"
    options:
      max_num_segments: 1
      delay:
      timeout_override: 21600 
      continue_if_exception: false
      disable_action: false
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 3

Note timeout_override should be increased the default is 21600 seconds, but it may go faster or slower, depending on your setup