Description
Elasticsearch version: 7.0.0
Plugins installed: []
JVM version: OpenJDK 1.8.0_191
OS version: Ubuntu 16.04 (or Elastic Cloud)
Description of the problem including expected versus actual behavior:
When setting _source.enabled: false in the index mapping, the _source field should not be stored.
In 7.0.0, when two indices have identical data and mappings (except that one has _source.enabled: false), the two indices end up almost exactly the same size. This is not the expected behavior.
In 6.7.1, with two indices that have identical data and mappings (except that one has _source.enabled: false), the index with _source.enabled: false is roughly half the size of the one with _source enabled. This is the expected behavior.
Steps to reproduce:
Overview:
- Create two Elasticsearch clusters: version 6.7.1 and version 7.0.0.
- Create two index templates with identical mappings, except that the second template uses _source.enabled: false. Put both index templates in both clusters.
- Load data into the two indices on both clusters.
- Force merge the indices to a single segment.
- Compare the "Storage Size" of the two indices in Kibana for each cluster (/app/kibana#/management/elasticsearch/index_management/indices).
More detailed:
Create the following templates and ingest pipeline in the 7.0.0 cluster:
PUT _template/logs
{
  "index_patterns": ["logs"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "agent": { "type": "text" },
      "auth": { "type": "keyword" },
      "bytes": { "type": "long" },
      "clientip": { "type": "ip" },
      "httpversion": { "type": "double" },
      "ident": { "type": "keyword" },
      "message": { "type": "text" },
      "referrer": { "type": "keyword" },
      "request": { "type": "keyword" },
      "response": { "type": "long" },
      "verb": { "type": "keyword" }
    }
  }
}
PUT _template/logs-nosource
{
  "index_patterns": ["logs-nosource"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "@timestamp": { "type": "date" },
      "agent": { "type": "text" },
      "auth": { "type": "keyword" },
      "bytes": { "type": "long" },
      "clientip": { "type": "ip" },
      "httpversion": { "type": "double" },
      "ident": { "type": "keyword" },
      "message": { "type": "text" },
      "referrer": { "type": "keyword" },
      "request": { "type": "keyword" },
      "response": { "type": "long" },
      "verb": { "type": "keyword" }
    }
  }
}
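Because the two templates are meant to differ only in the _source setting, it can help to generate both from one shared field list so they cannot drift apart. The Python sketch below uses hypothetical helper names (build_template and render are introduced here, not part of any Elasticsearch client) and assumes the field list from the templates above. Passing doc_type="_doc" produces the 6.7.1 layout, which nests the mapping under a type name.

```python
import json

# Field mappings shared by both templates (taken from the templates above).
PROPERTIES = {
    "@timestamp": {"type": "date"},
    "agent": {"type": "text"},
    "auth": {"type": "keyword"},
    "bytes": {"type": "long"},
    "clientip": {"type": "ip"},
    "httpversion": {"type": "double"},
    "ident": {"type": "keyword"},
    "message": {"type": "text"},
    "referrer": {"type": "keyword"},
    "request": {"type": "keyword"},
    "response": {"type": "long"},
    "verb": {"type": "keyword"},
}


def build_template(index_pattern, source_enabled, doc_type=None):
    """Return an index template body for `index_pattern`.

    Pass doc_type="_doc" for the 6.7.1 layout; leave it as None for 7.0.0.
    """
    mapping = {}
    if not source_enabled:
        # The only intended difference between the two templates.
        mapping["_source"] = {"enabled": False}
    mapping["properties"] = PROPERTIES
    return {
        "index_patterns": [index_pattern],
        "settings": {"number_of_shards": 1, "number_of_replicas": 0},
        # 7.0.0 takes a typeless mapping; 6.7.1 expects it under a type name.
        "mappings": {doc_type: mapping} if doc_type else mapping,
    }


def render(template):
    """Pretty-print a template body for pasting into the Kibana console."""
    return json.dumps(template, indent=2)
```

For example, render(build_template("logs-nosource", False)) yields a body to paste after PUT _template/logs-nosource on 7.0.0; adding doc_type="_doc" gives the 6.7.1 equivalent.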
PUT _ingest/pipeline/logs
{
  "description": "Ingest pipeline for logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{COMBINEDAPACHELOG}"]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "formats": ["dd/MMM/yyyy:HH:mm:ss XX"]
      }
    },
    {
      "remove": {
        "field": "timestamp"
      }
    }
  ]
}
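For each document, the pipeline runs three processors: grok splits the raw message into fields, date parses the extracted timestamp into @timestamp, and remove drops the intermediate timestamp field, while message itself is kept. The Python sketch below imitates that flow so the resulting fields can be checked against the template mapping. The regular expression is a simplified approximation of %{COMBINEDAPACHELOG}, not the real grok definition, and simulate_pipeline is a hypothetical name. (The 6.7.1 pipeline in this report uses ZZ instead of XX for the offset, presumably because 7.0 date formats follow java.time patterns while 6.x follows Joda patterns.)

```python
import re
from datetime import datetime

# Simplified stand-in for %{COMBINEDAPACHELOG}; an approximation only
# (for example it strips the quotes around referrer and agent).
COMBINED_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def simulate_pipeline(message):
    """Mimic the grok -> date -> remove processors on one log line."""
    match = COMBINED_LOG.match(message)
    if match is None:
        raise ValueError("line does not look like a combined log entry")
    # grok: add the extracted fields next to the original message.
    doc = {"message": message, **match.groupdict()}
    # date: parse e.g. "17/May/2015:08:05:32 +0000" (%b assumes an English locale).
    parsed = datetime.strptime(doc["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    doc["@timestamp"] = parsed.isoformat()
    # remove: drop the intermediate timestamp field.
    del doc["timestamp"]
    return doc
```

Running it on a made-up line such as 93.180.71.3 - - [17/May/2015:08:05:32 +0000] "GET /downloads/product_1 HTTP/1.1" 304 0 "-" "Debian APT-HTTP/1.3" yields one value for each field in the template.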
Create the following templates and ingest pipeline in the 6.7.1 cluster:
PUT _template/logs
{
  "index_patterns": ["logs"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_doc": {
      "properties": {
        "@timestamp": { "type": "date" },
        "agent": { "type": "text" },
        "auth": { "type": "keyword" },
        "bytes": { "type": "long" },
        "clientip": { "type": "ip" },
        "httpversion": { "type": "double" },
        "ident": { "type": "keyword" },
        "message": { "type": "text" },
        "referrer": { "type": "keyword" },
        "request": { "type": "keyword" },
        "response": { "type": "long" },
        "verb": { "type": "keyword" }
      }
    }
  }
}
PUT _template/logs-nosource
{
  "index_patterns": ["logs-nosource"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_doc": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "@timestamp": { "type": "date" },
        "agent": { "type": "text" },
        "auth": { "type": "keyword" },
        "bytes": { "type": "long" },
        "clientip": { "type": "ip" },
        "httpversion": { "type": "double" },
        "ident": { "type": "keyword" },
        "message": { "type": "text" },
        "referrer": { "type": "keyword" },
        "request": { "type": "keyword" },
        "response": { "type": "long" },
        "verb": { "type": "keyword" }
      }
    }
  }
}
PUT _ingest/pipeline/logs
{
  "description": "Ingest pipeline for logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{COMBINEDAPACHELOG}"]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "formats": ["dd/MMM/yyyy:HH:mm:ss ZZ"]
      }
    },
    {
      "remove": {
        "field": "timestamp"
      }
    }
  ]
}
Download and unzip the data from https://storage.googleapis.com/elasticsearch-sizing-workshop/data/nginx.zip, then load the nginx.log file into the "logs" and "logs-nosource" indices on both clusters.
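The report does not say how nginx.log was loaded. One possible approach, sketched here under assumptions, is to send each raw line as {"message": ...} through the _bulk API with ?pipeline=logs so that the ingest pipeline does the parsing. The sketch assumes an unsecured cluster reachable over plain HTTP (Elastic Cloud would additionally need credentials, not shown), that both indices go through the single "logs" pipeline, and that the 6.7.1 cluster takes the _doc type in the bulk URL. bulk_body, bulk_url and load_file are hypothetical helper names.

```python
import json
import urllib.request


def bulk_body(lines):
    """Build an NDJSON _bulk body that indexes each raw line as {"message": line}."""
    parts = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        parts.append(json.dumps({"index": {}}))
        parts.append(json.dumps({"message": line}))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(parts) + "\n" if parts else ""


def bulk_url(host, index, pipeline="logs", doc_type=None):
    """7.0.0: /<index>/_bulk; 6.7.1: pass doc_type="_doc" for /<index>/_doc/_bulk."""
    path = f"/{index}/{doc_type}/_bulk" if doc_type else f"/{index}/_bulk"
    return f"{host}{path}?pipeline={pipeline}"


def _send(body, url):
    """POST one bulk body and fail loudly if any item was rejected."""
    if not body:
        return
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    if result.get("errors"):
        raise RuntimeError("some bulk items failed")


def load_file(path, host, index, doc_type=None, batch_size=5000):
    """Stream the log file into `index` in batches of `batch_size` lines."""
    url = bulk_url(host, index, doc_type=doc_type)
    # errors="replace" keeps the load going if a line is not valid UTF-8.
    with open(path, encoding="utf-8", errors="replace") as handle:
        batch = []
        for line in handle:
            batch.append(line)
            if len(batch) == batch_size:
                _send(bulk_body(batch), url)
                batch = []
        if batch:
            _send(bulk_body(batch), url)
```

For example, load_file("nginx.log", "http://localhost:9200", "logs-nosource") on the 7.0.0 cluster, or the same call with doc_type="_doc" on the 6.7.1 cluster.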
Force merge the indices to a single segment.
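The force merge can be issued with the force merge API (POST /<index>/_forcemerge?max_num_segments=1). A minimal Python sketch, assuming an unsecured cluster at a caller-supplied host URL; forcemerge_url and force_merge are hypothetical helper names:

```python
import json
import urllib.request


def forcemerge_url(host, index, max_num_segments=1):
    """URL for the force merge API, e.g. /logs/_forcemerge?max_num_segments=1."""
    return f"{host}/{index}/_forcemerge?max_num_segments={max_num_segments}"


def force_merge(host, index):
    """Merge `index` down to a single segment and return the JSON response."""
    # No request body is needed; the call returns once the merge has finished.
    request = urllib.request.Request(forcemerge_url(host, index), method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling force_merge(host, "logs") and force_merge(host, "logs-nosource") on each cluster before comparing sizes keeps segment count from skewing the comparison.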
Compare the size of the indices in Kibana. Elasticsearch 7.0.0 shows both indices as roughly the same size, whereas Elasticsearch 6.7.1 shows the "logs-nosource" index as roughly half the size of the "logs" index.
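The same comparison can be scripted with the _cat/indices API instead of Kibana. With number_of_replicas set to 0, its store.size column should correspond to the storage size Kibana shows, though that correspondence is an assumption here. Going by the report, the ratio below would come out near 0.5 on 6.7.1 and near 1.0 on 7.0.0. cat_indices_url, fetch_rows and size_ratio are hypothetical helper names, and the host is assumed to be an unsecured cluster.

```python
import json
import urllib.request


def cat_indices_url(host, pattern="logs*"):
    """_cat/indices as JSON; bytes=b reports store.size as a plain byte count."""
    return f"{host}/_cat/indices/{pattern}?format=json&bytes=b&h=index,store.size"


def fetch_rows(host, pattern="logs*"):
    """Fetch one row per matching index, e.g. {"index": "logs", "store.size": "..."}."""
    with urllib.request.urlopen(cat_indices_url(host, pattern)) as response:
        return json.load(response)


def size_ratio(rows, smaller="logs-nosource", larger="logs"):
    """Return size(smaller) / size(larger) from _cat/indices JSON rows."""
    sizes = {row["index"]: int(row["store.size"]) for row in rows}
    return sizes[smaller] / sizes[larger]
```

For example, size_ratio(fetch_rows("http://localhost:9200")) run once against each cluster.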