Indices with "_source.enabled: false" same size as indices with "_source.enabled: true" #41628

Closed
@davemoore-

Elasticsearch version: 7.0.0

Plugins installed: []

JVM version: OpenJDK 1.8.0_191

OS version: Ubuntu 16.04 (or Elastic Cloud)

Description of the problem including expected versus actual behavior:

When setting _source.enabled: false in the index mapping, the _source should not be stored.

In 7.0.0, when two indices have identical data and mappings (except for one having _source.enabled: false), the indices will be almost exactly the same size. This isn't the expected behavior.

In 6.7.1, when two indices have identical data and mappings (except for one having _source.enabled: false), the index with _source.enabled: false is roughly half the size of the one with _source enabled. This is the expected behavior.

Steps to reproduce:

Overview:

  1. Create two Elasticsearch clusters: version 6.7.1 and version 7.0.0.

  2. Create two index templates with identical mappings, but let the second template use _source.enabled: false. Put these two index templates in both clusters.

  3. Load data into the two indices on both clusters.

  4. Force merge the indices to a single segment.

  5. Compare the "Storage Size" of the two indices in Kibana for each cluster: /app/kibana#/management/elasticsearch/index_management/indices

More detailed:

Create the following templates and pipelines in the 7.0.0 cluster:

PUT _template/logs
{
  "index_patterns": ["logs"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "agent": {
        "type": "text"
      },
      "auth": {
        "type": "keyword"
      },
      "bytes": {
        "type": "long"
      },
      "clientip": {
        "type": "ip"
      },
      "httpversion": {
        "type": "double"
      },
      "ident": {
        "type": "keyword"
      },
      "message": {
        "type": "text"
      },
      "referrer": {
        "type": "keyword"
      },
      "request": {
        "type": "keyword"
      },
      "response": {
        "type": "long"
      },
      "verb": {
        "type": "keyword"
      }
    }
  }
}

PUT _template/logs-nosource
{
  "index_patterns": ["logs-nosource"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "agent": {
        "type": "text"
      },
      "auth": {
        "type": "keyword"
      },
      "bytes": {
        "type": "long"
      },
      "clientip": {
        "type": "ip"
      },
      "httpversion": {
        "type": "double"
      },
      "ident": {
        "type": "keyword"
      },
      "message": {
        "type": "text"
      },
      "referrer": {
        "type": "keyword"
      },
      "request": {
        "type": "keyword"
      },
      "response": {
        "type": "long"
      },
      "verb": {
        "type": "keyword"
      }
    }
  }
}

PUT _ingest/pipeline/logs
{
  "description": "Ingest pipeline for logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{COMBINEDAPACHELOG}"
        ]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "formats": [
          "dd/MMM/yyyy:HH:mm:ss XX"
        ]
      }
    },
    {
      "remove": {
        "field": "timestamp"
      }
    }
  ]
}

Create the following templates and pipeline in the 6.7.1 cluster:

PUT _template/logs
{
  "index_patterns": ["logs"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_doc": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "agent": {
          "type": "text"
        },
        "auth": {
          "type": "keyword"
        },
        "bytes": {
          "type": "long"
        },
        "clientip": {
          "type": "ip"
        },
        "httpversion": {
          "type": "double"
        },
        "ident": {
          "type": "keyword"
        },
        "message": {
          "type": "text"
        },
        "referrer": {
          "type": "keyword"
        },
        "request": {
          "type": "keyword"
        },
        "response": {
          "type": "long"
        },
        "verb": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT _template/logs-nosource
{
  "index_patterns": ["logs-nosource"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_doc": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "agent": {
          "type": "text"
        },
        "auth": {
          "type": "keyword"
        },
        "bytes": {
          "type": "long"
        },
        "clientip": {
          "type": "ip"
        },
        "httpversion": {
          "type": "double"
        },
        "ident": {
          "type": "keyword"
        },
        "message": {
          "type": "text"
        },
        "referrer": {
          "type": "keyword"
        },
        "request": {
          "type": "keyword"
        },
        "response": {
          "type": "long"
        },
        "verb": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT _ingest/pipeline/logs
{
  "description": "Ingest pipeline for logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{COMBINEDAPACHELOG}"
        ]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "formats": [
          "dd/MMM/yyyy:HH:mm:ss ZZ"
        ]
      }
    },
    {
      "remove": {
        "field": "timestamp"
      }
    }
  ]
}

Download and unzip the data from https://storage.googleapis.com/elasticsearch-sizing-workshop/data/nginx.zip and then load the nginx.log file into the "logs" and "logs-nosource" indices on both clusters.
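
The loading method isn't specified above, so here is one possible way to do it: a minimal sketch that sends each line of nginx.log as a "message" field through the "logs" ingest pipeline using the _bulk API. ES_URL, the helper names, and the batch size are assumptions for illustration. On the 6.7.1 cluster the bulk action may need a mapping type, so doc_type="_doc" is offered as an option there.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to each cluster's address


def bulk_body(lines, index, doc_type=None):
    """Build an NDJSON _bulk body that indexes each line as {"message": line}."""
    action = {"index": {"_index": index}}
    if doc_type is not None:  # e.g. "_doc" on a 6.x cluster
        action["index"]["_type"] = doc_type
    parts = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        parts.append(json.dumps(action))
        parts.append(json.dumps({"message": line}))
    return "\n".join(parts) + "\n"  # _bulk requires a trailing newline


def post_bulk(body, pipeline="logs", base=ES_URL):
    """POST one bulk body, routing every document through the ingest pipeline."""
    req = urllib.request.Request(
        f"{base}/_bulk?pipeline={pipeline}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    if result.get("errors"):
        raise RuntimeError("some documents were rejected; inspect the bulk response")
    return result


def load_file(path, index, doc_type=None, batch_size=5000):
    """Stream a log file into `index` in batches of `batch_size` lines."""
    with open(path, encoding="utf-8", errors="replace") as f:
        batch = []
        for line in f:
            if line.strip():
                batch.append(line)
            if len(batch) == batch_size:
                post_bulk(bulk_body(batch, index, doc_type))
                batch = []
        if batch:
            post_bulk(bulk_body(batch, index, doc_type))
```

Calling load_file("nginx.log", "logs") and then load_file("nginx.log", "logs-nosource") against each cluster should give both indices the same documents, so that any size difference comes from the _source setting alone.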

Force merge the indices to a single segment.
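
For reference, the force merge can be issued with the _forcemerge API and max_num_segments=1. A small sketch follows; ES_URL and the function names are assumptions for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to each cluster's address


def forcemerge_url(indices, base=ES_URL):
    """URL that force merges the given indices down to a single segment."""
    return f"{base}/{','.join(indices)}/_forcemerge?max_num_segments=1"


def force_merge(indices=("logs", "logs-nosource"), base=ES_URL):
    """POST the force merge request and return the parsed JSON response."""
    req = urllib.request.Request(forcemerge_url(indices, base), data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Merging both indices to one segment before measuring removes segment-count differences as a variable in the size comparison.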

Compare the size of the indices in Kibana. Elasticsearch 7.0.0 shows both indices as being roughly the same size, whereas Elasticsearch 6.7.1 shows the "logs-nosource" index being roughly half the size of the "logs" index.
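
The same comparison can be made without Kibana by reading store.size_in_bytes from the index stats API. The sketch below is an illustration only; ES_URL and the function names are assumptions. With number_of_replicas set to 0, as in the templates above, the primaries figure should match the total store size.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to each cluster's address


def fetch_store_stats(indices=("logs", "logs-nosource"), base=ES_URL):
    """GET the store section of the index stats API for the given indices."""
    url = f"{base}/{','.join(indices)}/_stats/store"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def store_sizes(stats):
    """Map each index name to its primary store size in bytes."""
    return {
        name: data["primaries"]["store"]["size_in_bytes"]
        for name, data in stats["indices"].items()
    }


def nosource_ratio(stats, with_source="logs", without_source="logs-nosource"):
    """Size of the index without _source relative to the index with _source."""
    sizes = store_sizes(stats)
    return sizes[without_source] / sizes[with_source]
```

Running nosource_ratio(fetch_store_stats()) against each cluster gives one number per version. Based on the behavior described in this issue, it would be expected to be around 0.5 on 6.7.1 and close to 1.0 on 7.0.0.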
