
Specify ElasticSearch index template #33

Closed
chschs opened this issue Mar 25, 2014 · 16 comments

@chschs

chschs commented Mar 25, 2014

I'm using Fluentd with the in_syslog plugin and the Elasticsearch plugin to get syslog data into Elasticsearch, with a Kibana frontend.

One of the problems I'm having, though, is that the fields are analyzed when indexed in Elasticsearch, so when I add a terms panel in Kibana to give me, say, the top 10 hostnames, hostnames with dashes in them are broken up: mysql-test-01 comes across as three hostnames (mysql, test, and 01).

Logstash got around this issue by creating a "raw" version of several fields that is set to not_analyzed when the index is created, so that you can run your dashboards against that instead.

More information here: http://www.elasticsearch.org/blog/logstash-1-3-1-released/

With syslog messages going into ES through this plugin, I'd like to have a "raw" (non-analyzed) host (hostname) field and ident field (which gives me the application). Unfortunately, right now both of those fields are analyzed, and that is breaking our dashboards.
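
For context, the Logstash approach linked above keeps the analyzed field and adds a not_analyzed sub-field next to it. A minimal sketch of what such a mapping fragment could look like for the host field, using the Elasticsearch 1.x sub-field syntax (the field names here are illustrative):

```json
{
  "properties": {
    "host": {
      "type": "string",
      "fields": {
        "raw": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
```

A terms panel can then target host.raw and see mysql-test-01 as a single value.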

@lxfontes

Hey @chschs, have you tried adding a mapping template to change the index settings?

For example:

```json
{
  "mappings": {
    "_default_": {
      "_all": { "enabled": false },
      "_source": { "compress": true },
      "properties": {
        "event_data": { "type": "object", "store": "no" },
        "@fields": { "type": "object", "dynamic": true, "path": "full" },
        "@message": { "type": "string", "index": "analyzed" },
        "@source": { "type": "string", "index": "not_analyzed" },
        "@source_host": { "type": "string", "index": "not_analyzed" },
        "@source_path": { "type": "string", "index": "not_analyzed" },
        "@tags": { "type": "string", "index": "not_analyzed" },
        "@timestamp": { "type": "date", "index": "not_analyzed" },
        "@type": { "type": "string", "index": "not_analyzed" }
      }
    }
  },
  "settings": {
    "index.cache.field.type": "soft",
    "index.refresh_interval": "5s",
    "index.store.compress.stored": true,
    "index.number_of_shards": "3",
    "index.query.default_field": "querystring",
    "index.routing.allocation.total_shards_per_node": "2"
  },
  "template": "logstash-*"
}
```

This template will be used every time a new index matching the 'logstash-*' pattern is created.
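
A template like the one above is registered once through the template API. A minimal sketch, assuming Elasticsearch on localhost:9200 and using a made-up template name, fluentd-raw (the body is trimmed here to a single not_analyzed field):

```shell
# Save a trimmed version of the template under an illustrative name.
cat > /tmp/fluentd-raw.json <<'EOF'
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "@source_host": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
EOF

# Sanity-check the JSON locally before sending it.
python3 -m json.tool /tmp/fluentd-raw.json

# Register the template once; every index created afterwards whose name
# matches logstash-* picks it up. Guarded in case Elasticsearch is not up.
curl -s -XPUT localhost:9200/_template/fluentd-raw \
  -d @/tmp/fluentd-raw.json || echo "Elasticsearch not reachable"
```

Because the pattern is a wildcard, nothing has to be re-registered when the daily index rolls over.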

@ajturner

ajturner commented Aug 5, 2014

+1 to make this part of the plugin. While we can manually modify the mapping, why require that overhead in an application when the plugin is already creating the original mapping?

@ajturner

ajturner commented Aug 5, 2014

Looking at the code, Fluentd is not actually creating or modifying the index; it is merely writing to the current index. The plugin would have to detect that a new index is being created and then call the API to update the mapping.

Now I understand what you mean by index templates. Perhaps worth adding this to the README.

@pitr
Contributor

pitr commented Aug 5, 2014

@ajturner good point, added 139184e

@pitr pitr closed this as completed Aug 5, 2014
@openfirmware

I was able to use a custom index template, based on the one Logstash uses, to auto-generate .raw versions of fields.

I started by deleting the current day's index (`curl -XDELETE localhost:9200/logstash-2014.09.02`), then used that curl PUT command to set the defaults for the index. I then restarted fluentd, and the raw fields were available. You can check whether the settings are sticking in Elasticsearch:

```shell
curl localhost:9200/logstash-2014.09.02/_mapping?pretty
```

@stanhu

stanhu commented May 4, 2015

+1 for making this built into the plugin.

This is the template used by logstash:

https://github.com/logstash-plugins/logstash-output-elasticsearch/blob/master/lib/logstash/outputs/elasticsearch/elasticsearch-template.json

@aleks-v-k

There is a good reason to build index template support into the plugin: in a containerized environment, we do not know when the Elasticsearch container will start, and we have to PUT the index template to it before fluentd starts sending data. There are some possible workarounds, but every one of them looks really ugly. So +1 for built-in support of index templates. @pitr
Also note that since version 2.x, Elasticsearch supports index template creation only via the API.

@ssergiienko

+1 for built-in support of index templates.

@pitr
Contributor

pitr commented Feb 13, 2016

To implement this behaviour, this gem would need to do the following additional work before writing a record to elasticsearch:

  1. Check if the index exists
  2. If not, create the index and the mapping

Seems like that might impact performance. What do you think, @aleks-v-k, @ssergiienko? What about deleting old indices?

Can you provide more information about how you run Elasticsearch in a "containerized environment"? Is this something CloudLinux is working on (I'm not familiar with their offerings)?
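
The check-then-create flow described above could be sketched roughly like this (illustrative daily index name, Elasticsearch assumed on localhost:9200, mapping body omitted):

```shell
INDEX="logstash-2016.02.13"   # made-up daily index name

# 1. Does the index exist? HEAD returns 200 if so, 404 if not
#    (curl reports 000 if Elasticsearch is unreachable).
code=$(curl -s -o /dev/null -w '%{http_code}' -I "localhost:9200/$INDEX")

# 2. If not, create it with the desired mapping before the first write.
if [ "$code" != "200" ]; then
  echo "index $INDEX not found (HTTP $code); would create it here"
  # curl -XPUT "localhost:9200/$INDEX" -d @mapping.json
fi
```

Doing this on every write is the overhead being questioned here; a wildcard template avoids the check entirely.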

@stanhu

stanhu commented Feb 13, 2016

The gem would only need to write the index template once at startup, since a wildcard match can be used. As seen in the Logstash template, just adding something like logstash-* should take care of things.
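
A hypothetical sketch of that once-at-startup approach inside the gem (the method names and template content here are illustrative, not the plugin's actual API):

```ruby
require 'json'
require 'net/http'

# Illustrative wildcard template: the mapping applies to every future
# index whose name matches "logstash-*".
def template_body
  {
    'template' => 'logstash-*',
    'mappings' => {
      '_default_' => {
        'properties' => {
          'host' => { 'type' => 'string', 'index' => 'not_analyzed' }
        }
      }
    }
  }
end

# Hypothetical startup hook: PUT the template once; Elasticsearch then
# applies it whenever a matching index is created, so the write path
# never needs to check whether an index exists.
def install_template(host = 'localhost', port = 9200, name = 'fluentd')
  req = Net::HTTP::Put.new("/_template/#{name}",
                           'Content-Type' => 'application/json')
  req.body = JSON.generate(template_body)
  Net::HTTP.start(host, port) { |http| http.request(req) }
end
```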

@pitr pitr changed the title Possible to mirror logstash's "raw" not_analyzed field? Specify ElasticSearch index template Feb 16, 2016
@pitr
Contributor

pitr commented Feb 16, 2016

Hmm, sounds like a good idea then

@aleks-v-k

@pitr Thanks for the reply, and excuse me for the late answer.

To implement this behaviour, this gem would need to do the following additional work before writing a record to elasticsearch

I suppose there is some sort of initialization point in the Fluentd plugin system, so it could be done there: not for each individual index, but for a group of indices, as @stanhu mentioned.

Can you provide more information about how you run Elasticsearch in a "containerized environment"? Is this something CloudLinux is working on (I'm not familiar with their offerings)?

It is a group of hosts; each of them runs the Docker daemon and has a pair of fluentd and elasticsearch containers to collect and access logs. It is not for CloudLinux OS; it is part of the KuberDock project, which uses Kubernetes. We control the configuration of the elasticsearch and fluentd containers through our own Docker images, but we do not start and stop them manually. They run automatically on every host added to a KuberDock cluster.

@MrMMorris

MrMMorris commented Apr 15, 2016

Would love to see this implemented. Using the tag log-opt in Docker with {{.ImageName}} results in tags containing -'s and :'s. I need to set the tag field to not_analyzed so I can properly search for Docker images, but having to do it through the ES API goes against my goal of keeping everything in source control.

@F21

F21 commented Jul 4, 2016

Any chance of this happening? I'd love to contribute, but I don't have enough time to learn Ruby at the moment.

aerickson pushed a commit to aerickson/fluent-plugin-elasticsearch that referenced this issue Aug 25, 2016
@aerickson

aerickson commented Aug 26, 2016

I've implemented this and it's working for me.

#194

@pitr
Contributor

pitr commented Sep 12, 2016

implemented with #194 thanks @aerickson
