DocValuesField "value" is too large, must be <= 32766 #595

Closed
kevgrig opened this issue Oct 28, 2017 · 23 comments

Comments

@kevgrig

kevgrig commented Oct 28, 2017

Hi, I'm calling reset! on my index and it throws the following exception. Chewy doesn't provide any details on how to investigate this:

$ bin/rails console
[...]
irb(main):001:0> UserIndex.reset!
[...]
D, [2017-10-28T13:58:24.834739 #7851] DEBUG -- :   Chewy::Stash::Specification Import (8.5ms) {:index=>1}
Chewy::ImportFailed: Import failed for `Chewy::Stash::Specification` with:
    Index errors:
      `{"type"=>"illegal_argument_exception", "reason"=>"DocValuesField \"value\" is too large, must be <= 32766"}`
        on 1 documents: ["user"]

        from chewy (0.10.1) lib/chewy/type/import.rb:88:in `import!'
        from chewy (0.10.1) lib/chewy/index/specification.rb:25:in `lock!'
        from chewy (0.10.1) lib/chewy/index/actions.rb:199:in `reset!'
        from (irb):1
        from railties (5.1.4) lib/rails/commands/console/console_command.rb:62:in `start'
        from railties (5.1.4) lib/rails/commands/console/console_command.rb:17:in `start'
        from railties (5.1.4) lib/rails/commands/console/console_command.rb:97:in `perform'
        from thor (0.20.0) lib/thor/command.rb:27:in `run'
        from thor (0.20.0) lib/thor/invocation.rb:126:in `invoke_command'
        from thor (0.20.0) lib/thor.rb:387:in `dispatch'
        from railties (5.1.4) lib/rails/command/base.rb:63:in `perform'
        from railties (5.1.4) lib/rails/command.rb:44:in `invoke'
        from railties (5.1.4) lib/rails/commands.rb:16:in `<top (required)>'
        from bin/rails:4:in `require'
        from bin/rails:4:in `<main>'

None of my models has a column named "value":

$ grep -c "\"value\"" db/schema.rb 
0

Index definition: https://github.com/myplaceonline/myplaceonline_rails/blob/master/app/lib/user_index.rb

@pyromaniac
Contributor

Hm, interesting. You've got a huge index definition, so it can't be stored in Chewy::Stash. Try fixing the definition of Chewy::Stash: this field https://github.com/toptal/chewy/blob/master/lib/chewy/stash.rb#L13 needs an additional option, doc_values: false.
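
To get a feel for why a large definition trips this limit, here is a minimal sketch that measures the serialized size of a specification hash against the 32766-byte figure from the error message. It assumes the stash keeps the whole specification as one serialized value; how Chewy actually serializes it is not confirmed here, and `fits_in_doc_values?` is a hypothetical helper, not part of Chewy.

```ruby
require 'json'

# Limit quoted in the Elasticsearch error message above.
DOC_VALUES_MAX_BYTES = 32_766

# Hypothetical helper: true when the specification, serialized as a single
# JSON string, is no larger than the limit.
def fits_in_doc_values?(specification)
  JSON.generate(specification).bytesize <= DOC_VALUES_MAX_BYTES
end

small = { 'mappings' => { 'user' => { 'properties' => { 'name' => { 'type' => 'keyword' } } } } }

# An illustrative large definition with thousands of mapped fields.
properties = (1..3000).map { |i| ["field_#{i}", { 'type' => 'keyword' }] }.to_h
large = { 'mappings' => { 'user' => { 'properties' => properties } } }

puts fits_in_doc_values?(small) # prints true
puts fits_in_doc_values?(large) # prints false
```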

@kevgrig
Author

kevgrig commented Oct 29, 2017

@pyromaniac Yes, I have nearly 400 models and over 3000 fields. It looks like disabling doc_values means that I can't sort or aggregate on the field. I'm not sure what the Chewy stash is; is sorting or aggregating on the value of a stash ever useful?

kevgrig added a commit to myplaceonline/chewy that referenced this issue Oct 29, 2017
@kevgrig
Author

kevgrig commented Oct 29, 2017

I added doc_values: false, but now I get the following error when two different models have a field with the same name but different types:

Elasticsearch::Transport::Transport::Errors::BadRequest: [400] {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"mapper [service_location] cannot be changed from type [keyword] to [text]"}],"type":"illegal_argument_exception","reason":"mapper [service_location] cannot be changed from type [keyword] to [text]"},"status":400}
        from elasticsearch-transport (5.0.4) lib/elasticsearch/transport/transport/base.rb:202:in `__raise_transport_error'
        from elasticsearch-transport (5.0.4) lib/elasticsearch/transport/transport/base.rb:319:in `perform_request'
        from elasticsearch-transport (5.0.4) lib/elasticsearch/transport/transport/http/faraday.rb:20:in `perform_request'
        from elasticsearch-transport (5.0.4) lib/elasticsearch/transport/client.rb:131:in `perform_request'
        from elasticsearch-api (5.0.4) lib/elasticsearch/api/namespace/common.rb:21:in `perform_request'
        from elasticsearch-api (5.0.4) lib/elasticsearch/api/actions/indices/create.rb:86:in `create'
        from /home/kevin/.gem/ruby/2.4.0/bundler/gems/chewy-5321cf0e6450/lib/chewy/index/actions.rb:63:in `create!'
        from /home/kevin/.gem/ruby/2.4.0/bundler/gems/chewy-5321cf0e6450/lib/chewy/index/actions.rb:128:in `purge!'
        from /home/kevin/.gem/ruby/2.4.0/bundler/gems/chewy-5321cf0e6450/lib/chewy/index/actions.rb:195:in `reset!' [...]

I define types roughly like this:

all_models.each do |klass|
  string_columns = []
  text_columns = []
  integer_columns = []

  klass.columns.each do |column|
    if column.type == :string
      string_columns.push(column)
    elsif column.type == :text
      text_columns.push(column)
    elsif column.type == :integer && column.name.index("_id").nil?
      integer_columns.push(column)
    end
  end
  
  define_type klass do
    string_columns.each do |column|
      field(column.name.to_sym, type: "keyword")
    end
    
    text_columns.each do |column|
      field(column.name.to_sym, type: "text")
    end
    
    integer_columns.each do |column|
      field(column.name.to_sym, type: "integer", include_in_all: false)
    end
  end
  
  klass.update_index("user#" + klass.name.underscore.downcase, :self)
end

I have one model that has a service_location field of type string and another of type text.

If I change the field call to use keyword instead of text, everything works.
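
Since all types in one index share a single mapping per field name, it can help to find such clashes before calling reset!. The following is a minimal sketch, not part of Chewy: `conflicting_field_types` and the model names are hypothetical, and the input is assumed to be a plain hash of model name to column/type pairs built from the same column loop as above.

```ruby
# Hypothetical input: model name => { column name => Elasticsearch type }.
# Returns only the columns that are mapped to more than one type, together
# with the models that use each type.
def conflicting_field_types(columns_by_model)
  types_by_field = Hash.new { |hash, key| hash[key] = {} }

  columns_by_model.each do |model, columns|
    columns.each do |name, type|
      (types_by_field[name][type] ||= []) << model
    end
  end

  types_by_field.select { |_name, models_by_type| models_by_type.size > 1 }
end

columns_by_model = {
  'Restaurant' => { 'service_location' => 'keyword', 'notes' => 'text' },
  'Repair'     => { 'service_location' => 'text',    'notes' => 'text' }
}

# Only service_location is reported; notes has the same type in both models.
conflicting_field_types(columns_by_model).each do |field, models_by_type|
  puts "#{field}: #{models_by_type.inspect}"
end
```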

@pyromaniac
Contributor

Hey, did you find anything else? I'm going to try out the binary type.

@pyromaniac
Contributor

pyromaniac commented Nov 3, 2017

Hey, can you try #602? It is supposed to fix your issues.

@kevgrig
Author

kevgrig commented Nov 4, 2017

@pyromaniac Thank you, but it didn't work. I applied your patch to a branch off v0.10.1 and updated my Gemfile with:

gem "chewy", github: "myplaceonline/chewy", branch: "v0.10.1-pr602"

I ran bin/rails console and executed UserIndex.reset! but it gave the following exception after all of the imports:

D, [2017-11-04T16:12:39.481227 #18702] DEBUG -- :   Chewy::Stash::Specification Import (278.7ms) {:index=>1}
Chewy::ImportFailed: Import failed for `Chewy::Stash::Specification` with:
    Index errors:
      `{"type"=>"illegal_argument_exception", "reason"=>"Limit of total fields [1000] in index [chewy_stash] has been exceeded"}`
        on 1 documents: ["user"]

        from /home/kevin/.gem/ruby/2.4.0/bundler/gems/chewy-17ecdddd143c/lib/chewy/type/import.rb:88:in `import!'
        from /home/kevin/.gem/ruby/2.4.0/bundler/gems/chewy-17ecdddd143c/lib/chewy/index/specification.rb:25:in `lock!'
        from /home/kevin/.gem/ruby/2.4.0/bundler/gems/chewy-17ecdddd143c/lib/chewy/index/actions.rb:199:in `reset!' [...]

@pyromaniac
Contributor

Wow, this is also unexpected. So there is no way to do it other than with a binary field. Thanks.

@pyromaniac
Contributor

Hm, btw, why are you defining everything inside a single index? This is not the best idea; I would split it into one index per model. Then you won't have these problems.

@kevgrig
Author

kevgrig commented Nov 5, 2017

@pyromaniac I'm not sure if it's the best way to do it, but I have a website feature that searches "all data" - so I want to search all models at once. Each model has an id which is the user ID, so I can do a search like:

query = {
  bool: {
    must: {
      match: {
        _all: search
      }
    },
    filter: {
      term: {
        id: user.id
      }
    }
  }
}
UserIndex.query(query).order(visit_count: {order: :desc, missing: :_last}).limit(10).load.objects

Or I can limit it to a particular model:

query = {
  bool: {
    must: [
      {
        match: {
          _all: search
        }
      },
      {
        match: {
          _type: category.singularize
        }
      }
    ],
    [...]
  }
}

@pyromaniac
Contributor

pyromaniac commented Nov 5, 2017

First of all, you can use Chewy::Search::Request and pass several indexes there, so don't worry. Also, multiple types per index will be deprecated in ES6 and there will be no types at all in ES7. So let's look to the future.

@kevgrig
Author

kevgrig commented Nov 5, 2017

Interesting, ok, I'll look into refactoring. One issue is that right now, within my UserIndex, I can simply iterate over a list of all models, call define_type, and generate the fields from the model's columns. I guess I can just dynamically generate the Chewy::Index classes instead, keep a list of all of them, and then run the query across that list?

@pyromaniac
Contributor

I'm pretty sure you can. Just use const_set.
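
A minimal sketch of the const_set approach, runnable without the gem. `BaseIndex`, `MODEL_COLUMNS`, and `ALL_INDEXES` are hypothetical stand-ins: in a real application the generated classes would presumably inherit from Chewy::Index and call define_type in the class body, which is not shown here.

```ruby
# Stand-in for Chewy::Index so the sketch runs on its own (assumption: the
# real superclass would be Chewy::Index).
class BaseIndex
  def self.fields
    @fields ||= []
  end

  def self.field(name, type:)
    fields << [name, type]
  end
end

# Hypothetical description of two models and their columns.
MODEL_COLUMNS = {
  'Restaurant' => { name: 'keyword', notes: 'text' },
  'Repair'     => { service_location: 'text' }
}.freeze

# Build one named index class per model and keep them all in a list.
ALL_INDEXES = MODEL_COLUMNS.map do |model_name, columns|
  index_class = Class.new(BaseIndex) do
    columns.each { |column, type| field(column, type: type) }
  end
  # const_set gives the anonymous class a name and returns the class itself.
  Object.const_set("#{model_name}Index", index_class)
end.freeze

puts ALL_INDEXES.map(&:name).inspect
puts RepairIndex.fields.inspect
```

Because each model gets its own class, a field such as service_location can be keyword in one index and text in another without a mapping conflict.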

@mpospelov

mpospelov commented Nov 22, 2017

Hello, I just ran into the same issue with the following index:

  define_type SpecialOffer.includes(:salon, :services, :stylists, :categories) do
    field :id, type: :text
    field :salon_id, type: :integer
    field :sort_rank, type: :integer
    field :is_home, type: :boolean
    field :created_at, type: :date, value: -> { created_at.iso8601 }
    field :expires_at, type: :date, value: -> { expires_at&.to_date&.iso8601 }
    field :starts_at, type: :date, value: -> { starts_at&.to_date&.iso8601 }
    field(:name, type: :text) { apply_name_analyzers(self) }
    field(:ar_name, type: :text) { apply_name_analyzers(self) }

    field :salon do
      field :is_approved, type: :boolean
      field :hidden, type: :boolean
      field :deleted, type: :boolean, value: -> { deleted_at.present? }
    end

    field :services do
      field(:name, type: :text) { apply_name_analyzers(self) }
      field(:ar_name, type: :text) { apply_name_analyzers(self) }
    end

    field :stylists do
      field(:name, type: :text) { apply_name_analyzers(self) }
      field(:ar_name, type: :text) { apply_name_analyzers(self) }
    end

    field :categories do
      field :name do
        I18n.available_locales.each do |locale|
          field(locale, type: :text) { apply_name_analyzers(self) }
        end
      end
    end
  end

apply_name_analyzers converts a field into a multi-field so that several analyzers can be applied to it, like this:

  def self.apply_name_analyzers(multi_field)
    multi_field.field :standard, analyzer: 'standard'
    multi_field.field :english, analyzer: 'english'
    multi_field.field :arabic, analyzer: 'arabic'
    multi_field.field :stemmed, analyzer: 'snowball_analyzer'
    multi_field.field :shingle, analyzer: 'shingle_analyzer'
    multi_field.field :ngrams, analyzer: 'ngram_analyzer', search_analyzer: 'standard'
  end

The error happens on specification.lock!
The following specification is passed to the index right before the error:

{:index => [{:id => "special_offers", :specification => {
            "settings" => {
                "analysis" => {
                    "tokenizer" => {
                        "phone_number_tokenizer" => {
                            "type" => "ngram", "min_gram" => 3, "max_gram" => 5, "token_chars" => ["digit"]
                        }
                    }, "filter" => {
                        "name_ngram_filter" => {
                            "type" => "edgeNGram", "min_gram" => 1, "max_gram" => 10
                        }
                    }, "analyzer" => {
                        "snowball_analyzer" => {
                            "type" => "custom", "tokenizer" => "standard", "filter" => ["lowercase", "asciifolding", "snowball"]
                        }, "shingle_analyzer" => {
                            "type" => "custom", "tokenizer" => "standard", "filter" => ["shingle", "lowercase", "asciifolding"]
                        }, "ngram_analyzer" => {
                            "type" => "custom", "tokenizer" => "standard", "filter" => ["lowercase", "asciifolding", "name_ngram_filter"]
                        }, "phone_number_analyzer" => {
                            "tokenizer" => "phone_number_tokenizer"
                        }
                    }
                }
            }, "mappings" => {
                "special_offer" => {
                    "properties" => {
                        "id" => {
                            "type" => "text"
                        }, "salon_id" => {
                            "type" => "integer"
                        }, "sort_rank" => {
                            "type" => "integer"
                        }, "is_home" => {
                            "type" => "boolean"
                        }, "created_at" => {
                            "type" => "date"
                        }, "expires_at" => {
                            "type" => "date"
                        }, "starts_at" => {
                            "type" => "date"
                        }, "name" => {
                            "fields" => {
                                "standard" => {
                                    "analyzer" => "standard", "type" => "text"
                                }, "english" => {
                                    "analyzer" => "english", "type" => "text"
                                }, "arabic" => {
                                    "analyzer" => "arabic", "type" => "text"
                                }, "stemmed" => {
                                    "analyzer" => "snowball_analyzer", "type" => "text"
                                }, "shingle" => {
                                    "analyzer" => "shingle_analyzer", "type" => "text"
                                }, "ngrams" => {
                                    "analyzer" => "ngram_analyzer", "search_analyzer" => "standard", "type" => "text"
                                }
                            }, "type" => "text"
                        }, "ar_name" => {
                            "fields" => {
                                "standard" => {
                                    "analyzer" => "standard", "type" => "text"
                                }, "english" => {
                                    "analyzer" => "english", "type" => "text"
                                }, "arabic" => {
                                    "analyzer" => "arabic", "type" => "text"
                                }, "stemmed" => {
                                    "analyzer" => "snowball_analyzer", "type" => "text"
                                }, "shingle" => {
                                    "analyzer" => "shingle_analyzer", "type" => "text"
                                }, "ngrams" => {
                                    "analyzer" => "ngram_analyzer", "search_analyzer" => "standard", "type" => "text"
                                }
                            }, "type" => "text"
                        }, "salon" => {
                            "properties" => {
                                "is_approved" => {
                                    "type" => "boolean"
                                }, "hidden" => {
                                    "type" => "boolean"
                                }, "deleted" => {
                                    "type" => "boolean"
                                }
                            }, "type" => "object"
                        }, "services" => {
                            "properties" => {
                                "name" => {
                                    "fields" => {
                                        "standard" => {
                                            "analyzer" => "standard", "type" => "text"
                                        }, "english" => {
                                            "analyzer" => "english", "type" => "text"
                                        }, "arabic" => {
                                            "analyzer" => "arabic", "type" => "text"
                                        }, "stemmed" => {
                                            "analyzer" => "snowball_analyzer", "type" => "text"
                                        }, "shingle" => {
                                            "analyzer" => "shingle_analyzer", "type" => "text"
                                        }, "ngrams" => {
                                            "analyzer" => "ngram_analyzer", "search_analyzer" => "standard", "type" => "text"
                                        }
                                    }, "type" => "text"
                                }, "ar_name" => {
                                    "fields" => {
                                        "standard" => {
                                            "analyzer" => "standard", "type" => "text"
                                        }, "english" => {
                                            "analyzer" => "english", "type" => "text"
                                        }, "arabic" => {
                                            "analyzer" => "arabic", "type" => "text"
                                        }, "stemmed" => {
                                            "analyzer" => "snowball_analyzer", "type" => "text"
                                        }, "shingle" => {
                                            "analyzer" => "shingle_analyzer", "type" => "text"
                                        }, "ngrams" => {
                                            "analyzer" => "ngram_analyzer", "search_analyzer" => "standard", "type" => "text"
                                        }
                                    }, "type" => "text"
                                }
                            }, "type" => "object"
                        }, "stylists" => {
                            "properties" => {
                                "name" => {
                                    "fields" => {
                                        "standard" => {
                                            "analyzer" => "standard", "type" => "text"
                                        }, "english" => {
                                            "analyzer" => "english", "type" => "text"
                                        }, "arabic" => {
                                            "analyzer" => "arabic", "type" => "text"
                                        }, "stemmed" => {
                                            "analyzer" => "snowball_analyzer", "type" => "text"
                                        }, "shingle" => {
                                            "analyzer" => "shingle_analyzer", "type" => "text"
                                        }, "ngrams" => {
                                            "analyzer" => "ngram_analyzer", "search_analyzer" => "standard", "type" => "text"
                                        }
                                    }, "type" => "text"
                                }, "ar_name" => {
                                    "fields" => {
                                        "standard" => {
                                            "analyzer" => "standard", "type" => "text"
                                        }, "english" => {
                                            "analyzer" => "english", "type" => "text"
                                        }, "arabic" => {
                                            "analyzer" => "arabic", "type" => "text"
                                        }, "stemmed" => {
                                            "analyzer" => "snowball_analyzer", "type" => "text"
                                        }, "shingle" => {
                                            "analyzer" => "shingle_analyzer", "type" => "text"
                                        }, "ngrams" => {
                                            "analyzer" => "ngram_analyzer", "search_analyzer" => "standard", "type" => "text"
                                        }
                                    }, "type" => "text"
                                }
                            }, "type" => "object"
                        }, "categories" => {
                            "properties" => {
                                "name" => {
                                    "properties" => {
                                        "en" => {
                                            "fields" => {
                                                "standard" => {
                                                    "analyzer" => "standard", "type" => "text"
                                                }, "english" => {
                                                    "analyzer" => "english", "type" => "text"
                                                }, "arabic" => {
                                                    "analyzer" => "arabic", "type" => "text"
                                                }, "stemmed" => {
                                                    "analyzer" => "snowball_analyzer", "type" => "text"
                                                }, "shingle" => {
                                                    "analyzer" => "shingle_analyzer", "type" => "text"
                                                }, "ngrams" => {
                                                    "analyzer" => "ngram_analyzer", "search_analyzer" => "standard", "type" => "text"
                                                }
                                            }, "type" => "text"
                                        }, "ar" => {
                                            "fields" => {
                                                "standard" => {
                                                    "analyzer" => "standard", "type" => "text"
                                                }, "english" => {
                                                    "analyzer" => "english", "type" => "text"
                                                }, "arabic" => {
                                                    "analyzer" => "arabic", "type" => "text"
                                                }, "stemmed" => {
                                                    "analyzer" => "snowball_analyzer", "type" => "text"
                                                }, "shingle" => {
                                                    "analyzer" => "shingle_analyzer", "type" => "text"
                                                }, "ngrams" => {
                                                    "analyzer" => "ngram_analyzer", "search_analyzer" => "standard", "type" => "text"
                                                }
                                            }, "type" => "text"
                                        }
                                    }, "type" => "object"
                                }
                            }, "type" => "object"
                        }
                    }
                }
            }
        }
    }]
}

I didn't get this error until I updated to the latest version. Any ideas why it happens and how I can avoid it? Thanks

@mpospelov

I found this note: https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html#_limiting_the_number_of_literal_nested_literal_fields
Because the specification field in the specification mapping is defined as an object, every nesting level generates a new field. Is there any reason to store it as an object?
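
A rough way to see how quickly this adds up is to count every distinct path in the specification hash. The sketch below is an approximation under stated assumptions: it treats each hash key at each nesting level as one mapped field or object, and it ignores any extra sub-fields that dynamic mapping might add. `mapped_paths` is a hypothetical helper, and the sample hash is a cut-down version of the specification above.

```ruby
require 'set'

# Collects every distinct dotted path in a nested hash. Arrays are walked
# element by element, so equal keys in different elements share one path.
def mapped_paths(value, prefix = nil, paths = Set.new)
  case value
  when Hash
    value.each do |key, child|
      path = [prefix, key].compact.join('.')
      paths << path
      mapped_paths(child, path, paths)
    end
  when Array
    value.each { |element| mapped_paths(element, prefix, paths) }
  end
  paths
end

analyzers = %w[standard english arabic stemmed shingle ngrams]
sub_fields = analyzers.map { |name| [name, { 'analyzer' => name, 'type' => 'text' }] }.to_h

specification = {
  'mappings' => {
    'special_offer' => {
      'properties' => {
        'name'    => { 'type' => 'text', 'fields' => sub_fields },
        'ar_name' => { 'type' => 'text', 'fields' => sub_fields }
      }
    }
  }
}

# Two multi-fields with six analyzers each already produce 45 paths.
puts mapped_paths(specification).size # prints 45
```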

As far as I can tell from the code, it's only used to check whether the current specification has changed since the last indexed version. Couldn't we just use a plain string or an MD5 hash of the specification for this comparison?
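
A minimal sketch of the digest idea, using only the Ruby standard library. `canonicalize` and `specification_digest` are hypothetical helpers and not how Chewy stores or compares specifications; keys are sorted first so that two equal specifications give the same digest regardless of key order.

```ruby
require 'digest/md5'
require 'json'

# Recursively sorts hash keys (as strings) so that equal specifications
# serialize to identical JSON.
def canonicalize(value)
  case value
  when Hash
    value.sort_by { |key, _child| key.to_s }
         .map { |key, child| [key.to_s, canonicalize(child)] }
         .to_h
  when Array
    value.map { |element| canonicalize(element) }
  else
    value
  end
end

# MD5 hex digest of the canonical JSON form of a specification.
def specification_digest(specification)
  Digest::MD5.hexdigest(JSON.generate(canonicalize(specification)))
end

stored  = { 'settings' => {}, 'mappings' => { 'special_offer' => { 'properties' => { 'id' => { 'type' => 'text' } } } } }
current = { 'mappings' => { 'special_offer' => { 'properties' => { 'id' => { 'type' => 'text' } } } }, 'settings' => {} }
changed = { 'mappings' => { 'special_offer' => { 'properties' => { 'id' => { 'type' => 'keyword' } } } }, 'settings' => {} }

puts specification_digest(stored) == specification_digest(current) # prints true
puts specification_digest(stored) == specification_digest(changed) # prints false
```

The trade-off is that a digest only says that something changed, not which part of the specification changed.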

@pyromaniac
Contributor

Hey, MD5 is a good idea, but I have plans for a proper comparison of this object. Can you try the latest master? The limit should be increased there.

@mpospelov

Right now I'm using the latest master, but I still get this error.

@pyromaniac
Contributor

Which error btw?

@mpospelov

Chewy::ImportFailed: Import failed for `Chewy::Stash::Specification` with:
    Index errors:
      `{"type"=>"illegal_argument_exception", "reason"=>"Limit of total fields [1000] in index [salony_development_chewy_stash] has been exceeded"}`
        on 1 documents: ["special_offers"]

@pyromaniac
Contributor

Oh, right, okay, thanks. It seems the only way is to use a binary field. For now, maybe try switching off the specification import. I probably need to add an option to disable it.

@mpospelov

ok thanks

@mattzollinhofer
Contributor

Looks like @kevgrig's problem is solved by the newest version. Can we close this issue?

@mpospelov

I've been using chewy 5.0.0 for 4 months already and haven't hit this kind of error since.

@kevgrig
Author

kevgrig commented Aug 17, 2018

@mattzollinhofer Yep, issue has already been closed since last November, although it's great that now I can just use the latest version to work around the issue instead of a branch, thanks! Ultimately, I need to re-architect my models for ES6 based on a comment above, which I've been avoiding because things work for now...
