Description
Request Type
Bug
Problem Description
The index engine fails to process a document that contains a "non full-text" field larger than 32766 bytes (Lucene's limit on a single indexed term).
During document creation, the document is not indexed and becomes invisible (even though it is stored in the database).
During a data reindex, the process stops and part of the data is not indexed.
TheHive 4.1.15 triggers a data reindex. If the database contains an oversized field, not all data will be visible/usable.
The impacted fields are:
- type (alert)
- source (alert)
- sourceRef (alert)
- name (customField, role, organisation, attachment, caseTemplate)
- login (user)
- title (alert, case, task)
- contentType (attachment)
- tags (case, alert, observable)
- dataType (observable)
- value (resolutionStatus, impactStatus)
- data (observable)
- group (task)
In normal use, the field "data" (observable) is probably the only one that could be filled with more than 32k bytes.
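To make the failure concrete, here is a minimal sketch (not TheHive code) that reproduces Lucene's term-size rejection directly against Elasticsearch. It assumes elasticsearch-py 8.x, an instance at localhost:9200, and a hypothetical index name.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Non full-text fields are indexed as a single "keyword" term; Lucene
# rejects any single term larger than 32766 bytes.
es.indices.create(
    index="immense-term-demo",
    mappings={"properties": {"data": {"type": "keyword"}}},
)

try:
    # 40000 ASCII characters = 40000 bytes, over the 32766-byte limit.
    es.index(index="immense-term-demo", document={"data": "x" * 40000})
except Exception as exc:
    # Elasticsearch answers with max_bytes_length_exceeded_exception:
    # the document is rejected and never becomes searchable.
    print(f"rejected: {exc}")
```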
Possible Solutions
- Prevent creating oversized fields: [Feature Request] Add constraint on input data #2024 (available from 4.2)
- Add a process to fix oversized fields: Process immense terms during database initialisation ScalliGraph#17
- For observable data, a hash of the data can be used: [Enhancement] When observable data is too big, use hash #2288 (existing data must also be fixed); see the sketch after this list
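As an illustration of the hashing approach, a minimal sketch (not TheHive's actual implementation; the function name is hypothetical): replace an oversized value with its SHA-256 digest before it reaches the index, so the indexed term stays far below the limit while the full value can still be kept in the database.

```python
import hashlib

# Lucene's hard limit on a single indexed term, in bytes.
MAX_TERM_BYTES = 32766

def indexable_data(data: str) -> str:
    """Return the value to index for an observable's "data" field.

    Hypothetical sketch of the idea behind #2288: oversized values are
    replaced by their SHA-256 digest so the indexed term is always small.
    """
    raw = data.encode("utf-8")
    if len(raw) > MAX_TERM_BYTES:
        return hashlib.sha256(raw).hexdigest()
    return data
```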
For Elasticsearch only, the mapping can be updated so that any field value larger than a given size is ignored at index time, using `ignore_above`:
"store_generic": {
"mapping": {
"index": "not_analyzed",
"ignore_above": 32766
},
"match": "*"
}
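This template could be applied, for example, through the put-mapping API. A hedged sketch assuming elasticsearch-py 8.x against Elasticsearch 7+, where the legacy `"index": "not_analyzed"` form is written as `"type": "keyword"`; the index name is a placeholder for whatever index your JanusGraph backend created.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# "thehive" is a placeholder index name. Dynamic templates only apply to
# fields mapped after the update, not to fields that already exist.
es.indices.put_mapping(
    index="thehive",
    dynamic_templates=[
        {
            "store_generic": {
                "match": "*",
                "mapping": {"type": "keyword", "ignore_above": 32766},
            }
        }
    ],
)
```

Note that `ignore_above` silently drops the oversized value from the index: the document itself is indexed and visible, but that particular field is not searchable.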