This repository contains all of the queries used within the Complete Guide to Elasticsearch course.
The core APIs are identical:
- search queries
- indexing documents
- aggregations
- index management
- basic security
The main differences are in "ecosystem solutions" or "vertical solutions".
Elastic provides tightly integrated solutions for:
- Security information and event management (SIEM), security analysis
- Observability
- Enterprise search
- Machine learning and AI
These solutions tightly integrate parts of the Elastic Stack, such as ES and Kibana. They are more commercial in nature, at least for the advanced features. The features are powerful and easy to use out of the box. ES solutions are a bit more user-friendly and integrated, but require a paid license for some things.
OpenSearch takes a more open-source approach that relies on plugins. Plugins for alerting, observability, and security are bundled with OpenSearch.
elasticsearch-slides-udemy/02-Getting_Started/06-Elasticsearch_vs_OpenSearch.pdf
Elastic Cloud Free trial: https://www.elastic.co/cloud/cloud-trial-overview
Aiven free trial: https://aiven.io/opensearch
Digital Ocean OpenSearch: https://www.digitalocean.com/products/managed-databases-opensearch
AWS OpenSearch: https://aws.amazon.com/opensearch-service/
https://dedicated-laurel-1hfqmn7b.apps.bonsaisearch.net/app/dev_tools#/console
https://www.geeksforgeeks.org/cloud-computing/elasticsearch-concept-of-painless/
https://alexmarquardt.com/category/painless/
https://search-guard.com/blog/elasticsearch-painless-alerting-primer/
https://l.codingexplained.com/r/elastic-cloud-trial?src=es-getting-started
To authenticate API requests, use the header: Authorization: ApiKey ZXZWN2g1b0….
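A quick curl sketch (the key is a placeholder; substitute your own full API key):
curl -H "Authorization: ApiKey <base64-encoded-api-key>" https://localhost:9200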
Download the archives.
The ES archive bundles Java; the Kibana archive bundles Node.js.
Extract the archive. Rename the directory to get rid of the version number. cd into the directory.
From the elasticsearch directory:
bin/elasticsearch
ES starts and is set up with a superuser 'elastic' and a password, which is printed to stdout during startup.
Save the password.
Resetting elastic user's password:
bin/elasticsearch-reset-password -u elastic
TLS certs are also created. Data is encrypted during transfer.
An enrollment token is also created for Kibana. It is valid for 30 minutes.
Generate a new Kibana enrollment token:
bin/elasticsearch-create-enrollment-token --scope kibana
On macOS, remove Gatekeeper's quarantine attribute from the Kibana directory.
From the parent directory of the kibana directory:
xattr -d -r com.apple.quarantine kibana
cd kibana
bin/kibana
Browse to the URL + token output in the terminal.
Paste the enrollment token into the web input box.
On the Welcome to Elastic login page, log in as the elastic user with the password from the terminal.
On Windows: unzip the archives. Circumvent the "file path too long" error with 7-Zip or another program.
Run:
bin\elasticsearch.bat
bin\kibana.bat
Open the url+token from the kibana start terminal.
Paste enrollment token.
(Or click button configure manually)
Login to kibana as elastic user.
Cluster-a group of one or more connected Elasticsearch node instances. A cluster is a collection of nodes that work together to store data and provide search and indexing capabilities. It provides horizontal scalability, allowing you to add or remove nodes easily to accommodate changing requirements.
node-an instance of Elasticsearch. Nodes can be deployed on separate machines or run on a single machine for development purposes.
Index-a collection of documents sharing similar characteristics; it serves as the primary unit for organizing and managing data within Elasticsearch. Each index is divided into one or more primary and replica shards. Each document within an index is uniquely identified by a document ID. Indices are analogous to tables in a relational database.
In Elasticsearch, a shard is a basic unit of data storage and search. Elasticsearch uses a distributed architecture to handle large amounts of data and provide scalable and efficient search capabilities. Sharding is the process of breaking down the index into smaller, more manageable pieces called shards. Understanding shards is crucial for designing and optimizing Elasticsearch clusters.
Elastic is able to distribute your data across nodes by subdividing an index into shards. Each index in Elasticsearch is a grouping of one or more physical shards, where each shard is a self-contained Lucene index containing a subset of the documents in the index. By distributing the documents in an index across multiple shards, and distributing those shards across multiple nodes, Elasticsearch increases indexing and query capacity.
Types of shards:
- Primary shards: contain the main data and handle all write operations. The number of primary shards is set when creating an index and cannot be changed later.
- Replica shards: copies of the primary shards, serving as failover mechanisms. They improve system resilience and enable parallel search and retrieval operations.
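In the console, you can see how the shards of each index are allocated across nodes with the cat shards API (works for any existing cluster):
GET /_cat/shards?v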
In the console:
GET /_cluster/health
{
"cluster_name": "opensearch_2.19.2_omc_bonsai_us-east-1_common_opensearch-8466",
"status": "green",
"timed_out": false,
"number_of_nodes": 3,
"number_of_data_nodes": 3,
"discovered_master": true,
"discovered_cluster_manager": true,
"active_primary_shards": 1,
"active_shards": 2,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100
}
Config file:
$ES_HOME/config/elasticsearch.yml
GET /_cat/nodes?v
GET /_cat/indices?v
GET /_cat/indices?v&expand_wildcards=all
GET /[API]/[command]
Kibana automatically sets Content-Type and authentication headers.
Download curl from here: https://curl.se/download.html
Conda or cygwin may already have curl.
curl https://bcec8e0e4c:0122727a305d76ffd8ce@dedicated-laurel-1hfqmn7b.us-east-1.bonsaisearch.net
{
"name" : "opensearch_172-31-155-75_2.19.2_omc_bonsai_us-east-1_common_opensearch-8466_manager-data-ingest-2617_",
"cluster_name" : "opensearch_2.19.2_omc_bonsai_us-east-1_common_opensearch-8466",
"cluster_uuid" : "lyr-L1ImSoyHujHXhaNpvA",
"version" : {
"distribution" : "opensearch",
"number" : "2.19.2",
"build_type" : "tar",
"build_hash" : "e0ba5eebfa3f060fc76e4e2b5b61193a19470d4f",
"build_date" : "2025-04-29T20:06:33.471315233Z",
"build_snapshot" : false,
"lucene_version" : "9.12.1",
"minimum_wire_compatibility_version" : "7.10.0",
"minimum_index_compatibility_version" : "7.0.0"
},
"tagline" : "The OpenSearch Project: https://opensearch.org/"
}
curl https://bcec8e0e4c:0122727a305d76ffd8ce@dedicated-laurel-1hfqmn7b.us-east-1.bonsaisearch.net | jq .
This returns the same cluster info JSON as above, pretty-printed by jq. With the endpoint in an environment variable:
curl $ESHOST | jq .
You can run queries with cURL or other HTTP clients as well, such as Postman.
You should already have cURL installed, with the exception being some old versions of Windows.
Anyway, let’s type out the simplest possible cURL command by simply specifying the endpoint of our Elasticsearch cluster.
If you are using Elastic Cloud, be sure to use the Elasticsearch endpoint from the deployment page and not the Kibana endpoint.
The GET HTTP verb is implicitly assumed if none is specified, but we can also specify it with the -X argument as follows.
Let’s send the request.
With explicit HTTP verb:
curl -X GET http://localhost:9200
curl https://bcec8e0e4c:0122727a305d76ffd8ce@dedicated-laurel-1hfqmn7b.us-east-1.bonsaisearch.net
As you can see, we get an empty response back from Elasticsearch.
That’s because from version 8 and onwards, we need to use the TLS endpoint instead of plaintext, so let’s change that.
curl -X GET https://localhost:9200
Now we get a certificate error.
The reason is that Elasticsearch generates a self signed certificate by default, which is not trusted by HTTP clients for security reasons.
Note that this only applies to local setups, so if you created a cloud deployment, you will not face this issue.
The easiest way to get around that is to simply use cURL’s --insecure flag as follows.
curl --insecure -X GET https://localhost:9200
This flag instructs cURL to ignore the certificate error, and if you look closely, you can see that we now get a different error.
This was an easy solution and it works just fine for local development, but the more correct approach is to provide cURL with the CA certificate with the "cacert" argument.
From the elasticsearch root directory (or use absolute path):
curl --cacert config/certs/http_ca.crt -X GET https://localhost:9200
The CA certificate is located within the config/certs directory as you can see.
If your working directory is the Elasticsearch root directory, you can specify the relative path just like I did.
Otherwise you can use an absolute path as well.
Running the command, you can see that the certificate error went away with this approach as well.
Alright, so far so good.
We still get an error, because we need to authenticate with our Elasticsearch cluster.
This also applies if you have created a cloud deployment instead of a local one.
Doing so is simple with cURL’s -u argument.
The value should simply be the username for your deployment.
For local deployments, the password is the one that was generated the first time Elasticsearch started up.
curl --cacert config/certs/http_ca.crt -u elastic -X GET https://localhost:9200
When running the command, cURL will prompt us to enter our password.
Perfect, that worked as intended.
For the endpoint we defined, Elasticsearch returns basic information about our cluster.
As an alternative, you can supply your password for the -u argument as well.
Simply add a colon after the username followed by the password.
With this approach, cURL will not prompt us to enter the password when running the command.
curl --cacert config/certs/http_ca.crt -u elastic:password -X GET https://localhost:9200
The password will, however, be exposed within your terminal, so this is not ideal from a security perspective - especially when communicating with a production cluster.
Anyway, that was the most basic request we could send.
Oftentimes we need to send some data along with our request, such as when searching for data.
Let’s update our request path to use Elasticsearch’s Search API for a products index.
This index doesn’t exist yet, but we will create it later.
The Search API requires us to send a JSON object specifying constraints for our query.
We will get back to searching for data later, so I will just use the simplest possible query which matches all documents.
To specify the data, we can use cURL’s -d argument.
curl --cacert config/certs/http_ca.crt -u elastic:password -X GET https://localhost:9200/products/_search -d '{ "query": { "match_all": {} } }'
curl --cacert config/certs/http_ca.crt -u elastic:password -X GET "${ESHOST}/products/_search" -d '{ "query": { "match_all": {} } }'
curl -X GET -H "Content-Type:application/json" "${ESHOST}/products/_search" -d '{ "query": { "match_all": {} } }'
Don’t worry about the JSON object, but here is a formatted version of it anyway.
Notice how I enclosed it within single quotes to avoid having to escape all of the double quotes with backslashes.
**That approach doesn’t work on Windows because it doesn’t like single quotes.
Instead, you need to wrap the argument within double quotes and then escape each double quote within the JSON object.**
You can see an example on your screen, and you can copy it from within the GitHub repository to save some typing.
curl [...] -d "{ \"query\": { \"match_all\": {} } }"
curl "${ESHOST}" | jq .
Anyway, let’s hit Enter and see what we get.
We get an error back saying that the provided Content-Type header is not supported.
When adding data with the -d argument, cURL just assumes that we are mimicking a form submission.
Because Elasticsearch expects to receive JSON, we need to explicitly define which kind of data we are sending.
That’s done by specifying a Content-Type header with a value of application/json.
That can be done with the -H argument as follows.
curl --cacert config/certs/http_ca.crt -u elastic:password -X GET -H "Content-Type:application/json" https://localhost:9200/products/_search -d '{ "query": { "match_all": {} } }'
curl -X GET "${ESHOST}/products/_search" -H "Content-Type:application/json" -d '{ "query": { "match_all": {} } }'That should fix the error, so let’s send the request again.
curl -X GET "${ESHOST}/products/_search" -H "Content-Type:application/json" -d '{ "query": { "match_all": {} } }'
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [products]","index":"products","resource.id":"products","resource.type":"index_or_alias","index_uuid":"_na_"}],"type":"index_not_found_exception","reason":"no such index [products]","index":"products","resource.id":"products","resource.type":"index_or_alias","index_uuid":"_na_"},"status":404}%Indeed the header error went away.
We now get a different error stating that the products index doesn’t exist.
That’s to be expected since we haven’t created it yet, so everything is good.
So that’s how to send requests to Elasticsearch with cURL.
If you encounter any problems, try checking the order of the arguments, as cURL is quite sensitive in that regard.
If you prefer to use other HTTP clients, it should be fairly easy to replicate this in Postman or something like that.
Alright, I’ll see you in the next lecture.
https://www.elastic.co/guide/en/elasticsearch/reference/current/delayed-allocation.html
elasticsearch-slides-udemy/02-Getting_Started/17-Adding_more_nodes_to_the_cluster_(for_development).pdf
DELETE /pages
PUT /products
DELETE /pages
PUT /products { "settings": { "number_of_shards": 1, "number_of_replicas": 1 } }
{ "acknowledged": true, "shards_acknowledged": true, "index": "products" }
POST /products/_doc
{ "name": "Coffee Maker", "price": 64, "in_stock": 10 }
Response:
{ "_index": "products", "_id": "Aodt2ZkBa2Q2SW-AFCPx", "_version": 1, "result": "created", "_shards": { "total": 3, "successful": 3, "failed": 0 }, "_seq_no": 0, "_primary_term": 1 }
POST /
PUT /products/_doc/100
{ "name": "Toaster", "price": 39, "in_stock": 4 }
GET /products/_doc/100
POST /products/_update/100
{
"doc": {
"in_stock": 3
}
}
GET /products/_doc/100
POST /products/_update/100
{
"doc": {
"tags": ["electronics"]
}
}
Managing Documents/scripted-updates.md
GET /products/_doc/100
POST /products/_update/100
{ "script": { "source": "ctx._source.in_stock--" } }
GET /products/_doc/100
GET /products/_doc/100
POST /products/_update/100
{ "script": { "source": "ctx._source.in_stock = 10" } }
GET /products/_doc/100
POST /products/_update/100
{ "script": { "source": "ctx._source.in_stock -= params.quantity", "params": { "quantity": 4 } } }
PUT /products/_doc/123?version=521&version_type=external
{ "name": "Keurig Machine", "price": 49, "in_stock": 10 }
We need to use a script:
POST /products/_update_by_query { "conflicts": "proceed", "script": { "source": "ctx._source.in_stock--" }, "query": { "match_all": {} } }
"bonsai_exception", "reason": "Update by query is for business+ only"
Steps in Update by Query:
- POST /products/_update_by_query
- Take snapshot of the index.
- Snapshot prevents overwriting changes made after the snapshot was taken.
- Query can take some time.
- Each document's primary term and sequence number are used.
- A document is only updated if the values still match those from the snapshot.
- "Optimistic concurrency control" (see the sketch after this list).
- Avoid aborting the query using "conflicts": "proceed".
- Version conflicts will be counted but query will not be aborted.
- Search query is sent to each of the shards to find all matching documents.
- When a match is found, a bulk request is sent to update those documents. The Scroll API is used internally. Each pair of search and bulk requests is sent sequentially (one at a time).
- Should there be an error in a search or bulk request, ES retries up to 10 times. If the affected request is still not successful, the whole query is aborted, and the failures are specified in the results under the failures key. The query is aborted but NOT rolled back: documents that were updated remain updated even though the request failed. The query is not run within a transaction, as it would be in an RDBMS. Since the query can partially succeed or fail, it returns information you can use to deal with that.
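The same optimistic concurrency control can be applied manually to single-document updates via the if_seq_no and if_primary_term parameters. A minimal sketch (the values are hypothetical; take them from a prior GET response):
GET /products/_doc/100
POST /products/_update/100?if_seq_no=5&if_primary_term=1
{ "doc": { "in_stock": 2 } }
# Fails with a version conflict if another write bumped the sequence number in the meantime.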
The create action will fail if the document already exists.
The index action will add the document if it doesn't already exist; if the document exists, it will be replaced.
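A sketch of both actions in a single Bulk API request (IDs and field values are made up; the body is NDJSON, one action line followed by one document line):
POST /products/_bulk
{ "index": { "_id": 200 } }
{ "name": "Espresso Machine", "price": 199, "in_stock": 5 }
{ "create": { "_id": 201 } }
{ "name": "Milk Frother", "price": 149, "in_stock": 14 }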
curl https://bcec8e0e4c:0122727a305d76ffd8ce@dedicated-laurel-1hfqmn7b.us-east-1.bonsaisearch.net | jq .
curl -H "Content-Type: application/x-ndjson" -XPOST http://localhost:9200/products/_bulk --data-binary "@products-bulk.json"
curl -H "Content-Type: application/x-ndjson" -XPOST https://bcec8e0e4c:0122727a305d76ffd8ce@dedicated-laurel-1hfqmn7b.us-east-1.bonsaisearch.net/products/_bulk --data-binary "@products-bulk.json"
GET /products/_search { "query": { "match_all": {} } }
curl -X GET -H "Content-Type:application/json" "${ESHOST}/products/_search" -d '{ "query": { "match_all": {} } }'
elasticsearch-slides-udemy/04-Mapping_and_Analysis/42-Introduction_to_mapping.pdf
GET /_mapping
GET products/_mapping
GET /reviews/_mapping
GET /reviews/_mapping/field/content
Slides: elasticsearch-slides-udemy/04-Mapping_and_Analysis/43-Overview_of_data_types.pdf
Link: https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/field-data-types
nested data type-similar to object, but maintains the relationship between objects.
- Allows querying objects independently.
keyword-used for exact matching of values.
- used for filtering, aggregating and sorting.
For full-text searches, use the text data type instead.
A dedicated inverted index is created for each text field.
An inverted index is a sorted mapping between terms and the documents that contain them.
Other data types use different data structures.
Numeric, dates, and geospatial fields are stored as BKD trees. Dates are stored as long values internally.
A BKD tree is an I/O-efficient dynamic data structure designed for indexing large-scale numeric and multi-dimensional data, used in systems like Elasticsearch and Apache Lucene.
GET products/_mapping
GET /reviews/_mapping
GET /reviews/_mapping/field/content
No pdf? No code?
Dynamic mapping does not require you to define explicit field mappings before indexing documents.
The first time ES encounters a field, it will automatically create a field mapping for it using "sensible" defaults.
Example:
POST /my-index/_doc
{
"tags": ["computer","electronics"],
"in_stock": 4,
"created_at": "2020/01/01 00:00:00"
}
We specified the date as a string because there is no date datatype in JSON.
ES uses "date detection".
For whole numbers, ES always chooses the long data type, since it can't know how large the numbers will be.
ES adds two mappings for "tags", since it doesn't know how you intend to use the tags field.
- text mapping for full text searches
- keyword for exact matches, sorting and aggregations.
- "ignore_above": 256, because it almost never makes sense to use such long values for sorting and aggregations. Reduces unnecessary duplicative use of disk space.
Rules for dynamic mapping:
https://www.udemy.com/course/elasticsearch-complete-guide/learn/lecture/18848584#overview
Elasticsearch Dynamic Mapping
Elasticsearch dynamically maps new fields in incoming documents by default, using predefined rules to infer data types based on the field's content. When a new field is detected and contains a non-null value, Elasticsearch adds it to the mapping using these rules: null values do not create a field, boolean values map to the boolean type, numeric values (float or long) are mapped based on detection, and strings are classified as date, numeric, or text/keyword depending on pattern matching. Specifically, strings that match date patterns are mapped as date fields, those that pass numeric detection are mapped as float or long, and others are mapped as text with a .keyword sub-field.
Dynamic mapping can be controlled using the dynamic parameter in index mappings. Setting it to true enables dynamic field creation, while runtime creates fields that are loaded from _source at query time without being indexed. Setting dynamic to false ignores new fields, and strict rejects documents containing unknown fields. The default behavior is true, enabling dynamic mapping.
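A sketch of strict dynamic mapping on a hypothetical index:
PUT /people
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "first_name": { "type": "text" }
    }
  }
}
# Indexing a document containing any field other than first_name is now rejected.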
Date detection is enabled by default and checks string fields against patterns defined in dynamic_date_formats, which by default includes "strict_date_optional_time" and "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z".
Date detection can be disabled by setting date_detection to false, causing new string fields to be mapped as text instead of date.
Custom date formats can be defined by modifying dynamic_date_formats to support specific patterns, either as an array (where the first matching pattern determines the mapping) or as a string with || to allow multiple formats.
Numeric detection is disabled by default but can be enabled via numeric_detection to automatically map string representations of numbers to float or long types. This is useful when applications or languages output numbers as strings.
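A sketch of both settings on a hypothetical index:
PUT /computers
{
  "mappings": {
    "numeric_detection": true,
    "dynamic_date_formats": ["dd-MM-yyyy"]
  }
}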
For greater control, dynamic templates can be defined using the dynamic_templates parameter in index mappings. These templates allow custom rules based on field name patterns (match, path_match), data types (match_mapping_type), or other conditions, enabling specific mappings for new fields. Templates are processed in order, with the first matching template taking precedence. For example, a template can map all string fields with a "user_" prefix as keyword fields. Dynamic templates can also be used to map fields as runtime fields, which are not indexed but loaded from _source during queries.
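A sketch of a dynamic template that maps all new string fields as keyword instead of the default text + keyword pair (hypothetical index and template names):
PUT /my-index
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keywords": {
          "match_mapping_type": "string",
          "mapping": { "type": "keyword" }
        }
      }
    ]
  }
}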
While dynamic mapping is convenient, explicit mapping is recommended for production environments to ensure precise control over data indexing and to avoid potential type conflicts.
Every field in ES may contain zero or more values.
Elasticsearch dynamically maps fields based on the data type detected in incoming documents. For arrays, the mapping is determined by the first non-null value within the array; if the array contains null values, no field is added until a concrete value is encountered. When a new field is detected, Elasticsearch applies default mapping rules: strings that pass date or numeric detection are mapped as date or numeric types (float or long), while other strings are mapped as text with a .keyword sub-field. Numeric values are mapped as long or float depending on whether they are whole or floating-point numbers, boolean values as boolean, and objects as object type.
Dynamic mapping behavior can be controlled using the dynamic parameter in index mappings, which can be set to true (default, automatically adds new fields), false (ignores new fields), or strict (rejects documents with unknown fields). To customize the mapping of dynamically added fields, dynamic templates can be defined. These templates are processed in order, and the first matching template applies. For example, a dynamic template can be created to map all string fields as keyword fields instead of the default text type.
For arrays, the dynamic mapping rule depends on the first non-null value, and this behavior is consistent across all data types. If a field contains an array of mixed types, the mapping is determined by the first non-null value in the array. Elasticsearch does not support dynamic mapping for all data types, and fields like geo_point or geo_shape must be explicitly mapped. Additionally, dynamic mapping can be disabled at the index level or within nested objects to prevent unintended field creation.
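A quick sketch of the array rule (hypothetical index and field):
POST /my-index/_doc
{ "readings": [null, 21, 25] }
# The readings field is mapped as long, based on the first non-null value.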
elasticsearch-slides-udemy/04-Mapping_and_Analysis/63-Mapping_recommendations.pdf
Use explicit mapping, at least for production clusters.
- Optimized mappings save disk space.
- Set dynamic mapping to "strict", not false.
- You are always in control.
- Setting dynamic mapping to false lets you add fields for which there are no mappings. These fields are ignored in terms of indexing.
- Strict dynamic mapping avoids surprises and unexpected results.
Don't always map strings as both text and keyword.
- Typically only one is needed.
- Each mapping requires disk space.
- Add a text mapping if you want to do full-text searches.
- Add a keyword mapping if you want to do aggregations, sorting or filtering.
Disable coercion.
- Coercion forgives you for not doing the right thing. Try to do the right thing instead.
- Use the correct data types whenever possible.
Use appropriate numeric data types.
- For whole numbers, the integer data type might be enough.
Mapping parameters.
Set doc_values to false for a field if you don't need sorting, aggregations, and scripting.
Set norms to false if you won't use a field for relevance scoring.
Set index to false if you don't need to filter on values.
- You can still do aggregations, such as for time-series data.
The foregoing parameters typically only make sense once you store upwards of one million documents.
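A sketch combining these parameters (hypothetical index and fields; the comments state the assumption behind each parameter):
PUT /server-metrics
{
  "mappings": {
    "properties": {
      "url": { "type": "keyword", "doc_values": false },
      "description": { "type": "text", "norms": false },
      "cpu_pct": { "type": "float", "index": false }
    }
  }
}
# url: never sorted, aggregated, or scripted on
# description: not used for relevance scoring
# cpu_pct: aggregated (e.g. time series) but never filtered on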
Stop words are filtered out during the analysis process. They provide little or no value for relevance scoring.
There is not much need to remove stop words due to improvements in the relevance algorithm.
The same analyzer is used for indexing and searching.
https://www.elastic.co/docs/reference/text-analysis/analyzer-reference
standard analyzer
- Splits by word. Splits text at word boundaries and removes punctuation.
- Done by standard tokenizer.
- Lowercases letters with the lowercase token filter.
- Contains the stop token filter (disabled by default).
simple analyzer
- Split the input text whenever it encounters anything other than a letter.
- Lowercases with the lowercase tokenizer rather than a token filter (unusual; a hack to avoid passing over the input twice).
whitespace analyzer
- Splits text into tokens by whitespace.
- Does NOT lowercase letters.
keyword analyzer
- No-op analyzer that leaves the input intact, outputting it as a single term.
- Used for keyword fields by default.
- Used for exact matching.
pattern analyzer
- Lets you define a regular expression to match token separators.
- The regex should match whatever should split the text into tokens.
- The default pattern matches all non-word characters (\W+).
- Lowercases letters by default, but this can be disabled.
There are also language-specific analyzers.
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis.html
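You can inspect what an analyzer does with the Analyze API (a quick sketch; any text and built-in analyzer name works):
POST /_analyze
{
  "analyzer": "standard",
  "text": "Sample TEXT, to analyze!"
}
# Returns the tokens: sample, text, to, analyze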
GET /products/_search { "query": { "match_all": {} } }
Term queries are used to search structured data for exact values (filtering).
- Term-level queries are not analyzed.
- The search value is used exactly as-is for inverted index lookups.
- Can be used with data types such as keyword, numbers, and dates.
- Never use term-level queries on fields with the text data type; the results will be unpredictable, and there is no explicit error message or failure - you just don't get the results you should.
Term level queries are case-sensitive.
GET /products/_search { "query": { "term": { "is_active": true } } }
GET /products/_search { "query": { "term": { "in_stock": 1 } } }
GET /products/_search { "query": { "term": { "created": "2007/10/14" } } }
GET /products/_search { "query": { "term": { "created": "2007/10/14 12:34:56" } } }
The longer form of the query (an object with a value key) is required if we want to specify parameters for our query.
GET /products/_search
{
"query": {
"term": {
"tags.keyword": {
"value": "Vegetable"
}
}
}
}
GET /products/_search
{
"query": {
"term": {
"tags.keyword": {
"value": "vegetable",
"case_insensitive": true
}
}
}
}
# note "terms" vs "term"
GET /products/_search
{
"query": {
"terms": {
"tags.keyword": ["Soup", "Meat"]
}
}
}
SQL equivalent: WHERE tags IN ("Soup", "Meat") - a document matches if it contains at least one of the terms.
GET /products/_search { "query": { "range": { "in_stock": { "gte": 1, "lte": 5 } } } }
SQL equivalent:
WHERE in_stock >= 1 AND in_stock <= 5
GET /products/_search { "query": { "range": { "in_stock": { "gt": 1, "lt": 5 } } } }
GET /products/_search { "query": { "range": { "created": { "gte": "2007/01/01", "lte": "2020/01/31" } } } }
GET /products/_search { "query": { "range": { "created": { "format": "dd/MM/yyyy", "gte": "01/01/2007", "lte": "31/01/2020" } } } }
Prefix must occur at the beginning of the term.
GET /products/_search
{
"query": {
"prefix": {
"name.keyword": {
"value": "Past"
}
}
}
}
If we search tags instead of name, we get more hits.
GET /products/_search { "query": { "prefix": { "tags.keyword": { "value": "Past" } } } }
GET /products/_search
{
"query": {
"wildcard": {
"tags.keyword": {
"value": "Past?"
}
}
}
}
GET /products/_search
{
"query": {
"wildcard": {
"tags.keyword": {
"value": "Bee*"
}
}
}
}
GET /products/_search
{
"query": {
"regexp": {
"tags.keyword": {
"value": "Bee(f|r)+"
}
}
}
}
ES uses Apache Lucene regex syntax, in which anchor symbols (^, $) are not supported; the pattern must match the entire term.
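Because patterns are implicitly anchored, use .* where the term may continue (a sketch reusing the tags field from above):
GET /products/_search
{
  "query": {
    "regexp": {
      "tags.keyword": {
        "value": "Bee.*"
      }
    }
  }
}
# Matches whole terms beginning with "Bee", such as "Beef" or "Beer".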
All of the above queries can be made case insensitive by adding the case_insensitive parameter, e.g.:
GET /products/_search
{
"query": {
"prefix": {
"name.keyword": {
"value": "Past",
"case_insensitive": true
}
}
}
}
GET /products/_search
{
"query": {
"exists": {
"field": "tags.keyword"
}
}
}
SQL: SELECT * FROM products WHERE tags IS NOT NULL
There is no dedicated query for this, so we do it with the bool query.
GET /products/_search
{
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "tags.keyword"
}
}
]
}
}
}
SQL: SELECT * FROM products WHERE tags IS NULL
GET /products/_search
{
"query": {
"match": {
"name": "pasta"
}
}
}
Full text queries are analyzed (and therefore case insensitive), so the below query yields the same results.
GET /products/_search
{
"query": {
"match": {
"name": "PASTA"
}
}
}
GET /products/_search
{
"query": {
"match": {
"name": "PASTA CHICKEN"
}
}
}
The operator defaults to "or". The query below makes both terms required.
GET /products/_search
{
"query": {
"match": {
"name": {
"query": "pasta chicken",
"operator": "and"
}
}
}
}
https://www.elastic.co/docs/api/
GET /products/_search
{
"query": {
"multi_match": {
"query": "vegetable",
"fields": ["name", "tags"]
}
}
}
GET /products/_search
{
"query": {
"multi_match": {
"query": "vegetable",
"fields": ["name^2", "tags"]
}
}
}
GET /products/_search
{
"query": {
"multi_match": {
"query": "vegetable broth",
"fields": ["name", "description"],
"tie_breaker": 0.3
}
}
}
- Leaf queries search for values and are independent queries.
- term and match queries
- Compound queries wrap other queries to produce a result.
https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-bool-query
Query clauses added within the must occurrence type are required to match.
GET /products/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"tags.keyword": "Alcohol"
}
}
]
}
}
}
SQL: SELECT * FROM products WHERE tags IN ("Alcohol")
Query clauses added within the must_not occurrence type are required to not match.
GET /products/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"tags.keyword": "Alcohol"
}
}
],
"must_not": [
{
"term": {
"tags.keyword": "Wine"
}
}
]
}
}
}
SQL: SELECT * FROM products WHERE tags IN ("Alcohol") AND tags NOT IN ("Wine")
Matching query clauses within the should occurrence type boost a matching document's relevance score.
GET /products/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"tags.keyword": "Alcohol"
}
}
],
"must_not": [
{
"term": {
"tags.keyword": "Wine"
}
}
],
"should": [
{
"term": {
"tags.keyword": "Beer"
}
}
]
}
}
}
An example adding a few more should query clauses:
GET /products/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"tags.keyword": "Alcohol"
}
}
],
"must_not": [
{
"term": {
"tags.keyword": "Wine"
}
}
],
"should": [
{
"term": {
"tags.keyword": "Beer"
}
},
{
"match": {
"name": "beer"
}
},
{
"match": {
"description": "beer"
}
}
]
}
}
}
Since only should query clauses are specified, at least one of them must match.
GET /products/_search
{
"query": {
"bool": {
"should": [
{
"term": {
"tags.keyword": "Beer"
}
},
{
"match": {
"name": "beer"
}
}
]
}
}
}
Since a must query clause is specified, all of the should query clauses are optional.
They are therefore only used to boost the relevance scores of matching documents.
GET /products/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"tags.keyword": "Alcohol"
}
}
],
"should": [
{
"term": {
"tags.keyword": "Beer"
}
},
{
"match": {
"name": "beer"
}
}
]
}
}
}
This behavior can be configured with the minimum_should_match parameter as follows.
GET /products/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"tags.keyword": "Alcohol"
}
}
],
"should": [
{
"term": {
"tags.keyword": "Beer"
}
},
{
"match": {
"name": "beer"
}
}
],
"minimum_should_match": 1
}
}
}
Query clauses defined within the filter occurrence type must match.
This is similar to the must occurrence type. The difference is that
filter query clauses do not affect relevance scores and may be cached.
GET /products/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"tags.keyword": "Alcohol"
}
}
]
}
}
}
SQL: SELECT * FROM products WHERE (tags IN ("Beer") OR name LIKE '%Beer%') AND in_stock <= 100
Variation #1
GET /products/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"in_stock": {
"lte": 100
}
}
}
],
"must": [
{
"bool": {
"should": [
{ "term": { "tags.keyword": "Beer" } },
{ "match": { "name": "Beer" } }
]
}
}
]
}
}
}
Variation #2
GET /products/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"in_stock": {
"lte": 100
}
}
}
],
"should": [
{ "term": { "tags.keyword": "Beer" } },
{ "match": { "name": "Beer" } }
],
"minimum_should_match": 1
}
}
}
SQL: SELECT * FROM products WHERE tags IN ("Beer") AND (name LIKE '%Beer%' OR description LIKE '%Beer%') AND in_stock <= 100
Variation #1
GET /products/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"in_stock": {
"lte": 100
}
}
},
{
"term": {
"tags.keyword": "Beer"
}
}
],
"should": [
{ "match": { "name": "Beer" } },
{ "match": { "description": "Beer" } }
],
"minimum_should_match": 1
}
}
}
Variation #2
GET /products/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"in_stock": {
"lte": 100
}
}
},
{
"term": {
"tags.keyword": "Beer"
}
}
],
"must": [
{
"multi_match": {
"query": "Beer",
"fields": ["name", "description"]
}
}
]
}
}
}
- Filter execution context
- No relevance scores are calculated.
- Match or no match.
- ES doesn't spend resources on calculating relevance scores.
GET /products/_search
{
"size": 20,
"query": {
"match": {
"name": "juice"
}
}
}
GET /products/_search
{
"size": 20,
"query": {
"boosting": {
"positive": {
"match": {
"name": "juice"
}
},
"negative": {
"match": {
"name": "apple"
}
},
"negative_boost": 0.5
}
}
}
GET /products/_search
{
"query": {
"boosting": {
"positive": {
"match_all": {}
},
"negative": {
"match": {
"name": "apple"
}
},
"negative_boost": 0.5
}
}
}
Boost the relevance scores for pasta products.
GET /recipes/_search
{
"query": {
"bool": {
"must": [
{ "match_all": {} }
],
"should": [
{
"term": {
"ingredients.name.keyword": "Pasta"
}
}
]
}
}
}
Reduce the relevance scores for bacon products.
GET /recipes/_search
{
"query": {
"boosting": {
"positive": {
"match_all": {}
},
"negative": {
"term": {
"ingredients.name.keyword": "Bacon"
}
},
"negative_boost": 0.5
}
}
}
GET /recipes/_search
{
"query": {
"boosting": {
"positive": {
"term": {
"ingredients.name.keyword": "Pasta"
}
},
"negative": {
"term": {
"ingredients.name.keyword": "Bacon"
}
},
"negative_boost": 0.5
}
}
}
GET /recipes/_search
{
"query": {
"boosting": {
"positive": {
"bool": {
"must": [
{ "match_all": {} }
],
"should": [
{
"term": {
"ingredients.name.keyword": "Pasta"
}
}
]
}
},
"negative": {
"term": {
"ingredients.name.keyword": "Bacon"
}
},
"negative_boost": 0.5
}
}
}
GET /products/_search
{
"query": {
"dis_max": {
"queries": [
{ "match": { "name": "vegetable" } },
{ "match": { "tags": "vegetable" } }
]
}
}
}
GET /products/_search
{
"query": {
"dis_max": {
"queries": [
{ "match": { "name": "vegetable" } },
{ "match": { "tags": "vegetable" } }
],
"tie_breaker": 0.3
}
}
}
GET /recipes/_search { "query": { "match_all": {} } }
Follow these instructions and specify recipes-bulk.json as the file name.
recipes-bulk.json
curl -H "Content-Type: application/x-ndjson" -XPOST https://bcec8e0e4c:0122727a305d76ffd8ce@dedicated-laurel-1hfqmn7b.us-east-1.bonsaisearch.net/recipes/_bulk --data-binary "@recipes-bulk.json"
curl -X GET -H "Content-Type:application/json" "${ESHOST}/products/_search" -d '{ "query": { "match_all": {} } }'
GET /recipes/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"ingredients.name": "parmesan"
}
},
{
"range": {
"ingredients.amount": {
"gte": 100
}
}
}
]
}
}
}
DELETE /recipes
PUT /recipes
{
"mappings": {
"properties": {
"title": { "type": "text" },
"description": { "type": "text" },
"preparation_time_minutes": { "type": "integer" },
"steps": { "type": "text" },
"created": { "type": "date" },
"ratings": { "type": "float" },
"servings": {
"properties": {
"min": { "type": "integer" },
"max": { "type": "integer" }
}
},
"ingredients": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"amount": { "type": "integer" },
"unit": { "type": "keyword" }
}
}
}
}
}
GET /recipes/_search
{
"query": {
"nested": {
"path": "ingredients",
"query": {
"bool": {
"must": [
{
"match": {
"ingredients.name": "parmesan"
}
},
{
"range": {
"ingredients.amount": {
"gte": 100
}
}
}
]
}
}
}
}
}
The usage for your Bonsai cluster has exceeded the limits for its Sandbox plan. Shard Overage: 12 / 10
PUT /products { "products": { "aliases": {}, "mappings": { "properties": { "created": { "type": "date", "format": "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis", "print_format": "yyyy/MM/dd HH:mm:ss" }, "description": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "doc": { "properties": { "in_stock": { "type": "long" } } }, "in_stock": { "type": "long" }, "is_active": { "type": "boolean" }, "name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "price": { "type": "long" }, "sold": { "type": "long" }, "tags": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } } } }, "settings": { "index": { "replication": { "type": "DOCUMENT" }, "number_of_shards": "1", "auto_expand_replicas": null, "provided_name": "products", "priority": "0", "number_of_replicas": "1" } } } }
PUT /_all/_settings?preserve_existing=true
{ "index.number_of_shards" : "1", "index.number_of_replicas" : "1" }
From lesson 20:
PUT /products { "settings": { "number_of_shards": 1, "number_of_replicas": 1 } }
https://elasticsearch-cheatsheet.jolicode.com/
total_pages = ceil(total_hits/page_size)
from = (page_size * (page_number - 1))
Pagination via from + size is limited to 10,000 results by default.
Queries are stateless.
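A sketch of fetching page 3 with a page size of 10, using the formulas above:
GET /products/_search
{
  "from": 20,
  "size": 10,
  "query": { "match_all": {} }
}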
Elasticsearch organizes aggregations into three categories:
https://www.elastic.co/docs/reference/aggregations/
- Metric aggregations calculate metrics, such as a sum or average, from field values.
- Bucket aggregations group documents into buckets, also called bins, based on field values, ranges, or other criteria.
- Pipeline aggregations take input from other aggregations instead of documents or fields.
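A sketch with one metric and one bucket aggregation on the products index (field names as used throughout these notes):
GET /products/_search
{
  "size": 0,
  "aggs": {
    "avg_price": { "avg": { "field": "price" } },
    "tag_counts": { "terms": { "field": "tags.keyword" } }
  }
}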