Skip to content

Latest commit

 

History

History
56 lines (45 loc) · 3.06 KB

00_Intro.asciidoc

File metadata and controls

56 lines (45 loc) · 3.06 KB

Data in, data out

Whatever program we write, the intention is the same: to organize data in a way that serves our purposes. But data doesn’t consist just of random bits and bytes. We build relationships between data elements in order to represent entities or ``things'' that exist in the real world. A name and an email address have more meaning if we know that they belong to the same person.

In the real world, though, not all entities of the same type look the same. One person might have a home telephone number, while another person only has a cellphone number, and another might have both. One person might have three email addresses, while another has none. A Spanish person will probably have two last names while an English person will probably have only one.

One of the reasons that object-oriented programming languages are so popular is that objects help us to represent and manipulate real world entities with potentially complex data structures. So far so good.

The problem comes when we need to store these entities. Traditionally, we have stored our data in columns and rows in a relational database, the equivalent of using a spreadsheet. All the flexibility gained from using objects is lost due to the inflexibility of our storage medium.

But what if we could store our objects as objects? Instead of modeling our application around the limitations of spreadsheets, we can instead focus on actually using the data. The flexibility of objects is returned to us.

An object is a language-specific, in-memory data structure. To send it across the network, or store it, we need to be able to represent it in some standard format. JSON (JavaScript Object Notation) is a way of representing objects in human-readable text. It has become the de facto standard for exchanging data in the NoSQL world. When an object has been serialized into JSON it is known as a JSON document.

Elasticsearch is a distributed document store. It can store and retrieve complex data structures — serialized as JSON documents — in real-time. In other words, as soon as a document has been stored in Elasticsearch, it can be retrieved from any node in the cluster.

Of course, we don’t only need to store data, we must also query it, en masse and at speed. While a number of NoSQL solutions exist which allow us to store objects as documents, they still require us to think about how we want to query our data, and which fields require a index in order to make data retrieval fast.

In Elasticsearch, all data in every field is indexed by default. That is, every field has a dedicated inverted index for fast retrieval. And, unlike most other databases, it can use all of those inverted indices in the same query, to return results at breathtaking speed.

In this chapter we will discuss the APIs that we use to create, retrieve, update and delete documents. For the moment, we don’t care about the data inside our documents or how to query it. All we care about is how to store our documents safely in Elasticsearch, and how to get them back again.