Commit
Fixing merge conflict in elasticsearch.md
maggask committed Nov 13, 2014
2 parents bb07e31 + 06bc037 commit b26b027
Showing 3 changed files with 210 additions and 71 deletions.
61 changes: 0 additions & 61 deletions Week10/MongoDB

This file was deleted.

122 changes: 122 additions & 0 deletions Week10/MongoDB.md
@@ -0,0 +1,122 @@
#MongoDB

MongoDB is a NoSQL database that uses a document-oriented data model. Instead of the tables and rows of a relational
database, all data is stored in documents and collections, which makes MongoDB schema-free.
Data is stored in BSON format, a binary-encoded serialization of JSON-like documents.
Documents are added to collections, which are similar to tables in a relational database.

MongoDB is a fast and scalable database. It is a good fit for many things, but it is not recommended as a database for
applications that store sensitive data.

It is easy to run many instances of MongoDB; when several instances run together, they replicate the data between them.

MongoDB does not support the traditional SQL query language. Instead it offers its own query language, and good
information about it is easy to find on the official MongoDB website.
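For example, where SQL would use `SELECT * FROM grades WHERE user = 'hlysig'`, a roughly equivalent query in MongoDB's own language (shown in the mongo shell, against a hypothetical grades collection) looks like this:
```
db.grades.find({'user': 'hlysig'})
```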

MongoDB is a database server that has to be installed before it can be used.
##Setup
Ubuntu:
> sudo apt-get install mongodb
Mac OSX using brew:
> brew install mongodb
Mac OSX alternative:
> curl -O http://downloads.mongodb.org/osx/mongodb-osx-x86_64-2.6.4.tgz
> tar -zxvf mongodb-osx-x86_64-2.6.4.tgz
> mkdir -p mongodb
> cp -R -n mongodb-osx-x86_64-2.6.4/ mongodb
The server is started from the command line with the command:

> sudo service mongod start
This command starts up the mongo daemon.

MongoDB needs a data folder, and it has to be created with sudo rights. By default it stores its data in /var/lib and logs in /var/log/mongodb, but you can also create your own like this:

> sudo mkdir -p /data/db
And then behind closed doors let’s give everyone access rights to this folder:

> chmod -R 777 /data/db
#Mongo

Mongo is a console-based client that can be used to query data in MongoDB. There are also other tools available online
with a more visual interface, such as [Robomongo](http://robomongo.org/)

This command lists all databases
> show dbs
This command switches to (or creates) the database mydb
> use mydb
This command shows all collections in mydb.
> show collections
To create a collection, first create a JSON object
> var x = {'user': 'hlysig', 'course': 'Forritun 1', 'grade': '4'}
To insert it into the grades collection (the collection is created on first insert), give the command:
> db.grades.insert(x)
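The same query language covers reads, updates and deletes. A few illustrative shell commands, assuming the document above was inserted:
> db.grades.find({'user': 'hlysig'})
> db.grades.update({'user': 'hlysig'}, {$set: {'grade': '8'}})
> db.grades.remove({'user': 'hlysig'})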
Further information can be found on [mongoDB](http://www.mongodb.org)

#Mongoose

[Mongoose](http://mongoosejs.com/) is an object modeling framework that applications can use to connect to MongoDB.
##Setup
> npm install mongoose
##Getting started
To include mongoose in your project
> var mongoose = require('mongoose');
> mongoose.connect('mongodb://localhost/test');
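Before using any models it is good to confirm that the connection succeeded. A small sketch, using Mongoose's standard connection events:
```
var db = mongoose.connection;
db.on('error', console.error.bind(console, 'connection error:'));
db.once('open', function () {
  // connected, safe to start using models
});
```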
Then to create a new schema you can do something like this:

```
var mySchema = new mongoose.Schema({
  name: String,
  birthday: { type: Date, default: Date.now },
  age: Number
});
```
You can even add methods to the schema, like this. Please note that methods have to be added before the schema is compiled with mongoose.model()
```
mySchema.methods.info = function () {
  var greeting = this.name
    ? "My name is " + this.name
    : "I'm sorry, I don't have a name";
  console.log(greeting);
};
```
Then we compile mySchema into a model by passing it to mongoose.model(modelName, schema)

```
var Person = mongoose.model('Person', mySchema);
```
Now we can create our person
```
var person = new Person({ name: 'Dabs', age: 7 }) // We don't need to set our date, since we have a default
```
And then to save it in MongoDB we call save
```
person.save(function (err, person) {
  if (err) return console.error(err);
  person.info(); // Should say "My name is Dabs"
});
```
To query for all persons we can do
```
Person.find(function (err, persons) {
  if (err) return console.error(err);
  console.log(persons);
});
```
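Queries can also take conditions and be chained into more specific lookups. A sketch, reusing the Person model from above:
```
Person
  .find({ age: { $lt: 10 } }) // only persons younger than 10
  .sort('-age')               // oldest first
  .limit(5)                   // at most five results
  .exec(function (err, persons) {
    if (err) return console.error(err);
    console.log(persons);
  });
```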
For more information about queries see [here](http://mongoosejs.com/docs/queries.html)



98 changes: 88 additions & 10 deletions Week10/elasticsearch.md
@@ -131,11 +131,11 @@ If we `PUT` the document again, using the same id, the document is updated in th

curl -XPUT http://localhost:9200/entries/entry/1 -d '{
"title": "Today I learned to search",
"data": "Lets change the blog content",
"data": "Lets change the blog content",
"created": "2014-12-24T14:24:23"}'
{"_index":"entries","_type":"entry","_id":"1","_version":2,"created":false}%
{"_index":"entries","_type":"entry","_id":"1","_version":2,"created":false}%

% curl http://localhost:9200/entries/entry/1
{"_index":"entries","_type":"entry","_id":"1","_version":2,"found":true,"_source":{
"title": "Today I learned to search",
"data": "Lets change the blog content",
@@ -148,8 +148,8 @@ You can remove a given document from the index using the `DELETE` HTTP method.

% curl -XDELETE http://localhost:9200/entries/entry/1
{"found":true,"_index":"entries","_type":"entry","_id":"1","_version":3}%
% curl http://localhost:9200/entries/entry/1
{"_index":"entries","_type":"entry","_id":"1","found":false}%

If we try to fetch the document after the delete, we can see that it has been
@@ -223,7 +223,7 @@ To perform a query we POST on the _search endpoint with our query in the body of
curl -XPOST http://localhost:9200/entries/_search -d '
{
"query":{
// query goes here!
}
}'

@@ -239,7 +239,7 @@ Let's create a query and search for entries that contain the word "scaffolding"
"query_string": {
"query": "scaffolding",
"default_field" : "data"
}
}
}'

@@ -251,15 +251,15 @@ When executing this query:
quote> "query_string": {
quote> "query": "scaffolding",
quote> "default_field" : "data"
quote> }
quote> }
quote> }'
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.095891505,"hits":[{"_index":"entries","_type":"entry","_id":"2","_score":0.095891505,"_source":{
"title": "Ruby on rails is just a bubble in bathtub",
"data": "Yeah, Ruby on rails is awesome, but scaffolding is not!",
"created": "2013-04-02T12:34:02",
"tags": ["programming", "ruby", "rails"]
}}]}}%

We get back one entry that contains the text scaffolding: the document with id 2.

@@ -294,4 +294,82 @@ We get back one entry where we have the text scaffolding in and that is document
}
}
}
}'

# Advanced key storing
When storing keys, Elasticsearch analyzes string fields by default and stores the resulting tokens; for example, a key that includes dashes is split on the dashes.

If we have a slug, for example some-pretty-long-slug, Elasticsearch will split it into smaller pieces to make lookups faster. In some cases this is not what we want, since we might want to look documents up by the whole slug (which should be unique). To prevent Elasticsearch from splitting the key, we need to create our own mapping.

Let's use Kodemon from PA3 as an example. There we had keys that combined a function name and a file name. These keys should be unique (although that is not 100% guaranteed, since two files could have the same name in different directories), and we want to look each key up as a single unique term in the index.

>If you want to run the following code, make sure Elasticsearch is running with the default settings, or change the code according to your own setup.
>Do note that this will not work if you have data in Elasticsearch that you do not want to lose. To use this method on existing data you would need a data migration, which is not covered in this section.

We start off by creating the index. We can use the following script to do so:
```bash
# Delete the kodemon index if you have one already
curl -XDELETE http://127.0.0.1:9200/kodemon
# Create the kodemon/execution index with one document
curl -XPOST http://127.0.0.1:9200/kodemon/execution -d '{ "key": "script-hehe", "token": "token" }'
# Delete all documents from the execution index
curl -XDELETE http://127.0.0.1:9200/kodemon/execution
```

Now we can be sure that we have the index for which we want to create a custom mapping.
Let's assume the data is in the following format when it gets posted to Elasticsearch.
```json
{
  "execution_time": 0.0019073486328125,
  "timestamp": "2014-11-05T02:12:39.000Z",
  "token": "test-token",
  "key": "second.py-cool_function",
  "_id": "54598797538bb0f6084f0072"
}
```

The next thing we want to do is to make sure that the key property will not be stored as ```second.py``` and ```cool_function``` but as a single piece, ```second.py-cool_function```.

In order to do that we could execute the following script.
```bash
curl -X POST http://127.0.0.1:9200/kodemon/execution/_mapping?pretty=true -d '
{
  "execution": {
    "properties": {
      "key": {
        "type": "multi_field",
        "fields": {
          "original_key": {
            "type": "string",
            "index": "not_analyzed"
          },
          "key": {
            "type": "string",
            "index": "analyzed"
          }
        }
      },
      "token": {
        "type": "multi_field",
        "fields": {
          "original_token": {
            "type": "string",
            "index": "not_analyzed"
          },
          "token": {
            "type": "string",
            "index": "analyzed"
          }
        }
      }
    }
  }
}'
```

The mapping above covers all executions but describes the key and token properties specifically, telling Elasticsearch how to store them; every other property gets its default mapping.

For both the key and token properties, we tell Elasticsearch to store the single field as two properties. The first is the original string, which we mark as not analyzed; this means that if we query the key property through key.original_key, we are asking Elasticsearch to match only documents whose key is exactly as it was when it got saved (e.g. ```second.py-cool_function```).

On the other hand, if we still want the clever lookup, Elasticsearch has also stored the analyzed version of each property. So if we query by key as before, we get the key split on dashes, just like the default mapping behavior. You can read more about mapping [here](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html)
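As a sketch of how the not_analyzed field is used from code, the query below asks for executions whose key matches exactly. It uses Node.js with only the core http module, assumes the index, type and mapping created above, and uses the filtered/term query syntax of Elasticsearch 1.x:
```js
var http = require('http');

// Term filter against the not_analyzed sub-field, so the whole
// key must match exactly, dashes and all.
var body = JSON.stringify({
  query: {
    filtered: {
      filter: {
        term: { 'key.original_key': 'second.py-cool_function' }
      }
    }
  }
});

var req = http.request({
  hostname: '127.0.0.1',
  port: 9200,
  path: '/kodemon/execution/_search',
  method: 'POST'
}, function (res) {
  var data = '';
  res.on('data', function (chunk) { data += chunk; });
  res.on('end', function () { console.log(data); });
});

req.on('error', console.error);
req.end(body);
```
Querying key instead of key.original_key in the same request would go through the analyzed version and match on the individual dash-separated tokens.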
