Commit
Fixing merge conflict in elasticsearch.md
maggask committed Nov 13, 2014
2 parents bb07e31 + 06bc037 commit b26b027
Showing 3 changed files with 210 additions and 71 deletions.
61 changes: 0 additions & 61 deletions Week10/MongoDB

This file was deleted.

122 changes: 122 additions & 0 deletions Week10/MongoDB.md
@@ -0,0 +1,122 @@
#MongoDB

MongoDB is a NoSQL database that uses a document-oriented data model. Instead of the tables and rows of a relational
database, all data is stored in documents and collections, which makes MongoDB schema-free.
Data is stored in BSON format, a binary-encoded serialization of JSON-like documents.
Documents are added to collections, which are similar to tables in a relational database.

MongoDB is a fast and scalable database. It is a good fit for many things, but it is not recommended as a database for
applications that store sensitive data.

It is easy to run many instances of MongoDB; when several instances run together, they replicate the data between them.

MongoDB does not support the traditional SQL query language. Instead it offers its own query language, and good
information about it is easy to find on the official MongoDB website.
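For example, where SQL would use `SELECT * FROM grades WHERE user = 'hlysig'`, a roughly equivalent query in MongoDB's own language (shown in the mongo shell, against a hypothetical grades collection) looks like this:
```
db.grades.find({'user': 'hlysig'})
```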

MongoDB is a database server that has to be installed before it can be used.
##Setup
Ubuntu:
> sudo apt-get install mongodb
Mac OSX using brew:
> brew install mongodb
Mac OSX alternative:
> curl -O http://downloads.mongodb.org/osx/mongodb-osx-x86_64-2.6.4.tgz
> tar -zxvf mongodb-osx-x86_64-2.6.4.tgz
> mkdir -p mongodb
> cp -R -n mongodb-osx-x86_64-2.6.4/ mongodb
The server is started from the command line with the command:

> sudo service mongod start
This command starts up the mongo daemon.

MongoDB needs a data folder, and it has to be created with sudo rights. By default it stores its data in /var/lib and logs in /var/log/mongodb, but you can also create your own like this:

> sudo mkdir -p /data/db
And then behind closed doors let’s give everyone access rights to this folder:

> chmod -R 777 /data/db
#Mongo

Mongo is a console-based client that can be used to query data in MongoDB. There are also other tools available online
with a more visual interface, such as [Robomongo](http://robomongo.org/)

This command lists all databases
> show dbs
This command switches to (or creates) the database mydb
> use mydb
This command shows all collections in mydb.
> show collections
To create a collection, first create a JSON object
> var x = {'user': 'hlysig', 'course': 'Forritun 1', 'grade': '4'}
To insert it into the grades collection (the collection is created on first insert), give the command:
> db.grades.insert(x)
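The same query language covers reads, updates and deletes. A few illustrative shell commands, assuming the document above was inserted:
> db.grades.find({'user': 'hlysig'})
> db.grades.update({'user': 'hlysig'}, {$set: {'grade': '8'}})
> db.grades.remove({'user': 'hlysig'})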
Further information can be found on [mongoDB](http://www.mongodb.org)

#Mongoose

[Mongoose](http://mongoosejs.com/) is an object modeling framework that applications can use to connect to MongoDB.
##Setup
> npm install mongoose
##Getting started
To include mongoose in your project
> var mongoose = require('mongoose');
> mongoose.connect('mongodb://localhost/test');
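Before using any models it is good to confirm that the connection succeeded. A small sketch, using Mongoose's standard connection events:
```
var db = mongoose.connection;
db.on('error', console.error.bind(console, 'connection error:'));
db.once('open', function () {
  // connected, safe to start using models
});
```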
Then to create a new schema you can do something like this:

```
var mySchema = new mongoose.Schema({
  name: String,
  birthday: { type: Date, default: Date.now },
  age: Number
});
```
You can even add methods to the schema, like this. Please note that methods have to be added before the schema is compiled with mongoose.model()
```
mySchema.methods.info = function () {
  var greeting = this.name
    ? "My name is " + this.name
    : "I'm sorry, I don't have a name";
  console.log(greeting);
};
```
Then we compile mySchema into a model by passing it to mongoose.model(modelName, schema)

```
var Person = mongoose.model('Person', mySchema);
```
Now we can create our person
```
var person = new Person({ name: 'Dabs', age: 7 }) // We don't need to set our date, since we have a default
```
And then to save it in MongoDB we call save
```
person.save(function (err, person) {
  if (err) return console.error(err);
  person.info(); // Should say "My name is Dabs"
});
```
To query for all persons we can do
```
Person.find(function (err, persons) {
  if (err) return console.error(err);
  console.log(persons);
});
```
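Queries can also take conditions and be chained into more specific lookups. A sketch, reusing the Person model from above:
```
Person
  .find({ age: { $lt: 10 } }) // only persons younger than 10
  .sort('-age')               // oldest first
  .limit(5)                   // at most five results
  .exec(function (err, persons) {
    if (err) return console.error(err);
    console.log(persons);
  });
```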
For more information about queries see [here](http://mongoosejs.com/docs/queries.html)



98 changes: 88 additions & 10 deletions Week10/elasticsearch.md
@@ -131,11 +131,11 @@ If we `PUT` the document again, using the same id, the document is updated in th

curl -XPUT http://localhost:9200/entries/entry/1 -d '{
"title": "Today I learned to search",
"data": "Lets change the blog content",
"data": "Lets change the blog content",
"created": "2014-12-24T14:24:23"}'
{"_index":"entries","_type":"entry","_id":"1","_version":2,"created":false}%
{"_index":"entries","_type":"entry","_id":"1","_version":2,"created":false}%

% curl http://localhost:9200/entries/entry/1
{"_index":"entries","_type":"entry","_id":"1","_version":2,"found":true,"_source":{
"title": "Today I learned to search",
"data": "Lets change the blog content",
@@ -148,8 +148,8 @@ You can remove a given document from the index using the `DELETE` HTTP method.

% curl -XDELETE http://localhost:9200/entries/entry/1
{"found":true,"_index":"entries","_type":"entry","_id":"1","_version":3}%
% curl http://localhost:9200/entries/entry/1
{"_index":"entries","_type":"entry","_id":"1","found":false}%

If we try to fetch the document after the delete, we can see that it has been
@@ -223,7 +223,7 @@ To perform a query we POST on the _search endpoint with our query in the body of
curl -XPOST http://localhost:9200/entries/_search -d '
{
"query":{
// query goes here!
}
}'

@@ -239,7 +239,7 @@ Let's create a query and search for entries that contain the word "scaffolding"
"query_string": {
"query": "scaffolding",
"default_field" : "data"
}
}
}'

@@ -251,15 +251,15 @@ When executing this query:
quote> "query_string": {
quote> "query": "scaffolding",
quote> "default_field" : "data"
quote> }
quote> }
quote> }'
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.095891505,"hits":[{"_index":"entries","_type":"entry","_id":"2","_score":0.095891505,"_source":{
"title": "Ruby on rails is just a bubble in bathtub",
"data": "Yeah, Ruby on rails is awesome, but scaffolding is not!",
"created": "2013-04-02T12:34:02",
"tags": ["programming", "ruby", "rails"]
}}]}}%

We get back one entry that contains the text scaffolding: the document with id 2.

@@ -294,4 +294,82 @@ We get back one entry where we have the text scaffolding in and that is document
}
}
}
}'

# Advanced key storing
When storing keys, Elasticsearch analyzes string fields by default and stores the resulting tokens; for example, a key that includes dashes is split on the dashes.

If we have a slug, for example some-pretty-long-slug, Elasticsearch will split it into smaller pieces to make lookups faster. In some cases this is not what we want, since we might want to look documents up by the whole slug (which should be unique). To prevent Elasticsearch from splitting the key, we need to create our own mapping.

Let's use Kodemon from PA3 as an example. There we had keys that combined a function name and a file name. These keys should be unique (although that is not 100% guaranteed, since two files could have the same name in different directories), and we want to look each key up as a single unique term in the index.

>If you want to run the following code, make sure Elasticsearch is running with the default settings, or change the code according to your own setup.
>Do note that this will not work if you have data in Elasticsearch that you do not want to lose. To use this method on existing data you would need a data migration, which is not covered in this section.

We start off by creating the index. We can use the following script to do so:
```bash
# Delete the kodemon index if you have one already
curl -XDELETE http://127.0.0.1:9200/kodemon
# Create the kodemon/execution index with one document
curl -XPOST http://127.0.0.1:9200/kodemon/execution -d '{ "key": "script-hehe", "token": "token" }'
# Delete all documents from the execution index
curl -XDELETE http://127.0.0.1:9200/kodemon/execution
```

Now we can be sure that we have the index for which we want to create a custom mapping.
Let's assume the data is in the following format when it gets posted to Elasticsearch.
```json
{
  "execution_time": 0.0019073486328125,
  "timestamp": "2014-11-05T02:12:39.000Z",
  "token": "test-token",
  "key": "second.py-cool_function",
  "_id": "54598797538bb0f6084f0072"
}
```

The next thing we want to do is to make sure that the key property will not be stored as ```second.py``` and ```cool_function``` but as a single piece, ```second.py-cool_function```.

In order to do that we could execute the following script.
```bash
curl -X POST http://127.0.0.1:9200/kodemon/execution/_mapping?pretty=true -d '
{
  "execution": {
    "properties": {
      "key": {
        "type": "multi_field",
        "fields": {
          "original_key": {
            "type": "string",
            "index": "not_analyzed"
          },
          "key": {
            "type": "string",
            "index": "analyzed"
          }
        }
      },
      "token": {
        "type": "multi_field",
        "fields": {
          "original_token": {
            "type": "string",
            "index": "not_analyzed"
          },
          "token": {
            "type": "string",
            "index": "analyzed"
          }
        }
      }
    }
  }
}'
```

The mapping above covers all executions but describes the key and token properties specifically, telling Elasticsearch how to store them; every other property gets its default mapping.

For both the key and token properties, we tell Elasticsearch to store the single field as two properties. The first is the original string, which we mark as not analyzed; this means that if we query the key property through key.original_key, we are asking Elasticsearch to match only documents whose key is exactly as it was when it got saved (e.g. ```second.py-cool_function```).

On the other hand, if we still want the clever lookup, Elasticsearch has also stored the analyzed version of each property. So if we query by key as before, we get the key split on dashes, just like the default mapping behavior. You can read more about mapping [here](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html)
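As a sketch of how the not_analyzed field is used from code, the query below asks for executions whose key matches exactly. It uses Node.js with only the core http module, assumes the index, type and mapping created above, and uses the filtered/term query syntax of Elasticsearch 1.x:
```js
var http = require('http');

// Term filter against the not_analyzed sub-field, so the whole
// key must match exactly, dashes and all.
var body = JSON.stringify({
  query: {
    filtered: {
      filter: {
        term: { 'key.original_key': 'second.py-cool_function' }
      }
    }
  }
});

var req = http.request({
  hostname: '127.0.0.1',
  port: 9200,
  path: '/kodemon/execution/_search',
  method: 'POST'
}, function (res) {
  var data = '';
  res.on('data', function (chunk) { data += chunk; });
  res.on('end', function () { console.log(data); });
});

req.on('error', console.error);
req.end(body);
```
Querying key instead of key.original_key in the same request would go through the analyzed version and match on the individual dash-separated tokens.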
