Data Interaction Course Materials

This course aims to give you an understanding of back-end development. In it you will learn how to build an HTTP server in node.js and integrate it with a MongoDB database. The course focuses heavily on JavaScript and on how the language works, so that you can solve issues in your code more readily.

You can find working code samples for each chapter, along with some exercises, in the course GitHub repository.

Setting up your Environment

In order to follow along with this course you will want to have the following installed:

  • node.js v16 or higher
  • npm
  • An editor with proper JavaScript support (VSCode, Sublime, Vim, Emacs, …)
  • Git

You can check that you have all necessary command line tools by running the following commands in your terminal:

node --version
npm --version
git --version

MongoDB

Finally you will need to have access to MongoDB. The company MongoDB is pushing really hard for you to use the free-tier of their cloud service Atlas. Unfortunately it requires you to create an account and forces you to go through a lot of settings and options. It’s important that you choose the free tier, otherwise you can just use the defaults. They will also apparently bombard you with marketing emails, so if you decide to create an account I recommend you use a throwaway email account.

If you would rather not go through the hassle of setting up an account you can either 1) install MongoDB locally following the instructions in the link above or 2) use the Docker image if you are comfortable with Docker. Personally I installed it locally.

MongoDB Compass offers a graphical interface to your database and often comes bundled when installing it locally. If you don’t have it installed you can follow these instructions.

Installing and running MongoDB on MacOS

As long as you have Homebrew installed it’s very easy to install both MongoDB and MongoDB Compass on your computer:

brew tap mongodb/brew
brew install mongodb-community@5.0
brew install mongodb-compass

Installing the mongodb-community package should give you two commands:

  • mongosh for interacting with a MongoDB database through the terminal.
  • mongod for starting a MongoDB database process (the final d stands for daemon and means a long-running process).

When installing through Homebrew it seems that running mongod by itself doesn’t work. So instead you need to start MongoDB as a Homebrew service, which means that MongoDB will be running in the background. You do this by running:

brew services start mongodb/brew/mongodb-community

If everything works, you should be able to run mongosh and be taken to a MongoDB prompt.

./assets/mongodb-prompt.png

If this is not working you can list your brew services by running brew services list and check the status of mongodb-community. You can show detailed information about it by running:

brew services info mongodb-community

Installing and running MongoDB on Windows

As long as you have Windows 10 (October 2018 update) or newer it’s very easy to install both MongoDB and MongoDB Compass on your computer using winget:

winget install mongodb.server
winget install mongodb.compass.full

Installing mongodb.server installs MongoDB as a service that automatically starts when you start your computer. You can change this with the Services application (or sc if you prefer using a terminal).

The MongoDB Compass application is easily launched by pressing Start and typing mongodb. To connect Compass to your server, you simply press “Connect” (no connection string required).

Introduction to node.js

In short, node.js is JavaScript for servers: a runtime that lets you run JavaScript outside of the browser. It has helped make JavaScript one of the most prevalent programming languages in the world. How come it got popular so quickly?

  • The same language across the stack (front-end and back-end)
  • A simpler transition to full-stack for front-end developers
  • The asynchronous nature of JavaScript makes it great for easily building high performance HTTP servers

In summary: familiarity and performance

Node.js was created in 2009 by building a system interface around Chrome’s V8 JavaScript engine. That means that node.js runs on the same JavaScript engine as Chrome and other Chromium-based browsers such as Microsoft Edge and Brave. The V8 version that node.js uses dictates which JavaScript features it supports. If you are curious you can check which exact version of V8 your node.js installation is using by running the following command in a terminal:

node -p process.versions.v8

node.js vs the Browser

Moving JavaScript out of the browser and onto the server results in a few important differences:

  • There’s no browser environment, that is you do not have access to the global window and document objects.
  • You instead have the global variable global to refer to the global scope.
  • You have the global variable process for reading environment variables etc.
  • You have access to built-in modules for doing things like reading and writing files and networking etc.
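The differences listed above can be checked from a script. The following is only a minimal sketch: the helper name describeEnvironment is made up for illustration and is not part of node.js.

```javascript
// Report which well-known globals exist in the current runtime.
// `describeEnvironment` is a hypothetical helper, not a node.js API.
function describeEnvironment() {
  return {
    hasWindow: typeof window !== "undefined",
    hasDocument: typeof document !== "undefined",
    hasProcess: typeof process !== "undefined",
    // In node.js, `global` and the standard `globalThis` are the same object.
    globalIsGlobalThis: typeof global !== "undefined" && global === globalThis,
  };
}

console.log(describeEnvironment());
```

Run with node, this should report that window and document are missing while process and global are available. Using typeof for the checks avoids a ReferenceError when a name does not exist.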

Hello Node

We are going to play around with node.js a bit. First create a new directory called hello-node and move into it. Now create a file called index.js and write the following piece of code:

console.log("Hello node! \\(>0<)/");

Now you can run your program with the command node index.js and you should see Hello node! \(>0<)/ printed to your terminal. We have run JavaScript outside of the browser and successfully printed text, hooray!

Using built-in modules

Let’s use the built-in file system module fs to play around with files.

import fs from "fs";

const databases = [
  { name: 'MongoDB', type: 'document' },
  { name: 'PostgreSQL', type: 'relational' },
  { name: 'Neo4j', type: 'graph' },
  { name: 'Redis', type: 'in-memory' },
];

fs.writeFileSync("test.txt", JSON.stringify(databases, null, 2));

const contents = fs.readFileSync("test.txt").toString();

console.log(`File contents: ${contents}`);

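The import syntax above is the ES6 module syntax. For node.js to accept it you need to set "type": "module" in package.json (or give the file the .mjs extension); otherwise node.js treats the file as a CommonJS module, where you load modules with require instead. The difference between the module systems lies not only in cosmetics but also in semantics, with ES6 modules being a lot more restrictive in when and how you can import modules. Given the flexibility of CommonJS modules we might never see a full transition to ES6 modules.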

Writing our own module

Let’s create a new module, random-element.js, with a function that randomly picks an element from a list, and let’s call it from index.js.

export default function randomElement(xs) {
  // Math.random() is in [0, 1), so this index is always within the list
  const randomIndex = Math.floor(Math.random() * xs.length);

  return xs[randomIndex];
}

And in index.js:

import fs from "fs";
import randomElement from './random-element.js';

const databases = [
  { name: 'MongoDB', type: 'document' },
  { name: 'PostgreSQL', type: 'relational' },
  { name: 'Neo4j', type: 'graph' },
  { name: 'Redis', type: 'in-memory' },
];

// ...

const randomDatabase = randomElement(databases);

console.log(`Got database: ${randomDatabase.name}`);
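One way to gain confidence in a random picker is to call it many times and count the results. The sketch below does that with its own stand-in picker, pickRandom, and a made-up tally helper; neither name comes from the course code.

```javascript
// Pick a uniformly random element; `pickRandom` is a stand-in name used
// only in this sketch.
function pickRandom(xs) {
  // Math.random() is in [0, 1), so the index is always in [0, xs.length).
  const index = Math.floor(Math.random() * xs.length);
  return xs[index];
}

// Count how often each element gets picked over many draws.
function tally(xs, draws) {
  const counts = new Map(xs.map((x) => [x, 0]));
  for (let i = 0; i < draws; i++) {
    const picked = pickRandom(xs);
    counts.set(picked, counts.get(picked) + 1);
  }
  return counts;
}

const counts = tally(["MongoDB", "PostgreSQL", "Neo4j", "Redis"], 10000);
console.log(counts); // Roughly 2500 each; exact numbers vary per run
```

If one element never shows up, or an undefined key appears in the map, the index calculation is most likely off.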

Messing around with the global scope

Using modules is not the only way of sharing functionality. You can also manipulate the global scope by adding properties to the global object.

let count = 0;

global.ourGlobalFunction = (source) => {
  count++;
  console.log(`Call count: ${count} (from ${source})`);
};

And in index.js:

import fs from "fs";
import randomElement from './random-element.js';
import './modifying-global-scope.js';

global.ourGlobalFunction(import.meta.url);

// Since the scope is global we can even call it directly as well
ourGlobalFunction(import.meta.url);

// ...

Exercise: Try calling ourGlobalFunction from random-element.js. Try both within the function and outside of it. Is it working? If not, why not?

Finally, please do not modify global in real code. It breaks encapsulation and makes it more difficult to understand what’s going on.

Reading environment variables

Another thing we can do in node.js that we can’t do in the browser is to get information about the current environment, in particular environment variables.

We can access environment variables via the process variable:

console.log('USER:', process.env.USER); // Prints your username
console.log('MY_VARIABLE', process.env.MY_VARIABLE); // Prints undefined
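Since a variable that is not set comes back as undefined, and one that is set always comes back as a string, it is common to convert the value and supply a default. Here is a small sketch of that idea; the readPort helper and the PORT variable name are assumptions made for this example.

```javascript
// Read a numeric setting from an environment object, falling back to a
// default. `readPort` and the PORT variable name are assumptions.
function readPort(env, fallback = 8080) {
  const raw = env.PORT;
  if (raw === undefined || raw === "") return fallback;

  const port = Number(raw);
  // Reject values such as "abc" or "80.5" instead of silently using them.
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT value: ${raw}`);
  }
  return port;
}

console.log(readPort({ PORT: "3000" })); // 3000
console.log(readPort({}));               // 8080
```

In a real program you would pass process.env instead of a hand-written object. Taking the environment as an argument simply makes the function easy to try out.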

Our First API

What is an Application Programming Interface?

  • An API is a set of exposed methods for interacting with a program or package.
  • When you write a JavaScript module and export functions to interact with it you are designing an API.
  • When you are interacting with a third-party package, for example express, you are using its API.
  • Designing an API allows you to create a layer of abstraction which hides implementation details and simplifies using your service or package.
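To make the idea of an abstraction layer concrete, here is a tiny made-up example. The createCounter function is not from any library; it only shows how callers use a few exposed functions while the stored value stays hidden.

```javascript
// A tiny "package" whose API is three functions. Callers never touch
// `count` directly, so the storage could change without breaking them.
// `createCounter` is a made-up example, not from any library.
function createCounter() {
  let count = 0; // implementation detail, hidden from callers

  return {
    increment() {
      count += 1;
      return count;
    },
    reset() {
      count = 0;
    },
    value() {
      return count;
    },
  };
}

const counter = createCounter();
counter.increment();
counter.increment();
console.log(counter.value()); // 2
```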

Often when we say API we actually mean an HTTP API to be specific, that is an API which is used over the internet using HTTP.

Creating our API

Express is by far the most popular NPM package for creating HTTP APIs in node.js and has been around almost as long as node.js itself. Start by creating a new directory called hello-express and initialize it using npm init (also don’t forget to update package.json if you want to use ES6 modules). Now let’s install Express:

npm install express

Now let’s create our first API by creating a new file called index.js in the project root directory and write the following code:

import express from 'express';

const app = express();

app.get('/hello', (req, res) => {
  res.send('Hello there!').end();
});

const PORT = 8080;

app.listen(PORT, () => {
  console.log(`Server running at http://localhost:${PORT}`)
});

There is a lot to unpack here…

  • We begin by creating an instance of an Express app.
  • Then we register a handler on the /hello endpoint which will respond with Hello there!.
  • Lastly we start a server listening on port 8080.

Starting our server

Run your program by executing node index.js. The first thing you will notice is that your program never quits: you see the message Server running at http://localhost:8080 but you don’t get a new prompt. This is because your program is running a server which is meant to serve responses to requests from clients and your program needs to be kept alive and running to be able to do that.

A client is whatever uses, or consumes, the API served by your server and can be anything from a web browser, website, another server or a command-line tool etc. For now, let’s use our browser as the client and access the URL printed out by the program: http://localhost:8080. You should see an error message saying something like Cannot GET /.

./assets/cannot-get-slash.png

This means that we tried to GET something at the endpoint /. We’ll get more into what GET actually means later when we talk about HTTP, but for now let’s try changing the endpoint and go to http://localhost:8080/hello instead. Now you should instead see the expected message Hello there!.

./assets/hello-express-endpoint.png

So what went wrong the first time? There are four pieces of information needed to interact with a server:

  • The protocol the server expects (http)
  • The machine the server is running on (our machine localhost or 127.0.0.1 if we use its IP address). This is also called the host.
  • The port the server is listening on (8080)
  • The endpoint we want to consume (/hello)
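The four pieces can be pulled out of a URL with the built-in URL class, which is available globally in node.js. This is just a quick illustration of the terminology:

```javascript
// The URL class is available globally in node.js and in browsers.
const url = new URL("http://localhost:8080/hello");

console.log(url.protocol); // "http:"
console.log(url.hostname); // "localhost"
console.log(url.port);     // "8080"
console.log(url.pathname); // "/hello"

// Without an explicit path the endpoint defaults to "/".
console.log(new URL("http://localhost:8080").pathname); // "/"
```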

A server only responds on the port it is listening on and only handles requests on endpoints which have been registered on it. When not specifying an endpoint, the browser will pick the default one which is / and since we never registered a handler for that endpoint the request failed. You can think of endpoints as file paths on your own computer.
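The idea of registered endpoints can be sketched with a toy router. This is only an illustration of the concept, with made-up names (createRouter, handle); it is not how Express is implemented.

```javascript
// Toy router: maps "METHOD path" to a handler. This only illustrates the
// idea of registered endpoints; it is not how Express is implemented.
function createRouter() {
  const handlers = new Map();

  return {
    get(path, handler) {
      handlers.set(`GET ${path}`, handler);
    },
    handle(method, path) {
      const handler = handlers.get(`${method} ${path}`);
      // No handler registered: answer like the error page seen earlier.
      if (!handler) return { status: 404, body: `Cannot ${method} ${path}` };
      return { status: 200, body: handler() };
    },
  };
}

const router = createRouter();
router.get("/hello", () => "Hello there!");

console.log(router.handle("GET", "/hello")); // { status: 200, body: 'Hello there!' }
console.log(router.handle("GET", "/"));      // { status: 404, body: 'Cannot GET /' }
```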

Adding another endpoint

// ...

app.get('/another-page', (req, res) => {
  res.send('Another page!').end();
});

// ...

If we add another endpoint and try to access it in the browser: http://localhost:8080/another-page we get the same error message as we did before.

The reason is that the server process is already running and changes made to the code will not be reflected until it is restarted. You can stop the server by selecting the terminal where it is running and pressing Ctrl-c (that means pressing the Ctrl button and the c key at the same time). This will terminate your server and get you back to the terminal prompt.

If you now run node index.js again you will be able to access http://localhost:8080/another-page.

Live-reload and other tooling

A workflow like the above is not only annoying but it can also lead to long troubleshooting sessions trying to figure out why something isn’t working, when in the end you just had to restart the server. Thankfully there is an NPM package which helps us automate this workflow: nodemon. Since we only need it for development we install it as a development dependency:

npm install --save-dev nodemon

Now we add a convenience script called dev in package.json to make it easy to use it:

{
  // ...
  "scripts": {
    "dev": "nodemon index.js",
    "test": "echo \"Error: no test specified\" && exit 1"
  }
  // ...
}

By running npm run dev your server will be started up and nodemon will watch your files for changes and restart the server when necessary.

There is another tool I highly recommend you install and that is prettier. This tool formats your code automatically and you should be able to make your editor run it every time you save. Here is a VSCode plugin and here is one for Emacs.

Back to our endpoint

Let’s make our new endpoint do something more interesting: let’s see what happens if we serve a string which looks like HTML.

// ...

app.get("/another-page", (req, res) => {
  res
    .send(
      `
<html>
<head>
  <style>
  body {
    margin: 32px;
    background: hotpink;
    color: darkgreen;
    font-family: arial;
  }
  </style>
</head>
<body>
  <h1>Our beautiful page</h1>
  <marquee>We're serving a string which is rendered as a web page!</marquee>
</body>
</html>
`
    )
    .end();
});

// ...

And we can see that our browser interprets it as HTML! The secret is that when you pass a string to res.send, Express labels the response as HTML by default (it sets the Content-Type header to text/html), so the browser renders it as a web page.

While it’s pretty cool that we can serve web pages as plain strings, what you usually want to do is to serve HTML files instead. We move our HTML to a file which we can call beautiful-page.html.

<html>
<head>
  <style>
  body {
    margin: 32px;
    background: hotpink;
    color: darkgreen;
    font-family: arial;
  }
  </style>
</head>
<body>
  <h1>Our beautiful page</h1>
  <marquee>We're serving a string which is rendered as a web page!</marquee>
</body>
</html>

And we change our handler to read that file and serve its contents.

import express from "express";
import fs from "fs";

// ...

app.get("/another-page", (req, res) => {
  const contents = fs.readFileSync("beautiful-page.html").toString();

  res.send(contents).end();
});

// ...

The page should load like before but the code looks a lot nicer without the inline HTML.

A website made up of files like this is called a static website. This is how the whole web worked throughout the 90s and the beginning of the 00s until Single Page Applications (SPAs) became a thing. In this course we will assume you will write your website as a SPA (in React), so we won’t be serving static pages. In addition, the above code is highly inefficient and is just for illustrative purposes. First, we are reading the HTML file for every request even though the contents don’t change, which leads to a lot of file system access and impacts performance. Second, we send the page as a single string all at once, which also impacts performance. If you are interested in how to serve static web pages using Express you can have a look at this documentation.
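The first inefficiency, reading the file on every request, can be avoided by loading the contents once and reusing them. The sketch below shows that idea in isolation. The cacheOnce helper is hypothetical, and a counting function stands in for the actual file read so the effect is visible.

```javascript
// Wrap a loader so the expensive work happens once and the result is reused.
// `cacheOnce` is a hypothetical helper written for this sketch.
function cacheOnce(load) {
  let cached;
  let loaded = false;

  return () => {
    if (!loaded) {
      cached = load();
      loaded = true;
    }
    return cached;
  };
}

// Stand-in for fs.readFileSync("beautiful-page.html").toString();
// it counts calls so we can see how often the "file" is read.
let reads = 0;
const getPage = cacheOnce(() => {
  reads += 1;
  return "<h1>Our beautiful page</h1>";
});

getPage();
getPage();
console.log(reads); // 1
```

The trade-off is that edits to the file are not picked up until the server restarts.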

HTTP + API Deep-dive

Intro to MongoDB

MongoDB is a document (NoSQL) database and has a few important characteristics which make it suitable as a first database:

  • Flexible data schemas.
  • Intuitive data models (basically looks like JSON).
  • Simple yet powerful query language.

MongoDB, and document databases in general, are often used in MVPs and prototypes when you are still exploring and have yet to decide on the data models to use. This does not mean however that they are not production-ready: document databases are among the most scalable databases out there and allow for efficient horizontal scaling (this means running multiple connected instances in a database cluster).

While we discuss MongoDB specifically in this section, many of the concepts are applicable to other document databases as well, such as CouchDB and Elasticsearch, though the terminology might be a bit different.

A MongoDB system consists of one or several databases, which each can have one or multiple collections and each collection contains documents. Documents are the central concept of a document database, naturally.

Schemas

The main selling point of MongoDB compared to relational (SQL) databases (MySQL, Postgres, …) is the flexibility. In relational databases you have to define how your data is structured and the relationship between different kinds of data models. The structure of your data is called its schema or sometimes its data model and defines the properties it has and what data types these properties have. Here’s a made-up example of what a schema might look like:

PersonSchema = {
  "id": "string",
  "name": "string",
  "age": "integer",
  "weight": "float",
}

In a relational database a schema like the above ensures for instance that a Person’s name is a string and that its weight is a float. If you tried to store a Person with a string weight the operation would fail. This makes it difficult for bad and ill-structured data to enter the database.

In a document database schemas still exist, but they are just suggestions and are meant to improve performance when querying the data. As you will most likely see when you start to work with MongoDB yourself, it will happily accept a float as the name, or even allow you to insert documents with a completely different set of properties in the same collection.

./assets/mongodb-compass-table-example.png

This flexibility is something to be mindful of and I recommend using MongoDB Compass to explore your data set from time to time to ensure that it looks like you expect it to.
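Because the database will not stop ill-structured documents by default, one option is to check them in your own code before inserting. The following is a rough sketch of that idea using the made-up schema format from the PersonSchema example. The validate function and the typeChecks table are assumptions for illustration, not part of MongoDB. MongoDB can also validate documents itself, which this sketch does not cover.

```javascript
// Made-up schema format mirroring the PersonSchema example above.
const personSchema = {
  id: "string",
  name: "string",
  age: "integer",
  weight: "float",
};

// Map each made-up type name to a check on a JavaScript value.
const typeChecks = {
  string: (v) => typeof v === "string",
  integer: (v) => Number.isInteger(v),
  float: (v) => typeof v === "number" && Number.isFinite(v),
};

// Returns a list of problems; an empty list means the document conforms.
function validate(doc, schema) {
  const problems = [];
  for (const [key, type] of Object.entries(schema)) {
    if (!(key in doc)) problems.push(`missing property: ${key}`);
    else if (!typeChecks[type](doc[key])) problems.push(`${key} is not a ${type}`);
  }
  return problems;
}

console.log(validate({ id: "1", name: "Ada", age: 36, weight: 58.5 }, personSchema)); // []
console.log(validate({ id: "2", name: 3.14, age: 36 }, personSchema));
// [ 'name is not a string', 'missing property: weight' ]
```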

Operations

Operations are ways of interacting with the data in your database, the most general operations being:

  • Create data
  • Read data
  • Update data
  • Delete data

These are often called CRUD operations for short.

The following sections describe the common CRUD operations in MongoDB. The examples assume that you have a connected db database instance available:

// MongoClient is a class, so it has to be created with new
const client = new mongodb.MongoClient('mongodb://localhost:27017');
await client.connect();

const db = client.db('mongodb-intro');

The code assumes that you have the mongodb package in scope and that you are in an async context where you can use await.

Inserting

In MongoDB the act of creating data in a collection is called inserting.

await db.collection('languages').insertOne({
  name: 'JavaScript',
  family: 'C',
  year: 1995
});

You can also insert several documents at once with insertMany:

const languages = [{
    name: 'Haskell',
    family: 'ML',
    year: 1990
  }, {
    name: 'Rust',
    family: 'ML',
    year: 2010,
  }, {
    name: 'Java',
    family: 'C',
    year: 1995,
  }, {
    name: 'Common Lisp',
    family: 'Lisp',
    year: 1984,
  }];

await db.collection('languages').insertMany(languages)

Finding (Filtering or Querying)

The operations for reading data are called find in the API but are often referred to as filtering or querying as well.

const cursor = db.collection("languages").find({});
const results = await cursor.toArray(); // toArray() returns a promise

console.log(results);

The find operation can potentially return a huge number of documents depending on the size of your data set, so it does not return the results directly but a cursor pointing to the results. This allows you to either do further processing or return a subset of the results. You can get all of the matching results by calling its toArray() method as in the example above.

The simplest filter apart from an empty one is to match on properties exactly. In this example we are picking out all of the programming languages related to C in our data set.

const filter = {
  family: 'C' // Matching property exactly
}
const results = await db.collection('languages').find(filter).toArray();

console.log(results);

The findOne operation will return the first document it finds which matches the filter.

const filter = {
  family: 'ML'
}

const result = await db.collection('languages').findOne(filter);

For more advanced filtering we use query operators. You can quickly identify them since they start with a $. Some common ones are $gte (greater-than-or-equal), $lte (less-than-or-equal) and $regex for matching against a regular expression.

const filter = {
  name: { $regex: /Java/ }
}
const results = await db.collection('languages').find(filter).toArray();

console.log(results);

We can also combine multiple operators to express more complex queries; the next example finds all of the languages created in the 90s.

const filter = {
  year: {
    $gte: 1990,
    $lte: 1999
  }
};
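To get a feel for what these filters mean, it can help to emulate them on a plain array. The matches function below is a teaching sketch written for this purpose, not MongoDB’s implementation, and it only covers exact matches plus $gte, $lte and $regex.

```javascript
// In-memory emulation of a few filter forms, to show what they mean.
// This is a teaching sketch, not MongoDB's implementation, and it covers
// only exact matches plus $gte, $lte and $regex.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    const value = doc[field];
    const isOperatorObject =
      typeof condition === "object" && condition !== null && !(condition instanceof RegExp);

    if (!isOperatorObject) return value === condition; // exact match

    // Every operator on the same field must hold at once.
    return Object.entries(condition).every(([operator, operand]) => {
      if (operator === "$gte") return value >= operand;
      if (operator === "$lte") return value <= operand;
      if (operator === "$regex") return operand.test(value);
      throw new Error(`Unsupported operator: ${operator}`);
    });
  });
}

const languages = [
  { name: "JavaScript", family: "C", year: 1995 },
  { name: "Haskell", family: "ML", year: 1990 },
  { name: "Rust", family: "ML", year: 2010 },
  { name: "Common Lisp", family: "Lisp", year: 1984 },
];

const nineties = languages.filter((doc) => matches(doc, { year: { $gte: 1990, $lte: 1999 } }));
console.log(nineties.map((doc) => doc.name)); // [ 'JavaScript', 'Haskell' ]
```

Note how the two operators on year both have to hold, which is what makes the range query work.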

You can sort your results with the cursor’s sort method by passing it an object containing the property you want to sort on and 1 for ascending results (low to high) or -1 for descending (high to low).

const cursor = db.collection('languages').find({});
const results = await cursor.sort({ year: 1 }).toArray();

console.log(results);
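The meaning of 1 and -1 can be illustrated with a plain array sort. The sortBy helper is made up for this sketch and handles a single field only; in MongoDB the sorting happens in the database, not in your code.

```javascript
// Emulate the meaning of a single-field sort specification in memory.
// `sortBy` is a made-up helper; MongoDB sorts on the server instead.
function sortBy(docs, spec) {
  const [[field, direction]] = Object.entries(spec);
  // Copy first so the input array is left untouched.
  return [...docs].sort((a, b) => {
    if (a[field] < b[field]) return -direction;
    if (a[field] > b[field]) return direction;
    return 0;
  });
}

const docs = [
  { name: "Rust", year: 2010 },
  { name: "Common Lisp", year: 1984 },
  { name: "Haskell", year: 1990 },
];

console.log(sortBy(docs, { year: 1 }).map((d) => d.year));  // [ 1984, 1990, 2010 ]
console.log(sortBy(docs, { year: -1 }).map((d) => d.year)); // [ 2010, 1990, 1984 ]
```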

Deleting

Deleting documents is very similar to finding documents: just replace the find or findOne methods with deleteMany or deleteOne. The methods use the same kind of filters.

await db.collection('languages').deleteOne({
  name: 'Java'
});

Updating

Updating can be seen as a combination of a find operation and a write operation. As with the other operations you can call either updateOne to update a single document or updateMany to update multiple documents at the same time. Both methods take two arguments: a filter object to specify which documents will be affected, and an update object defining the modification.

const filter = { name: 'JavaScript'};
const modification = { $set: { year: 2022 } };

await db.collection('languages').updateOne(filter, modification)
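What $set does to a single document can be pictured with a few lines of plain JavaScript. The applySet function is a made-up teaching sketch; real updates happen inside MongoDB and support many more operators.

```javascript
// Emulate what a $set update does to one document. Teaching sketch only;
// real updates happen inside MongoDB and support many more operators.
function applySet(doc, modification) {
  // Fields named in $set are overwritten or added; the rest stay as-is.
  return { ...doc, ...modification.$set };
}

const before = { name: "JavaScript", family: "C", year: 1995 };
const after = applySet(before, { $set: { year: 2022 } });

console.log(after); // { name: 'JavaScript', family: 'C', year: 2022 }
```

Only year changes here; name and family are left untouched because they are not mentioned in $set.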

What is an ObjectId?

JavaScript Deep-Dive

Resources and useful links

General

Express - Async/Await in Express

MongoDB
