copyedits and formatting fixes
kaitlinnewson committed Nov 28, 2023
1 parent 17c9bc3 commit 97f14d3
Showing 13 changed files with 58 additions and 63 deletions.
33 changes: 17 additions & 16 deletions episodes/01-introduction.md
@@ -53,7 +53,7 @@ into a field that should contain a number. Understanding the nature of relational
databases, and using SQL, will help you in using databases in programming languages
such as R or Python.

Many web applications (including WordPress and ecommerce sites like Amazon) run on a SQL (relational) database. Understanding SQL is the first step in eventually building custom web applications that can serve data to users.
Many web applications (including WordPress and e-commerce sites like Amazon) run on a SQL (relational) database. Understanding SQL is the first step in eventually building custom web applications that can serve data to users.

## Why are people working in library- and information-related roles well suited to SQL?

@@ -77,7 +77,7 @@ direct way of finding information.

- You can use SQL to query your library database and explore new views that are not necessarily provided via library systems patron facing interfaces.

- SQL can be used to keep an inventory of items, for instance, for a library's makerspace, or it can be used to track licenses for journals.
- SQL can be used to keep an inventory of items, for instance, for a library's makerspace, or it can be used to track licences for journals.

- For projects involving migrating and cleaning data from one system to another, SQL can be a handy tool.

@@ -108,34 +108,35 @@ Let's all open the database we downloaded via the setup in DB Browser for SQLite
You can see the tables in the database by looking at the left hand side of the
screen under Tables.

To see the contents of a table, click on that table and then click on the Browse
Data tab above the table data.
To see the contents of a table, click on "Browse Data" then select the table in the "Table" dropdown in the upper left corner.

If we want to write a query, we click on the Execute SQL tab.
If we want to write a query, we click on the "Execute SQL" tab.

There are two ways to add new data to a table without writing SQL:

1. Enter data into a CSV file and append
2. Click the "Browse Data" tab, then click the "New Record" button.

The steps for adding data from a CSV file are:
To add data from a CSV file:

1. Choose "File" > "Import" > "Table" from CSV file...
2. DB Browser for SQLite will prompt you if you want to add the data to the existing table.
1. Choose "File" > "Import" > "Table from CSV file..."
2. Select a CSV file to import
3. Review the import settings and confirm that the column names and fields are correct
4. Click "OK" to import the data. If the table name matches an existing table and the number of columns match, DB Browser will ask if you want to add the data to the existing table.
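
The same append can also be scripted rather than clicked through. The following is a minimal sketch, assuming Python's standard `csv` and `sqlite3` modules; the CSV text and the two-column `languages` table are made up for illustration and are not the lesson's files:

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in practice this would be a file on disk.
csv_text = "id,Language\n1,English\n2,French\n"

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE languages (id INTEGER, Language TEXT)")

# DictReader uses the header row as keys, so the names must match the CSV header.
reader = csv.DictReader(io.StringIO(csv_text))
rows = [(int(r["id"]), r["Language"]) for r in reader]

# A parameterized INSERT appends the rows to the existing table.
conn.executemany("INSERT INTO languages (id, Language) VALUES (?, ?)", rows)
conn.commit()

imported = conn.execute(
    "SELECT id, Language FROM languages ORDER BY id"
).fetchall()
print(imported)
```

As in the dialog, the import only makes sense when the CSV columns line up with the columns of the target table.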

## Dataset Description

The data we will be using consists of 5 csv files that contain tables of article titles, journals, languages, licenses, and publishers. The information in these tables are from a sample of 51 different journals published during 2015.
The data we will use was created from 5 csv files that contain tables of article titles, journals, languages, licences, and publishers. The information in these tables are from a sample of 51 different journals published during 2015.

**articles**

- Contains individual article Titles and the associated citations and metadata
- Contains individual article titles and the associated citations and metadata.
- (16 fields, 1001 records)
- Field names: `id`, `Title`, `Authors`, `DOI`, `URL`, `Subjects`, `ISSNs`, `Citation`, `LanguageID`, `LicenseID`, `Author_Count`, `First_Author`, `Citation_Count`, `Day`, `Month`, `Year`
- Field names: `id`, `Title`, `Authors`, `DOI`, `URL`, `Subjects`, `ISSNs`, `Citation`, `LanguageID`, `LicenceID`, `Author_Count`, `First_Author`, `Citation_Count`, `Day`, `Month`, `Year`

**journals**

- Contains various journal Titles and associated metadata. The table also associates Journal Titles with ISSN numbers that are then referenced in the 'articles' table by the `ISSNs` field.
- Contains various journal titles and associated metadata. The table also associates Journal Titles with ISSN numbers that are then referenced in the 'articles' table by the `ISSNs` field.
- (5 fields, 51 records)
- Field names: `id`, `ISSN-L`,`ISSNs`, `PublisherID`, `Journal_Title`

@@ -145,9 +146,9 @@ The data we will be using consists of 5 csv files that contain tables of article
- (2 fields, 4 records)
- Field names: `id`, `Language`

**licenses**
**licences**

- ID table which associates License codes with id numbers. These id numbers are then referenced in the 'articles' table by the `LicenseID` field.
- ID table which associates Licence codes with id numbers. These id numbers are then referenced in the 'articles' table by the `LicenceID` field.
- (2 fields, 4 records)
- Field names: `id`, `Licence`

@@ -163,14 +164,14 @@ The main data types that are used in doaj-article-sample database are `INTEGER`

## SQL Data Type Quick Reference

Different database software/platforms have different names and sometimes different definitions of data types, so you'll need to understand the data types for any platform you are using. The following table explains some of the common data types and how they are represented in SQLite; [more details available on the SQLite website](https://www.sqlite.org/datatype3.html).
Different database software/platforms have different names and sometimes different definitions of data types, so you'll need to understand the data types for any platform you are using. The following table explains some of the common data types and how they are represented in SQLite; [more details available on the SQLite website](https://www.sqlite.org/datatype3.html).

| Data type | Details | Name in SQLite |
| :--------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :-------------------------------------------------------------------------------------------------------------------- |
| boolean or binary | this variable type is often used to represent variables that can only have two values: yes or no, true or false. | doesn't exist - need to use integer data type and values of 0 or 1. |
| integer | sometimes called whole numbers or counting numbers. Can be 1,2,3, etc., as well as 0 and negative whole numbers: -1,-2,-3, etc. | INTEGER |
| float, real, or double | a decimal number or a floating point value. The largest possible size of the number may be specified. | REAL |
| text or string | and combination of numbers, letters, symbols. Platforms may have different data types: one for variables with a set number of characters - e.g., a zip code or postal code, and one for variables with an open number of characters, e.g., an address or description variable. | TEXT |
| text or string | any combination of numbers, letters, symbols. Platforms may have different data types: one for variables with a set number of characters - e.g., a zip code or postal code, and one for variables with an open number of characters, e.g., an address or description variable. | TEXT |
| date or datetime | depending on the platform, may represent the date and time or the number of days since a specified date. This field often has a specified format, e.g., YYYY-MM-DD | doesn't exist - need to use built-in date and time functions and store dates in real, integer, or text formats. See [Section 2.2 of SQLite documentation](https://www.sqlite.org/datatype3.html#date_and_time_datatype) for more details. |
| blob | a Binary Large OBject can store a large amount of data, documents, audio or video files. | BLOB |
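
To see how SQLite classifies values in practice, here is a small sketch assuming Python's standard `sqlite3` module and an empty in-memory database. The literal values are arbitrary examples, not data from the lesson:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# typeof() reports the storage class SQLite uses for each value.
# The last expression is a comparison: SQLite has no boolean type,
# so the result is stored as the integer 1.
storage_classes = conn.execute(
    "SELECT typeof(42), typeof(3.14), typeof('2056-9890'), "
    "typeof(x'00ff'), typeof(1 = 1)"
).fetchone()
print(storage_classes)

# Dates are kept as text here and manipulated with the built-in date() function.
next_day = conn.execute("SELECT date('2015-01-31', '+1 day')").fetchone()[0]
print(next_day)
```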

6 changes: 3 additions & 3 deletions episodes/02-selecting-sorting-data.md
@@ -22,11 +22,11 @@ exercises: 5

## What is a query?

A query is a question or request for data. For example, "How many journals does our library subscribe to?" When we query a database, we can ask the same question using a common language called Structured Query Language or SQL in what is called a statement. Some of the most useful queries - the ones we are introducing in this first section - are used to return results from a table that match specific criteria.
A query is a question or request for data. For example, "How many journals does our library subscribe to?". When we query a database, we can ask the same question using Structured Query Language (SQL) in what is called a statement. Some of the most useful queries - the ones we are introducing in this first section - are used to return results from a table that match specific criteria.
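
As a rough illustration, the question about journal subscriptions could be phrased as a `COUNT` statement. This sketch assumes Python's standard `sqlite3` module and a made-up three-row `journals` table, not the lesson's database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journals (id INTEGER, Journal_Title TEXT)")

# Hypothetical rows standing in for a library's subscriptions.
conn.executemany(
    "INSERT INTO journals VALUES (?, ?)",
    [(1, "Journal A"), (2, "Journal B"), (3, "Journal C")],
)

# "How many journals does our library subscribe to?" as a SQL statement.
journal_count = conn.execute("SELECT COUNT(*) FROM journals;").fetchone()[0]
print(journal_count)
```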

## Writing my first query

Let's start by opening DB Browser for SQLite and the doaj-article-sample database (see Setup). Choose `Browse Data` and the `articles` table. The articles table contains columns or fields such as `Title`, `Authors`, `DOI`, `URL`, etc.
Let's start by opening DB Browser for SQLite and the doaj-article-sample database (see [Setup](/)). Click "Browse Data" and select the `articles` table in the "Table" dropdown menu. The articles table contains columns or fields such as `Title`, `Authors`, `DOI`, `URL`, etc.

Let's write a SQL query that selects only the `Title` column from the `articles` table.

@@ -60,7 +60,7 @@ SELECT Title, Authors, ISSNs, Year, DOI
FROM articles;
```
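
The effect of naming columns can be checked outside DB Browser as well. A minimal sketch, assuming Python's standard `sqlite3` module and a made-up one-row `articles` table with only four of the lesson's fields:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER, Title TEXT, Authors TEXT, Year INTEGER)"
)
# Hypothetical record, not taken from the lesson's dataset.
conn.execute("INSERT INTO articles VALUES (1, 'Sample title', 'A. Author', 2015)")

# Only the columns named after SELECT come back, in the order listed.
cur = conn.execute("SELECT Title, Year FROM articles;")
selected_columns = [col[0] for col in cur.description]
selected_rows = cur.fetchall()
print(selected_columns, selected_rows)
```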

Or we can select all of the columns in a table using the wildcard `*`.
Or we can select all of the columns in a table using the wildcard `*`:

```sql
SELECT *
4 changes: 2 additions & 2 deletions episodes/03-filtering.md
@@ -20,7 +20,7 @@ exercises: 10

## Filtering

SQL is a powerful tool for filtering data in databases based on a set of conditions. Let's say we only want data for a specific ISSN, for instance, for the *Acta Crystallographica* journal from the `articles` table. The journal has an ISSN code `2056-9890`. To filter by this ISSN code, we will use the `WHERE` clause.
SQL is a powerful tool for filtering data in databases based on a set of conditions. Let's say we only want data for a specific ISSN, for instance, for the *Acta Crystallographica* journal from the `articles` table. The journal has an ISSN code `2056-9890`. To filter by this ISSN code, we will use the `WHERE` clause.

```sql
SELECT *
@@ -44,7 +44,7 @@ ISSNs codes "2076-0787" and "2077-1444", we can combine the tests using OR:
```sql
SELECT *
FROM articles
WHERE (issns = '2076-0787') OR (issns = '2077-1444');
WHERE (ISSNs = '2076-0787') OR (ISSNs = '2077-1444');
```

When you do not know the entire value you are searching for, you can use comparison keywords such as `LIKE`, `IN`, `BETWEEN...AND`, `IS NULL`. For instance, we can use `LIKE` in combination with `WHERE` to search for data that matches a pattern.
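
The four comparison keywords can be tried side by side. This is a sketch only, assuming Python's standard `sqlite3` module; the three rows, their titles, and the `titles` helper are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (Title TEXT, ISSNs TEXT, Citation_Count INTEGER)"
)
# Hypothetical rows; None is stored as NULL.
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        ("Crystal structure of a salt", "2056-9890", 4),
        ("Religion and society", "2077-1444", None),
        ("Humanities today", "2076-0787", 12),
    ],
)


def titles(condition):
    """Return sorted titles matching a WHERE condition.

    The conditions below are fixed literals; values supplied by a user
    should be passed with ? placeholders instead of string formatting.
    """
    sql = f"SELECT Title FROM articles WHERE {condition} ORDER BY Title"
    return [row[0] for row in conn.execute(sql)]


like_match = titles("Title LIKE '%structure%'")          # pattern match
in_match = titles("ISSNs IN ('2076-0787', '2077-1444')")  # list of values
between_match = titles("Citation_Count BETWEEN 1 AND 5")  # inclusive range
null_match = titles("Citation_Count IS NULL")             # missing value
print(like_match, in_match, between_match, null_match)
```

Note that the row with a missing `Citation_Count` is matched only by `IS NULL`; a range test such as `BETWEEN` never matches `NULL`.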
4 changes: 2 additions & 2 deletions episodes/04-ordering-commenting.md
@@ -68,7 +68,7 @@ WHERE (ISSNs IN ('2076-0787', '2077-1444', '2067-2764|2247-6202'));
```

We started with something simple, then added more clauses one by one, testing
their effects as we went along. For complex queries, this is a good strategy, to make sure you are getting what you want. Sometimes it might help to take a subset of the data that you can easily see in a temporary database to practice your queries on before working on a larger or more complicated database.
their effects as we went along. For complex queries, this is a good strategy, to make sure you are getting what you want. Sometimes it might help to take a subset of the data that you can easily see in a temporary database to practice your queries on before working on a larger or more complicated database.

When the queries become more complex, it can be useful to add comments to express to yourself, or to others, what you are doing with your query. Comments help explain the logic of a section and provide context for anyone reading the query. It's essentially a way of making notes within your SQL. In SQL, comments begin using <code class="language-plaintext highlighter-rouge">\--</code> and end at the end of the line. To mark a whole paragraph as a comment, you can enclose it with the characters /\* and \*/. For example, a commented version of the above query can be written as:
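
Both comment styles can be checked in a small, self-contained sketch. It assumes Python's standard `sqlite3` module and a made-up two-row `articles` table rather than the lesson's database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (Title TEXT, ISSNs TEXT)")
# Hypothetical rows for illustration.
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("First", "2076-0787"), ("Second", "2056-9890")],
)

# The comments are part of the SQL text; SQLite skips them when it runs the query.
commented_query = """
/* Block comment: find the articles from one journal,
   identified by its ISSN. */
SELECT Title            -- line comment: return only the title column
FROM articles
WHERE ISSNs = '2076-0787';
"""

commented_result = conn.execute(commented_query).fetchall()
print(commented_result)
```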

@@ -93,11 +93,11 @@ ON publishers.id = journals.PublisherId;
```

To see the introduction and explanation of JOINS, please click to [Episode 6](06-joins-aliases.md).
{: .sql}

:::::::::::::::::::::::::::::::::::::::: keypoints

- Queries often have the structure: SELECT data FROM table WHERE certain criteria are present.
- Comments can make our queries easier to read and understand.

::::::::::::::::::::::::::::::::::::::::::::::::::

6 changes: 3 additions & 3 deletions episodes/05-aggregating-calculating.md
@@ -1,5 +1,5 @@
---
title: Aggregating & calculating values
title: Aggregating and calculating values
teaching: 15
exercises: 5
---
@@ -72,7 +72,7 @@ For example, we can adapt the last request we wrote to only return information a
SELECT ISSNs, COUNT(*)
FROM articles
GROUP BY ISSNs
HAVING count(Title) >= 10;
HAVING COUNT(Title) >= 10;
```

The `HAVING` keyword works exactly like the `WHERE` keyword, but uses aggregate functions instead of database fields. When you want to filter based on an aggregation like `MAX, MIN, AVG, COUNT, SUM`, use `HAVING`; to filter based on the individual values in a database field, use `WHERE`.
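
The difference between the two keywords shows up clearly on a tiny table. A minimal sketch, assuming Python's standard `sqlite3` module; the four rows and their ISSN values are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (Title TEXT, ISSNs TEXT, Year INTEGER)")
# Hypothetical rows: three articles in one journal, one in another.
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        ("A", "1111-1111", 2015),
        ("B", "1111-1111", 2015),
        ("C", "1111-1111", 2014),
        ("D", "2222-2222", 2015),
    ],
)

# WHERE tests each row before grouping: article C (2014) is never counted.
where_counts = conn.execute(
    "SELECT ISSNs, COUNT(*) FROM articles "
    "WHERE Year = 2015 GROUP BY ISSNs ORDER BY ISSNs"
).fetchall()

# HAVING tests each group after aggregation: the one-article journal is dropped.
having_counts = conn.execute(
    "SELECT ISSNs, COUNT(*) FROM articles "
    "GROUP BY ISSNs HAVING COUNT(Title) >= 2 ORDER BY ISSNs"
).fetchall()

print(where_counts)
print(having_counts)
```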
@@ -108,7 +108,7 @@ In SQL, we can also perform calculations as we query the database. Also known as
```sql
SELECT Title, ISSNs, Author_Count -1 as CoAuthor_Count
FROM articles
ORDER BY Author_Count -1 DESC;
ORDER BY CoAuthor_Count DESC;
```

In section [6\. Joins and aliases](06-joins-aliases.md) we are going to learn more about the SQL keyword `AS` and how to make use of aliases - in this example we simply used the calculation and `AS` to represent that the new column is different from the original SQL table data.
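
The calculated column and its alias can be inspected in a short sketch. It assumes Python's standard `sqlite3` module and three made-up rows, and shows that the alias both names the result column and can be used in `ORDER BY`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (Title TEXT, Author_Count INTEGER)")
# Hypothetical rows for illustration.
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("Solo paper", 1), ("Team paper", 4), ("Pair paper", 2)],
)

# The calculation runs per row; AS gives the resulting column a name.
cur = conn.execute(
    "SELECT Title, Author_Count - 1 AS CoAuthor_Count "
    "FROM articles ORDER BY CoAuthor_Count DESC;"
)
alias_columns = [col[0] for col in cur.description]
coauthors = cur.fetchall()
print(alias_columns)
print(coauthors)
```

The stored `Author_Count` values are unchanged; the subtraction exists only in the query result.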
2 changes: 1 addition & 1 deletion episodes/06-joins-aliases.md
@@ -83,7 +83,7 @@ Write a query that `JOINS` the `articles` and `journals` tables and that returns
## Solution

```sql
SELECT journals.Journal_Title, count(*), avg(articles.Citation_Count)
SELECT journals.Journal_Title, COUNT(*), AVG(articles.Citation_Count)
FROM articles
JOIN journals
ON articles.ISSNs = journals.ISSNs