-
Notifications
You must be signed in to change notification settings - Fork 1.8k
/
Copy path15-database.php
1258 lines (1216 loc) · 54.7 KB
/
15-database.php
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<?php if ( file_exists("../booktop.php") ) {
require_once "../booktop.php";
ob_start();
}?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>-</title>
<style>
html {
color: #1a1a1a;
background-color: #fdfdfd;
}
body {
margin: 0 auto;
max-width: 36em;
padding-left: 50px;
padding-right: 50px;
padding-top: 50px;
padding-bottom: 50px;
hyphens: auto;
overflow-wrap: break-word;
text-rendering: optimizeLegibility;
font-kerning: normal;
}
@media (max-width: 600px) {
body {
font-size: 0.9em;
padding: 12px;
}
h1 {
font-size: 1.8em;
}
}
@media print {
html {
background-color: white;
}
body {
background-color: transparent;
color: black;
font-size: 12pt;
}
p, h2, h3 {
orphans: 3;
widows: 3;
}
h2, h3, h4 {
page-break-after: avoid;
}
}
p {
margin: 1em 0;
}
a {
color: #1a1a1a;
}
a:visited {
color: #1a1a1a;
}
img {
max-width: 100%;
}
h1, h2, h3, h4, h5, h6 {
margin-top: 1.4em;
}
h5, h6 {
font-size: 1em;
font-style: italic;
}
h6 {
font-weight: normal;
}
ol, ul {
padding-left: 1.7em;
margin-top: 1em;
}
li > ol, li > ul {
margin-top: 0;
}
blockquote {
margin: 1em 0 1em 1.7em;
padding-left: 1em;
border-left: 2px solid #e6e6e6;
color: #606060;
}
code {
font-family: Menlo, Monaco, Consolas, 'Lucida Console', monospace;
font-size: 85%;
margin: 0;
hyphens: manual;
}
pre {
margin: 1em 0;
overflow: auto;
}
pre code {
padding: 0;
overflow: visible;
overflow-wrap: normal;
}
.sourceCode {
background-color: transparent;
overflow: visible;
}
hr {
background-color: #1a1a1a;
border: none;
height: 1px;
margin: 1em 0;
}
table {
margin: 1em 0;
border-collapse: collapse;
width: 100%;
overflow-x: auto;
display: block;
font-variant-numeric: lining-nums tabular-nums;
}
table caption {
margin-bottom: 0.75em;
}
tbody {
margin-top: 0.5em;
border-top: 1px solid #1a1a1a;
border-bottom: 1px solid #1a1a1a;
}
th {
border-top: 1px solid #1a1a1a;
padding: 0.25em 0.5em 0.25em 0.5em;
}
td {
padding: 0.125em 0.5em 0.25em 0.5em;
}
header {
margin-bottom: 4em;
text-align: center;
}
#TOC li {
list-style: none;
}
#TOC ul {
padding-left: 1.3em;
}
#TOC > ul {
padding-left: 0;
}
#TOC a:not(:hover) {
text-decoration: none;
}
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
div.columns{display: flex; gap: min(4vw, 1.5em);}
div.column{flex: auto; overflow-x: auto;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
/* The extra [class] is a hack that increases specificity enough to
override a similar rule in reveal.js */
ul.task-list[class]{list-style: none;}
ul.task-list li input[type="checkbox"] {
font-size: inherit;
width: 0.8em;
margin: 0 0.8em 0.2em -1.6em;
vertical-align: middle;
}
.display.math{display: block; text-align: center; margin: 0.5rem auto;}
</style>
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<h1 id="using-databases-and-sql">Using Databases and SQL</h1>
<h2 id="what-is-a-database">What is a database?</h2>
<p></p>
<p>A <em>database</em> is a file that is organized for storing data.
Most databases are organized like a dictionary in the sense that they
map from keys to values. The biggest difference is that the database is
on disk (or other permanent storage), so it persists after the program
ends. Because a database is stored on permanent storage, it can store
far more data than a dictionary, which is limited to the size of the
memory in the computer.</p>
<p></p>
<p>Like a dictionary, database software is designed to keep the
inserting and accessing of data very fast, even for large amounts of
data. Database software maintains its performance by building
<em>indexes</em> as data is added to the database to allow the computer
to jump quickly to a particular entry.</p>
<p>There are many different database systems which are used for a wide
variety of purposes including: Oracle, MySQL, Microsoft SQL Server,
PostgreSQL, and SQLite. We focus on SQLite in this book because it is a
very common database and is already built into Python. SQLite is
designed to be <em>embedded</em> into other applications to provide
database support within the application. For example, the Firefox
browser also uses the SQLite database internally as do many other
products.</p>
<p><a href="http://sqlite.org/" class="uri">http://sqlite.org/</a></p>
<p>SQLite is well suited to some of the data manipulation problems that
we see in Informatics.</p>
<h2 id="database-concepts">Database concepts</h2>
<p>When you first look at a database it looks like a spreadsheet with
multiple sheets. The primary data structures in a database are:
<em>tables</em>, <em>rows</em>, and <em>columns</em>.</p>
<figure>
<img src="../images/relational.svg" alt="Relational Databases" style="height: 2.0in;"/>
<figcaption>
Relational Databases
</figcaption>
</figure>
<p>In technical descriptions of relational databases the concepts of
table, row, and column are more formally referred to as
<em>relation</em>, <em>tuple</em>, and <em>attribute</em>, respectively.
We will use the less formal terms in this chapter.</p>
<h2 id="database-browser-for-sqlite">Database Browser for SQLite</h2>
<p>While this chapter will focus on using Python to work with data in
SQLite database files, many operations can be done more conveniently
using software called the <em>Database Browser for SQLite</em> which is
freely available from:</p>
<p><a href="http://sqlitebrowser.org/"
class="uri">http://sqlitebrowser.org/</a></p>
<p>Using the browser you can easily create tables, insert data, edit
data, or run simple SQL queries on the data in the database.</p>
<p>In a sense, the database browser is similar to a text editor when
working with text files. When you want to do one or very few operations
on a text file, you can just open it in a text editor and make the
changes you want. When you have many changes that you need to do to a
text file, often you will write a simple Python program. You will find
the same pattern when working with databases. You will do simple
operations in the database manager and more complex operations will be
most conveniently done in Python.</p>
<h2 id="creating-a-database-table">Creating a database table</h2>
<p>Databases require more defined structure than Python lists or
dictionaries<a href="#fn1" class="footnote-ref" id="fnref1"
role="doc-noteref"><sup>1</sup></a>.</p>
<p>When we create a database <em>table</em> we must tell the database in
advance the names of each of the <em>columns</em> in the table and the
type of data which we are planning to store in each <em>column</em>.
When the database software knows the type of data in each column, it can
choose the most efficient way to store and look up the data based on the
type of data.</p>
<p>You can look at the various data types supported by SQLite at the
following url:</p>
<p><a href="http://www.sqlite.org/datatypes.html"
class="uri">http://www.sqlite.org/datatypes.html</a></p>
<p>Defining structure for your data up front may seem inconvenient at
the beginning, but the payoff is fast access to your data even when the
database contains a large amount of data.</p>
<p>The code to create a database file and a table named
<code>Track</code> with two columns in the database is as follows:</p>
<p> </p>
<pre class="python"><code>import sqlite3
conn = sqlite3.connect('music.sqlite')
cur = conn.cursor()
cur.execute('DROP TABLE IF EXISTS Track')
cur.execute('CREATE TABLE Track (title TEXT, plays INTEGER)')
conn.close()
# Code: https://www.py4e.com/code3/db1.py</code></pre>
<p> </p>
<p>The <code>connect</code> operation makes a “connection” to the
database stored in the file <code>music.sqlite</code> in the current
directory. If the file does not exist, it will be created. The reason
this is called a “connection” is that sometimes the database is stored
on a separate “database server” from the server on which we are running
our application. In our simple examples the database will just be a
local file in the same directory as the Python code we are running.</p>
<p>A <em>cursor</em> is like a file handle that we can use to perform
operations on the data stored in the database. Calling
<code>cursor()</code> is very similar conceptually to calling
<code>open()</code> when dealing with text files.</p>
<figure>
<img src="../images/cursor.svg" alt="A Database Cursor" style="height: 2.0in;"/>
<figcaption>
A Database Cursor
</figcaption>
</figure>
<p>Once we have the cursor, we can begin to execute commands on the
contents of the database using the <code>execute()</code> method.</p>
<p>Database commands are expressed in a special language that has been
standardized across many different database vendors to allow us to learn
a single database language. The database language is called
<em>Structured Query Language</em> or <em>SQL</em> for short.</p>
<p><a href="http://en.wikipedia.org/wiki/SQL"
class="uri">http://en.wikipedia.org/wiki/SQL</a></p>
<p>In our example, we are executing two SQL commands in our database. As
a convention, we will show the SQL keywords in uppercase and the parts
of the command that we are adding (such as the table and column names)
will be shown in lowercase.</p>
<p>The first SQL command removes the <code>Track</code> table from the
database if it exists. This pattern is simply to allow us to run the
same program to create the <code>Track</code> table over and over again
without causing an error. Note that the <code>DROP TABLE</code> command
deletes the table and all of its contents from the database (i.e., there
is no “undo”).</p>
<pre class="python"><code>cur.execute('DROP TABLE IF EXISTS Track ')</code></pre>
<p>The second command creates a table named <code>Track</code> with a
text column named <code>title</code> and an integer column named
<code>plays</code>.</p>
<pre class="python"><code>cur.execute('CREATE TABLE Track (title TEXT, plays INTEGER)')</code></pre>
<p>Now that we have created a table named <code>Track</code>, we can put
some data into that table using the SQL <code>INSERT</code> operation.
Again, we begin by making a connection to the database and obtaining the
<code>cursor</code>. We can then execute SQL commands using the
cursor.</p>
<p>The SQL <code>INSERT</code> command indicates which table we are
using and then defines a new row by listing the fields we want to
include <code>(title, plays)</code> followed by the <code>VALUES</code>
we want placed in the new row. We specify the values as question marks
<code>(?, ?)</code> to indicate that the actual values are passed in as
a tuple <code>( 'My Way', 15 )</code> as the second parameter to the
<code>execute()</code> call.</p>
<pre class="python"><code>import sqlite3
conn = sqlite3.connect('music.sqlite')
cur = conn.cursor()
cur.execute('INSERT INTO Track (title, plays) VALUES (?, ?)',
('Thunderstruck', 20))
cur.execute('INSERT INTO Track (title, plays) VALUES (?, ?)',
('My Way', 15))
conn.commit()
print('Track:')
cur.execute('SELECT title, plays FROM Track')
for row in cur:
print(row)
cur.execute('DELETE FROM Track WHERE plays < 100')
conn.commit()
cur.close()
# Code: https://www.py4e.com/code3/db2.py</code></pre>
<p>First we <code>INSERT</code> two rows into our table and use
<code>commit()</code> to force the data to be written to the database
file.</p>
<figure>
<img src="../images/tracks.svg" alt="Rows in a Table" style="height: 1.5in;"/>
<figcaption>
Rows in a Table
</figcaption>
</figure>
<p>Then we use the <code>SELECT</code> command to retrieve the rows we
just inserted from the table. On the <code>SELECT</code> command, we
indicate which columns we would like <code>(title, plays)</code> and
indicate which table we want to retrieve the data from. After we execute
the <code>SELECT</code> statement, the cursor is something we can loop
through in a <code>for</code> statement. For efficiency, the cursor does
not read all of the data from the database when we execute the
<code>SELECT</code> statement. Instead, the data is read on demand as we
loop through the rows in the <code>for</code> statement.</p>
<p>The output of the program is as follows:</p>
<pre><code>Track:
('Thunderstruck', 20)
('My Way', 15)</code></pre>
<p></p>
<p>Our <code>for</code> loop finds two rows, and each row is a Python
tuple with the first value as the <code>title</code> and the second
value as the number of <code>plays</code>.</p>
<p>At the very end of the program, we execute an SQL command to
<code>DELETE</code> the rows we have just created so we can run the
program over and over. The <code>DELETE</code> command shows the use of
a <code>WHERE</code> clause that allows us to express a selection
criterion so that we can ask the database to apply the command to only
the rows that match the criterion. In this example the criterion happens
to apply to all the rows so we empty the table out so we can run the
program repeatedly. After the <code>DELETE</code> is performed, we also
call <code>commit()</code> to force the data to be removed from the
database.</p>
<h2 id="structured-query-language-summary">Structured Query Language
summary</h2>
<p> So far, we have been using the Structured Query Language in our
Python examples and have covered many of the basics of the SQL commands.
In this section, we look at the SQL language in particular and give an
overview of SQL syntax.</p>
<p>Since there are so many different database vendors, the Structured
Query Language (SQL) was standardized so we could communicate in a
portable manner to database systems from multiple vendors.</p>
<p>A relational database is made up of tables, rows, and columns. The
columns generally have a type such as text, numeric, or date data. When
we create a table, we indicate the names and types of the columns:</p>
<pre class="sql"><code>CREATE TABLE Track (title TEXT, plays INTEGER)</code></pre>
<p>To insert a row into a table, we use the SQL <code>INSERT</code>
command:</p>
<pre class="sql"><code>INSERT INTO Track (title, plays) VALUES ('My Way', 15)</code></pre>
<p>The <code>INSERT</code> statement specifies the table name, then a
list of the fields/columns that you would like to set in the new row,
and then the keyword <code>VALUES</code> and a list of corresponding
values for each of the fields.</p>
<p>The SQL <code>SELECT</code> command is used to retrieve rows and
columns from a database. The <code>SELECT</code> statement lets you
specify which columns you would like to retrieve as well as a
<code>WHERE</code> clause to select which rows you would like to see. It
also allows an optional <code>ORDER BY</code> clause to control the
sorting of the returned rows.</p>
<pre class="sql"><code>SELECT * FROM Track WHERE title = 'My Way'</code></pre>
<p>Using <code>*</code> indicates that you want the database to return
all of the columns for each row that matches the <code>WHERE</code>
clause.</p>
<p>Note, unlike in Python, in a SQL <code>WHERE</code> clause we use a
single equal sign to indicate a test for equality rather than a double
equal sign. Other logical operations allowed in a <code>WHERE</code>
clause include <code><</code>, <code>></code>, <code><=</code>,
<code>>=</code>, <code>!=</code>, as well as <code>AND</code> and
<code>OR</code> and parentheses to build your logical expressions.</p>
<p>You can request that the returned rows be sorted by one of the fields
as follows:</p>
<pre class="sql"><code>SELECT title,plays FROM Track ORDER BY title</code></pre>
<p>It is possible to <code>UPDATE</code> a column or columns within one
or more rows in a table using the SQL <code>UPDATE</code> statement as
follows:</p>
<pre class="sql"><code>UPDATE Track SET plays = 16 WHERE title = 'My Way'</code></pre>
<p>The <code>UPDATE</code> statement specifies a table and then a list
of fields and values to change after the <code>SET</code> keyword and
then an optional <code>WHERE</code> clause to select the rows that are
to be updated. A single <code>UPDATE</code> statement will change all of
the rows that match the <code>WHERE</code> clause. If a
<code>WHERE</code> clause is not specified, it performs the
<code>UPDATE</code> on all of the rows in the table.</p>
<p>To remove a row, you need a <code>WHERE</code> clause on an SQL
<code>DELETE</code> statement. The <code>WHERE</code> clause determines
which rows are to be deleted:</p>
<pre class="sql"><code>DELETE FROM Track WHERE title = 'My Way'</code></pre>
<p> These four basic SQL commands (INSERT, SELECT, UPDATE, and DELETE)
allow the four basic operations needed to create and maintain data. We
use “CRUD” (Create, Read, Update, and Delete) to capture all these
concepts in a single term.<a href="#fn2" class="footnote-ref"
id="fnref2" role="doc-noteref"><sup>2</sup></a></p>
<h2 id="multiple-tables-and-basic-data-modeling">Multiple tables and
basic data modeling</h2>
<p> The real power of a relational database is when we create multiple
tables and make links between those tables. The act of deciding how to
break up your application data into multiple tables and establishing the
relationships between the tables is called <em>data modeling</em>. The
design document that shows the tables and their relationships is called
a <em>data model</em>.</p>
<p>Data modeling is a relatively sophisticated skill and we will only
introduce the most basic concepts of relational data modeling in this
section. For more detail on data modeling you can start with:</p>
<p><a href="http://en.wikipedia.org/wiki/Relational_model"
class="uri">http://en.wikipedia.org/wiki/Relational_model</a></p>
<p> Lets say for our tracks database we wanted to track the name of the
<code>artist</code> for each track in addition to the <code>title</code>
and number of plays for each track. A simple approach might be to simply
add another column to the database called <code>artist</code> and put
the name of the artist in the column as follows:</p>
<pre class="python"><code>DROP TABLE IF EXISTS Track;
CREATE TABLE Track (title TEXT, plays INTEGER, artist TEXT);</code></pre>
<p>Then we could insert a few tracks into our table.</p>
<pre class="sql"><code>INSERT INTO Track (title, plays, artist)
VALUES ('My Way', 15, 'Frank Sinatra');
INSERT INTO Track (title, plays, artist)
VALUES ('New York', 25, 'Frank Sinatra');</code></pre>
<p>If we were to look at our data with a
<code>SELECT * FROM Track</code> statement, it looks like we have done a
fine job.</p>
<pre><code>sqlite> SELECT * FROM Track;
My Way|15|Frank Sinatra
New York|25|Frank Sinatra
sqlite></code></pre>
<p>We have made a <em>very bad error</em> in our data modeling. We have
violated the rules of <em>database normalization</em>.</p>
<p><a href="https://en.wikipedia.org/wiki/Database_normalization"
class="uri">https://en.wikipedia.org/wiki/Database_normalization</a></p>
<p> While database normalization seems very complex on the surface and
contains a lot of mathematical justifications, for now we can reduce it
all into one simple rule that we will follow.</p>
<p> We should never put the same string data in a column more than once.
If we need the data more than once, we create a numeric <em>key</em> for
the data and reference the actual data using this key. Especially if the
multiple entries refer to the same object.</p>
<p>To demonstrate the slippery slope we are going down by assigning
string columns to out database model, think about how we would change
the data model if we wanted to keep track of the eye color of our
artists? Would we do this?</p>
<pre class="sql"><code>DROP TABLE IF EXISTS Track;
CREATE TABLE Track (title TEXT, plays INTEGER,
artist TEXT, eyes TEXT);
INSERT INTO Track (title, plays, artist, eyes)
VALUES ('My Way', 15, 'Frank Sinatra', 'Blue');
INSERT INTO Track (title, plays, artist, eyes)
VALUES ('New York', 25, 'Frank Sinatra', 'Blue');</code></pre>
<p>Since Frank Sinatra recorded over 1200 songs, are we really going to
put the string ‘Blue’ in 1200 rows in our <code>Track</code> table. And
what would happen if we decided his eye color was ‘Light Blue’?
Something just does not feel right.</p>
<p>The correct solution is to create a table for the each
<code>Artist</code> and store all the data about the artist in that
table. And then somehow we need to make a connection between a row in
the <code>Track</code> table to a row in the <code>Artist</code> table.
Perhaps we could call this “link” between two “tables” a “relationship”
between two tables. And that is exactly what database experts decided to
all these links.</p>
<p>Lets make an <code>Artist</code> table as follows:</p>
<pre class="sql"><code>DROP TABLE IF EXISTS Artist;
CREATE TABLE Artist (name TEXT, eyes TEXT);
INSERT INTO Artist (name, eyes)
VALUES ('Frank Sinatra', 'blue');</code></pre>
<p> Now we have two tables but we need a way to <em>link</em> rows in
the two tables. To do this, we need why we call ‘keys’. These keys will
just be integer numbers that we can use to lookup a row in different
table. If we are going to make links to rows inside of a table, we need
to add a <em>primary key</em> to the rows in the table. By convention we
usually name the primary key column ‘id’. So our <code>Artist</code>
table looks as follows:</p>
<pre class="sql"><code>DROP TABLE IF EXISTS Artist;
CREATE TABLE Artist (id INTEGER, name TEXT, eyes TEXT);
INSERT INTO Artist (id, name, eyes)
VALUES (42, 'Frank Sinatra', 'blue');</code></pre>
<p>Now we have a row in the table for ‘Frank Sinatra’ (and his eye
color) and a primary key of ‘42’ to use to link our tracks to him. So we
alter our Track table as follows:</p>
<pre class="sql"><code>DROP TABLE IF EXISTS Track;
CREATE TABLE Track (title TEXT, plays INTEGER,
artist_id INTEGER);
INSERT INTO Track (title, plays, artist_id)
VALUES ('My Way', 15, 42);
INSERT INTO Track (title, plays, artist_id)
VALUES ('New York', 25, 42);</code></pre>
<p> The <code>artist_id</code> column is an integer, and by naming
convention is a <em>foreign key</em> pointing at a <em>primary</em> key
in the <code>Artist</code> table. We call it a foreign key because it is
pointing to a row in a different table.</p>
<p> Now we are following the rules of database normalization, but when
we want to get data out of our database, we don’t want to see the 42, we
want to see the name and eye color of the artist. To do this we use the
<code>JOIN</code> keyword in our SELECT statement.</p>
<pre class="sql"><code>SELECT title, plays, name, eyes
FROM Track JOIN Artist
ON Track.artist_id = Artist.id;</code></pre>
<p>The <code>JOIN</code> clause includes an <code>ON</code> condition
that defines how the rows are to to be connected. For each row in
<code>Track</code> add the data from <code>Artist</code> from the row
where <code>artist_id</code> <code>Track</code> table matches the
<code>id</code> from the <code>Artist</code> table.</p>
<p>The output would be:</p>
<pre><code>My Way|15|Frank Sinatra|blue
New York|25|Frank Sinatra|blue</code></pre>
<p>While it might seem a little clunky and your instincts might tell you
that it would be faster just to keep the data in one table, it turns out
the the limit on database performance is how much data needs to be
scanned when retrieving a query. While they details are very complex,
integers are a lot smaller than strings (especially Unicode) and far
quicker to to move and compare.</p>
<h2 id="data-model-diagrams">Data model diagrams</h2>
<p> While our <code>Track</code> and <code>Artist</code> database design
is simple with just two tables and a single one-to-many relationship,
these data models can get complicated quickly and are easier to
understand if we can make a graphical representation of our data
model.</p>
<figure>
<img src="../images/one-to-many-verbose.png" alt="A Verbose One-to-Many Data Model\label{figvrbo2m}" style="height: 1.5in;"/>
<figcaption>
A Verbose One-to-Many Data Model
</figcaption>
</figure>
<p> While there are many graphical representations of data models, we
will use one of the “classic” appraches, called “Crow’s Foot Diagrams”
as shown in Figure . Each table is shown as a box with the name of the
table and its columns. Then where there is a relationship between two
tables a line is drawn connecting the tables with a notation added to
the end of each line indicating the nature of the relationship.</p>
<p><a href="https://en.wikipedia.org/wiki/Entity-relationship_model"
class="uri">https://en.wikipedia.org/wiki/Entity-relationship_model</a></p>
<p>In this case, “many” tracks can be associated with each artist. So
the track end is shown with the crow’s foot spread out indicating it is
the” “many” end. The artist end is shown with a vertical like that
indicates “one”. There will be “many” artists in general, but the
important aspect is that for each artist there will be many tracks. And
each of those artists may be associated with multiple tracks.</p>
<p> You will note that the column that holds the <em>foreign_key</em>
like <code>artist_id</code> is on the “many” end and the <em>primary
key</em> is at the “one” end.</p>
<p>Since the pattern of foreign and primary key placement is so
consistent and follows the “many” and “one” ends of the lines, we never
include either the primary or foreign key columns in our diagram of the
data model as shown in the second diagram as shown in Figure . The
columns are thought of as “implementation detail” to capture the nature
of the relationship details and not an essential part of the data being
modeled.</p>
<figure>
<img src="../images/one-to-many.png" alt="A Succinct One-to-Many Data Model\label{figo2m}" style="height: 1.5in;"/>
<figcaption>
A Succinct One-to-Many Data Model
</figcaption>
</figure>
<h2 id="automatically-creating-primary-keys">Automatically creating
primary keys</h2>
<p> In the above example, we arbitrarily assigned Frank the primary key
of 42. However when we are inserting millions or rows, it is nice to
have the database automatically generate the values for the id column.
We do this by declaring the <code>id</code> column as a
<code>PRIMARY KEY</code> and leave out the <code>id</code> value when
inserting the row:</p>
<pre class="sql"><code>DROP TABLE IF EXISTS Artist;
CREATE TABLE Artist (id INTEGER PRIMARY KEY,
name TEXT, eyes TEXT);
INSERT INTO Artist (name, eyes)
VALUES ('Frank Sinatra', 'blue');</code></pre>
<p>Now we have instructed the database to auto-assign us a unique value
to the Frank Sinatra row. But we then need a way to have the database
tell us the <code>id</code> value for the recently inserted row. One way
is to use a <code>SELECT</code> statement to retrieve data from an
SQLite built-in-fuction called <code>last_insert_rowid()</code>.</p>
<pre><code>sqlite> DROP TABLE IF EXISTS Artist;
sqlite> CREATE TABLE Artist (id INTEGER PRIMARY KEY,
...> name TEXT, eyes TEXT);
sqlite> INSERT INTO Artist (name, eyes)
...> VALUES ('Frank Sinatra', 'blue');
sqlite> select last_insert_rowid();
1
sqlite> SELECT * FROM Artist;
1|Frank Sinatra|blue
sqlite></code></pre>
<p>Once we know the <code>id</code> of our ‘Frank Sinatra’ row, we can
use it when we <code>INSERT</code> the tracks into the
<code>Track</code> table. As a general strategy, we add these
<code>id</code> columns to any table we create:</p>
<pre><code>sqlite> DROP TABLE IF EXISTS Track;
sqlite> CREATE TABLE Track (id INTEGER PRIMARY KEY,
...> title TEXT, plays INTEGER, artist_id INTEGER);</code></pre>
<p>Note that the <code>artist_id</code> value is the new auto-assigned
row in the <code>Artist</code> table and that while we added an
<code>INTEGER PRIMARY KEY</code> to the the <code>Track</code> table, we
did not include <code>id</code> in the list of fields on the
<code>INSERT</code> statements into the <code>Track</code> table. Again
this tells the database to choose a unique value for us for the
<code>id</code> column.</p>
<pre><code>sqlite> INSERT INTO Track (title, plays, artist_id)
...> VALUES ('My Way', 15, 1);
sqlite> select last_insert_rowid();
1
sqlite> INSERT INTO Track (title, plays, artist_id)
...> VALUES ('New York', 25, 1);
sqlite> select last_insert_rowid();
2
sqlite></code></pre>
<p> You can call <code>SELECT last_insert_rowid();</code> after each of
the inserts to retrieve the value that the database assigned to the
<code>id</code> of each newly created row. Later when we are coding in
Python, we can ask for the <code>id</code> value in our code and store
it in a variable for later use.</p>
<h2 id="logical-keys-for-fast-lookup">Logical keys for fast lookup</h2>
<p> If we had a table full of artists and a table full of tracks, each
with a foreign key link to a row in a table full of artists and we
wanted to list all the tracks that were sung by ‘Frank Sinatra’ as
follows:</p>
<pre><code>SELECT title, plays, name, eyes
FROM Track JOIN Artist
ON Track.artist_id = Artist.id
WHERE Artist.name = 'Frank Sinatra';</code></pre>
<p>Since we have two tables and a foreign key between the two tables,
our data is well-modeled, but if we are going to have millions of
records in the <code>Artist</code> table and going to do a lot of
lookups by artist name, we would benefit if we gave the database a hint
about our intended use of the <code>name</code> column.</p>
<p> We do this by adding an “index” to a text column that we intend to
use in <code>WHERE</code> clauses:</p>
<pre><code>CREATE INDEX artist_name ON Artist(name);</code></pre>
<p>When the database has been told that an index is needed on a column
in a table, it stores extra information to make it possible to look up a
row more quickly using the indexed field (<code>name</code> in this
example). Once you request that an index be created, there is nothing
special that is needed in the SQL to access the table. The database
keeps the index up to date as data is inserted, deleted, and updated,
and uses it automatically if it will increase the performance of a
database query.</p>
<p>These text columns that are used to find rows based on some
information in the “real world” like the name of an artist are called
<em>Logical keys</em>.</p>
<h2 id="adding-constraints-to-the-data-database">Adding constraints to
the data database</h2>
<p> We can also use an index to enforce a constraint (i.e. rules) on our
database operations. The most common constraint is a <em>uniqueness
constraint</em> which insists that all of the values in a column are
unique. We can add the optional <code>UNIQUE</code> keyword, to the
<code>CREATE INDEX</code> statement to tell the database that we would
like it to enforce the constraint on our SQL. We can drop and re-create
the <code>artist_name</code> index with a <code>UNIQUE</code> constraint
as follows.</p>
<pre><code>DROP INDEX artist_name;
CREATE UNIQUE INDEX artist_name ON Artist(name);</code></pre>
<p>If we try to insert ‘Frank Sinatra’ a second time, it will fail with
an error.</p>
<pre><code>sqlite> SELECT * FROM Artist;
1|Frank Sinatra|blue
sqlite> INSERT INTO Artist (name, eyes)
...> VALUES ('Frank Sinatra', 'blue');
Runtime error: UNIQUE constraint failed: Artist.name (19)
sqlite></code></pre>
<p> We can tell the database to ignore any duplicate key errors by
adding the <code>IGNORE</code> keyword to the <code>INSERT</code>
statement as follows:</p>
<pre><code>sqlite> INSERT OR IGNORE INTO Artist (name, eyes)
...> VALUES ('Frank Sinatra', 'blue');
sqlite> SELECT id FROM Artist WHERE name='Frank Sinatra';
1
sqlite></code></pre>
<p>By combining an <code>INSERT OR IGNORE</code> and a
<code>SELECT</code> we can insert a new record if the name is not
already there and whether or not the record is already there, retrieve
the <em>primary</em> key of the record.</p>
<pre><code>sqlite> INSERT OR IGNORE INTO Artist (name, eyes)
...> VALUES ('Elvis', 'blue');
sqlite> SELECT id FROM Artist WHERE name='Elvis';
2
sqlite> SELECT * FROM Artist;
1|Frank Sinatra|blue
2|Elvis|blue
sqlite></code></pre>
<p>Since we have not added a uniqueness constraint to the eye color
column, there is no problem having multiple ‘Blue’ values in the
<code>eye</code> column.</p>
<figure>
<img src="../images/tracks-albums-artists.png" alt="Tracks, Albums, and Artists\label{figtaa}" style="height: 1.5in;"/>
<figcaption>
Tracks, Albums, and Artists
</figcaption>
</figure>
<h2 id="sample-multi-table-application">Sample multi-table
application</h2>
<p>A sample application called <code>tracks_csv.py</code> shows how
these ideas can be combined to parse textual data and load it into
several tables using a proper data model with relational connections
between the tables.</p>
<p>This application reads and parses a comma-separated file
<code>tracks.csv</code> based on an export from Dr. Chuck’s iTunes
library.</p>
<pre><code>Another One Bites The Dust,Queen,Greatest Hits,55,100,217103
Asche Zu Asche,Rammstein,Herzeleid,79,100,231810
Beauty School Dropout,Various,Grease,48,100,239960
Black Dog,Led Zeppelin,IV,109,100,296620
...</code></pre>
<p>The columns in this file are: title, artist, album, number of plays,
rating (0-100) and length in milliseconds.</p>
<p>Our data model is shown in Figure and described in SQL as
follows:</p>
<pre class="sql"><code>DROP TABLE IF EXISTS Artist;
DROP TABLE IF EXISTS Album;
DROP TABLE IF EXISTS Track;
CREATE TABLE Artist (
id INTEGER PRIMARY KEY,
name TEXT UNIQUE
);
CREATE TABLE Album (
id INTEGER PRIMARY KEY,
artist_id INTEGER,
title TEXT UNIQUE
);
CREATE TABLE Track (
id INTEGER PRIMARY KEY,
title TEXT UNIQUE,
album_id INTEGER,
len INTEGER, rating INTEGER, count INTEGER
);</code></pre>
<p>We are adding the <code>UNIQUE</code> keyword to <code>TEXT</code>
columns that we would like to have a uniqueness constraint that we will
use in <code>INSERT IGNORE</code> statements. This is more succinct that
separate <code>CREATE INDEX</code> statements but has the same
effect.</p>
<p>With these tables in place, we write the following code
<code>tracks_csv.py</code> to parse the data and insert it into the
tables:</p>
<pre class="python"><code>import sqlite3
conn = sqlite3.connect('trackdb.sqlite')
cur = conn.cursor()
handle = open('tracks.csv')
for line in handle:
line = line.strip();
pieces = line.split(',')
if len(pieces) != 6 : continue
name = pieces[0]
artist = pieces[1]
album = pieces[2]
count = pieces[3]
rating = pieces[4]
length = pieces[5]
print(name, artist, album, count, rating, length)
cur.execute('''INSERT OR IGNORE INTO Artist (name)
VALUES ( ? )''', ( artist, ) )
cur.execute('SELECT id FROM Artist WHERE name = ? ', (artist, ))
artist_id = cur.fetchone()[0]
cur.execute('''INSERT OR IGNORE INTO Album (title, artist_id)
VALUES ( ?, ? )''', ( album, artist_id ) )
cur.execute('SELECT id FROM Album WHERE title = ? ', (album, ))
album_id = cur.fetchone()[0]
cur.execute('''INSERT OR REPLACE INTO Track
(title, album_id, len, rating, count)
VALUES ( ?, ?, ?, ?, ? )''',
( name, album_id, length, rating, count ) )
conn.commit()</code></pre>
<p>You can see that we are repeating the pattern of
<code>INSERT OR IGNORE</code> followed by a <code>SELECT</code> to get
the appropriate <code>artist_id</code> and <code>album_id</code> for use
in later <code>INSERT</code> statements. We start from
<code>Artist</code> because we need <code>artist_id</code> to insert the
<code>Album</code> and need the <code>album_id</code> to insert the
<code>Track</code>.</p>
<p> If we look at the <code>Album</code> table, we can see that the
entries were added and assigned a <em>primary</em> key as necessary as
the data was parsed. We can also see the <em>foreign key</em> pointing
to a row in the <code>Artist</code> table for each <code>Album</code>
row.</p>
<pre><code>sqlite> .mode column
sqlite> SELECT * FROM Album LIMIT 5;
id artist_id title
-- --------- -----------------
1 1 Greatest Hits
2 2 Herzeleid
3 3 Grease
4 4 IV
5 5 The Wall [Disc 2]</code></pre>
<p> We can reconstruct all of the <code>Track</code> data, following all
the relations using <code>JOIN / ON</code> clauses. You can see both
ends of each of the (2) relational connections in each row in the output
below:</p>
<pre><code>sqlite> .mode line
sqlite> SELECT * FROM Track
...> JOIN Album ON Track.album_id = Album.id
...> JOIN Artist ON Album.artist_id = Artist.id
...> LIMIT 2;
id = 1
title = Another One Bites The Dust
album_id = 1
len = 217103
rating = 100
count = 55
id = 1
artist_id = 1
title = Greatest Hits
id = 1
name = Queen
id = 2
title = Asche Zu Asche
album_id = 2
len = 231810
rating = 100
count = 79
id = 2
artist_id = 2
title = Herzeleid
id = 2
name = Rammstein</code></pre>
<p>This example shows three tables and two <em>one-to-many</em>
relationships between the tables. It also shows how to use indexes and
uniqueness constraints to programmatically construct the tables and
their relationships.</p>
<p><a href="https://en.wikipedia.org/wiki/One-to-many_(data_model)"
class="uri">https://en.wikipedia.org/wiki/One-to-many_(data_model)</a></p>
<p>Up next we will look at the many-to-many relationships in data
models.</p>
<h2 id="many-to-many-relationships-in-databases">Many to many
relationships in databases</h2>
<p> Some data relationships cannot be modeled by a simple one-to-many
relationship. For example, lets say we are going to build a data model
for a course management system. There will be courses, users, and
rosters. A user can be on the roster for many courses and a course will
have many users on its roster.</p>
<p>It is pretty simple to <em>draw</em> a many-to-many relationship as
shown in Figure . We simply draw two tables and connect them with a line
that has the “many” indicator on both ends of the lines. The problem is
how to <em>implement</em> the raltionship using primary keys and foreign
keys.</p>
<p>Before we explore how we implement many-to-many relationships, lets
see if we could hack something up by extending a one-to many
relationship.</p>
<figure>
<img src="../images/many-to-many.png" alt="A Many to Many Relationship\label{figm2m}" style="height: 1.5in;"/>
<figcaption>
A Many to Many Relationship
</figcaption>
</figure>
<p>If SQL supported the notion of arrays, we might try to define
this:</p>
<pre class="sql"><code>CREATE TABLE Course (
id INTEGER PRIMARY KEY,
title TEXT UNIQUE
student_ids ARRAY OF INTEGER;
);</code></pre>
<p>Sadly, while this is a tempting idea, SQL does not support arrays.<a
href="#fn3" class="footnote-ref" id="fnref3"
role="doc-noteref"><sup>3</sup></a></p>
<p>Or we could just make long string and concatenate all the
<code>User</code> primary keys into a long string separated by
commas.</p>
<pre class="sql"><code>CREATE TABLE Course (
id INTEGER PRIMARY KEY,
title TEXT UNIQUE
student_ids ARRAY OF INTEGER;
);
INSERT INTO Course (title, student_ids)
VALUES( 'si311', '1,3,4,5,6,9,14');</code></pre>
<p>This would be very inefficient because as the course roster grows in
size and the number of courses increases it becomes quite expensive to
figure out which courses have student 14 on their roster.</p>
<figure>
<img src="../images/many-to-many-verbose.png" alt="A Many to Many Connector Table\label{figm2mvrb}" style="height: 1.5in;"/>
<figcaption>
A Many to Many Connector Table
</figcaption>
</figure>
<p> Instead of either of these approaches, we model a many-to-many
relationship using an additional table that we call a “junction table”,
“through table”, “connector table”, or “join table” as shown in Figure .
The purpose of this table is to capture the <em>connection</em> between
<em>a</em> course and <em>a</em> student.</p>
<p>In a sense the table sits between the <code>Course</code> and
<code>User</code> table and has a one-to-many relationship to both
tables. By using an intermediate table we break a many-to-many
relationship into two one-to-many relationships. Databases are very good
at modeling and processing one-to-many relationships.</p>
<p>An example <code>Member</code> table would be as follows:</p>
<pre class="sql"><code>CREATE TABLE User (
id INTEGER PRIMARY KEY,
name TEXT UNIQUE
);
CREATE TABLE Course (
id INTEGER PRIMARY KEY,
title TEXT UNIQUE
);
CREATE TABLE Member (
user_id INTEGER,
course_id INTEGER,
PRIMARY KEY (user_id, course_id)
);</code></pre>
<p>Following our naming convention, <code>Member.user_id</code> and
<code>Member.course_id</code> are foreign keys pointing at the
corresponding rows in the <code>User</code> and <code>Course</code>
tables. Each entry in the member table links a row in the
<code>User</code> table to a row in the <code>Course</code> table by
going <em>through</em> the <code>Member</code> table.</p>
<p> We indicate that the <em>combination</em> of <code>course_id</code>
and <code>user_id</code> is the <code>PRIMARY KEY</code> for the
<code>Member</code> table, also creating an uniqueness constraint for a
<code>course_id</code> / <code>user_id</code> combination.</p>
<p>Now lets say we need to insert a number of students into the rosters
of a number of courses. Lets assume the data comes to us in a
JSON-formatted file with records like this:</p>
<pre><code>[
[ "Charley", "si110"],
[ "Mea", "si110"],
[ "Hattie", "si110"],
[ "Keziah", "si110"],
[ "Rosa", "si106"],
[ "Mea", "si106"],
[ "Mairin", "si106"],
[ "Zendel", "si106"],
[ "Honie", "si106"],
[ "Rosa", "si106"],
...
]</code></pre>
<p> We could write code as follows to read the JSON file and insert the
members of each course roster into the database using the following
code:</p>
<pre class="python"><code>import json
import sqlite3
conn = sqlite3.connect('rosterdb.sqlite')
cur = conn.cursor()
str_data = open('roster_data_sample.json').read()
json_data = json.loads(str_data)
for entry in json_data:
name = entry[0]
title = entry[1]
print((name, title))
cur.execute('''INSERT OR IGNORE INTO User (name)
VALUES ( ? )''', ( name, ) )
cur.execute('SELECT id FROM User WHERE name = ? ', (name, ))
user_id = cur.fetchone()[0]
cur.execute('''INSERT OR IGNORE INTO Course (title)
VALUES ( ? )''', ( title, ) )
cur.execute('SELECT id FROM Course WHERE title = ? ', (title, ))
course_id = cur.fetchone()[0]