Skip to content

LinuxAtDuke/intro-to-databases

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

51 Commits
 
 

Repository files navigation

Introduction to databases

Version 4, 2021-01-27

https://github.com/LinuxAtDuke/intro-to-databases

Instructor

Andy Ingham (andy.ingham AT duke.edu)

Table of Contents

  1. Unit 1: Definitions, uses, and kinds
  2. Unit 2: Schema, data dictionary
  3. Unit 3: Architectures
  4. Unit 4: Data integrity, security, confidentiality
  5. Unit 5: Primer for the Intro to MySQL class (or for practical playing)

Unit 1: Definitions, uses, and kinds

  • What are data?

  • What is a "database"? What is a DBMS (database management system)?

  • Databases are primarily needed to structure and organize data so that:

    • associations can be revealed (e.g., X is related to Y via shared attribute Z),
    • patterns can be exposed (e.g., X is more prevalent than Y by magnitude Z),
    • questions can be answered (e.g., X is the type of Y that exhibits Z),
    • a repository of information can be stored (by retaining associated data elements)
  • Why aren't databases spreadsheets? Because, GENERALLY SPEAKING, databases ...

    • are more powerful (can organize data in more complex ways) and, therefore, can provide better searching/discovery options
    • provide better normalization (if designed well) and, therefore, are more performant as size increases ("scalable")
    • provide more complex integrity checks (rules and constraints about data elements and associations)
    • are more flexible (in how they can be configured)
    • provide more granular options for things like access control and, therefore, better suited to shared usage by disparate groups
    • provide more and better options for external integrations
    • are more focused on searching/storing than tabulating/displaying
  • Kinds of databases ... there are quite a few (https://en.wikipedia.org/wiki/Database#Models ), but relational is probably still king

    • Relational (https://medium.com/@pocztarski/what-if-i-told-you-there-are-no-tables-in-relational-databases-13d31a2f9677 )

      • PROS: highly normalizable (avoids data duplication); stable; mature; consistency after each transaction
      • CONS: rigid
    • Object / document / NoSQL

      • PROS: faster for "big" or distributed data; flexible
      • CONS: less standardization for how to interact with data; requires high developer discipline; reasonable use cases are limited
    • GraphDBs

      • PROS: powerful for specialized use cases (highly interconnected data)
      • CONS: highly complex for both users and developers
    • Key-value stores:

      • PROS: simple
      • CONS: simple

Unit 2: Schema, data dictionary

Unit 3: Architectures

  • suitability (or, So which database to choose? (SQLite vs Postgres vs MySQL vs ORACLE vs MSSQL ...))
    • functionality
      • store and retrieve; CRUD (Create, Read, Update, Delete)
      • partitioning, replication, MVCC (Multi-Version Concurrency Control)
      • stored procedures, triggers, views
      • advanced data types + searching (e.g., XPath)
      • "Hot" backups, point-in-time recovery
    • security
      • how secure are the defaults? (accounts, passwords, settings, allowed connections)
      • how straightforward is (or how comfortable are you with) the DB administration?
      • are patches regularly released? (remembering that updates are only useful if ACTUALLY APPLIED in a timely fashion)
    • usability
      • CLI or GUI
      • documentation / community-support versus paid-support
    • integration
      • base OS may dictate options for add-ons or helper apps
      • middleware / ODBC (Open Database Connectivity)
      • reporting tools
    • cost
      • up-front (hardware, licensing, life-cycle capital costs)
      • pay-as-you-go (cloud options)
    • hosting / future development (bug fixes+enhancements+security)
    • scalability (DBMS as well as underlying environments)
      • storage needs
      • transaction rates

Unit 4: Data integrity, security, confidentiality

  • integrity: ACID (Atomicity, Consistency, Isolation, and Durability) (https://en.wikipedia.org/wiki/ACID)
    • Important for some obvious reasons:
      • scientific validity
      • potential basis for future work or discoveries
      • provide confidence in information
  • (information) security: CIA (Confidentiality, Integrity, Availability) (https://www.tylercybersecurity.com/blog/fundamental-objectives-of-information-security-the-cia-triad)
    • Important for some obvious reasons:
      • protect intellectual property
      • maintain confidentiality of data (which can help protect privacy)
      • guard against unauthorized updates or deletions (to protect data integrity)
      • provide on-going access for authorized people

Unit 5: Primer for the Intro to MySQL class (or for practical playing)

AKA: Creating a personal Linux VM (Lab 0 from the Intro to MySQL class)

  1. Using a web browser, go to https://vcm.duke.edu/
  2. Login using your Duke NetId.
  3. Select "Reserve a VM" (near the middle of the page)
  4. On the next screen, select the "Lamp Stack" link from the list under "Linux Apps"
  5. If you encounter a pop-up window about SSH keys (which displays if you do not have them set up for your netID), you may need to select the less secure option until you've done that step (which is outside the scope of this class).
  6. After agreeing to the Terms of Use, the VCM web page will display the name of your VM along with available usernames. You must first connect to the University VPN (if not "on campus"), THEN initiate an ssh session as the Admin User (vcm) -- do this via the "Terminal" app on your Mac or via "PuTTY" (available at https://www.chiark.greenend.org.uk/~sgtatham/putty/ ) on your Windows machine.

Example (after establishing a University VPN session, if off campus): ssh vcm@vcm-1473.vm.duke.edu [Answering "yes" to "Are you sure you want to continue connecting (yes/no)?" and then entering the password behind "View Password" when prompted]

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published