Skip to content

caohy1988/hw2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 

Repository files navigation

Homework 2

  • Assigned: 2/18
  • Due: 3/1 at 8:40am
  • worth 3.75% of your grade

We want to emphasize that we have picked this dataset for educational purposes and are not encouraging you to drink.

1. Analyze some data

(2 points each, 10 points total)

You will be using the same subset of the Iowa Liquor Dataset from hw0 and writing some Python programs to perform data analysis to understand transaction and inventory information.

You will write your code in hw2.py found in this same directory.
This file contains predefined methods such as def q1() with dummy method bodies that you will fill in to answer the corresponding questions below. We have also included a load_data() function so you can pass in the file path to the dataset and it will read the file and return a python list. Try running it and printing the first couple of list elements to see what the method returns.

Run the program with:

    python hw2.py (path to iowa-liquor-sample.csv)

Questions To Answer

  • Q1: How many distinct types of items (by description attribute) are in this dataset?

    • q1() should return a single number.
  • Q2: How many distinct vendors (by exact string comparison) are in this dataset?

    • q2() should return a single number.
  • Q3: Which store had the most sales (in terms of total bottle_qty)?

    • q3() should return a single number of the store's id.
  • Q4: At the store with the most sales (answer to Q3), what was the most sold item by description (the item that had the highest total bottle_qty)?

    • q4() should return a single string that corresponds to the description attribute's value.
  • Q5: Find the zipcode, with the greatest total bottle_qty for category_name TEQUILA.

    • q5() should return a single number that corresponds to the zipcode attribute’s value.

2. Relational Algebra

(2 points each, 4 points total)

    liquors(<u>lid</u>, name, price, manufacturer)
    sales(<u>month</u>, <u>seller</u>, <u>liquors</u>, county, quantity)

Given the simplified schema above (primary keys are surrounded by <u> </u> tags), construct relational algebra for the following queries:

  • Q1: Find the names of liquors that had at least one sale in both "Polk" and "Linn" counties for the month of December.

  • Q2: Find the names of manufacturers that sold at least two different liquors during any January in "Polk" county.

3. More Relational Algebra

(1 point each, 6 points total)

Given the following tables

T1

A B C
1 x a
2 y b
3 z b

T2

A B D
1 x c
3 y a
3 x a

Express the results for the relational algebra expressions:

  1. πB,D(T2)
  2. T2 × πA(T2)
  3. T1 ⨝T1.C=T2.D T2
  4. T1 − (T1 − T2)
  5. T1 / πB(T2)
  6. T1 ⨝ (σD=x(T2))

Submission

Submit through courseworks

  1. Submit hw2.py for part 1
  2. Submit a text file/pdf for part 2 and part 3

Include your name and UNI at the top of each page.

Hints

  • What does the data argument to the question functions contain? In q1(), try adding print data[0]; print data[1]
  • Test your code on a small sample of data, so you can verify that it works before running it on the entire file.
  • Importing the csv file into Google Sheets or Excel might help visually explore the data.
  • Helper functions can make your code easier to understand. E.g to help refer to the columns by name instead of by position.
  • Python dictionaries are very very useful.

About

Spring 2016 Homework 2

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages