Hackathon - Data Methods #5

Open
wants to merge 19 commits into
base: master
7 changes: 4 additions & 3 deletions README.md
@@ -58,17 +58,18 @@ A Baysian classifier minimizes the probability of missclassification. It is used
A simple graph you can generate to check for outliers is a histogram.
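
As a rough illustration of the histogram check, here is a minimal Python sketch that bins values into equal-width buckets and returns the count per bucket. The function name `histogram_counts`, the bin count, and the sample values are assumptions made for this example, not part of the original answer. An outlier tends to show up as an isolated bucket far from the rest.

```python
def histogram_counts(values, bins=5):
    """Count how many values fall into each of `bins` equal-width buckets."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1 when every value is identical.
    width = (hi - lo) / bins or 1
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bucket.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical sample: nine small values and one suspiciously large one.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 50]
counts = histogram_counts(sample)
```

In this sample the first bucket holds almost everything and the last bucket holds a single value, with empty buckets in between, which is the pattern worth investigating.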

## Q8: What is a Null Hypothesis?
[Response]

----------
A general statement or default position that there is no relationship between two measured phenomena (or variables/datasets).
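
One way to make this concrete is a permutation test, sketched below in plain Python. It is only an illustration under stated assumptions: the function name `permutation_p_value`, the sample groups, and the choice of the difference in means as the statistic are all made up for this example. Under the null hypothesis the group labels carry no information, so shuffling them should produce gaps as large as the observed one fairly often.

```python
import random


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Estimate how often shuffled labels give a mean gap >= the observed gap."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        gap = abs(sum(left) / len(left) - sum(right) / len(right))
        if gap >= observed:
            hits += 1
    return hits / n_permutations


# Hypothetical groups: clearly separated, so the null hypothesis looks unlikely.
p_separated = permutation_p_value([1, 2, 3, 4, 5], [101, 102, 103, 104, 105])
# Identical groups: the observed gap is zero, so every shuffle matches it.
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3])
```

A small value suggests the data are hard to explain if there were truly no relationship; a large value means the null hypothesis cannot be rejected.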

---------
Answer the following questions using this scenario: You just got a HUGE dataset from Spotify where each entry contains these fields -> [username, song, # of times played, user rating, genre]

## Q9: How would you figure out the most popular song?
Measure how often each song receives the highest user rating, together with its total number of times played.
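
A minimal Python sketch of this idea follows. The sample rows, the function name `most_popular_song`, and the ranking rule (total plays first, average rating as a tie-breaker) are assumptions chosen for illustration; the scenario only specifies the fields.

```python
from collections import defaultdict

# Hypothetical rows: (username, song, times_played, user_rating, genre)
ROWS = [
    ("ana", "Song A", 12, 5, "rock"),
    ("ben", "Song A", 7, 4, "rock"),
    ("ana", "Song B", 30, 3, "pop"),
    ("cat", "Song C", 2, 5, "jazz"),
]


def most_popular_song(rows):
    """Return the song with the most total plays; ties go to the higher average rating."""
    plays = defaultdict(int)
    ratings = defaultdict(list)
    for _user, song, times_played, rating, _genre in rows:
        plays[song] += times_played
        ratings[song].append(rating)

    def score(song):
        return (plays[song], sum(ratings[song]) / len(ratings[song]))

    return max(plays, key=score)


winner = most_popular_song(ROWS)
```

Swapping the order of the tuple in `score` would rank by rating first instead, which is an equally defensible definition of "popular".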

## Q10: How do you determine what genre a certain user likes the most?
Take the data from Q9 and look at the most common genre in the top X of the list.
You could load the dataset into Tableau and create a graph of songs vs. # of times played. Then, you could incorporate the user rating to see which song is rated the highest.
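
The "most common genre in the top X" approach for a single user could be sketched in Python as below. The rows, the function name `favorite_genre`, the default `top_x=3`, and ranking a user's songs by play count are all assumptions for this example.

```python
from collections import Counter

# Hypothetical rows: (username, song, times_played, user_rating, genre)
USER_ROWS = [
    ("ana", "Song A", 12, 5, "rock"),
    ("ana", "Song B", 30, 3, "pop"),
    ("ana", "Song D", 9, 4, "pop"),
    ("ana", "Song E", 1, 2, "rock"),
    ("ana", "Song F", 1, 2, "rock"),
    ("ben", "Song A", 7, 4, "rock"),
]


def favorite_genre(rows, username, top_x=3):
    """Return the most common genre among the user's top_x most-played songs."""
    own = [row for row in rows if row[0] == username]
    top = sorted(own, key=lambda row: row[2], reverse=True)[:top_x]
    if not top:
        return None  # the user has no entries
    genre_counts = Counter(row[4] for row in top)
    return genre_counts.most_common(1)[0][0]


ana_genre = favorite_genre(USER_ROWS, "ana")
```

Restricting to the top X keeps rarely played songs from outvoting the ones the user actually listens to; here "rock" has more entries overall, but "pop" dominates the most-played songs.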


## Q11: How do we match 2 users that we think may want to share playlists?

18 changes: 12 additions & 6 deletions README.md~
@@ -36,9 +36,9 @@ Some factors that go into deciding a chosen data format is how large the data fi
3. Build a webserver and write an API that dumps and queries that data in your database.

### Answers:
1.cost: only as useful as how often you analyse the data, benefit: less data cleaning
2.cost: data is not updated live, benefit: automation from periodic dumps
3.benefit: a public api allows for a diverse possibilities of analysis, such as different languages can work for whoever wants to analyze
1. cost: only as useful as how often you analyse the data, benefit: less data cleaning
2. cost: data is not updated live, benefit: automation from periodic dumps
3. benefit: a public API allows diverse possibilities for analysis, since whoever wants to analyze the data can work in a different language


## Q5: You've now set up your database and have a website with 10,000 users, but have realized that you forgot a much needed field (say, an ID number for each user). What do you do and how might different database designs have helped this situation?
@@ -58,17 +58,23 @@ A Baysian classifier minimizes the probability of missclassification. It is used
A simple graph you can generate to check for outliers is a histogram.

## Q8: What is a Null Hypothesis?
[Response]

----------
A general statement or default position that there is no relationship between two measured phenomena (or variables/datasets).

---------
Answer the following questions using this scenario: You just got a HUGE dataset from Spotify where each entry contains these fields -> [username, song, # of times played, user rating, genre]

## Q9: How would you figure out the most popular song?
<<<<<<< HEAD
Measure how often each song receives the highest user rating, together with its total number of times played.

## Q10: How do you determine what genre a certain user likes the most?
Take the data from Q9 and look at the most common genre in the top X of the list.
=======
You could load the dataset into Tableau and create a graph of songs vs. # of times played. Then, you could incorporate the user rating to see which song is rated the highest.

## Q10: How do you determine what genre a certain user likes the most?
Plot genre vs. username. You could also use the graph above to find which songs belong to each genre, but I think the initial comparison would give you what is necessary.
>>>>>>> ecc861a95e8f544174d27c7a50166b9d97c52da0

## Q11: How do we match 2 users that we think may want to share playlists?
