Showing 5 changed files with 487 additions and 0 deletions.
SQL for Data Science Interviews - 365 DS/01_mock_interview.sql (106 additions, 0 deletions)
@@ -0,0 +1,106 @@
/*
Mock Interview 1)

-- table name: post_events
-- user_id      int
-- created_at   datetime
-- event_name   varchar
--              (enter, post, cancel)

Question: What information would you like to start off by pulling to get an overall understanding of
the post feature?

Possible Answer:
We might want to get an idea of OVERALL HEALTH (a sketch for pulling these follows below):
+ Total number of post attempts (number of enters)
+ Posts Made by Date
+ Success Rate
+ Cancel Rate
*/
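/*
A possible sketch for pulling those overall-health numbers, assuming post_events has one row per event
and event_name takes the values 'enter', 'post', and 'cancel' (per the notes above).
*/

-- overall totals and rates; NULLIF guards against division by zero
SELECT COUNT(CASE WHEN event_name = 'enter'  THEN 1 END) AS total_attempts,
       COUNT(CASE WHEN event_name = 'post'   THEN 1 END) AS total_posts,
       COUNT(CASE WHEN event_name = 'post'   THEN 1 END) * 100.0 /
           NULLIF(COUNT(CASE WHEN event_name = 'enter' THEN 1 END), 0) AS success_rate_pct,
       COUNT(CASE WHEN event_name = 'cancel' THEN 1 END) * 100.0 /
           NULLIF(COUNT(CASE WHEN event_name = 'enter' THEN 1 END), 0) AS cancel_rate_pct
FROM post_events;

-- posts made by date (casting created_at to a calendar date, assuming it is a timestamp)
SELECT created_at::date AS post_date,
       COUNT(*)         AS posts
FROM post_events
WHERE event_name = 'post'
GROUP BY 1
ORDER BY 1;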

/***** Success Rate *****/
-- success rate by date
-- date | success rate = number of posts / number of enters

-- explanation in English:
-- group by date
-- count of posts / count of enters

-- note: if created_at is a full timestamp, group on created_at::date instead so each row is one calendar day
SELECT created_at,
       COUNT(CASE WHEN event_name = 'post' THEN 1 ELSE NULL END) * 1.00 /
       COUNT(CASE WHEN event_name = 'enter' THEN 1 ELSE NULL END) * 100 AS percent_success
FROM post_events
GROUP BY created_at
ORDER BY created_at;

/*
Question: When success rates are low, how can we diagnose the issue?

Possible Answer: There are several approaches. One is to plot success_rate against created_at and look for
a pattern over time: is the drop a one-off dip, a recurring pattern, or a sustained decline? Based on that,
we can dig into whether there is an underlying issue in the application, or whether a particular group of
users is driving the unsuccessful posts.
*/
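/*
A possible way to sketch that time pattern in SQL: compute the daily success rate and a 7-day moving
average next to it, so one-off dips stand out against the trend. The 7-day window is an arbitrary
choice for illustration.
*/

WITH daily_success AS (
    SELECT created_at::date AS post_date,
           COUNT(CASE WHEN event_name = 'post' THEN 1 END) * 100.0 /
           NULLIF(COUNT(CASE WHEN event_name = 'enter' THEN 1 END), 0) AS percent_success
    FROM post_events
    GROUP BY 1
)
SELECT post_date,
       percent_success,
       -- trailing 7-day moving average, including the current day
       AVG(percent_success) OVER (ORDER BY post_date
                                  ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7d_success
FROM daily_success
ORDER BY post_date;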

/*
Questions:
What are the success rates by day?
Which day of the week has the lowest success rate?
*/

-- group by day of week of created_at
-- average of percent_success
-- order by percent_success
-- dow | percent_success

WITH created_events AS (
    SELECT created_at,
           COUNT(CASE WHEN event_name = 'post' THEN 1 ELSE NULL END) * 1.00 /
           COUNT(CASE WHEN event_name = 'enter' THEN 1 ELSE NULL END) * 100 AS percent_success
    FROM post_events
    GROUP BY created_at
)

SELECT EXTRACT(dow FROM created_at) AS dow,
       AVG(percent_success) AS avg_percent_success
FROM created_events
GROUP BY 1
ORDER BY 2 ASC;

/*
Question: What could be a problem if we're aggregating on percent success?

Possible Answer: Averaging the daily percentages ignores the underlying distribution of volume across dates:
a day with 5 enters counts as much as a day with 5,000. Computing the rate from the pooled event counts per
day of week weights every attempt equally.
*/

-- pooled counts per day of week, computed straight from the events table
SELECT EXTRACT(dow FROM created_at) AS dow,
       COUNT(CASE WHEN event_name = 'post' THEN 1 ELSE NULL END) * 1.00 /
       COUNT(CASE WHEN event_name = 'enter' THEN 1 ELSE NULL END) * 100 AS percent_success
FROM post_events   -- the created_events CTE above only exists for that statement, so query the base table here
GROUP BY 1
ORDER BY 2 ASC;

/*************************** Using actual table *******************************/

SELECT *
FROM interviews.post_events
LIMIT 10;

SELECT EXTRACT(dow FROM created_at) AS dow,
       COUNT(CASE WHEN event_name = 'post' THEN 1 ELSE NULL END) * 1.00 /
       COUNT(CASE WHEN event_name = 'enter' THEN 1 ELSE NULL END) * 100 AS percent_success
FROM interviews.post_events
GROUP BY 1
ORDER BY 2 ASC;
SQL for Data Science Interviews - 365 DS/02_mock_interview.sql (50 additions, 0 deletions)
@@ -0,0 +1,50 @@
/*
Mock Interview 2)

Question:
-- 1) Find the date with the highest total energy consumption from our datacenters.
-- 2) Output the date along with the total energy consumption across all datacenters.

-- Table: eu_energy
--   date        datetime
--   consumption int

-- Table: asia_energy
--   date        datetime
--   consumption int

-- Table: na_energy
--   date        datetime
--   consumption int
*/

/******
Always make sure you understand the tables and columns correctly. Ask the interviewer whenever you need to
make assumptions about the column data.

1) There are 3 tables representing energy consumption across different continents.
Can I assume there is only one consumption reading per date, or can there be multiple readings for the same
date? Could there be missing values?

Approach: combine the three tables and sum the consumption per date (a sketch follows below).
*****/
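/*
A possible sketch of that approach, assuming the three tables share the (date, consumption) schema
described above: stack them with UNION ALL, sum per date, and keep the date with the largest total.
*/

SELECT date,
       SUM(consumption) AS total_consumption
FROM (
    SELECT date, consumption FROM eu_energy
    UNION ALL
    SELECT date, consumption FROM asia_energy
    UNION ALL
    SELECT date, consumption FROM na_energy
) AS all_energy
GROUP BY date
ORDER BY total_consumption DESC
LIMIT 1;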

/*
Question: What would you do if the first table contained the same date twice with
different energy consumptions?

Possible Answer:
We can still just group by date and sum the energy consumptions; duplicate dates within a table are
handled the same way as rows coming from different tables.
*/

/****
Clarification question back to the interviewer:
Could multiple dates tie for the same highest total energy consumption? If so, should all of them be returned?
****/
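/*
If ties should all be returned, LIMIT 1 is not enough. One possible sketch is to rank the daily totals
and keep every date in the top rank; RANK() assigns the same rank to tied totals.
*/

WITH daily_totals AS (
    SELECT date,
           SUM(consumption) AS total_consumption
    FROM (
        SELECT date, consumption FROM eu_energy
        UNION ALL
        SELECT date, consumption FROM asia_energy
        UNION ALL
        SELECT date, consumption FROM na_energy
    ) AS all_energy
    GROUP BY date
)
SELECT date,
       total_consumption
FROM (
    SELECT date,
           total_consumption,
           RANK() OVER (ORDER BY total_consumption DESC) AS rnk
    FROM daily_totals
) AS ranked
WHERE rnk = 1;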