365 sql just exploring
ptyadana committed Apr 14, 2022
1 parent d5462ca commit df39adb
Showing 5 changed files with 487 additions and 0 deletions.
106 changes: 106 additions & 0 deletions SQL for Data Science Interviews - 365 DS/01_mock_interview.sql
@@ -0,0 +1,106 @@
/*
Mock Interview 1)
-- table name: post_events
-- user_id int
-- created_at datetime
-- event_name varchar
-- (enter, post, cancel)
Question: What information would you like to start off by pulling to get an overall understanding of
the post feature?
Possible Answer:
We might want to get an idea of OVERALL HEALTH.
+ Total number of posts (number of enters)
+ Posts Made by Date
+ Success Rate
+ Cancel Rate
*/
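
/*
A minimal sketch of the overall-health pulls listed above, assuming PostgreSQL and that each row of
post_events is a single event whose event_name is 'enter', 'post' or 'cancel'.
The column aliases are illustrative only.
*/

-- total number of enters and posts
SELECT COUNT(CASE WHEN event_name = 'enter' THEN 1 END) AS total_enters,
       COUNT(CASE WHEN event_name = 'post' THEN 1 END) AS total_posts
FROM post_events;

-- posts made by date, with the cancel rate alongside
-- NULLIF turns a zero enter count into NULL so the division cannot fail
SELECT CAST(created_at AS DATE) AS created_date,
       COUNT(CASE WHEN event_name = 'post' THEN 1 END) AS posts_made,
       COUNT(CASE WHEN event_name = 'cancel' THEN 1 END) * 100.0 /
       NULLIF(COUNT(CASE WHEN event_name = 'enter' THEN 1 END), 0) AS percent_cancel
FROM post_events
GROUP BY CAST(created_at AS DATE)
ORDER BY created_date;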


/***** Success Rate *****/
-- success rate by date
-- date | success rate = number of posts / number of enters

-- explanation in English
-- group by date
-- count of number of posts / count of number enters

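-- created_at is a datetime, so cast it to a date before grouping
SELECT CAST(created_at AS DATE) AS created_date,
COUNT(CASE WHEN event_name = 'post' THEN 1 ELSE null END) * 1.00 /
COUNT (CASE WHEN event_name = 'enter' THEN 1 ELSE null END) * 100 AS percent_success
FROM post_events
GROUP BY CAST(created_at AS DATE)
ORDER BY created_date;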


/*
Question: When success rates are low, how can we diagnose the issue?
Possible Answer: There are several approaches to this problem. One is to plot success_rate against created_at
and look for a pattern over a certain period of time, such as a one-off dip.
Based on this information, we can investigate whether there is an underlying issue in the application,
or whether a particular group of users is responsible for the unsuccessful posts.
*/
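
/*
One way to check the "group of users" idea above, sketched under the assumption that PostgreSQL is used
and that a user's success rate is posts / enters. The threshold of 10 enters is an arbitrary,
hypothetical cut-off that filters out users with too little activity to judge.
*/
SELECT user_id,
       COUNT(CASE WHEN event_name = 'enter' THEN 1 END) AS enters,
       COUNT(CASE WHEN event_name = 'post' THEN 1 END) * 100.0 /
       NULLIF(COUNT(CASE WHEN event_name = 'enter' THEN 1 END), 0) AS percent_success
FROM post_events
GROUP BY user_id
HAVING COUNT(CASE WHEN event_name = 'enter' THEN 1 END) >= 10
ORDER BY percent_success ASC;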


/*
Questions:
What are the success rates by day?
Which day of the week has the lowest success rate?
*/

-- group by dow of created_at
-- average of percent_success
-- order by percent_success
-- dow | percent_success

WITH created_events AS(
-- daily success rate; created_at is a datetime, so cast it to a date before grouping
SELECT CAST(created_at AS DATE) AS created_date,
COUNT(CASE WHEN event_name = 'post' THEN 1 ELSE null END) * 1.00 /
COUNT (CASE WHEN event_name = 'enter' THEN 1 ELSE null END) * 100 AS percent_success
FROM post_events
GROUP BY CAST(created_at AS DATE))

SELECT EXTRACT (dow FROM created_date) AS dow,
AVG(percent_success) AS avg_percent_success
FROM created_events
GROUP BY 1
ORDER BY 2 ASC;


/*
Question: What could be a problem if we're aggregating on percent success?
Possible Answer: Averaging the daily percentages gives every date equal weight, so it ignores the underlying
distribution of enters across the dates: a date with very few enters counts as much as a date with many.
*/
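
/*
A small worked example of the problem, using made-up numbers (the two dates and their counts are
hypothetical, not taken from post_events): one date has 1 post out of 2 enters (50%), the other has
90 posts out of 100 enters (90%).
*/
WITH daily(created_date, posts, enters) AS (
    VALUES (DATE '2022-04-04', 1, 2),
           (DATE '2022-04-11', 90, 100)
)
SELECT AVG(posts * 100.0 / enters) AS avg_of_daily_percent,      -- (50 + 90) / 2 = 70
       SUM(posts) * 100.0 / SUM(enters) AS pooled_percent        -- 91 / 102, about 89.2
FROM daily;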


-- compute the rate from the raw events instead of averaging daily percentages
SELECT EXTRACT (dow FROM created_at) AS dow,
COUNT(CASE WHEN event_name = 'post' THEN 1 ELSE null END) * 1.00 /
COUNT (CASE WHEN event_name = 'enter' THEN 1 ELSE null END) * 100 AS percent_success
FROM post_events
GROUP BY 1
ORDER BY 2 ASC;

/*************************** Using actual table *******************************/

SELECT *
FROM interviews.post_events
LIMIT 10;

SELECT EXTRACT (dow FROM created_at) AS dow,
COUNT(CASE WHEN event_name = 'post' THEN 1 ELSE null END) * 1.00 /
COUNT (CASE WHEN event_name = 'enter' THEN 1 ELSE null END) * 100 AS percent_success
FROM interviews.post_events
GROUP BY 1
ORDER BY 2 ASC;
50 changes: 50 additions & 0 deletions SQL for Data Science Interviews - 365 DS/02_mock_interview.sql
@@ -0,0 +1,50 @@
/*
Mock Interview 2)
Question:
-- 1) Find the date with the highest total energy consumption from our datacenters.
-- 2) Output the date along with the total energy consumption across all datacenters.
-- Table: eu_energy
date datetime
consumption int
-- Table: asia_energy
date datetime
consumption int
-- Table: na_energy
date datetime
consumption int
*/

/******
Always make sure you understand the tables and columns correctly. Ask the interviewer if you need to make any assumptions
about the column data, etc.
1) So there are 3 tables representing energy consumption across different continents.
Can I assume that there is only one energy consumption value for each date, or can there be multiple values
for a specific date? Are there any possible missing values too?
We can then sum the consumption by date across the different tables.
*****/


/*
Question: What would you do if in the first table there are two of the same dates with
different energy consumptions?
Possible Answer:
Then we can just group by date and sum the energy consumption.
*/


/****
Clarification question back to the interviewer:
Can multiple dates share the same highest total energy consumption?
****/
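
/*
A possible solution sketch, not necessarily the one intended by the interview. It assumes PostgreSQL,
that duplicate dates within a table should be summed, and that every date tied for the highest total
should be returned. The CTE names and aliases are illustrative.
*/
WITH all_energy AS (
    -- UNION ALL keeps duplicate rows, which plain UNION would drop
    SELECT date, consumption FROM eu_energy
    UNION ALL
    SELECT date, consumption FROM asia_energy
    UNION ALL
    SELECT date, consumption FROM na_energy
),
daily_totals AS (
    -- date is a datetime, so cast it to a date before grouping
    SELECT CAST(date AS DATE) AS consumption_date,
           SUM(consumption) AS total_consumption
    FROM all_energy
    GROUP BY CAST(date AS DATE)
)
-- comparing against MAX returns every date tied for the highest total
SELECT consumption_date, total_consumption
FROM daily_totals
WHERE total_consumption = (SELECT MAX(total_consumption) FROM daily_totals);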
