data-interview-questions

A collection of interview questions for data science, data analytics, data engineering, and machine learning

Created and curated by the Sharpestminds community. Contributions are welcome! Submit a pull request with more questions or resources :)

Questions
Resources

Questions

If you’re currently employed, why don’t you like your current job? If you’re not, what are some of the ideal aspects you’re looking for in a data science or data analyst job? What specifically excites you and why do you want to do it?
Out of all your data analyst projects, which one are you most proud of, and why?
Have you ever run an analysis whose result was the opposite of what you were hoping for? If not, suppose a client was hoping for one outcome, but your analysis showed the opposite. How did you, or would you, handle this?
You're a consultant. Concisely describe how you would approach this problem: You're given a large corpus of Twitter data. Predict whether each tweet is about a disaster or not.
Explain structured vs unstructured vs semi-structured data.
What are the advantages of ReLU?
Explain Loss function vs Cost function vs Objective function.
What is the purpose of the activation function? What happens if I don't use one?
What are hyper-parameters? Can you list some?
L1 vs L2 regularization?
Normalizing vs standardizing inputs? Why do we even do this?
Your model is producing weird results. What is your process of debugging the model?
What is the difference between mini-batch gradient descent and stochastic gradient descent?
Briefly describe gradient descent with momentum, RMSprop, and Adam.
What are the benefits of Batch Normalization?
You're the new and ONLY DS hire for a company that analyzes medical images. The company has 10 million samples. Of the 10 million samples: 3 million are bone fractures for the foot, 2 million are bone fractures for the knee, and 5 million are bone fractures for the shoulder. The company wants to predict the severity of bone fractures in the foot. How would you split your dataset for train, validation, and testing (what % of each)?
Both in technical and non-technical language, explain Support Vector Machines.
Both in technical and non-technical language, explain the Perceptron.
Both in technical and non-technical language, explain KNNs.
Both in technical and non-technical language, explain K-Means Clustering.
How does K-Means work?
Both in technical and non-technical language (or, to a 5 year old), explain standard deviation.
Both in technical and non-technical language, explain Decision Trees.
Both in technical and non-technical language, explain Naive Bayes.
Both in technical and non-technical language, explain Linear Regression.
Both in technical and non-technical language, explain Logistic Regression.
How do you train Logistic Regression?
Both in technical and non-technical language, explain Bagging.
Both in technical and non-technical language, explain Boosting.
Both in technical and non-technical language, explain LSTMs.
Both in technical and non-technical language, explain CNNs and their architecture.
Both in technical and non-technical language, explain Transformers and Attention.
Both in technical and non-technical language, explain PCA.
Both in technical and non-technical language, explain SVD.
Why do we max pool and what are its benefits?
What are the benefits of padding?
How does ResNet work?
Walk me through how object detection and localization works.
How does a BiLSTM work and what are its advantages?
How does Named Entity Recognition work?
Walk me through an A/B Test. How long do I run it and how do I determine how many samples I need?
Sensitivity vs Specificity vs Recall vs Precision.
Both in technical and non-technical language, explain ROC and AUC.
Stratified vs Cluster sampling.
Both in technical and non-technical language, explain eigenvectors and eigenvalues.
How do I choose the optimal number of clusters (the hyper-parameter k) in K-Means?
What are tensors?
What are the assumptions when using linear regression?
Bayesian Estimation vs Maximum Likelihood Estimation?
What is multicollinearity? Why might it be problematic?
What is R-squared? Write out the formula.
Covariance vs Correlation?
What is a confounding variable?
Explain confidence intervals.
What is TF-IDF?
Discriminative models vs Generative models?
What is an N-gram model?
Stemming vs Lemmatization.
What is word2vec? Explain skip-gram. Explain CBOW.
Binomial vs Geometric vs Hypergeometric vs Poisson distributions.
Explain P-values.
Give me a simple example using a t-Test for one mean.
Give me a simple example using confidence intervals.
What is the relationship between the Normal distribution and the Binomial distribution?
PMF vs PDF vs CDF.
Central Limit Theorem vs Law of Large Numbers.
Permutation vs Combination. Write out both formulas.
Parametric vs Non-Parametric.
Explain Chi-square test.
Why do we divide sample variance by n-1?
How would you determine if males and females have different heights?

More resources

If it's a classical machine learning or data science role (no deep learning), it will help a lot to go through the first 4 chapters of Ian Goodfellow's book Deep Learning.
If it's an AI / deep learning role, go through the next 6 chapters of Deep Learning.
HackerRank problems in algorithms and data structures are also really good. Many companies don't understand how to interview data scientists properly, so they revert to testing for software engineering skills because they do understand how to do that. You might think this doesn't make much sense, and you'd be right about that. But it's the way it is sometimes.
In the same way, LeetCode has a bunch of great algorithm problems plus mock interview question sets if you create an account. You can usually find answers to common LeetCode questions by Googling them, so you may not have to pay for a subscription to get what you need from it.
A great slide deck on the ML / DS interview process from the inside, put together by Chip Huyen. (Thanks to Sowmya Vajjala for suggesting this resource!)
A comprehensive set of instructions on how to prep for technical interviews in ML. Highly recommended.
A list of Google AI interview questions.
If you're willing to buy a book, here's a great one that has a massive list of top-notch interview questions.

Contributors

Thanks to Deepshika for supplying an initial set of questions!
