| Databases and Data Warehouses | |||||
|---|---|---|---|---|---|
| Questions | Description | Useful links | |||
| Apache Cassandra | Cassandra is a distributed, wide-column store, NoSQL database management system. | Awesome Cassandra | |||
| Greenplum | Greenplum is a big data technology based on MPP architecture and the Postgres open source database technology. | Awesome Greenplum | |||
| MongoDB | MongoDB is a document-oriented database. | Awesome MongoDB | |||
| Apache HBase | HBase is an open-source non-relational distributed database. | Awesome HBase | |||
| Apache Hive | Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. | Awesome Hive | |||
| Amazon DynamoDB | Amazon DynamoDB is a fully managed proprietary NoSQL database service. | Awesome DynamoDB Awesome AWS | |||
| Amazon Redshift | Amazon Redshift is a data warehouse product. | Amazon Redshift Utilities Awesome AWS | |||
| BigQuery GCP | BigQuery is a fully-managed, serverless data warehouse. | Awesome BigQuery | |||
| Bigtable GCP | Bigtable is a fully managed wide-column and key-value NoSQL database service. | Awesome Bigtable | |||
| Data Formats | |||||
| Apache Avro | Avro is a row-oriented remote procedure call and data serialization framework. | Awesome Avro | |||
| Apache Parquet | Apache Parquet is a column-oriented data file format designed for efficient data storage and retrieval. | TODO | |||
| Delta | Delta Lake is a storage framework that enables building a Lakehouse architecture with compute engines. | Delta examples | |||
| Big Data Frameworks | |||||
| Apache Airflow | Apache Airflow is a workflow management platform for data engineering pipelines. | Awesome Airflow | |||
| Apache Flume | Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. | TODO | |||
| Apache Hadoop | Apache Hadoop is a collection of software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. | Awesome Hadoop | |||
| Apache Impala | Apache Impala is a parallel processing SQL query engine for data stored in a computer cluster running Apache Hadoop. | TODO | |||
| Apache Kafka | Apache Kafka is a distributed event store and stream-processing platform. | Awesome Kafka | |||
| Apache NiFi | Apache NiFi is a software project designed to automate the flow of data between software systems. | Awesome NiFi | |||
| Apache Spark | Apache Spark is a unified analytics engine for large-scale data processing. | Awesome Spark | |||
| Apache Flink | Apache Flink is a unified stream-processing and batch-processing framework. | Awesome Flink | |||
| Kubernetes | Kubernetes is a system for managing containerized applications across multiple hosts. | Awesome Kubernetes | |||
| Cloud providers | |||||
| Amazon Web Services | Amazon Web Services is an online platform that provides scalable and cost-effective cloud computing solutions. | Awesome AWS | |||
| Microsoft Azure | Microsoft Azure is Microsoft's public cloud computing platform. | Awesome Azure | |||
| Google Cloud Platform | Google Cloud Platform is a suite of cloud computing services. | Awesome GCP | |||
| Theory | |||||
| DWH Architectures | A data warehouse architecture defines how data is communicated, processed, and presented for end-client computing within the enterprise. | Awesome databases | |||
| Data Structures | A data structure is a specialized format for organizing, processing, retrieving and storing data. | TODO | |||
| SQL | SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS). | Awesome SQL | |||
| Data visualization tools/BI | |||||
| Tableau | Tableau is a powerful data visualization tool used in business intelligence. | TODO | |||
| Looker | Looker is an enterprise platform for BI, data applications, and embedded analytics that helps you explore and share insights in real time. | TODO | |||
| Apache Superset | Superset is a modern data exploration and data visualization platform. | TODO | |||
More than 2000 data engineer interview questions.
OBenner/data-engineering-interview-questions