| Topic | Description | Useful links |
|---|---|---|
| **Databases and Data Warehouses** | | |
| Apache Cassandra | Apache Cassandra is a distributed, wide-column NoSQL database management system. | Awesome Cassandra |
| Greenplum | Greenplum is a massively parallel processing (MPP) big data platform built on the open-source PostgreSQL database. | Awesome Greenplum |
| MongoDB | MongoDB is a document-oriented database. | Awesome MongoDB |
| Apache HBase | Apache HBase is an open-source, non-relational, distributed database. | Awesome HBase |
| Apache Hive | Apache Hive is a data warehouse software project built on top of Apache Hadoop that provides data query and analysis. | Awesome Hive |
| Amazon DynamoDB | Amazon DynamoDB is a fully managed, proprietary NoSQL database service. | Awesome DynamoDB, Awesome AWS |
| Amazon Redshift | Amazon Redshift is a cloud data warehouse service from Amazon Web Services. | Amazon Redshift Utilities, Awesome AWS |
| BigQuery (GCP) | BigQuery is a fully managed, serverless data warehouse. | Awesome BigQuery |
| Bigtable (GCP) | Bigtable is a fully managed wide-column and key-value NoSQL database service. | Awesome Bigtable |
| **Data Formats** | | |
| Apache Avro | Apache Avro is a row-oriented remote procedure call and data serialization framework. | Awesome Avro |
| Apache Parquet | Apache Parquet is a column-oriented data file format designed for efficient data storage and retrieval. | TODO |
| Delta Lake | Delta Lake is a storage framework that enables building a Lakehouse architecture with compute engines such as Apache Spark. | Delta examples |
| **Big Data Frameworks** | | |
| Apache Airflow | Apache Airflow is a workflow management platform for data engineering pipelines. | Awesome Airflow |
| Apache Flume | Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. | TODO |
| Apache Hadoop | Apache Hadoop is a collection of software utilities that uses a network of many computers to solve problems involving massive amounts of data and computation. | Awesome Hadoop |
| Apache Impala | Apache Impala is a massively parallel processing SQL query engine for data stored in a computer cluster running Apache Hadoop. | TODO |
| Apache Kafka | Apache Kafka is a distributed event store and stream-processing platform. | Awesome Kafka |
| Apache NiFi | Apache NiFi is a software project designed to automate the flow of data between software systems. | Awesome NiFi |
| Apache Spark | Apache Spark is a unified analytics engine for large-scale data processing. | Awesome Spark |
| Apache Flink | Apache Flink is a unified stream-processing and batch-processing framework. | Awesome Flink |
| Kubernetes | Kubernetes is a system for managing containerized applications across multiple hosts. | Awesome Kubernetes |
| **Cloud Providers** | | |
| Amazon Web Services | Amazon Web Services (AWS) is Amazon's cloud platform, providing scalable and cost-effective cloud computing services. | Awesome AWS |
| Microsoft Azure | Microsoft Azure is Microsoft's public cloud computing platform. | Awesome Azure |
| Google Cloud Platform | Google Cloud Platform is a suite of cloud computing services. | Awesome GCP |
| **Theory** | | |
| DWH Architectures | A data warehouse (DWH) architecture defines how data is collected, processed, and presented to end users across an enterprise. | Awesome databases |
| Data Structures | A data structure is a specialized format for organizing, processing, retrieving, and storing data. | TODO |
| SQL | SQL is a domain-specific language designed for managing data held in a relational database management system (RDBMS). | Awesome SQL |
| **Data Visualization Tools / BI** | | |
| Tableau | Tableau is a data visualization tool widely used in business intelligence (BI). | TODO |
| Looker | Looker is an enterprise platform for BI, data applications, and embedded analytics that helps you explore and share insights in real time. | TODO |
| Apache Superset | Apache Superset is a modern data exploration and data visualization platform. | TODO |
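
The Apache Avro and Apache Parquet rows contrast row-oriented and column-oriented formats. The minimal sketch below uses plain Python lists and hypothetical sample records, not either library's API, to show why a columnar layout suits analytical scans: summing one field touches every stored value in the row layout but only that field's values in the column layout. It deliberately ignores the encoding and compression that real Parquet files add.

```python
from typing import Any

# Hypothetical sample records, for illustration only.
RECORDS: list[dict[str, Any]] = [
    {"user_id": 1, "country": "DE", "amount": 10.0},
    {"user_id": 2, "country": "US", "amount": 25.5},
    {"user_id": 3, "country": "DE", "amount": 4.5},
]


def to_row_layout(records: list[dict[str, Any]]) -> list[tuple[Any, ...]]:
    """Row-oriented (Avro-style): all fields of one record are stored together."""
    columns = list(records[0].keys())
    return [tuple(r[c] for c in columns) for r in records]


def to_column_layout(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Column-oriented (Parquet-style): all values of one field are stored together."""
    return {c: [r[c] for r in records] for c in records[0]}


def sum_amount_rows(rows: list[tuple[Any, ...]], amount_index: int) -> tuple[float, int]:
    """Scan every row; each row carries all its fields, so every value is read."""
    total = 0.0
    values_read = 0
    for row in rows:
        values_read += len(row)  # the whole row is read to reach one field
        total += row[amount_index]
    return total, values_read


def sum_amount_columns(cols: dict[str, list[Any]]) -> tuple[float, int]:
    """Read only the 'amount' column; the other columns are never touched."""
    amounts = cols["amount"]
    return sum(amounts), len(amounts)


if __name__ == "__main__":
    print(sum_amount_rows(to_row_layout(RECORDS), amount_index=2))  # (40.0, 9)
    print(sum_amount_columns(to_column_layout(RECORDS)))            # (40.0, 3)
```

Both layouts return the same total, but the columnar scan reads three values instead of nine; with wide tables and many rows that gap is what makes column stores attractive for aggregations.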
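
The DWH Architectures and SQL rows can be illustrated together with a star schema, a common dimensional-modeling pattern in which a central fact table references descriptive dimension tables. This is a hedged sketch, not material from this repository: it uses Python's built-in `sqlite3` module as a stand-in for a real warehouse such as Redshift or BigQuery, and the tables `dim_product` and `fact_sales` and their rows are hypothetical.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory star schema: one fact table, one dimension table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            category   TEXT NOT NULL
        );
        CREATE TABLE fact_sales (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            quantity   INTEGER NOT NULL,
            amount     REAL NOT NULL
        );
        INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
        INSERT INTO fact_sales VALUES
            (1, 1, 2, 30.0),
            (2, 2, 1, 60.0),
            (3, 1, 1, 15.0);
        """
    )
    return conn


def revenue_by_category(conn: sqlite3.Connection) -> list[tuple[str, int, float]]:
    """Join the fact table to its dimension, then aggregate measures per category."""
    query = """
        SELECT p.category,
               SUM(f.quantity) AS units,
               SUM(f.amount)   AS revenue
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(revenue_by_category(build_demo_warehouse()))
    # [('books', 3, 45.0), ('games', 1, 60.0)]
```

The fact table holds numeric measures (`quantity`, `amount`) keyed by dimension IDs, so analytical questions reduce to a join plus `GROUP BY`; production warehouses apply the same pattern at far larger scale.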
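
The Apache Hadoop row describes spreading computation over many machines. Hadoop's classic processing model is MapReduce, and the single-process sketch below imitates its map, shuffle, and reduce stages for a word count. It is a conceptual illustration only: it does not use Hadoop's Java API, and the function names are hypothetical; in a real cluster each stage runs in parallel across nodes.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator
from itertools import chain


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key, as the framework does between stages."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: combine all values for each key (here, sum the counts)."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["big data", "Big compute"]))
    # {'big': 2, 'data': 1, 'compute': 1}
```

Because each mapper call depends only on its own line and each reducer only on one key's values, both stages can be split across machines; only the shuffle needs to move data between them.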

OBenner/data-engineering-interview-questions: more than 2,000 data engineer interview questions.