Application of our De-identification Framework with open source technologies, enabling enterprises to take ownership of the de-identification process and deploy it in trusted environments.
Updated Nov 15, 2021 - Python
This project is an adaptation based on a real test for a Junior Data Engineer position.
A cloud-based ETL testing and data analysis pipeline for YouTube trending video data using AWS services including Lambda, Glue, Athena, S3, and QuickSight. This project focuses on ingesting, transforming, storing, and analyzing structured and semi-structured data to generate insights based on video categories and trending metrics.
This repo shows how to read and write data from/to Google Cloud Storage with PySpark. The raw data is ingested, transformed, and stored in the data lake in snapshot format.
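As a rough illustration of the snapshot pattern this description mentions, the sketch below reads raw files, applies a trivial transformation, and writes one Parquet snapshot per run date. It is not taken from the repository: the function names, the `snapshot_date=` path layout, the CSV input format, and the deduplication step are all assumptions, and `gs://` paths only work if the Cloud Storage connector is configured for Spark.

```python
from datetime import date


def snapshot_path(bucket: str, dataset: str, snapshot_date: date) -> str:
    """Build a date-partitioned snapshot location (hypothetical layout)."""
    return f"gs://{bucket}/{dataset}/snapshot_date={snapshot_date.isoformat()}"


def ingest_snapshot(spark, raw_path: str, target_path: str) -> None:
    """Read raw CSV files, drop exact duplicate rows, write one Parquet snapshot.

    `spark` is an existing SparkSession. Reading and writing gs:// paths
    assumes the GCS connector is available on the cluster (an assumption).
    """
    raw = spark.read.csv(raw_path, header=True, inferSchema=True)
    cleaned = raw.dropDuplicates()
    # Overwrite so that re-running the same date replaces that snapshot only.
    cleaned.write.mode("overwrite").parquet(target_path)
```

A run for a given day might call `ingest_snapshot(spark, "gs://my-bucket/raw/orders/", snapshot_path("my-bucket", "orders", date.today()))`, where the bucket and dataset names are placeholders.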