
This project uses Azure Data Factory to extract 800k records from a CSV file, convert them to Parquet, and build a Power BI report on top of the Parquet data.


MuhammadHasaanWahid/Csv-To-Parquet-And-Data-Reporting-Pipeline


Csv-to-Parquet-and-data-reporting-pipeline

Click here to see the dataset

Problem Statement

The goal was to extract 800k records from a CSV file, convert them to Parquet, store the result in a data lake, and then build a report in Power BI. The main problem I faced was that some column names in the CSV data contained spaces. Because the pipeline wrote the data in Parquet format, and Data Factory's Parquet format does not support spaces in column names, the pipeline failed repeatedly at the conversion step. After I removed the spaces from the column names, the pipeline ran successfully. I then loaded the Parquet data into Power BI and created a report.
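
The source does not show how the spaces were removed. As a hedged illustration only (not the author's implementation), the same idea can be applied as a Python pre-processing step before the file reaches the Parquet conversion. The function names `sanitize_column_name` and `sanitize_csv_header` and the `replacement` parameter are hypothetical. By default the sketch replaces each run of whitespace with an underscore instead of deleting it, so "Unit Price" becomes "Unit_Price" rather than "UnitPrice"; passing `replacement=""` removes the spaces outright. Only the header row is rewritten and the data rows are streamed through, so the 800k records never need to be held in memory at once.

```python
import csv
import io
import re
from typing import TextIO


def sanitize_column_name(name: str, replacement: str = "_") -> str:
    """Trim a column name and replace each run of whitespace with `replacement`."""
    return re.sub(r"\s+", replacement, name.strip())


def sanitize_csv_header(src: TextIO, dst: TextIO, replacement: str = "_") -> list[str]:
    """Copy CSV rows from src to dst, rewriting only the header row.

    Data rows are streamed unchanged, so large files are not loaded into memory.
    Raises ValueError if two sanitized names collide (e.g. "A B" and "A_B").
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader)
    new_header = [sanitize_column_name(col, replacement) for col in header]
    if len(set(new_header)) != len(new_header):
        raise ValueError(f"sanitized column names collide: {new_header}")
    writer.writerow(new_header)
    writer.writerows(reader)  # remaining data rows pass through untouched
    return new_header


if __name__ == "__main__":
    # Hypothetical two-column sample; a real run would open files with newline="".
    src = io.StringIO("Order ID,Unit Price\n1,9.99\n")
    dst = io.StringIO()
    print(sanitize_csv_header(src, dst))  # ['Order_ID', 'Unit_Price']
    print(dst.getvalue(), end="")
```

To process real files, open both the input and the output with `newline=""`, as the Python `csv` module documentation recommends. The collision check matters because two distinct original names can map to the same sanitized name, which would otherwise silently produce duplicate Parquet columns.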

The JSON files

The Third Portfolio Project.json file contains information about the ADF pipeline, including the pipeline name, description, and the resources that make up the pipeline. The manifest.json file describes the dependencies and structure of the pipeline's ARM template in Azure Data Factory.
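
To see which activities a pipeline export contains, a short script can walk its JSON. This is a sketch under assumptions: it expects either a standalone pipeline definition with a `properties.activities` array, or an ARM template whose `resources` list contains resources of type `Microsoft.DataFactory/factories/pipelines`. The exact layout of `Third Portfolio Project.json` may differ, and the function `list_activities` and the sample pipeline and activity names are hypothetical.

```python
import json
from typing import Any

PIPELINE_RESOURCE_TYPE = "Microsoft.DataFactory/factories/pipelines"


def list_activities(doc: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (name, type) for every activity in an ADF pipeline JSON document.

    Assumed shapes:
      * standalone pipeline: {"name": ..., "properties": {"activities": [...]}}
      * ARM template: {"resources": [{"type": PIPELINE_RESOURCE_TYPE, "properties": {...}}, ...]}
    """
    if "resources" in doc:
        # ARM template: keep only pipeline resources, skip datasets, linked services, etc.
        pipelines = [r for r in doc["resources"] if r.get("type") == PIPELINE_RESOURCE_TYPE]
    else:
        pipelines = [doc]
    activities = []
    for pipeline in pipelines:
        for activity in pipeline.get("properties", {}).get("activities", []):
            activities.append((activity.get("name", ""), activity.get("type", "")))
    return activities


if __name__ == "__main__":
    # Hypothetical sample; replace with json.load() on the exported file.
    sample = json.loads(
        '{"name": "CsvToParquet", "properties": {"activities": '
        '[{"name": "CopyCsvToParquet", "type": "Copy"}]}}'
    )
    print(list_activities(sample))  # [('CopyCsvToParquet', 'Copy')]
```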

Power BI Report

[Image: Power BI report screenshot]
