Data Processing Pipeline README

Directory Structure

  • Notebooks Pipeline Preprocessing/: This directory contains the Jupyter notebooks for the pipeline modules (minimal sketches of the main API calls follow this list).

    • 1.pipeline_token.ipynb: Notebook for loading data from Google Cloud Storage (GCS), preprocessing it, and tokenizing it.
    • 2.pipeline_sentiment.ipynb: Notebook for performing sentiment analysis on text data.
    • 3.pipeline_moderate.ipynb: Notebook for moderating text data.
    • 4.pipeline_entities.ipynb: Notebook for extracting entities from text data.
    • 5.pipeline_save_BQ.ipynb: Notebook for uploading data to BigQuery.
    • functions.ipynb: Contains the helper functions used to communicate with GCS and BigQuery.
  • Get Prediction API/: This directory contains RUN_make_prediction.ipynb, which runs the preprocessing and calls the prediction API on a test sample.

    • processing/data_preparation.py: Defines a class GCSParquetLoader for loading and preprocessing data from Google Cloud Storage (GCS).

    • processing/tokenization.py: Defines a class TokenizationProcessor for tokenizing data.

    • processing/sentiment.py: Defines a class GCSSentimentAnalyzer for performing sentiment analysis on text data.

    • processing/moderate.py: Defines a class GCSTextModerationLoader for moderating text data.

    • processing/entities.py: Defines a class GCSCEntityAnalyzer for extracting entities from text data.

    • processing/bigquery.py: Defines a class GCS_Bigquery for uploading data to BigQuery.

    • Get Predictions Client.ipynb: Contains the script to get the predictions for a test dataset via the client library.

    • Get Predictions REST.ipynb: Contains the script to get the predictions for a test dataset via the REST API.

    • pipeline_preprocessing.py: Contains a class DataProcessor that orchestrates the data processing pipeline by calling different processing steps.

    • main.py: Script to execute the data processing pipeline.

    • request.json: JSON output file from the trained AutoML model.

  • Training autoML pipeline/:

    • 1.RUN_pipeline_prepr_and_train_autoML.ipynb: Notebook to create the pipeline and the AutoML API.
    • The custom folders contain the Docker images.
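
The loading step amounts to reading a Parquet object from a GCS bucket into a pandas DataFrame. A minimal sketch of what GCSParquetLoader does, assuming a hypothetical bucket name and object path:

```python
from io import BytesIO

import pandas as pd
from google.cloud import storage

# Hypothetical bucket name and object path; replace with your own.
BUCKET_NAME = "your-bucket"
BLOB_PATH = "raw/reviews.parquet"

# Download the Parquet object from GCS and read it into a DataFrame.
client = storage.Client()
blob = client.bucket(BUCKET_NAME).blob(BLOB_PATH)
df = pd.read_parquet(BytesIO(blob.download_as_bytes()))
print(df.head())
```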
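
The sentiment, moderation, and entity steps all wrap the Cloud Natural Language API. A minimal sketch of the three calls the processing classes rely on, with an invented sample text (moderate_text requires a recent google-cloud-language release):

```python
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
doc = language_v1.Document(
    content="The delivery was fast but the packaging was damaged.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

# Sentiment: score in [-1, 1]; magnitude grows with the amount of emotion.
sentiment = client.analyze_sentiment(request={"document": doc}).document_sentiment
print(sentiment.score, sentiment.magnitude)

# Moderation: a confidence value per harmful-content category.
for category in client.moderate_text(request={"document": doc}).moderation_categories:
    print(category.name, category.confidence)

# Entities: salient things mentioned in the text.
for entity in client.analyze_entities(request={"document": doc}).entities:
    print(entity.name, entity.salience)
```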
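
The final step pushes the processed DataFrame to BigQuery as a load job. A minimal sketch of what GCS_Bigquery does, with a hypothetical table ID (load_table_from_dataframe also needs pyarrow installed):

```python
from google.cloud import bigquery

# Hypothetical fully qualified table ID; replace with your own.
TABLE_ID = "your-project.your_dataset.processed_reviews"

client = bigquery.Client()
job = client.load_table_from_dataframe(df, TABLE_ID)  # df from the steps above
job.result()  # block until the load job completes
```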
    

Usage

  1. Setting up Google Cloud Storage (GCS):

    • Make sure you have access to a GCS bucket where your data is stored.
  2. Running the Training Pipeline:

    • In the Training autoML pipeline folder you will find the notebook RUN_pipeline_prepr_and_train_autoML.ipynb, which creates the preprocessing pipeline and deploys the trained model to an endpoint (see the pipeline sketch below).
  3. Getting Predictions:

    • The notebooks in Get Prediction API/ retrieve predictions for a test sample from the deployed endpoint (see the prediction sketch below).
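
For orientation, here is a minimal sketch of how a Kubeflow pipeline like the one in RUN_pipeline_prepr_and_train_autoML.ipynb is compiled and submitted to Vertex AI Pipelines. The component body, project, region, and GCS paths are placeholders, not the repository's actual code:

```python
from google.cloud import aiplatform
from kfp import compiler, dsl

@dsl.component(base_image="python:3.10")
def preprocess(input_uri: str, output_uri: str):
    # Placeholder body: the real component would run the preprocessing steps.
    print(f"preprocess {input_uri} -> {output_uri}")

@dsl.pipeline(name="prepr-and-train-automl")
def pipeline(input_uri: str = "gs://your-bucket/raw"):
    preprocess(input_uri=input_uri, output_uri="gs://your-bucket/processed")
    # An AutoML training step and endpoint deployment would follow here.

# Compile to a job spec and submit it to Vertex AI Pipelines.
compiler.Compiler().compile(pipeline, "pipeline.json")
aiplatform.init(project="your-project", location="us-central1")
aiplatform.PipelineJob(
    display_name="prepr-and-train-automl",
    template_path="pipeline.json",
).run()
```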
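
And a minimal sketch of the two ways the prediction notebooks can call the deployed endpoint, via the Vertex AI client library and via the raw :predict REST route. Project, region, endpoint ID, and the instance payload are hypothetical and depend on the trained model:

```python
import google.auth
import google.auth.transport.requests
import requests
from google.cloud import aiplatform

# Hypothetical identifiers; replace with your own.
PROJECT, REGION, ENDPOINT_ID = "your-project", "us-central1", "1234567890"
instances = [{"content": "Great product, fast shipping."}]

# Client-library route (Get Predictions Client.ipynb).
aiplatform.init(project=PROJECT, location=REGION)
endpoint = aiplatform.Endpoint(ENDPOINT_ID)
print(endpoint.predict(instances=instances).predictions)

# REST route (Get Predictions REST.ipynb): POST the same payload to :predict.
creds, _ = google.auth.default()
creds.refresh(google.auth.transport.requests.Request())
url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
    f"/locations/{REGION}/endpoints/{ENDPOINT_ID}:predict"
)
resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {creds.token}"},
    json={"instances": instances},
)
print(resp.json())
```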

Dependencies

  • Python 3.x
  • Google Cloud Storage (google-cloud-storage)
  • Dataflow
  • BigQuery (google-cloud-bigquery)
  • Cloud Natural Language API (google-cloud-language)
  • Vertex AI (google-cloud-aiplatform)
  • Kubeflow Pipelines (kfp)
  • pandas
  • numpy
  • scikit-learn

Contributors
