This project focuses on dynamically optimizing product prices based on sentiment analysis of customer reviews. The model combines sentiment analysis with reinforcement learning to adjust prices in a way that maximizes revenue while maintaining customer satisfaction.
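At a high level, the loop pairs a per-review sentiment score with a bandit-style agent that chooses among price adjustments. The sketch below is illustrative only: the tiny word-list scorer stands in for TextBlob polarity, and the action set, reward formula, and epsilon-greedy update are simplified assumptions, not the project's actual implementation.

```python
import random

# Toy sentiment scorer standing in for TextBlob polarity (range -1..1).
POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"bad", "poor", "terrible", "broken"}

def sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return max(-1.0, min(1.0, 5.0 * score / max(len(words), 1)))

ACTIONS = [-0.05, 0.0, 0.05]  # lower price 5%, hold, raise 5%

def reward(action, sent, rating):
    # Revenue proxy: raising the price only pays off when customers are happy.
    satisfaction = 0.5 * sent + 0.5 * (rating - 3) / 2  # both terms in ~[-1, 1]
    return action * 100 * satisfaction

# Epsilon-greedy agent over the three price actions.
q = {a: 0.0 for a in ACTIONS}
counts = {a: 0 for a in ACTIONS}

def step(review_text, rating, epsilon=0.1):
    a = random.choice(ACTIONS) if random.random() < epsilon else max(q, key=q.get)
    r = reward(a, sentiment(review_text), rating)
    counts[a] += 1
    q[a] += (r - q[a]) / counts[a]  # incremental mean update of the action value
    return a, r
```

Running `step` over a stream of reviews nudges the value estimates toward the price action that best balances revenue against satisfaction.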
- Apache Spark: The base engine for large-scale data processing.
- GCP Dataproc Clusters: Used for processing data at a large scale on Google Cloud Platform.
- Databricks Workspace: Used as an alternative environment for running the model.
We used the Amazon Reviews Dataset (2023), available on Hugging Face, which includes customer reviews for a variety of products. The dataset provides information such as:
- asin: Unique identifier for each product.
- rating: Customer ratings (1-5).
- text: Review text.
- timestamp: Unix timestamp for the review.
- verified_purchase: Indicates if the review is from a verified purchase.
You can find the dataset here: Amazon Reviews Dataset
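The dataset ships as JSON Lines, one review per line with the fields listed above. A minimal loader (plain Python here; the project itself would read the same files with Spark) might look like this — the sample records are fabricated for illustration:

```python
import json
from io import StringIO

# Two fabricated rows in the Amazon Reviews 2023 JSON-Lines layout.
sample = StringIO(
    '{"asin": "B01ABC", "rating": 5, "text": "Love it", '
    '"timestamp": 1577836800, "verified_purchase": true}\n'
    '{"asin": "B02DEF", "rating": 2, "text": "Broke fast", '
    '"timestamp": 1609459200, "verified_purchase": false}\n'
)

def load_reviews(fh):
    """Parse a JSONL stream, keeping only verified-purchase reviews."""
    reviews = []
    for line in fh:
        record = json.loads(line)
        if record.get("verified_purchase"):
            reviews.append(record)
    return reviews

reviews = load_reviews(sample)
```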
- Upload the Tables:
  - First, upload the `All_Beauty.jsonl` and `meta_All_Beauty.jsonl` files to the `/FileStore/tables` directory in Databricks.
- Import the Notebook:
  - Import the `Review-based-price-optimization.ipynb` file into your Databricks workspace.
- Attach a Cluster:
  - Attach a cluster to your notebook and run the code.
- Set Up GCP Bucket:
  - Set up a GCP bucket using the GCP Console and create two folders inside it: `/data` and `/scripts`.
- Upload Files:
  - Upload the `install_textblob.sh` file to the `/scripts` folder in the bucket using the following `gsutil` command:

    ```shell
    gsutil cp install_textblob.sh gs://<your-bucket-name>/scripts/install_textblob.sh
    ```

  - Also upload your dataset (e.g., `Clothing_Shoes_and_Jewelry.jsonl`) to the `/data` folder using the following `gsutil` command:

    ```shell
    gsutil cp /<file-location>/Clothing_Shoes_and_Jewelry.jsonl gs://<your-bucket-name>/data/
    ```
- Set Up Dataproc Cluster:
  - Set up the GCP Dataproc cluster using the GCP Console or the following `gcloud` command, making sure to use `install_textblob.sh` as an initialization action:

    ```shell
    ./google-cloud-sdk/bin/gcloud dataproc clusters create <your-cluster-name> \
      --enable-component-gateway --bucket <your-bucket-name> \
      --region us-central1 --master-machine-type n2-standard-2 --master-boot-disk-type pd-balanced \
      --initialization-actions=gs://<your-bucket-name>/scripts/install_textblob.sh \
      --master-boot-disk-size 32 --num-workers 2 --worker-machine-type n2-standard-2 \
      --worker-boot-disk-type pd-balanced --worker-boot-disk-size 32 \
      --image-version 2.2-debian12 --project <your-project-name>
    ```
- Export Environment Variables:
  - Once the cluster is running, SSH into the master node and export the following environment variables:

    ```shell
    export DATA_BUCKET="gs://<your-bucket-name>/data"
    export file_location="Clothing_Shoes_and_Jewelry.jsonl"
    export meta_file_location="meta_Clothing_Shoes_and_Jewelry.jsonl"
    ```
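Inside the Python script, these exports can be picked up at runtime to build the GCS input paths. This is a sketch under the assumption that the script reads them via `os.environ`; the default values are placeholders, not real bucket names:

```python
import os

# Fall back to placeholder values when the variables are not exported.
data_bucket = os.environ.get("DATA_BUCKET", "gs://<your-bucket-name>/data")
file_location = os.environ.get("file_location", "Clothing_Shoes_and_Jewelry.jsonl")
meta_file_location = os.environ.get(
    "meta_file_location", "meta_Clothing_Shoes_and_Jewelry.jsonl"
)

# Full GCS paths that a Spark reader (e.g. spark.read.json) could consume.
reviews_path = f"{data_bucket}/{file_location}"
meta_path = f"{data_bucket}/{meta_file_location}"
```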
- Upload the Python Script:
  - Upload the `Review-based-price-optimization.py` file to the master node.
- Run the Spark Job:
  - Use the following `spark-submit` command to start the execution of the script:

    ```shell
    spark-submit Review-based-price-optimization.py \
      --cluster=my-dataproc-cluster \
      --region=us-central1 \
      --properties=DATA_BUCKET=gs://<your-bucket-name>,DATA_LOCATION=us-central1
    ```
- Observe the Results:
  - Once the job has completed, examine the model's output: it shows the agent's price-adjustment actions and the corresponding rewards based on review sentiment and ratings.
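One way to read that output is to average the observed reward per price action. The snippet below is a hypothetical post-processing step, not part of the repository; the `(action, reward)` pairs are fabricated for illustration:

```python
from collections import defaultdict

# Hypothetical (action, reward) pairs such as the job might log, where the
# action is a fractional price change (-5%, hold, +5%).
log = [(-0.05, -1.2), (0.0, 0.0), (0.05, 3.1), (0.05, 2.4), (-0.05, 0.8)]

def summarize(pairs):
    """Average observed reward per price action."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for action, r in pairs:
        totals[action] += r
        counts[action] += 1
    return {a: totals[a] / counts[a] for a in totals}

summary = summarize(log)
best_action = max(summary, key=summary.get)  # action with highest mean reward
```

For this fabricated log, the +5% action has the highest average reward, so it would be the adjustment the agent favors.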
We welcome contributions! If you have ideas to improve the model or encounter any issues, feel free to fork the repository and submit a pull request.
This project is open source and available under the MIT License.