As an avid fan of Spotify's Discover Weekly playlist, I always wanted a scheduled, automated, self-controlled, lean, and cheap way of backing up the weekly generated tracks. There are plugins that achieve the same result and integrate much more easily, so bear in mind that this is a slightly over-engineered solution for cloud infrastructure enthusiasts.
This project uses
- Python (including the spotipy module) to back up Spotify playlist tracks by copying them into another playlist,
- serverless Google Cloud infrastructure (Cloud Functions, Cloud Scheduler, Cloud Storage, Secret Manager, etc.) to schedule and run the code,
- Terraform to manage the cloud infrastructure in code,
- GitHub Actions to test and deploy the code.
The cloud function which executes the script is idempotent, i.e. it "can be applied multiple times without changing the result beyond the initial application" (see wiki/Idempotence).
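To illustrate what idempotence means for a playlist backup, here is a minimal sketch that works on plain lists of track IDs instead of real playlists. The function names `tracks_to_add` and `copy_tracks_idempotently` are made up for this example and are not taken from the project's code; the idea is only that a run adds the tracks that are not yet in the destination, so a second run changes nothing.

```python
def tracks_to_add(source_ids, destination_ids):
    """Return source track IDs missing from the destination, in source order."""
    already_saved = set(destination_ids)
    missing = []
    for track_id in source_ids:
        if track_id not in already_saved:
            missing.append(track_id)
            already_saved.add(track_id)  # also drops duplicates within the source
    return missing


def copy_tracks_idempotently(source_ids, destination_ids):
    """Simulate one backup run on plain lists (hypothetical helper)."""
    return list(destination_ids) + tracks_to_add(source_ids, destination_ids)


weekly = ["t1", "t2", "t3"]   # made-up IDs standing in for Discover Weekly
backup = ["t0", "t2"]         # made-up IDs standing in for the backup playlist

after_first_run = copy_tracks_idempotently(weekly, backup)
after_second_run = copy_tracks_idempotently(weekly, after_first_run)

print(after_first_run)
print(after_first_run == after_second_run)
```

Running the backup twice on the same weekly playlist leaves the destination exactly as it was after the first run, which is what makes repeated or retried invocations harmless.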
In case you are not interested in backing up Spotify playlists or Spotify's Discover Weekly, this project may be of interest if you want to learn how to automate and schedule a python function call using cloud infrastructure and infrastructure as code.
Prerequisites: an existing Google Cloud project and terraform (>=v1.3.4).
Short summary of what follows below in more detail:
- Create a Spotify app in the Spotify Developer Dashboard to obtain credentials for authenticating later on, and identify the playlist IDs.
- Apply terraform to create the cloud infrastructure.
- Run the auth script to create the refresh token and store it in Google Secret Manager; this gives the cloud function access to the Spotify account.
- Configure Github for CI/CD.
- Test the cloud function locally via curl.
As a first step, we need to create an app in the Spotify Developer dashboard. This will provide a client ID and client secret.
To modify our private playlist, we will need to authenticate via the authorization code flow.
For this we also need to set up a redirect URI, which we can do via Edit settings. In our case, http://localhost:8080 will work to run the authentication locally.
export SPOTIFY_CLIENT_ID="{your-spotify-client-id}"
export SPOTIFY_CLIENT_SECRET="{your-spotify-client-secret}"
export SPOTIFY_REDIRECT_URI="{your-spotify-redirect-uri}"
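For orientation, the first leg of the authorization code flow is a browser redirect to Spotify's authorize endpoint, carrying the client ID and the redirect URI. The spotipy library builds this URL itself, so the following is only a sketch of what happens under the hood. The function name `build_authorize_url` and the chosen scopes are assumptions for this example; the project may request different scopes.

```python
from urllib.parse import urlencode

AUTHORIZE_URL = "https://accounts.spotify.com/authorize"

# Assumed scopes for reading and modifying private playlists.
SCOPES = ("playlist-read-private", "playlist-modify-private")


def build_authorize_url(client_id, redirect_uri, scopes=SCOPES):
    """Build the URL the user opens to grant access (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "response_type": "code",       # ask for an authorization code
        "redirect_uri": redirect_uri,  # must match the URI set in the dashboard
        "scope": " ".join(scopes),
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


url = build_authorize_url("my-client-id", "http://localhost:8080")
print(url)
```

After the user approves, Spotify redirects to the redirect URI with a `code` parameter, which is then exchanged for the access and refresh tokens.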
The authorization flow will provide us with access and refresh tokens, which we have to store securely. For this we can use Google Secret Manager. Via the refresh token, the spotipy library will be able to request new access tokens, which are usually valid for 1 hour. To achieve all this we will use a little helper script (see step 3).
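The one-hour lifetime is why the refresh token matters: before each call, the client has to decide whether the cached access token is still usable. The sketch below shows such a check on a plain dictionary. It assumes the cached token carries an `expires_at` Unix timestamp, and the 60-second safety margin is an arbitrary choice for this example; spotipy performs its own version of this check internally.

```python
import time


def is_token_expired(token_info, now=None, margin_seconds=60):
    """Return True if the access token expires within the safety margin."""
    now = time.time() if now is None else now
    return token_info["expires_at"] - now < margin_seconds


issued_at = 1_700_000_000  # arbitrary example timestamp
token_info = {
    "access_token": "dummy-access-token",
    "refresh_token": "dummy-refresh-token",
    "expires_in": 3600,                # one hour, as described above
    "expires_at": issued_at + 3600,    # assumed field, see lead-in
}

print(is_token_expired(token_info, now=issued_at))
print(is_token_expired(token_info, now=issued_at + 3590))
```

When the check reports an expired token, the refresh token is used to request a new access token, and the updated token is written back to the cache.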
Moreover, we need to find out the IDs of our source playlist (Discover Weekly) as well as the destination playlist we want to insert the tracks into. You can get the ID either through the web app or within the installed app via right-clicking on the playlist and then Share -> Copy link to playlist.
The link looks like this: https://open.spotify.com/playlist/{id}.
export SOURCE_PLAYLIST_ID="{your-source-playlist-id}"
export DESTINATION_PLAYLIST_ID="{your-destination-playlist-id}"
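If you prefer not to cut the ID out of the link by hand, a small helper can do it. This is a sketch only: the function name `playlist_id_from_link` and the example ID are made up, and it assumes the link has the shape shown above, possibly followed by a query string.

```python
from urllib.parse import urlparse


def playlist_id_from_link(link):
    """Extract {id} from https://open.spotify.com/playlist/{id}."""
    # urlparse separates the path from any query string appended to the link.
    parts = [part for part in urlparse(link).path.split("/") if part]
    if len(parts) < 2 or parts[-2] != "playlist":
        raise ValueError(f"not a playlist link: {link!r}")
    return parts[-1]


# "abc123" is a made-up ID used purely for illustration.
print(playlist_id_from_link("https://open.spotify.com/playlist/abc123?si=xyz"))
```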
We can then create our infrastructure via terraform:
cd terraform
terraform plan
terraform apply
After roughly two minutes the infrastructure should be created:
Apply complete! Resources: 20 added, 0 changed, 0 destroyed.
Outputs:
cloud_function_bucket_name = "dw-saver-54a6"
cloud_function_name = "dw-saver"
cloud_function_secret_id = "dw-saver-token"
cloud_function_service_account = "dw-saver@{your-project-id}.iam.gserviceaccount.com"
cloud_function_url = "https://europe-west3-{your-project-id}.cloudfunctions.net/dw-saver"
deployment_service_account = "dw-saver-deployment@{your-project-id}.iam.gserviceaccount.com"
workload_identity_provider = "projects/{your-num-project-id}/locations/global/workloadIdentityPools/github-pool/providers/github-provider"
Straight after creation, the cloud function will not work properly yet, since we have provided neither the Spotify client ID and secret nor the refresh and access tokens. We will generate these in the next step.
If we have created the infrastructure before, we might want to import previously created long-lived resources like the Workload Identity pool and provider beforehand:
terraform import google_iam_workload_identity_pool.deployment github-pool
terraform import google_iam_workload_identity_pool_provider.deployment github-pool/github-provider
terraform apply ...
With all environment variables in place locally, we can use our helper script to generate the refresh and access token and store them in Google Secret Manager (making use of the GoogleSecretManagerCacheHandler class):
SPOTIFY_CLIENT_ID=$SPOTIFY_CLIENT_ID \
SPOTIFY_CLIENT_SECRET=$SPOTIFY_CLIENT_SECRET \
SPOTIFY_REDIRECT_URI=$SPOTIFY_REDIRECT_URI \
GCP_PROJECT_ID=$GCP_PROJECT_ID \
GCP_SECRET_ID=$GCP_SECRET_ID \
python app/auth.py
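The project's GoogleSecretManagerCacheHandler is not reproduced here. To show the shape of such a handler, the sketch below is an in-memory stand-in with the two methods spotipy's cache handler interface expects, `get_cached_token` and `save_token_to_cache`. The class name `InMemorySecretCacheHandler` and the `versions` list are invented for this example; the list imitates how every save to a secret store adds a new version instead of overwriting the old one.

```python
import json


class InMemorySecretCacheHandler:
    """Hypothetical stand-in for a Secret Manager backed cache handler."""

    def __init__(self):
        self.versions = []  # each save appends a new "secret version"

    def get_cached_token(self):
        """Return the latest stored token, or None if nothing was stored yet."""
        if not self.versions:
            return None
        return json.loads(self.versions[-1])

    def save_token_to_cache(self, token_info):
        """Serialize the token and store it as a new version."""
        self.versions.append(json.dumps(token_info))


handler = InMemorySecretCacheHandler()
print(handler.get_cached_token())

handler.save_token_to_cache({"access_token": "a1", "refresh_token": "r1"})
handler.save_token_to_cache({"access_token": "a2", "refresh_token": "r1"})

print(handler.get_cached_token()["access_token"])
print(len(handler.versions))
```

A real handler would replace the list with calls to the secret store, but the contract towards spotipy stays the same: return the latest token dictionary or None, and persist a new one when asked.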
We have still not provided Spotify access to our cloud function, which we will do in the next step.
Since we plan to maintain this codebase in a serverless fashion, we can use Github actions to deploy and test the code and infrastructure.
To grant our deployment access to Spotify we have to add the Spotify credentials SPOTIFY_CLIENT_ID, SPOTIFY_CLIENT_SECRET, SPOTIFY_REDIRECT_URI to the repo secrets via Settings -> Secrets -> Actions.
To make the deployment work via Github Actions we need to specify a few more environment variables.
We can read these from the corresponding terraform outputs after we have run terraform apply:
Outputs:
cloud_function_bucket_name = "dw-saver-54a6"
cloud_function_name = "dw-saver"
cloud_function_secret_id = "dw-saver-token"
cloud_function_service_account = "dw-saver@{your-project-id}.iam.gserviceaccount.com"
cloud_function_url = "https://europe-west3-{your-project-id}.cloudfunctions.net/dw-saver"
deployment_service_account = "dw-saver-deployment@{your-project-id}.iam.gserviceaccount.com"
workload_identity_provider = "projects/{your-num-project-id}/locations/global/workloadIdentityPools/github-pool/providers/github-provider"
Alternatively, we can fetch these programmatically via the terraform output command:
cd terraform
export CLOUD_FUNCTION_URL=$(terraform output -raw cloud_function_url)
export CLOUD_FUNCTION_BUCKET_NAME=$(terraform output -raw cloud_function_bucket_name)
export CLOUD_FUNCTION_SERVICE_ACCOUNT=$(terraform output -raw cloud_function_service_account)
export CLOUD_FUNCTION_SECRET_ID=$(terraform output -raw cloud_function_secret_id)
...
If we add these to the Github deploy workflow we can use Github actions to do the deployment using the gcloud CLI and eventually test it.
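Instead of exporting each value one by one, the outputs can also be read in a single pass from `terraform output -json`, which prints an object keyed by output name with the actual value in a `value` field. The sketch below parses such a document into environment-style names. The function name `env_from_terraform_outputs` is made up, and the sample JSON is written by hand from the outputs listed above, not captured from a real run.

```python
import json


def env_from_terraform_outputs(outputs_json):
    """Map terraform output names to upper-case environment variable names."""
    outputs = json.loads(outputs_json)
    return {name.upper(): str(entry["value"]) for name, entry in outputs.items()}


# Hand-written sample in the shape of `terraform output -json`.
sample = json.dumps(
    {
        "cloud_function_name": {
            "sensitive": False,
            "type": "string",
            "value": "dw-saver",
        },
        "cloud_function_secret_id": {
            "sensitive": False,
            "type": "string",
            "value": "dw-saver-token",
        },
    }
)

env = env_from_terraform_outputs(sample)
for key, value in sorted(env.items()):
    print(f"{key}={value}")
```

Deriving the variable name from the output name avoids the kind of mix-up that easily happens when several similar exports are typed by hand.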
We can ping the cloud function endpoint and should receive an OK:
export CLOUD_FUNCTION_URL=$(cd terraform && terraform output -raw cloud_function_url)
curl \
--silent \
-X POST \
$CLOUD_FUNCTION_URL \
-H "Authorization: bearer $(gcloud auth print-identity-token)" \
-H "Content-Type: application/json" \
-d '{}'
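The same request can be assembled in Python with the standard library, which is handy inside a test script. The sketch below only builds the request object and does not send it. The function name `build_invoke_request`, the placeholder URL and the dummy token are assumptions for this example; in practice the token would come from `gcloud auth print-identity-token`.

```python
import json
import urllib.request


def build_invoke_request(function_url, identity_token, payload=None):
    """Build the POST request that the curl command above sends."""
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        function_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"bearer {identity_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder URL and token, for illustration only.
request = build_invoke_request("https://example.invalid/dw-saver", "dummy-token")
print(request.get_method(), request.full_url)

# To actually send it, pass the object to urllib.request.urlopen(request).
```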
We can test the cloud function end-to-end locally before deployment using the functions framework.
If we want to test the Spotify integration, we need to make sure we have access to the secret. This can be checked via
gcloud secrets \
versions access latest \
--secret=$CLOUD_FUNCTION_SECRET_ID \
--impersonate-service-account=$CLOUD_FUNCTION_SERVICE_ACCOUNT
We can start the function in the background (using the ampersand)
pip install functions-framework
functions-framework --target copy_tracks --port 8765 &
and then send a request via
curl -X POST localhost:8765 -d '{}'
More on this can be found in the Google blog post "How to develop and test your Cloud Functions locally".
We can also set up the above test programmatically, either by mocking the relevant parts or by using dry-run functionality, again either
- by using the functions framework or
- by testing the python function itself.
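As a sketch of the second option, the Spotify client can be replaced by a `unittest.mock.Mock`, so the copy logic runs without network access. The function `copy_missing_tracks` and its `dry_run` flag are hypothetical and not the project's actual implementation. The method names `playlist_items` and `playlist_add_items` follow spotipy's client, and pagination is deliberately ignored to keep the example short.

```python
from unittest.mock import Mock


def copy_missing_tracks(client, source_id, destination_id, dry_run=False):
    """Copy tracks missing from the destination playlist (hypothetical)."""

    def track_uris(playlist_id):
        # Only the first page of results is read in this sketch.
        response = client.playlist_items(playlist_id)
        return [item["track"]["uri"] for item in response["items"]]

    existing = set(track_uris(destination_id))
    missing = [uri for uri in track_uris(source_id) if uri not in existing]
    if missing and not dry_run:
        client.playlist_add_items(destination_id, missing)
    return missing


def fake_page(*uris):
    """Build a response in the shape returned for playlist items."""
    return {"items": [{"track": {"uri": uri}} for uri in uris]}


client = Mock()
client.playlist_items.side_effect = lambda playlist_id: {
    "source": fake_page("spotify:track:1", "spotify:track:2"),
    "destination": fake_page("spotify:track:1"),
}[playlist_id]

# Dry run: reports what would be copied without touching the playlist.
dry_run_result = copy_missing_tracks(client, "source", "destination", dry_run=True)
calls_after_dry_run = client.playlist_add_items.call_count

# Real run against the mock: the add call is recorded on the mock.
real_result = copy_missing_tracks(client, "source", "destination")

print(dry_run_result, calls_after_dry_run)
print(client.playlist_add_items.call_args)
```

The mock records every call, so a test can assert both what was written and, for the dry run, that nothing was written at all.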
Since our cloud function is idempotent and invocations come with almost no cost, we can also run tests using the real infrastructure, either by
- invoking the function locally but using the Spotify credentials from Google cloud, see test-integration.yml,
- invoking the remote cloud function using the http endpoint and curl, see deploy.yml
You can delete all objects in the bucket via
gsutil rm -a gs://${CLOUD_FUNCTION_BUCKET_NAME}/**
We can destroy all infrastructure using a simple terraform destroy, but there is a drawback: since Workload Identity pools and providers are soft-deleted and recreating them under the same name is blocked for 30 days, we may want to remove the pool and provider state first, in case we want to apply the infrastructure again:
cd terraform
terraform state rm google_iam_workload_identity_pool.deployment
terraform state rm google_iam_workload_identity_pool_provider.deployment
terraform destroy
If this does not matter to us, a simple terraform destroy will do.
As of 2022-12-27, the cloud infrastructure costs for Cloud Functions, Cloud Storage, and Cloud Scheduler are below $0.01 per month and within the free tiers, provided deployments are moderate (a few per week) and the schedule is reasonable (a few runs per week).
The only potential cost driver is the secret, if multiple versions are kept active (i.e. not destroyed). Since token refreshes regularly produce new versions, these can add up and cause a low but growing cost that exceeds the free tier. There is therefore a method (see _delete_old_versions in utils.py) that destroys old secret versions; it is invoked by default once the secret has been updated.
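The selection logic behind such a clean-up can be sketched without touching Secret Manager. The example below is not the project's `_delete_old_versions`; the function name `versions_to_destroy`, the `keep` parameter and the dictionary shape with `number` and `state` keys are assumptions made for illustration. It picks every enabled version except the newest ones.

```python
def versions_to_destroy(versions, keep=1):
    """Return the numbers of enabled versions older than the newest `keep`."""
    enabled = [version for version in versions if version["state"] == "ENABLED"]
    newest_first = sorted(enabled, key=lambda version: version["number"], reverse=True)
    return [version["number"] for version in newest_first[keep:]]


# Hypothetical listing of secret versions.
versions = [
    {"number": 1, "state": "DESTROYED"},
    {"number": 2, "state": "ENABLED"},
    {"number": 3, "state": "ENABLED"},
    {"number": 4, "state": "ENABLED"},
]

print(versions_to_destroy(versions))
print(versions_to_destroy(versions, keep=2))
```

Keeping only the latest version active is what holds the number of billable versions constant, no matter how often the token is refreshed.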
- We can update the environment variables used in the cloud function deployment "in-place" (this will still trigger a new deployment) via
  gcloud functions deploy \
    --region=europe-west3 \
    discover-weekly-saver \
    --update-env-vars \
    GCP_SECRET_ID=${GCP_SECRET_ID} \
    ...
- Potential improvements:
  - To make the project completely "serverless", the Terraform deployment could be moved into CI/CD, i.e. Github actions, for example using atlantis.
  - For full reproducibility, the python code could be put into Docker. More on this can be found in another blog post: Building a serverless, containerized batch prediction model using Google Cloud Run, Pub/Sub, Cloud Storage and Terraform.
  - For better testing, a staging environment could be added, and we could also use a staging Spotify client or even staging playlists (as there is only one Spotify environment).
  - Since Secret Manager does not seem to be made for frequently changing credentials (it is an immutable store that adds new versions), a different cache could be used. Alternatively, the token could be encrypted using a static key and then stored in cloud storage or another key-value store.
  - Secrets could also be used for the Spotify client ID and secret.
  - Terraform code could be modularized.
- NotFound: 404 Secret [projects/{your-project-id}/secrets/dw-saver-token] not found or has no versions: You need to create the secret version first. This can be achieved by running the authentication script (see setup step 3) on your local machine.
- Error creating Job: googleapi: Error 409: Job projects/{your-project-id}/locations/europe-west3/jobs/{something} already exists: This may be a race condition between resources. Usually resolved by running terraform apply again.
- Trouble getting Workload Identity running remotely: Alternatively you can generate identity (not access!) tokens manually using the gcloud CLI:
  token=$(gcloud \
    auth print-identity-token \
    --impersonate-service-account=$CLOUD_FUNCTION_SERVICE_ACCOUNT \
    --include-email)
  response=$(curl \
    --silent \
    -m 310 \
    -X POST \
    $CLOUD_FUNCTION_URL \
    -H "Authorization: bearer ${token}" \
    -H "Content-Type: application/json" \
    -d '{}')
- Error: could not handle the request: Check the cloud function logs in the cloud console; this is usually a problem in the cloud function's python code itself.
- You can verify the token stored in Google Secret Manager using the GoogleSecretManagerCacheHandler:
  from utils import GoogleSecretManagerCacheHandler
  cache_handler = GoogleSecretManagerCacheHandler(project_id=GCP_PROJECT_ID, secret_id=GCP_SECRET_ID)
  cache_handler.get_cached_token()
- End-of-central-directory signature not found. Either this file is not..: Set gzip in google-github-actions/upload-cloud-storage to false (true by default).
- You can restore a deleted Workload Identity pool or provider in the cloud console via IAM & Admin > Workload Identity Federation.
If you have any questions, feel free to add an issue or PR in this repo or ping me on LinkedIn.