Skip to content

Latest commit

 

History

History
 
 

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 

Microsoft Fabric Utilities

Python script to upload file in GIT repo to Fabric lakehouse

For config-driven data pipelines or notebooks, the config files are generally stored in the "Files" section of the Fabric Lakehouse. However, the Git integration with Fabric only syncs the lakehouse metadata, not the actual data files. Therefore, the config files must be version controlled outside of Fabric and uploaded to the lakehouse manually. Including this process in the CI/CD pipeline ensures that the latest config files are always available in the lakehouse, and can be promoted to the higher environments.

To facilitate that, the python script upload-file-to-lakehouse.py uploads a file from a GIT repository to a Fabric lakehouse. The script uses a service principal with a client secret and uses Azure Data Lake APIs to authenticate and upload the file.

We plan to use this script in the future to automate the process of uploading "config" files from the Git repository to the Fabric Lakehouse as part of the CI/CD process.

Usage

Here is a sample usage of the script:

  • cd into the lakehouse-file-upload directory.

  • Rename the .envtemplate to .env and update the values for the following environment variables:

    # Service principal (SP) credentials
    AZURE_CLIENT_ID="<Azure SP client id>"
    AZURE_TENANT_ID="<Azure SP tenant id>"
    AZURE_CLIENT_SECRET="<Azure SP client secret>"
    # Microsoft Fabric workspace and lakehouse details
    ONELAKE_ACCOUNT_NAME="onelake"
    FABRIC_WORKSPACE_ID="<Microsoft Fabric workspace id>"
    FABRIC_LAKEHOUSE_ID="<Microsoft Fabric lakehouse id>"
    # Azure DevOps details and personal access token (PAT)
    GIT_ORGANIZATION_NAME="<Azure DevOps organization name>"
    GIT_PERSONAL_ACCESS_TOKEN="<Azure DevOps PAT>"
    GIT_PROJECT_NAME="<Azure DevOps project name>"
    GIT_REPO_NAME="<Azure DevOps repository name>"
    GIT_BRANCH_NAME="<Azure DevOps branch name>"
  • Run the script as instructed below:

    # Create a virtual environment
    $ python3 -m venv .venv
    
    # Activate the virtual environment
    source .venv/bin/activate
    
    # Install the dependencies
    $ pip install -r requirements.txt
    
    # View the help
    $ python3 upload-file-to-lakehouse.py -h
      usage: upload-file-to-lakehouse.py [-h] [--upload_from {local,git}] source_file_path target_file_path
    
      Script to upload local file or file from Azure Repo to Fabric lakehouse.
    
      positional arguments:
        source_file_path      The source file path of the local file or in the Azure Repo.
        target_file_path      The target file path in the Fabric lakehouse.
    
      options:
        -h, --help            show this help message and exit
        --upload_from {local,git}
                              Specify the source of the file to upload: 'local' or 'git'. Default is 'local'.
    
    # Run the script to upload a file from the local file system to the lakehouse
    $ python3 upload-file-to-lakehouse.py --upload_from local requirements.txt config/requirements.txt
    
      [I] Fabric workspace id: a6730feb-5c55-4b6b-8ff7-7cde1d746452
      [I] Fabric lakehouse id: 4fd95a9f-160c-488c-8341-887974929c36
      [I] Source file path: ./requirements.txt
      [I] Target file path: config/requirements.txt
      [I] Upload from: local
      [I] Uploading local './requirements.txt' to './requirements.txt'
      [I] Reading the file just uploaded from 4fd95a9f-160c-488c-8341-887974929c36/Files/config/requirements.txt
      b'azure-core==1.32.0\nazure-devops==7.1.0b4\nazure-identity==1.19.0\nazure-storage-blob==12.23.1\nazure-storage-file-datalake==12.17.0\npython-dotenv==1.0.  1\nrequests==2.32.3'
  • The file will be uploaded to the lakehouse to the "Files section of the lakehouse as shown below:

    Lakehouse File upload