catweisun/hdinsight-tools

##PowerShell helper functions for Azure HDInsight

This is a collection of helper functions I've created for working with HDInsight. Currently it provides functions for uploading (add), downloading (get), listing (find), and deleting (remove) files in the HDInsight cluster's primary storage account.

The primary storage account for an HDInsight cluster is really just Azure Blob storage, so these helpers are really just wrappers around the Azure PowerShell cmdlets for blob operations. However, they query your HDInsight cluster to figure out which storage to use, so you only have to know the cluster name for these operations.
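As a rough illustration of the idea above (not the module's actual code), an upload could be sketched like this with the classic Azure PowerShell cmdlets. The function names Get-StorageAccountShortName and Send-FileToClusterStorage are hypothetical, and the property names on the cluster object (DefaultStorageAccount.StorageAccountName, StorageContainerName) are assumptions based on the classic Azure module, so check them against your installed version.

```powershell
# Hypothetical helper: strip the blob endpoint suffix, if present,
# so only the bare storage account name remains.
function Get-StorageAccountShortName {
    param([Parameter(Mandatory = $true)][string]$AccountName)
    return ($AccountName -replace '\.blob\.core\.windows\.net$', '')
}

# Hypothetical sketch: upload one local file knowing only the cluster name.
function Send-FileToClusterStorage {
    param(
        [Parameter(Mandatory = $true)][string]$ClusterName,
        [Parameter(Mandatory = $true)][string]$LocalPath,
        [Parameter(Mandatory = $true)][string]$BlobPath
    )
    # Ask HDInsight which storage account and container back the cluster
    # (property names are assumptions; see the note above).
    $cluster   = Get-AzureHDInsightCluster -Name $ClusterName
    $account   = Get-StorageAccountShortName $cluster.DefaultStorageAccount.StorageAccountName
    $container = $cluster.DefaultStorageAccount.StorageContainerName

    # Build a storage context from the account key, then upload the blob.
    $key     = (Get-AzureStorageKey -StorageAccountName $account).Primary
    $context = New-AzureStorageContext -StorageAccountName $account -StorageAccountKey $key
    Set-AzureStorageBlobContent -File $LocalPath -Container $container -Blob $BlobPath -Context $context
}
```

Nothing runs until Send-FileToClusterStorage is called, and the caller only supplies the cluster name, a local path, and a blob path.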

As I encounter other common tasks that can benefit from helper functions, I'll add them to this project.

##Contributing

If you have things you'd like to see in this collection, add an issue using the issues link. If you have something you'd like to contribute, fork this repository, write the code, test it, then send a pull request.

##Why do this?

Because uploading and downloading files are things you do every day, it makes sense to encapsulate them in a nice function rather than writing 5+ lines of code every time you write a new script to upload something, run a job, and download the results.

##Installing

  1. Download hdinsight-tools.psm1.

  2. From PowerShell, run Import-Module hdinsight-tools.psm1 from the folder containing the file to load the module.
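For example, assuming the file was saved to the current directory (an assumption; adjust the path otherwise), loading the module and listing what it exports might look like this:

```powershell
# Build the full path to the module file in the current directory.
$modulePath = Join-Path (Get-Location).Path 'hdinsight-tools.psm1'

if (Test-Path $modulePath) {
    # Load the module, then list the functions it exports.
    Import-Module $modulePath
    Get-Command -Module hdinsight-tools
} else {
    Write-Warning "hdinsight-tools.psm1 was not found in $((Get-Location).Path)"
}
```

Once the module is loaded, Get-Help followed by a function name shows that function's parameters.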

##Using

You must have an active Azure subscription to use this, and that subscription must be associated with your Azure PowerShell session (using Add-AzureAccount or Import-AzurePublishSettingsFile). The HDInsight-Tools will bark at you if you don't have these.
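If you want to check this yourself before calling the helpers, a minimal sketch could look like the following. Test-AzureSubscriptionReady is a hypothetical name, not part of this module, and it assumes the classic Azure module's Get-AzureSubscription -Current.

```powershell
# Hypothetical check: returns $true when the classic Azure module reports
# a current subscription, and $false otherwise (including when the Azure
# cmdlets are not installed at all).
function Test-AzureSubscriptionReady {
    $current = $null
    try {
        $current = Get-AzureSubscription -Current -ErrorAction Stop
    } catch {
        $current = $null
    }
    return ($null -ne $current)
}

# Typical use:
# if (-not (Test-AzureSubscriptionReady)) { Add-AzureAccount }
```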

###Working with files

  • Add-HDInsightFile - Adds a single file from the local file system (or a network share) to the primary storage container for your HDInsight cluster

  • Remove-HDInsightFile - Removes a single file from the primary storage container for your HDInsight cluster

  • Get-HDInsightFile - Gets a single file from the primary storage container for your HDInsight cluster

  • Find-HDInsightFile - Lists file(s) from the primary storage container for your HDInsight cluster

  • Get-HDInsightStorage - Lists the storage account(s) associated with your HDInsight cluster. Mostly useful if you are going to use some other tool like AzCopy
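A typical call might look like the sketch below. The parameter names (-localPath, -destinationPath, -clusterName) are assumptions and are not confirmed by this README, so run Get-Help on each function to see the real ones. Publish-SampleData is a hypothetical wrapper used only to keep the example from running on load.

```powershell
# Hypothetical wrapper around two of the helpers listed above.
# Parameter names passed to the helpers are assumptions; verify with Get-Help.
function Publish-SampleData {
    param(
        [Parameter(Mandatory = $true)][string]$ClusterName,
        [Parameter(Mandatory = $true)][string]$LocalFile,
        [Parameter(Mandatory = $true)][string]$RemotePath
    )
    # Upload a single local file to the cluster's primary storage container.
    Add-HDInsightFile -localPath $LocalFile -destinationPath $RemotePath -clusterName $ClusterName

    # Show the storage account(s) behind the cluster, e.g. for use with AzCopy.
    Get-HDInsightStorage -clusterName $ClusterName
}

# Example call (values are placeholders):
# Publish-SampleData -ClusterName 'mycluster' -LocalFile 'C:\temp\data.txt' -RemotePath 'example/data/data.txt'
```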

NOTE: These functions currently work on only one file at a time, mainly because they depend on a context generated by New-AzureStorageContext, which doesn't serialize/deserialize when using workflows, and workflows are required for foreach -parallel. If you have a solution that allows parallel file operations using such a context, please let me know. Otherwise, use Get-HDInsightStorage to find your storage containers, then use something like AzCopy to perform bulk copy operations.
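For a handful of files, a plain sequential loop sidesteps the workflow problem entirely, at the cost of speed. This is only a sketch: Get-BlobDestination and Copy-FolderToHDInsight are hypothetical names, and the Add-HDInsightFile parameter names are assumptions to verify with Get-Help.

```powershell
# Hypothetical helper: join a blob prefix and a file name with a single slash.
function Get-BlobDestination {
    param(
        [Parameter(Mandatory = $true)][string]$FileName,
        [Parameter(Mandatory = $true)][string]$Prefix
    )
    return ($Prefix.TrimEnd('/') + '/' + $FileName)
}

# Hypothetical wrapper: upload every file in a folder, one at a time.
function Copy-FolderToHDInsight {
    param(
        [Parameter(Mandatory = $true)][string]$ClusterName,
        [Parameter(Mandatory = $true)][string]$LocalFolder,
        [Parameter(Mandatory = $true)][string]$Prefix
    )
    # -File limits the listing to files (requires PowerShell 3.0 or later).
    foreach ($file in Get-ChildItem -Path $LocalFolder -File) {
        $destination = Get-BlobDestination -FileName $file.Name -Prefix $Prefix
        # Parameter names below are assumptions; check Get-Help Add-HDInsightFile.
        Add-HDInsightFile -localPath $file.FullName -destinationPath $destination -clusterName $ClusterName
    }
}
```

For large transfers, AzCopy remains the better fit, as the note says.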

About

random tools for Azure HDInsight
