The AzureML studio allows users to manage their datasets and datastores directly inside the portal.
A Dataset is a resource for exploring, transforming and managing data in Azure Machine Learning. Datasets enable:
- Easy access to data: no need to worry about connection strings or data paths. You keep only a single copy of the data, in the storage service of your choice.
- Training with big data: seamless integration with Azure Machine Learning features like labelling, training products and pipelines. Users can share and reuse datasets across experiments.
- Tracking data usage: Azure ML automatically tracks which version of a dataset was used in an ML experiment and which model that experiment produced.
⭐ Download dataset: IBM-Employee-Attrition.csv
- Download the IBM Attrition dataset by clicking this link: IBM-Employee-Attrition.csv, and save the file to disk.
- Go to the AzureML studio.
- Navigate to the left pane of your workspace. Select 'Datasets' under the 'Assets' section.
- Click 'Create dataset' and choose 'From local files'. Name the dataset IBM-Employee-Attrition and then click 'Next'. Make sure to leave the dataset type as Tabular.
- Click 'Browse', choose the file you downloaded, and click 'Next' to create the dataset in the workspace's default Blob storage.
- Make sure to select 'Use headers from the first file' under 'Column headers'. Click 'Next' through the following screens. Make sure the 'Schema' section looks good before continuing.
- Finally, in the 'Confirm Details' section, select 'Profile this dataset after creation' and specify the cpu-cluster that you previously created as the compute to use for profiling.
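
The upload and registration steps above can also be approximated from code. The following is a minimal sketch, assuming the v1 Python SDK (`azureml-core`) is installed and a workspace `config.json` is available locally. The `TARGET_FOLDER` name and the function name are hypothetical, chosen only for illustration, and header handling and column type inference are left at the SDK defaults rather than mirroring the studio options exactly. Profiling is not covered here.

```python
from pathlib import Path

DATASET_NAME = "IBM-Employee-Attrition"
LOCAL_CSV = Path("IBM-Employee-Attrition.csv")  # assumed download location
TARGET_FOLDER = "attrition"  # hypothetical folder in the default datastore


def register_attrition_dataset(local_csv=LOCAL_CSV, name=DATASET_NAME):
    """Upload the CSV to the default datastore and register a Tabular dataset."""
    # Imported lazily so this module loads even without azureml-core installed.
    from azureml.core import Dataset, Workspace

    # Assumption: a config.json describing the workspace is discoverable.
    ws = Workspace.from_config()
    datastore = ws.get_default_datastore()

    local_csv = Path(local_csv).resolve()

    # Copy the local file into the workspace's default Blob storage.
    datastore.upload_files(
        files=[str(local_csv)],
        relative_root=str(local_csv.parent),
        target_path=TARGET_FOLDER,
        overwrite=True,
    )

    # Build a Tabular dataset from the uploaded file (SDK default parsing).
    dataset = Dataset.Tabular.from_delimited_files(
        path=[(datastore, f"{TARGET_FOLDER}/{local_csv.name}")]
    )

    # Register under the same name used in the studio walkthrough.
    return dataset.register(workspace=ws, name=name, create_new_version=True)


# Example call (requires access to an Azure ML workspace):
# registered = register_attrition_dataset()
```

Registering with `create_new_version=True` means that re-running the function adds a new version of the dataset instead of failing on a name clash.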
- Now, click the newly created dataset and click 'Explore'. Here you can see the fields of the Tabular dataset.
- To view the profile of the dataset generated in the previous step, click the 'Profile' tab. If you want to regenerate a profile (or you created the dataset without selecting the profile option), click 'Generate profile' and select a cluster to generate profile information for the dataset.
- The 'Consume' tab contains a short code snippet for consuming the dataset.
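
For orientation, consuming the registered dataset from Python might look roughly like the sketch below. It assumes the v1 Python SDK (`azureml-core`) and a local workspace `config.json`; the function name is hypothetical, and the exact snippet shown in the studio may differ (for example, it may construct the workspace from explicit subscription and resource group values).

```python
DATASET_NAME = "IBM-Employee-Attrition"


def load_attrition_dataframe(name=DATASET_NAME, version="latest"):
    """Fetch the registered dataset by name and load it into pandas."""
    # Imported lazily so this module loads even without azureml-core installed.
    from azureml.core import Dataset, Workspace

    # Assumption: a config.json describing the workspace is discoverable.
    ws = Workspace.from_config()

    # Look up the dataset registered in the workspace.
    dataset = Dataset.get_by_name(ws, name=name, version=version)

    # Materialize the Tabular dataset as a pandas DataFrame.
    return dataset.to_pandas_dataframe()


# Example call (requires access to an Azure ML workspace):
# df = load_attrition_dataframe()
# print(df.head())
```

Passing an explicit `version` instead of `"latest"` pins the experiment to one version of the data.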
For more information on creating and using Datasets, see the how-to guide.