[WIP] Data Viewer deprecation and Data Wrangler GA doc updates #7184

Open
wants to merge 6 commits into
base: main
22 changes: 12 additions & 10 deletions docs/datascience/data-science-tutorial.md
@@ -19,6 +19,7 @@ The following installations are required for the completion of this tutorial. Ma

- [Visual Studio Code](https://code.visualstudio.com/)
- The [Python extension for VS Code](https://marketplace.visualstudio.com/items?itemName=ms-python.python) and [Jupyter extension for VS Code](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) from the Visual Studio Marketplace. For more details on installing extensions, see [Extension Marketplace](/docs/editor/extension-marketplace.md). Both extensions are published by Microsoft.
- The [Data Wrangler extension for VS Code](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.datawrangler)

- [Miniconda with latest Python](https://docs.conda.io/en/latest/miniconda.html)

@@ -74,38 +75,39 @@ This tutorial uses the [Titanic dataset](https://hbiostat.org/data/repo/titanic.

![Running a Jupyter notebook cell](images/data-science-tutorial/jupyter-cell-01.png)

1. After the cell finishes running, you can view the data that was loaded using the Variables Explorer and Data Viewer. First select the **Variables** icon in the notebook's upper toolbar.
1. After the cell finishes running, you can view the data that was loaded using the Variables Explorer and Data Wrangler. First select the **Variables** icon in the notebook's upper toolbar.

![Select Variables icon](images/data-science-tutorial/variable-explorer-1.png)

1. A **JUPYTER: VARIABLES** pane will open at the bottom of VS Code. It contains a list of the variables defined so far in your running kernel.

![Variables pane](images/data-science-tutorial/variable-explorer-2.png)

1. To view the data in the Pandas DataFrame previously loaded, select the Data Viewer icon to the left of the `data` variable.
1. To view the data in the Pandas DataFrame previously loaded, select the "Open in Data Wrangler" icon to the left of the `data` variable.

![Select Data Viewer icon](images/data-science-tutorial/variable-explorer-3.png)
![Select Open in Data Wrangler icon](images/data-science-tutorial/variable-explorer-3.png)

1. Use the Data Viewer to view, sort, and filter the rows of data. After reviewing the data, it can then be helpful to graph some aspects of it to help visualize the relationships between the different variables.
1. Use Data Wrangler to view, sort, and filter the rows of data. After reviewing the data, it can then be helpful to graph some aspects of it to help visualize the relationships between the different variables. Learn more about the [Data Wrangler extension in our docs](/docs/datascience/data-wrangler.md).

![Data viewer and variable explorer](images/data-science-tutorial/dataviewer.png)

Alternatively, you can use the data viewing experience offered by other extensions like [Data Wrangler](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.datawrangler). The Data Wrangler extension offers a rich user interface to show insights about your data and helps you perform data profiling, quality checks, transformations, and more. Learn more about the [Data Wrangler extension in our docs](/docs/datascience/data-wrangler.md).
![Data wrangler and variable explorer](images/data-science-tutorial/datawrangler.png)

1. Before the data can be graphed, you need to make sure that there aren't any issues with it. If you look at the Titanic csv file, one thing you'll notice is that a question mark ("?") was used to identify cells where data wasn't available.

While Pandas can read this value into a DataFrame, the result for a column like **age** is that its data type will be set to **object** instead of a numeric data type, which is problematic for graphing.

This problem can be corrected by replacing the question mark with a missing value that pandas is able to understand. Add the following code to the next cell in your notebook to replace the question marks in the **age** and **fare** columns with the [numpy NaN](https://docs.scipy.org/doc/numpy/reference/constants.html?highlight=nan#numpy.nan) value. Notice that we also need to update the column's data type after replacing the values.

> **Tip**: To add a new cell you can use the insert cell icon that's in the bottom left corner of an existing cell. Alternatively, you can also use the `kbstyle(Esc)` to enter command mode, followed by the `kbstyle(B)` key.

Switch Data Wrangler from viewing to editing mode and apply a data cleaning transformation that replaces each question mark with `np.nan`. Then apply another operation to change the column types of **age** and **fare** to floats.
```python
data.replace('?', np.nan, inplace=True)
data = data.astype({"age": np.float64, "fare": np.float64})
```

> **Note**: If you ever need to see the data type that has been used for a column, you can use the [DataFrame dtypes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html#pandas.DataFrame.dtypes) attribute.
![Data wrangler to clean data](images/data-science-tutorial/datawrangler-clean.png)

Once everything looks good, you can export the code that was generated by Data Wrangler back into your notebook.

![Data wrangler export code](images/data-science-tutorial/datawrangler-export.png)
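
The cleaning step described above can be sketched in plain pandas as follows. This is a minimal illustration, not the code that Data Wrangler generates: the three-row CSV text is a hypothetical stand-in for the Titanic file, and only the `age` and `fare` column names come from the tutorial.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical three-row sample standing in for the Titanic CSV;
# "?" marks missing values, as in the tutorial's dataset.
csv_text = "name,age,fare\nA,29,211.34\nB,?,151.55\nC,2,?\n"
data = pd.read_csv(io.StringIO(csv_text))

# Because of the "?" strings, pandas cannot parse these columns as numbers.
dtypes_before = data.dtypes

# Replace the placeholder with NaN, then convert the columns to floats.
data = data.replace("?", np.nan)
data = data.astype({"age": np.float64, "fare": np.float64})

# Compare the column types before and after cleaning.
print(dtypes_before)
print(data.dtypes)
```

Printing `data.dtypes` before and after the conversion is a quick way to confirm that both columns became numeric.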

1. Now that the data is in good shape, you can use [seaborn](https://seaborn.pydata.org/) and [matplotlib](https://matplotlib.org) to view how certain columns of the dataset relate to survivability. Add the following code to the next cell in your notebook and run it to see the generated plots.

28 changes: 12 additions & 16 deletions docs/datascience/jupyter-notebooks.md
@@ -15,7 +15,7 @@ MetaSocialImage: images/tutorial/social.png

- Create, open, and save Jupyter Notebooks
- Work with Jupyter code cells
- View, inspect, and filter variables using the Variable Explorer and Data Viewer
- View, inspect, and filter variables using the Variable Explorer and Data Wrangler
- Connect to a remote Jupyter server
- Debug a Jupyter Notebook

@@ -210,35 +210,31 @@ The Python Jupyter Notebook Editor window has full IntelliSense – code complet

![IntelliSense support](images/jupyter/intellisense.png)

## Variable Explorer and Data Viewer
## Variable Explorer and working with data

Within a Python Notebook, it's possible to view, inspect, sort, and filter the variables within your current Jupyter session. By selecting the **Variables** icon in the main toolbar after running code and cells, you'll see a list of the current variables, which will automatically update as variables are used in code. The variables pane will open at the bottom of the notebook.
Within a Python Notebook, it's possible to view, inspect, sort, filter, and transform the variables within your current Jupyter session. By selecting the **Variables** icon in the main toolbar after running code and cells, you'll see a list of the current variables, which will automatically update as variables are used in code. The variables pane will open at the bottom of the notebook.

![Variable Explorer](images/jupyter/variable-explorer-01.png)

![Variable Explorer](images/jupyter/variable-explorer-02.png)

### Data Viewer
### Working with data

For additional information about your variables, you can also double-click a row or use the **Show variable in data viewer** button next to the variable for a more detailed view of a variable in the Data Viewer.
For additional information about your variables, you can also double-click a row or use the **View data** button next to the variable for a more detailed view of a variable. If you do not have a data viewing extension installed, clicking the button will bring up a list of recommended data viewing extensions like [Data Wrangler](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.datawrangler).

![Data Viewer](images/jupyter/data-viewer.png)
The Data Wrangler extension offers a rich user interface to show insights about your data and helps you perform data profiling, quality checks, transformations, and more. As you manipulate your data, Data Wrangler will automatically generate the Python code required to perform the transformation. Learn more about the [Data Wrangler extension in our docs](/docs/datascience/data-wrangler.md).

Alternatively, you can use the data viewing experience offered by other extensions like [Data Wrangler](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.datawrangler). The Data Wrangler extension offers a rich user interface to show insights about your data and helps you perform data profiling, quality checks, transformations, and more. Learn more about the [Data Wrangler extension in our docs](/docs/datascience/data-wrangler.md).
![Data Wrangler](images/jupyter/data-wrangler.gif)

### Filtering rows
### Filtering/sorting rows

Filtering rows in the data viewer can be done by typing in the textbox at the top of each column. Type a string you want to search for and any row that has that string in the column will be found:
Filtering and sorting rows in Data Wrangler can be done by selecting the header of the column you want to filter or sort, and then selecting the corresponding operation. As you type into the filter box, Data Wrangler automatically updates the data in view to match your filter conditions in real time.

![Data Viewer](images/jupyter/filter-default.png)
![Data Wrangler Filter](images/jupyter/data-wrangler-filter.png)

If you want to find an exact match, prefix your filter with '=':
More complex filtering, including [regular expressions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions), can be done by changing Data Wrangler from viewing to editing mode.

![Data Viewer](images/jupyter/filter-exact.png)

More complex filtering can be done by typing a [regular expression](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions):

![Data Viewer](images/jupyter/filter-regex.png)
![Data Wrangler Edit](images/jupyter/data-wrangler-edit.png)
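
For readers who prefer code, the sorting and filtering operations described above have rough pandas equivalents. The sketch below is an assumption about what comparable code could look like, not the output of Data Wrangler. The sample DataFrame and its column names are hypothetical, and the pattern uses Python's `re` syntax, which differs in places from the JavaScript syntax covered by the link above.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "name": [
            "Allen, Miss. Elisabeth",
            "Allison, Mr. Hudson",
            "Brown, Mrs. Margaret",
        ],
        "age": [29.0, 30.0, 44.0],
    }
)

# Sort by a column, similar to choosing a sort operation on a column header.
sorted_df = df.sort_values("age", ascending=False)

# Plain substring filter (regex disabled so the text is matched literally).
contains_all = df[df["name"].str.contains("All", regex=False)]

# Regular-expression filter: names containing the title "Mrs." or "Miss.".
titled = df[df["name"].str.contains(r"\b(?:Mrs|Miss)\.", regex=True)]

print(sorted_df["name"].tolist())
print(contains_all["name"].tolist())
print(titled["name"].tolist())
```

The non-capturing group `(?:...)` keeps `str.contains` from warning about unused match groups.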

## Saving plots
