19thyneb/lux

 
 

A Python API for Intelligent Visual Discovery

Lux is a Python library that makes data science easier by automating certain aspects of the data exploration process. Lux is designed to facilitate faster experimentation with data, even when the user does not have a clear idea of what they are looking for. Lux is integrated with an interactive Jupyter widget that allows users to quickly browse through large collections of data directly within their Jupyter notebooks.

Here are some slides from a talk on Lux.

Getting Started

To start using Lux, add one import statement alongside your Pandas import.

    import lux
    import pandas as pd

Then, Lux can be used as-is, without modifying any of your existing Pandas code. Here, we use Pandas's read_csv command to load in a dataset of colleges and their properties.

    df = pd.read_csv("college.csv")
    df

Basic recommendations in Lux

Voila! Here's a set of visualizations that you can now use to explore your dataset further!

Next-step recommendations based on user context:

In addition to dataframe visualizations at every step in the exploration, you can tell Lux which attributes and values you're interested in. Based on this context information, Lux guides users towards potential next steps in their exploration.

For example, we might be interested in the attributes AverageCost and SATAverage.

    df.set_context(["AverageCost","SATAverage"])
    df

Next-step Recommendations Based on User Context

The left-hand side of the widget shows the Current View, which corresponds to the visualization based on what the user is interested in. On the right, Lux generates three sets of recommendations, organized as separate tabs on the widget:

  • Enhance adds an attribute to the current selection, highlighting how additional variables affect the relationship between AverageCost and SATAverage. If we break down the relationship by FundingModel, there is a clear separation between public colleges (shown in red) and private colleges (in blue), with public colleges being cheaper to attend and having an SAT average below 1400.
  • Filter adds a filter to the current selection, while keeping the attributes (on the X and Y axes) fixed. These visualizations show how the relationship between AverageCost and SATAverage changes for different subsets of the data. For instance, colleges whose highest degree offered is a Bachelor's degree show a roughly linear trend between the two variables.
  • Generalize removes an attribute to display a more general trend, showing the distributions of AverageCost and SATAverage on their own. From the AverageCost histogram, we see that there are many colleges with an average cost of around $20000 per year, corresponding to the bulge we see in the scatterplot view.
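
As a rough illustration of what the three tabs mean, the sketch below enumerates candidate next-step specifications from a context. It is a simplified model written for this README, not Lux's implementation: the function names (enhance, filter_specs, generalize), the HighestDegree column name, and its values are all hypothetical.

```python
from typing import Dict, List


def enhance(context: List[str], all_attributes: List[str]) -> List[List[str]]:
    """Add one attribute that is not yet part of the context."""
    return [context + [attr] for attr in all_attributes if attr not in context]


def filter_specs(context: List[str], filter_values: Dict[str, List[str]]) -> List[List[str]]:
    """Keep the attributes fixed and add one 'Attribute=value' filter."""
    return [
        context + [f"{attr}={value}"]
        for attr, values in filter_values.items()
        if attr not in context
        for value in values
    ]


def generalize(context: List[str]) -> List[List[str]]:
    """Drop one attribute at a time to show a more general trend."""
    return [[a for a in context if a != removed] for removed in context]


# Hypothetical column names and values, loosely modeled on the college example.
context = ["AverageCost", "SATAverage"]
all_attributes = ["AverageCost", "SATAverage", "FundingModel", "Region"]
filter_values = {"HighestDegree": ["Bachelor's", "Graduate"]}

print(enhance(context, all_attributes))
print(filter_specs(context, filter_values))
print(generalize(context))
```

Each returned list uses the same string syntax as the context itself, so a candidate could in principle be passed back in as a new context. How Lux ranks or prunes such candidates is not covered here.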

See this page for more information on additional ways to specify the context.

Easy programmatic access of exported visualization objects:

Now that we have found some interesting visualizations through Lux, we might want to dig into them a bit more. We can click on one or more visualizations to export them, so that we can access them programmatically in Jupyter. Visualizations are represented as View objects in Lux. These View objects can be translated into Altair or Vega-Lite code, so that we can edit the visualizations further.

Easily exportable visualization object
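
To give a sense of what a Vega-Lite specification looks like, here is a hand-written example for a scatterplot of AverageCost against SATAverage. It is not output produced by Lux; the data URL and field names are assumptions carried over from the college example, and the exact specification Lux generates may differ.

```python
import json

# Hand-written Vega-Lite specification for a scatterplot; not generated by Lux.
# The data URL and field names are assumptions based on the college example.
scatter_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
    "data": {"url": "college.csv"},
    "mark": "point",
    "encoding": {
        "x": {"field": "AverageCost", "type": "quantitative"},
        "y": {"field": "SATAverage", "type": "quantitative"},
    },
}

# Serialize to JSON, the format that Vega-Lite tooling consumes.
print(json.dumps(scatter_spec, indent=2))
```

Because the specification is plain JSON, it can be edited by hand (for example, changing the mark or adding a color channel) once it has been exported.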

Quick, on-demand visualizations with the help of automatic encoding:

We've seen how Views are automatically generated as part of the recommendations. Users can also create their own View using the same syntax as for specifying the context. Lux is built on the philosophy that users should always be able to visualize anything they want, without having to think about what the visualization should look like. Lux automatically determines the mark and channel mappings based on a set of best practices from Tableau. The visualizations are rendered via Altair into Vega-Lite specifications.

    from lux.view.View import View
    newEnglandCost = View(["Region=New England","MedianEarnings"])
    newEnglandCost.load(df)

Specified Visualization
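
The idea of automatic encoding can be sketched as a small rule table that maps the data types of the requested attributes to a chart type. The rules below are common visualization conventions chosen for illustration and are an assumption, not Lux's actual rule set; choose_mark is a hypothetical helper.

```python
from typing import List


def choose_mark(attribute_types: List[str]) -> str:
    """Pick a chart type from attribute data types (simplified, hypothetical rules)."""
    kinds = sorted(attribute_types)
    if kinds == ["quantitative"]:
        return "histogram"  # distribution of a single measure
    if kinds == ["nominal"]:
        return "bar"  # record count per category
    if kinds == ["quantitative", "quantitative"]:
        return "scatter"  # relationship between two measures
    if kinds == ["nominal", "quantitative"]:
        return "bar"  # one measure aggregated per category
    if kinds == ["quantitative", "temporal"]:
        return "line"  # one measure over time
    raise ValueError(f"No rule for attribute types: {attribute_types}")


print(choose_mark(["quantitative"]))
print(choose_mark(["quantitative", "quantitative"]))
print(choose_mark(["quantitative", "nominal"]))
```

These rules are consistent with the examples above, where AverageCost alone is shown as a histogram and AverageCost with SATAverage is shown as a scatterplot.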

Powerful language for working with collections of visualizations:

Lux provides a powerful abstraction for working with collections of visualizations based on partially specified queries. Users can provide a list or a wildcard to iterate over combinations of filter or attribute values and quickly browse through large numbers of visualizations. The partial specification is inspired by existing work on visualization query languages, including ZQL and CompassQL.

For example, we might be interested in looking at how the AverageCost distribution differs across different Regions.

    from lux.view.ViewCollection import ViewCollection
    differentRegions = ViewCollection(["Region=?","AverageCost"])
    differentRegions.load(df)

Example View Collection
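
Conceptually, a wildcard such as "Region=?" stands for one concrete specification per distinct value of that attribute. The sketch below shows that expansion in plain Python. It is an illustration only: expand_wildcards is a hypothetical helper, not part of Lux, and apart from New England the region names are made up.

```python
from typing import Dict, List


def expand_wildcards(spec: List[str], distinct_values: Dict[str, List[str]]) -> List[List[str]]:
    """Expand each 'Attribute=?' entry into one concrete spec per distinct value."""
    expanded: List[List[str]] = [[]]
    for item in spec:
        if item.endswith("=?"):
            attribute = item[:-2]
            options = [f"{attribute}={value}" for value in distinct_values[attribute]]
        else:
            options = [item]
        # Combine every partial spec built so far with every option for this item.
        expanded = [partial + [option] for partial in expanded for option in options]
    return expanded


# Hypothetical distinct values for the Region column.
regions = {"Region": ["New England", "Southeast", "Far West"]}
for concrete in expand_wildcards(["Region=?", "AverageCost"], regions):
    print(concrete)
```

With several wildcards in one specification, the same loop yields every combination of values, which is why a single partial query can stand for a large collection of visualizations.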

To find out more about other features in Lux, see the complete documentation on ReadTheDocs.

Quick Installation

Install the Python Lux API through PyPI:

    pip install lux-api

Install the Lux Jupyter widget through npm:

    npm i lux-widget

See the installation page for more information.

Lux is undergoing active development. Please report any bugs, issues, or requests through GitHub Issues.
