Introduction to Datashader and Plotly

Datashader

Datashader is a graphics pipeline system used to visualize large datasets quickly. Datashader breaks the visualization process into a series of steps and then distributes the computation-intensive work either across CPU cores using Dask or across GPUs using CUDA. This approach lets Datashader explore extremely large datasets even on standard hardware.

Curious readers can find an explanation of the process behind Datashader's pipeline efficiency here, but it is not necessary for completing this lab.

Before moving forward, take some time to glance through the official documentation here to get a better understanding of what Datashader offers.

Let's import the data we'll be using for this lab. You can download the Lab9-Data.zip file from here. One CSV contains Uber pickup data from April to September 2014 and originates from here. The other contains white wine quality data and originates from here.

Let's first explore the Uber data a bit. We will start by reading it into a pandas dataframe.
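A minimal sketch of the read step follows. It assumes the CSV uses the column layout of the 2014 Uber TLC pickup data (`Date/Time`, `Lat`, `Lon`, `Base`); the file name `uber-data.csv` and the helper `load_uber` are hypothetical, so substitute the actual CSV name from Lab9-Data.zip. A few made-up rows stand in for the file so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical file name: replace with the Uber CSV extracted from Lab9-Data.zip.
UBER_CSV = "uber-data.csv"


def load_uber(source=UBER_CSV):
    """Read the Uber pickup CSV and parse the timestamp column.

    Assumes the column layout Date/Time, Lat, Lon, Base.
    """
    df = pd.read_csv(source)
    df["Date/Time"] = pd.to_datetime(df["Date/Time"], format="%m/%d/%Y %H:%M:%S")
    return df


# Made-up rows in the assumed format, so the sketch runs without the real file.
sample = io.StringIO(
    "Date/Time,Lat,Lon,Base\n"
    "4/1/2014 0:11:00,40.7500,-73.9900,B02512\n"
    "4/1/2014 0:17:00,40.7300,-74.0000,B02512\n"
    "4/1/2014 0:21:00,40.7400,-73.9800,B02598\n"
)
uber = load_uber(sample)
print(uber.head())   # first rows: the general layout of the data
print(uber.dtypes)   # column types after parsing
```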

Now that we can see the general layout of the data, how many datapoints are there? How many different Base types are there, and how many pickups are associated with each base?
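One way to answer these questions with pandas, shown on a hypothetical toy dataframe rather than the real data: `len` counts rows, `nunique` counts distinct bases, and `value_counts` tallies pickups per base.

```python
import pandas as pd

# Toy stand-in for the Uber dataframe (made-up rows; only the Base column matters here).
uber = pd.DataFrame(
    {
        "Lat": [40.75, 40.72, 40.73, 40.76, 40.70],
        "Lon": [-73.98, -74.01, -73.99, -73.97, -74.00],
        "Base": ["B02512", "B02598", "B02598", "B02682", "B02598"],
    }
)

n_points = len(uber)                    # total number of pickups (rows)
n_bases = uber["Base"].nunique()        # number of distinct Base codes
per_base = uber["Base"].value_counts()  # pickups per base, most frequent first

print(f"{n_points} pickups across {n_bases} bases")
print(per_base)
```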

Datashader Canvas

We want to get a rough understanding of the spatial distribution of all the Uber locations. Which columns would we need, and which Canvas function should we use?

Datashader can also change the color scheme of its visualizations using built-in color palettes (colormaps). Changing the background and the color scheme can lead to some cool visualizations as well!

Plotly + Plotly Dash

Plotly.py (Plotly) is a Python library for creating interactive, web-based visualizations from your data. Plotly offers a wide variety of chart types, and its visualizations can be displayed in Jupyter notebooks, saved as HTML files, or used in Python web applications built with Dash.

Plotly Dash is a framework built on top of Flask, Plotly.js, and React that specializes in building visualization tools. Dash apps are rendered in the web browser, allowing them to be deployed on websites. More on Dash can be read here.

In this lab, we will be using Plotly Express to build our graphs as it is the easy-to-use, high-level interface to Plotly. Plotly Express helps streamline the visualization process, so you can spend more time analyzing your data. After we finish building our graphs, we will then deploy them through Plotly Dash.

You can click here to see all the different types of graphs Plotly has to offer, ranging from 3D charts to maps.

Let's now move on to the white wine data. We will first read our data as a pandas dataframe.
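A sketch of the read step, with a hypothetical file name and helper. Some distributions of the wine quality data are semicolon-delimited, so check your copy and pass `sep=";"` if needed; the toy rows below (a subset of columns, made-up values) are in that semicolon format.

```python
import io

import pandas as pd

# Hypothetical file name: replace with the wine CSV extracted from Lab9-Data.zip.
WINE_CSV = "winequality-white.csv"


def load_wine(source=WINE_CSV, sep=","):
    """Read the white wine CSV; pass sep=";" for a semicolon-delimited copy."""
    return pd.read_csv(source, sep=sep)


# Made-up rows with a subset of columns, in semicolon-delimited form.
sample = io.StringIO("pH;alcohol;quality\n3.00;8.8;6\n3.30;9.5;5\n3.26;10.1;7\n")
wine = load_wine(sample, sep=";")
print(wine.describe())  # summary statistics per column
```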

Let's visualize our white wine data. First, create a scatter plot that plots the pH level of each datapoint on the x-axis and the alcohol level on the y-axis. We want to be able to distinguish the good wine from the bad wine, so color the datapoints based on quality. We also want better wines to appear as larger markers.

Next, we want to adjust our dataframe by adding information that describes how each wine's quality compares to the overall average quality.
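The lab does not fix a particular format for this comparison, so the sketch below shows two possibilities: a numeric difference from the mean and a categorical label. The column names `quality_diff` and `vs_average` are hypothetical, and the four rows are toy data.

```python
import numpy as np
import pandas as pd

# Toy quality scores standing in for the wine dataframe.
wine = pd.DataFrame({"quality": [5, 6, 7, 4]})

# Overall average quality across all wines (5.5 for these toy rows).
avg_quality = wine["quality"].mean()

# Numeric comparison: how far each wine sits above (+) or below (-) the average.
wine["quality_diff"] = wine["quality"] - avg_quality

# Categorical comparison: a label that is convenient for coloring plots.
wine["vs_average"] = np.where(
    wine["quality"] > avg_quality, "above average", "at or below average"
)
print(wine)
```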

Plotly also includes many different predefined color palettes that can help make your visualizations prettier or stick to a common theme. In the next part, we will create a scatter plot that utilizes one such color scale. You can see all the different types of color scales that are prebuilt by Plotly as well as how to use them here.

Besides scatter plots, Plotly offers many other visualization types. A complete list of the functionality available in the Plotly toolkit, with examples for each, can be found on Plotly's website. Let's try creating a histogram using our data.

Now that we have created three graphs using our wine dataset, let's deploy them so they can be viewed in a web browser. Dash apps are composed of two parts: the first defines the layout of the app, and the second defines its interactivity. In this lab, we will only be working with the first part, so if you want to read up on how Dash allows for customizable user interactivity, click on this link.

A guide to creating and styling the layout of Dash apps can be found here.