Datashader is a graphics pipeline system for visualizing large datasets quickly. It breaks the visualization process into a series of steps and distributes the computation-intensive work either across CPU cores using Dask or across GPUs using CUDA. This approach lets Datashader render extremely large datasets even on standard hardware.
Curious readers can find the details behind Datashader's pipeline efficiency here, but they are not necessary to complete this lab.
import datashader as ds
import pandas as pd
import colorcet as cc
Before moving forward, take some time to glance through the official documentation here to get a better understanding of what Datashader offers.
Let's import the data we'll be using for this lab. You can download the Lab9-Data.zip file from here. One CSV file contains Uber pickup data from April to September 2014 and originates from here. The other contains white wine quality data and originates from here.
Let's explore the Uber data a bit. We will first read our data into a pandas dataframe.
# Part 1
# Convert the Uber data into a Pandas dataframe
Now that we can see the general layout of the data, how many datapoints are there, and how many different Base types are there? How many of the Ubers are associated with each Base?
# Part 2
# Find the total datapoints in our dataframe as well as the number of points associated with each unique value
# within the Base column.
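As a rough illustration of the pandas calls that answer this kind of question, the sketch below counts rows and distinct values on a tiny made-up dataframe. The column name `group` is hypothetical and stands in for whichever categorical column you are inspecting.

```python
import pandas as pd

# Hypothetical toy data; the column name "group" is illustrative only.
df = pd.DataFrame({"group": ["a", "b", "a", "c", "a", "b"]})

total_rows = len(df)                    # number of rows in the dataframe
n_unique = df["group"].nunique()        # number of distinct values
per_group = df["group"].value_counts()  # number of rows per distinct value

print(total_rows, n_unique)
print(per_group)
```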
We want to get a rough understanding of the spatial distribution of all the Uber locations. Which columns would we need, and which Canvas function should we use?
# Part 3
# Project the Lon and Lat field of the entire dataframe onto the x and y dimensions of a Datashader Canvas
# and display the Canvas.
# Part 4
# Project the Lon and Lat field of Ubers from the first Base (this can vary depending on how you arrange the values)
# onto the x and y dimensions of a Datashader Canvas and display the Canvas.
# Part 5
# Project the Lon and Lat field of Ubers from the second Base (this can vary depending on how you arrange the values)
# onto the x and y dimensions of a Datashader Canvas and display the Canvas.
# Part 6
# Project the Lon and Lat field of Ubers from the third Base (this can vary depending on how you arrange the values)
# onto the x and y dimensions of a Datashader Canvas and display the Canvas.
# Part 7
# Project the Lon and Lat field of Ubers from the fourth Base (this can vary depending on how you arrange the values)
# onto the x and y dimensions of a Datashader Canvas and display the Canvas.
# Part 8
# Project the Lon and Lat field of Ubers from the fifth Base (this can vary depending on how you arrange the values)
# onto the x and y dimensions of a Datashader Canvas and display the Canvas.
Datashader also has the ability to change the color scheme of displayed graph visualizations with in-house color palettes (colormaps). Changing the background and changing the color scheme can lead to some cool visualizations as well!
# Part 9
# Project the Lon and Lat field of the entire dataframe onto the x and y dimensions of a Datashader Canvas
# and display the Canvas. Set the background to black and the shade scheme to 'cc.fire' in order to create a
# new color scheme that is different than the regular white and blue scheme.
Plotly.py (Plotly) is a Python library that provides users with the ability to create interactive web-based visualizations from their data. Plotly offers a wide variety of chart types, and visualizations can be displayed in Jupyter notebooks, saved as HTML files, or used in Python web applications built with Dash.
Plotly Dash is a framework written on top of Flask, Plotly, and React that specializes in building visualization tools. Dash apps are rendered in the web browser, allowing them to be deployed on websites. More on Dash can be read here.
In this lab, we will be using Plotly Express to build our graphs as it is the easy-to-use, high-level interface to Plotly. Plotly Express helps streamline the visualization process, so you can spend more time analyzing your data. After we finish building our graphs, we will then deploy them through Plotly Dash.
You can click here to see all the different types of graphs Plotly has to offer, ranging from 3D charts to maps.
import dash
# Dash 2.0 and later bundle these modules; the standalone dash_core_components
# and dash_html_components packages are deprecated.
from dash import dcc, html
import plotly.express as px
Let's now move on to the white wine data. We will first read our data as a pandas dataframe.
# Part 10
# Convert the wine data into a Pandas dataframe
Let's visualize our white wine data. First create a scatter plot that visualizes the dataset by plotting the pH level of each datapoint on the x-axis and the alcohol level on the y-axis. We want to be able to distinguish the good wine from the bad wine, so color the datapoints in the scatter plot based on quality. We also want the better wine to have a larger size on the scatter plot.
# Part 11
# Create a scatter plot that takes in the entire dataset and has the pH level on the x-axis and the alcohol level
# on the y-axis. Color the points on the scatter plot based on the quality of each wine.
# Update the graph so marker size is determined based on the quality of the respective datapoint.
# Display the graph.
Next, we want to make an adjustment to our dataframe by adding information that describes how the current wine's quality compares to the overall average quality.
# Part 12
# Update the dataframe so that there is a column that contains the difference between the current point's quality value
# and the average quality value of the entire dataset.
# For example, if the current wine has a quality of 10 and the average quality is 8, then the value for this column would
# be 2.
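The arithmetic in the example above can be sketched with a vectorized pandas subtraction. The column names `score` and `score_diff` are hypothetical, and the three values are chosen so that the mean is 8, matching the worked example.

```python
import pandas as pd

# Hypothetical toy data whose mean is 8.
df = pd.DataFrame({"score": [10, 8, 6]})

# Subtracting the column mean from every row is a single vectorized operation.
df["score_diff"] = df["score"] - df["score"].mean()

print(df)
```

The row with a score of 10 ends up with a difference of 2, as in the example.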
Plotly also includes many different predefined color palettes that can help make your visualizations prettier or stick to a common theme. In the next part, we will create a scatter plot that utilizes one such color scale. You can see all the different types of color scales that are prebuilt by Plotly as well as how to use them here.
# Part 13
# Create a scatter plot that takes in the entire dataset and has the chlorides level on the x-axis and the sulphates level
# on the y-axis. Color the points on the scatter plot based on the quality difference (column you created in the last step),
# and set the color_continuous_scale parameter to 'RdBu'.
# Display the graph.
Besides scatter plots, Plotly offers many other visualization tools. A complete list of the functionality available in the Plotly toolkit, with examples for each, can be found on Plotly's website. Let's try creating a histogram using our data.
# Part 14
# Create a histogram plot that takes in the entire dataset and displays the number of wines that fall into each quality
# category.
# Display the graph.
Now that we have created three graphs using our wine dataset, let's deploy them through our web browser. Dash apps are composed of two portions: the first defines the layout of the app, and the second defines its interactivity. In this lab, we will only be working with the first portion, so if you want to read up on how Dash allows for customizable user interactivity, click on this link.
The process of creating and styling the layout of Dash apps can be found here.
# Part 15
# Create a Dash app that displays all three different Plotly express graphs created above.
app = dash.Dash(__name__)
app.layout = html.Div(children=[
    # Add first graph here
    # Add second graph here
    # Add third graph here
])
if __name__ == '__main__':
    # Newer Dash releases replace app.run_server(...) with app.run(...)
    app.run_server(debug=True, use_reloader=False)