Preface


D3.js (Data-Driven Documents) is a JavaScript library for manipulating documents based on data [1]. On its own, D3.js is powerful enough to let us achieve almost anything with regard to visualisation. However, people often find D3.js either too hard to use or too verbose, requiring many lines of code to produce something simple, e.g. a bar chart.

Taking the bar chart as an example, a search for "d3js bar chart" turns up two D3.js code examples. The first hit is named "Simple d3.js bar chart" and dated 12-May-2020. Data aside, the full document (including HTML and JavaScript) comes in at 98 lines, and it produces the following:

Another example [2] is from the D3.js gallery. It's named "Bar Chart" and dated 15-Nov-2017, and it comes in at just over 50 lines excluding HTML. Because it's hosted on Observable (a platform for exploring and visualising data), those expecting to copy and paste code to get something running locally may struggle. It produces the following:

Those same people may point to examples from other libraries to support their claim. Looking at bar charts with Plotly [3], we find that, data aside, the example produces a bar chart with a single line of JavaScript. Opening the example in CodePen to include the HTML brings it to fewer than 10 lines, and it produces the following:

Data aside, with only a few more lines of code Plotly can produce the following:

Even with this limited example, it's easy to see where people are coming from when they complain about D3.js and its difficulty.

So why would anyone want to waste their time with D3.js? The answer depends on the context. If all you want to do is create a pie, line, bar, or area chart then it's likely D3.js is overkill - especially if you don't already know how to use it. It will take you far longer to create your charts with D3.js, and you'd likely have a better time using something like Plotly, matplotlib, Highcharts, Google chart tools, Chart.js, Tableau, Microsoft Excel, and so on. Some of these libraries are even built on top of D3.js, e.g. Plotly, whose bar chart documentation opens with the sentence "How to make a D3.js-based bar chart in javascript." [3].

But what happens when we want to create something that's more than just a standard data visualisation? Perhaps we need:

  • To work with HTML, SVG, and CSS.
  • Full control over the aesthetics of visualisations, not just changing colours, font-size, etc.
  • Full control over the behaviour of visualisations, enabling more than just mouseover tooltips and brushing.
  • To create an entirely new and unique type of visualisation.

In these cases, D3.js will often be the most feasible option. D3.js sells itself as a library for manipulating documents based on data, and whilst it does have helpful components that make charting possible, it can be used for much more.

Let's take a look at some examples of what we can do with D3.js that we can't do with the aforementioned alternatives.

In the first example made with D3.js, we can see an animation of a colour-changing circle moving up along a sloped path before falling back down with a bounce.

It's a fairly simple example and one that we'll look at closely later in the book, but already we can see that we've created something that we couldn't have if we were to use one of the alternatives. There is of course the point that it isn't a data visualisation - but perhaps it could be!

In the second example made with D3.js, we have an interactive Chord diagram illustrating the relationships between two features, complete with rich popups on mouse hover.

Chord Diagram

With these examples, it's easy to see what we can achieve with D3.js and where it can add value.

There is a wealth of cookbook-style resources available for D3.js visualisations, meaning you can create some interesting visualisations by copying some code and passing in your data. However, what this book aims to be is a practical journey through the many components of D3.js. Many sections will build on the last, and new component features will be introduced in small increments rather than being buried amongst a bunch of other newly introduced code. By the end of this book, we want to be able to create new visualisations from the ground up and modify the behaviour of existing ones.

Note

I aim to generate everything in this book through code. This means you will see the code for all my figures and tables, including things like flowcharts. Many of the visualisations are animated and interactive, so it's recommended that you generate the output using the code listings.

This book is currently available in early access form. It is being actively worked on and updated.

Every section is intended to be independent, so you will find some repetition as you progress from one section to another.


  1. M. Bostock. Data-Driven Documents, https://d3js.org/. 

  2. M. Bostock. Bar Chart, https://observablehq.com/@d3/bar-chart. 

  3. Plotly. Bar Charts in JavaScript, https://plotly.com/javascript/bar-charts/. 

Theme Purple Please for Jupyter Lab

Introduction

I put together this theme, theme-purple-please, for when I'm working with Python and Rust in Jupyter Lab. It currently supports Jupyter Lab 1 and 2.

Figure 1 - A Jupyter Notebook being edited within Jupyter Lab.
Theme from https://github.com/shahinrostami/theme-purple-please

You may have also seen it used in screenshots from the following books:

Installation through Jupyter Lab

You can install it through the Jupyter Lab Extension Manager UI, or with the following command:

jupyter labextension install @shahinrostami/theme-purple-please

GitHub Repository

You can navigate and download the source code at https://github.com/shahinrostami/theme-purple-please.

npm Package

The theme comes as an npm package at https://www.npmjs.com/package/@shahinrostami/theme-purple-please, where you can check out usage statistics and dependencies.

Box Plots at the Olympics

Preamble

In [2]:
:dep darn = {version = "0.1.15"}
:dep ndarray = {version = "0.13.1"}
:dep itertools = {version = "0.9.0"}
:dep plotly = {version = "0.4.0"}
extern crate ndarray;

use ndarray::prelude::*;
use std::str::FromStr;
use itertools::Itertools;
use plotly::{Plot, Layout, BoxPlot};
use plotly::common::{Title, Font};
use plotly::layout::{Margin, Axis};

Introduction

In this section, we're going to use 120 years of Olympic history to create two visualisations. Let's set our sights on something that illustrates the age and height of athletes, grouped by the different Olympic games.


The Dataset

We'll use the 120 years of Olympic history: athletes and results dataset, which we'll download and load with the darn crate. You're also welcome to use the mirrored copy that is loaded in the following cell.

In [3]:
let data = darn::read_csv("https://shahinrostami.com/datasets/athlete_events_known_age.csv");

We'll take a peek at what we've downloaded to make sure there were no issues with the loading.

In [4]:
darn::show_frame(&data.0, Some(&data.1));
Out[4]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
"1" "A Dijiang" "M" "24" "180" "80" "China" "CHN" "1992 Summer" "1992" "Summer" "Barcelona" "Basketball" "Basketball Men\'s Basketball" "NA"
"2" "A Lamusi" "M" "23" "170" "60" "China" "CHN" "2012 Summer" "2012" "Summer" "London" "Judo" "Judo Men\'s Extra-Lightweight" "NA"
"5" "Christine Jacoba Aaftink" "F" "21" "185" "82" "Netherlands" "NED" "1988 Winter" "1988" "Winter" "Calgary" "Speed Skating" "Speed Skating Women\'s 500 metres" "NA"
"5" "Christine Jacoba Aaftink" "F" "21" "185" "82" "Netherlands" "NED" "1988 Winter" "1988" "Winter" "Calgary" "Speed Skating" "Speed Skating Women\'s 1,000 metres" "NA"
"5" "Christine Jacoba Aaftink" "F" "25" "185" "82" "Netherlands" "NED" "1992 Winter" "1992" "Winter" "Albertville" "Speed Skating" "Speed Skating Women\'s 500 metres" "NA"
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
"135569" "Andrzej ya" "M" "29" "179" "89" "Poland-1" "POL" "1976 Winter" "1976" "Winter" "Innsbruck" "Luge" "Luge Mixed (Men)\'s Doubles" "NA"
"135570" "Piotr ya" "M" "27" "176" "59" "Poland" "POL" "2014 Winter" "2014" "Winter" "Sochi" "Ski Jumping" "Ski Jumping Men\'s Large Hill, Individual" "NA"
"135570" "Piotr ya" "M" "27" "176" "59" "Poland" "POL" "2014 Winter" "2014" "Winter" "Sochi" "Ski Jumping" "Ski Jumping Men\'s Large Hill, Team" "NA"
"135571" "Tomasz Ireneusz ya" "M" "30" "185" "96" "Poland" "POL" "1998 Winter" "1998" "Winter" "Nagano" "Bobsleigh" "Bobsleigh Men\'s Four" "NA"
"135571" "Tomasz Ireneusz ya" "M" "34" "185" "96" "Poland" "POL" "2002 Winter" "2002" "Winter" "Salt Lake City" "Bobsleigh" "Bobsleigh Men\'s Four" "NA"

It looks like the data was loaded without any issues.

Data Wrangling

Let's assign the feature data to games and feature names to headers for readability.

In [5]:
let games = data.0;
let headers = data.1;

A quick look at the available features will give us the feature names we're after for the age and height of athletes.

In [6]:
println!("{}", &headers.iter().format("\n"));
ID
Name
Sex
Age
Height
Weight
Team
NOC
Games
Year
Season
City
Sport
Event
Medal

We've confirmed that the two features we're after are named Age and Height, and that they're at indices $3$ and $4$. However, it would be better to determine these indices programmatically instead of hard-coding them.

In [7]:
let idx_age = headers.iter().position(|x| x == "Age").unwrap();
let idx_height = headers.iter().position(|x| x == "Height").unwrap();
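The `position(...).unwrap()` calls in In [7] will panic with a generic message if a header is ever missing or misspelled. As a minimal sketch, assuming the headers are available as a plain `&[String]` (as `headers` is here), a hypothetical helper named `column_index` could return a `Result` instead, so the failure names the column it couldn't find:

```rust
/// Hypothetical helper: find the position of a named column in the headers,
/// returning a descriptive error instead of panicking when it is absent.
fn column_index(headers: &[String], name: &str) -> Result<usize, String> {
    headers
        .iter()
        .position(|h| h == name)
        .ok_or_else(|| format!("column {:?} not found in headers", name))
}
```

In the notebook, `column_index(&headers, "Age").unwrap()` would still panic on a missing column, but the panic message would now include which column was missing.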

Let's create an array of these indices and print them out to check.

In [8]:
let selected_features = [idx_age,idx_height];

println!("{}",selected_features.iter().format("\n"));
3
4

Now that we know the index of our age and height columns, let's prepare two collection variables, one named features to hold the numeric feature data, and one named feature_headers to hold the corresponding column names.

In [9]:
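// An N x 0 matrix (one row per entry, no columns yet); columns are appended in the next cell.
let mut features = Array2::<f32>::zeros((games.shape()[0], 0));
// Names of the columns as they are appended to `features`.
let mut feature_headers = Vec::<String>::new();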

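Now, we can copy and parse our feature data into the initialised collections.

In [10]:
for &feature_index in selected_features.iter() {
    // Keep the column name alongside its data.
    feature_headers.push(headers[feature_index].clone());
    // Parse the column's strings to f32 (panicking on any unparsable value),
    // reshape it into an N x 1 column, and append it to the right of `features`.
    features = ndarray::stack![Axis(1), features,
        games.column(feature_index)
            .mapv(|elem| elem.parse::<f32>().unwrap())
            .insert_axis(Axis(1))
    ];
};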
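In In [10], each `parse::<f32>().unwrap()` panics on the first value that isn't a number. This dataset uses "NA" for missing values (see the Medal column in Out[4]), so a copy that still contained unknown ages or heights would stop the loop with little context. The sketch below is a std-only illustration that assumes plain `Vec<Vec<String>>` rows rather than darn's ndarray frame; `parse_columns` is a hypothetical name. It reports the row and column of the first value that fails to parse:

```rust
/// Parse the selected columns of row-major string data into rows of f32.
/// Returns an error naming the row and column of the first unparsable value
/// (e.g. an "NA" placeholder) instead of panicking.
/// Assumes every row has more fields than the largest index in `columns`.
fn parse_columns(rows: &[Vec<String>], columns: &[usize]) -> Result<Vec<Vec<f32>>, String> {
    rows.iter()
        .enumerate()
        .map(|(r, row)| {
            columns
                .iter()
                .map(|&c| {
                    row[c].parse::<f32>().map_err(|e| {
                        format!("row {}, column {}: could not parse {:?} ({})", r, c, row[c], e)
                    })
                })
                // Stops at the first error within this row.
                .collect::<Result<Vec<f32>, String>>()
        })
        // Stops at the first row that produced an error.
        .collect()
}
```

Collecting an iterator of `Result`s into a single `Result` short-circuits on the first `Err`, which keeps the error handling to one `map_err` call.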

We'll take a peek to make sure there were no obvious issues with parsing.

In [11]:
darn::show_frame(&features, Some(&feature_headers));
Out[11]:
Age Height
24.0 180.0
23.0 170.0
21.0 185.0
21.0 185.0
25.0 185.0
... ...
29.0 179.0
27.0 176.0
27.0 176.0
30.0 185.0
34.0 185.0

Looking good. Next, we'll need to determine the different games available in our dataset - we'll be using these to group the age and height data.

In [12]:
let idx_sport = headers.iter().position(|x| x == "Sport").unwrap();
let unique_games = games.column(idx_sport).iter().cloned().unique().collect_vec();

println!("{}",unique_games.iter().format(", "));
Basketball, Judo, Speed Skating, Cross Country Skiing, Athletics, Ice Hockey, Badminton, Sailing, Biathlon, Gymnastics, Alpine Skiing, Handball, Weightlifting, Wrestling, Luge, Rowing, Bobsleigh, Swimming, Football, Equestrianism, Shooting, Taekwondo, Boxing, Fencing, Diving, Canoeing, Water Polo, Tennis, Cycling, Hockey, Figure Skating, Softball, Archery, Volleyball, Synchronized Swimming, Modern Pentathlon, Table Tennis, Nordic Combined, Baseball, Rhythmic Gymnastics, Freestyle Skiing, Rugby Sevens, Trampolining, Beach Volleyball, Triathlon, Ski Jumping, Curling, Golf, Snowboarding, Short Track Speed Skating, Skeleton, Rugby, Art Competitions, Tug-Of-War
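The `unique()` adaptor from itertools yields each distinct value the first time it is seen, which is why the list above starts with Basketball, Judo, and Speed Skating, the sports in the first rows of Out[4]. For illustration only, the same order-preserving deduplication can be sketched with the standard library's `HashSet`; `unique_in_order` is a hypothetical name and the input is assumed to be a plain slice of `String`s:

```rust
use std::collections::HashSet;

/// Distinct values of a column, in order of first appearance.
fn unique_in_order(column: &[String]) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    column
        .iter()
        // `insert` returns true only the first time a value is added.
        .filter(|value| seen.insert(value.as_str()))
        .cloned()
        .collect()
}
```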

We now have the unique list of Olympic games - some of which you may not even have heard of!

Visualising the Data

Now that we have prepared our data, let's put it to work in a test box plot.

Height of Athletes in Basketball

Let's see if we can create a box plot for the height of athletes in Basketball. To do so, we're going to build a list of row indices that correspond to Basketball data.

In [13]:
// Row indices of every entry whose Sport is "Basketball".
let indices: Vec<usize> = games.column(idx_sport)
    .iter()
    .enumerate()
    .filter(|(_, elem)| *elem == "Basketball")
    .map(|(row, _)| row)
    .collect();

Then, we'll use these indices to select from our feature data.

In [14]:
let basketball = features.select(Axis(0), &indices);

We'll take a peek to make sure the selection worked as expected.

In [15]:
darn::show_frame(&basketball, Some(&feature_headers));
Out[15]:
Age Height
24.0 180.0
19.0 185.0
29.0 195.0
25.0 189.0
23.0 178.0
... ...
30.0 218.0
20.0 201.0
28.0 201.0
23.0 202.0
33.0 171.0

Finally, we'll create a box plot of just the heights of the Basketball athletes.

In [16]:
let mut plot = Plot::new();

// Column 1 of `basketball` holds Height (column 0 is Age).
let trace = BoxPlot::new(basketball.column(1).to_vec()).name("Basketball");

plot.add_trace(trace);

darn::show_plot(plot);
Out[16]:

Looking good.
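A box plot summarises a group with five numbers: the minimum, the lower quartile (Q1), the median, the upper quartile (Q3), and the maximum. Plotly computes these for us. The hypothetical `five_number_summary` below sketches the calculation using linear interpolation between ranks, which is one common quartile convention. Plotly's own quartile method and whisker rules may differ, so treat this as an explanation of what the boxes show rather than a way to reproduce the plot exactly:

```rust
/// Minimum, Q1, median, Q3, and maximum of `values`, or `None` if empty.
/// Quartiles use linear interpolation between the two nearest ranks.
fn five_number_summary(values: &[f32]) -> Option<[f32; 5]> {
    if values.is_empty() {
        return None;
    }
    // Sort a copy; total_cmp gives a total order even if a NaN slips in.
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Value at fraction `p` (0.0 to 1.0) of the way through the sorted data.
    let quantile = |p: f32| -> f32 {
        let pos = p * (sorted.len() - 1) as f32;
        let lo = pos.floor() as usize;
        let hi = pos.ceil() as usize;
        sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo as f32)
    };

    Some([
        sorted[0],
        quantile(0.25),
        quantile(0.5),
        quantile(0.75),
        sorted[sorted.len() - 1],
    ])
}
```

For example, `five_number_summary(&[4.0, 1.0, 3.0, 2.0])` gives `Some([1.0, 1.75, 2.5, 3.25, 4.0])`.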

Athlete Height Grouped by Olympic Games

Now let's do the same for every game in our dataset, adding one box per game to a single plot.

In [42]:
let mut plot = Plot::new();
let layout = Layout::new()
    .title(Title::new("Athlete height grouped by Olympic games."))
    .margin(Margin::new().left(30).right(0).bottom(140).top(40))
    .xaxis(Axis::new().show_grid(true).tick_font(Font::new().size(10)))
    .show_legend(false);

plot.set_layout(layout);

for name in unique_games.iter() {
    // Row indices of every entry whose Sport matches the current game.
    let indices: Vec<usize> = games.column(idx_sport)
        .iter()
        .enumerate()
        .filter(|(_, elem)| *elem == name)
        .map(|(row, _)| row)
        .collect();

    // Select those rows and add a box for their heights (column 1).
    let game = features.select(Axis(0), &indices);
    let trace = BoxPlot::new(game.column(1).to_vec()).name(name);
    plot.add_trace(trace);
};

darn::show_plot(plot);
Out[42]:
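The loop in In [42] scans the whole Sport column once for every game, so the work grows with the number of rows multiplied by the number of games. A single pass can build all of the groups at once. The sketch below is a std-only illustration that assumes plain slices rather than darn/ndarray types, and `group_in_order` is a hypothetical helper. Each resulting `Vec<f32>` could then be passed to `BoxPlot::new` in the same way as `game.column(1).to_vec()` above:

```rust
use std::collections::HashMap;

/// Group `values` by the matching entry in `keys` in a single pass,
/// keeping the groups in order of first appearance.
/// If the slices differ in length, the extra items of the longer one are ignored.
fn group_in_order(keys: &[String], values: &[f32]) -> Vec<(String, Vec<f32>)> {
    // Maps each key to its position in `groups`.
    let mut position: HashMap<&str, usize> = HashMap::new();
    let mut groups: Vec<(String, Vec<f32>)> = Vec::new();
    for (key, &value) in keys.iter().zip(values) {
        let idx = *position.entry(key.as_str()).or_insert_with(|| {
            // First time we see this key: start a new, empty group.
            groups.push((key.clone(), Vec::new()));
            groups.len() - 1
        });
        groups[idx].1.push(value);
    }
    groups
}
```

Keeping a `Vec` of groups plus a `HashMap` of positions preserves the first-seen order used for the x-axis above, which a bare `HashMap` would not.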