Arabica Coffee Beans - Origin and Variety

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import itertools                     # for flattening the list of pairs
from chord import Chord              # for drawing the chord diagram

Introduction

The Dataset

In [3]:
data_url = 'https://datacrayon.com/datasets/arabica_data.csv'
data = pd.read_csv(data_url)
data.head()
Out[3]:
 | Unnamed: 0 | Species | Owner | Country.of.Origin | Farm.Name | Lot.Number | Mill | ICO.Number | Company | Altitude | ... | Color | Category.Two.Defects | Expiration | Certification.Body | Certification.Address | Certification.Contact | unit_of_measurement | altitude_low_meters | altitude_high_meters | altitude_mean_meters
0 | 1 | Arabica | metad plc | Ethiopia | metad plc | NaN | metad plc | 2014/2015 | metad agricultural developmet plc | 1950-2200 | ... | Green | 0 | April 3rd, 2016 | METAD Agricultural Development plc | 309fcf77415a3661ae83e027f7e5f05dad786e44 | 19fef5a731de2db57d16da10287413f5f99bc2dd | m | 1950.0 | 2200.0 | 2075.0
1 | 2 | Arabica | metad plc | Ethiopia | metad plc | NaN | metad plc | 2014/2015 | metad agricultural developmet plc | 1950-2200 | ... | Green | 1 | April 3rd, 2016 | METAD Agricultural Development plc | 309fcf77415a3661ae83e027f7e5f05dad786e44 | 19fef5a731de2db57d16da10287413f5f99bc2dd | m | 1950.0 | 2200.0 | 2075.0
2 | 3 | Arabica | grounds for health admin | Guatemala | san marcos barrancas "san cristobal cuch | NaN | NaN | NaN | NaN | 1600 - 1800 m | ... | NaN | 0 | May 31st, 2011 | Specialty Coffee Association | 36d0d00a3724338ba7937c52a378d085f2172daa | 0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660 | m | 1600.0 | 1800.0 | 1700.0
3 | 4 | Arabica | yidnekachew dabessa | Ethiopia | yidnekachew dabessa coffee plantation | NaN | wolensu | NaN | yidnekachew debessa coffee plantation | 1800-2200 | ... | Green | 2 | March 25th, 2016 | METAD Agricultural Development plc | 309fcf77415a3661ae83e027f7e5f05dad786e44 | 19fef5a731de2db57d16da10287413f5f99bc2dd | m | 1800.0 | 2200.0 | 2000.0
4 | 5 | Arabica | metad plc | Ethiopia | metad plc | NaN | metad plc | 2014/2015 | metad agricultural developmet plc | 1950-2200 | ... | Green | 2 | April 3rd, 2016 | METAD Agricultural Development plc | 309fcf77415a3661ae83e027f7e5f05dad786e44 | 19fef5a731de2db57d16da10287413f5f99bc2dd | m | 1950.0 | 2200.0 | 2075.0

5 rows × 44 columns

In [4]:
# Shorten two long country names
data['Country.of.Origin'] = data['Country.of.Origin'].replace('United States (Hawaii)','Hawaii')
data['Country.of.Origin'] = data['Country.of.Origin'].replace('Tanzania, United Republic Of','Tanzania')
In [5]:
# Drop samples whose variety is 'Other' or missing
data = data[data['Variety'] != 'Other']
data = data[data['Variety'].notna()]
In [6]:
# Keep the 12 most common countries, then the 12 most common varieties among the remaining samples
data = data[data['Country.of.Origin'].isin(data['Country.of.Origin'].value_counts().index[:12])]
data = data[data['Variety'].isin(data['Variety'].value_counts().index[:12])]
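The two filters run in sequence, so the variety counts are taken after the country filter has already removed some samples. If you want to do this step in JavaScript before handing the data to D3, a minimal sketch might look like the following. It assumes each row is a plain object with hypothetical `country` and `variety` fields. Its tie-breaking (first-seen order) may differ from pandas' value_counts().

```javascript
// Return the n most frequent values in `values`, most frequent first.
// Ties keep first-seen order, because Array.prototype.sort is stable (ES2019+).
function topN(values, n) {
  const counts = new Map();
  for (const v of values) {
    counts.set(v, (counts.get(v) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([value]) => value);
}

// Keep only rows whose `key` field is among the n most frequent values.
function keepTopN(rows, key, n) {
  const keep = new Set(topN(rows.map((row) => row[key]), n));
  return rows.filter((row) => keep.has(row[key]));
}

// Hypothetical rows, for illustration only.
const rows = [
  { country: "Mexico", variety: "Bourbon" },
  { country: "Mexico", variety: "Typica" },
  { country: "Brazil", variety: "Bourbon" },
  { country: "Kenya", variety: "SL28" },
];

// Filter countries first, then count varieties on what remains,
// mirroring the order of the two pandas lines above.
const byCountry = keepTopN(rows, "country", 2);
const filtered = keepTopN(byCountry, "variety", 1);
```

With these sample rows, the country filter keeps Mexico and Brazil. The variety filter then keeps only the two Bourbon rows.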

It looks good so far, but let's check how many samples and columns remain after filtering.

In [7]:
data.shape
Out[7]:
(889, 44)

We're left with 889 samples, and all 44 of the original columns are still present.

Data Wrangling

We only need two columns, Country.of.Origin and Variety, so let's select just these and work with them as we move forward.

In [8]:
# Keep only the country and variety columns, dropping rows with missing values
species_personality = pd.DataFrame(data[['Country.of.Origin', 'Variety']].values).dropna().astype(str)
species_personality
Out[8]:
0 1
0 Guatemala Bourbon
1 China Catimor
2 Costa Rica Caturra
3 Brazil Bourbon
4 Uganda SL14
... ... ...
884 Honduras Catuai
885 Honduras Catuai
886 Mexico Bourbon
887 Guatemala Catuai
888 Honduras Caturra

889 rows × 2 columns

In [9]:
# Redundant after the dropna() above, but harmless
species_personality = species_personality.dropna()

Now for the names of our segments: the countries of origin (left) and the varieties (right).

In [10]:
left = list(data['Country.of.Origin'].value_counts().index)[::-1]

pd.DataFrame(left)
Out[10]:
0
0 China
1 Kenya
2 El Salvador
3 Uganda
4 Hawaii
5 Costa Rica
6 Honduras
7 Taiwan
8 Brazil
9 Colombia
10 Guatemala
11 Mexico
In [11]:
right = list(data['Variety'].value_counts().index)
pd.DataFrame(right)
Out[11]:
0
0 Caturra
1 Bourbon
2 Typica
3 Catuai
4 Hawaiian Kona
5 Yellow Bourbon
6 Mundo Novo
7 SL14
8 SL28
9 Catimor
10 Pacas
11 Pacamara

We can now use these names to create the matrix.

In [12]:
features = left + right
# Square matrix of zeros: every country and variety labels both a row and a column
d = pd.DataFrame(0, index=features, columns=features)

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

We can build a co-occurrence matrix with the following approach. We'll start by creating a list with every (country, variety) pairing in both its original and reversed form. Then we'll count each pairing in the matrix.

In [13]:
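# Each (country, variety) pair followed by its reverse, (variety, country)
species_personality = list(itertools.chain.from_iterable((i, i[::-1]) for i in species_personality.values))
In [14]:
# Count every pairing; counting both orientations makes the matrix symmetric
for x in species_personality:
    d.at[x[0], x[1]] += 1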
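D3's own d3.chord() layout also takes a square matrix, so the same construction carries over to JavaScript. Below is a minimal sketch. It assumes the samples are already an array of [country, variety] pairs and that `names` lists the countries followed by the varieties. `cooccurrenceMatrix` is a hypothetical helper, not part of the Chord library.

```javascript
// Build a symmetric co-occurrence matrix from [left, right] pairs.
// Each pair is counted in both orientations, like the (i, i[::-1])
// trick in the Python cell above.
function cooccurrenceMatrix(pairs, names) {
  const index = new Map(names.map((name, i) => [name, i]));
  const n = names.length;
  const matrix = Array.from({ length: n }, () => new Array(n).fill(0));
  for (const [a, b] of pairs) {
    const i = index.get(a);
    const j = index.get(b);
    if (i === undefined || j === undefined) continue; // skip unknown labels
    matrix[i][j] += 1;
    matrix[j][i] += 1;
  }
  return matrix;
}

// Hypothetical sample data, for illustration only.
const names = ["Brazil", "Mexico", "Bourbon", "Catuai"];
const pairs = [
  ["Brazil", "Bourbon"],
  ["Brazil", "Catuai"],
  ["Mexico", "Bourbon"],
  ["Brazil", "Bourbon"],
];
const matrix = cooccurrenceMatrix(pairs, names);
```

A country is only ever paired with a variety. As a result, the country-to-country and variety-to-variety entries stay at zero, and every link in the diagram joins a country to a variety.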

Chord Diagram

Time to visualise the co-occurrence of items using a chord diagram. We are going to use a list of custom colours that represent the items.

In [15]:
colors = [
    # First 12 colours: one per country of origin
    "#ff575c","#ff914d","#ffca38","#f2fa00","#C3F500","#94f000","#00fa68","#00C1A2","#0087db","#0054f0","#5d00e0","#2F06EB",
    # Last 12 colours: one per variety
    "#6f1d1b","#955939","#A87748","#bb9457","#7f5e38","#432818","#6e4021","#99582a","#cc9f69","#755939","#BAA070","#ffe6a7"]
In [16]:
names = left + right

Finally, we can put it all together.
In [17]:
Chord(d.values.tolist(), names, credit=True, colors=colors, wrap_labels=False, padding=0.05,
      margin=40, font_size="12px", font_size_large="18px", noun="coffee bean reviews",
      title="Coffee Bean Reviews - Variety and Origin Explorer",
      details_separator="", divide=True, divide_idx=len(left), divide_size=.6,
      width=910, allow_download=True).show()
In [18]:
#Chord(d.values.tolist(), names,credit=True, colors=colors, wrap_labels=False,
#      margin=40, font_size_large=7,noun="coffee beans",
#        details_separator="", divide=True, divide_idx=len(left),divide_size=.2, width=850).show()

Animated Transitions

Highlight

Much like d3.selection(), d3.transition() can be used to modify attributes and styles. The difference is that whilst d3.selection() applies the changes instantly, d3.transition() applies the changes gradually (and smoothly) over a specified duration.

Preamble

Let's get access to the D3.js library so that we can begin. In this case, we'll be including the library using the HTML <script> tag.

<script src="https://d3js.org/d3.v6.js"></script>

Introduction

We've already had a look at how to draw SVG shapes with D3.js, but we may often find ourselves wanting to animate these shapes to bring them to life! We can achieve animation using transitions in D3.js1, which enable key-frame animations consisting of two key-frames: start and end. Let's demonstrate a transition by gradually increasing the radius of the circle below, making it appear as if it's growing.

The above circle is positioned at $(150, 75)$ with a radius of $50$. Our transition will gradually increase the radius until it looks like the following circle.

By the end of the transition, the circle will have a radius of $75$.

A Container for the Output

This is where you will see the output of the code cells that follow it, provided they are referencing the corresponding id.

<div id="container"></div>

Note

The animation started on page load and lasted two seconds. If you missed it, you can refresh the page to see it in action again!

Creating an Empty SVG

We'll create a new detached <svg> element and use the returned selection throughout the rest of this section.

const svg = d3.create("svg");

Creating a Circle Element

Let's create our circle! We'll append the <circle> element to our selection of the <svg> element, using our starting coordinates $(150, 75)$ and radius of $50$.

var circle = svg
    .append("circle")
    .attr("cx", 150)
    .attr("cy", 75)
    .attr("r", 50);

Animating the Circle Element

Much like d3.selection(), d3.transition() can be used to modify attributes and styles. The difference is that whilst d3.selection() applies the changes instantly, d3.transition() applies the changes gradually (and smoothly) over a specified duration.

We could use a d3.selection() to change the radius of our circle instantly, from its current value of $50$ to its target value of $75$, with the following:

circle.attr('r', 75);

We will instead be using a d3.transition() to do the same, but over a duration of $2$ seconds (or $2000$ milliseconds):

circle
    .transition()
    .duration(2000)
    .attr('r', 75);

We've invoked .transition() on the selection of our <circle> element and specified the transition's duration with .duration(). We've then used .attr() to specify that the r attribute should transition to a value of $75$.
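On each animation frame, a transition samples an easing curve and interpolates between the start and end key-frames. The sketch below models that for our circle without touching the DOM. It assumes D3's default easing for transitions, d3.easeCubic (a cubic in-out curve), and plain numeric interpolation of the radius. The function names are hypothetical.

```javascript
// Cubic in-out easing, the curve d3.easeCubic describes.
function easeCubicInOut(t) {
  t *= 2;
  return (t <= 1 ? t * t * t : (t -= 2) * t * t + 2) / 2;
}

// Radius of the circle `elapsed` milliseconds into a transition
// from `start` to `end` lasting `duration` milliseconds.
function radiusAt(elapsed, { start = 50, end = 75, duration = 2000 } = {}) {
  const t = Math.min(Math.max(elapsed / duration, 0), 1); // normalised time in [0, 1]
  const eased = easeCubicInOut(t);
  return start + (end - start) * eased; // interpolate between the two key-frames
}

for (const ms of [0, 500, 1000, 1500, 2000]) {
  console.log(ms, radiusAt(ms).toFixed(2));
}
```

Because of the easing, the radius changes slowly at first, quickly through the middle, and slowly again at the end. It is roughly 51.6 after 500 ms, exactly 62.5 at the halfway point, and roughly 73.4 after 1500 ms.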

Appending to the Container

Finally, let's append everything to our container.

d3
    .select("#container")
    .append(() => svg.node());

We can see the output by checking on our container with the corresponding id, which in this case is where id=container.

Conclusion

If we inspect the HTML, we will see the <svg> and <circle> elements have been added to the <div> where the id=container. We can also see that the <circle> element's r attribute has been set to $75$ after smoothly transitioning from $50$.

<div id="container">
  <svg>
    <circle cx="150" cy="75" r="75"></circle>
  </svg>
</div>

  1. M. Bostock. d3-transition: Animated transitions for D3 selections, https://github.com/d3/d3-transition. 

Grouping Elements

Highlight

The <g> SVG element is a container used to group other SVG elements. Transformations applied to the <g> element are performed on its child elements, and its attributes are inherited by its children. We can create a group element with D3.js by appending a g element using any selection.

Preamble

Let's get access to the D3.js library so that we can begin. In this case, we'll be including the library using the HTML <script> tag.

<script src="https://d3js.org/d3.v6.js"></script>

Introduction

We may often find ourselves needing to group elements together, allowing us to apply transformations or set attributes that are inherited by all child elements of that group. One way to achieve this is to use the container SVG element, <g>, allowing us to arrange our elements into groups which can also be nested1. We can create these group elements using SVG directly, for example with the following.

<svg>
  <g fill="#40F99B">
    <circle cx="85" cy="75" r="50"></circle>
    <circle cx="215" cy="75" r="50"></circle>
  </g>
</svg>

Here we can see that we've changed the colour of both circles by setting the fill attribute on their parent group element, <g>.

But we can also do this with D3.js. In this section, we'll use D3.js to create a group containing two circle elements, and change their fill colour to #40F99B using the group element.

A Container for the Output

This is where you will see the output of the code cells that follow it, provided they are referencing the corresponding id.

<div id="container"></div>

Creating an Empty SVG

We'll create a new detached <svg> element and use the returned selection throughout the rest of this section.

const svg = d3.create("svg");

Creating a Group Element

The <g> SVG element is a container used to group other SVG elements. Transformations applied to the <g> element are performed on its child elements, and its attributes are inherited by its children. We can create a group element with D3.js by appending a g element using any selection.

var shapeGroup = svg.append("g");

Now our shapeGroup variable refers to our selection of the new <g> element.
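This section doesn't use transformations, but the idea is easy to make concrete. Suppose we set shapeGroup.attr("transform", "translate(10, 20)"). Each circle would then be drawn shifted by 10 horizontally and 20 vertically, while its own cx and cy attributes stay the same. The sketch below models only that arithmetic for translations, using hypothetical helper names. Rotations and scaling would need full matrix maths.

```javascript
// Where a circle's centre ends up when its parent <g> has
// transform="translate(tx, ty)". Only translation is modelled here.
function translatedCentre(circle, translate) {
  return { cx: circle.cx + translate.tx, cy: circle.cy + translate.ty };
}

// Nested groups compose: their translations simply add up.
function composeTranslates(...translates) {
  return translates.reduce(
    (acc, t) => ({ tx: acc.tx + t.tx, ty: acc.ty + t.ty }),
    { tx: 0, ty: 0 }
  );
}

// The two circles from this section, and a hypothetical group translation.
const circles = [{ cx: 85, cy: 75 }, { cx: 215, cy: 75 }];
const groupShift = { tx: 10, ty: 20 };
const onScreen = circles.map((c) => translatedCentre(c, groupShift));
```

For our two circles, a translate(10, 20) on the group would place their centres at (95, 95) and (225, 95). Their cx and cy attributes would stay at (85, 75) and (215, 75).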

Creating Elements within a Group

Let's create our circles! This time we're appending these <circle> elements to our new shapeGroup selection of our new group, rather than directly to the <svg> element.

shapeGroup
    .append("circle")
    .attr("cx", 85)
    .attr("cy", 75)
    .attr("r", 50);

shapeGroup
    .append("circle")
    .attr("cx", 215)
    .attr("cy", 75)
    .attr("r", 50);

Setting Group Attributes

Previously, we would set the fill attribute of both <circle> elements to change their colour. With groups, we can instead set the fill attribute of our parent group element, the selection of which is stored in our shapeGroup variable.

shapeGroup
    .attr("fill", "#40F99B");

That's all it takes to change the fill colour of any elements within our group.
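A child's own fill attribute takes precedence over the one it would inherit from the group. The following is a simplified model of that lookup. It uses plain objects rather than DOM nodes, and the names are hypothetical. It ignores CSS stylesheets and the cascade, and falls back to black, the initial value of fill in SVG.

```javascript
// Resolve an inherited presentation attribute such as fill:
// use the element's own value if it has one, otherwise walk up
// through its ancestors. Falls back to SVG's initial fill, black.
function resolveFill(node) {
  for (let current = node; current; current = current.parent) {
    if (current.attrs && current.attrs.fill !== undefined) {
      return current.attrs.fill;
    }
  }
  return "black";
}

// Plain objects standing in for <svg>, <g>, and two <circle> elements.
const svgNode = { name: "svg", attrs: {}, parent: null };
const group = { name: "g", attrs: { fill: "#40F99B" }, parent: svgNode };
const first = { name: "circle", attrs: { cx: 85, cy: 75, r: 50 }, parent: group };
const second = {
  name: "circle",
  attrs: { cx: 215, cy: 75, r: 50, fill: "#420a91" }, // overrides the group's fill
  parent: group,
};
```

Here the first circle inherits #40F99B from the group, while the second keeps its own #420a91.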

Appending to the Container

Finally, let's append everything to our container.

d3
    .select("#container")
    .append(() => svg.node());

We can see the output by checking on our container with the corresponding id, which in this case is where id=container.

Conclusion

If we inspect the HTML, we will see the <svg>, <g>, and <circle> elements have been added to the <div> where the id=container. We can also see that the <g> element's fill colour has been set to #40F99B, which has been inherited by both circles in the output.

<div id="container">
  <svg>
    <g fill="#40F99B">
      <circle cx="85" cy="75" r="50"></circle>
      <circle cx="215" cy="75" r="50"></circle>
    </g>
  </svg>
</div>

  1. W3C. Grouping: the ‘g’ element, https://www.w3.org/TR/SVG/struct.html#Groups. 

Attributes and Styles

Highlight

We can set CSS style properties by invoking .style(name, value) on the selection, and set SVG attributes by invoking .attr(name, value) where the argument to the first parameter should be the name of the attribute we want to set, and the argument to the second parameter should be the value we want to set it to.

Preamble

Let's get access to the D3.js library so that we can begin. In this case, we'll be including the library using the HTML <script> tag.

<script src="https://d3js.org/d3.v6.js"></script>

Introduction

A common way to modify an <svg> element is by setting its attributes. The W3C SVG specification defines two categories of attributes1: the regular category which includes the style attribute for styling with CSS, and the presentation category which includes the fill attribute for painting the interior of an element.

For example, we can change the colour of a circle by setting its fill attribute to #40F99B.

<svg>
  <circle cx="85" cy="75" r="50" fill="#40F99B"></circle>
</svg>

But we can also set these attributes with D3.js. In this section, we'll use D3.js to modify multiple attributes for our SVG elements.

A Container for the Output

This is where you will see the output of the code cells that follow it, provided they are referencing the corresponding id.

<div id="container"></div>

Creating an Empty SVG

We'll create a new detached <svg> element and use the returned selection throughout the rest of this section.

const svg = d3.create("svg");

Creating Elements and Setting Attributes

In previous sections, we've created <circle> elements by invoking .append(name) on a selection and passing in circle as the argument for the name parameter. This returned a selection which we then used to set the attributes. We can do this by invoking .attr(name, value) on the selection, where the argument to the first parameter should be the name of the attribute we want to set, and the argument to the second parameter should be the value we want to set it to.

Let's set the horizontal and vertical coordinates of our circle with the cx and cy attributes, respectively2. We'll also set the radius of our circle using the r attribute.

var circle = svg
    .append("circle")
    .attr("cx", 85)
    .attr("cy", 75)
    .attr("r", 50);

This time, we'll also change the colour of the circle by setting its fill attribute to #40F99B. We can do this using the selection stored in the circle variable.

circle
    .attr("fill", "#40F99B");

Let's do something similar, but this time for a rect element.

var rect = svg
    .append("rect")
    .attr("x", 165)
    .attr("y", 25)
    .attr("height", 100)
    .attr("width", 100)
    .attr("fill", "#420a91")
    .attr("stroke", "#FF00FF")
    .attr("stroke-width", "4")
    .attr("stroke-dasharray", "10,10");

Here we can see we've created a <rect> element and set its horizontal and vertical coordinates, its height and width, its fill colour, and its stroke. It's worth mentioning that the <rect> element is positioned from its top-left corner using the x and y attributes3, whereas the <circle> element is positioned from its centre using the cx and cy attributes2.
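To make the two positioning conventions concrete, here is a small sketch that converts between them. The helper names are hypothetical, and stroke width is ignored.

```javascript
// A <circle> is placed by its centre (cx, cy); a <rect> by its top-left corner (x, y).

// Bounding box of a circle, expressed in <rect>-style attributes.
function circleBoundingBox({ cx, cy, r }) {
  return { x: cx - r, y: cy - r, width: 2 * r, height: 2 * r };
}

// Centre of a rectangle, expressed in <circle>-style coordinates.
function rectCentre({ x, y, width, height }) {
  return { cx: x + width / 2, cy: y + height / 2 };
}

// The circle and rectangle from this section.
const circleBox = circleBoundingBox({ cx: 85, cy: 75, r: 50 });
const rectMiddle = rectCentre({ x: 165, y: 25, width: 100, height: 100 });
```

For the shapes in this section, the circle's bounding box and the rectangle both start at y = 25 and are 100 tall, so they line up vertically. The rectangle's centre is at (215, 75).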

Styling with CSS and Setting Styles

We can set CSS style properties by invoking .style(name, value) on the selection, where the argument to the first parameter should be the name of the CSS style property we want to set, and the argument to the second parameter should be the value we want to set it to.

As an example, let's set the background-image CSS property of our <svg> element to a linear gradient going from #420a91 to #40F99B.

svg
    .style("background-image",
           "linear-gradient(to right, #420a91, #40F99B)");
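One way to picture the difference between .attr() and .style() is to look at where each ends up in the markup. Every attribute becomes its own name="value" pair, whereas all style properties are collected into a single style attribute. The sketch below is a hypothetical helper that builds such markup as a string. It is not how D3 or the browser serialise elements, and it does no escaping.

```javascript
// Build an element's markup from separate attribute and style maps.
// Attributes become name="value" pairs; styles are joined into one style attribute.
function toMarkup(tag, attrs = {}, styles = {}) {
  const parts = Object.entries(attrs).map(([name, value]) => `${name}="${value}"`);
  const css = Object.entries(styles)
    .map(([name, value]) => `${name}: ${value};`)
    .join(" ");
  if (css) parts.push(`style="${css}"`);
  const attrText = parts.length ? " " + parts.join(" ") : "";
  return `<${tag}${attrText}></${tag}>`;
}

const circleMarkup = toMarkup("circle", { cx: 85, cy: 75, r: 50, fill: "#40F99B" });
const svgMarkup = toMarkup("svg", {}, {
  "background-image": "linear-gradient(to right, #420a91, #40F99B)",
});
```

A browser may also normalise values when it serialises them. This is why the conclusion below shows the gradient colours as rgb(...) rather than the hex codes we passed in.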

Appending to the Container

Finally, let's append everything to our container.

d3
    .select("#container")
    .append(() => svg.node());

We can see the output by checking on our container with the corresponding id, which in this case is where id=container.

Conclusion

If we inspect the HTML, we will see the <svg>, <circle>, and <rect> elements have been added to the <div> where the id=container. We can also see that the <svg> element's background has been set to a linear gradient using CSS, and the presentation of the <circle> and <rect> elements has been modified with several SVG attributes.

<div id="container">
  <svg style="background-image: linear-gradient(to right, rgb(66, 10, 145), rgb(64, 249, 155));">
    <circle cx="85" cy="75" r="50" fill="#40F99B"></circle>
    <rect x="165" y="25" height="100" width="100" fill="#420a91" 
          stroke="#FF00FF" stroke-width="4" stroke-dasharray="10,10"></rect>
  </svg>
</div>

In this section, we've demonstrated how to modify the attributes and styles of elements using D3.js.


  1. W3C. Appendix G: Attribute Index, https://www.w3.org/TR/SVG/attindex.html. 

  2. W3C. The Circle Element, https://www.w3.org/TR/SVG/shapes.html#CircleElement. 

  3. W3C. The Rect Element, https://www.w3.org/TR/SVG/shapes.html#RectElement. 

Selections and Selecting Elements

Highlight

Besides d3.create() and selection.append(), which return selections, we can use the d3.select() and d3.selectAll() functions to return selections by matching a CSS selector.

Preamble

Let's get access to the D3.js library so that we can begin. In this case, we'll be including the library using the HTML <script> tag.

<script src="https://d3js.org/d3.v6.js"></script>

Introduction

Selections are a D3.js concept that enables easy manipulation of the document object model (DOM), including attributes, styles, properties, HTML, and more. These manipulations can also be bound to data using data joins1.

In this section, we're going to look at a few D3.js functions that return selections.

A Container for the Output

This is where you will see the output of the code cells that follow it, provided they are referencing the corresponding id.

<div id="container"></div>

Creating an Empty SVG and Storing the Selection

In previous sections, we've created a new and detached <svg> element and then used the returned selection throughout the remainder of the section itself.

Let's do the same again but discuss a few things that are happening when this function is invoked.

const svg = d3.create("svg");

The create(name) function expects the name parameter, so to create our <svg> element we'll need to pass in "svg" as our argument. We can see something is being returned and stored in our variable, which we have also named svg. What's being returned is a single-element selection.

With this selection stored in svg we can start invoking the many supported functions that can manipulate the element itself.

For example, let's change the width of our <svg> element to 500. We can do this by using the selection stored in our svg variable, and invoking the .attr() function.

svg.attr("width", 500);

We'll come back to the .attr() function in more detail in future sections.

Appending an Element

The .append(name) function can be invoked on an existing selection to append a new element as the last child of each selected element.

For example, we could use our selection stored in the svg variable and append a <circle> element.

var circle = svg
    .append("circle")
    .attr("cx", 50)
    .attr("cy", 50)
    .attr("r", 50);

The .append() function returns a new selection containing the appended element. This is how we can start changing the <circle> element's attributes by chaining .attr() immediately afterwards. We've also stored the <circle> element's selection in our new variable, circle.
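To see why chaining works, it helps to know what each call returns. The toy class below is a hypothetical stand-in for a D3 selection, not D3's implementation. In it, .attr() returns the same selection, whereas .append() returns a new selection for the child it created.

```javascript
// A toy stand-in for a selection that wraps a single plain-object "element".
class ToySelection {
  constructor(element) {
    this.element = element;
  }

  append(name) {
    const child = { name, attrs: {}, children: [] };
    this.element.children.push(child); // appended as the last child
    return new ToySelection(child); // a new selection for the child
  }

  attr(name, value) {
    this.element.attrs[name] = value;
    return this; // the same selection, so calls can be chained
  }
}

const svgSel = new ToySelection({ name: "svg", attrs: {}, children: [] });
const circleSel = svgSel.append("circle").attr("cx", 50).attr("cy", 50).attr("r", 50);
```

Because each .attr() hands back the circle's selection, the value finally stored in circleSel is the circle's selection, not the SVG's.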

Selecting using CSS Selectors

Besides d3.create() and selection.append(), which return selections, we can use the d3.select() and d3.selectAll() functions to return selections by matching a CSS selector.

For example, we could select all <h2> elements in the current document with:

var allHeader2 = d3.selectAll("h2");

or we could select the element where id=container with:

var container = d3.select("#container");

As we know that we've just appended a single <circle> element as a child of our <svg> element, we can safely use .select("circle") to get its selection. Although we already have its selection stored in the circle variable, let's use the .select() function to demonstrate its usage, and change the circle's fill colour to purple.

svg
    .select("circle")
    .attr("fill", "#420a91");
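The difference between the two functions is that d3.select() keeps only the first matching element in document order, whereas d3.selectAll() keeps every match. The sketch below models that over a plain tree of objects. For simplicity it matches by tag name only, and the helper names are hypothetical.

```javascript
// Visit the descendants of `node` in document order (depth-first, pre-order).
// Like element.querySelector, only descendants are searched, not `node` itself.
function* walk(node) {
  for (const child of node.children) {
    yield child;
    yield* walk(child);
  }
}

// First match in document order, or null.
function selectFirst(root, tag) {
  for (const node of walk(root)) {
    if (node.name === tag) return node;
  }
  return null; // d3.select() would give an empty selection instead
}

// Every match, in document order.
function selectEvery(root, tag) {
  return [...walk(root)].filter((node) => node.name === tag);
}

// A hypothetical document tree.
const doc = {
  name: "body",
  children: [
    { name: "h2", children: [] },
    { name: "div", children: [{ name: "h2", children: [] }] },
    { name: "svg", children: [{ name: "circle", children: [] }] },
  ],
};
```

In this tree, selecting "h2" finds only the first heading, while selecting all "h2" elements finds both, including the one nested in the <div>.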

Appending to the Container

Finally, let's append everything to our container.

d3
    .select("#container")
    .append(() => svg.node());

We can see the output by checking on our container with the corresponding id, which in this case is where id=container.

Conclusion

If we inspect the HTML, we will see the <svg> and <circle> elements have been added to the <div> where the id=container. We can also see that the <svg> element's width attribute has been modified, as has the <circle> element's fill colour attribute.

<div id="container">
  <svg width="500">
    <circle cx="50" cy="50" r="50" fill="#420a91"></circle>
  </svg>
</div>

In this section, we've already demonstrated how useful selections can be, and we haven't even touched on the manipulation of styles, properties, HTML, etc. We also haven't mentioned the things you can do with selections and data joins. We'll cover these in the following sections.


  1. M. Bostock. Joining Data, https://github.com/d3/d3-selection#joining-data. 

Creating Shape Elements

Highlight

To create a <circle> element with D3.js we can invoke the append(name) function on our svg selection and pass in the name of the element. In this case, we're passing in circle as our argument for the name parameter.

Preamble

Let's get access to the D3.js library so that we can begin. In this case, we'll be including the library using the HTML <script> tag.

<script src="https://d3js.org/d3.v6.js"></script>

Introduction

The W3C SVG specification defines several basic shape elements such as rect, circle, and ellipse1. We can create these shape elements using SVG directly, for example with the following.

<svg>
  <circle cx="50" cy="50" r="50"></circle>
</svg>

But we can also create these with D3.js. In this section, we'll use D3.js to output a circle identical to the one above.

A Container for the Output

This is where you will see the output of the code cells that follow it, provided they are referencing the corresponding id.

<div id="container"></div>

Creating an Empty SVG

We'll create a new detached <svg> element and use the returned selection throughout the rest of this section.

const svg = d3.create("svg");

Creating a Circle Element

To create a <circle> element with D3.js we can invoke the .append(name) function on our svg selection and pass in the name of the element. In this case, we're passing in circle as our argument for the name parameter. We'll also specify the following geometry properties2:

  • the horizontal centre coordinate, cx;
  • the vertical centre coordinate, cy;
  • and the radius, r.
svg
    .append("circle")
    .attr("cx", 50)
    .attr("cy", 50)
    .attr("r", 50);

We'll come back to selections, .append(name), and .attr() in more detail in future sections.

We could do the same with other basic shape elements whilst specifying the appropriate geometry properties for each one. For example, the <rect> element's geometry properties include the horizontal radius, rx, and the vertical radius, ry. These two properties can be used to round off corners, i.e. creating a rectangle with rounded corners. We can check this by using SVG directly:

<svg>
  <rect x="100" y="25" width="100" height="100" rx="20" ry="20"></rect>
</svg>
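When rx or ry is left out, or is too large, the browser works out the corner radii that are actually used. The sketch below follows the steps listed for <rect> in the SVG 1.1 specification; later versions phrase this in terms of auto values. The helper name is hypothetical.

```javascript
// Effective corner radii for a <rect>: a missing radius copies the other one,
// then each radius is clamped to half the width or height.
// `undefined` stands for "not specified".
function effectiveCornerRadii({ width, height, rx, ry }) {
  if (rx === undefined && ry === undefined) return { rx: 0, ry: 0 };
  let effRx = rx === undefined ? ry : rx;
  let effRy = ry === undefined ? rx : ry;
  effRx = Math.min(effRx, width / 2);
  effRy = Math.min(effRy, height / 2);
  return { rx: effRx, ry: effRy };
}

const fromExample = effectiveCornerRadii({ width: 100, height: 100, rx: 20, ry: 20 });
const onlyRx = effectiveCornerRadii({ width: 100, height: 100, rx: 30 });
const tooLarge = effectiveCornerRadii({ width: 100, height: 40, rx: 80 });
```

Setting rx and ry to half the width and height, e.g. 50 for a 100 by 100 square, would turn the rectangle into a circle.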

Appending to the Container

Finally, let's append everything to our container.

d3
    .select("#container")
    .append(() => svg.node());

We can see the output by checking on our container with the corresponding id, which in this case is where id=container.

Conclusion

If we inspect the HTML, we'll see the <svg> and <circle> elements have been added to the <div> where the id=container.

<div id="container">
  <svg>
    <circle cx="50" cy="50" r="50"></circle>
  </svg>
</div>

With this, we're now able to add basic SVG shape elements to the DOM using D3.js.


  1. W3C. Basic Shapes, https://www.w3.org/TR/SVG/shapes.html. 

  2. W3C. Geometry Properties, https://www.w3.org/TR/2018/CR-SVG2-20181004/geometry.html. 

Creating an Empty SVG

Highlight

To create an element with D3.js we invoke the create(name) function and pass in the name of the element. In this case, we're passing in svg as our argument for the name parameter.

Preamble

Let's get access to the D3.js library so that we can begin. In this case, we'll be including the library using the HTML <script> tag.

<script src="https://d3js.org/d3.v6.js"></script>

Introduction

Now that we have access to the D3.js library, the very first thing we want to do is create our <svg> element. Typically we'd expect a "hello world" example as the first section, but this will be more similar to what I remember of my first OpenGL lesson which was focussed on creating a black window with nothing in it, shortly followed by a section on how to display a cube!

Scalable Vector Graphics

Let's quickly remind ourselves of what a Scalable Vector Graphic (SVG) is. It's an XML-based markup language for describing two-dimensional graphics1. It targets the web, is supported by all modern web browsers, and is right at home within the DOM amongst HTML, inside the <svg> element.

Because SVG is a vector format, graphics can be resized without any loss in quality, unlike raster graphics (e.g. JPEG, PNG, and GIF). This also means that the file size remains identical whether you display your SVG at 200px or 2000px. SVG also supports interactivity, transparency, and animation.

Whilst we can create SVG visualisations for the browser by writing the markup directly, we're instead going to be using D3.js to manipulate the markup for us.

A Container for the Output

Throughout this book, we'll be creating many visualisations, and we need somewhere in the DOM to output them. In the cell below, we've created the container element where we plan to append our SVG element. This is where you will see the output of the code cells that follow it, provided they are referencing the corresponding id. This means that what you see below is a spoiler: it's this section's finished visualisation!

Note

You will see one or many of these containers in each section. For clarity and structure, they will be outlined with a dotted border.

Don't forget: the containers will show the finished output of the entire section, even if they appear earlier on!

This one will look empty because all we're outputting is an empty <svg> element.

<div id="container"></div>

Creating an Empty SVG

To create an element with D3.js we invoke the create(name) function and pass in the name of the element. In this case, we're passing in svg as our argument for the name parameter.

Our new <svg> element is currently detached from the DOM, meaning it hasn't been added to our HTML yet. We can still manipulate it, as the create() function returns what's referred to as a selection, and we're storing that in our svg variable. We'll discuss selections in the next section.

const svg = d3.create("svg");

Appending to the Container

We don't want to do anything fancy just yet, so let's just get our <svg> element into the DOM. We'll do this using the append() function.

d3.select("#container").append(() => svg.node());

Here we can see that we've selected our container from earlier with id=container. This returns a selection, on which we've invoked append(). We pass in a function that returns our <svg> element's underlying DOM node, svg.node().

Conclusion

If we inspect the HTML, we'll see the <svg> element has been added to the <div> where the id=container. With this, we're ready to do more interesting things with D3.js and SVG!

<div id="container">
  <svg></svg>
</div>

  1. W3C. Scalable Vector Graphics (SVG), https://www.w3.org/Graphics/SVG/. 

Preface

Highlight

There is a wealth of cookbook-style resources available for D3.js visualisations, meaning you can create some interesting visualisations by copying some code and passing in your data. However, what this book aims to be is a practical journey through the many components of D3.js. By the end of this book, we want to be able to create new visualisations from the ground up and modify the behaviour of existing ones.

Preface

D3.js (Data-Driven Documents) is a JavaScript library for manipulating documents based on data1. On its own, D3.js is powerful enough to let us achieve almost anything with regards to visualisation. However, people often find D3.js either too hard to use or find that it requires too many lines of code to produce something simple, e.g. a bar chart.

Using the bar chart example, we can find two D3.js code examples with the search term "d3js bar chart". The first hit is named "Simple d3.js bar chart" and dated 12-May-2020. Data aside, the full document (including HTML and JavaScript) comes in at 98 lines, and it produces the following:

Another example2 is from the D3.js gallery. It's named "Bar Chart" and dated 15-Nov-2017, and it comes in at just over 50 lines excluding HTML. It's hosted on Observable (a platform for exploring and visualising data), so those expecting to copy and paste code to get something running locally may struggle. It produces the following:

The same people may use examples from other libraries to support their claim. Let's look at bar charts with Plotly3. There, data aside, the example demonstrates the output of a bar chart with a single line of JavaScript. Opening the example in CodePen to include the HTML brings it to fewer than 10 lines, and it produces the following:

Data aside, with only a few more lines of code Plotly can produce the following:

Even with this limited example, it's easy to see where people are coming from when they complain about D3.js and its difficulty.

So why would anyone want to waste their time with D3.js? The answer depends on the context. If all you want to do is create a pie, line, bar, or area chart then it's likely D3.js is overkill - especially if you don't already know how to use it. It will take you far longer to create your charts with D3.js, and you'd likely have a better time using something like Plotly, matplotlib, Highcharts, Google chart tools, Chart.js, Tableau, Microsoft Excel, and so on. Some of these libraries are even built on top of D3.js, e.g. Plotly which has the following introductory sentence in its bar chart documentation "How to make a D3.js-based bar chart in javascript."3.

But what happens when we want to create something that's more than just a {simple|basic|standard|normal} data visualisation? Perhaps we need:

  • To work with HTML, SVG, and CSS.
  • Full control over the aesthetics of visualisations, not just changing colours, font-size, etc.
  • Full control over the behaviour of visualisations, enabling more than just mouse over tooltips and brushing.
  • To create an entirely new and unique type of visualisation.

In which case, D3.js will often be the most feasible option. D3.js sells itself as a library for manipulating documents based on data, and whilst it does have helpful components that make charting possible, it can be used for much more.

Let's take a look at some examples of what we can do with D3.js that we can't do with the aforementioned alternatives.

In the first example made with D3.js, we can see an animation of a colour-changing circle that moves up a sloped path and then falls back down with a bounce.

It's a fairly simple example and one that we'll look at closely later in the book, but already we can see that we've created something that we couldn't have if we were to use one of the alternatives. There is of course the point that it isn't a data visualisation - but perhaps it could be!

In the second example made with D3.js, we have an interactive chord diagram illustrating the relationships between two features, complete with rich popups on mouse hover.

Chord Diagram

With these examples, it's easy to see what we can achieve with D3.js and where it can add value.

There is a wealth of cookbook-style resources available for D3.js visualisations, so you can create some interesting visualisations by copying some code and passing in your data. This book, however, aims to be a practical journey through the many components of D3.js. Many sections will build on the last, and new component features will be introduced in small increments rather than buried amongst lots of other newly introduced code. By the end of this book, we want to be able to create new visualisations from the ground up and modify the behaviour of existing ones.

Note

I aim to generate everything in this book through code. This means you will see the code for all my figures and tables, including things like flowcharts. Many of the visualisations are animated and interactive, so it's recommended that you generate the output using the code listings.

This book is currently available in early access form. It is being actively worked on and updated.

Every section is intended to be independent, so you will find some repetition as you progress from one section to another.


  1. M. Bostock. Data-Driven Documents, https://d3js.org/. 

  2. M. Bostock. Bar Chart, https://observablehq.com/@d3/bar-chart. 

  3. Plotly. Bar Charts in JavaScript, https://plotly.com/javascript/bar-charts/. 

Theme Purple Please for Jupyter Lab

Introduction

I put together this theme, theme-purple-please, for when I'm working with Python and Rust in Jupyter Lab. It currently supports Jupyter Lab 1 and 2.

Figure 1 - A Jupyter Notebook being edited within Jupyter Lab.
Theme from https://github.com/shahinrostami/theme-purple-please

You may have also seen it used in screenshots from the following books:

Installation through Jupyter Lab

You can install it through the Jupyter Lab Extension Manager UI, or with the following command:

jupyter labextension install @shahinrostami/theme-purple-please

GitHub Repository

You can navigate and download the source code at https://github.com/shahinrostami/theme-purple-please.

npm Package

The theme comes as an npm package at https://www.npmjs.com/package/@shahinrostami/theme-purple-please, where you can check out usage statistics and dependencies.

Box Plots at the Olympics

Preamble

In [2]:
:dep darn = {version = "0.1.15"}
:dep ndarray = {version = "0.13.1"}
:dep itertools = {version = "0.9.0"}
:dep plotly = {version = "0.4.0"}
extern crate ndarray;

use ndarray::prelude::*;
use std::str::FromStr;
use itertools::Itertools;
use plotly::{Plot, Layout, BoxPlot};
use plotly::common::{Title, Font};
use plotly::layout::{Margin, Axis};

Introduction

In this section, we're going to use 120 years of Olympic history to create two visualisations. Let's set our sights on something that illustrates the age and height of athletes grouped by the different Olympic games.


The Dataset

We'll use the 120 years of Olympic history: athletes and results dataset, which we'll download and load with the darn crate. You're also welcome to use the mirror that is used in the following cell.

In [3]:
let data = darn::read_csv("https://shahinrostami.com/datasets/athlete_events_known_age.csv");

We'll take a peek at what we've downloaded to make sure there were no issues with the loading.

In [4]:
darn::show_frame(&data.0, Some(&data.1));
Out[4]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
"1" "A Dijiang" "M" "24" "180" "80" "China" "CHN" "1992 Summer" "1992" "Summer" "Barcelona" "Basketball" "Basketball Men\'s Basketball" "NA"
"2" "A Lamusi" "M" "23" "170" "60" "China" "CHN" "2012 Summer" "2012" "Summer" "London" "Judo" "Judo Men\'s Extra-Lightweight" "NA"
"5" "Christine Jacoba Aaftink" "F" "21" "185" "82" "Netherlands" "NED" "1988 Winter" "1988" "Winter" "Calgary" "Speed Skating" "Speed Skating Women\'s 500 metres" "NA"
"5" "Christine Jacoba Aaftink" "F" "21" "185" "82" "Netherlands" "NED" "1988 Winter" "1988" "Winter" "Calgary" "Speed Skating" "Speed Skating Women\'s 1,000 metres" "NA"
"5" "Christine Jacoba Aaftink" "F" "25" "185" "82" "Netherlands" "NED" "1992 Winter" "1992" "Winter" "Albertville" "Speed Skating" "Speed Skating Women\'s 500 metres" "NA"
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
"135569" "Andrzej ya" "M" "29" "179" "89" "Poland-1" "POL" "1976 Winter" "1976" "Winter" "Innsbruck" "Luge" "Luge Mixed (Men)\'s Doubles" "NA"
"135570" "Piotr ya" "M" "27" "176" "59" "Poland" "POL" "2014 Winter" "2014" "Winter" "Sochi" "Ski Jumping" "Ski Jumping Men\'s Large Hill, Individual" "NA"
"135570" "Piotr ya" "M" "27" "176" "59" "Poland" "POL" "2014 Winter" "2014" "Winter" "Sochi" "Ski Jumping" "Ski Jumping Men\'s Large Hill, Team" "NA"
"135571" "Tomasz Ireneusz ya" "M" "30" "185" "96" "Poland" "POL" "1998 Winter" "1998" "Winter" "Nagano" "Bobsleigh" "Bobsleigh Men\'s Four" "NA"
"135571" "Tomasz Ireneusz ya" "M" "34" "185" "96" "Poland" "POL" "2002 Winter" "2002" "Winter" "Salt Lake City" "Bobsleigh" "Bobsleigh Men\'s Four" "NA"

It looks like the data was loaded without any issues.

Data Wrangling

Let's assign the feature data to games and feature names to headers for readability.

In [5]:
let games = data.0;
let headers = data.1;

A quick look at the available features will give us the feature names we're after for the age and height of athletes.

In [6]:
println!("{}", &headers.iter().format("\n"));
ID
Name
Sex
Age
Height
Weight
Team
NOC
Games
Year
Season
City
Sport
Event
Medal

We've confirmed that the two features we're after are named Age and Height, and that they're at indices $3$ and $4$. However, it would be better to determine these indices programmatically than to hard-code them.

In [7]:
let idx_age = headers.iter().position(|x| x == "Age").unwrap();
let idx_height = headers.iter().position(|x| x == "Height").unwrap();
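Each `position(...).unwrap()` call above panics without saying which column was missing. As a sketch that is not part of the original notebook and uses only the standard library, a small helper could look up several names at once and return an error naming the first absent column. `column_indices` is a hypothetical name.

```rust
/// Hypothetical helper: find the index of each requested column name,
/// returning an error that names the first column that is missing.
fn column_indices(headers: &[String], names: &[&str]) -> Result<Vec<usize>, String> {
    names
        .iter()
        .map(|&name| {
            headers
                .iter()
                .position(|h| h == name)
                .ok_or_else(|| format!("missing column: {}", name))
        })
        // Collecting into Result stops at the first Err.
        .collect()
}

fn main() {
    let headers: Vec<String> = ["ID", "Name", "Sex", "Age", "Height"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    match column_indices(&headers, &["Age", "Height"]) {
        Ok(idx) => println!("{:?}", idx), // prints [3, 4]
        Err(e) => eprintln!("{}", e),
    }
}
```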

Let's create an array of these indices and print them out to check.

In [8]:
let selected_features = [idx_age,idx_height];

println!("{}",selected_features.iter().format("\n"));
3
4

Now that we know the index of our age and height columns, let's prepare two collection variables, one named features to hold the numeric feature data, and one named feature_headers to hold the corresponding column names.

In [9]:
let mut features: Array2::<f32> =  Array2::<f32>::zeros((games.shape()[0],0));
let mut feature_headers = Vec::<String>::new();

Now, we can copy and parse our feature data into the initialised collections.

In [10]:
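for &feature_index in selected_features.iter() {
    feature_headers.push(headers[feature_index].clone());
    // The preamble also imports plotly::layout::Axis, and an explicit import
    // shadows ndarray's prelude glob, so ndarray's Axis is named in full here.
    features = ndarray::stack![ndarray::Axis(1), features,
        games.column(feature_index)
            .mapv(|elem| elem.parse::<f32>().unwrap())
            .insert_axis(ndarray::Axis(1))
    ];
};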
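Each `parse::<f32>().unwrap()` above panics on the first value that isn't a valid number, such as an `NA` placeholder like the ones in the Medal column. The sketch below is hedged: plain `Vec`s stand in for the ndarray arrays, only the standard library is used, and each row is assumed to have at least `col + 1` fields. It shows how parsing could report the offending row instead of panicking. `parse_column` is a hypothetical helper.

```rust
/// Hypothetical helper: parse one column of string rows into f32 values,
/// reporting the first row whose value cannot be parsed.
fn parse_column(rows: &[Vec<String>], col: usize) -> Result<Vec<f32>, String> {
    rows.iter()
        .enumerate()
        .map(|(i, row)| {
            row[col]
                .trim()
                .parse::<f32>()
                .map_err(|e| format!("row {}: {:?} is not a number ({})", i, row[col], e))
        })
        .collect()
}

fn main() {
    let rows = vec![
        vec!["24".to_string(), "180".to_string()],
        vec!["23".to_string(), "NA".to_string()],
    ];
    println!("{:?}", parse_column(&rows, 0)); // Ok([24.0, 23.0])
    println!("{:?}", parse_column(&rows, 1)); // an Err naming row 1
}
```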

We'll take a peek to make sure there were no obvious issues with parsing.

In [11]:
darn::show_frame(&features, Some(&feature_headers));
Out[11]:
Age Height
24.0 180.0
23.0 170.0
21.0 185.0
21.0 185.0
25.0 185.0
... ...
29.0 179.0
27.0 176.0
27.0 176.0
30.0 185.0
34.0 185.0

Looking good. Next, we'll need to determine the different games (the values of the Sport column) available in our dataset. We'll use these to group the age and height data.

In [12]:
let idx_sport = headers.iter().position(|x| x == "Sport").unwrap();
let unique_games = games.column(idx_sport).iter().cloned().unique().collect_vec();

println!("{}",unique_games.iter().format(", "));
Basketball, Judo, Speed Skating, Cross Country Skiing, Athletics, Ice Hockey, Badminton, Sailing, Biathlon, Gymnastics, Alpine Skiing, Handball, Weightlifting, Wrestling, Luge, Rowing, Bobsleigh, Swimming, Football, Equestrianism, Shooting, Taekwondo, Boxing, Fencing, Diving, Canoeing, Water Polo, Tennis, Cycling, Hockey, Figure Skating, Softball, Archery, Volleyball, Synchronized Swimming, Modern Pentathlon, Table Tennis, Nordic Combined, Baseball, Rhythmic Gymnastics, Freestyle Skiing, Rugby Sevens, Trampolining, Beach Volleyball, Triathlon, Ski Jumping, Curling, Golf, Snowboarding, Short Track Speed Skating, Skeleton, Rugby, Art Competitions, Tug-Of-War

We now have the unique list of Olympic games - some of which you may not even have heard of!
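The list above follows the order of the dataset, not the alphabet. That is because itertools' `unique()` keeps the first occurrence of each value as it is encountered and remembers what it has already seen in a hash set. A std-only sketch of the same idea follows. `unique_in_order` is a hypothetical name, and the sketch is not the itertools implementation itself.

```rust
use std::collections::HashSet;

/// Hypothetical helper: keep the first occurrence of each value, in order.
fn unique_in_order(values: &[String]) -> Vec<String> {
    // `insert` returns false for a value already seen, so `filter` drops it.
    let mut seen = HashSet::new();
    values.iter().filter(|v| seen.insert(*v)).cloned().collect()
}

fn main() {
    let sports: Vec<String> = ["Basketball", "Judo", "Speed Skating", "Speed Skating", "Basketball"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    println!("{:?}", unique_in_order(&sports)); // ["Basketball", "Judo", "Speed Skating"]
}
```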

Visualising the Data

Now that we have prepared our data, let's put our hard work to use in a test box plot.

Height of Athletes in Basketball

Let's see if we can create a box plot for the height of athletes in Basketball. To do so, we're going to build a list of row indices that correspond to Basketball data.

In [13]:
// Collect the row index of every entry whose Sport is "Basketball".
let indices: Vec<usize> = games
    .column(idx_sport)
    .iter()
    .enumerate()
    .filter(|&(_, sport)| sport == "Basketball")
    .map(|(row, _)| row)
    .collect();

Then, we'll use these indices to select from our feature data.

In [14]:
let basketball = features.select(ndarray::Axis(0), &indices);
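`select` along axis 0 returns a new, owned array whose rows are copied from `features` in the order given by `indices`. The plain-Rust sketch below mirrors that gather. It assumes each row is an `[age, height]` pair stored in a slice, and `select_rows` is a hypothetical name. It only illustrates the idea and does not reproduce ndarray's implementation.

```rust
/// Hypothetical helper: copy out the rows named by `indices`, in that order.
fn select_rows(rows: &[[f32; 2]], indices: &[usize]) -> Vec<[f32; 2]> {
    // [f32; 2] is Copy, so indexing copies each selected row.
    indices.iter().map(|&i| rows[i]).collect()
}

fn main() {
    // [age, height] pairs taken from the first rows shown in Out[11].
    let rows: [[f32; 2]; 3] = [[24.0, 180.0], [23.0, 170.0], [21.0, 185.0]];
    println!("{:?}", select_rows(&rows, &[0, 2])); // [[24.0, 180.0], [21.0, 185.0]]
}
```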

We'll take a peek to make sure there were no obvious issues with the selection.

In [15]:
darn::show_frame(&basketball, Some(&feature_headers));
Out[15]:
Age Height
24.0 180.0
19.0 185.0
29.0 195.0
25.0 189.0
23.0 178.0
... ...
30.0 218.0
20.0 201.0
28.0 201.0
23.0 202.0
33.0 171.0

Finally, we'll create a box plot of just the height of the Basketball athletes.

In [16]:
let mut plot = Plot::new();

let trace = BoxPlot::new(basketball.column(1).to_vec()).name("Basketball");

plot.add_trace(trace);

darn::show_plot(plot);
Out[16]:

Looking good.
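A box plot summarises a group by its median, its first and third quartiles (the box), and its whiskers. The hedged sketch below shows that arithmetic. It computes quartiles by linear interpolation between sorted values, which is one common convention (NumPy's default percentile also uses it). It adds the 1.5 × IQR outlier fences of Tukey's convention. Plotly's own quartile method and whisker rule may differ, so treat this as illustrative, not a reproduction of the plot above. `percentile` and `quartiles` are hypothetical names.

```rust
/// Hypothetical helper: value at fraction `p` (0.0..=1.0) of a sorted,
/// non-empty slice, linearly interpolating between neighbouring values.
fn percentile(sorted: &[f32], p: f32) -> f32 {
    let pos = p * (sorted.len() - 1) as f32;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo as f32)
}

/// Hypothetical helper: (q1, median, q3) of a non-empty slice without NaNs.
fn quartiles(values: &[f32]) -> (f32, f32, f32) {
    let mut sorted = values.to_vec();
    // unwrap is safe only under the no-NaN assumption above.
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    (
        percentile(&sorted, 0.25),
        percentile(&sorted, 0.5),
        percentile(&sorted, 0.75),
    )
}

fn main() {
    // The first five Basketball heights shown in Out[15].
    let heights: [f32; 5] = [180.0, 185.0, 195.0, 189.0, 178.0];
    let (q1, median, q3) = quartiles(&heights);
    let iqr = q3 - q1;
    // Tukey's convention: points beyond these fences are drawn as outliers.
    println!("q1={} median={} q3={} fences=({}, {})",
             q1, median, q3, q1 - 1.5 * iqr, q3 + 1.5 * iqr);
}
```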

Athlete Height Grouped by Olympic Games

Now let's repeat what we've just done for Basketball for every game in our dataset, adding one box plot trace per game.

In [42]:
let mut plot = Plot::new();
let layout = Layout::new()
    .title(Title::new("Athlete height grouped by Olympic games."))
    .margin(Margin::new().left(30).right(0).bottom(140).top(40))
    .xaxis(Axis::new().show_grid(true).tick_font(Font::new().size(10)))
    .show_legend(false);

plot.set_layout(layout);

for name in unique_games.iter() {
    // Collect the row index of every entry belonging to this game.
    let indices: Vec<usize> = games
        .column(idx_sport)
        .iter()
        .enumerate()
        .filter(|&(_, sport)| sport == name)
        .map(|(row, _)| row)
        .collect();

    // Select those rows and add a box plot of their heights (column 1).
    let game = features.select(ndarray::Axis(0), &indices);
    let trace = BoxPlot::new(game.column(1).to_vec()).name(name);
    plot.add_trace(trace);
};

darn::show_plot(plot);
Out[42]:
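The loop above rescans the whole Sport column once per game, so its work grows with the number of rows times the number of games. The sketch below is an alternative, not the author's approach. It groups the values in a single pass, using a `HashMap` from game name to group position, and keeps the groups in first-appearance order like `unique_games`. `group_by_key` is a hypothetical helper. Under these assumptions, each resulting `Vec<f32>` could then be handed to `BoxPlot::new` as above.

```rust
use std::collections::HashMap;

/// Hypothetical helper: group `values` by the matching entry in `keys`,
/// returning groups in the order each key first appears.
fn group_by_key(keys: &[&str], values: &[f32]) -> Vec<(String, Vec<f32>)> {
    assert_eq!(keys.len(), values.len(), "keys and values must align row by row");
    let mut position: HashMap<&str, usize> = HashMap::new();
    let mut groups: Vec<(String, Vec<f32>)> = Vec::new();
    for (&key, &value) in keys.iter().zip(values) {
        // Create a new group the first time a key is seen.
        let idx = *position.entry(key).or_insert_with(|| {
            groups.push((key.to_string(), Vec::new()));
            groups.len() - 1
        });
        groups[idx].1.push(value);
    }
    groups
}

fn main() {
    // Sport and height of the first four rows shown in Out[4].
    let sports = ["Basketball", "Judo", "Speed Skating", "Speed Skating"];
    let heights: [f32; 4] = [180.0, 170.0, 185.0, 185.0];
    for (name, hs) in group_by_key(&sports, &heights) {
        println!("{}: {:?}", name, hs);
    }
}
```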