Data Analysis with Rust Notebooks
A practical book on Data Analysis with Rust Notebooks that teaches you the concepts and how they’re implemented in practice.
Get the book
Visualisation of Co-occurring Types
Preamble¶
:dep darn = {version = "0.3.0"}
:dep ndarray = {version = "0.13.1"}
:dep itertools = {version = "0.9.0"}
:dep chord = {version = "0.1.6"}
extern crate ndarray;
use ndarray::prelude::*;
use itertools::Itertools;
use chord::{Chord, Plot};
Introduction¶
In this section, we're going to use the Complete Pokemon Dataset to visualise the co-occurrence of Pokémon types from generations one to eight. We'll do this with a chord diagram.
Chord Diagrams¶
In a chord diagram (or radial network), entities are arranged radially as segments, with their relationships visualised by arcs that connect them. The size of each segment illustrates its numerical proportion, whilst the size of each arc illustrates the significance of the relationship [1].
Chord diagrams can be useful when trying to convey relationships between different entities, and they can be beautiful and eye-catching. They can get messy when considering many entities, so it's often beneficial to make them interactive and explorable.
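To make the idea of segment proportions concrete, here is a minimal sketch that assumes each segment's share of the circle is its row total divided by the grand total of the matrix, ignoring any padding between segments. The function name segment_fractions is hypothetical and is not part of the chord crate.

```rust
/// Fraction of the circle each segment would occupy, assuming a segment's
/// size is its row total divided by the grand total (padding ignored).
fn segment_fractions(matrix: &[Vec<f64>]) -> Vec<f64> {
    // Sum each row: the total weight of relationships for that entity.
    let row_totals: Vec<f64> = matrix.iter().map(|row| row.iter().sum()).collect();
    let grand_total: f64 = row_totals.iter().sum();

    // Avoid dividing by zero when the matrix holds no relationships at all.
    if grand_total == 0.0 {
        return vec![0.0; matrix.len()];
    }
    row_totals.iter().map(|total| total / grand_total).collect()
}
```

For a matrix whose rows sum to 3, 1 and 2, the first segment would take up half of the circle under this assumption.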
The Chord Crate¶
I wasn't able to find any Rust crates for plotting chord diagrams, so I ported my own (based on d3-chord) from Python to Rust.
You can get the crate either from crates.io or from the GitHub repository. With your processed data, you should be able to plot something beautiful with just a single line, Chord { matrix: matrix, names: names, ..Chord::default() }.show(). To enable the pro features of the chord crate, check out Chord Pro.
The Dataset¶
The dataset documentation states that we can expect two type variables, type_1 and type_2, for each of the 1028 samples from the first eight generations.
Let's download the mirrored dataset and have a look for ourselves.
let data = darn::read_csv("https://datacrayon.com/datasets/pokemon_gen_1_to_8.csv");
darn::show_frame(&data.0, Some(&data.1));
It looks good so far: we can clearly see the two type columns. Let's confirm that we have 1028 samples.
&data.0.shape()
Perfect, that's exactly what we were expecting.
Data Wrangling¶
We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the Pokémon types are split between the columns type_1 and type_2.
&data.1
So let's select just these two columns (indices 9 and 10) and work with an array containing only them as we move forward.
let types = data.0.slice(s![.., 9..11]).into_owned();
darn::show_frame(&types, None);
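The slice above hard-codes the column positions. As a hedged alternative, the positions could be looked up from the header row by name. The sketch below uses only the standard library and assumes the headers are available as a slice of strings; column_index is a hypothetical helper, not part of darn or ndarray.

```rust
/// Position of the column called `name` in a header row, if it is present.
fn column_index(headers: &[String], name: &str) -> Option<usize> {
    headers.iter().position(|header| header.as_str() == name)
}
```

If data.1 is a vector of strings (an assumption about darn's return type), column_index(&data.1, "type_1") would be expected to return the start of the range used in the slice.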
Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.
First, we'll populate our list of type names by looking for the unique ones.
let mut names = types.iter().cloned().unique().collect_vec();
names
Let's sort this alphabetically.
names.sort();
names
We'll also remove the empty string that has appeared as a result of samples with only one type. Because the list is sorted, the empty string is the first element.
names.remove(0);
names
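The three steps above (unique, sort, drop the empty string) can also be sketched in one pass with the standard library. This is an alternative under stated assumptions, not the approach used in the notebook: it filters empty strings explicitly rather than relying on their sorted position, and unique_sorted_names is a hypothetical helper name.

```rust
use std::collections::BTreeSet;

/// Unique, alphabetically sorted, non-empty values from `values`.
fn unique_sorted_names(values: &[String]) -> Vec<String> {
    values
        .iter()
        // Drop the empty strings left by samples with only one type.
        .filter(|value| !value.is_empty())
        .cloned()
        // A BTreeSet removes duplicates and keeps its items in sorted order.
        .collect::<BTreeSet<String>>()
        .into_iter()
        .collect()
}
```

Filtering by content means the result does not depend on the empty string happening to sort first.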
Now we can create our empty co-occurrence matrix with a shape that can hold co-occurrences between our types.
let type_count = names.len();
let mut matrix: Vec<Vec<f64>> = vec![vec![Default::default(); type_count]; type_count];
matrix
We can populate the co-occurrence matrix with the following approach. We loop through every sample in our dataset and, when both types are present, increment the corresponding matrix entry by one, using the positions of type_1 and type_2 in the names vector as indices. To make sure the matrix is symmetric, we also do the same in reverse, i.e. type_2 and type_1.
for item in types.genrows() {
    // Skip samples that only have a single type.
    if !item[0].is_empty() && !item[1].is_empty() {
        // Look up each type's index in the sorted names vector.
        let i = names.iter().position(|s| s == &item[0]).unwrap();
        let j = names.iter().position(|s| s == &item[1]).unwrap();
        // Count the pair in both directions so the matrix stays symmetric.
        matrix[j][i] += 1.0;
        matrix[i][j] += 1.0;
    }
}
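The same counting logic can be sketched as a standalone function that uses only the standard library. This version assumes the type pairs have been collected into a slice of tuples, and it builds a name-to-index map once instead of searching the names vector for every sample. The function name cooccurrence and the tuple representation are illustrative choices, not part of the notebook.

```rust
use std::collections::HashMap;

/// Builds a symmetric co-occurrence matrix from (type_1, type_2) pairs.
/// Pairs containing an empty or unknown name are skipped.
fn cooccurrence(names: &[String], pairs: &[(String, String)]) -> Vec<Vec<f64>> {
    // Map each name to its row/column index once, up front.
    let index: HashMap<&str, usize> = names
        .iter()
        .enumerate()
        .map(|(i, name)| (name.as_str(), i))
        .collect();

    let mut matrix = vec![vec![0.0; names.len()]; names.len()];
    for (first, second) in pairs {
        // Empty strings are not in `names`, so single-type samples fall through here.
        if let (Some(&i), Some(&j)) = (index.get(first.as_str()), index.get(second.as_str())) {
            // Count the pair in both directions to keep the matrix symmetric.
            matrix[i][j] += 1.0;
            matrix[j][i] += 1.0;
        }
    }
    matrix
}
```

Skipping unknown names instead of calling unwrap trades a panic for silently ignored rows, which may or may not be what you want when validating a dataset.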
Chord Diagram¶
Time to visualise the co-occurrence of types using a chord diagram. We are going to use a list of custom colours that represent the types, one per type and in the same alphabetical order as the names vector.
let colors: Vec<String> = vec![
    "#A6B91A", "#705746", "#6F35FC", "#F7D02C", "#D685AD",
    "#C22E28", "#EE8130", "#A98FF3", "#735797", "#7AC74C",
    "#E2BF65", "#96D9D6", "#A8A77A", "#A33EA1", "#F95587",
    "#B6A136", "#B7B7CE", "#6390F0",
]
.into_iter()
.map(String::from)
.collect();
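Since the colours are matched to the names by position, it can be worth checking that the two lists line up before plotting. The following is a minimal sketch using only the standard library. The helper pair_colours is hypothetical, and the #RRGGBB check is an assumption about the colour format used here rather than a requirement documented by the chord crate.

```rust
/// Pairs each name with the colour at the same position.
/// Returns None if the lengths differ or a colour is not of the form #RRGGBB.
fn pair_colours(names: &[String], colours: &[String]) -> Option<Vec<(String, String)>> {
    // A colour is accepted if it is '#' followed by exactly six hex digits.
    let is_hex = |c: &str| {
        c.len() == 7 && c.starts_with('#') && c[1..].chars().all(|ch| ch.is_ascii_hexdigit())
    };
    if names.len() != colours.len() || !colours.iter().all(|c| is_hex(c.as_str())) {
        return None;
    }
    Some(names.iter().cloned().zip(colours.iter().cloned()).collect())
}
```

Printing the returned pairs is a quick way to confirm by eye that each type has ended up with the intended colour.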
Finally, we can put it all together.
Chord {
    matrix: matrix.clone(),
    names: names.clone(),
    colors: colors,
    margin: 30.0,
    wrap_labels: true,
    ..Chord::default()
}
.show();
Conclusion¶
In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!