## Data Analysis with Rust Notebooks

A practical book on Data Analysis with Rust Notebooks that teaches you the concepts and how they’re implemented in practice.

# Visualisation of Co-occurring Types

## Preamble

```rust
:dep darn = {version = "0.3.0"}
:dep ndarray = {version = "0.13.1"}
:dep itertools = {version = "0.9.0"}
:dep chord = {version = "0.1.6"}
extern crate ndarray;

use ndarray::prelude::*;
use itertools::Itertools;
use chord::{Chord, Plot};
```

## Introduction

In this section, we're going to use the Complete Pokemon Dataset to visualise the co-occurrence of Pokémon types from generations one to eight. We'll do this using a chord diagram.

### Chord Diagrams

In a chord diagram (or radial network), entities are arranged radially as segments, with their relationships visualised by arcs that connect them. The size of each segment illustrates its numerical proportion, whilst the size of each arc illustrates the significance of the relationship [1].
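To make the segment sizes more concrete, here is a minimal sketch of how a d3-chord-style layout can divide the circle. It assumes each segment's angular size is proportional to its row total in the matrix, with a fixed padding angle between segments. The `Group` type and `layout_groups` function are illustrative names, not the chord crate's API, and the arcs themselves are not computed here.

```rust
use std::f64::consts::TAU;

/// Start and end angle (in radians) of one segment of the circle.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Group {
    start: f64,
    end: f64,
}

/// Lay out one segment per row of `matrix`: each segment's angular size is
/// proportional to its row total, with `pad` radians after every segment.
fn layout_groups(matrix: &[Vec<f64>], pad: f64) -> Vec<Group> {
    let n = matrix.len();
    let row_sums: Vec<f64> = matrix.iter().map(|row| row.iter().sum()).collect();
    let total: f64 = row_sums.iter().sum();
    // Radians available per unit of value once the padding is taken out.
    let k = if total > 0.0 {
        (TAU - pad * n as f64).max(0.0) / total
    } else {
        0.0
    };
    let mut angle = 0.0;
    row_sums
        .iter()
        .map(|&sum| {
            let start = angle;
            angle += sum * k;
            let group = Group { start, end: angle };
            angle += pad; // gap before the next segment
            group
        })
        .collect()
}

fn main() {
    // A toy symmetric co-occurrence matrix for three entities.
    let matrix = vec![
        vec![0.0, 2.0, 1.0],
        vec![2.0, 0.0, 3.0],
        vec![1.0, 3.0, 0.0],
    ];
    for (i, g) in layout_groups(&matrix, 0.05).iter().enumerate() {
        println!("segment {}: {:.3} to {:.3} rad", i, g.start, g.end);
    }
}
```

Within each segment, the arc to another entity would then take a slice proportional to the corresponding matrix entry.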

Chord diagrams can be useful when trying to convey relationships between different entities, and they can be beautiful and eye-catching. They can get messy when considering many entities, so it's often beneficial to make them interactive and explorable.

### The Chord Crate

I wasn't able to find any Rust crates for plotting chord diagrams, so I ported my own (based on d3-chord) from Python to Rust.

You can get the crate either from crates.io or from the GitHub repository. With your processed data, you should be able to plot something beautiful with just a single line: `Chord { matrix: matrix, names: names, ..Chord::default() }.show()`. To enable the pro features of the chord crate, check out Chord Pro.

### The Dataset

The dataset documentation states that we can expect two type variables, `type_1` and `type_2`, for each of the 1028 samples from the first eight generations.

```rust
let data = darn::read_csv("https://datacrayon.com/datasets/pokemon_gen_1_to_8.csv");
```
```rust
darn::show_frame(&data.0, Some(&data.1));
```
```text
pokedex_number name german_name japanese_name generation status species type_number type_1 type_2 height_m weight_kg abilities_number ability_1 ability_2 ability_hidden total_points hp attack defense sp_attack sp_defense speed catch_rate base_friendship base_experience growth_rate egg_type_number egg_type_1 egg_type_2 percentage_male egg_cycles against_normal against_fire against_water against_electric against_grass against_ice against_fight against_poison against_ground against_flying against_psychic against_bug against_rock against_ghost against_dragon against_dark against_steel against_fairy
"0" "1" "Bulbasaur" "Bisasam" "フシギダネ (Fushigidane)" "1" "Normal" "Seed Pokémon" "2" "Grass" "Poison" "0.7" "6.9" "2" "Overgrow" "" "Chlorophyll" "318" "45" "49" "49" "65" "65" "45" "45" "70" "64" "Medium Slow" "2" "Grass" "Monster" "87.5" "20" "1" "2" "0.5" "0.5" "0.25" "2" "0.5" "1" "1" "2" "2" "1" "1" "1" "1" "1" "1" "0.5"
"1" "2" "Ivysaur" "Bisaknosp" "フシギソウ (Fushigisou)" "1" "Normal" "Seed Pokémon" "2" "Grass" "Poison" "1" "13" "2" "Overgrow" "" "Chlorophyll" "405" "60" "62" "63" "80" "80" "60" "45" "70" "142" "Medium Slow" "2" "Grass" "Monster" "87.5" "20" "1" "2" "0.5" "0.5" "0.25" "2" "0.5" "1" "1" "2" "2" "1" "1" "1" "1" "1" "1" "0.5"
"2" "3" "Venusaur" "Bisaflor" "フシギバナ (Fushigibana)" "1" "Normal" "Seed Pokémon" "2" "Grass" "Poison" "2" "100" "2" "Overgrow" "" "Chlorophyll" "525" "80" "82" "83" "100" "100" "80" "45" "70" "236" "Medium Slow" "2" "Grass" "Monster" "87.5" "20" "1" "2" "0.5" "0.5" "0.25" "2" "0.5" "1" "1" "2" "2" "1" "1" "1" "1" "1" "1" "0.5"
"3" "3" "Mega Venusaur" "Bisaflor" "フシギバナ (Fushigibana)" "1" "Normal" "Seed Pokémon" "2" "Grass" "Poison" "2.4" "155.5" "1" "Thick Fat" "" "" "625" "80" "100" "123" "122" "120" "80" "45" "70" "281" "Medium Slow" "2" "Grass" "Monster" "87.5" "20" "1" "1" "0.5" "0.5" "0.25" "1" "0.5" "1" "1" "2" "2" "1" "1" "1" "1" "1" "1" "0.5"
"4" "4" "Charmander" "Glumanda" "ヒトカゲ (Hitokage)" "1" "Normal" "Lizard Pokémon" "1" "Fire" "" "0.6" "8.5" "2" "Blaze" "" "Solar Power" "309" "39" "52" "43" "60" "50" "65" "45" "70" "62" "Medium Slow" "2" "Dragon" "Monster" "87.5" "20" "1" "0.5" "2" "1" "0.5" "0.5" "1" "1" "2" "1" "1" "0.5" "2" "1" "1" "1" "0.5" "0.5"
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
"1023" "888" "Zacian Hero of Many Battles" "" "" "8" "Legendary" "Warrior Pokémon" "1" "Fairy" "" "2.8" "110" "1" "Intrepid Sword" "" "" "670" "92" "130" "115" "80" "115" "138" "" "" "" "Slow" "1" "Undiscovered" "" "" "120" "1" "1" "1" "1" "1" "1" "0.5" "2" "1" "1" "1" "0.5" "1" "1" "0" "0.5" "2" "1"
"1024" "889" "Zamazenta Crowned Shield" "" "" "8" "Legendary" "Warrior Pokémon" "2" "Fighting" "Steel" "2.9" "785" "1" "Dauntless Shield" "" "" "720" "92" "130" "145" "80" "145" "128" "" "" "" "Slow" "1" "Undiscovered" "" "" "120" "0.5" "2" "1" "1" "0.5" "0.5" "2" "0" "2" "1" "1" "0.25" "0.25" "1" "0.5" "0.5" "0.5" "1"
"1025" "889" "Zamazenta Hero of Many Battles" "" "" "8" "Legendary" "Warrior Pokémon" "1" "Fighting" "" "2.9" "210" "1" "Dauntless Shield" "" "" "670" "92" "130" "115" "80" "115" "138" "" "" "" "Slow" "1" "Undiscovered" "" "" "120" "1" "1" "1" "1" "1" "1" "1" "1" "1" "2" "2" "0.5" "0.5" "1" "1" "0.5" "1" "2"
"1026" "890" "Eternatus" "" "" "8" "Legendary" "Gigantic Pokémon" "2" "Poison" "Dragon" "20" "950" "1" "Pressure" "" "" "690" "140" "85" "95" "145" "95" "130" "" "" "" "Slow" "1" "Undiscovered" "" "" "120" "1" "0.5" "0.5" "0.5" "0.25" "2" "0.5" "0.5" "2" "1" "2" "0.5" "1" "1" "2" "1" "1" "1"
"1027" "890" "Eternatus Eternamax" "" "" "8" "Legendary" "Gigantic Pokémon" "2" "Poison" "Dragon" "100" "" "0" "" "" "" "1125" "255" "115" "250" "125" "250" "130" "" "" "" "Slow" "1" "Undiscovered" "" "" "120" "1" "0.5" "0.5" "0.5" "0.25" "2" "0.5" "0.5" "2" "1" "2" "0.5" "1" "1" "2" "1" "1" "1"
```

It looks good so far: we can clearly see the two type columns. Let's confirm that we have 1028 samples.

```rust
&data.0.shape()
```

```text
[1028, 51]
```

Perfect, that's exactly what we were expecting.

## Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. The column names show that each Pokémon's types are split between the columns `type_1` and `type_2`.

```rust
&data.1
```

```text
["", "pokedex_number", "name", "german_name", "japanese_name", "generation", "status", "species", "type_number", "type_1", "type_2", "height_m", "weight_kg", "abilities_number", "ability_1", "ability_2", "ability_hidden", "total_points", "hp", "attack", "defense", "sp_attack", "sp_defense", "speed", "catch_rate", "base_friendship", "base_experience", "growth_rate", "egg_type_number", "egg_type_1", "egg_type_2", "percentage_male", "egg_cycles", "against_normal", "against_fire", "against_water", "against_electric", "against_grass", "against_ice", "against_fight", "against_poison", "against_ground", "against_flying", "against_psychic", "against_bug", "against_rock", "against_ghost", "against_dragon", "against_dark", "against_steel", "against_fairy"]
```

Let's select just these two columns (indices 9 and 10, since the unnamed index column sits at position 0) and work with this two-column array from here on.

```rust
let types = data.0.slice(s![.., 9..11]).into_owned();
darn::show_frame(&types, None);
```

```text
"Grass" "Poison"
"Grass" "Poison"
"Grass" "Poison"
"Grass" "Poison"
"Fire" ""
... ...
"Fairy" ""
"Fighting" "Steel"
"Fighting" ""
"Poison" "Dragon"
"Poison" "Dragon"
```
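The cell above hard-codes the column range `9..11`, which relies on `type_1` and `type_2` staying at those positions. A slightly more robust sketch looks the indices up by name in the header vector (`data.1` in this notebook). The `column_index` helper is hypothetical, and this standard-library version uses the first eleven header names copied from the output of `&data.1`.

```rust
/// Return the position of `column` in `headers`, if it's present.
fn column_index(headers: &[String], column: &str) -> Option<usize> {
    headers.iter().position(|h| h == column)
}

fn main() {
    // The first eleven headers; the first one is the unnamed index column.
    let headers: Vec<String> = [
        "", "pokedex_number", "name", "german_name", "japanese_name",
        "generation", "status", "species", "type_number", "type_1", "type_2",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();

    let start = column_index(&headers, "type_1").expect("type_1 column missing");
    let end = column_index(&headers, "type_2").expect("type_2 column missing") + 1;
    println!("{}..{}", start, end); // prints 9..11
}
```

With ndarray, the resulting `start..end` range could then be passed to `s![.., start..end]` in place of the literal range.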

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.
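The two inputs have to agree with each other: the matrix should be square, with one row and one column per name, and a co-occurrence matrix should also be symmetric. The following sketch checks these conditions before plotting; `check_chord_inputs` is a hypothetical helper written for this illustration and is not provided by the chord crate.

```rust
/// Check that `matrix` is square, has one row/column per name,
/// and is symmetric, as a co-occurrence matrix should be.
fn check_chord_inputs(matrix: &[Vec<f64>], names: &[String]) -> Result<(), String> {
    let n = names.len();
    if matrix.len() != n {
        return Err(format!("expected {} rows, found {}", n, matrix.len()));
    }
    // Check every row length first, so the symmetry check can index safely.
    if let Some((i, row)) = matrix.iter().enumerate().find(|(_, row)| row.len() != n) {
        return Err(format!("row {} has {} columns, expected {}", i, row.len(), n));
    }
    // Entries are whole-number counts, so exact comparison is fine here.
    for i in 0..n {
        for j in (i + 1)..n {
            if matrix[i][j] != matrix[j][i] {
                return Err(format!("not symmetric at ({}, {})", i, j));
            }
        }
    }
    Ok(())
}

fn main() {
    let names: Vec<String> = vec!["Fire".to_string(), "Flying".to_string()];
    let matrix = vec![vec![0.0, 1.0], vec![1.0, 0.0]];
    match check_chord_inputs(&matrix, &names) {
        Ok(()) => println!("inputs look consistent"),
        Err(e) => println!("problem: {}", e),
    }
}
```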

First, we'll populate our list of type names by collecting the unique values across both type columns.

```rust
let mut names = types.iter().cloned().unique().collect_vec();
names
```

```text
["Grass", "Poison", "Fire", "", "Flying", "Dragon", "Water", "Bug", "Normal", "Dark", "Electric", "Psychic", "Ground", "Ice", "Steel", "Fairy", "Fighting", "Rock", "Ghost"]
```

Let's sort this alphabetically.

```rust
names.sort();
names
```

```text
["", "Bug", "Dark", "Dragon", "Electric", "Fairy", "Fighting", "Fire", "Flying", "Ghost", "Grass", "Ground", "Ice", "Normal", "Poison", "Psychic", "Rock", "Steel", "Water"]
```

We'll also remove the empty string, which appears because some samples have only one type (an empty `type_2`). Since the empty string sorts before every other name, it's at index 0.

```rust
names.remove(0);
names
```

```text
["Bug", "Dark", "Dragon", "Electric", "Fairy", "Fighting", "Fire", "Flying", "Ghost", "Grass", "Ground", "Ice", "Normal", "Poison", "Psychic", "Rock", "Steel", "Water"]
```
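The three steps above (collect the unique values, sort them, drop the empty string) can also be sketched with only the standard library. A `BTreeSet` keeps its elements sorted and deduplicated, so filtering out empty strings while collecting produces the final list in one pass. This assumes the type columns are available as `(type_1, type_2)` string pairs rather than an ndarray `Array2`; `unique_type_names` is an illustrative name.

```rust
use std::collections::BTreeSet;

/// Distinct, non-empty type names from (type_1, type_2) pairs,
/// in alphabetical order.
fn unique_type_names(pairs: &[(&str, &str)]) -> Vec<String> {
    let set: BTreeSet<&str> = pairs
        .iter()
        .flat_map(|&(a, b)| [a, b]) // both columns of each sample
        .filter(|t| !t.is_empty()) // drop the missing second types
        .collect();
    set.into_iter().map(String::from).collect()
}

fn main() {
    // A few (type_1, type_2) pairs taken from the dataset preview.
    let pairs = [
        ("Grass", "Poison"),
        ("Fire", ""),
        ("Fairy", ""),
        ("Fighting", "Steel"),
        ("Poison", "Dragon"),
    ];
    println!("{:?}", unique_type_names(&pairs));
    // prints ["Dragon", "Fairy", "Fighting", "Fire", "Grass", "Poison", "Steel"]
}
```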

Now we can create our empty co-occurrence matrix: a `type_count` × `type_count` grid of zeros (18 × 18 here), with one row and one column per type.

In [11]:
let type_count = names.len();
let mut matrix: Vec<Vec<f64>> = vec![vec![Default::default(); type_count]; type_count];
matrix
Out[11]:
[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]

To populate the co-occurrence matrix, we loop through every sample in `types`. For each Pokémon that has both a `type_1` and a `type_2`, we look up the index of each type in the `names` vector and increment the corresponding matrix entry by one. Co-occurrence is symmetric (Grass with Poison is the same pairing as Poison with Grass), so we also increment the mirrored entry, i.e. `[type_2][type_1]` as well as `[type_1][type_2]`. Pokémon with only one type are skipped.

```rust
for item in types.genrows() {
    // Skip samples with only one type (an empty `type_2`).
    if !item[0].is_empty() && !item[1].is_empty() {
        // Look up each type's index in the sorted `names` vector.
        let i = names.iter().position(|s| s == &item[0]).unwrap();
        let j = names.iter().position(|s| s == &item[1]).unwrap();
        // Increment both (i, j) and (j, i) so the matrix stays symmetric.
        matrix[i][j] += 1.0;
        matrix[j][i] += 1.0;
    }
}
```
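The same co-occurrence logic can be tried in isolation on a handful of samples. This standard-library sketch assumes the data is a slice of `(type_1, type_2)` pairs, and it swaps the repeated `position` searches for a `HashMap` from type name to index, built once up front. Empty or unrecognised types have no index, so single-type samples are skipped as in the notebook. `co_occurrence` is an illustrative name.

```rust
use std::collections::HashMap;

/// Build a symmetric co-occurrence matrix from (type_1, type_2) pairs.
fn co_occurrence(pairs: &[(&str, &str)], names: &[&str]) -> Vec<Vec<f64>> {
    let n = names.len();
    // Precompute name -> index once instead of searching `names` per sample.
    let index: HashMap<&str, usize> = names
        .iter()
        .enumerate()
        .map(|(i, &name)| (name, i))
        .collect();
    let mut matrix = vec![vec![0.0; n]; n];
    for &(first, second) in pairs {
        // An empty or unknown type has no index, so the sample is skipped.
        if let (Some(&i), Some(&j)) = (index.get(first), index.get(second)) {
            matrix[i][j] += 1.0;
            matrix[j][i] += 1.0;
        }
    }
    matrix
}

fn main() {
    let names = ["Dragon", "Fire", "Flying", "Poison"];
    let pairs = [
        ("Fire", ""),
        ("Fire", "Flying"),
        ("Poison", "Dragon"),
        ("Poison", "Dragon"),
    ];
    let matrix = co_occurrence(&pairs, &names);
    for (name, row) in names.iter().zip(&matrix) {
        println!("{:>8}: {:?}", name, row);
    }
}
```

The `HashMap` lookup makes each sample constant-time on average; with only 18 types either approach is fast, so this is mainly a readability choice.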

## Chord Diagram

Time to visualise the co-occurrence of types using a chord diagram. We are going to use a list of custom colours that represent the types, one per type, in the same alphabetical order as `names`.

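```rust
// One colour per type, in the same alphabetical order as `names`
// (Bug, Dark, Dragon, Electric, Fairy, Fighting, Fire, and so on to Water).
let colors: Vec<String> = vec![
    "#A6B91A", "#705746", "#6F35FC", "#F7D02C", "#D685AD",
    "#C22E28", "#EE8130", "#A98FF3", "#735797", "#7AC74C",
    "#E2BF65", "#96D9D6", "#A8A77A", "#A33EA1", "#F95587",
    "#B6A136", "#B7B7CE", "#6390F0",
]
.into_iter()
.map(String::from)
.collect();
```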
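Because the colours are presumably applied to the segments in order (an assumption about the chord crate), the colour list should have exactly one entry per type. A small sketch of that check pairs each name with its colour and reports a length mismatch; `pair_colours` is a hypothetical helper, not part of the chord crate.

```rust
/// Pair each type name with its colour, or report a length mismatch
/// (a shorter colour list would leave some types with the wrong colour).
fn pair_colours<'a>(
    names: &'a [String],
    colors: &'a [String],
) -> Result<Vec<(&'a str, &'a str)>, String> {
    if names.len() != colors.len() {
        return Err(format!("{} names but {} colours", names.len(), colors.len()));
    }
    Ok(names
        .iter()
        .map(String::as_str)
        .zip(colors.iter().map(String::as_str))
        .collect())
}

fn main() {
    let names: Vec<String> = vec!["Fighting".to_string(), "Fire".to_string()];
    let colors: Vec<String> = vec!["#C22E28".to_string(), "#EE8130".to_string()];
    match pair_colours(&names, &colors) {
        Ok(pairs) => {
            for (name, colour) in pairs {
                println!("{} -> {}", name, colour);
            }
        }
        Err(e) => println!("problem: {}", e),
    }
}
```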

Finally, we can put it all together.

```rust
Chord {
    matrix: matrix.clone(),
    names: names.clone(),
    colors,
    margin: 30.0,
    wrap_labels: true,
    ..Chord::default()
}
.show();
```

## Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!