Data Analysis with Rust Notebooks

A practical book on Data Analysis with Rust Notebooks that teaches you the concepts and how they’re implemented in practice.

Visualisation of Co-occurring Types

Preamble

In [2]:
:dep darn = {version = "0.1.11"}
:dep ndarray = {version = "0.13.0"}
:dep itertools = {version = "0.9.0"}
:dep chord = {version = "0.1.6"}
extern crate ndarray;

use ndarray::prelude::*;
use itertools::Itertools;
use chord::{Chord, Plot};

Introduction

In this section, we're going to use the Complete Pokemon Dataset to visualise the co-occurrence of Pokémon types from generations one to eight.

The Dataset

The dataset documentation states that each of the 1028 samples from the first eight generations has two type variables, type_1 and type_2.

Let's download the mirrored dataset and have a look for ourselves.

In [3]:
let data = darn::read_csv("https://shahinrostami.com/datasets/pokemon_gen_1_to_8.csv");
In [4]:
darn::show_frame(&data.0, Some(&data.1));
Out[4]:
pokedex_number name german_name japanese_name generation status species type_number type_1 type_2 height_m weight_kg abilities_number ability_1 ability_2 ability_hidden total_points hp attack defense sp_attack sp_defense speed catch_rate base_friendship base_experience growth_rate egg_type_number egg_type_1 egg_type_2 percentage_male egg_cycles against_normal against_fire against_water against_electric against_grass against_ice against_fight against_poison against_ground against_flying against_psychic against_bug against_rock against_ghost against_dragon against_dark against_steel against_fairy
"0" "1" "Bulbasaur" "Bisasam" "フシギダネ (Fushigidane)" "1" "Normal" "Seed Pokémon" "2" "Grass" "Poison" "0.7" "6.9" "2" "Overgrow" "" "Chlorophyll" "318" "45" "49" "49" "65" "65" "45" "45" "70" "64" "Medium Slow" "2" "Grass" "Monster" "87.5" "20" "1" "2" "0.5" "0.5" "0.25" "2" "0.5" "1" "1" "2" "2" "1" "1" "1" "1" "1" "1" "0.5"
"1" "2" "Ivysaur" "Bisaknosp" "フシギソウ (Fushigisou)" "1" "Normal" "Seed Pokémon" "2" "Grass" "Poison" "1" "13" "2" "Overgrow" "" "Chlorophyll" "405" "60" "62" "63" "80" "80" "60" "45" "70" "142" "Medium Slow" "2" "Grass" "Monster" "87.5" "20" "1" "2" "0.5" "0.5" "0.25" "2" "0.5" "1" "1" "2" "2" "1" "1" "1" "1" "1" "1" "0.5"
"2" "3" "Venusaur" "Bisaflor" "フシギバナ (Fushigibana)" "1" "Normal" "Seed Pokémon" "2" "Grass" "Poison" "2" "100" "2" "Overgrow" "" "Chlorophyll" "525" "80" "82" "83" "100" "100" "80" "45" "70" "236" "Medium Slow" "2" "Grass" "Monster" "87.5" "20" "1" "2" "0.5" "0.5" "0.25" "2" "0.5" "1" "1" "2" "2" "1" "1" "1" "1" "1" "1" "0.5"
"3" "3" "Mega Venusaur" "Bisaflor" "フシギバナ (Fushigibana)" "1" "Normal" "Seed Pokémon" "2" "Grass" "Poison" "2.4" "155.5" "1" "Thick Fat" "" "" "625" "80" "100" "123" "122" "120" "80" "45" "70" "281" "Medium Slow" "2" "Grass" "Monster" "87.5" "20" "1" "1" "0.5" "0.5" "0.25" "1" "0.5" "1" "1" "2" "2" "1" "1" "1" "1" "1" "1" "0.5"
"4" "4" "Charmander" "Glumanda" "ヒトカゲ (Hitokage)" "1" "Normal" "Lizard Pokémon" "1" "Fire" "" "0.6" "8.5" "2" "Blaze" "" "Solar Power" "309" "39" "52" "43" "60" "50" "65" "45" "70" "62" "Medium Slow" "2" "Dragon" "Monster" "87.5" "20" "1" "0.5" "2" "1" "0.5" "0.5" "1" "1" "2" "1" "1" "0.5" "2" "1" "1" "1" "0.5" "0.5"
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
"1023" "888" "Zacian Hero of Many Battles" "" "" "8" "Legendary" "Warrior Pokémon" "1" "Fairy" "" "2.8" "110" "1" "Intrepid Sword" "" "" "670" "92" "130" "115" "80" "115" "138" "" "" "" "Slow" "1" "Undiscovered" "" "" "120" "1" "1" "1" "1" "1" "1" "0.5" "2" "1" "1" "1" "0.5" "1" "1" "0" "0.5" "2" "1"
"1024" "889" "Zamazenta Crowned Shield" "" "" "8" "Legendary" "Warrior Pokémon" "2" "Fighting" "Steel" "2.9" "785" "1" "Dauntless Shield" "" "" "720" "92" "130" "145" "80" "145" "128" "" "" "" "Slow" "1" "Undiscovered" "" "" "120" "0.5" "2" "1" "1" "0.5" "0.5" "2" "0" "2" "1" "1" "0.25" "0.25" "1" "0.5" "0.5" "0.5" "1"
"1025" "889" "Zamazenta Hero of Many Battles" "" "" "8" "Legendary" "Warrior Pokémon" "1" "Fighting" "" "2.9" "210" "1" "Dauntless Shield" "" "" "670" "92" "130" "115" "80" "115" "138" "" "" "" "Slow" "1" "Undiscovered" "" "" "120" "1" "1" "1" "1" "1" "1" "1" "1" "1" "2" "2" "0.5" "0.5" "1" "1" "0.5" "1" "2"
"1026" "890" "Eternatus" "" "" "8" "Legendary" "Gigantic Pokémon" "2" "Poison" "Dragon" "20" "950" "1" "Pressure" "" "" "690" "140" "85" "95" "145" "95" "130" "" "" "" "Slow" "1" "Undiscovered" "" "" "120" "1" "0.5" "0.5" "0.5" "0.25" "2" "0.5" "0.5" "2" "1" "2" "0.5" "1" "1" "2" "1" "1" "1"
"1027" "890" "Eternatus Eternamax" "" "" "8" "Legendary" "Gigantic Pokémon" "2" "Poison" "Dragon" "100" "" "0" "" "" "" "1125" "255" "115" "250" "125" "250" "130" "" "" "" "Slow" "1" "Undiscovered" "" "" "120" "1" "0.5" "0.5" "0.5" "0.25" "2" "0.5" "0.5" "2" "1" "2" "0.5" "1" "1" "2" "1" "1" "1"

It looks good so far: we can clearly see the two type columns. Let's confirm that we have 1028 samples.

In [5]:
&data.0.shape()
Out[5]:
[1028, 51]

Perfect, that's exactly what we were expecting.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the Pokémon types are split between the columns type_1 and type_2.

In [6]:
&data.1
Out[6]:
["", "pokedex_number", "name", "german_name", "japanese_name", "generation", "status", "species", "type_number", "type_1", "type_2", "height_m", "weight_kg", "abilities_number", "ability_1", "ability_2", "ability_hidden", "total_points", "hp", "attack", "defense", "sp_attack", "sp_defense", "speed", "catch_rate", "base_friendship", "base_experience", "growth_rate", "egg_type_number", "egg_type_1", "egg_type_2", "percentage_male", "egg_cycles", "against_normal", "against_fire", "against_water", "against_electric", "against_grass", "against_ice", "against_fight", "against_poison", "against_ground", "against_flying", "against_psychic", "against_bug", "against_rock", "against_ghost", "against_dragon", "against_dark", "against_steel", "against_fairy"]

They sit at indices 9 and 10 (the first, unnamed column holds the row index), so let's slice out just these two columns and work with them from here on.

In [7]:
let types = data.0.slice(s![.., 9..11]).into_owned();
darn::show_frame(&types, None);
Out[7]:
"Grass" "Poison"
"Grass" "Poison"
"Grass" "Poison"
"Grass" "Poison"
"Fire" ""
... ...
"Fairy" ""
"Fighting" "Steel"
"Fighting" ""
"Poison" "Dragon"
"Poison" "Dragon"
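
The slice above picks the type columns by position. A minimal sketch of looking them up by header name instead, assuming the headers and rows are held in plain Vecs rather than the ndarray array that darn returns. The helpers column_index and type_pairs are hypothetical and not part of darn or ndarray.

```rust
/// Hypothetical helper: look up a column's position by its header name.
fn column_index(headers: &[String], name: &str) -> Option<usize> {
    headers.iter().position(|h| h == name)
}

/// Hypothetical helper: extract the (type_1, type_2) pair from every row.
/// Returns `None` if either header is missing or a row is too short.
fn type_pairs(headers: &[String], rows: &[Vec<String>]) -> Option<Vec<(String, String)>> {
    let first = column_index(headers, "type_1")?;
    let second = column_index(headers, "type_2")?;
    rows.iter()
        .map(|row| Some((row.get(first)?.clone(), row.get(second)?.clone())))
        .collect()
}
```

Looking columns up by name keeps the selection working if the mirrored CSV ever gains or reorders columns, at the cost of a little more code than a fixed slice.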

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

First, we'll populate our list of type names by looking for the unique ones.

In [8]:
let mut names = types.iter().cloned().unique().collect_vec();
names
Out[8]:
["Grass", "Poison", "Fire", "", "Flying", "Dragon", "Water", "Bug", "Normal", "Dark", "Electric", "Psychic", "Ground", "Ice", "Steel", "Fairy", "Fighting", "Rock", "Ghost"]

Let's sort this alphabetically.

In [9]:
names.sort();
names
Out[9]:
["", "Bug", "Dark", "Dragon", "Electric", "Fairy", "Fighting", "Fire", "Flying", "Ghost", "Grass", "Ground", "Ice", "Normal", "Poison", "Psychic", "Rock", "Steel", "Water"]

We'll also remove the empty string, which appears because some samples have only one type. It sorts before every type name, so it is the element at index 0.

In [10]:
names.remove(0);
names
Out[10]:
["Bug", "Dark", "Dragon", "Electric", "Fairy", "Fighting", "Fire", "Flying", "Ghost", "Grass", "Ground", "Ice", "Normal", "Poison", "Psychic", "Rock", "Steel", "Water"]
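
The three steps above (collect unique names, sort them, drop the empty string) can also be sketched with only the standard library. This version assumes the types are held as a slice of (type_1, type_2) string pairs rather than an ndarray array, and unique_type_names is a hypothetical helper name. It filters out empty strings explicitly instead of relying on the empty string sorting first.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: unique, alphabetically sorted, non-empty type names.
fn unique_type_names(pairs: &[(String, String)]) -> Vec<String> {
    pairs
        .iter()
        // Visit both type columns of every sample.
        .flat_map(|(first, second)| [first, second])
        // Single-type samples carry an empty second type; skip it.
        .filter(|name| !name.is_empty())
        .cloned()
        // A BTreeSet removes duplicates and iterates in sorted order.
        .collect::<BTreeSet<String>>()
        .into_iter()
        .collect()
}
```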

Now we can create our empty co-occurrence matrix with a shape that can hold co-occurrences between our types.

In [11]:
let type_count = names.len();
let mut matrix: Vec<Vec<f64>> = vec![vec![Default::default(); type_count]; type_count];
matrix
Out[11]:
[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]

We can populate the co-occurrence matrix with the following approach. We loop through every sample in our dataset and, for each sample that has both types, increment the matrix entry addressed by the positions of type_1 and type_2 in the names vector. To keep the matrix symmetric, we also increment the mirrored entry, i.e. type_2 and type_1.

In [29]:
for item in types.genrows() {
    // Skip samples that only have one type.
    if !item[0].is_empty() && !item[1].is_empty() {
        // Look up each type's row/column index in the sorted names vector.
        let first = names.iter().position(|s| s == &item[0]).unwrap();
        let second = names.iter().position(|s| s == &item[1]).unwrap();
        // Increment both mirrored entries to keep the matrix symmetric.
        matrix[second][first] += 1.0;
        matrix[first][second] += 1.0;
    }
}
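
As a rough illustration of the same counting logic outside the notebook, here is a sketch that uses only the standard library. It assumes the types are available as a slice of (type_1, type_2) string pairs, and co_occurrence is a hypothetical helper name. Unlike the cell above, it builds a name-to-index map once and silently skips names it does not recognise instead of panicking.

```rust
use std::collections::HashMap;

/// Hypothetical helper: build a symmetric co-occurrence matrix.
fn co_occurrence(names: &[String], pairs: &[(String, String)]) -> Vec<Vec<f64>> {
    // Map each name to its row/column index once, rather than searching per sample.
    let index: HashMap<&str, usize> = names
        .iter()
        .enumerate()
        .map(|(i, name)| (name.as_str(), i))
        .collect();

    let mut matrix = vec![vec![0.0; names.len()]; names.len()];
    for (first, second) in pairs {
        // Single-type samples have an empty second type, which is not in
        // `names`, so they are skipped along with any unknown name.
        if let (Some(&a), Some(&b)) = (index.get(first.as_str()), index.get(second.as_str())) {
            matrix[a][b] += 1.0;
            matrix[b][a] += 1.0;
        }
    }
    matrix
}
```

With 18 type names the repeated position search in the notebook cell is cheap, so the map is mainly a readability choice here.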

Chord Diagram

Time to visualise the co-occurrence of types using a chord diagram. We are going to use a list of custom colours, one per type, listed in the same alphabetical order as the names vector.

In [26]:
let colors: Vec<String> = vec![
    "#A6B91A", "#705746", "#6F35FC", "#F7D02C", "#D685AD",
    "#C22E28", "#EE8130", "#A98FF3", "#735797", "#7AC74C",
    "#E2BF65", "#96D9D6", "#A8A77A", "#A33EA1", "#F95587",
    "#B6A136", "#B7B7CE", "#6390F0"
]
.into_iter()
.map(String::from)
.collect();
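
Because the colours are matched to the names purely by position, a mismatch in length or a mistyped hex value is easy to miss. The following is a small sketch of a check, with is_hex_colour and pair_colours as hypothetical helper names that are not part of the chord crate.

```rust
/// Hypothetical helper: true for strings of the form `#RRGGBB`.
fn is_hex_colour(value: &str) -> bool {
    value.len() == 7
        && value.starts_with('#')
        && value[1..].chars().all(|c| c.is_ascii_hexdigit())
}

/// Hypothetical helper: pair each name with its colour by position,
/// or explain why the two lists cannot be paired.
fn pair_colours(names: &[String], colours: &[String]) -> Result<Vec<(String, String)>, String> {
    if names.len() != colours.len() {
        return Err(format!("{} names but {} colours", names.len(), colours.len()));
    }
    if let Some(bad) = colours.iter().find(|c| !is_hex_colour(c.as_str())) {
        return Err(format!("not a #RRGGBB colour: {}", bad));
    }
    Ok(names.iter().cloned().zip(colours.iter().cloned()).collect())
}
```

Calling pair_colours(&names, &colors) in the notebook would return the 18 (name, colour) pairs, which can be printed to confirm that, for example, "Water" lines up with "#6390F0".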

Finally, we can put it all together.

In [27]:
Chord {
    matrix: matrix.clone(),
    names: names.clone(),
    colors,
    margin: 30.0,
    wrap_labels: true,
    ..Chord::default()
}
.show();
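
A quick sanity check on the matrix passed to the diagram can catch wrangling mistakes early. This sketch assumes the matrix is a Vec<Vec<f64>> as in this section, and is_valid_co_occurrence is a hypothetical helper name. It verifies that the matrix is square, has one row per name, and is symmetric.

```rust
/// Hypothetical helper: check that `matrix` is square, has one row per name,
/// and is symmetric.
fn is_valid_co_occurrence(matrix: &[Vec<f64>], name_count: usize) -> bool {
    matrix.len() == name_count
        && matrix.iter().all(|row| row.len() == name_count)
        // Compare each entry below the diagonal with its mirror above it.
        && (0..name_count).all(|i| (0..i).all(|j| matrix[i][j] == matrix[j][i]))
}
```

In the notebook this could be called as is_valid_co_occurrence(&matrix, names.len()) before constructing the Chord.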

Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!
