League of Legends World Championship 2019

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from chord import Chord

Introduction

In previous sections, we visualised co-occurrences of Pokémon types. In this section, we're going to use a dataset of matches from the League of Legends World Championship 2019 to visualise which teams played each other and who won.

The Dataset

The dataset has one row per match, recording the two teams, the winner, the date, the casters, the MVP, and which team played on the blue and red sides.

Let's download the mirrored dataset and have a look for ourselves.

In [2]:
data_url = 'https://shahinrostami.com/datasets/lol/wc_matches.csv'
data = pd.read_csv(data_url)
data['date'] =  pd.to_datetime(data['date'])
data.head()
Out[2]:
Unnamed: 0 team1 team2 winner date pbp_caster color_caster mvp blue red
0 0 Fnatic SK Telecom T1 SK Telecom T1 2019-10-12 12:00:00 Atlus Froskurinn, Kobe Faker Fnatic SK Telecom T1
1 1 Royal Never Give Up Clutch Gaming Royal Never Give Up 2019-10-12 13:00:00 Atlus Froskurinn, Kobe Langx Royal Never Give Up Clutch Gaming
2 2 Invictus Gaming ahq eSports Club Invictus Gaming 2019-10-12 14:00:00 Atlus Froskurinn, Kobe Rookie Invictus Gaming ahq eSports Club
3 3 DAMWON Gaming Team Liquid Team Liquid 2019-10-12 15:00:00 Phreak Azael, Spawn Impact DAMWON Gaming Team Liquid
4 4 J Team FunPlus Phoenix J Team 2019-10-12 16:00:00 Phreak Azael, Spawn FoFo J Team FunPlus Phoenix

It looks good so far, but let's check how many samples and variables we have.

In [3]:
data.shape
Out[3]:
(81, 10)

That gives us 81 matches, each described by 10 variables.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the two teams in each match are split between the columns team1 and team2.

In [4]:
pd.DataFrame(data.columns.values.tolist())
Out[4]:
0
0 Unnamed: 0
1 team1
2 team2
3 winner
4 date
5 pbp_caster
6 color_caster
7 mvp
8 blue
9 red

So let's select just these two columns and work with a DataFrame containing only them as we move forward.

In [5]:
types = pd.DataFrame(data[['team1', 'team2']].values)
types
Out[5]:
0 1
0 Fnatic SK Telecom T1
1 Royal Never Give Up Clutch Gaming
2 Invictus Gaming ahq eSports Club
3 DAMWON Gaming Team Liquid
4 J Team FunPlus Phoenix
... ... ...
76 Flamengo eSports Royal Youth
77 DAMWON Gaming Lowkey Esports
78 Clutch Gaming Royal Youth
79 Hong Kong Attitude Isurus
80 Splyce Unicorns Of Love

81 rows × 2 columns

The preview above doesn't show any NaN values, but it is truncated, so we can't rule them out. We are only interested in complete pairs of teams, so we remove any samples which contain a NaN value.

In [6]:
types = types.dropna()

We'll also strip out any stray newline characters (\n) from the team names before continuing.

In [7]:
types = types.replace('\n','', regex=True)
types
Out[7]:
0 1
0 Fnatic SK Telecom T1
1 Royal Never Give Up Clutch Gaming
2 Invictus Gaming ahq eSports Club
3 DAMWON Gaming Team Liquid
4 J Team FunPlus Phoenix
... ... ...
76 Flamengo eSports Royal Youth
77 DAMWON Gaming Lowkey Esports
78 Clutch Gaming Royal Youth
79 Hong Kong Attitude Isurus
80 Splyce Unicorns Of Love

81 rows × 2 columns
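As a small illustration of the two cleaning steps above, here is a sketch on a made-up table. The team names and the helper name clean_pairs are placeholders chosen for this example and are not part of the dataset.

```python
import numpy as np
import pandas as pd


def clean_pairs(pairs: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with a missing value, then remove newline characters."""
    cleaned = pairs.dropna()
    return cleaned.replace('\n', '', regex=True)


# toy data: a complete row, a row with a missing value, a row with a stray newline
toy = pd.DataFrame([
    ['Team A', 'Team B'],
    ['Team C', np.nan],
    ['Team D\n', 'Team A'],
])

cleaned = clean_pairs(toy)
print(cleaned)
```

The row with the missing value is dropped, and the newline after Team D is removed. The original row labels are kept, so the result is indexed by 0 and 2.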

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

First we'll populate our list of team names by looking for the unique ones.

In [8]:
names = np.unique(types).tolist()
pd.DataFrame(names)
Out[8]:
0
0 Cloud9
1 Clutch Gaming
2 DAMWON Gaming
3 DetonatioN FocusMe
4 Flamengo eSports
5 Fnatic
6 FunPlus Phoenix
7 G2 Esports
8 GAM Esports
9 Griffin
10 Hong Kong Attitude
11 Invictus Gaming
12 Isurus
13 J Team
14 Lowkey Esports
15 MAMMOTH
16 MEGA
17 Royal Never Give Up
18 Royal Youth
19 SK Telecom T1
20 Splyce
21 Team Liquid
22 Unicorns Of Love
23 ahq eSports Club
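The ordering in the list above comes from np.unique, which flattens its input and returns the unique values in sorted order. Strings are compared by character code, so names starting with a lowercase letter sort after all capitalised names, which is why ahq eSports Club comes last. A minimal sketch on made-up pairings, with a case-insensitive ordering shown as one possible alternative:

```python
import numpy as np
import pandas as pd

# made-up pairings, used only for illustration
pairs = pd.DataFrame([
    ['Splyce', 'ahq eSports Club'],
    ['Fnatic', 'Splyce'],
])

# np.unique flattens the table and returns the sorted unique values
names = np.unique(pairs.values).tolist()
print(names)     # ['Fnatic', 'Splyce', 'ahq eSports Club']

# optional: order the names without regard to case
names_ci = sorted(names, key=str.lower)
print(names_ci)  # ['ahq eSports Club', 'Fnatic', 'Splyce']
```

Whichever ordering is used, the same list has to index both the rows and the columns of the matrix.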

Now we can create our empty matrix using these team names for the row and column indices, together with a details table of the same shape that holds an empty list for every pair of teams.

We then iterate over the matches. Each match adds one to the cell in the winner's row and the loser's column, and appends a line to the matching details list with the date, a marker coloured by the side the winner played on (blue or red), the winning team, and the MVP.

In [9]:
matrix = pd.DataFrame(0, index=names, columns=names)

# one empty list per (winner, loser) pair to collect the match details
details = pd.DataFrame(index=names, columns=names, dtype=object)
for row_name in names:
    for col_name in names:
        details.at[row_name, col_name] = []

for index, x in data.iterrows():
    # rows are indexed by the winner, columns by the loser
    if x['winner'] == x['team1']:
        winner, loser = x['team1'], x['team2']
    else:
        winner, loser = x['team2'], x['team1']
    matrix.at[winner, loser] += 1

    # blue or red marker for the side the winning team played on
    color = '#00bbf9' if winner == x['blue'] else '#fe5f55'
    details.at[winner, loser].append(f"{x['date'].strftime('%b-%d %H:%M')} <span style='color:{color}!important; font-style: normal;'>⬤</span> {winner} 🥇 {x['mvp']}")

matrix = matrix.values.tolist()
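To make the direction of the matrix easier to see, here is a minimal sketch of the same counting idea on three made-up matches. The team labels A, B and C and the variable names are placeholders for this example only.

```python
import pandas as pd

# made-up results: each row is one match
toy = pd.DataFrame({
    'team1':  ['A', 'B', 'A'],
    'team2':  ['B', 'C', 'C'],
    'winner': ['A', 'C', 'A'],
})

teams = sorted(set(toy['team1']) | set(toy['team2']))
wins = pd.DataFrame(0, index=teams, columns=teams)

for _, match in toy.iterrows():
    winner = match['winner']
    loser = match['team2'] if winner == match['team1'] else match['team1']
    # the winner selects the row, the loser selects the column
    wins.at[winner, loser] += 1

print(wins)
print(wins.sum(axis=1))  # wins per team (row totals)
print(wins.sum(axis=0))  # losses per team (column totals)
```

Because a win is only recorded in the winner's row, the matrix is not symmetric: the row totals count wins and the column totals count losses.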

The loop variable x still holds the last match in the dataset, so as a quick check we can look at which team played on the blue side.

In [10]:
x['blue']
Out[10]:
'Unicorns Of Love'

Chord Diagram

Time to visualise the match results using a chord diagram. We are going to use a list of custom colours for the segments.

In [11]:
colors = ["#A6B91A", "#705746", "#6F35FC", "#F7D02C", "#D685AD",
          "#C22E28", "#EE8130", "#A98FF3", "#735797", "#7AC74C",
          "#E2BF65", "#96D9D6", "#A8A77A", "#A33EA1", "#F95587",
          "#B6A136", "#B7B7CE", "#6390F0"];

Finally, we can put it all together.

In [13]:
Chord(matrix, names, symmetric=False,popup_width=700, 
      verb="battled together in",noun="matches",details=details.values.tolist(),
      details_separator='<br>',colors=colors,wrap_labels=False,
      margin=75,font_size_large="12px",font_size="12px", credit=True).show()
Chord Diagram
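The colors list above has 18 entries, while names holds 24 teams. How the chord package treats a palette that is shorter than the list of names is not covered here. If one colour per segment is wanted, one option is to repeat the palette until it is long enough. The sketch below assumes that repeating colours is acceptable, and repeat_palette is a hypothetical helper, not part of the chord package.

```python
from itertools import cycle, islice


def repeat_palette(palette, n):
    """Return n colours by cycling through palette as often as needed."""
    return list(islice(cycle(palette), n))


# a shortened palette, for illustration
palette = ["#A6B91A", "#705746", "#6F35FC"]
segment_colors = repeat_palette(palette, 5)
print(segment_colors)
```

With the data above, the call would be repeat_palette(colors, len(names)).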

Getting Started with Chord PRO and Rust

Preamble

In [2]:
:dep chord = { version = "0.1.6" }
use chord::{Chord, Plot};

Note

This introduction to ChordPRO was quickly put together to enable users to get started. You can see the most up-to-date API documentation at https://api.shahin.dev/docs.

Introduction

In a chord diagram (or radial network), entities are arranged radially as segments with their relationships visualised by arcs that connect them. The size of the segments illustrates the numerical proportions, whilst the size of the arc illustrates the significance of the relationships [1].

Chord diagrams are useful when trying to convey relationships between different entities, and they can be beautiful and eye-catching.

Get Chord Pro

Click here to get lifetime access to the full-featured chord visualization API, producing beautiful interactive visualizations, e.g. those featured on the front page of Reddit.

  • Produce beautiful interactive Chord diagrams.
  • Customize colours and font sizes.
  • Access Divided mode, enabling two sides to your diagram.
  • Symmetric and Asymmetric modes.
  • Add images and text on hover.
  • Access finer customisations including HTML injection.
  • Allows commercial use without open source requirement.
  • Currently supports Python, JavaScript, and Rust, with many more to come (accepting requests).

The Chord Crate

I wasn't able to find any Rust crates for plotting chord diagrams, so I ported my own from Python to Rust.

You can get the crate either from crates.io or from the GitHub repository. With your processed data, you should be able to plot something beautiful with just a single line, Chord { matrix, names, ..Chord::default() }.show()

License

To switch to the PRO version of the chord crate, you need to assign a valid username (the email you entered at purchase) and license key. This can be purchased here.

Chord {
    user: String::from("enter username here"),
    key: String::from("enter license key here"),
    matrix: matrix.clone(),
    names: names.clone(),
    wrap_labels: true,
    ..Chord::default()
}
.show();

Chord Diagrams with the Rich Hover Box

The Dataset

The focus for this section will be the demonstration of the chord package. To keep it simple, we will use synthetic data that illustrates the co-occurrences between movie genres within the same movie.

In [30]:
let matrix: Vec<Vec<f64>> = vec![
    vec![0., 5., 6., 4., 7., 4.],
    vec![5., 0., 5., 4., 6., 5.],
    vec![6., 5., 0., 4., 5., 5.],
    vec![4., 4., 4., 0., 5., 5.],
    vec![7., 6., 5., 5., 0., 4.],
    vec![4., 5., 5., 5., 4., 0.],
];

let names: Vec<String> = vec![
    "Action",
    "Adventure",
    "Comedy",
    "Drama",
    "Fantasy",
    "Thriller",
]
.into_iter()
.map(String::from)
.collect();

In the basic version of chord, matrix and names are the only sets of data that can be used to create a chord diagram. In the PRO version, you can also use details and details_thumbs. These enable the rich hover boxes.

In [31]:
let details : Vec<Vec<Vec<String>>> = vec![
    vec![vec![], vec!["Movie 1".to_string(),"Movie 2".to_string()], vec!["Movie 3".to_string(),"Movie 4".to_string(),"Movie 5".to_string()], vec!["Movie 6".to_string(),"Movie 7".to_string()], vec!["Movie 8".to_string(),"Movie 9".to_string(),"Movie 10".to_string(),"Movie 11".to_string()], vec!["Movie 12".to_string()]],
    vec![vec!["Movie 13".to_string(),"Movie 14".to_string()], vec![], vec!["Movie 15".to_string(),"Movie 16".to_string()], vec!["Movie 17".to_string()], vec!["Movie 18".to_string(),"Movie 19".to_string(),"Movie 20".to_string()], vec!["Movie 21".to_string(),"Movie 22".to_string()]],
    vec![vec!["Movie 23".to_string(),"Movie 24".to_string(),"Movie 25".to_string()], vec!["Movie 26".to_string(),"Movie 27".to_string()], vec![], vec!["Movie 28".to_string()], vec!["Movie 29".to_string(),"Movie 30".to_string()], vec!["Movie 31".to_string(),"Movie 32".to_string()]],
    vec![vec!["Movie 33".to_string()], vec!["Movie 34".to_string()], vec!["Movie 35".to_string()], vec![], vec!["Movie 36".to_string(),"Movie 37".to_string()], vec!["Movie 38".to_string(),"Movie 39".to_string()]],
    vec![vec!["Movie 40".to_string(),"Movie 41".to_string(),"Movie 42".to_string(),"Movie 43".to_string()], vec!["Movie 44".to_string(),"Movie 45".to_string(),"Movie 46".to_string()], vec!["Movie 47".to_string(),"Movie 48".to_string()], vec!["Movie 49".to_string(),"Movie 50".to_string()], vec![], vec!["Movie 51".to_string()]],
    vec![vec!["Movie 52".to_string()], vec!["Movie 53".to_string(),"Movie 54".to_string()], vec!["Movie 55".to_string(),"Movie 56".to_string()], vec!["Movie 57".to_string(),"Movie 58".to_string()], vec!["Movie 59".to_string()], vec![]]
];
In [32]:
let details_thumbs : Vec<Vec<Vec<String>>> = vec![
    vec![vec![], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string()]],
    vec![vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec![], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()]],
    vec![vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec![], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()]],
    vec![vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec![], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()]],
    vec![vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec![], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string()]],
    vec![vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string(),"https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec!["https://shahinrostami.com/images/stami-labs/lablet.png".to_string()], vec![]],
];

Let's see what Chord produces when we invoke the show() method.

In [33]:
Chord {
    user: String::from("enter username here"),
    key: String::from("enter license key here"),
    matrix: matrix.clone(),
    names: names.clone(),
    details: details,
    details_thumbs: details_thumbs,
    ..Chord::default()
}
.show();
Out[33]:
Chord Diagram

Chord Diagrams with Two Sides

In [34]:
let matrix: Vec<Vec<f64>> = vec![
    vec![0.0, 0.0, 0.0, 1.0, 4.0, 1.0],
    vec![0.0, 0.0, 0.0, 1.0, 3.0, 2.0],
    vec![0.0, 0.0, 0.0, 1.0, 2.0, 2.0],
    vec![1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    vec![4.0, 3.0, 2.0, 0.0, 0.0, 0.0],
    vec![1.0, 2.0, 2.0, 0.0, 0.0, 0.0],
];

let names: Vec<String> = vec!["A", "B", "C", "1", "2", "3"]
.into_iter()
.map(String::from)
.collect();

let colors: Vec<String> = vec!["#7400B8", "#5E60CE", "#5684D6", "#56CFE1", "#64DFDF", "#80FFDB"]
.into_iter()
.map(String::from)
.collect();

Chord {
    user: String::from("enter username here"),
    key: String::from("enter license key here"),
    matrix: matrix.clone(),
    names: names.clone(),
    colors: colors,
    divide: true,
    divide_idx: 3,
    ..Chord::default()
}.show();
Out[34]:
Chord Diagram

Conclusion

In this section, we've introduced the major PRO features of the chord crate. We used the crate and some synthetic data to demonstrate several chord diagram visualisations with different configurations. The chord Rust crate is available for free from crates.io or from the GitHub repository.


  1. Tintarev, N., Rostami, S., & Smyth, B. (2018, April). Knowing the unknown: visualising consumption blind-spots in recommender systems. In Proceedings of the 33rd Annual ACM Symposium on Applied Computing (pp. 1396-1399). 

Getting Started with Chord PRO and Python

Preamble

In [1]:
from chord import Chord

Note

This introduction to ChordPRO was quickly put together to enable users to get started. You can see the most up-to-date API documentation at https://api.shahin.dev/docs.

Introduction

In a chord diagram (or radial network), entities are arranged radially as segments with their relationships visualised by arcs that connect them. The size of the segments illustrates the numerical proportions, whilst the size of the arc illustrates the significance of the relationships [1].

Chord diagrams are useful when trying to convey relationships between different entities, and they can be beautiful and eye-catching.
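To make "numerical proportions" concrete, the sketch below computes the share of the circle each segment would take. It assumes segment sizes follow the row totals of the matrix, which is how the d3-chord layout sizes its groups, and it ignores the padding between segments. The function name segment_shares and the toy matrix are made up for this example.

```python
def segment_shares(matrix):
    """Share of the grand total that each row (segment) accounts for."""
    row_totals = [sum(row) for row in matrix]
    grand_total = sum(row_totals)
    return [total / grand_total for total in row_totals]


toy_matrix = [
    [0, 2, 2],
    [2, 0, 4],
    [2, 4, 0],
]

shares = segment_shares(toy_matrix)
print(shares)  # [0.25, 0.375, 0.375]
```

Here the first entity accounts for a quarter of all recorded relationships, so its segment would span roughly a quarter of the circle.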

Get Chord Pro

Click here to get lifetime access to the full-featured chord visualization API, producing beautiful interactive visualizations, e.g. those featured on the front page of Reddit.

  • Produce beautiful interactive Chord diagrams.
  • Customize colours and font sizes.
  • Access Divided mode, enabling two sides to your diagram.
  • Symmetric and Asymmetric modes.
  • Add images and text on hover.
  • Access finer customisations including HTML injection.
  • Allows commercial use without open source requirement.
  • Currently supports Python, JavaScript, and Rust, with many more to come (accepting requests).

The Chord Package

With Python in mind, there are many libraries available for creating Chord diagrams, such as Plotly, Bokeh, and a few that are lesser-known. However, I wanted to use the implementation from d3 because it can be customised to be highly interactive and to look beautiful.

I couldn't find anything that ticked all the boxes, so I made a wrapper around d3-chord myself. It took some time to get it working, but I wanted to hide away everything behind a single constructor and method call. The tricky part was enabling multiple chord diagrams on the same page, and then loading resources in a way that would support Jupyter Notebooks.

You can get the package either from PyPI using pip install chord or from the GitHub repository. With your processed data, you should be able to plot something beautiful with just a single line, Chord(matrix, names).show()

License

To switch to the PRO version of the chord package, you need to assign a valid username (the email you entered at purchase) and license key. This can be purchased here.

In [2]:
Chord.user = "your username"
Chord.key = "your license key"

The chord package switches to PRO mode when a username and license are specified. This enables the use of all the PRO features.

This uses the Chord PRO API service hosted on the DataCrayon.com (AWS hosted) server to generate your visualisation. Your parameter arguments (e.g. matrix, colors, etc) are sent to the API, which then generates and returns your HTML content.

Chord Diagrams with the Rich Hover Box

The Dataset

The focus for this section will be the demonstration of the chord package. To keep it simple, we will use synthetic data that illustrates the co-occurrences between movie genres within the same movie.

In [3]:
matrix = [
    [0, 2, 3, 1, 4, 1],
    [2, 0, 2, 1, 3, 2],
    [3, 2, 0, 1, 2, 2],
    [1, 1, 1, 0, 2, 2],
    [4, 3, 2, 2, 0, 1],
    [1, 2, 2, 2, 1, 0],
]

names = ["Action", "Adventure", "Comedy", "Drama", "Fantasy", "Thriller"]
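Before handing data like this to a plotting call, it can help to check that the matrix and the names fit together. The following is a minimal sketch of such a check. check_chord_inputs is a hypothetical helper written for this example and is not part of the chord package.

```python
def check_chord_inputs(matrix, names, symmetric=True):
    """Raise ValueError if matrix and names do not fit together."""
    n = len(names)
    # the matrix needs one row and one column per name
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix must have one row and one column per name")
    if symmetric:
        # co-occurrence counts should be the same in both directions
        for i in range(n):
            for j in range(i + 1, n):
                if matrix[i][j] != matrix[j][i]:
                    raise ValueError(f"matrix is not symmetric at ({i}, {j})")
    return True


matrix = [
    [0, 2, 3, 1, 4, 1],
    [2, 0, 2, 1, 3, 2],
    [3, 2, 0, 1, 2, 2],
    [1, 1, 1, 0, 2, 2],
    [4, 3, 2, 2, 0, 1],
    [1, 2, 2, 2, 1, 0],
]
names = ["Action", "Adventure", "Comedy", "Drama", "Fantasy", "Thriller"]

print(check_chord_inputs(matrix, names))
```

For a matrix that is meant to be asymmetric, the symmetry check can be skipped with symmetric=False.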

In the basic version of chord, matrix and names are the only sets of data that can be used to create a chord diagram. In the PRO version, you can also use details and details_thumbs. These enable the rich hover boxes.

In [4]:
details = [
    [[], ["Movie 1","Movie 2"], ["Movie 3","Movie 4","Movie 5"], ["Movie 6","Movie 7"], ["Movie 8","Movie 9","Movie 10","Movie 11"], ["Movie 12"]],
    [["Movie 13","Movie 14"], [], ["Movie 15","Movie 16"], ["Movie 17"], ["Movie 18","Movie 19","Movie 20"], ["Movie 21","Movie 22"]],
    [["Movie 23","Movie 24","Movie 25"], ["Movie 26","Movie 27"], [], ["Movie 28"], ["Movie 29","Movie 30"], ["Movie 31","Movie 32"]],
    [["Movie 33"], ["Movie 34"], ["Movie 35"], [], ["Movie 36","Movie 37"], ["Movie 38","Movie 39"]],
    [["Movie 40","Movie 41","Movie 42","Movie 43"], ["Movie 44","Movie 45","Movie 46"], ["Movie 47","Movie 48"], ["Movie 49","Movie 50"], [], ["Movie 51"]],
    [["Movie 52"], ["Movie 53","Movie 54"], ["Movie 55","Movie 56"], ["Movie 57","Movie 58"], ["Movie 59"], []],
]
In [5]:
details_thumbs = [
    [[], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png"]],
    [["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], [], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"]],
    [["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], [], ["https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"]],
    [["https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png"], [], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"]],
    [["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], [], ["https://shahinrostami.com/images/stami-labs/lablet.png"]],
    [["https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png","https://shahinrostami.com/images/stami-labs/lablet.png"], ["https://shahinrostami.com/images/stami-labs/lablet.png"], []],
]
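In the cells above, details_thumbs has the same nested shape as details, with one image URL for each detail entry. When every entry uses the same placeholder image, as it does here, the thumbnails can be derived from details rather than typed out. A minimal sketch on a shortened two-by-two details list (the constant name THUMB_URL is an assumption made for this example):

```python
THUMB_URL = "https://shahinrostami.com/images/stami-labs/lablet.png"

# shortened details list, for illustration
details = [
    [[], ["Movie 1", "Movie 2"]],
    [["Movie 3"], []],
]

# one thumbnail URL per detail entry, keeping the same nested shape
details_thumbs = [[[THUMB_URL] * len(cell) for cell in row] for row in details]

for detail_row, thumb_row in zip(details, details_thumbs):
    print([len(cell) for cell in detail_row], [len(cell) for cell in thumb_row])
```

With real data, each URL would normally point at a different image, so the inner expression would look the URL up per entry instead of repeating a constant.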

Let's see what the Chord() defaults produce when we invoke the show() method.

In [6]:
Chord(matrix, names, details=details, details_thumbs=details_thumbs).show()
Chord Diagram

Chord Diagrams with Two Sides

In [7]:
matrix = [
    [0, 0, 0, 1, 4, 1],
    [0, 0, 0, 1, 3, 2],
    [0, 0, 0, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
    [4, 3, 2, 0, 0, 0],
    [1, 2, 2, 0, 0, 0],
]

names = ["A", "B", "C", "1", "2", "3"]
hex_colours = ["#7400B8", "#5E60CE", "#5684D6", "#56CFE1", "#64DFDF", "#80FFDB"]

Chord(matrix, names, title="A chord diagram with two sides.",
      colors=hex_colours, divide=True, divide_idx=3).show()
Chord Diagram
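The matrix in the cell above has a block structure: the entities on one side (A, B, C) only connect to the entities on the other side (1, 2, 3), so the top-left and bottom-right blocks are all zeros. A matrix like this can be assembled from the smaller table of links between the two sides. The sketch below uses NumPy, and divided_matrix is a hypothetical helper name, not part of the chord package.

```python
import numpy as np


def divided_matrix(between):
    """Build a symmetric two-sided matrix from a left-by-right table of links."""
    between = np.asarray(between)
    n_left, n_right = between.shape
    return np.block([
        [np.zeros((n_left, n_left), dtype=between.dtype), between],
        [between.T, np.zeros((n_right, n_right), dtype=between.dtype)],
    ]).tolist()


# rows: A, B, C; columns: 1, 2, 3
between = [
    [1, 4, 1],
    [1, 3, 2],
    [1, 2, 2],
]

matrix = divided_matrix(between)
for row in matrix:
    print(row)
```

The result matches the matrix typed out above, and the number of rows in between (3) is the value passed as divide_idx.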

Optional Customisations

Chord PRO gives you finer control over the look and interactive components of the diagram. The parameters and their defaults include the following:

colors="d3.schemeSet1",
opacity=0.8,
padding=0.01,
width=700,
label_color="#454545",
wrap_labels=True,
margin=0,
credit=False,
font_size="16px",
font_size_large="20px",
details=[],
details_thumbs=[],
thumbs_width=85,
thumbs_margin=5,
thumbs_font_size=14,
popup_width=350,
noun="instances",
details_separator=", ",
divide=False,
divide_idx=0,
divide_size=0.5,
verb="occur together in",
symmetric=True,
title="",
arc_numbers=False,
divide_left_label="",
divide_right_label="",
inner_radius_scale=0.39,
outer_radius_scale=1.1,

You can see the most up-to-date API documentation at https://api.shahin.dev/docs.

Let's see a few of these in action.

In [8]:
matrix = [
    [0, 0, 0, 1, 4, 1],
    [0, 0, 0, 1, 3, 2],
    [0, 0, 0, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
    [4, 3, 2, 0, 0, 0],
    [1, 2, 2, 0, 0, 0],
]

names = ["A", "B", "C", "1", "2", "3"]
hex_colours = ["#ffadad", "#ffd6a5", "#fdffb6", "#caffbf", "#9bf6ff", "#a0c4ff"]

Chord(matrix, names, title="A chord diagram with customisations.",
      colors=hex_colours, divide=True, divide_idx=3,
      divide_size = 0.9, opacity=0.4, padding=0.01,
      outer_radius_scale=1.2, inner_radius_scale=0.35,
      divide_left_label="Numbers", divide_right_label="Letters",
      verb="do something together", noun="things",
      allow_download=True).show()
Chord Diagram

Conclusion

In this section, we've introduced major PRO features of the chord package. We used the package and some synthetic data to demonstrate several chord diagram visualisations with different configurations. The chord Python package is available for free using pip install chord.


  1. Tintarev, N., Rostami, S., & Smyth, B. (2018, April). Knowing the unknown: visualising consumption blind-spots in recommender systems. In Proceedings of the 33rd Annual ACM Symposium on Applied Computing (pp. 1396-1399). 

Animal Crossing Villagers - Style Co-occurrence

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from chord import Chord

Introduction

In previous sections, we visualised co-occurrences of Pokémon types. In this section, we're going to use a dataset of Animal Crossing villagers to visualise the co-occurrence of villager styles.

The Dataset

The dataset has one row per villager, with variables such as species, gender, personality, hobby, and two style columns.

Let's download the mirrored dataset and have a look for ourselves.

In [3]:
data_url = 'https://shahinrostami.com/datasets/ac/villagers.csv'
data = pd.read_csv(data_url)
data.head()
Out[3]:
Name Species Gender Personality Hobby Birthday Catchphrase Favorite Song Style 1 Style 2 Color 1 Color 2 Wallpaper Flooring Furniture List Filename Unique Entry ID
0 Admiral Bird Male Cranky Nature 27-Jan aye aye Steep Hill Cool Cool Black Blue dirt-clod wall tatami 717;1849;7047;2736;787;5970;3449;3622;3802;410... brd06 B3RyfNEqwGmcccRC3
1 Agent S Squirrel Female Peppy Fitness 2-Jul sidekick Go K.K. Rider Active Simple Blue Black concrete wall colorful tile flooring 7845;7150;3468;4080;290;3971;3449;1708;4756;25... squ05 SGMdki6dzpDZyXAw5
2 Agnes Pig Female Big Sister Play 21-Apr snuffle K.K. House Simple Elegant Pink White gray molded-panel wall arabesque flooring 4129;7236;7235;7802;896;3428;4027;7325;3958;71... pig17 jzWCiDPm9MqtCfecP
3 Al Gorilla Male Lazy Fitness 18-Oct ayyyeee Go K.K. Rider Active Active Red White concrete wall green rubber flooring 1452;4078;4013;833;4116;3697;7845;3307;3946;39... gor08 LBifxETQJGEaLhBjC
4 Alfonso Alligator Male Lazy Play 9-Jun it'sa me Forest Life Simple Simple Red Blue yellow playroom wall green honeycomb tile 4763;3205;3701;1557;3623;85;3208;3584;4761;121... crd00 REpd8KxB8p9aGBRSE

It looks good so far, with one row per villager.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the villager styles are split between the columns Style 1 and Style 2.

In [4]:
pd.DataFrame(data.columns.values.tolist())
Out[4]:
0
0 Name
1 Species
2 Gender
3 Personality
4 Hobby
5 Birthday
6 Catchphrase
7 Favorite Song
8 Style 1
9 Style 2
10 Color 1
11 Color 2
12 Wallpaper
13 Flooring
14 Furniture List
15 Filename
16 Unique Entry ID

So let's work with just these two columns as we move forward.

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

First we'll populate our list of style names by looking for the unique ones.

In [5]:
names = np.unique(data[['Style 1', 'Style 2']]).tolist()
pd.DataFrame(names)
Out[5]:
0
0 Active
1 Cool
2 Cute
3 Elegant
4 Gorgeous
5 Simple

Now we can create our empty co-occurrence matrix using these style names for the row and column indices.

In [6]:
matrix = pd.DataFrame(0, index=names, columns=names)
matrix
Out[6]:
Active Cool Cute Elegant Gorgeous Simple
Active 0 0 0 0 0 0
Cool 0 0 0 0 0 0
Cute 0 0 0 0 0 0
Elegant 0 0 0 0 0 0
Gorgeous 0 0 0 0 0 0
Simple 0 0 0 0 0 0

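We can populate the co-occurrence matrix by iterating over the villagers. When a villager's two styles differ, we add one to the (Style 1, Style 2) cell and to its mirrored (Style 2, Style 1) cell, so that the matrix stays symmetric. When both styles are the same, we add one to the corresponding cell on the diagonal.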

In [7]:
for index, x in data.iterrows():
    if x['Style 1'] != x['Style 2']:
        # different styles: count the pairing in both directions
        matrix.at[x['Style 1'], x['Style 2']] += 1
        matrix.at[x['Style 2'], x['Style 1']] += 1
    else:
        # identical styles: count once on the diagonal
        matrix.at[x['Style 1'], x['Style 1']] += 1

matrix = matrix.values.tolist()
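
As a cross-check on the loop above, the same symmetric counts can be built without iterating row by row. The following is a minimal sketch on made-up toy data (the four rows and the toy_ variable names are assumptions for illustration, not taken from the dataset). It uses pd.crosstab for the directed counts, mirrors them, and then removes the diagonal that the mirroring added twice.

```python
import numpy as np
import pandas as pd

# toy data, invented purely for illustration
toy = pd.DataFrame({
    "Style 1": ["Cool", "Cute", "Cool", "Simple"],
    "Style 2": ["Cute", "Cool", "Cool", "Cute"],
})
toy_names = sorted(set(toy["Style 1"]) | set(toy["Style 2"]))

# directed counts: rows are Style 1, columns are Style 2
counts = pd.crosstab(toy["Style 1"], toy["Style 2"]).reindex(
    index=toy_names, columns=toy_names, fill_value=0).to_numpy()

# mirror the counts, then subtract the diagonal that was added twice
symmetric = counts + counts.T - np.diag(np.diag(counts))
toy_matrix = pd.DataFrame(symmetric, index=toy_names, columns=toy_names)
```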

We can wrap the list in a DataFrame for better presentation.

In [8]:
pd.DataFrame(matrix)
Out[8]:
0 1 2 3 4 5
0 6 22 13 3 5 45
1 22 9 0 20 22 45
2 13 0 16 22 10 48
3 3 20 22 1 46 17
4 5 22 10 46 1 7
5 45 45 48 17 7 33
In [9]:
colors = ["#fee440","#00bbf9","#00f5d4","#9b5de5","#f15bb5","#f68251"]


Chord(matrix, names, colors=colors).show()
Chord Diagram

Chord Diagram with Names

It would be nice to show a list of villager names and images when hovering over co-occurring styles. To do this, we can make use of the optional details and details_thumbs parameters.

Let's also add a column to our dataset to store URLs that point to the images.

In [10]:
data['URL'] = ""

for index, row in data.iterrows():
    # build the image URL from the villager's name: spaces and apostrophes become underscores, full stops are dropped
    url = "https://shahinrostami.com/images/data-is-beautiful/villagers/{}.png".format(row.Name.replace(' ', '_').replace("'", '_').replace('.', ''))
    data.at[index,'URL'] = url
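
The chain of replacements above can also be wrapped in a small function, which makes it easier to try on individual names. This is only a sketch: villager_image_url is a hypothetical helper name, and the last example name is invented to exercise the apostrophe and full stop handling.

```python
def villager_image_url(
        name,
        base="https://shahinrostami.com/images/data-is-beautiful/villagers"):
    """Hypothetical helper mirroring the replacements used in the loop above."""
    filename = name.replace(' ', '_').replace("'", '_').replace('.', '')
    return f"{base}/{filename}.png"

# "Agent S" and "Big Top" appear in the dataset; the third name is made up
example_urls = [villager_image_url(n)
                for n in ["Agent S", "Big Top", "Test O'Name Jr."]]
```
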
In [11]:
details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)

Now we can populate the details arrays with lists of villager names and image URLs in the correct positions.

In [12]:
details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)

for count_x, item_x in enumerate(names):
    for count_y, item_y in enumerate(names):
        # villagers whose two styles are item_x and item_y, in either order
        mask = (
            ((data['Style 1'] == item_x) & (data['Style 2'] == item_y)) |
            ((data['Style 2'] == item_x) & (data['Style 1'] == item_y)))

        # pairings with no villagers end up as empty lists
        details[count_x][count_y] = data[mask]['Name'].to_list()
        details_thumbs[count_x][count_y] = data[mask]['URL'].to_list()

details=pd.DataFrame(details).values.tolist()
details_thumbs=pd.DataFrame(details_thumbs).values.tolist()
In [13]:
print(data.Name)
0       Admiral
1       Agent S
2         Agnes
3            Al
4       Alfonso
         ...   
386      Winnie
387    Wolfgang
388        Yuka
389        Zell
390      Zucker
Name: Name, Length: 391, dtype: object
In [14]:
pd.DataFrame(details)
Out[14]:
0 1 2 3 4 5
0 [Al, Bill, Coach, Jay, Jitters, Rooney] [Astrid, Bella, Camofrog, Canberra, Cyd, Cyran... [Audie, Bubbles, Charlise, Dom, Flora, Frita, ... [Annalise, Mott, Pierce] [Biff, Boots, Curlos, Leonardo, Lionel] [Agent S, Axel, Bam, Big Top, Billy, Buck, Bud...
1 [Astrid, Bella, Camofrog, Canberra, Cyd, Cyran... [Admiral, Angus, Curt, Fuchsia, Ike, Jambette,... [] [Amelia, Boone, Cesar, Cherry, Croque, Freya, ... [Boris, Chow, Eugene, Flo, Frank, Gruff, Julia... [Apollo, Barold, Bea, Boomer, Boyd, Bruce, But...
2 [Audie, Bubbles, Charlise, Dom, Flora, Frita, ... [] [Alice, Bianca, Bunnie, Carrie, Chrissy, Cooki... [Aurora, Ava, Bertha, Bitty, Bonbon, Cally, Ca... [Bangle, Caroline, Gabi, Mint, Pancetti, Penel... [Anabelle, Apple, Beau, Bluebear, Bob, Bones, ...
3 [Annalise, Mott, Pierce] [Amelia, Boone, Cesar, Cherry, Croque, Freya, ... [Aurora, Ava, Bertha, Bitty, Bonbon, Cally, Ca... [Beardo] [Alli, Annalisa, Baabara, Becky, Blaire, Blanc... [Agnes, Anicotti, Bettina, Clay, Derwin, Doc, ...
4 [Biff, Boots, Curlos, Leonardo, Lionel] [Boris, Chow, Eugene, Flo, Frank, Gruff, Julia... [Bangle, Caroline, Gabi, Mint, Pancetti, Penel... [Alli, Annalisa, Baabara, Becky, Blaire, Blanc... [Lopez] [Ankha, Avery, Biskit, Gaston, Pietro, Sylvia,...
5 [Agent S, Axel, Bam, Big Top, Billy, Buck, Bud... [Apollo, Barold, Bea, Boomer, Boyd, Bruce, But... [Anabelle, Apple, Beau, Bluebear, Bob, Bones, ... [Agnes, Anicotti, Bettina, Clay, Derwin, Doc, ... [Ankha, Avery, Biskit, Gaston, Pietro, Sylvia,... [Alfonso, Anchovy, Antonio, Benedict, Benjamin...
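
The details matrix above filters the DataFrame once per cell. For comparison, the sketch below groups the names in a single pass by keying each row on the unordered pair of styles. It runs on made-up toy data (the names A to D, the toy_ variables, and names_by_pair are assumptions for illustration) and is not the approach used in this section.

```python
from collections import defaultdict

import pandas as pd

# toy data, invented purely for illustration
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Style 1": ["Cool", "Cute", "Cool", "Simple"],
    "Style 2": ["Cute", "Cool", "Cool", "Cute"],
})
toy_names = sorted(set(toy["Style 1"]) | set(toy["Style 2"]))

# a frozenset ignores order, so (Cool, Cute) and (Cute, Cool) share a key,
# and a repeated style such as (Cool, Cool) collapses to a single-item key
names_by_pair = defaultdict(list)
for name, s1, s2 in zip(toy["Name"], toy["Style 1"], toy["Style 2"]):
    names_by_pair[frozenset((s1, s2))].append(name)

# look up every cell of the matrix, falling back to an empty list
toy_details = [[names_by_pair.get(frozenset((x, y)), []) for y in toy_names]
               for x in toy_names]
```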

Finally, we can put it all together but this time with the details matrix passed in.

In [15]:
Chord(
    matrix,
    names,
    colors=colors,
    details=details,
    details_thumbs=details_thumbs,
    noun="Villagers",
    thumbs_width=50,
    thumbs_margin=0,
    popup_width=740,
    thumbs_font_size=10,
    credit=True,
    arc_numbers=True
).show()
Chord Diagram

Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!

League of Legends - Class Combinations

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from chord import Chord

Introduction

In previous sections, we visualised co-occurrences of Pokémon types. In this section, we're going to use a League of Legends champion dataset to visualise how often champion classes, stored as tags such as Fighter and Tank, occur together.

The Dataset

The dataset is a JSON file of champion data for game version 10.13.1.

Let's download the mirrored dataset and have a look for ourselves.

In [26]:
data_url = 'https://datacrayon.com/datasets/lol/champion.json'
data = pd.read_json(data_url)
data.head()
Out[26]:
type format version data
Aatrox champion standAloneComplex 10.13.1 {'version': '10.13.1', 'id': 'Aatrox', 'key': ...
Ahri champion standAloneComplex 10.13.1 {'version': '10.13.1', 'id': 'Ahri', 'key': '1...
Akali champion standAloneComplex 10.13.1 {'version': '10.13.1', 'id': 'Akali', 'key': '...
Alistar champion standAloneComplex 10.13.1 {'version': '10.13.1', 'id': 'Alistar', 'key':...
Amumu champion standAloneComplex 10.13.1 {'version': '10.13.1', 'id': 'Amumu', 'key': '...

The details of each champion are nested as a dictionary in the data column, so let's expand them into a DataFrame with one column per field, keeping the champion identifiers as the index.

In [3]:
data = pd.DataFrame(data.data.tolist()).set_index(data.index)
In [4]:
data
Out[4]:
version id key name title blurb info image tags partype stats
Aatrox 10.13.1 Aatrox 266 Aatrox the Darkin Blade Once honored defenders of Shurima against the ... {'attack': 8, 'defense': 4, 'magic': 3, 'diffi... {'full': 'Aatrox.png', 'sprite': 'champion0.pn... [Fighter, Tank] Blood Well {'hp': 580, 'hpperlevel': 90, 'mp': 0, 'mpperl...
Ahri 10.13.1 Ahri 103 Ahri the Nine-Tailed Fox Innately connected to the latent power of Rune... {'attack': 3, 'defense': 4, 'magic': 8, 'diffi... {'full': 'Ahri.png', 'sprite': 'champion0.png'... [Mage, Assassin] Mana {'hp': 526, 'hpperlevel': 92, 'mp': 418, 'mppe...
Akali 10.13.1 Akali 84 Akali the Rogue Assassin Abandoning the Kinkou Order and her title of t... {'attack': 5, 'defense': 3, 'magic': 8, 'diffi... {'full': 'Akali.png', 'sprite': 'champion0.png... [Assassin] Energy {'hp': 575, 'hpperlevel': 95, 'mp': 200, 'mppe...
Alistar 10.13.1 Alistar 12 Alistar the Minotaur Always a mighty warrior with a fearsome reputa... {'attack': 6, 'defense': 9, 'magic': 5, 'diffi... {'full': 'Alistar.png', 'sprite': 'champion0.p... [Tank, Support] Mana {'hp': 600, 'hpperlevel': 106, 'mp': 350, 'mpp...
Amumu 10.13.1 Amumu 32 Amumu the Sad Mummy Legend claims that Amumu is a lonely and melan... {'attack': 2, 'defense': 6, 'magic': 8, 'diffi... {'full': 'Amumu.png', 'sprite': 'champion0.png... [Tank, Mage] Mana {'hp': 613.12, 'hpperlevel': 84, 'mp': 287.2, ...
... ... ... ... ... ... ... ... ... ... ... ...
Zed 10.13.1 Zed 238 Zed the Master of Shadows Utterly ruthless and without mercy, Zed is the... {'attack': 9, 'defense': 2, 'magic': 1, 'diffi... {'full': 'Zed.png', 'sprite': 'champion4.png',... [Assassin] Energy {'hp': 584, 'hpperlevel': 85, 'mp': 200, 'mppe...
Ziggs 10.13.1 Ziggs 115 Ziggs the Hexplosives Expert With a love of big bombs and short fuses, the ... {'attack': 2, 'defense': 4, 'magic': 9, 'diffi... {'full': 'Ziggs.png', 'sprite': 'champion4.png... [Mage] Mana {'hp': 536, 'hpperlevel': 92, 'mp': 480, 'mppe...
Zilean 10.13.1 Zilean 26 Zilean the Chronokeeper Once a powerful Icathian mage, Zilean became o... {'attack': 2, 'defense': 5, 'magic': 8, 'diffi... {'full': 'Zilean.png', 'sprite': 'champion4.pn... [Support, Mage] Mana {'hp': 504, 'hpperlevel': 82, 'mp': 452, 'mppe...
Zoe 10.13.1 Zoe 142 Zoe the Aspect of Twilight As the embodiment of mischief, imagination, an... {'attack': 1, 'defense': 7, 'magic': 8, 'diffi... {'full': 'Zoe.png', 'sprite': 'champion4.png',... [Mage, Support] Mana {'hp': 560, 'hpperlevel': 92, 'mp': 425, 'mppe...
Zyra 10.13.1 Zyra 143 Zyra Rise of the Thorns Born in an ancient, sorcerous catastrophe, Zyr... {'attack': 4, 'defense': 3, 'magic': 8, 'diffi... {'full': 'Zyra.png', 'sprite': 'champion4.png'... [Mage, Support] Mana {'hp': 504, 'hpperlevel': 79, 'mp': 418, 'mppe...

148 rows × 11 columns
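
To see what the expansion of the nested data column does, it can help to try it on a tiny frame. The sketch below uses two champions with only the id and tags fields (a cut-down assumption of the real records) and passes the index directly to the DataFrame constructor, which should be equivalent to calling set_index afterwards.

```python
import pandas as pd

# a cut-down stand-in for the downloaded JSON, for illustration only
raw = pd.DataFrame(
    {
        "version": ["10.13.1", "10.13.1"],
        "data": [
            {"id": "Aatrox", "tags": ["Fighter", "Tank"]},
            {"id": "Akali", "tags": ["Assassin"]},
        ],
    },
    index=["Aatrox", "Akali"],
)

# each dictionary becomes a row and each key becomes a column
expanded = pd.DataFrame(raw["data"].tolist(), index=raw.index)
```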

We now have 148 champions, each described by 11 columns.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the champion classes are stored in the tags column.

In [5]:
pd.DataFrame(data.columns.values.tolist())
Out[5]:
0
0 version
1 id
2 key
3 name
4 title
5 blurb
6 info
7 image
8 tags
9 partype
10 stats

Each entry in tags is a list holding one or two classes, so this is the only column we need as we move forward.

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

First we'll populate our list of class names by flattening the tags and looking for the unique values.

In [6]:
types = [item for sublist in data.tags.tolist() for item in sublist]
names = np.unique(types).tolist()
pd.DataFrame(names)
Out[6]:
0
0 Assassin
1 Fighter
2 Mage
3 Marksman
4 Support
5 Tank
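
The flattening step above uses a nested list comprehension. As a hedged alternative, pandas can do the same with Series.explode, which puts every list item on its own row. The three champions below are taken from the table above, and the toy_ names are assumptions for illustration.

```python
import pandas as pd

# tags for three champions, as shown in the table above
toy_tags = pd.Series(
    [["Fighter", "Tank"], ["Mage", "Assassin"], ["Assassin"]],
    index=["Aatrox", "Ahri", "Akali"],
)

# explode gives one row per class, unique removes the repeats
toy_class_names = sorted(toy_tags.explode().unique())
```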

Now we can create our empty co-occurrence matrix using these class names for the row and column indices.

In [7]:
matrix = pd.DataFrame(0, index=names, columns=names)
matrix
Out[7]:
Assassin Fighter Mage Marksman Support Tank
Assassin 0 0 0 0 0 0
Fighter 0 0 0 0 0 0
Mage 0 0 0 0 0 0
Marksman 0 0 0 0 0 0
Support 0 0 0 0 0 0
Tank 0 0 0 0 0 0

We can populate the co-occurrence matrix by iterating over the tags of each champion. A champion with two classes increments both the original and the reversed pairing, and a champion with a single class increments the diagonal. Let's first check how many classes the first champion has.

In [8]:
len(data.tags.iloc[0])
Out[8]:
2
In [9]:
for x in data.tags:
    if(len(x) == 2):
        # two classes: count the pairing in both directions
        matrix.at[x[0], x[1]] += 1
        matrix.at[x[1], x[0]] += 1
    elif(len(x) == 1):
        # a single class: count once on the diagonal
        matrix.at[x[0], x[0]] += 1

matrix = matrix.values.tolist()
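
The loop above handles lists of one or two tags, which is all this dataset contains. If a dataset had longer lists, one possible generalisation is to count every pair within a list using itertools.combinations. The sketch below is an assumption about how that could look, run on a small made-up list of tags rather than the real data.

```python
import itertools

import pandas as pd

# made-up tag lists, for illustration only
toy_tag_lists = [["Fighter", "Tank"], ["Assassin"],
                 ["Mage", "Assassin"], ["Fighter", "Tank"]]
toy_classes = sorted({tag for tags in toy_tag_lists for tag in tags})
toy_matrix = pd.DataFrame(0, index=toy_classes, columns=toy_classes)

for tags in toy_tag_lists:
    if len(tags) == 1:
        # a single class is counted once on the diagonal
        toy_matrix.at[tags[0], tags[0]] += 1
    # every unordered pair is counted in both directions;
    # a single-item list yields no pairs, so this loop is skipped
    for a, b in itertools.combinations(tags, 2):
        toy_matrix.at[a, b] += 1
        toy_matrix.at[b, a] += 1
```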

We can wrap the list in a DataFrame for better presentation.

In [10]:
pd.DataFrame(matrix)
Out[10]:
0 1 2 3 4 5
0 5 17 8 5 1 0
1 17 3 6 1 3 34
2 8 6 13 6 21 4
3 5 1 6 13 2 0
4 1 3 21 2 1 4
5 0 34 4 0 4 1
In [11]:
colors = ["#ffbe0b","#fb5607","#ff006e","#8338ec","#3a86ff","#80FF72"]

Chord(matrix, names, colors=colors).show()
Chord Diagram

Chord Diagram with Names

It would be nice to show a list of champion names and images when hovering over co-occurring classes. To do this, we can make use of the optional details and details_thumbs parameters.

Let's also add a column to our dataset to store URLs that point to the images.

In [12]:
data['URL'] = ""

for index, row in data.iterrows():
    # row.name is the row's index label (the champion id), not the 'name' column
    url = f"https://shahinrostami.com/images/data-is-beautiful/lol/champion/{row.name}.png"
    data.at[index,'URL'] = url
In [13]:
data.URL
Out[13]:
Aatrox     https://shahinrostami.com/images/data-is-beaut...
Ahri       https://shahinrostami.com/images/data-is-beaut...
Akali      https://shahinrostami.com/images/data-is-beaut...
Alistar    https://shahinrostami.com/images/data-is-beaut...
Amumu      https://shahinrostami.com/images/data-is-beaut...
                                 ...                        
Zed        https://shahinrostami.com/images/data-is-beaut...
Ziggs      https://shahinrostami.com/images/data-is-beaut...
Zilean     https://shahinrostami.com/images/data-is-beaut...
Zoe        https://shahinrostami.com/images/data-is-beaut...
Zyra       https://shahinrostami.com/images/data-is-beaut...
Name: URL, Length: 148, dtype: object
In [14]:
data.loc['Akali']
Out[14]:
version                                              10.13.1
id                                                     Akali
key                                                       84
name                                                   Akali
title                                     the Rogue Assassin
blurb      Abandoning the Kinkou Order and her title of t...
info       {'attack': 5, 'defense': 3, 'magic': 8, 'diffi...
image      {'full': 'Akali.png', 'sprite': 'champion0.png...
tags                                              [Assassin]
partype                                               Energy
stats      {'hp': 575, 'hpperlevel': 95, 'mp': 200, 'mppe...
URL        https://shahinrostami.com/images/data-is-beaut...
Name: Akali, dtype: object
In [15]:
names
Out[15]:
['Assassin', 'Fighter', 'Mage', 'Marksman', 'Support', 'Tank']

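Next, we'll split the tags into two columns, tag_1 and tag_2, which are easier to filter on. Champions with a single class get None in tag_2. We'll then create empty multi-dimensional arrays with the same shape as our matrix for our details and thumbnail images.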

In [16]:
data[['tag_1','tag_2']] = pd.DataFrame(data.tags.tolist(), index= data.index)
In [17]:
data
Out[17]:
version id key name title blurb info image tags partype stats URL tag_1 tag_2
Aatrox 10.13.1 Aatrox 266 Aatrox the Darkin Blade Once honored defenders of Shurima against the ... {'attack': 8, 'defense': 4, 'magic': 3, 'diffi... {'full': 'Aatrox.png', 'sprite': 'champion0.pn... [Fighter, Tank] Blood Well {'hp': 580, 'hpperlevel': 90, 'mp': 0, 'mpperl... https://shahinrostami.com/images/data-is-beaut... Fighter Tank
Ahri 10.13.1 Ahri 103 Ahri the Nine-Tailed Fox Innately connected to the latent power of Rune... {'attack': 3, 'defense': 4, 'magic': 8, 'diffi... {'full': 'Ahri.png', 'sprite': 'champion0.png'... [Mage, Assassin] Mana {'hp': 526, 'hpperlevel': 92, 'mp': 418, 'mppe... https://shahinrostami.com/images/data-is-beaut... Mage Assassin
Akali 10.13.1 Akali 84 Akali the Rogue Assassin Abandoning the Kinkou Order and her title of t... {'attack': 5, 'defense': 3, 'magic': 8, 'diffi... {'full': 'Akali.png', 'sprite': 'champion0.png... [Assassin] Energy {'hp': 575, 'hpperlevel': 95, 'mp': 200, 'mppe... https://shahinrostami.com/images/data-is-beaut... Assassin None
Alistar 10.13.1 Alistar 12 Alistar the Minotaur Always a mighty warrior with a fearsome reputa... {'attack': 6, 'defense': 9, 'magic': 5, 'diffi... {'full': 'Alistar.png', 'sprite': 'champion0.p... [Tank, Support] Mana {'hp': 600, 'hpperlevel': 106, 'mp': 350, 'mpp... https://shahinrostami.com/images/data-is-beaut... Tank Support
Amumu 10.13.1 Amumu 32 Amumu the Sad Mummy Legend claims that Amumu is a lonely and melan... {'attack': 2, 'defense': 6, 'magic': 8, 'diffi... {'full': 'Amumu.png', 'sprite': 'champion0.png... [Tank, Mage] Mana {'hp': 613.12, 'hpperlevel': 84, 'mp': 287.2, ... https://shahinrostami.com/images/data-is-beaut... Tank Mage
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Zed 10.13.1 Zed 238 Zed the Master of Shadows Utterly ruthless and without mercy, Zed is the... {'attack': 9, 'defense': 2, 'magic': 1, 'diffi... {'full': 'Zed.png', 'sprite': 'champion4.png',... [Assassin] Energy {'hp': 584, 'hpperlevel': 85, 'mp': 200, 'mppe... https://shahinrostami.com/images/data-is-beaut... Assassin None
Ziggs 10.13.1 Ziggs 115 Ziggs the Hexplosives Expert With a love of big bombs and short fuses, the ... {'attack': 2, 'defense': 4, 'magic': 9, 'diffi... {'full': 'Ziggs.png', 'sprite': 'champion4.png... [Mage] Mana {'hp': 536, 'hpperlevel': 92, 'mp': 480, 'mppe... https://shahinrostami.com/images/data-is-beaut... Mage None
Zilean 10.13.1 Zilean 26 Zilean the Chronokeeper Once a powerful Icathian mage, Zilean became o... {'attack': 2, 'defense': 5, 'magic': 8, 'diffi... {'full': 'Zilean.png', 'sprite': 'champion4.pn... [Support, Mage] Mana {'hp': 504, 'hpperlevel': 82, 'mp': 452, 'mppe... https://shahinrostami.com/images/data-is-beaut... Support Mage
Zoe 10.13.1 Zoe 142 Zoe the Aspect of Twilight As the embodiment of mischief, imagination, an... {'attack': 1, 'defense': 7, 'magic': 8, 'diffi... {'full': 'Zoe.png', 'sprite': 'champion4.png',... [Mage, Support] Mana {'hp': 560, 'hpperlevel': 92, 'mp': 425, 'mppe... https://shahinrostami.com/images/data-is-beaut... Mage Support
Zyra 10.13.1 Zyra 143 Zyra Rise of the Thorns Born in an ancient, sorcerous catastrophe, Zyr... {'attack': 4, 'defense': 3, 'magic': 8, 'diffi... {'full': 'Zyra.png', 'sprite': 'champion4.png'... [Mage, Support] Mana {'hp': 504, 'hpperlevel': 79, 'mp': 418, 'mppe... https://shahinrostami.com/images/data-is-beaut... Mage Support

148 rows × 14 columns

In [19]:
details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)

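Now we can populate the details arrays with lists of champion names and image URLs in the correct positions. Champions with a single class have a null tag_2, which we can check using the Assassin class.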

In [20]:
data[(data['tag_1'] == "Assassin") & (data["tag_2"].isnull())]
Out[20]:
version id key name title blurb info image tags partype stats URL tag_1 tag_2
Akali 10.13.1 Akali 84 Akali the Rogue Assassin Abandoning the Kinkou Order and her title of t... {'attack': 5, 'defense': 3, 'magic': 8, 'diffi... {'full': 'Akali.png', 'sprite': 'champion0.png... [Assassin] Energy {'hp': 575, 'hpperlevel': 95, 'mp': 200, 'mppe... https://shahinrostami.com/images/data-is-beaut... Assassin None
Khazix 10.13.1 Khazix 121 Kha'Zix the Voidreaver The Void grows, and the Void adapts—in none of... {'attack': 9, 'defense': 4, 'magic': 3, 'diffi... {'full': 'Khazix.png', 'sprite': 'champion1.pn... [Assassin] Mana {'hp': 572.8, 'hpperlevel': 85, 'mp': 327.2, '... https://shahinrostami.com/images/data-is-beaut... Assassin None
Shaco 10.13.1 Shaco 35 Shaco the Demon Jester Crafted long ago as a plaything for a lonely p... {'attack': 8, 'defense': 4, 'magic': 6, 'diffi... {'full': 'Shaco.png', 'sprite': 'champion3.png... [Assassin] Mana {'hp': 587, 'hpperlevel': 89, 'mp': 297.2, 'mp... https://shahinrostami.com/images/data-is-beaut... Assassin None
Talon 10.13.1 Talon 91 Talon the Blade's Shadow Talon is the knife in the darkness, a merciles... {'attack': 9, 'defense': 3, 'magic': 1, 'diffi... {'full': 'Talon.png', 'sprite': 'champion3.png... [Assassin] Mana {'hp': 588, 'hpperlevel': 95, 'mp': 377.2, 'mp... https://shahinrostami.com/images/data-is-beaut... Assassin None
Zed 10.13.1 Zed 238 Zed the Master of Shadows Utterly ruthless and without mercy, Zed is the... {'attack': 9, 'defense': 2, 'magic': 1, 'diffi... {'full': 'Zed.png', 'sprite': 'champion4.png',... [Assassin] Energy {'hp': 584, 'hpperlevel': 85, 'mp': 200, 'mppe... https://shahinrostami.com/images/data-is-beaut... Assassin None
In [21]:
for count_x, item_x in enumerate(names):
    for count_y, item_y in enumerate(names):
        if(item_y == item_x):
            details_urls = data[
                (data['tag_1'] == item_x) & (data["tag_2"].isnull())]['URL'].to_list()

            details_names = data[
                (data['tag_1'] == item_x) & (data["tag_2"].isnull())]['name'].to_list()
        else:
            details_urls = data[
                (data['tag_1'].isin([item_x, item_y])) &
                (data['tag_2'].isin([item_y, item_x]))]['URL'].to_list()

            details_names = data[
                (data['tag_1'].isin([item_x, item_y])) &
                (data['tag_2'].isin([item_y, item_x]))]['name'].to_list()
        
        urls_names = np.column_stack((details_urls, details_names))
        if(urls_names.size > 0):
            details[count_x][count_y] = details_names
            details_thumbs[count_x][count_y] = details_urls

        else:
            details[count_x][count_y] = []
            details_thumbs[count_x][count_y] = []

details=pd.DataFrame(details).values.tolist()
details_thumbs=pd.DataFrame(details_thumbs).values.tolist()
In [22]:
pd.DataFrame(details)
Out[22]:
0 1 2 3 4 5
0 [Akali, Kha'Zix, Shaco, Talon, Zed] [Ekko, Fiora, Fizz, Irelia, Jax, Kayn, Lee Sin... [Ahri, Evelynn, Kassadin, Katarina, LeBlanc, M... [Quinn, Teemo, Tristana, Twitch, Vayne] [Pyke] []
1 [Ekko, Fiora, Fizz, Irelia, Jax, Kayn, Lee Sin... [Gangplank, Mordekaiser, Rek'Sai] [Diana, Elise, Gragas, Rumble, Ryze, Swain] [Jayce] [Kayle, Taric, Thresh] [Aatrox, Blitzcrank, Camille, Darius, Dr. Mund...
2 [Ahri, Evelynn, Kassadin, Katarina, LeBlanc, M... [Diana, Elise, Gragas, Rumble, Ryze, Swain] [Annie, Aurelion Sol, Brand, Cassiopeia, Karth... [Azir, Ezreal, Jhin, Kennen, Kog'Maw, Varus] [Anivia, Bard, Fiddlesticks, Heimerdinger, Ive... [Amumu, Cho'Gath, Galio, Maokai]
3 [Quinn, Teemo, Tristana, Twitch, Vayne] [Jayce] [Azir, Ezreal, Jhin, Kennen, Kog'Maw, Varus] [Aphelios, Caitlyn, Corki, Draven, Graves, Jin... [Ashe, Senna] []
4 [Pyke] [Kayle, Taric, Thresh] [Anivia, Bard, Fiddlesticks, Heimerdinger, Ive... [Ashe, Senna] [Rakan] [Alistar, Braum, Leona, Tahm Kench]
5 [] [Aatrox, Blitzcrank, Camille, Darius, Dr. Mund... [Amumu, Cho'Gath, Galio, Maokai] [] [Alistar, Braum, Leona, Tahm Kench] [Shen]

Finally, we can put it all together but this time with the details matrix passed in.

In [25]:
Chord(
    matrix,
    names,
    colors=colors,
    details=details,
    details_thumbs=details_thumbs,
    noun="Champions",
    thumbs_width=70,
    thumbs_margin=1,
    popup_width=670,
    thumbs_font_size=10,
    credit=True,
    padding=0.05,
    arc_numbers=True,
    verb="appear together in"
).show()
Chord Diagram

Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!

Top Olympic Medal Earning Countries

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from chord import Chord

Introduction

In previous sections, we visualised co-occurrences of Pokémon types. In this section, we're going to use a dataset of Olympic athlete events to visualise the relationship between the 20 countries with the most medals and the medals they were awarded.

The Dataset

Each row of the dataset describes one athlete competing in one event, along with the medal awarded, if any.

Let's download the mirrored dataset and have a look for ourselves.

In [2]:
data_url = 'https://shahinrostami.com/datasets/athlete_events.csv'
raw_data = pd.read_csv(data_url)
raw_data.head()
Out[2]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
0 1 A Dijiang M 24.0 180.0 80.0 China CHN 1992 Summer 1992 Summer Barcelona Basketball Basketball Men's Basketball NaN
1 2 A Lamusi M 23.0 170.0 60.0 China CHN 2012 Summer 2012 Summer London Judo Judo Men's Extra-Lightweight NaN
2 3 Gunnar Nielsen Aaby M 24.0 NaN NaN Denmark DEN 1920 Summer 1920 Summer Antwerpen Football Football Men's Football NaN
3 4 Edgar Lindenau Aabye M 34.0 NaN NaN Denmark/Sweden DEN 1900 Summer 1900 Summer Paris Tug-Of-War Tug-Of-War Men's Tug-Of-War Gold
4 5 Christine Jacoba Aaftink F 21.0 185.0 82.0 Netherlands NED 1988 Winter 1988 Winter Calgary Speed Skating Speed Skating Women's 500 metres NaN
In [3]:
data = raw_data[raw_data.Medal.notna()]
data.head()
Out[3]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
3 4 Edgar Lindenau Aabye M 34.0 NaN NaN Denmark/Sweden DEN 1900 Summer 1900 Summer Paris Tug-Of-War Tug-Of-War Men's Tug-Of-War Gold
37 15 Arvo Ossian Aaltonen M 30.0 NaN NaN Finland FIN 1920 Summer 1920 Summer Antwerpen Swimming Swimming Men's 200 metres Breaststroke Bronze
38 15 Arvo Ossian Aaltonen M 30.0 NaN NaN Finland FIN 1920 Summer 1920 Summer Antwerpen Swimming Swimming Men's 400 metres Breaststroke Bronze
40 16 Juhamatti Tapio Aaltonen M 28.0 184.0 85.0 Finland FIN 2014 Winter 2014 Winter Sochi Ice Hockey Ice Hockey Men's Ice Hockey Bronze
41 17 Paavo Johannes Aaltonen M 28.0 175.0 64.0 Finland FIN 1948 Summer 1948 Summer London Gymnastics Gymnastics Men's Individual All-Around Bronze

We've kept only the rows where a medal was awarded. Let's check the shape of what remains.

In [4]:
data.shape
Out[4]:
(39783, 15)
In [5]:
data = data[data['NOC'].isin(list(data['NOC'].value_counts()[:20].index))]
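
The filter above combines value_counts with isin to keep only the most frequent NOCs. A minimal sketch of the same pattern on made-up toy data, keeping the top two instead of the top twenty (the rows and the toy_ names are assumptions for illustration):

```python
import pandas as pd

# toy medal rows, invented purely for illustration
toy_medals = pd.DataFrame({
    "NOC": ["USA", "USA", "USA", "GER", "GER", "FIN"],
    "Medal": ["Gold", "Silver", "Gold", "Bronze", "Gold", "Bronze"],
})

# value_counts sorts from most to least frequent, so head(2) is the top two
top_nocs = toy_medals["NOC"].value_counts().head(2).index

# keep only the rows that belong to those NOCs
toy_top = toy_medals[toy_medals["NOC"].isin(top_nocs)]
```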

We have 39,783 medal-winning entries across 15 columns. The last cell keeps only the 20 National Olympic Committees (NOCs) with the most entries.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We are interested in the relationship between countries and medals, which are stored in the columns NOC and Medal.

So let's select just these two columns and work with them as we move forward.

In [6]:
species_personality = pd.DataFrame(data[['NOC', 'Medal']].values).dropna().astype(str)
species_personality
Out[6]:
0 1
0 FIN Bronze
1 FIN Bronze
2 FIN Bronze
3 FIN Bronze
4 FIN Gold
... ... ...
30152 URS Gold
30153 URS Silver
30154 URS Bronze
30155 RUS Bronze
30156 RUS Silver

30157 rows × 2 columns

In [7]:
species_personality = species_personality.dropna()

Now for the names of our segments, starting with the medals and followed by the NOCs.

In [8]:
# list the medals in a fixed order rather than by how often they occur
left = ["Gold","Silver","Bronze"]

pd.DataFrame(left)
Out[8]:
0
0 Gold
1 Silver
2 Bronze
In [9]:
# NOCs ordered from most to fewest medal-winning entries
right = list(data['NOC'].value_counts().index)
pd.DataFrame(right)
Out[9]:
0
0 USA
1 URS
2 GER
3 GBR
4 FRA
5 ITA
6 SWE
7 CAN
8 AUS
9 RUS
10 HUN
11 NED
12 NOR
13 GDR
14 CHN
15 JPN
16 FIN
17 SUI
18 ROU
19 KOR

We can now use these names to create an empty matrix.

In [10]:
features= left+right
d = pd.DataFrame(0, index=features, columns=features)

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

We can build the co-occurrence matrix by iterating over every NOC and medal pairing, incrementing both the original and the reversed position so that the matrix stays symmetric.

In [11]:
species_personality.values
Out[11]:
array([['FIN', 'Bronze'],
       ['FIN', 'Bronze'],
       ['FIN', 'Bronze'],
       ...,
       ['URS', 'Bronze'],
       ['RUS', 'Bronze'],
       ['RUS', 'Silver']], dtype=object)
In [12]:
for x in species_personality.values:
    d.at[x[0], x[1]] += 1
    d.at[x[1], x[0]] += 1
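
The loop above visits every medal row. As a hedged alternative, the rows can first be counted per unique NOC and medal pairing with groupby, so that the matrix is updated once per pairing. The sketch below runs on made-up toy data, and the toy_ names and pair_counts are assumptions for illustration.

```python
import pandas as pd

# toy medal rows, invented purely for illustration
toy_pairs = pd.DataFrame({
    "NOC": ["USA", "USA", "USA", "GER", "GER", "FIN"],
    "Medal": ["Gold", "Silver", "Gold", "Bronze", "Gold", "Bronze"],
})
toy_left = ["Gold", "Silver", "Bronze"]
toy_right = ["USA", "GER", "FIN"]
toy_features = toy_left + toy_right
toy_d = pd.DataFrame(0, index=toy_features, columns=toy_features)

# number of rows for each unique (NOC, Medal) pairing
pair_counts = toy_pairs.groupby(["NOC", "Medal"]).size()

for (noc, medal), n in pair_counts.items():
    # add the count in both directions to keep the matrix symmetric
    toy_d.at[noc, medal] += n
    toy_d.at[medal, noc] += n
```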

Chord Diagram

Time to visualise the co-occurrence of countries and medals using a chord diagram. We are going to use a list of custom colours, where the first three represent gold, silver, and bronze, and the rest are assigned to the countries.

In [13]:
colors =["#FFD700","#C0C0C0","#A57164",
'#e6194b', '#3cb44b', '#ffe119', '#4363d8', '#f58231', '#911eb4', '#46f0f0', '#f032e6', '#bcf60c', '#fabebe', '#008080', '#e6beff', '#9a6324', '#fffac8', '#800000', '#aaffc3', '#808000', '#ffd8b1', '#000075', '#808080', '#ffffff', '#000000'
        ]
In [14]:
names = left + right
len(names)
Out[14]:
23

Finally, we can put it all together.
In [15]:
Chord(d.values.tolist(), names,credit=True, colors=colors, wrap_labels=False,
      margin=40, font_size_large=7,noun="medals",
        details_separator="", divide=True, divide_idx=len(left),divide_size=.2).show()
Chord Diagram

Chord Diagram with Names

It would be nice to show a list of athlete names when hovering over a country and medal pairing. To do this, we can make use of the optional details parameter.

Next, we'll create an empty multi-dimensional array with the same shape as our matrix.

In [16]:
details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)

Now we can populate the details array with lists of athlete names in the correct positions.

In [17]:
for count_x, item_x in enumerate(names):
    for count_y, item_y in enumerate(names):
        # athletes behind each NOC and medal pairing, in either order;
        # pairings that never occur give an empty list
        details[count_x][count_y] = data[
            (data['NOC'].isin([item_x, item_y])) &
            (data['Medal'].isin([item_y, item_x]))]['Name'].to_list()

# the dataset has no image URL column, so no thumbnails are collected
details=pd.DataFrame(details).values.tolist()

Finally, we can put it all together but this time with the details matrix passed in.

In [ ]:
Chord(d.values.tolist(), names,credit=True, colors=colors, wrap_labels=False,
      margin=40, font_size_large=7,details=details,noun="medals",
        details_separator="", divide=True, divide_idx=len(left),divide_size=.2, width=850).to_html()
In [ ]:
np.empty(shape=(6,1)).tolist()

Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!

Olympic Weightlifting Medals with Stacked Bar Charts

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import plotly.graph_objects as go    # for data visualisation

# Optional Customisations
import plotly.io as pio              # to set shahin plot layout
pio.templates['shahin'] = pio.to_templated(go.Figure().update_layout(
    legend=dict(orientation="h",y=1.1, x=.5, xanchor='center'),
    margin=dict(t=0,r=0,b=0,l=0))).layout.template
pio.templates.default = 'shahin'
pio.renderers.default = "notebook_connected" # remove when running locally 

Introduction

In this section, we're going to use 120 years of Olympic history to create a visualisation. Let's set our sights on something that illustrates the distribution of Olympic medals awarded for the weightlifting sport.

Weightlifting cats

The Dataset

We'll use the 120 years of Olympic history: athletes and results dataset, which we'll download and load with pandas. You're also welcome to use the mirrored copy that is loaded in the following cell.

In [2]:
data_url = 'https://shahinrostami.com/datasets/athlete_events.csv'
raw_data = pd.read_csv(data_url)
raw_data.head()
Out[2]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
0 1 A Dijiang M 24.0 180.0 80.0 China CHN 1992 Summer 1992 Summer Barcelona Basketball Basketball Men's Basketball NaN
1 2 A Lamusi M 23.0 170.0 60.0 China CHN 2012 Summer 2012 Summer London Judo Judo Men's Extra-Lightweight NaN
2 3 Gunnar Nielsen Aaby M 24.0 NaN NaN Denmark DEN 1920 Summer 1920 Summer Antwerpen Football Football Men's Football NaN
3 4 Edgar Lindenau Aabye M 34.0 NaN NaN Denmark/Sweden DEN 1900 Summer 1900 Summer Paris Tug-Of-War Tug-Of-War Men's Tug-Of-War Gold
4 5 Christine Jacoba Aaftink F 21.0 185.0 82.0 Netherlands NED 1988 Winter 1988 Winter Calgary Speed Skating Speed Skating Women's 500 metres NaN

It looks like the data was loaded without any issues. Let's have a quick look at the available features.

In [3]:
pd.DataFrame(raw_data.columns)
Out[3]:
0
0 ID
1 Name
2 Sex
3 Age
4 Height
5 Weight
6 Team
7 NOC
8 Games
9 Year
10 Season
11 City
12 Sport
13 Event
14 Medal

Data Wrangling

We're only interested in Olympic weightlifting data for our visualisation, so we'll filter by selecting all rows where the Sport is set to Weightlifting.

In [4]:
data = raw_data[raw_data.Sport == "Weightlifting"]
data.head()
Out[4]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
80 22 Andreea Aanei F 22.0 170.0 125.0 Romania ROU 2016 Summer 2016 Summer Rio de Janeiro Weightlifting Weightlifting Women's Super-Heavyweight NaN
154 59 Ivan Nikolov Abadzhiev M 24.0 164.0 71.0 Bulgaria BUL 1956 Summer 1956 Summer Melbourne Weightlifting Weightlifting Men's Lightweight NaN
155 59 Ivan Nikolov Abadzhiev M 28.0 164.0 71.0 Bulgaria BUL 1960 Summer 1960 Summer Roma Weightlifting Weightlifting Men's Middleweight NaN
156 60 Mikhail Abadzhiev M 24.0 172.0 75.0 Bulgaria BUL 1960 Summer 1960 Summer Roma Weightlifting Weightlifting Men's Middleweight NaN
234 112 Aziz Abbas M 21.0 169.0 67.0 Iraq IRQ 1964 Summer 1964 Summer Tokyo Weightlifting Weightlifting Men's Lightweight NaN

If we look at the Medal column in the table above, we can see NaN values where an athlete was not awarded a medal. As we're only interested in Olympic medallists for this visualisation, let's drop all the rows where no medal was awarded.

In [5]:
data = data[data.Medal.notna()]
data.head()
Out[5]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
2331 1301 Sri Wahyuni Agustiani F 21.0 147.0 47.0 Indonesia INA 2016 Summer 2016 Summer Rio de Janeiro Weightlifting Weightlifting Women's Flyweight Silver
2637 1480 Franz Aigner M 32.0 NaN 107.0 Austria AUT 1924 Summer 1924 Summer Paris Weightlifting Weightlifting Men's Heavyweight Silver
3045 1698 Khadzhimurat Magomedovich Akkayev M 19.0 178.0 105.0 Russia RUS 2004 Summer 2004 Summer Athina Weightlifting Weightlifting Men's Middle-Heavyweight Silver
3046 1698 Khadzhimurat Magomedovich Akkayev M 23.0 178.0 105.0 Russia RUS 2008 Summer 2008 Summer Beijing Weightlifting Weightlifting Men's Middle-Heavyweight Bronze
3067 1713 Artur Vladimirovich Akoyev M 26.0 NaN 109.0 Unified Team EUN 1992 Summer 1992 Summer Barcelona Weightlifting Weightlifting Men's Heavyweight II Silver

If we're interested, we can take a peek at how many medals have been awarded in total for bronze, silver, and gold.

In [6]:
pd.DataFrame(data.Medal.value_counts())
Out[6]:
Medal
Gold 217
Bronze 216
Silver 213

Now that we have our filtered and relevant data, let's build a list of participating countries. At first glance, it looks like Team may be the feature we're interested in, and for the Weightlifting sport, it is indeed a good selection. However, in other sports in the same dataset, we will see Teams such as Japan-1 and Japan-2.

In [7]:
pd.DataFrame(raw_data[raw_data.Team.str.contains("Japan")].Team.unique())
Out[7]:
0
0 Japan
1 Japan-1
2 Japan-2
3 Japan-3

For now, we'll continue with the NOC feature, which holds the three-letter National Olympic Committee code for each athlete.

In [8]:
noc = data.NOC.unique().tolist()
print(noc)
['INA', 'AUT', 'RUS', 'EUN', 'BUL', 'URS', 'LUX', 'USA', 'JPN', 'IRI', 'TUR', 'FRA', 'BLR', 'GEO', 'IRQ', 'HUN', 'AUS', 'POL', 'ROU', 'GER', 'SWE', 'ITA', 'CUB', 'GDR', 'CHN', 'NED', 'TPE', 'KAZ', 'PRK', 'MDA', 'GBR', 'ARM', 'UKR', 'BEL', 'CAN', 'PHI', 'LTU', 'GRE', 'TCH', 'EGY', 'COL', 'FIN', 'VIE', 'SUI', 'FRG', 'KOR', 'THA', 'NOR', 'DEN', 'MEX', 'EST', 'TTO', 'IND', 'UZB', 'NGR', 'CRO', 'VEN', 'QAT', 'LAT', 'ARG', 'SGP', 'LIB', 'ESP', 'AZE']
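
If team names are ever preferred over NOC codes, the numbered squads seen earlier (Japan-1, Japan-2) would need merging first. The following is a minimal sketch, assuming a trailing hyphen-and-digits suffix is the only difference between squads; the Series is illustrative toy data, not taken from the dataset.

```python
import pandas as pd

# Toy team names (illustrative only, not taken from the dataset).
teams = pd.Series(["Japan", "Japan-1", "Japan-2", "United States-1", "Chinese Taipei"])

# Remove a trailing "-<digits>" suffix so that numbered squads share one label.
base_teams = teams.str.replace(r"-\d+$", "", regex=True)

print(base_teams.unique().tolist())
```

Names that contain a hyphen for other reasons are left untouched, because the pattern only matches digits at the very end of the string.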

Visualising the Data

Now that we have prepared our data, let's create a few visualisations. Instead of just showing you the final visualisation, we will develop our visualisation incrementally, where each subsequent visualisation improves on the last.

Stacked Bar Chart - Iteration 1

When we started this notebook, we had the idea of creating a stacked bar chart to visualise the medals awarded to each country in the weightlifting sport. Our first visualisation may look something like the following.

In [9]:
fig = go.Figure(layout=dict(barmode='stack'))

fig.add_bar(name="Bronze", x=noc, y=data[data.Medal == "Bronze"].NOC
            .value_counts().reindex(noc), marker_color="brown")

fig.add_bar(name="Silver", x=noc, y=data[data.Medal == "Silver"].NOC
            .value_counts().reindex(noc), marker_color="silver")

fig.add_bar(name="Gold", x=noc, y=data[data.Medal == "Gold"].NOC
            .value_counts().reindex(noc), marker_color="gold")

fig.show()

It's not a bad start! We have our bars stacked in the right order, from bronze up to gold, and our colours were set to the named colours gold, silver, and brown (as there is no named colour for bronze).
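
The three per-medal counts passed to add_bar above are each computed with a separate filter and value_counts call. As an alternative sketch (not the approach used in this notebook), the same table of counts could be built in one step with pd.crosstab. The toy frame below uses made-up values purely for illustration.

```python
import pandas as pd

# Toy medal records (illustrative values, not the real dataset).
toy = pd.DataFrame({
    "NOC":   ["CHN", "CHN", "URS", "URS", "URS", "USA"],
    "Medal": ["Gold", "Silver", "Gold", "Gold", "Bronze", "Bronze"],
})

# One row per NOC, one column per medal; each cell holds a count.
counts = pd.crosstab(toy.NOC, toy.Medal)

# Keep the stacking order used in the chart: bronze at the bottom, gold on top.
counts = counts.reindex(columns=["Bronze", "Silver", "Gold"], fill_value=0)

print(counts)
```

Each column of counts could then be passed as the y values of one bar trace, with counts.index as the x values. Missing combinations are filled with 0 rather than NaN.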

Stacked Bar Chart - Iteration 2

However, we can make some improvements to enhance the usefulness and beauty of the visualisation. Let's try the following:

  • Assign specific HEX colour codes to our bars,
  • Order the bars in descending order by total medals awarded,
  • Angle the bar (tick) labels at -45 degrees.

In [10]:
fig = go.Figure(layout=dict(
    barmode='stack', 
    xaxis= dict(categoryorder='total descending', tickangle=-45)))

fig.add_bar(name="Bronze", x=noc, y=data[data.Medal == "Bronze"].NOC
            .value_counts().reindex(noc), marker_color="#A57164")

fig.add_bar(name="Silver", x=noc, y=data[data.Medal == "Silver"].NOC
            .value_counts().reindex(noc), marker_color="#C0C0C0")

fig.add_bar(name="Gold", x=noc, y=data[data.Medal == "Gold"].NOC
            .value_counts().reindex(noc), marker_color="#FFD700")

fig.show()

Great! It's already looking easier to navigate, and the colours are more suitable for the data they're representing.
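
The 'total descending' category order sorts the bars by the combined height of each stack. To make that concrete, the sketch below computes the same ordering by hand in pandas on a toy table of counts. The values and the three-letter labels are invented for illustration.

```python
import pandas as pd

# Toy per-medal counts (illustrative values only).
counts = pd.DataFrame(
    {"Bronze": [1, 0, 2], "Silver": [0, 3, 1], "Gold": [1, 1, 4]},
    index=["AAA", "BBB", "CCC"],
)

# Sum across the stacked traces, then sort so the tallest stack comes first.
totals = counts.sum(axis=1).sort_values(ascending=False)
order = totals.index.tolist()

print(order)
```

A list like order could also be supplied explicitly through the axis options categoryorder='array' and categoryarray, which may be useful when a custom ordering is needed.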

Stacked Bar Chart - Iteration 3

Let's continue to make improvements. This time, we'll try the following:

  • Reduce the font size of the bar (tick) labels, as some currently disappear if the plot is too narrow (e.g., when shrinking the browser width),
  • Change the font colour of our bar (tick) labels,
  • Add an outline and some transparency to our bars,
  • Reduce the gaps between our bars,
  • Hide the y-axis tick labels,
  • Add a thick line along the x-axis.

In [11]:
fig = go.Figure(layout=dict(
    barmode='stack', bargap = 0.1,
    xaxis= dict(categoryorder='total descending', tickangle=-45,
                showline=True, linewidth=2, linecolor='black',ticks='',
                tickfont=dict(size=8, color='black')),
    yaxis=dict(showticklabels=False)))

fig.add_bar(name="Bronze", x=noc, y=data[data.Medal == "Bronze"].NOC
            .value_counts().reindex(noc), marker_color="#A57164")

fig.add_bar(name="Silver", x=noc, y=data[data.Medal == "Silver"].NOC
            .value_counts().reindex(noc), marker_color="#C0C0C0")

fig.add_bar(name="Gold", x=noc, y=data[data.Medal == "Gold"].NOC
            .value_counts().reindex(noc), marker_color="#FFD700")

fig.update_traces(marker_line_color='#003366',
                  marker_line_width=1, opacity=0.7)

fig.show()

Looking good!

Stacked Bar Chart - Final Iteration

Now to wrap things up, we may be interested in just selecting the "top 15" medal earning countries for weightlifting. We'll also start using the Team feature instead of working with the NOC. This will require some additional preparation. First, we'll determine the top 15 medal earners.

In [12]:
top_15 = data.Team.value_counts()[:15]
pd.DataFrame(top_15)
Out[12]:
Team
Soviet Union 62
China 57
United States 42
Bulgaria 36
Poland 32
Russia 26
Germany 25
Hungary 20
Iran 18
North Korea 17
Kazakhstan 16
Greece 16
France 16
Italy 15
Japan 14

Next, we'll filter our data to only include rows from these teams.

In [13]:
data = data[data.Team.isin(top_15.index)]
teams = data.Team.unique().tolist()
data.head()
Out[13]:
ID Name Sex Age Height Weight Team NOC Games Year Season City Sport Event Medal
3045 1698 Khadzhimurat Magomedovich Akkayev M 19.0 178.0 105.0 Russia RUS 2004 Summer 2004 Summer Athina Weightlifting Weightlifting Men's Middle-Heavyweight Silver
3046 1698 Khadzhimurat Magomedovich Akkayev M 23.0 178.0 105.0 Russia RUS 2008 Summer 2008 Summer Beijing Weightlifting Weightlifting Men's Middle-Heavyweight Bronze
4001 2306 Ruslan Vladimirovich Albegov M 24.0 192.0 156.0 Russia RUS 2012 Summer 2012 Summer London Weightlifting Weightlifting Men's Super-Heavyweight Bronze
4360 2483 Rumen Aleksandrov M 20.0 176.0 89.0 Bulgaria BUL 1980 Summer 1980 Summer Moskva Weightlifting Weightlifting Men's Middle-Heavyweight Silver
4404 2511 Vasily Ivanovich Alekseyev M 30.0 185.0 160.0 Soviet Union URS 1972 Summer 1972 Summer Munich Weightlifting Weightlifting Men's Super-Heavyweight Gold
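
The top-N selection and filtering steps above can be sketched on a small toy frame, which makes the behaviour easy to check by eye. The team labels and the value of n below are illustrative assumptions.

```python
import pandas as pd

# Toy records: D appears 4 times, A 3 times, B twice, C once.
toy = pd.DataFrame({"Team": ["A", "A", "A", "B", "B", "C", "D", "D", "D", "D"]})

n = 2
# value_counts sorts by frequency (descending), so head(n) gives the top n teams.
top_n = toy.Team.value_counts().head(n)

# Keep only the rows that belong to one of the top n teams.
filtered = toy[toy.Team.isin(top_n.index)]

print(top_n.index.tolist(), len(filtered))
```

When several teams are tied at the cut-off, which of them makes the top n depends on the order value_counts returns them in, so it may be worth inspecting the counts around the boundary.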

Finally, we'll produce our final visualisation, which displays the top 15 medal-earning countries (or teams) in weightlifting. We'll also make the following improvements:

  • Change the fonts to Muli (if it's available),
  • Hide the legend, as the bar colours are all we need,
  • Add a title (and some top margin to give it space),
  • Add text above our bars indicating the total medals per country,
  • Increase the thickness of our bar outlines (as we have fewer bars now),
  • Change the angle of the bar (tick) labels to 60 degrees, so they stay within the boundaries of our visualisation.

In [14]:
fig = go.Figure(layout=dict(
    title="Top 15 Olympic weightlifting medal earners between {}-{}"
        .format(data.Year.min(),data.Year.max()),
    barmode='stack', bargap = 0.1, margin=dict(t=40, r=0, b=0, l=0),
    font=dict(family="Muli", size=14, color="#212529",), showlegend=False,
    xaxis= dict(categoryorder='total descending', tickangle=60,
                showline=True, linewidth=2, linecolor='black',ticks='',
                tickfont=dict(family="Muli", size=16, color="#212529")),
    yaxis=dict(showticklabels=False)),
)

fig.add_bar(name="Bronze", x=teams, y=data[data.Medal == "Bronze"].Team
            .value_counts().reindex(teams), marker_color="#A57164")

fig.add_bar(name="Silver", x=teams, y=data[data.Medal == "Silver"].Team
            .value_counts().reindex(teams), marker_color="#C0C0C0")

fig.add_bar(name="Gold", x=teams, y=data[data.Medal == "Gold"].Team
            .value_counts().reindex(teams), marker_color="#FFD700",
            text=data.Team.value_counts().reindex(teams), textposition="outside")

fig.update_traces(marker_line_color='#003366',
                  marker_line_width=1.5, opacity=0.7, textfont_size=14)

fig.show()

Conclusion

In this section, we went through a few improvement cycles to produce a visualisation illustrating the top Olympic weightlifting medal earners in the 120 years of Olympic history: athletes and results dataset.

The visualisation ended up looking great, but a few Plotly limitations prevented one final improvement: changing the bar colours to gradients.

Weightlifting cats

Bootstrap 5 is Here!

Bootstrap 5’s very first alpha has arrived, and it looks like it’s something to celebrate! Amongst the many differences between Bootstrap 5 and its predecessor, Bootstrap 4, there are some major ones to look out for.

We'll explore these in the remainder of this article, but you may want to watch the video too.

jQuery and JavaScript

The first major change, which I think people are going to be very happy about, is that Bootstrap no longer depends on jQuery! For many developers, Bootstrap's dependency on jQuery was a deal-breaker, and many of them moved away to other frameworks. This could bring those developers back, meaning Bootstrap may become even more popular moving forward.

I had no issue with jQuery myself, and I understand that it was an excellent solution to an old problem. However, a lot has changed over the years, and most of the problems it solved have now been addressed in newer web browsers.

Dropped Support for Internet Explorer

The second major change is about something even older than jQuery. Bootstrap 5 has officially dropped support for Internet Explorer!

Supporting Internet Explorer was certainly a nightmare, especially over a decade ago when I was working in web development. However, with Microsoft promoting Edge, dropping support for Internet Explorer is the norm nowadays.

CSS Custom Properties

Dropping support for Internet Explorer also leads to our third major change: Bootstrap 5 has been able to start using CSS custom properties.

This means being able to define easy-to-understand variables in one place and use them in many other places. This should improve the theming experience, and it looks like theme creators will be busy moving their themes over.

Improved Documentation

The fourth major change is actually to the Bootstrap documentation. It looks like the team has put great effort into improving the documentation by removing ambiguity and giving more support to those wanting to extend Bootstrap.

There’s now more content on theming, complete with even more code snippets that help you build on top of Bootstrap's source files. The colour palette has also been expanded!

There’s even an npm project to get you started quicker.

Updated Forms

The fifth major change comes with the re-design of all of Bootstrap's form controls.

Custom form controls for things like checkboxes and switches were possible in Bootstrap 4, but in Bootstrap 5, the claim is that they’ve gone fully custom with standard markup!

Utility API

The sixth major change is the implementation of the new Utility API in Bootstrap 5.

Utilities have become the preferred way to build, which we can see with the success of utility-first CSS frameworks like Tailwind CSS.

If you build on Bootstrap using the source files, supposedly your mind will be blown by the new experience!

Conclusion

There are many differences in Bootstrap 5, but you’ll, of course, find familiarity too. Things like the grid system are still here in an enhanced form.

If you haven’t seen it before, Bootstrap now has its own icon library called Bootstrap Icons which is definitely worth checking out.

But these are just the first of many enhancements in Bootstrap 5, and it’s just the alpha after all.

It does look like there’s still no built-in dark mode, which appears to be a highly requested addition. For now, though, custom dark modes can be created by changing a few variables.

Still, you can head over to https://v5.getbootstrap.com to explore the new release for yourself. You can even get it as a pre-release using the node package manager: npm i bootstrap@next.

StamiStudios.com Everyday Ita Bag - Panels and Colours

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from chord import Chord

Introduction

In previous sections, we visualised co-occurrences of Pokémon type. In this section, we're going to use a StamiStudios.com Everyday Ita Bag dataset to visualise the relationship between panel and colour.

The Dataset

Each sample in the dataset has two variables: a panel and a colour.

Let's download the mirrored dataset and have a look for ourselves.

In [2]:
data_url = 'https://shahinrostami.com/datasets/stami_bags_panel_colour.csv'
data = pd.read_csv(data_url)
data.head()
Out[2]:
panel colour
0 Heart Pink
1 Circle lilac
2 Sakura Mint
3 circle mint
4 Animal Black
In [4]:
data = data.dropna()
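
When removing rows with missing values, it helps to remember that indexing a DataFrame with a boolean frame only masks individual cells, whereas dropna removes whole rows. The sketch below shows the difference on toy data; the values are illustrative.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing panel and one missing colour (illustrative values).
toy = pd.DataFrame({
    "panel":  ["Heart", None, "Star"],
    "colour": ["Pink", "Mint", np.nan],
})

masked = toy[toy.notna()]   # boolean-frame indexing: same shape, missing cells stay missing
dropped = toy.dropna()      # removes every row that contains a missing value

print(masked.shape, dropped.shape)
```

Only the dropna call reduces the number of rows, so it is the safer choice when later steps assume complete samples.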
In [5]:
data = data[data['panel'].isin(list(data['panel'].value_counts()[:20].index))]
data = data[data['colour'].isin(list(data['colour'].value_counts()[:11].index))]

Next, we'll capitalise the panel and colour of each sample, so that entries such as circle and Circle are treated as the same value.

In [6]:
data['panel'] = data['panel'].str.capitalize()
data['colour'] = data['colour'].str.capitalize()
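
To see what str.capitalize does to mixed-case labels, here is a small sketch on a toy Series (the values are illustrative). Note that it upper-cases only the first character and lower-cases the rest, which is why a label like Bat Wings would become Bat wings.

```python
import pandas as pd

# Toy labels with inconsistent casing (illustrative values).
raw = pd.Series(["Circle", "circle", "Bat Wings", "bat wings", "Mint"])
print(raw.nunique())            # case variants count as separate labels

clean = raw.str.capitalize()    # first character upper-case, the rest lower-case
print(clean.unique().tolist())
```

Because value_counts treats circle and Circle as different labels, capitalising before any frequency-based filtering may give a slightly different top-N selection than capitalising afterwards.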

Let's check how many samples remain after filtering.

In [7]:
data.shape
Out[7]:
(3040, 2)
In [8]:
d_colours = list(data.colour.value_counts().index)
d_colours.sort()
d_colours
Out[8]:
['Black', 'Blue', 'Green', 'Lilac', 'Mint', 'Navy', 'Pink', 'White', 'Yellow']

After filtering and capitalising, we're left with nine colours.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the two variables are held in the columns panel and colour.

So let's select just these two columns and work with a list containing only them as we move forward.

In [9]:
species_personality = pd.DataFrame(data[['colour', 'panel']].values).dropna().astype(str)
species_personality
Out[9]:
0 1
0 Pink Heart
1 Mint Sakura
2 Black Animal
3 White Circle
4 White Star
... ... ...
3035 Blue Circle
3036 White Circle
3037 White Heart
3038 White Circle
3039 Black Circle

3040 rows × 2 columns

In [10]:
species_personality = species_personality.dropna()

Now for the names of our colours and panels.

In [11]:
#left = np.unique(pd.DataFrame(species_personality)[0]).tolist()
left = list(data['colour'].value_counts().index)[::-1]
#left.sort()

pd.DataFrame(left)
Out[11]:
0
0 Yellow
1 Green
2 Blue
3 Navy
4 Mint
5 Lilac
6 Pink
7 White
8 Black
In [12]:
#right = np.unique(pd.DataFrame(species_personality)[1]).tolist()
right = list(data['panel'].value_counts().index)
#right.sort()
pd.DataFrame(right)
Out[12]:
0
0 Circle
1 Star
2 Crescent
3 Moon
4 Bat wings
5 Sakura
6 Heart
7 Dice20
8 Frog
9 Animal
10 Feline-ears
11 Cat
12 Angel-wings
13 Hive
14 Bottle
15 Paw
16 Petals
17 Pixel
18 Citrus

Which we can now use to create the matrix.

In [13]:
features= left+right
d = pd.DataFrame(0, index=features, columns=features)

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

We can build a co-occurrence matrix with the following approach. We'll start by creating a list with every colour and panel pairing in its original and reversed form.

In [14]:
species_personality = list(itertools.chain.from_iterable((i, i[::-1]) for i in species_personality.values))
In [15]:
for x in species_personality:
    d.at[x[0], x[1]] += 1
In [16]:
d=d/(d.values.sum()/2)*100
In [17]:
d
Out[17]:
Yellow Green Blue Navy Mint Lilac Pink White Black Circle ... Animal Feline-ears Cat Angel-wings Hive Bottle Paw Petals Pixel Citrus
Yellow 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.032895 ... 0.000000 0.000000 0.000000 0.000000 0.230263 0.000000 0.032895 0.032895 0.000000 0.855263
Green 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.032895
Blue 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.394737 ... 0.296053 0.197368 0.164474 0.164474 0.000000 0.361842 0.098684 0.000000 0.098684 0.000000
Navy 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.625000 ... 0.131579 0.098684 0.263158 0.000000 0.164474 0.328947 0.131579 0.065789 0.098684 0.098684
Mint 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.361842 ... 0.098684 0.000000 0.131579 0.032895 0.065789 0.098684 0.263158 0.065789 0.197368 0.263158
Lilac 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.723684 ... 0.493421 0.296053 0.493421 0.032895 0.098684 0.460526 0.296053 0.526316 0.263158 0.065789
Pink 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.986842 ... 0.427632 0.328947 0.361842 0.263158 0.263158 0.032895 0.328947 0.789474 0.394737 0.164474
White 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.927632 ... 1.348684 0.493421 0.493421 2.861842 0.625000 0.361842 0.263158 0.657895 0.197368 0.361842
Black 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 7.796053 ... 1.940789 3.289474 2.138158 0.427632 1.743421 1.315789 0.953947 0.098684 0.953947 0.164474
Circle 0.032895 0.000000 0.394737 0.625000 0.361842 0.723684 0.986842 2.927632 7.796053 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Star 0.098684 0.000000 0.625000 0.427632 0.493421 0.723684 1.118421 1.513158 3.684211 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Crescent 0.098684 0.000000 0.328947 0.657895 0.197368 0.789474 1.743421 0.855263 3.256579 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Moon 0.164474 0.000000 0.690789 0.855263 0.361842 0.921053 0.723684 0.953947 3.059211 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Bat wings 0.000000 0.000000 0.098684 0.230263 0.032895 0.394737 0.065789 0.197368 6.644737 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Sakura 0.000000 0.000000 0.230263 0.065789 0.197368 0.394737 3.026316 0.953947 1.414474 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Heart 0.000000 0.000000 0.164474 0.098684 0.164474 0.592105 1.940789 0.888158 1.743421 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Dice20 0.000000 0.000000 0.164474 0.328947 0.230263 0.493421 0.394737 0.427632 2.993421 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Frog 0.000000 2.565789 0.065789 0.000000 1.907895 0.098684 0.328947 0.032895 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Animal 0.000000 0.000000 0.296053 0.131579 0.098684 0.493421 0.427632 1.348684 1.940789 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Feline-ears 0.000000 0.000000 0.197368 0.098684 0.000000 0.296053 0.328947 0.493421 3.289474 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Cat 0.000000 0.000000 0.164474 0.263158 0.131579 0.493421 0.361842 0.493421 2.138158 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Angel-wings 0.000000 0.000000 0.164474 0.000000 0.032895 0.032895 0.263158 2.861842 0.427632 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Hive 0.230263 0.000000 0.000000 0.164474 0.065789 0.098684 0.263158 0.625000 1.743421 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Bottle 0.000000 0.000000 0.361842 0.328947 0.098684 0.460526 0.032895 0.361842 1.315789 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Paw 0.032895 0.000000 0.098684 0.131579 0.263158 0.296053 0.328947 0.263158 0.953947 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Petals 0.032895 0.000000 0.000000 0.065789 0.065789 0.526316 0.789474 0.657895 0.098684 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Pixel 0.000000 0.000000 0.098684 0.098684 0.197368 0.263158 0.394737 0.197368 0.953947 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Citrus 0.855263 0.032895 0.000000 0.098684 0.263158 0.065789 0.164474 0.361842 0.164474 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

28 rows × 28 columns
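
The counting and normalisation steps above can be followed more easily on a handful of pairings. The sketch below repeats the same approach on toy data; the pairings and the names m and percent are illustrative assumptions.

```python
import itertools
import pandas as pd

# Toy (colour, panel) pairings (illustrative values).
pairs = [("Pink", "Heart"), ("Pink", "Heart"), ("Mint", "Star"), ("Pink", "Star")]

left = ["Pink", "Mint"]     # colours
right = ["Heart", "Star"]   # panels
features = left + right
m = pd.DataFrame(0, index=features, columns=features)

# Count each pairing in both directions so the matrix is symmetric.
for a, b in itertools.chain.from_iterable((p, p[::-1]) for p in pairs):
    m.at[a, b] += 1

# Every pairing was counted twice, so half the grand total is the sample count.
# Dividing by it and multiplying by 100 turns counts into percentages of samples.
percent = m / (m.values.sum() / 2) * 100

print(percent)
```

The colour-by-panel block of percent sums to 100, and the colour-by-colour and panel-by-panel blocks stay at zero. This matches the block structure visible in the full matrix.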

Chord Diagram

Time to visualise the co-occurrence of colours and panels using a chord diagram. We are going to use a list of custom colours, one for each of the 28 segments.

In [18]:
colors =["#ffe75f", "#85e063", "#aed6f8", "#1e3c6f", "#affec6", "#d3b0e7", "#fedbe8", "#f5f4e9", "#222222", "#f7a296", "#f48fb1", "#ce93d8", "#a9a3db", "#89cffa", "#80deea", "#80cbc4", "#a5d6a7", "#e6ee9c", "#fff59d", "#ffe082", "#ffcc80", "#f7a296", "#f06292", "#a76fcb", "#7986cb", "#64b5f6", "#4ecaec", "#4db6ac"]
In [19]:
names = left + right

Finally, we can put it all together.
In [24]:
Chord(d.values.round(2).tolist(), names,credit=True, colors=colors, wrap_labels=False,
      margin=40, noun="percent",
        details_separator="", divide=True, divide_idx=len(left),divide_size=.2, width=850).show()
Chord Diagram

Chord Diagram with Names

It would be nice to show a list of Pokémon names when hovering over co-occurring Pokémon types. To do this, we can make use of the optional details parameter.

Next, we'll create an empty multi-dimensional array with the same shape as our matrix.

In [21]:
details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)

Now we can populate the details array with lists of Pokémon names in the correct positions.

In [22]:
for count_x, item_x in enumerate(names):
    for count_y, item_y in enumerate(names):
        details_urls = data[
            (data['species'].isin([item_x, item_y])) &
            (data['personality'].isin([item_y, item_x]))]['url'].to_list()
        
        details_names = data[
            (data['species'].isin([item_x, item_y])) &
            (data['personality'].isin([item_y, item_x]))]['name'].to_list()
        
        urls_names = np.column_stack((details_urls, details_names))
        if(urls_names.size > 0):
            details[count_x][count_y] = details_names
            details_thumbs[count_x][count_y] = details_urls

        else:
            details[count_x][count_y] = []
            details_thumbs[count_x][count_y] = []

details=pd.DataFrame(details).values.tolist()
details_thumbs=pd.DataFrame(details_thumbs).values.tolist()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/miniconda3/envs/dib/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'species'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-22-f54e3a0b2906> in <module>
      2     for count_y, item_y in enumerate(names):
      3         details_urls = data[
----> 4             (data['species'].isin([item_x, item_y])) &
      5             (data['personality'].isin([item_y, item_x]))]['url'].to_list()
      6 

/opt/miniconda3/envs/dib/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2900             if self.columns.nlevels > 1:
   2901                 return self._getitem_multilevel(key)
-> 2902             indexer = self.columns.get_loc(key)
   2903             if is_integer(indexer):
   2904                 indexer = [indexer]

/opt/miniconda3/envs/dib/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
-> 2897                 raise KeyError(key) from err
   2898 
   2899         if tolerance is not None:

KeyError: 'species'
In [ ]:
len(right)

Finally, we can put it all together but this time with the details matrix passed in.

In [ ]:
Chord(d.values.tolist(), names,credit=True, colors=colors, wrap_labels=False,
      margin=40, font_size_large=7,details=details,details_thumbs=details_thumbs,noun="villagers",
        details_separator="", divide=True, divide_idx=len(left),divide_size=.2, width=850).show()
In [ ]:
np.empty(shape=(6,1)).tolist()

Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!

Video Game Titles - Publishers and Genres

Preamble

In [1]:
import numpy as np                   # for multi-dimensional containers 
import pandas as pd                  # for DataFrames
import itertools
from chord import Chord

Introduction

In previous sections, we visualised co-occurrences of Pokémon type. In this section, we're going to use a video game sales dataset to visualise the relationship between the publishers and genres of video game titles.

The Dataset

The dataset contains 11 variables for each of 16,598 video game titles, including the publisher and genre of each title and its sales by region.

Let's download the mirrored dataset and have a look for ourselves.

In [2]:
data_url = 'https://shahinrostami.com/datasets/vgsales.csv'
data = pd.read_csv(data_url)
data.head()
Out[2]:
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
0 1 Wii Sports Wii 2006.0 Sports Nintendo 41.49 29.02 3.77 8.46 82.74
1 2 Super Mario Bros. NES 1985.0 Platform Nintendo 29.08 3.58 6.81 0.77 40.24
2 3 Mario Kart Wii Wii 2008.0 Racing Nintendo 15.85 12.88 3.79 3.31 35.82
3 4 Wii Sports Resort Wii 2009.0 Sports Nintendo 15.75 11.01 3.28 2.96 33.00
4 5 Pokemon Red/Pokemon Blue GB 1996.0 Role-Playing Nintendo 11.27 8.89 10.22 1.00 31.37

Before going further, let's capitalise the publisher and genre of each title. Note that this also lower-cases every character after the first.

In [3]:
data['Publisher'] = data['Publisher'].str.capitalize()
data['Genre'] = data['Genre'].str.capitalize()
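
As a small illustration of what capitalising does to the labels, here is a minimal sketch on a few sample values (the `samples` Series is hypothetical and only stands in for the real columns). It shows why publisher names such as THQ appear as "Thq" in later outputs.

```python
import pandas as pd

# Hypothetical sample values, chosen only to illustrate the behaviour
samples = pd.Series(['THQ', 'Electronic Arts', 'Role-Playing'])

# str.capitalize() upper-cases the first character and lower-cases the rest
capitalised = samples.str.capitalize().tolist()
print(capitalised)
```

This is why some labels are shortened or corrected by hand before plotting.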

It looks good so far, but let's confirm the 11 variables against 16,598 samples.

In [4]:
data.shape
Out[4]:
(16598, 11)

Perfect, that's exactly what we were expecting.

Data Wrangling

We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that the publishers and genres are stored in the columns Publisher and Genre.

In [5]:
pd.DataFrame(data.columns.values.tolist())
Out[5]:
0
0 Rank
1 Name
2 Platform
3 Year
4 Genre
5 Publisher
6 NA_Sales
7 EU_Sales
8 JP_Sales
9 Other_Sales
10 Global_Sales

So let's select just these two columns, drop any rows with missing values, and work with only them as we move forward.

In [6]:
species_personality = pd.DataFrame(data[['Publisher', 'Genre']].values).dropna().astype(str)
species_personality
Out[6]:
0 1
0 Nintendo Sports
1 Nintendo Platform
2 Nintendo Racing
3 Nintendo Sports
4 Nintendo Role-playing
... ... ...
16593 Kemco Platform
16594 Infogrames Shooter
16595 Activision Racing
16596 7g//ames Puzzle
16597 Wanadoo Platform

16540 rows × 2 columns

Now for the names of our publishers and genres. We keep only the 14 publishers with the most titles, along with every genre.

In [7]:
# keep the 14 publishers with the most titles
left = list(data.Publisher.value_counts()[:14].index)
pd.DataFrame(left)
Out[7]:
0
0 Electronic arts
1 Activision
2 Namco bandai games
3 Ubisoft
4 Konami digital entertainment
5 Thq
6 Nintendo
7 Sony computer entertainment
8 Sega
9 Take-two interactive
10 Capcom
11 Atari
12 Tecmo koei
13 Square enix
In [8]:
right = np.unique(pd.DataFrame(species_personality)[1]).tolist()
pd.DataFrame(right)
Out[8]:
0
0 Action
1 Adventure
2 Fighting
3 Misc
4 Platform
5 Puzzle
6 Racing
7 Role-playing
8 Shooter
9 Simulation
10 Sports
11 Strategy
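
The selection of the most frequent publishers relies on `value_counts()` returning counts sorted in descending order. A minimal sketch on a toy column (the `toy` Series is hypothetical, and `head(2)` is used here in place of the slice):

```python
import pandas as pd

# Hypothetical toy column standing in for data['Publisher']
toy = pd.Series(['Nintendo', 'Sega', 'Nintendo', 'Atari', 'Sega', 'Nintendo'])

# value_counts() sorts by frequency, so the first entries are the most common
top_two = toy.value_counts().head(2).index.tolist()
print(top_two)
```

The genres, by contrast, are few enough that `np.unique` can simply return all of them in alphabetical order.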

We can now use these names to create a matrix of zeros with one row and one column per name.

In [9]:
features = left + right
d = pd.DataFrame(0, index=features, columns=features)

Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.

We can build a co-occurrence matrix with the following approach. We'll start by creating a list with every publisher and genre pairing in its original and reversed form.

In [10]:
species_personality = list(itertools.chain.from_iterable((i, i[::-1]) for i in species_personality.values))
In [11]:
# count only the pairings that involve one of the selected publishers
for x in species_personality:
    if x[0] in left or x[1] in left:
        d.at[x[0], x[1]] += 1
In [12]:
d
Out[12]:
Electronic arts Activision Namco bandai games Ubisoft Konami digital entertainment Thq Nintendo Sony computer entertainment Sega Take-two interactive ... Fighting Misc Platform Puzzle Racing Role-playing Shooter Simulation Sports Strategy
Electronic arts 0 0 0 0 0 0 0 0 0 0 ... 39 46 16 7 159 35 139 116 561 37
Activision 0 0 0 0 0 0 0 0 0 0 ... 7 103 60 7 74 41 159 23 144 22
Namco bandai games 0 0 0 0 0 0 0 0 0 0 ... 134 97 19 20 27 151 37 29 51 61
Ubisoft 0 0 0 0 0 0 0 0 0 0 ... 19 151 70 24 52 41 92 119 72 29
Konami digital entertainment 0 0 0 0 0 0 0 0 0 0 ... 20 77 40 10 13 37 40 86 280 28
Thq 0 0 0 0 0 0 0 0 0 0 ... 71 66 85 17 101 8 36 27 31 32
Nintendo 0 0 0 0 0 0 0 0 0 0 ... 18 100 112 74 37 106 26 29 55 32
Sony computer entertainment 0 0 0 0 0 0 0 0 0 0 ... 30 128 66 12 65 49 51 15 124 12
Sega 0 0 0 0 0 0 0 0 0 0 ... 37 62 52 22 48 64 40 12 135 35
Take-two interactive 0 0 0 0 0 0 0 0 0 0 ... 1 27 11 1 20 6 65 4 151 22
Capcom 0 0 0 0 0 0 0 0 0 0 ... 58 11 46 6 13 38 25 2 3 3
Atari 0 0 0 0 0 0 0 0 0 0 ... 37 26 21 22 36 28 40 9 56 17
Tecmo koei 0 0 0 0 0 0 0 0 0 0 ... 12 14 1 0 5 47 3 13 39 50
Square enix 0 0 0 0 0 0 0 0 0 0 ... 3 6 0 4 0 129 16 4 0 9
Action 183 310 248 193 148 194 79 90 101 93 ... 0 0 0 0 0 0 0 0 0 0
Adventure 13 25 58 59 53 47 35 41 31 12 ... 0 0 0 0 0 0 0 0 0 0
Fighting 39 7 134 19 20 71 18 30 37 1 ... 0 0 0 0 0 0 0 0 0 0
Misc 46 103 97 151 77 66 100 128 62 27 ... 0 0 0 0 0 0 0 0 0 0
Platform 16 60 19 70 40 85 112 66 52 11 ... 0 0 0 0 0 0 0 0 0 0
Puzzle 7 7 20 24 10 17 74 12 22 1 ... 0 0 0 0 0 0 0 0 0 0
Racing 159 74 27 52 13 101 37 65 48 20 ... 0 0 0 0 0 0 0 0 0 0
Role-playing 35 41 151 41 37 8 106 49 64 6 ... 0 0 0 0 0 0 0 0 0 0
Shooter 139 159 37 92 40 36 26 51 40 65 ... 0 0 0 0 0 0 0 0 0 0
Simulation 116 23 29 119 86 27 29 15 12 4 ... 0 0 0 0 0 0 0 0 0 0
Sports 561 144 51 72 280 31 55 124 135 151 ... 0 0 0 0 0 0 0 0 0 0
Strategy 37 22 61 29 28 32 32 12 35 22 ... 0 0 0 0 0 0 0 0 0 0

26 rows × 26 columns
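
The construction of the matrix can be sketched on a toy example. The pairs and the `_toy` names below are hypothetical; the point is only to show that adding each pairing in both directions produces a symmetric matrix.

```python
import itertools
import pandas as pd

# Hypothetical toy data: (publisher, genre) pairs
pairs = [('Nintendo', 'Sports'), ('Nintendo', 'Racing'),
         ('Sega', 'Sports'), ('Nintendo', 'Sports')]

left_toy = ['Nintendo', 'Sega']
right_toy = ['Racing', 'Sports']
labels = left_toy + right_toy

# start from an all-zero square matrix labelled on both axes
m = pd.DataFrame(0, index=labels, columns=labels)

# add every pairing in its original and reversed form
both_ways = itertools.chain.from_iterable((p, p[::-1]) for p in pairs)
for a, b in both_ways:
    m.at[a, b] += 1

print(m)
```

Because publishers only pair with genres, the publisher-publisher and genre-genre blocks stay at zero, which matches the layout of the full matrix.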

Chord Diagram

Time to visualise the co-occurrence of publishers and genres using a chord diagram. We are going to use a list of custom colours, one for each publisher and genre, and shorten some of the longer labels first.

In [13]:
left[5] = 'THQ'
In [14]:
left[0] = 'EA'
left[2] = 'Namco'
left[4] = 'Konami'
left[7] = 'Sony'
left[9] = 'Take-Two'
left[-1] = "Square"
left[-2]= 'Tecmo'
left
Out[14]:
['EA',
 'Activision',
 'Namco',
 'Ubisoft',
 'Konami',
 'THQ',
 'Nintendo',
 'Sony',
 'Sega',
 'Take-Two',
 'Capcom',
 'Atari',
 'Tecmo',
 'Square']
In [15]:
right[7] = 'RPG'
right
Out[15]:
['Action',
 'Adventure',
 'Fighting',
 'Misc',
 'Platform',
 'Puzzle',
 'Racing',
 'RPG',
 'Shooter',
 'Simulation',
 'Sports',
 'Strategy']
In [16]:
colors = [
    # one colour per publisher (14)
    "#312f85", "#f4e301", "#f75802", "#3e4682", "#ad0332", "#666769", "#e80113",
    "#f78700", "#0100f4", "#1272c3", "#f7cd01", "#dd1a22", "#00407b", "#f70000",
    # one colour per genre (12)
    "#ff4400", "#ffcc00", "#5c6633", "#00e63d", "#00d6e6", "#566d73",
    "#3d85f2", "#00fff2", "#0000e6", "#290066", "#ff80e5", "#731d28"]
In [17]:
names = left + right
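
Before plotting, it can be worth checking that the inputs line up. The helper below is a hypothetical sketch, not part of the chord package, and it assumes that one colour per label is intended.

```python
def check_chord_inputs(matrix, names, colors):
    """Return True when the matrix is square and matches the label and colour counts."""
    n = len(names)
    is_square = len(matrix) == n and all(len(row) == n for row in matrix)
    return is_square and len(colors) == n


# toy usage with two labels
ok = check_chord_inputs([[0, 1], [1, 0]], ['a', 'b'], ['#000000', '#ffffff'])
print(ok)
```

In this section the equivalent call would be `check_chord_inputs(d.values.tolist(), names, colors)`.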

Finally, we can put it all together.

In [18]:
Chord(d.values.tolist(), names, credit=True, colors=colors, wrap_labels=False,
      margin=40, font_size_large=7, noun="titles",
      details_separator="", divide=True, divide_idx=len(left), divide_size=.2,
      width=850).show()
Chord Diagram

Chord Diagram with Names

It would be nice to show a list of game titles when hovering over a co-occurring publisher and genre. To do this, we can make use of the optional details parameter.

Next, we'll create an empty multi-dimensional array with the same shape as our matrix.

In [19]:
details = np.empty((len(names),len(names)),dtype=object)
details_thumbs = np.empty((len(names),len(names)),dtype=object)

Now we can populate the details array with lists of game titles in the correct positions.

In [20]:
# `names` holds the shortened display labels, so match against the original
# labels stored in the index of the co-occurrence matrix instead
labels = d.index.tolist()

for count_x, item_x in enumerate(labels):
    for count_y, item_y in enumerate(labels):
        # titles whose publisher and genre match this pair of labels
        details_names = data[
            (data['Publisher'].isin([item_x, item_y])) &
            (data['Genre'].isin([item_y, item_x]))]['Name'].to_list()

        details[count_x][count_y] = details_names
        # this dataset has no thumbnail URL column, so the thumbnails stay empty
        details_thumbs[count_x][count_y] = []

details = pd.DataFrame(details).values.tolist()
details_thumbs = pd.DataFrame(details_thumbs).values.tolist()
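
The idea behind the details matrix can be sketched on a toy example. The `toy` DataFrame below uses three rows from the head of the dataset, and the `groupby` approach is an alternative to filtering the DataFrame once per cell, not the method used in the cell above.

```python
import pandas as pd

toy = pd.DataFrame({
    'Name': ['Wii Sports', 'Mario Kart Wii', 'Wii Sports Resort'],
    'Publisher': ['Nintendo', 'Nintendo', 'Nintendo'],
    'Genre': ['Sports', 'Racing', 'Sports'],
})
labels = ['Nintendo', 'Racing', 'Sports']

# collect the titles for every (publisher, genre) pair in a single pass
titles = toy.groupby(['Publisher', 'Genre'])['Name'].apply(list).to_dict()

# each cell holds the titles shared by its row label and column label,
# looked up in both orders so the result is symmetric
details_toy = [[titles.get((x, y), []) + titles.get((y, x), []) for y in labels]
               for x in labels]

print(details_toy[0][2])
```

Cells with no matching titles hold an empty list, so the nested list has the same shape as the co-occurrence matrix.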

Finally, we can put it all together but this time with the details matrix passed in.

In [ ]:
Chord(d.values.tolist(), names, credit=True, colors=colors, wrap_labels=False,
      margin=40, font_size_large=7, details=details, noun="titles",
      details_separator="", divide=True, divide_idx=len(left), divide_size=.2,
      width=850).show()

Conclusion

In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!