Data is Beautiful
A practical book on data visualisation that shows you how to create static and interactive visualisations that are engaging and beautiful.
Get the book
Co-occurrence of Anime Genres with Chord Diagrams
Made with Chord Pro
You can create beautiful interactive visualisations like this one with Chord Pro. Learn how to make beautiful visualisations with the book, Data is Beautiful.
Preamble¶
import numpy as np # for multi-dimensional containers
import pandas as pd # for DataFrames
import itertools
from ast import literal_eval
from chord import Chord
Introduction¶
In this section, we're going to use the MyAnimeList dataset to visualise the co-occurrence of anime genres.
The Dataset¶
The dataset documentation states that we can expect 31 variables for each of the 14,478 entries. Let's download the mirrored dataset and have a look for ourselves.
data_url = 'https://datacrayon.com/datasets/anime_list.csv'
data = pd.read_csv(data_url)
data.head()
| | anime_id | title | title_english | title_japanese | title_synonyms | image_url | type | source | episodes | status | ... | background | premiered | broadcast | related | producer | licensor | studio | genre | opening_theme | ending_theme |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 11013 | Inu x Boku SS | Inu X Boku Secret Service | 妖狐×僕SS | Youko x Boku SS | https://myanimelist.cdn-dena.com/images/anime/... | TV | Manga | 12 | Finished Airing | ... | Inu x Boku SS was licensed by Sentai Filmworks... | Winter 2012 | Fridays at Unknown | {'Adaptation': [{'mal_id': 17207, 'type': 'man... | Aniplex, Square Enix, Mainichi Broadcasting Sy... | Sentai Filmworks | David Production | Comedy, Supernatural, Romance, Shounen | ['"Nirvana" by MUCC'] | ['#1: "Nirvana" by MUCC (eps 1, 11-12)', '#2: ... |
1 | 2104 | Seto no Hanayome | My Bride is a Mermaid | 瀬戸の花嫁 | The Inland Sea Bride | https://myanimelist.cdn-dena.com/images/anime/... | TV | Manga | 26 | Finished Airing | ... | NaN | Spring 2007 | Unknown | {'Adaptation': [{'mal_id': 759, 'type': 'manga... | TV Tokyo, AIC, Square Enix, Sotsu | Funimation | Gonzo | Comedy, Parody, Romance, School, Shounen | ['"Romantic summer" by SUN&LUNAR'] | ['#1: "Ashita e no Hikari (明日への光)" by Asuka Hi... |
2 | 5262 | Shugo Chara!! Doki | Shugo Chara!! Doki | しゅごキャラ!!どきっ | Shugo Chara Ninenme, Shugo Chara! Second Year | https://myanimelist.cdn-dena.com/images/anime/... | TV | Manga | 51 | Finished Airing | ... | NaN | Fall 2008 | Unknown | {'Adaptation': [{'mal_id': 101, 'type': 'manga... | TV Tokyo, Sotsu | NaN | Satelight | Comedy, Magic, School, Shoujo | ['#1: "Minna no Tamago (みんなのたまご)" by Shugo Cha... | ['#1: "Rottara Rottara (ロッタラ ロッタラ)" by Buono! ... |
3 | 721 | Princess Tutu | Princess Tutu | プリンセスチュチュ | NaN | https://myanimelist.cdn-dena.com/images/anime/... | TV | Original | 38 | Finished Airing | ... | Princess Tutu aired in two parts. The first pa... | Summer 2002 | Fridays at Unknown | {'Adaptation': [{'mal_id': 1581, 'type': 'mang... | Memory-Tech, GANSIS, Marvelous AQL | ADV Films | Hal Film Maker | Comedy, Drama, Magic, Romance, Fantasy | ['"Morning Grace" by Ritsuko Okazaki'] | ['"Watashi No Ai Wa Chiisaikeredo" by Ritsuko ... |
4 | 12365 | Bakuman. 3rd Season | Bakuman. | バクマン。 | Bakuman Season 3 | https://myanimelist.cdn-dena.com/images/anime/... | TV | Manga | 25 | Finished Airing | ... | NaN | Fall 2012 | Unknown | {'Adaptation': [{'mal_id': 9711, 'type': 'mang... | NHK, Shueisha | NaN | J.C.Staff | Comedy, Drama, Romance, Shounen | ['#1: "Moshimo no Hanashi (もしもの話)" by nano.RIP... | ['#1: "Pride on Everyday" by Sphere (eps 1-13)... |
5 rows × 31 columns
It looks good so far, but let's confirm the 31 variables against 14478 samples from the documentation.
data.shape
(14478, 31)
Perfect, that's exactly what we were expecting.
Data Wrangling¶
We need to do a bit of data wrangling before we can visualise our data. We can see from the column names that there's a single column for genres, genre, containing comma-separated values.
Let's convert them to lists of strings.
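def get_list(x):
    # Integers cannot hold genre names, so treat them as "no genres".
    if isinstance(x, int):
        return []
    if isinstance(x, str):
        # Split on commas, trim whitespace, and sort alphabetically.
        result = [s.strip() for s in x.split(',')]
        return sorted(result)
    # Anything else (e.g. a missing value) also yields an empty list.
    return []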
genres = data['genre'].apply(get_list)
pd.DataFrame(genres)
| | genre |
---|---|
0 | [Comedy, Romance, Shounen, Supernatural] |
1 | [Comedy, Parody, Romance, School, Shounen] |
2 | [Comedy, Magic, School, Shoujo] |
3 | [Comedy, Drama, Fantasy, Magic, Romance] |
4 | [Comedy, Drama, Romance, Shounen] |
... | ... |
14473 | [Kids] |
14474 | [Comedy] |
14475 | [Action, Adventure, Fantasy, Sci-Fi] |
14476 | [Fantasy, Kids] |
14477 | [Comedy] |
14478 rows × 1 columns
Without further investigation, we can see a few single-entry lists in the table above, and get_list returns an empty list, [], for any missing genre value. A title with fewer than two genres cannot contribute a co-occurrence, so let's remove all samples which contain an empty or single-entry list.
genres = genres[genres.str.len() > 1]
pd.DataFrame(genres)
| | genre |
---|---|
0 | [Comedy, Romance, Shounen, Supernatural] |
1 | [Comedy, Parody, Romance, School, Shounen] |
2 | [Comedy, Magic, School, Shoujo] |
3 | [Comedy, Drama, Fantasy, Magic, Romance] |
4 | [Comedy, Drama, Romance, Shounen] |
... | ... |
14467 | [Drama, Kids] |
14469 | [Kids, School] |
14471 | [Drama, Fantasy, Kids] |
14475 | [Action, Adventure, Fantasy, Sci-Fi] |
14476 | [Fantasy, Kids] |
10974 rows × 1 columns
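The two steps above, splitting and filtering, can be sketched on a toy Series. This is a minimal illustration rather than the dataset itself: the `toy` values are made up, and `get_list` is a condensed variant of the function defined earlier.

```python
import numpy as np
import pandas as pd


def get_list(x):
    """Split a comma-separated string into a sorted list; anything else becomes []."""
    if isinstance(x, str):
        return sorted(s.strip() for s in x.split(','))
    return []


# Hypothetical stand-in for data['genre']; np.nan mimics a missing value.
toy = pd.Series(["Comedy, Romance", "Kids", np.nan, "Sci-Fi, Action, Mecha"])

toy_lists = toy.apply(get_list)

# .str.len() also works on a Series of lists, returning each list's length.
toy_pairs = toy_lists[toy_lists.str.len() > 1]

print(toy_lists.tolist())
print(toy_pairs.index.tolist())
```

Only the rows with two or more genres survive the filter, and they keep their original index labels, which is why the filtered table above still ends at index 14476.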
Our chord diagram will need two inputs: the co-occurrence matrix, and a list of names to label the segments.
We can build a co-occurrence matrix with the following approach. We'll start by getting all pairwise combinations within each list.
genres = [list(itertools.combinations(i,2)) for i in genres]
pd.DataFrame(genres)
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | (Comedy, Romance) | (Comedy, Shounen) | (Comedy, Supernatural) | (Romance, Shounen) | (Romance, Supernatural) | (Shounen, Supernatural) | None | None | None | None | ... | None | None | None | None | None | None | None | None | None | None |
1 | (Comedy, Parody) | (Comedy, Romance) | (Comedy, School) | (Comedy, Shounen) | (Parody, Romance) | (Parody, School) | (Parody, Shounen) | (Romance, School) | (Romance, Shounen) | (School, Shounen) | ... | None | None | None | None | None | None | None | None | None | None |
2 | (Comedy, Magic) | (Comedy, School) | (Comedy, Shoujo) | (Magic, School) | (Magic, Shoujo) | (School, Shoujo) | None | None | None | None | ... | None | None | None | None | None | None | None | None | None | None |
3 | (Comedy, Drama) | (Comedy, Fantasy) | (Comedy, Magic) | (Comedy, Romance) | (Drama, Fantasy) | (Drama, Magic) | (Drama, Romance) | (Fantasy, Magic) | (Fantasy, Romance) | (Magic, Romance) | ... | None | None | None | None | None | None | None | None | None | None |
4 | (Comedy, Drama) | (Comedy, Romance) | (Comedy, Shounen) | (Drama, Romance) | (Drama, Shounen) | (Romance, Shounen) | None | None | None | None | ... | None | None | None | None | None | None | None | None | None | None |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
10969 | (Drama, Kids) | None | None | None | None | None | None | None | None | None | ... | None | None | None | None | None | None | None | None | None | None |
10970 | (Kids, School) | None | None | None | None | None | None | None | None | None | ... | None | None | None | None | None | None | None | None | None | None |
10971 | (Drama, Fantasy) | (Drama, Kids) | (Fantasy, Kids) | None | None | None | None | None | None | None | ... | None | None | None | None | None | None | None | None | None | None |
10972 | (Action, Adventure) | (Action, Fantasy) | (Action, Sci-Fi) | (Adventure, Fantasy) | (Adventure, Sci-Fi) | (Fantasy, Sci-Fi) | None | None | None | None | ... | None | None | None | None | None | None | None | None | None | None |
10973 | (Fantasy, Kids) | None | None | None | None | None | None | None | None | None | ... | None | None | None | None | None | None | None | None | None | None |
10974 rows × 78 columns
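As a small sketch of what `itertools.combinations` does here, the snippet below pairs up a made-up four-genre list. A list of n genres yields n(n-1)/2 pairs, and because each list was sorted beforehand, every pair comes out in alphabetical order. The 78 columns in the output above match the number of pairs in a 13-item list, which suggests the longest genre list in the dataset holds 13 genres.

```python
import itertools
import math

# Hypothetical example list, already sorted as get_list would return it.
toy_genres = ["Comedy", "Drama", "Romance", "Shounen"]

# All unordered pairs, emitted in the order the items appear in the list.
pairs = list(itertools.combinations(toy_genres, 2))
print(pairs)

# Number of pairs for a list of n items is "n choose 2".
print(len(pairs), math.comb(len(toy_genres), 2))
print(math.comb(13, 2))
```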
Now we will flatten the nested lists. This will give us all the genre pairings in both original and reversed order, so each co-occurrence is counted in both directions and the resulting matrix is symmetric.
genres = list(itertools.chain.from_iterable((i, i[::-1]) for c_ in genres for i in c_))
pd.DataFrame(genres)
| | 0 | 1 |
---|---|---|
0 | Comedy | Romance |
1 | Romance | Comedy |
2 | Comedy | Shounen |
3 | Shounen | Comedy |
4 | Comedy | Supernatural |
... | ... | ... |
119691 | Sci-Fi | Adventure |
119692 | Fantasy | Sci-Fi |
119693 | Sci-Fi | Fantasy |
119694 | Fantasy | Kids |
119695 | Kids | Fantasy |
119696 rows × 2 columns
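To make the flattening step concrete, here is a minimal sketch on two made-up rows of pairs. Each pair is emitted twice, once as-is and once reversed, so the flat list is exactly twice as long as the number of pairs. The variable names `row` and `pair` are illustrative only.

```python
import itertools

# Hypothetical nested structure: one list of pairs per anime.
toy_pairs = [
    [("Comedy", "Romance"), ("Comedy", "Shounen")],
    [("Fantasy", "Kids")],
]

# For every pair, yield the pair and its reverse, then chain them into one list.
flat = list(itertools.chain.from_iterable(
    (pair, pair[::-1]) for row in toy_pairs for pair in row
))

for pair in flat:
    print(pair)
```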
We can now use these pairings to create the matrix.
matrix = pd.pivot_table(
    pd.DataFrame(genres), index=0, columns=1, aggfunc="size", fill_value=0
).values.tolist()
We can display this as a DataFrame for better presentation.
pd.DataFrame(matrix)
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1022 | 34 | 985 | 12 | 165 | 550 | 201 | 860 | 103 | ... | 13 | 38 | 232 | 141 | 390 | 475 | 32 | 56 | 3 | 2 |
1 | 1022 | 0 | 22 | 994 | 6 | 97 | 428 | 67 | 1054 | 80 | ... | 7 | 70 | 134 | 39 | 136 | 229 | 6 | 10 | 0 | 0 |
2 | 34 | 22 | 0 | 11 | 0 | 0 | 12 | 0 | 1 | 8 | ... | 0 | 2 | 1 | 36 | 0 | 1 | 0 | 0 | 0 | 0 |
3 | 985 | 994 | 11 | 0 | 28 | 103 | 499 | 500 | 952 | 65 | ... | 41 | 835 | 79 | 246 | 249 | 439 | 11 | 52 | 12 | 11 |
4 | 12 | 6 | 0 | 28 | 0 | 1 | 21 | 3 | 18 | 1 | ... | 1 | 1 | 2 | 1 | 0 | 16 | 4 | 0 | 0 | 0 |
5 | 165 | 97 | 0 | 103 | 1 | 0 | 41 | 24 | 166 | 8 | ... | 0 | 3 | 1 | 1 | 25 | 194 | 4 | 11 | 0 | 1 |
6 | 550 | 428 | 12 | 499 | 21 | 41 | 0 | 54 | 369 | 19 | ... | 37 | 336 | 137 | 148 | 58 | 236 | 36 | 23 | 23 | 1 |
7 | 201 | 67 | 0 | 500 | 3 | 24 | 54 | 0 | 136 | 11 | ... | 0 | 35 | 10 | 22 | 45 | 97 | 0 | 13 | 0 | 8 |
8 | 860 | 1054 | 1 | 952 | 18 | 166 | 369 | 136 | 0 | 101 | ... | 11 | 132 | 17 | 11 | 110 | 360 | 7 | 33 | 0 | 3 |
9 | 103 | 80 | 8 | 65 | 1 | 8 | 19 | 11 | 101 | 0 | ... | 0 | 20 | 3 | 7 | 1 | 14 | 6 | 0 | 0 | 0 |
10 | 73 | 19 | 0 | 225 | 0 | 18 | 53 | 156 | 74 | 1 | ... | 1 | 20 | 6 | 1 | 10 | 55 | 0 | 11 | 0 | 2 |
11 | 41 | 15 | 0 | 52 | 2 | 57 | 28 | 0 | 68 | 0 | ... | 0 | 1 | 4 | 3 | 5 | 63 | 0 | 3 | 11 | 32 |
12 | 224 | 214 | 1 | 195 | 3 | 56 | 335 | 7 | 184 | 4 | ... | 7 | 82 | 3 | 6 | 15 | 135 | 3 | 3 | 2 | 0 |
13 | 139 | 62 | 1 | 71 | 30 | 74 | 88 | 14 | 94 | 2 | ... | 3 | 1 | 11 | 2 | 10 | 201 | 21 | 35 | 1 | 0 |
14 | 17 | 8 | 0 | 39 | 0 | 4 | 33 | 0 | 18 | 3 | ... | 3 | 29 | 0 | 3 | 0 | 17 | 0 | 3 | 0 | 0 |
15 | 134 | 476 | 23 | 596 | 1 | 22 | 224 | 3 | 588 | 34 | ... | 0 | 122 | 19 | 35 | 36 | 48 | 0 | 5 | 0 | 0 |
16 | 316 | 278 | 0 | 407 | 0 | 57 | 131 | 89 | 538 | 19 | ... | 4 | 60 | 5 | 0 | 52 | 168 | 7 | 7 | 0 | 0 |
17 | 236 | 103 | 0 | 101 | 1 | 20 | 41 | 26 | 72 | 3 | ... | 1 | 15 | 2 | 20 | 86 | 26 | 0 | 0 | 0 | 1 |
18 | 623 | 302 | 10 | 224 | 8 | 4 | 188 | 35 | 73 | 16 | ... | 0 | 13 | 182 | 13 | 44 | 22 | 1 | 0 | 1 | 0 |
19 | 304 | 90 | 0 | 61 | 3 | 14 | 193 | 19 | 52 | 5 | ... | 0 | 9 | 137 | 4 | 18 | 31 | 2 | 6 | 1 | 1 |
20 | 64 | 35 | 3 | 137 | 51 | 4 | 114 | 9 | 89 | 7 | ... | 3 | 113 | 28 | 19 | 6 | 23 | 0 | 5 | 3 | 1 |
21 | 206 | 154 | 0 | 210 | 11 | 19 | 140 | 10 | 79 | 14 | ... | 5 | 24 | 6 | 3 | 44 | 188 | 48 | 23 | 0 | 0 |
22 | 88 | 27 | 2 | 453 | 8 | 6 | 6 | 40 | 55 | 15 | ... | 3 | 29 | 11 | 11 | 44 | 18 | 1 | 3 | 1 | 2 |
23 | 105 | 73 | 5 | 114 | 2 | 1 | 29 | 6 | 3 | 0 | ... | 1 | 5 | 3 | 5 | 1 | 11 | 9 | 1 | 0 | 0 |
24 | 64 | 24 | 0 | 32 | 29 | 6 | 117 | 6 | 32 | 16 | ... | 4 | 12 | 7 | 2 | 2 | 74 | 38 | 0 | 1 | 0 |
25 | 277 | 244 | 2 | 838 | 6 | 64 | 606 | 243 | 306 | 13 | ... | 45 | 241 | 45 | 39 | 32 | 224 | 9 | 24 | 21 | 3 |
26 | 114 | 38 | 0 | 53 | 0 | 10 | 39 | 8 | 23 | 0 | ... | 0 | 1 | 0 | 0 | 11 | 21 | 0 | 1 | 2 | 0 |
27 | 227 | 43 | 1 | 831 | 3 | 25 | 238 | 209 | 125 | 29 | ... | 13 | 361 | 7 | 135 | 62 | 140 | 4 | 16 | 3 | 6 |
28 | 1143 | 695 | 10 | 676 | 20 | 32 | 464 | 110 | 262 | 40 | ... | 6 | 72 | 377 | 41 | 143 | 98 | 18 | 6 | 3 | 2 |
29 | 256 | 105 | 21 | 373 | 2 | 20 | 144 | 90 | 65 | 12 | ... | 0 | 153 | 24 | 40 | 26 | 102 | 16 | 13 | 0 | 3 |
30 | 76 | 77 | 0 | 242 | 1 | 31 | 191 | 0 | 205 | 1 | ... | 15 | 119 | 1 | 23 | 9 | 66 | 1 | 14 | 0 | 0 |
31 | 13 | 1 | 0 | 33 | 1 | 0 | 20 | 14 | 9 | 0 | ... | 0 | 22 | 0 | 0 | 2 | 4 | 0 | 0 | 0 | 2 |
32 | 809 | 675 | 19 | 963 | 1 | 73 | 266 | 114 | 431 | 63 | ... | 0 | 85 | 63 | 276 | 183 | 249 | 7 | 24 | 0 | 0 |
33 | 13 | 7 | 0 | 41 | 1 | 0 | 37 | 0 | 11 | 0 | ... | 0 | 10 | 0 | 2 | 1 | 16 | 1 | 4 | 1 | 0 |
34 | 38 | 70 | 2 | 835 | 1 | 3 | 336 | 35 | 132 | 20 | ... | 10 | 0 | 7 | 39 | 8 | 93 | 1 | 1 | 2 | 0 |
35 | 232 | 134 | 1 | 79 | 2 | 1 | 137 | 10 | 17 | 3 | ... | 0 | 7 | 0 | 3 | 9 | 7 | 0 | 0 | 0 | 0 |
36 | 141 | 39 | 36 | 246 | 1 | 1 | 148 | 22 | 11 | 7 | ... | 2 | 39 | 3 | 0 | 11 | 3 | 0 | 0 | 2 | 0 |
37 | 390 | 136 | 0 | 249 | 0 | 25 | 58 | 45 | 110 | 1 | ... | 1 | 8 | 9 | 11 | 0 | 90 | 3 | 8 | 0 | 0 |
38 | 475 | 229 | 1 | 439 | 16 | 194 | 236 | 97 | 360 | 14 | ... | 16 | 93 | 7 | 3 | 90 | 0 | 39 | 86 | 2 | 0 |
39 | 32 | 6 | 0 | 11 | 4 | 4 | 36 | 0 | 7 | 6 | ... | 1 | 1 | 0 | 0 | 3 | 39 | 0 | 1 | 0 | 0 |
40 | 56 | 10 | 0 | 52 | 0 | 11 | 23 | 13 | 33 | 0 | ... | 4 | 1 | 0 | 0 | 8 | 86 | 1 | 0 | 0 | 0 |
41 | 3 | 0 | 0 | 12 | 0 | 0 | 23 | 0 | 0 | 0 | ... | 1 | 2 | 0 | 2 | 0 | 2 | 0 | 0 | 0 | 0 |
42 | 2 | 0 | 0 | 11 | 0 | 1 | 1 | 8 | 3 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
43 rows × 43 columns
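The same pivot can be tried on a handful of made-up pairings to see where the numbers come from. In this sketch, "Comedy" and "Romance" co-occur twice and "Comedy" and "Kids" once. Because every pair was added in both directions, the resulting matrix should be symmetric with zeros on the diagonal, just like the full matrix above.

```python
import pandas as pd

# Hypothetical flattened pairings, each present in both directions.
toy_flat = [
    ("Comedy", "Romance"), ("Romance", "Comedy"),
    ("Comedy", "Romance"), ("Romance", "Comedy"),
    ("Comedy", "Kids"), ("Kids", "Comedy"),
]

# Count how often each (row genre, column genre) pairing appears.
toy_table = pd.pivot_table(
    pd.DataFrame(toy_flat), index=0, columns=1, aggfunc="size", fill_value=0
)
print(toy_table)

toy_matrix = toy_table.values.tolist()

# Symmetry check: the matrix should equal its own transpose.
is_symmetric = (toy_table.values == toy_table.values.T).all()
print(is_symmetric)
```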
Now for the names of our genres.
names = np.unique(genres).tolist()
pd.DataFrame(names)
| | 0 |
---|---|
0 | Action |
1 | Adventure |
2 | Cars |
3 | Comedy |
4 | Dementia |
5 | Demons |
6 | Drama |
7 | Ecchi |
8 | Fantasy |
9 | Game |
10 | Harem |
11 | Hentai |
12 | Historical |
13 | Horror |
14 | Josei |
15 | Kids |
16 | Magic |
17 | Martial Arts |
18 | Mecha |
19 | Military |
20 | Music |
21 | Mystery |
22 | Parody |
23 | Police |
24 | Psychological |
25 | Romance |
26 | Samurai |
27 | School |
28 | Sci-Fi |
29 | Seinen |
30 | Shoujo |
31 | Shoujo Ai |
32 | Shounen |
33 | Shounen Ai |
34 | Slice of Life |
35 | Space |
36 | Sports |
37 | Super Power |
38 | Supernatural |
39 | Thriller |
40 | Vampire |
41 | Yaoi |
42 | Yuri |
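The names only label the matrix correctly if they are in the same order as its rows and columns. Both `np.unique` and the pivot table sort their labels, so the two should line up. The sketch below checks this on made-up pairings; it is a sanity check under that assumption, not part of the original workflow.

```python
import numpy as np
import pandas as pd

# Hypothetical flattened pairings, each present in both directions.
toy_flat = [
    ("Comedy", "Romance"), ("Romance", "Comedy"),
    ("Comedy", "Kids"), ("Kids", "Comedy"),
]

toy_table = pd.pivot_table(
    pd.DataFrame(toy_flat), index=0, columns=1, aggfunc="size", fill_value=0
)

# np.unique flattens the pairs and returns the sorted unique genre names.
toy_names = np.unique(toy_flat).tolist()
print(toy_names)

# The names should match the row and column labels, position by position.
names_match = toy_names == toy_table.index.tolist() == toy_table.columns.tolist()
print(names_match)
```

An alternative that avoids relying on matching sort orders would be to keep the pivot table as a DataFrame and read the names directly from its index.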
We may wish to remove some genres for our visualisation. The example below removes six genres from the co-occurrence matrix and the list of names; if you add more genre names to the discarded_categories list, it will work for them too.
matrix = pd.DataFrame(matrix)
names = pd.DataFrame(names)
discarded_categories = ["Hentai", "Yaoi", "Yuri", "Ecchi",
                        "Shounen Ai", "Shoujo Ai"]
# Find the positions of the genres we want to discard.
discard_mask = names.isin(discarded_categories).values
discard_indices = names[discard_mask].index
# Drop the matching column and row from the matrix, and the name itself.
for drop_idx in discard_indices:
    matrix = matrix.drop(drop_idx, axis=1)
    matrix = matrix.drop(drop_idx, axis=0)
    names = names.drop(drop_idx, axis=0)
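As a possible alternative to the loop, the same rows and columns can be dropped in a single call by passing the labels to both `index` and `columns`. This is a minimal sketch on a made-up 4×4 matrix, with hypothetical `toy_` names, rather than a replacement for the code above.

```python
import pandas as pd

# Hypothetical names and symmetric co-occurrence counts.
toy_names = pd.DataFrame(["Action", "Comedy", "Ecchi", "Kids"])
toy_matrix = pd.DataFrame([
    [0, 5, 2, 1],
    [5, 0, 3, 4],
    [2, 3, 0, 0],
    [1, 4, 0, 0],
])
discarded = ["Ecchi"]

# Index labels of the genres to discard (a 1-D boolean mask on column 0).
drop_idx = toy_names.index[toy_names[0].isin(discarded)]

# Remove the matching rows and columns together, then the names.
toy_matrix = toy_matrix.drop(index=drop_idx, columns=drop_idx)
toy_names = toy_names.drop(index=drop_idx)

print(toy_matrix)
print(toy_names[0].tolist())
```

The matrix stays square and the remaining names still line up with its rows and columns, which is what the chord diagram needs.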
Chord Diagram¶
Time to visualise the co-occurrence of genres using a chord diagram. We are going to use a list of custom colours that represent the genres.
colors = ["#660000", "#734139", "#e59173", "#ff4400", "#332b26", "#593000",
          "#998773", "#d97400", "#8c5e00", "#f2ca79", "#ffcc00", "#59562d",
          "#736b00", "#c2cc33", "#245900", "#8cff40", "#269926", "#ace6ac",
          "#40ffa6", "#336655", "#008c5e", "#39e6da", "#ace6e2", "#566d73",
          "#39c3e6", "#1d5673", "#3d9df2", "#163159", "#acc3e6", "#000f73",
          "#565a73", "#000033", "#8273e6", "#6d00cc", "#633366", "#e2ace6",
          "#f23de6", "#cc0088", "#590024", "#cc0036", "#f27999", "#e6acb4"]
Finally, we can put it all together.
Chord(
    matrix.values.tolist(),
    names.values.tolist(),
    padding=0.03,
    colors=colors,
    wrap_labels=False,
    margin=40,
    font_size="14px",
    font_size_large="14px",
    credit=True,
    noun="Anime",
    allow_download=True
).show()
Conclusion¶
In this section, we demonstrated how to conduct some data wrangling on a downloaded dataset to prepare it for a chord diagram. Our chord diagram is interactive, so you can use your mouse or touchscreen to investigate the co-occurrences!