## Data Analysis with Rust Notebooks

A practical book on Data Analysis with Rust Notebooks that teaches you the concepts and how they’re implemented in practice. # Unique Array Elements and their Frequency

Contents

## Preamble¶

In :
```:dep darn = {version = "0.3.0"}
:dep ndarray = {version = "0.13.1"}
:dep ndarray-csv = {version = "0.4.1"}
:dep ureq = {version = "0.11.4"}
:dep itertools = {version = "0.9.0"}
:dep plotly = {version = "0.4.0"}
extern crate csv;
extern crate itertools;

use std::io::prelude::*;
use std::fs::*;
use ndarray::prelude::*;
use std::str::FromStr;
use itertools::Itertools;
use plotly::{Plot, Bar, Layout};
use plotly::common::{Mode, Title};
use plotly::layout::{Axis};
use std::collections::HashMap;```

## Introduction¶

In this section, we're going to take a look at some approaches to determining the unique elements in a column of data and their frequency. This is a common task which we may often apply to categorical data, and it can easily give us a good idea of the balance of categorical values across our dataset.

If you're familiar with Python and NumPy, you may have encountered the `numpy.unique()`function that returns a list of sorted unique elements of an array. In more recent versions of NumPy, the `numpy.unique()` function also takes a parameter named `return_counts`, which when set to `True` will also return the number of times each unique element appears in the array.

Let's see how we can do the same for our `ndarray::Array2`.

We will continue using the Iris Flower dataset, so we need to load it into our raw string array first.

In :
```let file_name = "Iris.csv";

let res = ureq::get("https://datacrayon.com/datasets/Iris.csv").call().into_string()?;

let mut file = File::create(file_name)?;
file.write_all(res.as_bytes());
remove_file(file_name)?;

let data: Array2<String>= rdr.deserialize_array2_dynamic().unwrap();
let mut headers : Vec<String> = Vec::new();

};```

### Moving Data to Typed Arrays¶

We need to convert from String to the desired type, and move our data over to the typed arrays.

In :
```let mut features: Array2::<f32> =  Array2::<f32>::zeros((data.shape(),0));

for &f in [1, 2, 3, 4].iter() {
features = ndarray::stack![Axis(1), features,
data.column(f as usize)
.mapv(|elem| f32::from_str(&elem).unwrap())
.insert_axis(Axis(1))];
};

let labels: Array1::<String> = data.column(5).to_owned();```

We will only be using our species labels, stored in `labels`, throughout the rest of this section.

## Unique Elements¶

We can get the unique elements in our array using the `unique()` function provided by the `itertools` crate.

Return an iterator adaptor that filters out elements that have already been produced once during the iteration. Duplicates are detected using hash and equality.

We can then iterate through this and output the elements to the output cell.

In :
```for i in labels.iter().unique() {
println!("{}",i);
};```
```Iris-setosa
Iris-versicolor
Iris-virginica```

We can also use the `.format()` function on `.unique()` to print out our unique elements.

In :
`labels.iter().unique().format("\n")`
Out:
```"Iris-setosa"
"Iris-versicolor"
"Iris-virginica"```

However, we may want to store these in a vector to use later too. Let's store them in `unique_elements`.

In :
```let unique_elements = labels.iter().cloned().unique().collect_vec();

unique_elements```
Out:
`["Iris-setosa", "Iris-versicolor", "Iris-virginica"]`

## Count of Unique Elements¶

In this case it's easy to see we have three unique elements, but if the list is too long or we wanted to store the value for later use, we can use the `len()` function on our vector containing the unique elements.

In :
`unique_elements.len()`
Out:
`3`

We could also get the count from our iterator.

In :
`labels.iter().unique().count()`
Out:
`3`

## Frequency of Unique Elements¶

To find the frequency of a specific string we can use `filter()` and then `count()`.

In :
`labels.iter().filter(|&elem| *elem == "Iris-virginica").count()`
Out:
`50`

We may also want to find the frequency of every unique element and store it in a vector. We'll call this `unique_frequency`.

In :
`let mut unique_frequency = Vec::<usize>::new();`

We can populate this by iterating through the unique elements and use the strings to find the frequency using the `filter()` approach above.

In :
```for unique_elem in unique_elements.iter() {
unique_frequency.push(labels.iter().filter(|&elem| elem == unique_elem).count());
};

unique_frequency```
Out:
`[50, 50, 50]`

## Visualise the Frequency of Unique Elements¶

It's useful to visualise the frequency of elements in our array, especially if we're interested in looking at the balance in our dataset. We can do this with a `Bar` plot using `Plotly`.

In :
```let layout = Layout::new()
.yaxis(Axis::new().title(Title::new("Frequency")));

let freq_bars = Bar::new(unique_elements, unique_frequency);

let mut plot = Plot::new();

plot.set_layout(layout);

darn::show_plot(plot)```

## Unique Elements and their Frequency with Hashmaps¶

We can do something similar with a `HashMap`. First we'll define a variable of type `HashMap<String, i32>`, where the `String` part will be the unique element, and the `i32` will be the frequency.

In :
`let mut value_counts : HashMap<String, i32> = HashMap::new();`

We can then populate this by iterating through our original `labels` array.

In :
```for item in labels.iter() {
*value_counts.entry(String::from(item)).or_insert(0) += 1;
};```

Printing out the results, we can see that `value_counts` now contains all the information that we're after.

In :
`println!("{:#?}", value_counts);`
```{
"Iris-virginica": 50,
"Iris-versicolor": 50,
"Iris-setosa": 50,
}```

We can also get these values directly by key.

In :
`value_counts.get("Iris-versicolor").unwrap()`
Out:
`50`

However, you will need to map this `HashMap` to separate vectors if you want to use it for plotting with Plotly.

## Conclusion¶

In this section, we've demonstrated a few approaches to identifying the unique elements in an array, counting the number of unique elements, and the frequency of these unique elements. We also visualised the frequency of our unique elements which is useful during exploratory data analysis.

Support this work

You can access this notebook and more by getting the e-book on Data Analysis with Rust Notebooks.

## Data Analysis with Rust Notebooks

A practical book on Data Analysis with Rust Notebooks that teaches you the concepts and how they’re implemented in practice. 