Data Analysis with Rust Notebooks

A practical book on Data Analysis with Rust Notebooks that teaches you the concepts and how they’re implemented in practice.

Multidimensional Arrays and Operations with NDArray

Preamble

In [2]:
extern crate ndarray;

use std::fs;

The ndarray prelude module contains the most commonly used types, type aliases, traits, and functions, so we can import them all as a group:

In [3]:
use ndarray::prelude::*;

This gives us access to the following: ArrayBase, Array, RcArray, ArrayView, ArrayViewMut, Axis, Dim, Dimension, Array0, Array1, Array2, Array3, Array4, Array5, Array6, ArrayD, ArrayView0, ArrayView1, ArrayView2, ArrayView3, ArrayView4, ArrayView5, ArrayView6, ArrayViewD, ArrayViewMut0, ArrayViewMut1, ArrayViewMut2, ArrayViewMut3, ArrayViewMut4, ArrayViewMut5, ArrayViewMut6, ArrayViewMutD, Ix0, Ix1, Ix2, Ix3, Ix4, Ix5, Ix6, IxDyn, arr0, arr1, arr2, aview0, aview1, aview2, aview_mut1, ShapeBuilder, NdFloat, and AsArray. Dim, Ix0 to Ix6, and IxDyn are each available both as a type and as a function of the same name.

Introduction

The ndarray crate provides a multidimensional container for general or numerical elements. If you're familiar with Python, you can think of it as similar to the NumPy package. With ndarray we get $n$-dimensional arrays, slicing, views, mathematical operations, and more. We'll need these in later sections to load our datasets into containers that we can operate on to conduct our analyses.

Creating Arrays

From Nested Arrays

Let's take a look at how we can create a two-dimensional ndarray Array from nested Rust arrays with the arr2() function, which takes a slice of fixed-size rows.

In [4]:
arr2(&[[1.,2.,3.],
       [4.,5.,6.]])
Out[4]:
[[1.0, 2.0, 3.0],
 [4.0, 5.0, 6.0]], shape=[2, 3], strides=[3, 1], layout=C (0x1), const ndim=2

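It's as easy as that. This has given us a 2 by 3 array (two rows, three columns) containing our floating-point values; their element type is f64, because Rust's unsuffixed float literals default to f64. We can also use the array! macro as a shorthand for creating an array.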

In [5]:
array![[1.,2.,3.],
       [4.,5.,6.]]
Out[5]:
[[1.0, 2.0, 3.0],
 [4.0, 5.0, 6.0]], shape=[2, 3], strides=[3, 1], layout=C (0x1), const ndim=2
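Both outputs report strides=[3, 1]. In ndarray, a stride is the number of elements to skip in memory to move one step along an axis: moving to the next row skips 3 elements, while moving to the next column skips 1. The following is a minimal plain-Rust sketch (no ndarray dependency) of how row-major (C-order) strides follow from a shape and how they locate an element in a flat buffer. The helpers c_strides and flat_offset are hypothetical, and this illustrates the idea rather than ndarray's internal code.

```rust
/// Computes row-major (C-order) strides for a shape: the stride of an axis
/// is the product of the lengths of all the axes after it.
/// Hypothetical helper for illustration.
fn c_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    // Walk backwards from the second-to-last axis, accumulating lengths.
    for axis in (0..shape.len().saturating_sub(1)).rev() {
        strides[axis] = strides[axis + 1] * shape[axis + 1];
    }
    strides
}

/// Maps a multi-dimensional index to a position in the flat buffer by
/// summing index * stride over all axes. Hypothetical helper.
fn flat_offset(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}

fn main() {
    // The same values as arr2(&[[1.,2.,3.],[4.,5.,6.]]), stored contiguously.
    let data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let strides = c_strides(&[2, 3]);
    println!("strides = {:?}", strides); // strides = [3, 1]

    // Element [1, 2] (second row, third column) is at offset 1*3 + 2*1 = 5.
    println!("{}", data[flat_offset(&[1, 2], &strides)]); // 6
}
```

The same function gives [4, 1] for the 4 by 4 arrays created with zeros() and ones().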

Filled with Zeros

We can also construct an array filled with zeros by calling the zeros() function with our desired shape. The ::<f64> part specifies the element type.

In [6]:
Array2::<f64>::zeros((4,4))
Out[6]:
[[0.0, 0.0, 0.0, 0.0],
 [0.0, 0.0, 0.0, 0.0],
 [0.0, 0.0, 0.0, 0.0],
 [0.0, 0.0, 0.0, 0.0]], shape=[4, 4], strides=[4, 1], layout=C (0x1), const ndim=2

Filled with Ones

Similarly, we can construct an array filled with ones by calling the ones() function with our desired shape.

In [7]:
Array2::<f64>::ones((4,4))
Out[7]:
[[1.0, 1.0, 1.0, 1.0],
 [1.0, 1.0, 1.0, 1.0],
 [1.0, 1.0, 1.0, 1.0],
 [1.0, 1.0, 1.0, 1.0]], shape=[4, 4], strides=[4, 1], layout=C (0x1), const ndim=2

Let's create variables to store a 1D array and a 2D array for use in the following subsections.

In [24]:
let data_1D = array![1.,2.,3.];

let data_2D = array![[1.,2.,3.],
                     [4.,5.,6.]];

Dimensions

It's often the case that we need to find out the dimensions of our arrays, i.e. how many elements they hold along each axis. There are several ways to do this, and the following are some of the common approaches.

From Length

We can use Array.len() to return the total number of elements in an array. For a one-dimensional array, this is simply its length.

In [9]:
data_1D.len()
Out[9]:
3

This is simple enough for a one-dimensional array. However, for arrays with more dimensions, len() returns the total number of elements, i.e. the length of the flattened array (here $2 \times 3 = 6$).

In [10]:
data_2D.len()
Out[10]:
6

If we want the length along one particular axis instead, we can use Array.len_of(Axis(n)), where axes are numbered from 0. For example, Axis(1) refers to the second axis, i.e. the columns of data_2D.

In [11]:
data_2D.len_of(Axis(1))
Out[11]:
3

From Shape

Another approach is to use Array.shape(), which returns the length along every axis.

In [14]:
data_2D.shape()
Out[14]:
[2, 3]

We can see it has returned a slice containing the length along each of our axes. This can be indexed to get the length along a specific axis.

In [15]:
data_2D.shape()[1]
Out[15]:
3
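The size queries above are all determined by the shape. For data_2D, with shape [2, 3], len() is the product of the axis lengths ($2 \times 3 = 6$), len_of(Axis(1)) is the entry at position 1 of the shape (3), and the number of axes, which ndarray reports through ndim(), is the length of the shape itself (2). A small plain-Rust sketch of these relationships, using a hypothetical size_info helper rather than ndarray itself:

```rust
/// Hypothetical helper: the size queries of an array derived from its shape
/// alone. In ndarray these correspond to ndim(), len() and len_of(Axis(n)).
fn size_info(shape: &[usize], axis: usize) -> (usize, usize, usize) {
    let ndim = shape.len(); // number of axes
    let len: usize = shape.iter().product(); // total number of elements
    // Panics if `axis` is out of range, as len_of() does for a bad axis.
    let len_of_axis = shape[axis];
    (ndim, len, len_of_axis)
}

fn main() {
    // data_2D has shape [2, 3]; ask about the second axis.
    let (ndim, len, cols) = size_info(&[2, 3], 1);
    // prints: ndim = 2, len = 6, len_of(Axis(1)) = 3
    println!("ndim = {}, len = {}, len_of(Axis(1)) = {}", ndim, len, cols);
}
```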

Indexing

As with Rust's built-in arrays and vectors, indexing starts at $0$. To access the first element of our one-dimensional array, we can do the following.

In [16]:
data_1D[0]
Out[16]:
1.0

For arrays with more dimensions, we index with a primitive (fixed-size) array containing one index per axis, here [row, column].

In [17]:
data_2D[[0,0]]
Out[17]:
1.0

Likewise, to access the second element of our one-dimensional array, we index with $1$.

In [18]:
data_1D[1]
Out[18]:
2.0

Again, for our two-dimensional array we use a primitive array; [0, 1] selects the first row and second column.

In [19]:
data_2D[[0,1]]
Out[19]:
2.0

To select the last element of our one-dimensional array, we can index with Array.len() - 1.

In [20]:
data_1D[data_1D.len() -1]
Out[20]:
3.0

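For our two-dimensional array, we use a primitive array together with Array.len_of(Axis(n)) to find the last index along an axis. Here we select the last element of the first row; the last element of the whole array would be data_2D[[1, 2]].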

In [21]:
data_2D[[0, data_2D.len_of(Axis(1)) -1]]
Out[21]:
3.0

Alternatively, we could use Array.shape()[n].

In [25]:
data_2D[[0, data_2D.shape()[1] - 1]]
Out[25]:
3.0
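One caveat applies to Array.len() - 1: on an empty array the subtraction underflows and the program panics, and any out-of-range index panics as well. A defensive alternative, sketched here on a plain slice with a hypothetical last_element helper, is to compute the index with checked_sub and read it with get(), both of which return an Option instead of panicking. ndarray arrays also provide a get() method that returns an Option, and for plain slices the standard last() method does the same job directly.

```rust
/// Returns the last element of a slice, or None if the slice is empty.
/// Hypothetical helper for illustration.
fn last_element(data: &[f64]) -> Option<f64> {
    // checked_sub returns None instead of underflowing when len() is 0.
    let last = data.len().checked_sub(1)?;
    // get returns None instead of panicking on an out-of-range index.
    data.get(last).copied()
}

fn main() {
    let data_1d = [1.0, 2.0, 3.0];
    let empty: [f64; 0] = [];
    println!("{:?}", last_element(&data_1d)); // Some(3.0)
    println!("{:?}", last_element(&empty)); // None
}
```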

Mathematics

Let's look at some common mathematical operations that we can apply to our arrays.

Summing Array Elements

All elements in an array can be summed with sum().

In [ ]:
data_2D.sum()

We may instead wish to sum along a specific axis, which collapses that axis. For example, summing along the first axis, Axis(0), adds the rows together and gives one total per column.

In [ ]:
data_2D.sum_axis(Axis(0))

Or along the second axis, Axis(1), which gives one total per row:

In [ ]:
data_2D.sum_axis(Axis(1))
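For data_2D, sum() returns 21, summing along Axis(0) gives [5, 7, 9] (one total per column), and summing along Axis(1) gives [6, 15] (one total per row). The following plain-Rust sketch reproduces those results on a row-major buffer to show which elements each axis sum combines; sum_axis here is a hypothetical helper written for illustration, not ndarray's implementation.

```rust
/// Sums a row-major `rows x cols` buffer along one axis, collapsing it.
/// Hypothetical helper for illustration; not ndarray's implementation.
fn sum_axis(data: &[f64], rows: usize, cols: usize, axis: usize) -> Vec<f64> {
    assert_eq!(data.len(), rows * cols, "buffer length does not match shape");
    match axis {
        // Axis 0: add the rows together, leaving one total per column.
        0 => (0..cols)
            .map(|c| (0..rows).map(|r| data[r * cols + c]).sum::<f64>())
            .collect(),
        // Axis 1: add across each row, leaving one total per row.
        1 => (0..rows)
            .map(|r| data[r * cols..(r + 1) * cols].iter().sum::<f64>())
            .collect(),
        _ => panic!("a two-dimensional array only has axes 0 and 1"),
    }
}

fn main() {
    // Same values as data_2D: [[1, 2, 3], [4, 5, 6]].
    let data_2d = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let total: f64 = data_2d.iter().sum();
    println!("{}", total); // 21
    println!("{:?}", sum_axis(&data_2d, 2, 3, 0)); // [5.0, 7.0, 9.0]
    println!("{:?}", sum_axis(&data_2d, 2, 3, 1)); // [6.0, 15.0]
}
```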

Element-wise Operations

It's quite common to apply mathematical operations to each element of an array. Let's have a look at some examples.

Addition

We can add a scalar value, e.g. $1.0$, to every element.

In [ ]:
&data_2D + 1.0
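The & in &data_2D + 1.0 matters: the operator is applied to a borrowed array and produces a new array, so data_2D remains usable afterwards. ndarray also implements these operators for owned arrays, which consume (move) them. To show how operators on references work in Rust, here is a minimal sketch with a hypothetical Grid type standing in for Array2<f64>; it is not ndarray's implementation.

```rust
use std::ops::Add;

/// A hypothetical minimal 2-D container (row-major), standing in for an
/// ndarray Array2<f64> to show how operators on references work.
#[derive(Debug, Clone, PartialEq)]
struct Grid {
    shape: (usize, usize),
    data: Vec<f64>,
}

/// `&grid + scalar` borrows the grid and returns a brand-new Grid,
/// so the original can still be used afterwards.
impl Add<f64> for &Grid {
    type Output = Grid;

    fn add(self, rhs: f64) -> Grid {
        Grid {
            shape: self.shape,
            data: self.data.iter().map(|x| x + rhs).collect(),
        }
    }
}

fn main() {
    let grid = Grid { shape: (2, 3), data: vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0] };
    let shifted = &grid + 1.0;
    // `grid` was only borrowed, so it is still available here.
    println!("{:?}", grid.data); // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    println!("{:?}", shifted.data); // [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
}
```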

We can also add two arrays of the same shape element by element.

In [ ]:
&data_2D + &data_2D

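Finally, we can add a one-dimensional array to a two-dimensional array. Here data_1D is broadcast across each row of data_2D, so [1, 2, 3] is added to both rows, giving [[2, 4, 6], [5, 7, 9]].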

In [ ]:
&data_2D + &data_1D

Warning

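Arrays combined in an arithmetic operation don't need to have the same shape, but their shapes must be compatible so that one array can be broadcast across the other. Comparing the shapes from the last axis backwards, each pair of lengths must be equal or one of them must be 1, and an array with fewer axes is treated as having extra leading axes of length 1. For example, data_1D (shape [3]) is compatible with data_2D (shape [2, 3]) because both have length 3 along their last axis, whereas a one-dimensional array of length 2 would not be, even though 2 matches the first axis of data_2D. Incompatible shapes cause the operation to panic.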
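The compatibility check can be sketched in plain Rust: compare the shapes from the last axis backwards, where each pair of lengths must be equal or one of them must be 1, and a missing axis counts as 1. The helper broadcast_shape is hypothetical. Recent ndarray releases can broadcast both operands in this way, while older releases only broadcast the right-hand operand to the shape of the left-hand one, so it's worth checking the documentation for the version you're using.

```rust
/// Returns the broadcast shape of two arrays under NumPy-style rules, or
/// None if the shapes are incompatible. Hypothetical helper for illustration.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let ndim = a.len().max(b.len());
    let mut shape = vec![0; ndim];
    // Compare axes from the last one backwards; a missing axis counts as 1.
    for i in 0..ndim {
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        shape[ndim - 1 - i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None, // lengths differ and neither is 1
        };
    }
    Some(shape)
}

fn main() {
    // data_2D (shape [2, 3]) with data_1D (shape [3]): compatible.
    println!("{:?}", broadcast_shape(&[2, 3], &[3])); // Some([2, 3])
    // A length-2 array does not match the last axis of length 3.
    println!("{:?}", broadcast_shape(&[2, 3], &[2])); // None
}
```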

Subtraction

We can subtract a scalar value, e.g. $1.0$, from every element.

In [ ]:
&data_2D - 1.0

We can also subtract one array from another of the same shape, element by element.

In [ ]:
&data_2D - &data_2D

Finally, we can subtract a one-dimensional array from a two-dimensional array, again broadcasting data_1D across each row.

In [ ]:
&data_2D - &data_1D

Multiplication

We can multiply every element by a value, e.g. by $2.0$.

In [ ]:
&data_2D * 2.0

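We can also multiply arrays element-wise. Here data_1D is broadcast across each row of data_2D. Note that * performs element-wise multiplication, not matrix multiplication.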

In [ ]:
&data_2D * &data_1D
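Because * is element-wise, &data_2D * &data_1D gives [[1, 4, 9], [4, 10, 18]], whereas the matrix-vector product data_2D.dot(&data_1D) takes one dot product per row and gives [14, 32]. The plain-Rust sketch below computes both from a row-major buffer; mul_elementwise and mat_vec are hypothetical helpers for illustration only.

```rust
/// Element-wise product: each row of the row-major matrix is multiplied,
/// position by position, by `row` (the broadcast used by `&data_2D * &data_1D`).
fn mul_elementwise(matrix: &[f64], row: &[f64]) -> Vec<f64> {
    assert!(!row.is_empty() && matrix.len() % row.len() == 0, "shapes are not compatible");
    let mut out = Vec::with_capacity(matrix.len());
    for chunk in matrix.chunks(row.len()) {
        for (a, b) in chunk.iter().zip(row) {
            out.push(a * b);
        }
    }
    out
}

/// Matrix-vector product: one dot product per row, so a 2 x 3 matrix times a
/// length-3 vector gives a length-2 result.
fn mat_vec(matrix: &[f64], v: &[f64]) -> Vec<f64> {
    assert!(!v.is_empty() && matrix.len() % v.len() == 0, "shapes are not compatible");
    matrix
        .chunks(v.len())
        .map(|chunk| chunk.iter().zip(v).map(|(a, b)| a * b).sum::<f64>())
        .collect()
}

fn main() {
    let data_2d = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let data_1d = [1.0, 2.0, 3.0];
    println!("{:?}", mul_elementwise(&data_2d, &data_1d)); // [1.0, 4.0, 9.0, 4.0, 10.0, 18.0]
    println!("{:?}", mat_vec(&data_2d, &data_1d)); // [14.0, 32.0]
}
```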

Division

We can divide every element by a value, e.g. by $2.0$.

In [ ]:
&data_2D / 2.0

We can also divide element-wise by another array; as with multiplication, data_1D is broadcast across each row of data_2D.

In [ ]:
&data_2D / &data_1D
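Division by zero is worth keeping in mind when working with real data. With floating-point elements, Rust does not panic: dividing a nonzero value by zero gives positive or negative infinity, and 0.0 / 0.0 gives NaN, so these values can silently propagate through later calculations. With integer elements, division by zero panics instead. A plain-Rust sketch with a hypothetical divide_all helper:

```rust
/// Hypothetical helper: divide every element by the same divisor,
/// as `&array / divisor` does element-wise.
fn divide_all(data: &[f64], divisor: f64) -> Vec<f64> {
    data.iter().map(|x| x / divisor).collect()
}

fn main() {
    let result = divide_all(&[1.0, -2.0, 0.0], 0.0);
    println!("{:?}", result); // [inf, -inf, NaN]
    // Flag non-finite results before they feed into further analysis.
    let all_finite = result.iter().all(|x| x.is_finite());
    println!("all finite: {}", all_finite); // all finite: false
}
```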

Power

We can raise every element of an array to a power, e.g. $3$, by passing a closure to mapv(). The powi() method takes an integer exponent; powf() accepts a floating-point one.

In [ ]:
data_2D.mapv(|x| x.powi(3))
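mapv() applies a function to each element by value and collects the results into a new array, leaving data_2D unchanged; ndarray also offers mapv_inplace() for modifying an array in place. As a rough plain-Rust analogue, the sketch below defines a hypothetical map_values helper over a flat buffer and uses it with powi(); it illustrates the idea and is not ndarray's implementation.

```rust
/// A plain-Rust analogue of mapv (hypothetical helper): apply `f` to each
/// element by value and collect the results into a new Vec, leaving the
/// input untouched.
fn map_values(data: &[f64], f: impl Fn(f64) -> f64) -> Vec<f64> {
    data.iter().map(|&x| f(x)).collect()
}

fn main() {
    // Same values as data_2D, flattened in row-major order.
    let data_2d = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    // powi takes an i32 exponent.
    let cubed = map_values(&data_2d, |x| x.powi(3));
    println!("{:?}", cubed); // [1.0, 8.0, 27.0, 64.0, 125.0, 216.0]
    println!("{:?}", data_2d); // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
}
```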

Square root

We can also calculate the square root of every element. The function we pass must match the element type: data_2D holds f64 values, so we use f64::sqrt.

In [ ]:
data_2D.mapv(f64::sqrt)
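f64::sqrt is an ordinary function of type fn(f64) -> f64, so it matches data_2D's f64 elements, whereas passing f32::sqrt to mapv() here would fail to compile. The square root of a negative number is NaN rather than an error. A plain-Rust sketch of the same idea, using a hypothetical square_roots helper:

```rust
/// Hypothetical helper: square roots of every element. f64::sqrt is a plain
/// function of type fn(f64) -> f64, so it can be passed directly to map.
fn square_roots(data: &[f64]) -> Vec<f64> {
    // Using f32::sqrt here would be a type error, because the elements are f64.
    data.iter().copied().map(f64::sqrt).collect()
}

fn main() {
    let roots = square_roots(&[1.0, 4.0, 9.0, -1.0]);
    // Negative inputs produce NaN rather than an error.
    println!("{:?}", roots); // [1.0, 2.0, 3.0, NaN]
}
```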

Conclusion

In this section, we've introduced ndarray as a crate that gives us multidimensional containers and operations. We demonstrated how to create arrays, find out their dimensionality, index them, and how to invoke some basic mathematical operations.

Support this work

You can access this notebook and more by getting the e-book on Data Analysis with Rust Notebooks.