Data Analysis with Rust Notebooks

A practical book on Data Analysis with Rust Notebooks that teaches you the concepts and how they’re implemented in practice.

Better Plotting with Plotly

Preamble

In [2]:
extern crate plotly;
extern crate nanoid;

use plotly::{Plot, Scatter};
use plotly::common::{Mode};
use nanoid::nanoid;
use std::fs;

let plotly_file = "temp_plot.html";

Introduction

In the last section, we covered how to get plotting with Plotly using Plotly for Rust paired with our very own workaround. If you continued experimenting with this approach before starting this section, you may have encountered some limitations:

  • File size. The notebook file from the previous section, plotting-with-plotly.ipynb, weighed in at around $3.4$ MB. This is an unusually large file for what was only a few paragraphs and a single interactive plot.
  • Multiple plots. If you tried to output a second Plotly plot in the same notebook, only the first one would be rendered.
  • File size, again. If you did solve the issue regarding multiple plots, your file size would grow linearly for every plot output. A second plot would take you from $3.4$ MB to $6.8$ MB.
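
The linear growth described in the last bullet can be sketched as simple arithmetic. The function name `estimated_notebook_mb` is hypothetical, and the estimate assumes every plot adds roughly the same amount (here the $3.4$ MB figure quoted above), ignoring the small contribution of the surrounding text.

```rust
/// Rough estimate of notebook size in MB when every plot embeds its own
/// copy of plotly.js. `mb_per_plot` is an assumed average, not a measurement.
fn estimated_notebook_mb(plot_count: u32, mb_per_plot: f64) -> f64 {
    f64::from(plot_count) * mb_per_plot
}
```

Calling `estimated_notebook_mb(2, 3.4)` reproduces the jump from one plot to two described above.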

We're going to improve our workaround so that we can produce many of our nice interactive plots without bloating our notebooks or any HTML files we save them to.

Example Plotly Plot

Let's use the code from the previous section to generate our plot. We will then save this to a file as HTML, and load it back into a string for further processing.

In [3]:
let trace1 = Scatter::new(vec![1, 2, 3, 4], vec![10, 15, 13, 17])
    .name("trace1")
    .mode(Mode::Markers);
let trace2 = Scatter::new(vec![2, 3, 4, 5], vec![16, 5, 11, 9])
    .name("trace2")
    .mode(Mode::Lines);
let trace3 = Scatter::new(vec![1, 2, 3, 4], vec![12, 9, 15, 12]).name("trace3");

let mut plot = Plot::new();

plot.add_trace(trace1);
plot.add_trace(trace2);
plot.add_trace(trace3);

plot.to_html(plotly_file);

let plotly_contents = fs::read_to_string(plotly_file).unwrap();

Reducing the File Size

If you open the HTML output that was saved to temp_plot.html, you may notice that the entire contents of plotly.js have also been embedded. This will be true for all output created by Plotly for Rust's .to_html() function. This also means that if we have two of these plots in our notebook using the workaround, we will have two copies of plotly.js also embedded. Because we're using the Plotly Jupyter Lab extension, @jupyterlab/plotly-extension, we don't need to embed plotly.js at all.
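
To see how much weight the embedded library adds, it can help to check the size of the saved file directly. The helper below is a minimal sketch that relies only on the standard library; the name `file_size_bytes` is hypothetical and is not used elsewhere in this book.

```rust
use std::fs;
use std::io;

/// Size of the file at `path` in bytes, as reported by the file system.
/// Returns an error if the file does not exist or cannot be inspected.
fn file_size_bytes(path: &str) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}
```

In a notebook cell, `file_size_bytes(plotly_file)` would report the size of temp_plot.html, which can then be compared against the length of the substring we keep.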

So let's extract the part of this HTML file that we actually need. We can do this by slicing out a substring that starts at a marker we know opens the part we need, <div id=\"plotly-html-element\" class=\"plotly-graph-div\"

In [4]:
let start_bytes = plotly_contents
    .find("<div id=\"plotly-html-element\" class=\"plotly-graph-div\"")
    .unwrap_or(0);

and ends at another marker that we know immediately follows the last part we need, </div></body></html>.

In [5]:
let end_bytes = plotly_contents
    .find("\n</div>\n</body>\n</html>")
    .unwrap_or(plotly_contents.len());

Let's print out our substring to see what we've ended up with.

In [6]:
&plotly_contents[start_bytes..end_bytes]
Out[6]:
"<div id=\"plotly-html-element\" class=\"plotly-graph-div\" style=\"height:100%; width:100%;\"></div>\n    <div ><img id=\"image-export\" class=\"plotly-graph-div\" hidden></img></div>\n    <script type=\"text/javascript\">\n                \n                window.PLOTLYENV=window.PLOTLYENV || {};                    \n                if (document.getElementById(\"plotly-html-element\")) {\n\n                    var d3 = Plotly.d3;\n                    var image_element= d3.select(\'#image-export\');\n\n                    var trace_0 = {\"type\":\"scatter\",\"x\":[1,2,3,4],\"y\":[10,15,13,17],\"name\":\"trace1\",\"mode\":\"markers\"};\nvar trace_1 = {\"type\":\"scatter\",\"x\":[2,3,4,5],\"y\":[16,5,11,9],\"name\":\"trace2\",\"mode\":\"lines\"};\nvar trace_2 = {\"type\":\"scatter\",\"x\":[1,2,3,4],\"y\":[12,9,15,12],\"name\":\"trace3\"};\n\nvar data = [trace_0,trace_1,trace_2];\nvar layout = {};\n\n\n                    Plotly.newPlot(\'plotly-html-element\', data, layout,\n                        {\"responsive\": true})\n                        .then(\n                            function(gd) {\n                              Plotly.toImage(gd,{height:0,width:0})\n                                 .then(\n                                     function(url) {\n                                         if(false) {\n                                             image_element.attr(\"src\", url);\n                                             return Plotly.toImage(gd,{format:\'\',height:0,width:0});\n                                         }\n                                    })\n                            });\n\n                };\n\n\n    </script>"

The extracted substring is dramatically smaller than the full file because it no longer contains the embedded copy of plotly.js.
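
The slicing steps above can also be sketched as one reusable function. This is a variation on the notebook's approach rather than a copy of it: the name `extract_between` is hypothetical, and instead of falling back to the start or end of the file when a marker is missing, it returns `None` so that a changed HTML layout is noticed rather than silently producing the whole file.

```rust
/// Returns the slice of `html` from the first `start_marker` up to (but not
/// including) the first `end_marker` that follows it, or `None` if either
/// marker cannot be found.
fn extract_between<'a>(html: &'a str, start_marker: &str, end_marker: &str) -> Option<&'a str> {
    // Byte offset where the part we want begins.
    let start = html.find(start_marker)?;
    // Search for the end marker only after the start marker.
    let end = start + html[start..].find(end_marker)?;
    Some(&html[start..end])
}
```

With the markers used in this section, the call would look like `extract_between(&plotly_contents, "<div id=\"plotly-html-element\"", "\n</div>\n</body>\n</html>")`.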

Allowing Multiple Plots

However, you may have noticed a clue as to why we can only properly output a single Plotly plot per notebook. This is because of <div id=\"plotly-html-element\", meaning that every plot will have the same ID. In the Python version of Plotly, each plot has a randomly generated ID, so let's do the same using nanoid.

In [7]:
nanoid!()
Out[7]:
"I8bxdrJZBfbPpPX8rOkud"

If we replace every occurrence of the original ID, plotly-html-element, with a new one generated by nanoid, then we should be able to output multiple plots.
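
Before applying this to the real output, the idea can be sketched in isolation using only the standard library. Both names below are hypothetical: `next_plot_id` is a stand-in for `nanoid!()` that hands out sequential IDs from a counter, and `with_fresh_id` performs the replacement.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counter backing the hypothetical ID generator below.
static NEXT_PLOT_ID: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `nanoid!()`: returns "plot-0", "plot-1", ...
fn next_plot_id() -> String {
    format!("plot-{}", NEXT_PLOT_ID.fetch_add(1, Ordering::Relaxed))
}

/// Replaces every occurrence of the fixed Plotly element ID with a new one,
/// so the <div> and the script that targets it stay in agreement.
fn with_fresh_id(fragment: &str) -> String {
    fragment.replace("plotly-html-element", &next_plot_id())
}
```

A counter is only unique within one kernel session, so outputs saved from an earlier session could reuse the same IDs. Randomly generated IDs, as used in the rest of this section, avoid that problem.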

In [8]:
&plotly_contents[start_bytes..end_bytes]
        .replace("plotly-html-element", &nanoid!())
Out[8]:
"<div id=\"5vwd8kP3GIIjXLVy98H3I\" class=\"plotly-graph-div\" style=\"height:100%; width:100%;\"></div>\n    <div ><img id=\"image-export\" class=\"plotly-graph-div\" hidden></img></div>\n    <script type=\"text/javascript\">\n                \n                window.PLOTLYENV=window.PLOTLYENV || {};                    \n                if (document.getElementById(\"5vwd8kP3GIIjXLVy98H3I\")) {\n\n                    var d3 = Plotly.d3;\n                    var image_element= d3.select(\'#image-export\');\n\n                    var trace_0 = {\"type\":\"scatter\",\"x\":[1,2,3,4],\"y\":[10,15,13,17],\"name\":\"trace1\",\"mode\":\"markers\"};\nvar trace_1 = {\"type\":\"scatter\",\"x\":[2,3,4,5],\"y\":[16,5,11,9],\"name\":\"trace2\",\"mode\":\"lines\"};\nvar trace_2 = {\"type\":\"scatter\",\"x\":[1,2,3,4],\"y\":[12,9,15,12],\"name\":\"trace3\"};\n\nvar data = [trace_0,trace_1,trace_2];\nvar layout = {};\n\n\n                    Plotly.newPlot(\'5vwd8kP3GIIjXLVy98H3I\', data, layout,\n                        {\"responsive\": true})\n                        .then(\n                            function(gd) {\n                              Plotly.toImage(gd,{height:0,width:0})\n                                 .then(\n                                     function(url) {\n                                         if(false) {\n                                             image_element.attr(\"src\", url);\n                                             return Plotly.toImage(gd,{format:\'\',height:0,width:0});\n                                         }\n                                    })\n                            });\n\n                };\n\n\n    </script>"

Loading Plotly with RequireJS

Now that we've stopped embedding the entire contents of plotly.js in our notebooks, we'll need some way to load in plotly.js to view our visualisations. There are many different solutions to this problem, such as the @jupyterlab/plotly-extension Jupyter Lab extension that was previously used in this book. However, a solution that is more suitable for our use cases is to use RequireJS, a JavaScript file and module loader, and the @jupyterlab_requirejs Jupyter Lab extension to view our visualisation within our notebooks.

To achieve this, we'll need to wrap our Plotly JavaScript like the following:

require(["plotly"], function(Plotly) {
    // Plotly code here
});

We know our Plotly scripts will always begin with:

window.PLOTLYENV=

and end in:

};\n\n\n    </script>

So let's take advantage of this and use .replace() to inject our wrapper.
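
As a standalone sketch of that injection, the hypothetical helper `wrap_in_require` below performs the same two replacements, but first checks that both markers are present. Returning `None` when they are not is an assumption about what is useful here; it guards against producing a script with an opening `require(` and no matching close.

```rust
/// Wraps the inline Plotly script of `fragment` in a RequireJS `require`
/// call. Returns `None` if either expected marker is missing.
fn wrap_in_require(fragment: &str) -> Option<String> {
    const SCRIPT_START: &str = "window.PLOTLYENV=";
    const SCRIPT_END: &str = "};\n\n\n    </script>";

    if !fragment.contains(SCRIPT_START) || !fragment.contains(SCRIPT_END) {
        return None;
    }

    Some(
        fragment
            // Open the wrapper just before the script's first statement.
            .replace(SCRIPT_START, "require([\"plotly\"], function(Plotly) { window.PLOTLYENV=")
            // Close the wrapper just before the closing </script> tag.
            .replace(SCRIPT_END, "};\n\n\n});    </script>"),
    )
}
```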

In [9]:
&plotly_contents[start_bytes..end_bytes]
        .replace("plotly-html-element", &nanoid!())
        .replace("window.PLOTLYENV=",
                 "require([\"plotly\"], function(Plotly) { window.PLOTLYENV=")
        .replace("};\n\n\n    </script>","};\n\n\n});    </script>")
Out[9]:
"<div id=\"CdYUrq7T7DzrieMPDiHC9\" class=\"plotly-graph-div\" style=\"height:100%; width:100%;\"></div>\n    <div ><img id=\"image-export\" class=\"plotly-graph-div\" hidden></img></div>\n    <script type=\"text/javascript\">\n                \n                require([\"plotly\"], function(Plotly) { window.PLOTLYENV=window.PLOTLYENV || {};                    \n                if (document.getElementById(\"CdYUrq7T7DzrieMPDiHC9\")) {\n\n                    var d3 = Plotly.d3;\n                    var image_element= d3.select(\'#image-export\');\n\n                    var trace_0 = {\"type\":\"scatter\",\"x\":[1,2,3,4],\"y\":[10,15,13,17],\"name\":\"trace1\",\"mode\":\"markers\"};\nvar trace_1 = {\"type\":\"scatter\",\"x\":[2,3,4,5],\"y\":[16,5,11,9],\"name\":\"trace2\",\"mode\":\"lines\"};\nvar trace_2 = {\"type\":\"scatter\",\"x\":[1,2,3,4],\"y\":[12,9,15,12],\"name\":\"trace3\"};\n\nvar data = [trace_0,trace_1,trace_2];\nvar layout = {};\n\n\n                    Plotly.newPlot(\'CdYUrq7T7DzrieMPDiHC9\', data, layout,\n                        {\"responsive\": true})\n                        .then(\n                            function(gd) {\n                              Plotly.toImage(gd,{height:0,width:0})\n                                 .then(\n                                     function(url) {\n                                         if(false) {\n                                             image_element.attr(\"src\", url);\n                                             return Plotly.toImage(gd,{format:\'\',height:0,width:0});\n                                         }\n                                    })\n                            });\n\n                };\n\n\n});    </script>"

Putting Everything Together

Let's put everything together and demonstrate our ability to output multiple plots.

The following will be the first plot.

In [10]:
println!("EVCXR_BEGIN_CONTENT text/html\n{}\nEVCXR_END_CONTENT",
    format!("<div>{}</div>",
        &plotly_contents[start_bytes..end_bytes]
        .replace("plotly-html-element", &nanoid!())
        .replace("window.PLOTLYENV=",
                 "require([\"plotly\"], function(Plotly) { window.PLOTLYENV=")
        .replace("};\n\n\n    </script>","};\n\n\n});    </script>")));
Out[10]:

The following will be the second plot.

In [11]:
println!("EVCXR_BEGIN_CONTENT text/html\n{}\nEVCXR_END_CONTENT",
    format!("<div>{}</div>",
        &plotly_contents[start_bytes..end_bytes]
        .replace("plotly-html-element", &nanoid!())
        .replace("window.PLOTLYENV=",
                 "require([\"plotly\"], function(Plotly) { window.PLOTLYENV=")
        .replace("};\n\n\n    </script>","};\n\n\n});    </script>")));

We can now see two plots, with this notebook currently weighing in at a file size of only $16$ KB. Note that I have surrounded the HTML for each plot with a <div>; this was needed to ensure the full plot is visible in a notebook cell.
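
The output step on its own can be sketched as a small function that builds the text to print. The name `evcxr_html_output` is hypothetical; the `EVCXR_BEGIN_CONTENT` and `EVCXR_END_CONTENT` markers and the surrounding <div> are taken from the cells above.

```rust
/// Builds the text that, when printed from a notebook cell, asks the evcxr
/// kernel to render `html` as HTML. The HTML is wrapped in a <div>, as in
/// the cells above.
fn evcxr_html_output(html: &str) -> String {
    format!(
        "EVCXR_BEGIN_CONTENT text/html\n<div>{}</div>\nEVCXR_END_CONTENT",
        html
    )
}
```

A cell would then end with something like `println!("{}", evcxr_html_output(&fragment));`, where `fragment` is the processed plot HTML.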

Finally, let's clean up by deleting our temporary HTML file.

In [12]:
fs::remove_file(plotly_file)?;
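
If this cell is run twice, the second call fails because the file is already gone. A minimal sketch of a more forgiving cleanup follows; `remove_file_if_exists` is a hypothetical helper, and treating a missing file as success is an assumption about the behaviour you want.

```rust
use std::fs;
use std::io;

/// Deletes the file at `path`, treating an already-missing file as success.
/// Any other error (for example, a permissions problem) is returned as-is.
fn remove_file_if_exists(path: &str) -> io::Result<()> {
    match fs::remove_file(path) {
        Ok(()) => Ok(()),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(()),
        Err(err) => Err(err),
    }
}
```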

Conclusion

In this section, we've improved our workaround for data visualisation with Plotly for Rust in Jupyter notebooks. We achieved this by stripping out excess JavaScript to reduce the file size and generating random IDs to allow multiple plots. In the next section, we'll combine all of this into a single function so that we can visualise our data easily in the upcoming sections.
