Preamble
import os
Introduction
In this section, we're going to have a look at YAML, which is a recursive acronym for "YAML Ain't Markup Language". It is a data interchange file format that is often found with a .yaml or .yml file extension.
This section focuses on using YAML for configuration files, which lets us move the parameters our programs use out of the source code. Configuration files can then be shared between scripts and programs, and they can be modified without needing to modify source code.
Of course, there are many alternatives such as JavaScript Object Notation (JSON) or Tom's Obvious, Minimal Language (TOML), and they all have their advantages and disadvantages. We won't do a full comparison of YAML vs. alternatives, but some advantages of YAML are:
- It is human-readable, making YAML files easy to read and to write by hand.
- Many popular programming languages have support for managing YAML files.
- YAML (as of version 1.2) is designed as a superset of JSON, so valid JSON is generally also valid YAML and can be converted easily if needed.
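As a rough illustration of the superset relationship, the sketch below parses a small JSON document with a YAML parser and compares the result with a dedicated JSON parser. It assumes PyYAML is already installed (installation is covered later in this section), and the `json_text` string is made up for the example.

```python
import json

import yaml  # PyYAML, assumed to be installed

# A small JSON document (hypothetical values, for illustration only).
json_text = '{"learning_rate": 0.1, "categories": ["hotdog", "not a hotdog"]}'

# Parse the same text with a YAML parser and with a JSON parser.
from_yaml = yaml.safe_load(json_text)
from_json = json.loads(json_text)

# For simple documents like this one, both parsers should agree.
print(from_yaml == from_json)
```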
What Does YAML Look Like?
Let's start by showing an example of what a .yaml file would look like.
learning_rate: 0.1
random_seed: 789108
maintainer: Shahin Rostami
categories:
- hotdog
- not a hotdog
We can see that we have some key-value mappings, where the key appears before the colon and the value appears after it. The first key is learning_rate with a value of 0.1, the second is random_seed with a value of 789108, and the third is maintainer with a value of Shahin Rostami. Finally, the categories key maps to a sequence (a list) containing the items "hotdog" and "not a hotdog".
You can read more about YAML at the YAML Specification web page.
So that we can work with this file later on in this section, you can paste the above YAML into a file in the same directory as this notebook and name it config.yaml. Alternatively, you can just run the cell below which will do it for you.
config_string = """learning_rate: 0.1
random_seed: 789108
maintainer: Shahin Rostami
categories:
- hotdog
- not a hotdog"""
with open("config.yaml", "w") as f:
    f.write(config_string)
Getting Python Ready for YAML
Before we begin working with YAML files in Python, we need to make sure we have PyYAML installed. There are alternatives to PyYAML, such as ruamel.yaml, but their APIs differ and they may not be compatible with the following instructions without some adaptation.
Some options to install PyYAML are with:
Conda-forge and conda
conda install -c conda-forge pyyaml
PyPI and pip
pip install PyYAML
Once you have the package installed you should be ready to import the PyYAML package within Python.
import yaml
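If you want to check which version was imported, PyYAML exposes a `__version__` attribute. The `yaml.FullLoader` class and the `sort_keys` argument used later in this section were both introduced in PyYAML 5.1, so older installations may need upgrading. A minimal check, assuming the version string starts with "major.minor", might look like this:

```python
import re

import yaml

# Extract the leading "major.minor" from the version string, e.g. "6.0.1" -> (6, 0).
match = re.match(r"(\d+)\.(\d+)", yaml.__version__)
major, minor = int(match.group(1)), int(match.group(2))

print(f"PyYAML version: {yaml.__version__}")
if (major, minor) < (5, 1):
    print("Consider upgrading: FullLoader and sort_keys need PyYAML 5.1 or newer.")
```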
Loading YAML with Python
It is surprisingly easy to load a YAML file into a Python dictionary with PyYAML.
with open("config.yaml") as f:
    config = yaml.load(f, Loader=yaml.FullLoader)
We can confirm that it worked by displaying the contents of the config variable.
config
{'learning_rate': 0.1, 'random_seed': 789108, 'maintainer': 'Shahin Rostami', 'categories': ['hotdog', 'not a hotdog']}
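For configuration files like this one, which contain only plain scalars, lists, and mappings, `yaml.safe_load` is a commonly used alternative that restricts parsing to standard YAML tags. The sketch below is self-contained: it writes the example configuration to a temporary directory (chosen only so the example does not touch an existing config.yaml) and loads it back with `safe_load`. The name `safe_config` is used purely for illustration.

```python
import os
import tempfile

import yaml

config_string = """learning_rate: 0.1
random_seed: 789108
maintainer: Shahin Rostami
categories:
- hotdog
- not a hotdog"""

# Work inside a temporary directory so no existing file is overwritten.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "config.yaml")
    with open(path, "w") as f:
        f.write(config_string)

    # safe_load only constructs basic Python types (dict, list, str, int, float, ...).
    with open(path) as f:
        safe_config = yaml.safe_load(f)

print(safe_config)
```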
It's as easy as that. We can now access the various elements of the data structure like a normal Python dictionary.
config['learning_rate']
0.1
config['categories']
['hotdog', 'not a hotdog']
config['categories'][0]
'hotdog'
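Because the result is an ordinary dictionary, the usual dictionary idioms apply. As a hedged sketch, the snippet below uses `dict.get` to fall back to a default when a key is absent and checks the type of a value before using it. The `batch_size` key and its default of 32 are hypothetical and do not appear in config.yaml.

```python
import yaml

# Parse the example configuration directly from a string so the block runs on its own.
config = yaml.safe_load(
    """learning_rate: 0.1
random_seed: 789108
maintainer: Shahin Rostami
categories:
- hotdog
- not a hotdog"""
)

# Hypothetical key: it is not in the file, so the default value is returned.
batch_size = config.get("batch_size", 32)

# The loader converts YAML scalars to native Python types, which we can check.
learning_rate = config["learning_rate"]
if not isinstance(learning_rate, float):
    raise TypeError("learning_rate should be a float")

print(batch_size, learning_rate, type(config["random_seed"]).__name__)
```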
Updating YAML with Python
Let's say we want to update our learning_rate to 0.2 and add an extra category to our categories list. We can do this using normal Python dictionary manipulation.
config["learning_rate"] = 0.2
config["categories"].append("kind of hotdog")
We can then write this back to the config.yaml file to save our changes.
with open("config.yaml", "w") as f:
    # yaml.dump returns None when given a stream, so we do not assign its result to config.
    yaml.dump(config, stream=f, default_flow_style=False, sort_keys=False)
All done! We can confirm this by loading our YAML file again and displaying the dictionary.
with open("config.yaml") as f:
    config = yaml.load(f, Loader=yaml.FullLoader)
config
{'learning_rate': 0.2, 'random_seed': 789108, 'maintainer': 'Shahin Rostami', 'categories': ['hotdog', 'not a hotdog', 'kind of hotdog']}
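If you would like to preview the YAML text before writing it to disk, `yaml.dump` returns a string when no stream is given. A minimal sketch follows, with the updated dictionary written out literally so that the block runs on its own. The names `yaml_text` and `round_trip` are illustrative only.

```python
import yaml

config = {
    "learning_rate": 0.2,
    "random_seed": 789108,
    "maintainer": "Shahin Rostami",
    "categories": ["hotdog", "not a hotdog", "kind of hotdog"],
}

# With no stream argument, dump returns the YAML text instead of writing it to a file.
yaml_text = yaml.dump(config, default_flow_style=False, sort_keys=False)
print(yaml_text)

# Parsing the text again should reproduce the original dictionary.
round_trip = yaml.safe_load(yaml_text)
print(round_trip == config)
```

Passing sort_keys=False keeps the keys in the order they were inserted, which tends to make the saved file easier to compare with the original.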
Conclusion
In this section, we briefly introduced YAML before using the PyYAML package to load, manipulate, and save a collection of configuration settings that we stored in a file named config.yaml. Keeping your configuration settings separate from your source code has several benefits: settings can be changed without modifying source code, they are easier to find and to process automatically across a project, and they can be shared between multiple pieces of work.