Working at home this year meant setting up a home lab in my attic. It gets warm up here in the summertime, and I wonder just a little bit about the temperature at my desk. The many (types and sizes of) computers also need to keep cool, either by having a fan pointed at them or by careful application of heat sinks for dissipating the heat generated.
It’s easy to use Prometheus to scrape a bunch of system temperature metrics, plus a custom sensor reading, and feed them into Grafana for visualization and alerting. The key to this is Tailscale, which keeps everything on a private network. Getting there took some detours through other perfectly reasonable software that turned out to be less well suited to this particular task at this scale and level of polish.
Here’s a start-to-finish account of the effort. Note that everything happens to run on Arm processors, mostly because the menagerie of little systems I have in the attic is made up of Arm machines of one kind or another. Fortunately, Arm-based systems tend to run cool, which is perfect for my use and for my summer attic environment!
It turns out that even the simplest little single-board computer has at least one built-in temperature sensor, and often many. These are used to throttle the CPU to a lower speed if it gets too hot, or to provide input to the controls for fans or other active cooling. The sensor readings can be found under /sys/class/thermal, a sysfs directory exposed by the kernel. There are kernel docs on the Generic Thermal Sysfs Driver, but the main thing to know is that you don’t need any program more substantial than cat(1) to read those temps.
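As a rough illustration of how little is needed, here is a minimal Python sketch that does the same thing as cat(1) across every thermal zone. It assumes the usual sysfs layout, where each thermal_zone* directory holds a temp file reporting millidegrees Celsius; the function name read_thermal_zones is made up for this example.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone."""
    base_path = Path(base)
    readings = {}
    if not base_path.is_dir():
        # No thermal sysfs tree here (for example, not a Linux host).
        return readings
    for zone in sorted(base_path.glob("thermal_zone*")):
        try:
            raw = (zone / "temp").read_text().strip()
            # The kernel reports millidegrees Celsius, so "48000" means 48.0 C.
            readings[zone.name] = int(raw) / 1000.0
        except (OSError, ValueError):
            # Some zones exist but cannot be read or parsed; skip them.
            continue
    return readings


if __name__ == "__main__":
    for name, celsius in read_thermal_zones().items():
        print(f"{name}: {celsius:.1f} C")
```

Which zone corresponds to which component varies by board; the type file next to each temp file usually names it.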
There are also standalone sensors that talk over a myriad of protocols. I have one module that talks over the I2C bus using SparkFun’s Qwiic connector as an interface, and there are Bluetooth and ISM-band radio protocols used to transmit temperature in dozens of ways. For the purpose of this exercise, you’ll want a remote sensor whose reading you can extract by typing a command at the command-line prompt. The specific code I am using is derived from SparkFun’s Qwiic_BME280_Py because I have that sensor. If you have a radio-based weather input, you can take inspiration from rtl_433 by Benjamin Larsson.
Temperature sensors are one of many sensor inputs that a modern computer might have available for you to tap into. It’s reasonable to want to also keep tabs on load average, free disk space, network bandwidth used, and any of hundreds of other values. Rather than collect each of these one at a time, run something like Prometheus’s node_exporter to gather metrics in bulk. This is where I diverged from my foray into MQTT and NATS, which are more about prompt delivery of individual messages than about wholesale bulk delivery of metrics data.
There are some metrics that Prometheus’s node_exporter won’t have native collectors for, so you’ll need to write your own. Thankfully the Prometheus text file format is incredibly simple, and you can easily schedule a shell script to run out of cron to fill a directory with metrics that will be gathered at the same time the system metrics are harvested. I found the tutorial on the Robust Perception weblog, Using the Textfile Collector from a Shell Script, to be just what I needed to make a teeny tiny script based on the BME280 code above to dump out one line of text.
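My own script is a shell one-liner, but the same idea can be sketched in Python for anyone who prefers it. This is a sketch under assumptions, not the script I run: the metric name attic_temperature_celsius and the function write_textfile_metric are hypothetical, and the target directory is whatever you pass to node_exporter’s --collector.textfile.directory flag. The file is written under a temporary name and then renamed, so node_exporter (which only picks up files ending in .prom) never sees a half-written file.

```python
import os
import tempfile


def write_textfile_metric(directory, name, value, help_text, filename=None):
    """Atomically write one gauge in the Prometheus text exposition format."""
    filename = filename or f"{name}.prom"
    body = (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )
    # Create the temporary file in the same directory so the final rename
    # stays on one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(body)
        # mkstemp creates the file as 0600; node_exporter often runs as a
        # different user, so make it world-readable.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, os.path.join(directory, filename))
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return body
```

A cron job would call something like write_textfile_metric(textfile_dir, "attic_temperature_celsius", reading, "Ambient temperature from the BME280 sensor") once a minute, where reading comes from your sensor command.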
Prometheus has a very straightforward, easy-to-install configuration for collecting the metrics produced by node_exporter and assembling them into a time series database for querying and simple charting. With Prometheus, you can collect lots of data from lots of remote systems, polling as frequently as necessary to get prompt notice of changes in your measurements.
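Once Prometheus is scraping, you can also pull the latest readings back out through its HTTP API instead of the built-in web UI. The following is a minimal sketch, assuming a Prometheus server on its default port 9090 and the node_thermal_zone_temp metric that node_exporter’s thermal zone collector exposes on Linux; the helper names are made up for this example.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build the URL for an instant query against /api/v1/query."""
    query_string = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


def parse_instant_vector(payload):
    """Turn an instant-query response into a list of (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    samples = []
    for sample in payload["data"]["result"]:
        # Each sample carries [unix_timestamp, "value-as-string"].
        _timestamp, value = sample["value"]
        samples.append((sample["metric"], float(value)))
    return samples


def query(base_url, promql, timeout=5):
    """Run an instant query; needs a reachable Prometheus server."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_instant_vector(json.load(response))
```

For example, query("http://localhost:9090", "node_thermal_zone_temp") would return one entry per thermal zone on every scraped host, with the host in the instance label.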
Grafana takes the data out of Prometheus and provides neat (some might say beautiful) dashboards and flexible alerting. I got as far as installing it to prove that I could put some charts up on the screen, but I haven’t touched most of its many functions.
Everything is harder when you’re trying to provide access to systems that are remote behind firewalls, network address translations, and other obstacles. It’s also harder when the endpoints you are interested in are on the wide-open, hostile public Internet.
An essential part of my setup is Tailscale, which lets you build a network overlay that makes a few machines on your private network reachable to each other even through firewalls, but without exposing them to the world. When there are long-running tasks on this network, bind their listening addresses to their Tailscale address, which will allow you private access.
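To make the binding step concrete, here is a small Python sketch that picks the Tailscale address out of a host’s addresses and formats it as a listen address. It relies on Tailscale handing out IPv4 addresses from the 100.64.0.0/10 carrier-grade NAT range; the function names are hypothetical, and port 9100 is simply node_exporter’s default.

```python
import ipaddress

# Tailscale assigns node IPv4 addresses from the CGNAT block.
TAILSCALE_RANGE = ipaddress.ip_network("100.64.0.0/10")


def is_tailscale_ipv4(address):
    """Return True if the string is an IPv4 address in the Tailscale range."""
    try:
        return ipaddress.ip_address(address) in TAILSCALE_RANGE
    except ValueError:
        # Not a valid IP address at all.
        return False


def pick_listen_address(candidates, port):
    """Return "ip:port" for the first Tailscale address among candidates."""
    for address in candidates:
        if is_tailscale_ipv4(address):
            return f"{address}:{port}"
    raise LookupError("no Tailscale address among candidates")
```

The resulting string can be handed to node_exporter’s --web.listen-address flag so the exporter answers only on the overlay network. On a machine with the client installed, tailscale ip -4 prints the same address directly.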
As with any quick weekend project, not everything went perfectly on the way to getting things going. I ran into a couple of minor bugs, which were addressed in due course. I’ve shared them here to help the next person who looks for them.
Version 1.0.0 of Prometheus’s node_exporter will emit “error parsing mdstatus: error parsing mdstat /proc/mdstat” on Raspbian, which doesn’t stop it from working but does keep the error log filling up. See issue 1719. You can manually disable the collector with --no-collector.mdadm to avoid the log noise. More good news: this was fixed in node_exporter 1.0.1!
The easiest way to get started on a project like this is not the way I started it. When I do it again, I’ll change up the order of operations.
From the user interface down, the sensible order is to get Prometheus up and running first, set up node_exporter to collect some metrics, and then set up Grafana once you have some data to make pretty.
From the sensor up, this effort is best begun by producing the simplest shell script that will emit a single number that’s your sensor reading of interest. Once you have those two ends of the puzzle in place, the missing middle is the “textfile collector”.
In your quest to find out how to get from sensor to screen, “textfile collector” may not be your first search term of choice. However, once you get this far, it will be obvious that it’s a key component of the effort. This repo of node exporter textfile collector scripts will be helpful in unlocking some of the power of the platform.
Since you never know what kind of hardware might end up in your office attic, this post is all about a simple, flexible architecture for monitoring component temperatures across a range of devices, from single-board computers and ambient sensors to data-center servers. A novel network infrastructure (Tailscale) simplifies data ingestion and avoids a multi-tier architecture, and cloud native tools (Prometheus and Grafana) collect, visualize, and alert on anomalies. Et voila!
If this writeup piqued your interest, please register for the Arm DevSummit, being held online from October 6-8, 2020. On Tuesday, October 6 at 10:10 a.m. Pacific time, I will present on “Environmental system monitoring with Grafana, Prometheus, and Tailscale”, covering the material in this blog and more, with a focus on the network architecture that helps simplify operations. Register now to save your spot!