
arthursens.dev

When Did My Counter Restart? Addressing a Decade-Old Counter Limitation!

Arthur Sens

Prometheus v2.50.0 is coming! This version is especially important to me because it introduces something we call "Created Timestamps Ingestion". It is the result of a six-month project developed as part of Google Summer of Code, and it touched many different aspects of software development, from design documents and building consensus with the team to coding SDKs and storage systems. It was a terrific learning experience that I'd recommend to anyone interested in open source development :)

So, what are Created Timestamps? Let me explain it with a little story:

Monitoring the rabbit vs. turtle race training

Once upon a time, there was a rabbit and a turtle... yes, this is a real story... believe us!

The rabbit and the turtle were training for a legendary race. So legendary that there are history books about this tale nowadays. At the time, my mentors and I were there as consultants to their coach. Their coach was training both of them remotely (thanks COVID), and yesssss OF COURSE he was using Prometheus to monitor their training. Prometheus was configured to scrape the training data every 10 seconds.

The coach wanted to make sure both of them ran at least 30 meters every single day, and he had also set up alerts to catch potentially lazy students. But the data he was seeing looked a bit strange, so he asked for our help.

[Image: Turtle and rabbit outside the track]

One important detail: Prometheus can only tell that an animal has started training, and needs to be measured, once that animal crosses the start line. Until that happens, there is no time series for it at all.
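To make the "no series until the start line" behavior concrete, here is a minimal Go sketch. It does not use a real client library: `trainingTracker` is a hypothetical stand-in for an instrumented program with a labelled counter, and, like such a counter, it only creates a series for an animal the first time that animal is observed.

```go
package main

import (
	"fmt"
	"sort"
)

// trainingTracker is a hypothetical stand-in for an instrumented program.
// Like a labelled counter in a client library, the series for an animal
// only comes into existence the first time that animal is observed.
type trainingTracker struct {
	meters map[string]float64 // animal -> travel_meters_total
}

func newTrainingTracker() *trainingTracker {
	return &trainingTracker{meters: make(map[string]float64)}
}

// Add records distance for an animal, lazily creating its series.
func (t *trainingTracker) Add(animal string, m float64) {
	t.meters[animal] += m
}

// Series lists the animals that currently have a series, i.e. what a
// scrape would be able to see, in sorted order.
func (t *trainingTracker) Series() []string {
	names := make([]string, 0, len(t.meters))
	for name := range t.meters {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	t := newTrainingTracker()
	t.Add("turtle", 10)
	fmt.Println(t.Series()) // [turtle]: the rabbit has not crossed the start line yet
	t.Add("rabbit", 30)
	fmt.Println(t.Series()) // [rabbit turtle]
}
```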

So yeah, both of them went to the race track, and the turtle immediately started its training.

[Image: Turtle at the 10m mark]

The rabbit saw how slow the turtle was running and thought: "What a clown, I'm gonna take a nap and still finish my training before the turtle."

Meanwhile, Prometheus scraped the training data for the first time. The turtle was already 10m in, so Prometheus saw travel_meters_total equal to 10.

The rabbit was competitive even in its sleep, so 10 seconds later it woke up and saw the turtle already 20m away.

Notice that the rabbit still hadn't crossed the start line, so when Prometheus scraped again, it had no idea that the rabbit even existed.

[Image: Turtle at the 20m mark]

On the third scrape, Prometheus saw both animals at the 30-meter mark. We were lucky to be there and see everything, but the coach could only query Prometheus to find out what happened.

[Image: Turtle and rabbit at the 30m mark]

So, in the eyes of the coach, what happened there?

The coach showed us his Prometheus instance, and the data did look weird: the time series for the rabbit only appeared after a few scrapes, and out of nowhere it was already at 30 meters.

So technically Prometheus only had a single data point about the rabbit.

Did the rabbit teleport to the finish line? Did it just skip the start line? What happened there?

[Image: Traveled meters without Created Timestamps]

Reading rate(travel_meters_total), we would conclude that the rabbit never left the start line.
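The arithmetic behind that conclusion can be sketched in Go. `simpleRate` is a hypothetical, simplified rate: the total increase (treating a drop in value as a counter reset) divided by the elapsed time, without the extrapolation that PromQL's `rate()` performs. The turtle's samples follow the story; the rabbit's later samples at a steady 30 meters are assumed for illustration.

```go
package main

import "fmt"

// sample is one scraped value of travel_meters_total; ts is in seconds.
type sample struct {
	ts    float64
	value float64
}

// simpleRate returns a simplified per-second rate over the samples: the total
// increase (a drop in value is treated as a counter reset) divided by the time
// between the first and last sample. Unlike PromQL's rate(), it does not
// extrapolate to the edges of a range window.
// With fewer than two samples there is nothing to compare, so ok is false.
func simpleRate(samples []sample) (rate float64, ok bool) {
	if len(samples) < 2 {
		return 0, false
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1].value, samples[i].value
		if cur < prev {
			increase += cur // counter restarted from zero
		} else {
			increase += cur - prev
		}
	}
	elapsed := samples[len(samples)-1].ts - samples[0].ts
	if elapsed <= 0 {
		return 0, false
	}
	return increase / elapsed, true
}

func main() {
	// Scrapes every 10 seconds, as in the story.
	turtle := []sample{{0, 10}, {10, 20}, {20, 30}}
	// Hypothetical: the rabbit's first sample is already 30 and stays there.
	rabbit := []sample{{20, 30}, {30, 30}, {40, 30}}

	fmt.Println(simpleRate(turtle))     // 1 true
	fmt.Println(simpleRate(rabbit))     // 0 true
	fmt.Println(simpleRate(rabbit[:1])) // 0 false
}
```

Because the rabbit's very first sample is already 30, every later comparison sees no increase, so its rate is zero even though it ran 30 meters.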

The problem was clear to my mentors and me, and so was a possible fix:

What if we ingested the time when they hit the start line for the first time?

Created timestamps semantics

While designing Created Timestamps, we came up with this definition:

"A created timestamp is a piece of metadata that tells the exact time when a time series first came into existence, or the precise time when the time series was last reset."

We were not the first ones to come up with this idea, of course. Created timestamps were already part of the OpenMetrics specification (the `_created` samples of counters) and of the OpenTelemetry data model (the start time of cumulative data points).

Implementing Created Timestamps in Prometheus

As most Prometheus users know, Prometheus is a time series database. It ingests samples and associates each one with a timestamp, usually the time at which Prometheus scraped that sample.

When Prometheus is unaware of Created Timestamps, three samples from a particular time series can be represented as in the image below:

[Image: TSDB chunk unaware of Created Timestamps]

The green dotted lines represent samples of a series that share the same created timestamp. Notice that the third sample has a different creation time, which means a reset happened before it. Prometheus detects resets by comparing consecutive samples: "if the next sample's value is lower than the previous one, a reset happened." The third sample's value is higher than the second's, so Prometheus has no idea that a reset happened between them.
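The difference between value-based and CT-based reset detection can be sketched in a few lines of Go. The `scraped` type and both functions are hypothetical, and the timestamps and values are invented to mirror the situation described: the third sample carries a newer created timestamp but a higher value.

```go
package main

import "fmt"

// scraped is one sample of a counter together with the created timestamp
// (CT) exposed for it; timestamps are Unix seconds.
type scraped struct {
	ts, ct int64
	value  float64
}

// resetsByValue mimics value-based reset detection: a reset is only
// assumed when a sample is lower than the one before it.
func resetsByValue(s []scraped) int {
	n := 0
	for i := 1; i < len(s); i++ {
		if s[i].value < s[i-1].value {
			n++
		}
	}
	return n
}

// resetsByCT counts a reset whenever the created timestamp changes
// between consecutive samples, regardless of the values.
func resetsByCT(s []scraped) int {
	n := 0
	for i := 1; i < len(s); i++ {
		if s[i].ct != s[i-1].ct {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical series: the counter restarts at t=25 and has already
	// grown past its previous value by the third scrape at t=30.
	series := []scraped{
		{ts: 10, ct: 0, value: 10},
		{ts: 20, ct: 0, value: 20},
		{ts: 30, ct: 25, value: 25},
	}
	fmt.Println(resetsByValue(series), resetsByCT(series)) // 0 1
}
```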

Our strategy was to have client libraries expose each time series' creation time, so that Prometheus can ingest a "synthetic zero" at that timestamp, between scraped samples, wherever it makes sense:

[Image: TSDB chunk aware of Created Timestamps]
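The synthetic-zero idea can be modelled in Go as well. This is a simplified sketch over an in-memory slice, not Prometheus's actual TSDB code path: the hypothetical `ingest` function appends a 0 at a sample's created timestamp before the sample itself, but only if that timestamp is newer than everything already stored and older than the sample.

```go
package main

import "fmt"

// point is a stored (timestamp, value) pair; timestamps are Unix seconds.
type point struct {
	ts    int64
	value float64
}

// scraped is a sample as exposed by a client, with its created timestamp.
type scraped struct {
	ts, ct int64
	value  float64
}

// ingest appends a scraped sample to the stored series. If the sample's CT
// is newer than the last stored point and older than the sample itself, a
// synthetic zero is stored at the CT first. Otherwise the CT adds nothing.
func ingest(stored []point, s scraped) []point {
	newerThanStored := len(stored) == 0 || s.ct > stored[len(stored)-1].ts
	if newerThanStored && s.ct < s.ts {
		stored = append(stored, point{ts: s.ct, value: 0})
	}
	return append(stored, point{ts: s.ts, value: s.value})
}

func main() {
	var chunk []point
	for _, s := range []scraped{
		{ts: 10, ct: 0, value: 10},
		{ts: 20, ct: 0, value: 20},
		{ts: 30, ct: 25, value: 25}, // hypothetical restart at t=25
	} {
		chunk = ingest(chunk, s)
	}
	fmt.Println(chunk) // [{0 0} {10 10} {20 20} {25 0} {30 25}]
}
```

The stored series gains a zero where it started and another at t=25, so comparing consecutive values now reveals the drop from 20 to 0, and a series that first appears at 30 (like the rabbit's) starts from 0 instead.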

Back to the training

We kindly asked the coach to use our version of Prometheus, and we immediately saw better results.

Notice on the RIGHT side of the graph how travel_meters_total starts at 0 for both the turtle AND the rabbit.

[Image: traveled_meters_total with Created Timestamps]

This is also reflected in the rates, which show that the rabbit was super fast and did not cheat during its training.

[Image: rate(traveled_meters_total) with Created Timestamps]

Enabling Created Timestamps in your workloads

Prometheus client_golang exposes Created Timestamps by default since release v1.17.0, as long as the Prometheus Protobuf exposition format is chosen during content negotiation.
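If you want to check whether a target actually negotiates the Prometheus Protobuf format, a small Go program can ask for it explicitly. This is a hand-rolled sketch, not part of Prometheus or client_golang: the hypothetical `servesProtobuf` sends an Accept header for length-delimited `io.prometheus.client.MetricFamily` messages and inspects the response's Content-Type, and the target URL is a placeholder. A protobuf response is a prerequisite for Created Timestamps, not proof that they are exposed.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// protobufAccept requests the Prometheus Protobuf exposition format
// (length-delimited io.prometheus.client.MetricFamily messages).
const protobufAccept = "application/vnd.google.protobuf;proto=io.prometheus.client.MetricFamily;encoding=delimited"

// isProtobuf reports whether a Content-Type header denotes protobuf.
func isProtobuf(contentType string) bool {
	return strings.HasPrefix(contentType, "application/vnd.google.protobuf")
}

// servesProtobuf asks a metrics endpoint for the protobuf format and
// reports whether the response actually came back in it.
func servesProtobuf(url string) (bool, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	req.Header.Set("Accept", protobufAccept)
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return isProtobuf(resp.Header.Get("Content-Type")), nil
}

func main() {
	ok, err := servesProtobuf("http://localhost:8080/metrics") // hypothetical target
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("protobuf exposition:", ok)
}
```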

On Prometheus's side, you'll need to make the following changes:

  • Upgrade Prometheus to at least v2.50.0
  • Enable the feature flag --enable-feature=created-timestamp-zero-ingestion (beware that this feature flag changes the default scraping protocol to Prometheus Protobuf)

Future of Created Timestamps (call to action!!!!)

We'd love to see Created Timestamps broadly adopted across the Prometheus ecosystem! If you contribute to or maintain a Prometheus client library, it would be lovely to see PRs implementing Created Timestamps for the Prometheus Protobuf format.

We're also working on enabling Created Timestamp ingestion with the OpenMetrics parser, and extending the Remote-Write protocol to push Created Timestamps as part of time series metadata!

Created Timestamps at PromCon!

This blog post is an adaptation of the talk Bartek and I gave at PromCon! The talk goes even deeper into current pains and how Created Timestamps solve them. Feel free to watch it if you prefer that format :)

[Video: Created Timestamps at PromCon]