The Go runtime/metrics package currently exports extremely granular
histograms. Exponentially bucket any histogram with unit "seconds"
or "bytes" instead to dramatically reduce the number of buckets, and
thus the number of metrics.
This change also adds a test to check for expected cardinality to
prevent cardinality surprises in the future.
Signed-off-by: Michael Anthony Knyszek <mknyszek@google.com>
A previous PR made it so that the Go 1.17 collector locked only around
uses of rmSampleBuf, but really that means that Metric values may be
sent over the channel containing some values from future metrics.Read
calls. While generally-speaking this isn't a problem, we lose any
consistency guarantees provided by the runtime/metrics package.
Also, that optimization to not just lock around all of Collect was
premature. Truthfully, Collect is called relatively infrequently, and
its critical path is fairly fast (10s of µs). To prove it, this change
also adds a benchmark.
name old time/op new time/op delta
GoCollector-16 43.7µs ± 2% 43.2µs ± 2% ~ (p=0.190 n=9+9)
Note that because the benchmark is single-threaded it actually looks
like it might be getting *slightly* faster, because all those Collect
calls for the Metrics are direct calls instead of interface calls.
Signed-off-by: Michael Anthony Knyszek <mknyszek@google.com>
* Check validity of method and code label values
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Use more flexibly functional option pattern for configuration
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Update documentation
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Simplify
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Fix inconsistent method naming
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
This change introduces use of the runtime/metrics package in place of
runtime.MemStats for Go 1.17 or later. The runtime/metrics package was
introduced in Go 1.16, but not all the old metrics were accounted for
until 1.17.
The runtime/metrics package offers several advantages over using
runtime.MemStats:
* The list of metrics and their descriptions are machine-readable,
allowing new metrics to get added without any additional work.
* Detailed histogram-based metrics are now available, offering much
deeper insights into the Go runtime.
* The runtime/metrics API is significantly more efficient than
runtime.MemStats, even with the additional metrics added, because
it does not require any stop-the-world events.
That being said, integrating the package comes with some caveats, some
of which were discussed in #842. Namely:
* The old MemStats-based metrics need to continue working, so they're
exported under their old names backed by equivalent runtime/metrics
metrics.
* Earlier versions of Go need to continue working, so the old code
remains, but behind a build tag.
Finally, a few notes about the implementation:
* This change includes a whole bunch of refactoring to avoid significant
code duplication.
* This change adds a new histogram metric type specifically optimized
for runtime/metrics histograms. This type's methods also include
additional logic to deal with differences in bounds conventions.
* This change makes a whole bunch of decisions about how runtime/metrics
names are translated.
* This change adds a `go generate` script to generate a list of expected
runtime/metrics names for a given Go version for auditing. Users of
new versions of Go will transparently be allowed to use new metrics,
however.
Signed-off-by: Michael Anthony Knyszek <mknyszek@google.com>
This function calculates exponential buckets with different arguments
than the existing ExponentialBuckets function. Instead of specifying the
start and factor, the user can specify the min and max bucket value. We
have been doing it this way internally at my company for some time.
Signed-off-by: Seth Bunce <seth.bunce@getcruise.com>
This seem what OTel is converging towards, see
https://github.com/open-telemetry/oteps/pull/149 .
I see pros and cons with base-10 vs base-2. They are discussed in
detail in that OTel PR, and the gist of the discussion is pretty much
in line with my design doc. Since the balance is easy to tip here, I
think we should go with base-2 if OTel picks base-2. This also seems
to be in agreement with several proprietary solution (see again the
discussion on that OTel PR.)
The idea to make the number of buckets per power of 2 (or formerly 10)
a power of 2 itself was also sketched out in the design doc
already. It guarantees mergeability of different resolutions. I was
undecided between making it a recommendation or mandatory. Now I think
it should be mandatory as it has the additional benefit of playing
well with OTel's plans.
This commit also addresses a number of outstanding TODOs.
Signed-off-by: beorn7 <beorn@grafana.com>
Since 1.16 is out, we still support the last four minor releases.
The bump was required by the prometheus/procfs package using the new
`%w` printf directives. However, it also allows us to remove some
special casing about build info.
Signed-off-by: beorn7 <beorn@grafana.com>
Without this fix, the `InstrumentHandler...` middlewares get locked in
an endless loop in case of an invalid Collector, eating all the memory.
Signed-off-by: beorn7 <beorn@grafana.com>
MetricVec was already exported in early versions of this library, but
nobody really used it to implement vectors of custom Metric
implementations. Now #796 has shown up with a fairly special use case
for which I'd prefer a custom implementation of a special
"auto-sampling histogram" outside of this library. Therefore, I'd like
to reinstate support for creating vectors of custom Metric
implementations.
I played around for quite some while with the option of a separate
package providing the tools one would need to create vectors of custom
Metric implementations. However, with the current structure of the
prometheus/client_golang/prometheus package, this leads to a lot of
complications with circular dependencies. (The new package would need
the primitives from the prometheus package, while the existing metric
vectors like GaugeVec need to import the new vector package to not
duplicate the implementation. Separating vector types from the main
prometheus package is out of the question at this point because that
would be a breaking change.)
Signed-off-by: beorn7 <beorn@grafana.com>
I assume older Nanoc versions rendered the anchors with commas, but
the current doesn't.
Also, this adds the same link to another doc comment where it is also
relevant.
Signed-off-by: beorn7 <beorn@grafana.com>
`github.com/golang/protobuf/proto` is deprecated in lieu of
`google.golang.org/protobuf/proto`. However, we cannot simply
migrate. Types from the proto package are exposed to users of packages
in this repo. If we migrate here, users have to migrate to. Thus, we
could only migrate with a major version bump.
In different news, with all the inline lint:ignore comments, including
the existing ones, there is no need to repeat the exception in the
Makefile.
A current version of `staticcheck` is happy with the code after this
commit. golangci-lint is broken at the moment, however, and ignores
the lint:ignore comments in the code as well as those via envvar.
Signed-off-by: beorn7 <beorn@grafana.com>
Now that we have also added CollectAndLint and GatherAndLint, I
thought we should bring CollectAndCount in line. So:
- Add GatherAndCount.
- Add filtering by metric name.
- Add tests.
Minor wart: CollectAndCount should now return `(int, error)`, but that
would be a breaking change as the current version just returns
`int`. I decided to let the new version panic when it should return an
error. An error is anyway very unlikely, so the biggest annoyance here
is really just the inconsistency.
Signed-off-by: beorn7 <beorn@grafana.com>
An empty job name was always an error, but it was previously only
detected when pushing to the PGW and receiving an error. With this
commit, the error is detected before pushing.
An empty label value should have been OK but was encoded in a way that
couldn't be pushed to the
PGW. Cf. https://github.com/prometheus/pushgateway/issues/344 . This
commit changes the creation of the path in the URL so that it works
with empty label values.
Signed-off-by: beorn7 <beorn@grafana.com>
Also, change all the `dto.MetricFamily` arguments to pointers to be
more consistent with what we do in client_golang in general.
Signed-off-by: beorn7 <beorn@grafana.com>
Document the example usage of ConstLabels to register several GaugeFuncs
on the same metric name. Also reference it from the NewCounterFunc
documentation as it's similar.
Ref: https://github.com/prometheus/client_golang/pull/736
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Essentially, just don't try to set a status code and send any error
message in the body once metrics gathering has succeeded. At that
point, the most likely reason for errors is anyway that the client has
disconnected, in which sending an error is moot. The other possible
reason for an error is a problem during metrics encoding. This is
unlikely to happen (it's a coding error here in client_golang in any
case), and if it is happening, the odds are we have already sent
something to the ResponseWriter, which means we cannot set a status
code anymore. The doc comment for HTTPErrorOnError now describes these
circumstances explicitly and recommends to set a logger to report that
kind of error.
This should fix the reason for the infamous `superfluous
response.WriteHeader call` message.
Signed-off-by: beorn7 <beorn@grafana.com>
Also, update the package documentation. The concerns about the global
registry aren't really valid anymore because promauto now also works
with custom registries.
The musings about the http.DefaultMux are more a digression and
shouldn't be in a doc comment.
Signed-off-by: beorn7 <beorn@grafana.com>
Interestingly, methods implicitly forwarded from embedded types are
detected by GoDoc if they are just one level deep. Embedded types in
the embedded type are not recognized. This commit therefore adds
explicit forwarding methods for Collect, Describe, and Reset.
Signed-off-by: beorn7 <beorn@grafana.com>
This is, sadly, the only way to avoid a breaking change. The cost is
that anyone using exemplars has to perform a type assertion. This is,
however, a common pattern where interfaces turn out to need additional
methods in a stable library or only some implementations can provide
the additional methods (AKA "interface upgrade").
Needless to say that in v2 of this library, we'll do things in a more
straight forward way.
Signed-off-by: beorn7 <beorn@grafana.com>
Contrary to the code comment, I see no `basicMetricVec` implementation.
Grep'ing this project shows this is the only reference. So I suspect
it's an outdated comment and can be removed to minimize confusion.
I'm unclear whether other parts of that comment are also incorrect and
need updating.
Signed-off-by: Jeff Widman <jeff@jeffwidman.com>
The `const separatorByte` wasn't used anymore actually. In `vec.go`,
we were using `model.SeparatorByte`, which is better anyway. So remove
the unused constant and initialize `separatorByteSlice` with
`model.SeparatorByte`, too.
Signed-off-by: beorn7 <beorn@grafana.com>
This is a much stronger hash function than fnv64a and comparably fast
(with super-fast assembly implementation for amd64).
Performance is not critical here anyway.
The old fnv64a is kept for vectors, where collision detection is in
place and the weakness of the hashing doesn't matter that much. I
implemented a vector version with xxhash and found that xxhash is
slower in all cases except very very high cardinality (where it is
only slightly faster). Also, ``xxhash.New`` comes with an allocation
of 80 bytes. Thus, to keep vectors alloc-free, we needed to add a
`sync.Pool`, which would have an additional performance overhead.
Signed-off-by: beorn7 <beorn@grafana.com>
This makes the collisions a bit less likely by XOR'ing descIDs rather
than adding them up for the collectorID.
Signed-off-by: beorn7 <beorn@grafana.com>
Flush is another of the methods that will call WriteHeader if it
hasn't happened yet. Since we want to call observeWriteHeader (if
set), we need to do the WriteHeader call already here, similar to what
we have done in Write and ReadFrom.
This commit also adds comments explaining the above to not tempt
developers to remove the WriteHeader call.
Signed-off-by: beorn7 <beorn@grafana.com>
It is perfectly possible that a normal GC happens just before the
forced one. Thus seeing 2 GCs is fine.
Whenever this test failed, it was because two GCs were seen.
Signed-off-by: beorn7 <bjoern@rabenste.in>
This is really lame as it essentially just uses longer times to
wait. The test is still timing-dependent and thus could still
theoretically fail with unlucky scheduling. However, we are testing
something that _is_ about timing. Turning this all into something not
timing-dependent would be first quite involved and second might defeat
the purpose of testing code that is inherently about timing.
Let's see how this works out in practice.
Signed-off-by: beorn7 <bjoern@rabenste.in>
We stopped advertising binary-wide setting of a label quite a while
ago. This doc comment was missed in the cleanup.
Signed-off-by: beorn7 <bjoern@rabenste.in>
tl;dr: Return previous memstats if reading new ones takes longer than
1s.
See the doc comment of NewGoCollector for details.
Signed-off-by: beorn7 <bjoern@rabenste.in>
The previous `float64(math.MaxUint64 + 1)` is too close to
`float64(math.MaxUint64)` to actually overflow as indended.
The counter code is actually converting forward and backward and
compare the original and twice-converted value. On most platform, this
will create a deviation and thus trigger the expected behavior. By
sheer "luck", one might end up with the same value and thus still use
the uint64 representation. Which is OK within the precision we can
expect. But it breaks the test. With this change, the next
representable floating point value greater than the floating point
value used to represent math.MaxUint64 is used.
Signed-off-by: Bjoern Rabenstein <bjoern@rabenste.in>
This allows us to simplify a bunch of code while still supporting the
last four Go minor versions.
We have also run into minor annoyances a couple of times by now to
keep supporting 1.7 and 1.8.
It's time to pull the plug!
Signed-off-by: Bjoern Rabenstein <bjoern@rabenste.in>
The context package is available since Go 1.7 which is the minimal version
supported by client_golang.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>