client_golang/prometheus/process_collector_darwin.go

148 lines
4.6 KiB
Go

process_collector: fill in most statistics on macOS (#1600) * process_collector: fill in most statistics on macOS Unfortunately, the virtual memory, resident memory, and network stats will require access to undocumented C functions. I was warned off of cgo in IRC because it would then have to be enabled in a bunch of different projects that use this module, but I already was against it because that would break the ability to cross-compile. There is no interface to `dlopen` built into golang. The `github.com/ebitengine/purego` module looks promising (I can cross-compile and call these methods), but I'm currently getting unexpected results. I'll follow up with that separately if I can get it working, but hopefully this stuff is pretty uncontroversial. Tested on macOS 10.14.6 (amd64), macOS 14.6.1 (amd64), and macOS 15.0 (arm64) by spawning `/usr/bin/ulimit -a -S` and `/usr/sbin/lsof -c $my_process` from the test exporter process, and `ps -o lstart,vsize,rss,utime,stime,command` from the shell, and comparing results with the exported metrics. I can't find documentation for `RLIMIT_AS` on macOS (specifically if it's in bytes or pages). It's currently being reported back as `RLIM_INFINITY`, which seems reasonable, because I've come across reports that the value is ignored anyway[1]. The bash 3.2 code for the built-in `ulimit` divides the value reported by `getrusage(2)` by 1024 when printing, as it does for `RLIMIT_DATA`, which is documented as being bytes in `getrusage(2)`. The help for `ulimit` indicates it prints both in kbytes, so it's reasonable to assume this is already in bytes. 
[1] https://issues.chromium.org/issues/40581251#comment3 Signed-off-by: Matt Harbison <mharbison72@gmail.com> * Update prometheus/process_collector_darwin.go Co-authored-by: Ben Kochie <superq@gmail.com> Signed-off-by: Matt Harbison <57785103+mharbison72@users.noreply.github.com> --------- Signed-off-by: Matt Harbison <mharbison72@gmail.com> Signed-off-by: Matt Harbison <57785103+mharbison72@users.noreply.github.com> Co-authored-by: Ben Kochie <superq@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2024-09-04 17:56:01 +03:00
// Copyright 2024 The Prometheus Authors
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package prometheus
import (
process_collector: fill in virtual and resident memory values on macOS using optional cgo (#1616) Unfortunately, these values aren't available from getrusage(2), or any other builtin Go API. Go itself doesn't provide a mechanism (like on Windows) to call into system libraries. Using a 3rd party package[1] to dynamically call system libraries was proposed and rejected, to avoid adding to the number of dependencies. That leaves using cgo, which is used here when available. When not available (either because of cross compiling or explicitly disabling it), a stub function is linked instead, and the metrics are not exported. That way, cross compiling of other platforms is unaffected (and can also still be done with Darwin too, but at the cost of not exporting these metrics). Note that building an amd64 image on an arm64 mac or vice-versa is cross compiling, and will use the stub method by default. This can be avoided by setting `CGO_ENABLED=1` in the environment to force the use of cgo for both architectures. I'm unsure of the usefulness of the potential adjustment made to the virtual memory value after calling `mach_vm_region()`. I've not seen that code get run with a native amd64 or arm64 image, or with an amd64 image running under Rosetta. But that's what the `ps(1)` command does, and I think we should report what the system tools do. When I was testing this on a beta of macOS 15 with Go 1.21.13 (the current minimum support for this module), the amd64 image ran fine under Rosetta, but the arm64 image immediately printed a message that it was killed, even prior to the cgo call. This seems to be a recurring issue on macOS[2][3], and passing `-ldflags -s` to `go build` avoided the issue. Go 1.23.1 worked out of the box, without fiddling with linker flags, so I don't think this is an issue- Go 1.21 is simply too old to support macOS 15, but I thought it was worth noting. 
I supposed we could gate the cgo code with an additional build flag, if anyone is concerned about this. [1] https://github.com/ebitengine/purego [2] https://github.com/golang/go/issues/19841#issuecomment-293334802 [3] https://github.com/golang/go/issues/11887#issuecomment-125694604 Signed-off-by: Matt Harbison <mharbison72@gmail.com>
2024-09-27 19:29:44 +03:00
"errors"
"fmt"
"os"
"syscall"
"time"
"golang.org/x/sys/unix"
)

// notImplementedErr is returned by stub functions that replace cgo functions, when cgo
// isn't available.
var notImplementedErr = errors.New("not implemented")

type memoryInfo struct {
	vsize uint64 // Virtual memory size in bytes
	rss   uint64 // Resident memory size in bytes
}

func canCollectProcess() bool {
	return true
}

func getSoftLimit(which int) (uint64, error) {
	rlimit := syscall.Rlimit{}

	if err := syscall.Getrlimit(which, &rlimit); err != nil {
		return 0, err
	}

	return rlimit.Cur, nil
}

func getOpenFileCount() (float64, error) {
	// Alternately, the undocumented proc_pidinfo(PROC_PIDLISTFDS) can be used to
	// return a list of open fds, but that requires a way to call C APIs. The
	// benefits, however, include fewer system calls and not failing when at the
	// open file soft limit.
	dir, err := os.Open("/dev/fd")
	if err != nil {
		return 0.0, err
	}
	defer dir.Close()

	// Avoid ReadDir(), as it calls stat(2) on each descriptor. Not only is
	// that info not used, but KQUEUE descriptors fail stat(2), which causes
	// the whole method to fail.
	names, err := dir.Readdirnames(0)
	if err != nil {
		return 0.0, err
	}

	// Subtract 1 to ignore the open /dev/fd descriptor above.
	return float64(len(names) - 1), nil
}

// describe returns all descriptions of the collector for Darwin.
// Ensure that this list of descriptors is kept in sync with the metrics collected
// in the processCollect method. Any changes to the metrics in processCollect
// (such as adding or removing metrics) should be reflected in this list of descriptors.
func (c *processCollector) describe(ch chan<- *Desc) {
	ch <- c.cpuTotal
	ch <- c.openFDs
	ch <- c.maxFDs
	ch <- c.maxVsize
	ch <- c.startTime
	// rss and vsize are collected by processCollect when cgo is available.
	ch <- c.rss
	ch <- c.vsize

	/* the process could be collected but not implemented yet
	ch <- c.inBytes
	ch <- c.outBytes
	*/
}

func (c *processCollector) processCollect(ch chan<- Metric) {
	if procs, err := unix.SysctlKinfoProcSlice("kern.proc.pid", os.Getpid()); err == nil {
		if len(procs) == 1 {
			// Convert to float64 before dividing so that the sub-second part
			// of the start time is not truncated by integer division.
			startTime := float64(procs[0].Proc.P_starttime.Nano()) / 1e9
			ch <- MustNewConstMetric(c.startTime, GaugeValue, startTime)
		} else {
			err = fmt.Errorf("sysctl() returned %d proc structs (expected 1)", len(procs))
			c.reportError(ch, c.startTime, err)
		}
	} else {
		c.reportError(ch, c.startTime, err)
	}

	// The proc structure returned by kern.proc.pid above has an Rusage member,
	// but it is not filled in, so it needs to be fetched by getrusage(2). For
	// that call, the UTime, STime, and Maxrss members are filled out, but not
	// Ixrss, Idrss, or Isrss for the memory usage. Memory stats are therefore
	// fetched separately by getMemory() below, which needs cgo to reach the
	// C API.
	rusage := unix.Rusage{}

	if err := unix.Getrusage(syscall.RUSAGE_SELF, &rusage); err == nil {
		cpuTime := time.Duration(rusage.Stime.Nano() + rusage.Utime.Nano()).Seconds()
		ch <- MustNewConstMetric(c.cpuTotal, CounterValue, cpuTime)
	} else {
		c.reportError(ch, c.cpuTotal, err)
	}

	if memInfo, err := getMemory(); err == nil {
		ch <- MustNewConstMetric(c.rss, GaugeValue, float64(memInfo.rss))
		ch <- MustNewConstMetric(c.vsize, GaugeValue, float64(memInfo.vsize))
	} else if !errors.Is(err, notImplementedErr) {
		// Don't report an error when support is not compiled in.
		c.reportError(ch, c.rss, err)
		c.reportError(ch, c.vsize, err)
	}

	if fds, err := getOpenFileCount(); err == nil {
		ch <- MustNewConstMetric(c.openFDs, GaugeValue, fds)
	} else {
		c.reportError(ch, c.openFDs, err)
	}

	if openFiles, err := getSoftLimit(syscall.RLIMIT_NOFILE); err == nil {
		ch <- MustNewConstMetric(c.maxFDs, GaugeValue, float64(openFiles))
	} else {
		c.reportError(ch, c.maxFDs, err)
	}

	if addressSpace, err := getSoftLimit(syscall.RLIMIT_AS); err == nil {
		ch <- MustNewConstMetric(c.maxVsize, GaugeValue, float64(addressSpace))
	} else {
		c.reportError(ch, c.maxVsize, err)
	}

	// TODO: socket(PF_SYSTEM) to fetch "com.apple.network.statistics" might
	// be able to get the per-process network send/receive counts.
}