upgrade prometheus-ipmi-exporter to 1.8.0
Open, Medium, Public

Description

ipmi_exporter 1.8.0 introduces SEL event metrics (https://github.com/prometheus-community/ipmi_exporter/releases/tag/v1.8.0) and has been packaged for Debian trixie (https://packages.debian.org/trixie/prometheus-ipmi-exporter). We should upgrade and use these metrics for SEL-based hardware alerts.

rough checklist:

  • backport prometheus-ipmi-exporter
    • bookworm
    • bullseye
    • buster
  • canary test updated package
  • deploy to the fleet

Event Timeline

Quick update: prometheus-ipmi-exporter-1.8.0 was a straightforward backport for bookworm https://gitlab.wikimedia.org/repos/sre/prometheus-ipmi-exporter/-/jobs/291532

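However, bullseye is missing build dependencies, which we will need to handle as well: the build requires golang-github-prometheus-exporter-toolkit-dev (>= 0.8.0), but only 0.5.1-2+deb11u2 is available. I'll take a closer look at how much work this would be.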

Investigating (0) prometheus-ipmi-exporter-build-deps:amd64 < 1.8.0-1~wmf11+1 @iU mK Nb Ib >
Broken prometheus-ipmi-exporter-build-deps:amd64 Depends on golang-github-prometheus-exporter-toolkit-dev:amd64 < none | 0.5.1-2+deb11u2 @un uH > (>= 0.8.0)
  Removing prometheus-ipmi-exporter-build-deps:amd64 because I can't find golang-github-prometheus-exporter-toolkit-dev:amd64
Done
 Done
Starting pkgProblemResolver with broken count: 0
Starting 2 pkgProblemResolver with broken count: 0
Done
The following packages will be REMOVED:
  prometheus-ipmi-exporter-build-deps
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
1 not fully installed or removed.
After this operation, 9216 B disk space will be freed.
(Reading database ... 28677 files and directories currently installed.)
Removing prometheus-ipmi-exporter-build-deps (1.8.0-1~wmf11+1) ...
mk-build-deps: Unable to install prometheus-ipmi-exporter-build-deps at /usr/bin/mk-build-deps line 457.
mk-build-deps: Unable to install all build-dep packages

Given that this is a static Go ELF binary, we can also simply build on bookworm and copy the deb over to bullseye-wikimedia; we're doing this for other exporters as well. buster might be tricky due to its old libc6, but we can also ignore it: there are fewer than 150 hosts left, and they can simply live with the old IPMI monitoring.

Thanks, I was hoping for the same, although in my testing the package built for bookworm fails to install on bullseye with:

prometheus-ipmi-exporter : Depends: libc6 (>= 2.34) but 2.31-13+deb11u8 is to be installed

The dependency is added because some feature in the compiled Go code uses syscalls which were only wired up in 2.34 (maybe openat() et al.). We ran into this problem before, and there was a Go build flag to force it to use a fallback. I can't find a reference currently, but maybe Filippo remembers when he's back.
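
To illustrate the failed check, here is a minimal Python sketch that compares the two libc6 versions from the apt error above. It is a simplification and not dpkg's real ordering: the helper names `upstream_tuple` and `satisfies_min` are made up for this example, and only purely numeric upstream versions are handled (no tildes or letters).

```python
def upstream_tuple(version: str) -> tuple:
    """Reduce a Debian version to its numeric upstream part.

    Example: '2.31-13+deb11u8' -> (2, 31). Simplified on purpose.
    """
    # Drop an optional epoch prefix such as "1:".
    if ":" in version:
        version = version.split(":", 1)[1]
    # The Debian revision is everything after the last hyphen.
    upstream = version.rsplit("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def satisfies_min(installed: str, required: str) -> bool:
    """True if the installed upstream version is at least the required one."""
    return upstream_tuple(installed) >= upstream_tuple(required)


# Versions taken from the apt error above: bullseye's libc6 vs. the Depends.
print(satisfies_min("2.31-13+deb11u8", "2.34"))  # False
```

On a real host, `dpkg --compare-versions` gives the authoritative answer; the sketch only shows why 2.31 cannot satisfy `>= 2.34`.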

Nice, thanks for the pointer! It looks like export CGO_ENABLED=0 does the right thing. At least, with this set the package builds and installs successfully on my bullseye test host.

https://gitlab.wikimedia.org/repos/sre/prometheus-ipmi-exporter/-/jobs/294350
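
A rough way to confirm that a CGO_ENABLED=0 build came out statically linked is to check that the binary has no PT_INTERP program header (the segment that names the dynamic loader). The sketch below assumes a 64-bit little-endian ELF, such as an amd64 build; the function name is hypothetical and this is not part of the packaging itself.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def is_statically_linked(data: bytes) -> bool:
    """Return True if an ELF64 little-endian image has no PT_INTERP segment."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[4] is the class (2 = 64-bit), e_ident[5] the byte order (1 = little).
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles ELF64 little-endian")
    # ELF64 header layout: e_phoff at offset 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for index in range(e_phnum):
        # p_type is the first 4 bytes of each program header entry.
        (p_type,) = struct.unpack_from("<I", data, e_phoff + index * e_phentsize)
        if p_type == PT_INTERP:
            return False
    return True
```

It would be called with the raw bytes of the built exporter binary, for example `is_statically_linked(open(path, "rb").read())`. A binary without a dynamic loader should not pull in a libc6 dependency.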

Buster is looking fine with this deb as well. So I've gone ahead and uploaded 1.8.0 to bookworm-wikimedia, bullseye-wikimedia, and buster-wikimedia. Up next is a small canary upgrade before fully rolling out.

Change #1051207 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] prom-ipmi-exporter: add sel-events collector

https://gerrit.wikimedia.org/r/1051207

Mentioned in SAL (#wikimedia-operations) [2024-07-15T18:04:15Z] <herron> upgraded prometheus-ipmi-exporter to 1.8.0 T368088

Change #1051207 merged by Herron:

[operations/puppet@production] prom-ipmi-exporter: add sel-events collector

https://gerrit.wikimedia.org/r/1051207

Change #1054649 had a related patch set uploaded (by Herron; author: Herron):

[operations/alerts@master] ipmi-sel: create task on critical ipmi sel events

https://gerrit.wikimedia.org/r/1054649

I'm getting failures for this change on db1179:

level=info msg="Starting ipmi_exporter" version="(version=1.6.1, branch=debian/sid, revision=1.6.1-2+b5)"
level=error msg="Error parsing config file" error="invalid collector: sel-events"

According to debmonitor, it is the only host that was not updated to the latest version (1.8.0-1~wmf12+1).
Should I update it? Is there anything else that needs to be done?
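
A leftover host like this could be spotted from the exporter's startup log line. The following is a hypothetical sketch: the function names are illustrative, and the 1.8.0 threshold is taken from the task description, which says that release introduces the SEL event metrics.

```python
import re

# Per the task description, SEL event metrics arrive in ipmi_exporter 1.8.0.
SEL_EVENTS_MIN_VERSION = (1, 8, 0)


def exporter_version(log_line: str):
    """Extract the version tuple from a 'Starting ipmi_exporter' log line, or None."""
    match = re.search(r"\(version=(\d+(?:\.\d+)*)", log_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def supports_sel_events(log_line: str) -> bool:
    """True if the logged exporter version is new enough for the sel-events collector."""
    version = exporter_version(log_line)
    return version is not None and version >= SEL_EVENTS_MIN_VERSION


# The startup line reported from db1179 above.
line = ('level=info msg="Starting ipmi_exporter" '
        'version="(version=1.6.1, branch=debian/sid, revision=1.6.1-2+b5)"')
print(supports_sel_events(line))  # False
```

A host reporting 1.6.1 fails the check, which matches the "invalid collector: sel-events" error.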

Mentioned in SAL (#wikimedia-operations) [2024-07-22T10:33:08Z] <volans> upgraded manually prometheus-ipmi-exporter to v 1.8.0-1~wmf12+1 on db1179 (leftover because was down) T368088

Upgraded the package; all clean now on db1179.