Skip to content

Add system.cpu.load_average.1m and system.processes.count#2014

Open
strawgate wants to merge 1 commit into
mainfrom
feat/load-avg-and-process-count-metrics
Open

Add system.cpu.load_average.1m and system.processes.count#2014
strawgate wants to merge 1 commit into
mainfrom
feat/load-avg-and-process-count-metrics

Conversation

@strawgate

Copy link
Copy Markdown
Contributor

Upstream SystemMetricsInstrumentor doesn't emit either metric. Both are one-line psutil calls, so the Logfire wrapper adds them via the same callback pattern as system.cpu.simple_utilization and includes them in FULL_CONFIG.

Closes the gap where instrument_system_metrics(base='full') left the Hosts page's Load 1m and Procs columns permanently empty.

On Windows, psutil.getloadavg() is an emulated polling shim — first call returns 0 and values update every ~5s. Documented in the helper's docstring.

Upstream `SystemMetricsInstrumentor` doesn't include either metric.
Both are one-line psutil calls so the Logfire wrapper can emit them
via the same callback pattern as `system.cpu.simple_utilization`.

Both go into `FULL_CONFIG` so the Hosts page populates Load 1m + Procs
columns when the customer uses `base='full'`. Available individually
via `config={…}` for callers picking metrics by hand.

On Windows, `psutil.getloadavg()` is an emulated polling shim — first
call returns 0 and values update every ~5s.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Re-trigger cubic

yield Observation(len(psutil.pids()))

logfire_instance.metric_gauge_callback(
'system.processes.count',

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should probably be system.process.count as in https://opentelemetry.io/docs/specs/semconv/system/system-metrics/#metric-systemprocesscount, which is an updowncounter. If including the process.state attribute isn't easy it's probably fine to leave out.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the point is — rename the attribute to system.process.count, and make it an updowncounter not a gauge. If convenient, we can add the process.state attribute, but note that could increase the number of series significantly, which we probably wouldn't want to do in default config if it costs a decent amount. @strawgate do you have thoughts about whether we should add that attribute?

Alternatively we could change the name more significantly, I don't think it should be close to the semantic convention one though if we do that (like it is now). But I'm okay using this name and not having the attribute if we think this is good.

yield Observation(psutil.getloadavg()[0])

logfire_instance.metric_gauge_callback(
'system.cpu.load_average.1m',

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://opentelemetry.io/docs/specs/semconv/system/system-metrics/ mentions a hypothetical system.linux.cpu.load_1m. Not a real semconv, but worth mimicking. If we don't want to include the OS prefix, then system.cpu.load_1m.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@strawgate Can we just change the name to this? Is there a reason you used the existing name?

'system.cpu.utilization': CPU_FIELDS,
# The Hosts page reads these — upstream `_DEFAULT_CONFIG` doesn't include
# them and they aren't expensive (one number per scrape each), so add them
# so `base='full'` covers the page without a follow-up `config={…}` knob.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these are so cheap, and they're used in a default UI view, then they should be in the basic config.

'cpython.gc.uncollectable_objects': None,
'system.cpu.load_average.1m': None,
'system.processes.count': None,
}, 'Docs and the MetricName type need to be updated if this test fails'

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs weren't updated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(the test explains that if it fails the docs need to be updated, we're guessing the AI fixed the test pre-emptively but didn't realize it also needed to update the docs.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants