Skip to content

Site Health: Cache the full results and a timestamp in the status transient#12181

Open
gziolo wants to merge 16 commits into
WordPress:trunkfrom
gziolo:fix/65232-cache-site-health-results
Open

Site Health: Cache the full results and a timestamp in the status transient#12181
gziolo wants to merge 16 commits into
WordPress:trunkfrom
gziolo:fix/65232-cache-site-health-results

Conversation

@gziolo

@gziolo gziolo commented Jun 15, 2026

Copy link
Copy Markdown
Member

Trac ticket: https://core.trac.wordpress.org/ticket/65232

Note: This PR changed substantially based on review. The detail cache now stores a compact, locale-independent summary (test name, status, timestamp) in a separate, non-autoloaded option. Earlier commits and discussion reflect earlier approaches.

What & why

Trac #65232 wants a fast, cache-only Site Health summary that never runs the synchronous tests. Today the health-check-site-status-result transient stores only the aggregate good/recommended/critical counts, so any consumer that needs per-test results has to re-run the tests.

This PR adds a second cache that records, per test, its status and a timestamp, without re-running anything. The Abilities site_health field is a follow-up that reads this cache and resolves human-readable labels at read time, in the current locale.

Design

Two caches, with separate jobs:

  • health-check-site-status-result — unchanged shape: only { good, recommended, critical }. Small, still autoloaded, backward-compatible for the admin-menu counter and Dashboard widget.
  • health-check-site-status-detail (new) — per-test entries keyed by test name, each with a status and its own timestamp. Stored with an expiration so it is not autoloaded, since only the counts are needed on the hot path. get_site_status_detail() also returns counts derived from those same entries, so a consumer reads one internally consistent view.

Only locale-independent data is cached

Each entry stores just the test name, status, and timestamp. The label, description, actions, and badge are left out on purpose. Those strings are translated for the current user's locale, and this cache is a single site-wide option, so storing them could show one user's locale to another. It also keeps the option small. Consumers resolve labels at read time in the current locale.

Who writes what

  • The Site Health screen posts the aggregate counts and the per-test results as two independent requests, so a large counts refresh is never blocked. Its results include the asynchronous tests that only run in the browser.
  • The weekly scheduled check refreshes the counts transient and updates the detail cache. It currently keeps an entry the screen wrote within the last week, so it won't discard fresher results (including the async tests). Per-result timestamps let stale entries (for example a test whose plugin was deactivated) be dropped after a month.

Detail cache shape

{
  "results": {
    "<test>": { "status": "good|recommended|critical", "timestamp": 1715714399 }
  },
  "counts": { "good": 12, "recommended": 3, "critical": 0 },
  "timestamp": 1715714399
}

get_site_status_detail() returns results as a list and re-injects the test name from the key.

Canonical accessors (on WP_Site_Health)

  • get_site_status_counts() / set_site_status_counts() — the legacy UI counts (the small autoloaded transient).
  • get_site_status_detail(){ results, counts, timestamp }, internally consistent.
  • update_site_status_detail( $results, $includes_async ) — sanitizes and merges into the detail cache.

Backward compatibility

The counts transient keeps its existing shape, so menu.php, the Dashboard widget, and the localized SiteHealth data are unchanged. No WP_Site_Health load is forced on the admin-menu path.

Open question for reviewers

Merge policy. When the scheduled check runs after a screen visit, should it overwrite the browser-collected results? Storing the freshest result is simplest, but cron runs the async tests over loopback HTTP, which fails on some hosts and falls back to "unavailable". One option under discussion: let cron overwrite only when it could actually verify the test, instead of the current one-week grace window.

Testing

  • tests/phpunit/tests/admin/wpSiteHealth.php — the scheduled-check counts vs. detail split, fresh-preserve / stale-refresh / stale-drop behavior, derived detail counts, that only locale-independent fields are stored, and the unavailable-async fallback cached under its test identifier.
  • tests/phpunit/tests/ajax/wpAjaxHealthCheckSiteStatusResult.php — independent counts-only and results-only requests, that translated and HTML fields are dropped, and capability enforcement.

phpcs, PHPStan, and the full site-health and ajax test groups pass locally.

Use of AI Tools

AI assistance: Yes
Tool(s): Claude Code (Claude Opus 4.8), Codex
Used for: Development and validation; reviewed and verified by me.

🤖 Generated with Claude Code

@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props gziolo, westonruter.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@github-actions

Copy link
Copy Markdown

Hi there! 👋

Thank you for your contribution to WordPress! 💖

It looks like this is your first pull request to wordpress-develop. Here are a few things to be aware of that may help you out!

No one monitors this repository for new pull requests. Pull requests must be attached to a Trac ticket to be considered for inclusion in WordPress Core. To attach a pull request to a Trac ticket, please include the ticket's full URL in your pull request description.

Pull requests are never merged on GitHub. The WordPress codebase continues to be managed through the SVN repository that this GitHub repository mirrors. Please feel free to open pull requests to work on any contribution you are making.

More information about how GitHub pull requests can be used to contribute to WordPress can be found in the Core Handbook.

Please include automated tests. Including tests in your pull request is one way to help your patch be considered faster. To learn about WordPress' test suites, visit the Automated Testing page in the handbook.

If you have not had a chance, please review the Contribute with Code page in the WordPress Core Handbook.

The Developer Hub also documents the various coding standards that are followed:

Thank you,
The WordPress Project

@gziolo gziolo requested a review from westonruter June 15, 2026 14:10
@gziolo gziolo self-assigned this Jun 15, 2026
@github-actions

Copy link
Copy Markdown

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

Comment thread src/wp-admin/includes/ajax-actions.php Outdated

@westonruter westonruter left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to my comment about letting the ajax-stored result be preferred, here's a critical observation that Claude made: the lack of an $expiration means that the transient will be autoloaded.

Since it now includes all of the results which could potentially be large, this would run the risk of increasing the bucket of autoloaded options beyond the 1MB limit imposed by Memcached, right?

Full review from Claude

Confirmed the critical detail: set_transient() is called with no expiration ($expiration = 0), so without an external object cache the option is stored with $autoload = true (option.php:1547-1552). The full results blob is therefore autoloaded on every page load site-wide, not just on Site Health screens. That makes the size concern much more severe than the finders estimated.

I have enough verified context. Here are the findings, ranked most-severe first.

Code Review: PR #12181 — Cache Site Health full results + timestamp

Findings

1. Full results blob is autoloaded on every page request (perf regression)class-wp-site-health.php:3492
set_transient( 'health-check-site-status-result', wp_json_encode( $site_status ) ) is called with no expiration. With no external object cache, set_transient() stores the option with autoload = true (option.php:1547-1552). Previously the value was ~60 bytes (three ints); now it's the entire $results array including every test's HTML description, actions, and badge. That payload is now loaded into memory on every front-end and admin request for every visitor, via wp_load_alloptions() — a site-wide cost paid to store data only the Site Health screen reads (and only the 3 counts from it). Passing an expiration, or storing under a non-autoloaded option, would avoid this.

2. Large results payload can exceed object-cache item limits → counter silently reverts to 0class-wp-site-health.php:3489
On installs with a persistent object cache (e.g. Memcached, default 1MB item limit), a site with many site_status_tests (plugins add them, with arbitrary HTML) can produce a JSON blob exceeding the backend's per-item limit. wp_cache_set() then silently fails, get_transient() returns false, and the admin-menu/dashboard counters fall back to 0/0/0 even though valid counts existed. No size cap or truncation is applied.

3. Race between scheduled cron and the AJAX counts update can discard freshly collected resultssrc/wp-admin/includes/ajax-actions.php:5488
The AJAX handler does a non-atomic read-modify-write: it reads the cached transient, splices in the POSTed counts, and re-writes. If wp_cron_scheduled_check() writes fresh results between the handler's get_transient() and set_transient() (admin loads Site Health while cron runs), the handler overwrites with the older (or absent) results it read earlier — silently dropping the detailed results the ticket exists to preserve, until the next weekly cron.

4. New status guard miscounts unknown/empty statuses and diverges from results countclass-wp-site-health.php:3471
The guard if ( ! is_array( $result ) || ! isset( $result['status'] ) ) continue; only checks that status is set. A result with status => '' or a custom value like 'info' still falls through to the final else and increments good, mislabeling it as a passing check. Separately, a result missing status was previously counted as good (old loop had no guard) but is now skipped entirely — so the cached results array length (e.g. 3) can no longer equal good + recommended + critical, which will surprise any consumer that assumes they match.

5. timestamp freshness mechanism is half-builtclass-wp-site-health.php:3490
The comment says the timestamp lets freshness "be evaluated," but: the transient is set with no expiry, nothing in core ever compares the timestamp to now, and the AJAX handler preserves the old cron timestamp on every counts update (ajax-actions.php:5495-5497). So the stored timestamp reflects the last full cron run, not the last update — any future freshness check built on it would be subtly wrong, and the promised mechanism doesn't actually exist yet.

6. Payload schema duplicated across writers/readers with no canonical accessorsrc/wp-admin/includes/ajax-actions.php:5477
The count-normalization array( 'good' => (int) ( $x['good'] ?? 0 ), 'recommended' => …, 'critical' => … ) is copied verbatim into ajax-actions.php:5477-5481 and class-wp-site-health.php:126-130, and the get_transient() + json_decode( …, true ) idiom is duplicated too. The transient's shape is now defined by convention across five sites (cron writer, ajax merge, enqueue reader, menu.php, dashboard.php). Adding a status bucket or renaming a key requires editing several hand-maintained literals; missing one yields counts that silently disagree between the menu counter, dashboard widget, and Site Health screen. A single read/write/normalize helper would centralize it.

7. Enqueue decodes the entire results blob just to read three integersclass-wp-site-health.php:117
On every Site Health and Dashboard page render, json_decode() parses the full cached payload (all HTML results) only to extract good/recommended/critical — the comment itself notes "only the aggregate counts are needed." The decode cost scales with the size/number of test results, for no benefit. (Mitigated if findings 1 & 2 are addressed by not storing results in the same hot transient.)

8. AJAX handler re-encodes the full results blob on every counts POSTsrc/wp-admin/includes/ajax-actions.php:5501
Each time an admin loads the Site Health Status tab, the JS posts the counts and this handler reads, decodes, and re-wp_json_encode()s the entire results payload back into the transient — repeated large serialization writes per page view, not just per weekly cron run (write amplification).


Note on a dropped candidate: finders flagged wp_json_encode( $site_status ) returning false (and storing false) on invalid UTF-8 in result HTML. I refuted it — wp_json_encode() runs _wp_json_sanity_check()/_wp_json_convert_string() to repair invalid UTF-8 and re-encode (functions.php:4411-4417), so the common case recovers; only a >512-depth structure would fail, which Site Health results never reach.

Findings 1–4 are the ones I'd block on; finding 1 in particular looks like a meaningful site-wide performance regression worth raising on the PR. Want me to post these as inline PR comments (/code-review --comment) or open a discussion on the specific lines?

@gziolo

gziolo commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

Thanks for the review! Reworked this substantially in 5bb9fee and updated the PR description. Key change: counts stay in the small autoloaded transient, full results move to a separate non-autoloaded detail cache — which also clears the race and write-amplification points. The Site Health screen is now authoritative for the detailed results (incl. async tests) with cron as a backstop, plus per-result timestamps.

Two design calls I'd value your take on before going further: the merge policy (1-week grace window vs. browser strictly outranking cron) and whether to cap the detail-cache size. Both noted in the description.

Disclosure: I used Claude Code and Codex to develop and validate these changes (reviewed and verified by me). 🙏

@gziolo gziolo force-pushed the fix/65232-cache-site-health-results branch from 5bb9fee to 70b0566 Compare June 16, 2026 11:25
Comment thread src/js/_enqueues/admin/site-health.js Outdated
Comment thread src/js/_enqueues/admin/site-health.js Outdated
Comment thread src/js/_enqueues/admin/site-health.js Outdated
Comment thread src/wp-admin/includes/class-wp-site-health.php Outdated
Comment thread src/wp-admin/includes/class-wp-site-health.php Outdated
Comment thread src/wp-admin/includes/class-wp-site-health.php Outdated
Comment thread src/wp-admin/includes/class-wp-site-health.php Outdated
Comment thread src/wp-admin/includes/class-wp-site-health.php Outdated
Comment thread src/wp-admin/includes/class-wp-site-health.php Outdated
Comment on lines +3630 to +3640
/*
* When not authoritative, keep a recent existing entry rather than overwriting
* it. This preserves results collected from the Site Health screen (including
* the asynchronous tests) when the scheduled check runs afterwards.
*/
if ( ! $authoritative
&& isset( $stored[ $test ]['timestamp'] )
&& ( $now - (int) $stored[ $test ]['timestamp'] ) < WEEK_IN_SECONDS
) {
continue;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we need this. Wouldn't it always be better to store whatever is the most fresh?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. We could store the freshest result, with one caveat.

Cron runs the async tests over loopback HTTP, which fails on some hosts. When that happens, cron falls back to a recommended result labeled "A test is unavailable". If cron always overwrote, that fallback would replace an accurate result the browser just collected.

So if we want cron to override recent browser results, we first need to check whether cron could actually verify the test. That part is detectable, since the fallback is built in one place when the loopback request fails. Then cron would overwrite only when it has a real result, and skip the update when it could not verify the test.

@westonruter

Copy link
Copy Markdown
Member

Key change: counts stay in the small autoloaded transient, full results move to a separate non-autoloaded detail cache — which also clears the race and write-amplification points.

Related to this, my suggestion to reduce the number of stored keys to just test and status (and timestamp) would also alleviate the concern about the size of the option getting too big.

gziolo and others added 11 commits June 22, 2026 16:48
…nsient.

Previously the `health-check-site-status-result` transient stored only the
aggregate `good`, `recommended`, and `critical` counts, so any consumer that
needed the underlying results had to re-run the Site Health tests synchronously.

The scheduled check now caches the complete results array and the time they were
collected alongside the counts. The Site Health screen AJAX handler validates the
submitted counts and preserves the cached results and timestamp while refreshing
the counts, and `enqueue_scripts()` only localizes the counts to avoid embedding
the full results in every Site Health and Dashboard page.

This provides a reusable cached source of detailed Site Health data for the
dashboard, the admin menu, and future REST/Abilities consumers without triggering
synchronous tests.

Adds unit tests covering the scheduled-check caching and the AJAX result handler.

See #65232.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ansient.

Following review feedback, store only the aggregate counts in the autoloaded
`health-check-site-status-result` transient and cache the full per-test results
separately in `health-check-site-status-detail`, which has an expiration so the
larger payload is not autoloaded on every request.

The Site Health screen submits the full results — including the asynchronous
tests that require JavaScript to run — and they are sanitized and cached as the
authoritative detailed results. The scheduled check refreshes only missing or
stale entries so it does not discard fresher results collected from the screen.
Each result carries its own timestamp, and entries that have not been refreshed
within a month are dropped.

Adds WP_Site_Health::get_site_status_counts(), get_site_status_detail(), and
merge_site_status_detail() as the canonical accessors, and updates the unit and
Ajax tests accordingly.

See #65232.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gziolo gziolo force-pushed the fix/65232-cache-site-health-results branch from 70b0566 to 3ca1452 Compare June 22, 2026 14:52
gziolo and others added 4 commits June 22, 2026 16:53
Co-authored-by: Weston Ruter <westonruter@gmail.com>
Address review feedback on the detailed results cache.

- Describe the cached result shape instead of `array<string, mixed>`, so
  consumers and static analysis know each field.
- Drop the dead `status` guard in the counts helper. It only runs on
  already sanitized results, so the status key is always set.
- Rename `$authoritative` to `$includes_async`. The Site Health screen
  takes precedence because it carries the JavaScript-only tests, not
  because its results are inherently more trustworthy.
- Add the test name back when reading the cache. It is stored as the
  array key to avoid duplication, so consumers still get it on read.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Initialize `site_status.results` in the localized data, next to `direct`,
`async`, and `issues`. With the array always defined, the client no longer
needs to create it on first use, and the upload guard can simply check for
collected results.

- Add `'results' => array()` to the localized `SiteHealth` data.
- Drop the lazy `typeof` init before the first push.
- Send the detailed results only when there are some, using `.length`.
  This also avoids posting an empty array that would clear the cache.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Store just the test name, status, and timestamp for each result. Drop the
label and the HTML fields (description, actions, badge).

Those strings are translated for the current user's locale, and the cache is
a single site-wide option. Storing them could show one user's locale to
another. This also keeps the cached option small. Consumers should resolve
labels at read time in the current locale.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gziolo

gziolo commented Jun 23, 2026

Copy link
Copy Markdown
Member Author

Thanks for the thorough review! I have applied all the feedback in the latest commits. A quick summary:

  • Cache only locale-independent data. Each entry now stores just the test name, status, and timestamp. The translated fields (label, description, actions, badge) are left out, since this cache is one shared site-wide option and those strings vary by the user's locale. This also keeps the option small.
  • Renamed the parameter to $includes_async and reframed the related docblock around carrying the async tests rather than being authoritative.
  • Tightened the cached-result types and removed the dead status guard in the counts helper.
  • Seeded the JS results array from PHP, so the client no longer initializes it, and the upload guard checks for collected results.

One open question remains: the merge policy when the scheduled check runs after a screen visit. Storing the freshest result is simplest, but cron runs the async tests over loopback, which fails on some hosts and falls back to "unavailable". One option is to let cron overwrite only when it could actually verify the test, instead of the current one-week grace window. Happy to go either way.

I also revised the PR description so it reflects the current state of the code.

Pull the two repeated patterns in the status cache helpers into small private
methods.

- `read_status_cache()` reads a transient and decodes it to an array or null,
  replacing the get_transient plus json_decode pair in three readers.
- `normalize_status_counts()` returns the good, recommended, and critical
  counts as non-negative integers, replacing the same shaping copied across
  the counts getter and setter.

Behavior is unchanged. Each cache shape now lives in one place.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants