
Add scripts and docs for creating offline documentation archives#22441

Open
ebembi-crdb wants to merge 6 commits into main from archive-creation-docs-v2

@ebembi-crdb
Contributor

Summary

  • Adds a complete toolkit for creating portable, offline documentation archives for any CockroachDB version
  • Archives are self-contained (~100-200 MB) and work completely offline
  • Navigation dynamically detects the archive's folder name, so links keep working if the archive folder is renamed

Scripts Added

| Script | Purpose |
| --- | --- |
| create_single_archive.py | Main entry point - creates archive for one version |
| create_all_archives_fixed.py | Batch creation for multiple versions |
| snapshot_relative.py | Core archiver that creates base structure |
| make_navigation_dynamic_v2.py | Makes navigation work with any folder name |
| fix_*.py | Supporting scripts for the 14-step archive process |

Documentation Added

  • README_ARCHIVE_CREATION.md - Comprehensive guide with troubleshooting
  • CREATE_PORTABLE_ARCHIVE.md - Quick start guide

Usage

cd src/current
python3 create_single_archive.py v23.1  # Creates cockroachdb-docs-v23.1-offline.zip

Test plan

  • Ran create_single_archive.py v22.2 successfully
  • Verified archive structure matches expected output (1,479 files, 175 MB)
  • Verified navigation works with relative paths
  • Verified sidebar contains only target version
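Parts of this test plan can be scripted. A minimal sketch, assuming the zip holds a single top-level folder with an index.html at its root; the function name and folder layout are assumptions, not taken from the PR's scripts:

```python
import zipfile
from pathlib import Path


def summarize_archive(zip_path: Path) -> dict:
    """Return basic facts about an offline docs zip: file count, size, root index."""
    with zipfile.ZipFile(zip_path) as zf:
        # Skip directory entries; count only real files.
        files = [info for info in zf.infolist() if not info.is_dir()]
        names = {info.filename for info in files}
        # Assumption: every entry lives under one top-level folder, e.g. "<folder>/index.html".
        top_levels = {name.split("/", 1)[0] for name in names}
        has_root_index = any(f"{top}/index.html" in names for top in top_levels)
    return {
        "file_count": len(files),
        "size_mb": round(zip_path.stat().st_size / 1_000_000, 1),
        "top_level_folders": sorted(top_levels),
        "has_root_index": has_root_index,
    }
```

For example, `summarize_archive(Path("cockroachdb-docs-v22.2-offline.zip"))` returns a dict that a test can compare against expected values. Note that zip entry counts may not match a count of extracted files if the archive includes extras.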


This adds a complete toolkit for creating portable, offline documentation
archives for any CockroachDB version:

Scripts:
- create_single_archive.py: Main entry point for single version archives
- create_all_archives_fixed.py: Batch creation for multiple versions
- snapshot_relative.py: Core archiver that creates base structure
- make_navigation_dynamic_v2.py: Makes navigation work with any folder name
- fix_*.py: Supporting scripts for the 14-step archive process

Documentation:
- README_ARCHIVE_CREATION.md: Comprehensive guide with troubleshooting
- CREATE_PORTABLE_ARCHIVE.md: Quick start guide

Features:
- Creates self-contained offline archives (~100-200 MB per version)
- Dynamic navigation that works when the archive is renamed
- Localized Google Fonts for offline use
- Single-version sidebar (no references to newer versions)
- Relative paths for complete portability

Usage:
  cd src/current
  python3 create_single_archive.py v23.1
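Relative links are what make the archive portable: each page reaches shared assets through paths such as ../css/..., which resolve the same way wherever the folder is unzipped or however it is renamed. A minimal sketch of that rewrite, assuming site-absolute links start with /docs/; the prefix and function name are hypothetical, not the actual code in snapshot_relative.py:

```python
import posixpath


def to_relative(href: str, page_path: str, site_prefix: str = "/docs/") -> str:
    """Rewrite a site-absolute href so it resolves relative to the page containing it.

    page_path is the page's location inside the archive, e.g. "v23.1/install.html".
    Hrefs outside site_prefix (external URLs, fragments) are returned unchanged.
    """
    if not href.startswith(site_prefix):
        return href
    target = href[len(site_prefix):]
    if target == "" or target.endswith("/"):
        # file:// URLs do not serve directory indexes, so link to index.html explicitly.
        target += "index.html"
    page_dir = posixpath.dirname(page_path) or "."
    return posixpath.relpath(target, start=page_dir)
```

Because the result depends only on the page's position inside the archive, not on the archive's folder name, the same output works after the folder is renamed.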
@netlify

netlify bot commented Feb 2, 2026

Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

🔨 Latest commit d63170a
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-interactivetutorials-docs/deploys/69a6be3227dcfa0008052249

@github-actions

github-actions bot commented Feb 2, 2026

Files changed:

  • .github/workflows/test-archive-scripts.yml
  • src/current/CREATE_PORTABLE_ARCHIVE.md
  • src/current/README_ARCHIVE_CREATION.md
  • src/current/create_all_archives_fixed.py
  • src/current/create_single_archive.py
  • src/current/fix_broken_sidebar_links.py
  • src/current/fix_final_broken_links.py
  • src/current/fix_incomplete_sidebars.py
  • src/current/fix_js_sidebar_final.py
  • src/current/fix_navigation_quick.py
  • src/current/fix_remaining_v25_refs.py
  • src/current/fix_root_navigation.py
  • src/current/make_navigation_dynamic.py
  • src/current/make_navigation_dynamic_v2.py
  • src/current/snapshot_relative.py
  • src/current/test_archive_smoke.py

@netlify

netlify bot commented Feb 2, 2026

Deploy Preview for cockroachdb-api-docs canceled.

🔨 Latest commit d63170a
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-api-docs/deploys/69a6be3229ebfa0008a301cf

@netlify

netlify bot commented Feb 2, 2026

Deploy Preview for cockroachdb-docs canceled.

🔨 Latest commit d63170a
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-docs/deploys/69a6be3287ee710008319d3e

@mohini-crl
Contributor

A few changes that we'll need:

  1. Fix the brittle/broken JS replacement: create_all_archives_fixed.py contains a literal content.replace(...) that targets a malformed JS expression (something like url = url.replace(/^stable/... , )...), which is fragile and can leave broken JS or fail to match.
  2. Do not rewrite checked-in scripts in place: create_all_archives_fixed.py reads snapshot_relative.py, changes TARGET_VERSION and the sidebar names, and writes it back before running. Altering committed source files during a build is error-prone and not idempotent; it can lead to accidental commits or a confusing repo state.
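The first point is commonly addressed by matching the JavaScript with a whitespace-tolerant regular expression and counting matches, rather than relying on one exact string. A sketch, assuming a hypothetical call shape url = url.replace(/^stable/, ...) because the exact malformed expression is elided above; the names are illustrative:

```python
import re

# Hypothetical target shape: url = url.replace(/^stable/, <args>);
# \s* absorbs spacing/newline differences that would defeat an exact-string match.
# [^)]* assumes the arguments contain no ")" of their own.
STABLE_REPLACE_RE = re.compile(r"url\s*=\s*url\.replace\(\s*/\^stable/\s*,[^)]*\)\s*;?")


def patch_stable_replace(js: str, replacement: str) -> tuple[str, int]:
    """Replace each matching call with `replacement`; return (new_source, match_count)."""
    # Passing a function stops re from interpreting backslashes in `replacement`.
    return STABLE_REPLACE_RE.subn(lambda _match: replacement, js)
```

A caller can treat a count of 0 as an error, so a change in the generated JS surfaces immediately instead of silently shipping an unpatched page.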

mohini-crl self-requested a review February 9, 2026 15:04
Contributor

mohini-crl left a comment


Please read the comment above for the requested review changes.

ebembi-crdb added 2 commits February 20, 2026 23:46
- Replace brittle exact-string JS replacement with re.sub() using a flexible
  regex to match the broken stable URL replace() call regardless of whitespace
  variation (addresses reviewer concern about fragile string matching)
- Write version-specific snapshot_relative.py to a tempfile instead of
  modifying the checked-in source in-place; temp file is deleted after use
  (addresses reviewer concern about rewriting checked-in scripts)
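The approach in this commit can be sketched as: read the checked-in script, patch a copy in memory, write it to a temporary file, run it, and delete it. The TARGET_VERSION line format and the helper name are assumptions for illustration, not the actual layout of snapshot_relative.py:

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path


def run_versioned_copy(template: Path, version: str) -> None:
    """Run a copy of `template` with TARGET_VERSION swapped; never writes `template`."""
    source = template.read_text(encoding="utf-8")
    # Assumption: the script has exactly one line like TARGET_VERSION = "v25.1".
    patched, count = re.subn(
        r"""^TARGET_VERSION\s*=\s*["'][^"']*["']""",
        lambda _m: f"TARGET_VERSION = {version!r}",
        source,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError(f"expected one TARGET_VERSION line, found {count}")
    # Put the copy beside the template so sibling imports and relative paths still work.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".py", dir=template.parent, delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write(patched)
        tmp_path = Path(tmp.name)
    try:
        subprocess.run([sys.executable, str(tmp_path)], cwd=template.parent, check=True)
    finally:
        tmp_path.unlink(missing_ok=True)  # delete the temp copy even if the run fails
```

Failing loudly when the count is not exactly 1 keeps a renamed or duplicated variable from producing an archive for the wrong version.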
@mohini-crl
Contributor

Use DOM-aware edits (BeautifulSoup) for HTML changes.
create_all_archives_fixed.py still performs regex-style file-wide edits. For safety, convert structural HTML/attribute changes (injecting <style>/<script>, rewriting href/src, editing the version-switcher) to BeautifulSoup DOM operations — or at minimum extract <script> tags with BeautifulSoup and run regex only on those script strings. This prevents accidental edits in inline JSON/other JS strings.

Ensure snapshot_relative.py is not rewritten in-place.
You added tempfile — please confirm (and if needed, change) that you write a temporary copy (e.g., snapshot_working.py) and execute that so the checked-in snapshot_relative.py remains unchanged. Add a short git status/note or tiny test that proves the repo file is unchanged after running the script.
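Besides a git check, a dependency-free way to prove the file is untouched is to compare content hashes around the build step. A sketch; `action` is a hypothetical stand-in for whatever runs the archive build:

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def assert_file_unchanged(path: Path, action: Callable[[], None]) -> None:
    """Run `action` and raise AssertionError if `path`'s bytes differ afterwards."""
    before = sha256_of(path)
    action()
    if sha256_of(path) != before:
        raise AssertionError(f"{path} was modified during the build")
```

Unlike `git diff`, this also works outside a checkout, for example on an extracted source tarball.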

Avoid shell=True for subprocess calls.
Replace subprocess.run(..., shell=True) with the list form (["python3","snapshot_relative.py"]) or call the code directly where possible. This is more robust and avoids quoting/injection issues.
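With the list form, each element becomes one argv entry for the child process and no shell ever parses the version string. A sketch of a batch loop in that style; the function name is hypothetical, and sys.executable is used instead of a hard-coded python3 so the child runs under the same interpreter:

```python
import subprocess
import sys
from pathlib import Path


def build_archives(script: Path, versions: list[str]) -> list[str]:
    """Invoke `script <version>` once per version; return the versions that failed."""
    failed = []
    for version in versions:
        # argv list, no shell=True: spaces or $(...) in `version` stay literal text.
        result = subprocess.run([sys.executable, str(script), version], cwd=script.parent)
        if result.returncode != 0:
            failed.append(version)
    return failed
```

Run from src/current, `build_archives(Path("create_single_archive.py"), ["v22.2", "v23.1"])` would invoke the single-version script once per version and collect failures instead of stopping at the first one.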

Add a small CI smoke test.
Add a GitHub Actions job (or a local test script) that uses a tiny _site/docs fixture, runs create_single_archive.py or create_all_archives_fixed.py for one version, and verifies a few invariants (e.g., index.html exists, no url.replace(/^stable left in output, snapshot_relative.py unchanged, nav assets present).
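The invariants listed here map onto a few filesystem checks. A sketch against an already-extracted archive directory; the asset folder name "js" and the function name are assumptions:

```python
from pathlib import Path

LEFTOVER_JS = "url.replace(/^stable"


def smoke_check(archive_dir: Path, nav_assets: tuple[str, ...] = ("js",)) -> list[str]:
    """Return problems found in an extracted archive; an empty list means pass."""
    problems = []
    if not (archive_dir / "index.html").is_file():
        problems.append("missing index.html")
    for page in archive_dir.rglob("*.html"):
        # errors="replace" keeps one badly encoded page from aborting the whole check.
        if LEFTOVER_JS in page.read_text(encoding="utf-8", errors="replace"):
            problems.append(f"unpatched stable replace in {page.relative_to(archive_dir)}")
    for asset in nav_assets:
        if not (archive_dir / asset).exists():
            problems.append(f"missing nav asset: {asset}")
    return problems
```

Returning a list rather than failing on the first problem lets a CI job print every violation in one run.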

- Replace shell=True subprocess.run() calls with list-form args in both
  create_all_archives_fixed.py and create_single_archive.py
- Replace shell operations (find, mkdir, cp, zip) with Python equivalents:
  Path.glob/unlink, Path.mkdir, shutil.copy2, shutil.make_archive
- Fix create_single_archive.py: use tempfile for snapshot_relative.py
  instead of writing it in-place (was the critical bug flagged in review)
- Fix fix_navigation_in_archive() to use BeautifulSoup: extract <script>
  tags first, apply regex only to their string content, then reserialise
- Add test_archive_smoke.py: verifies JS patch works, snapshot_relative.py
  is unchanged, and no shell=True remains in subprocess calls
- Add .github/workflows/test-archive-scripts.yml CI job that runs
  test_archive_smoke.py and checks git diff on snapshot_relative.py
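Put together, the Python replacements named in this commit look roughly like the sketch below. The function name and directory layout are hypothetical; it shows Path.mkdir, shutil.copy2 (via copytree), and shutil.make_archive in place of shell mkdir, cp, and zip:

```python
import shutil
from pathlib import Path


def package_archive(build_dir: Path, out_dir: Path, folder_name: str) -> Path:
    """Copy `build_dir` into out_dir/folder_name, then zip it as out_dir/folder_name.zip."""
    out_dir.mkdir(parents=True, exist_ok=True)  # Python equivalent of a shell mkdir -p
    staging = out_dir / folder_name
    # copy2 also preserves file metadata such as timestamps; dirs_exist_ok needs Python 3.8+.
    shutil.copytree(build_dir, staging, copy_function=shutil.copy2, dirs_exist_ok=True)
    # root_dir/base_dir make folder_name the top-level entry inside the zip.
    zip_name = shutil.make_archive(
        str(out_dir / folder_name), "zip", root_dir=out_dir, base_dir=folder_name
    )
    return Path(zip_name)
```

Keeping the folder as the zip's top-level entry means unzipping produces one directory rather than scattering files into the current directory.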
ebembi-crdb requested a review from a team as a code owner February 26, 2026 17:58
ebembi-crdb added 2 commits March 3, 2026 16:09
version.replace('.', '\.') inside an f-string {} is a SyntaxError on
Python < 3.12. Pre-compute the escaped version into a local variable
before using it in the f-string expression.
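A sketch of the fix, with a hypothetical link regex. re.escape does the same job as the manual replace and also escapes any other regex metacharacters. Computing it outside the f-string keeps the code valid on Python < 3.12, and it avoids the bare '\.' literal, which Python 3.12+ reports as an invalid-escape SyntaxWarning:

```python
import re


def version_link_pattern(version: str) -> re.Pattern:
    """Compile a regex for hrefs under /docs/<version>/ (hypothetical URL layout)."""
    escaped = re.escape(version)  # e.g. "v23.1" -> "v23\.1"; done outside the f-string
    return re.compile(rf'href="/docs/{escaped}/([^"]*)"')
```

Without the escape, the dot in "v23.1" would match any character, so a link to a path like /docs/v23x1/ would also match.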
Working directory is src/current, so the path to snapshot_relative.py
should be relative to that, not the repo root. Also add -- to
disambiguate the path from a git revision.
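The same check can be run from Python. A sketch, assuming git is on PATH: `git diff --quiet` exits 1 when the working-tree file differs from the index, and `--` keeps the filename from being read as a revision. An untracked file produces no diff, so this complements rather than replaces a hash check:

```python
import subprocess
from pathlib import Path


def git_file_unchanged(repo_subdir: Path, filename: str) -> bool:
    """True if `filename` (relative to repo_subdir) has no unstaged changes."""
    result = subprocess.run(["git", "diff", "--quiet", "--", filename], cwd=repo_subdir)
    if result.returncode not in (0, 1):
        raise RuntimeError(f"git diff failed with exit code {result.returncode}")
    return result.returncode == 0
```

With `cwd` set to src/current, `git_file_unchanged(Path("src/current"), "snapshot_relative.py")` mirrors the corrected CI step.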