Khan Academy links: validate search results and add fallback for concepts with no matches #21

@jeremymanning

Problem

Many question concepts either have no Khan Academy course or link to spurious matches (e.g., similar keywords but wrong theme/topic). When a user clicks the Khan Academy button after answering a question, they may land on irrelevant or empty search results.

Based on exploration, it does not appear possible to count Khan Academy search results automatically on the client side: there is no public API for search result counts, and CORS prevents scraping from the browser.

Proposed Solution

1. Local validation script

Write a script (scripts/validate_khan_links.py) that:

  • Reads all domain JSON files and extracts unique concepts_tested values
  • For each concept, runs a search on Khan Academy (e.g., https://www.khanacademy.org/search?page_search_query=<concept>)
  • Counts the number of search results (using a headless browser, or Khan Academy's internal API if discoverable)
  • Outputs a report: concept → result count → recommended action
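
The steps above could be sketched roughly as follows. This is a minimal outline, not a finished script: it assumes each domain JSON file has a top-level "questions" list whose entries carry a "concepts_tested" list (the actual schema may differ), and it leaves result counting as a caller-supplied function (count_results, a hypothetical name), since no public API for counts is known. Only the search URL pattern comes from this issue.

```python
import json
from pathlib import Path
from typing import Callable, Iterable
from urllib.parse import urlencode

SEARCH_BASE = "https://www.khanacademy.org/search"


def collect_concepts(domain_dir: Path) -> set[str]:
    """Gather unique concepts_tested values from every *.json file in domain_dir.

    Assumes a {"questions": [{"concepts_tested": [...]}, ...]} layout.
    """
    concepts: set[str] = set()
    for path in sorted(domain_dir.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        for question in data.get("questions", []):
            concepts.update(question.get("concepts_tested", []))
    return concepts


def search_url(concept: str) -> str:
    """Build the Khan Academy search URL for one concept (URL-encoded)."""
    return f"{SEARCH_BASE}?{urlencode({'page_search_query': concept})}"


def build_report(
    concepts: Iterable[str],
    count_results: Callable[[str], int],
) -> list[tuple[str, int, str]]:
    """Return (concept, result count, recommended action) rows.

    count_results is a placeholder for whatever ends up working,
    e.g. a headless-browser scrape of the search page.
    """
    rows = []
    for concept in sorted(concepts):
        count = count_results(search_url(concept))
        action = "search" if count > 0 else "generic"
        rows.append((concept, count, action))
    return rows
```

Keeping the counting step behind a plain callable means the report logic can be exercised with a stub before the scraping approach is settled.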

2. Add khan_academy_mode flag to question JSON

For each question, add a field whose value is either "search" or "generic":

{
  "khan_academy_mode": "search"
}
  • "search": The Khan Academy button initiates a search for the question's specific concept(s) (current behavior)
  • "generic": The Khan Academy button links to a generic course page for that sub-domain or domain (e.g., https://www.khanacademy.org/science/physics for physics questions)
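
One way the question generation pipeline might set this flag is from the validation report's result counts. The sketch below is illustrative only: assign_mode, result_counts, and min_results are hypothetical names, and the rule (use "search" if any of the question's concepts has at least min_results hits) is an assumption. A raw count cannot detect spurious matches, so flagged questions may still need manual review.

```python
def assign_mode(
    question: dict,
    result_counts: dict[str, int],
    min_results: int = 1,
) -> dict:
    """Return a copy of question with khan_academy_mode set.

    result_counts maps concept -> number of Khan Academy search results,
    as produced by the validation report. Concepts missing from the map
    are treated as having zero results.
    """
    concepts = question.get("concepts_tested", [])
    has_match = any(result_counts.get(c, 0) >= min_results for c in concepts)
    mode = "search" if has_match else "generic"
    return {**question, "khan_academy_mode": mode}
```

Returning a copy rather than mutating the input keeps the original question data intact if the validation is re-run later with different thresholds.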

3. Update quiz.js to respect the flag

In src/ui/quiz.js, when building the Khan Academy link:

  • If khan_academy_mode === "search" → use current search URL
  • If khan_academy_mode === "generic" → link to the pre-configured course URL for the question's domain

4. Define generic fallback URLs per domain

In domain JSON or a config file, define fallback Khan Academy URLs:

{
  "quantum-physics": "https://www.khanacademy.org/science/physics/quantum-physics",
  "astrophysics": "https://www.khanacademy.org/science/cosmology-and-astronomy",
  "biology": "https://www.khanacademy.org/science/biology"
}
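
Because a "generic" question only works if its domain has a fallback URL, the validation script could also check for gaps in this mapping. A minimal sketch, assuming questions are grouped by domain ID in a dict (questions_by_domain and domains_missing_fallback are hypothetical names; the three URLs are the ones listed above):

```python
FALLBACK_URLS = {
    "quantum-physics": "https://www.khanacademy.org/science/physics/quantum-physics",
    "astrophysics": "https://www.khanacademy.org/science/cosmology-and-astronomy",
    "biology": "https://www.khanacademy.org/science/biology",
}


def domains_missing_fallback(
    questions_by_domain: dict[str, list[dict]],
    fallbacks: dict[str, str],
) -> list[str]:
    """List domains that have a "generic" question but no fallback URL."""
    missing = []
    for domain_id, questions in questions_by_domain.items():
        needs_generic = any(
            q.get("khan_academy_mode") == "generic" for q in questions
        )
        if needs_generic and domain_id not in fallbacks:
            missing.append(domain_id)
    return sorted(missing)
```

For example, a domain ID such as "chemistry" (hypothetical here) with a "generic" question and no entry in the mapping would be reported, while domains whose questions all use "search" would not.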

Tasks

  • Write scripts/validate_khan_links.py — local script to check all concepts against Khan Academy search
  • Generate report of concepts with 0 results vs. valid results
  • Add khan_academy_mode field to question generation pipeline
  • Update quiz.js to use the flag when building Khan Academy URLs
  • Define generic fallback URLs for each domain
  • Re-run validation after question generation is complete

Notes

This should be done after all 50 questions per domain are generated, since the concept list needs to be finalized first.
