feat: add more scan chains plus CEX-only pull with reliable multi-chain browser priming #116
elitex45 wants to merge 2 commits into dawsbot:v1
Add a dedicated CEX scraping pipeline and improve explorer switching reliability across chains.

Changes in this commit:
- Add new CEX-only pull script with:
  - DefiLlama CEX keyword ingestion (with fallback to built-in list)
  - keyword normalization/boundary matching
  - label/name-tag exclusions for known false positives
  - risk-term exclusion filter (exploit/hack/phish/scam/etc.)
  - optional --no-10k-limit pagination mode
  - compact percent progress output based on estimated totals
  - fetch retries + timeout handling
- Add npm scripts:
  - pull:cex
  - pull:cex:all
  - smoke:labelcloud
- Add smoke-labelcloud preflight script for explorer availability checks
- Add missing chain integrations and chainId mappings:
  Avalanche, Polygon, Mantle, Scroll, Linea, Blast, Berachain, World Chain
  (remove non-working Mode and zkSync Era entries)
- Improve BrowserFetcher for multi-chain runs:
  - per-origin Cloudflare priming via setActiveOrigin()
  - navigation-based HTML fetch for cross-origin reliability
  - origin-aware POST behavior with re-prime retry
- Prime active origin per chain in fetch-all and fetch-cex loops
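For reviewers who want a concrete picture of the keyword normalization, boundary matching, and risk-term exclusion items above, here is a minimal sketch. It is not the code in scripts/fetch-cex.ts: every name (`normalize`, `matchesKeyword`, `hasRiskTerm`, `isCexLabel`, `RISK_TERMS`) is hypothetical, the risk list only contains the four terms named in the commit message, and the ASCII-only normalization is an assumption.

```typescript
// Hypothetical sketch; names and rules are assumptions, not the PR's actual code.
const RISK_TERMS: string[] = ["exploit", "hack", "phish", "scam"];

/** Lowercase the text and turn every run of non-alphanumerics into one space. */
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

/** True if `keyword` appears in `label` as whole word(s), not as a substring. */
function matchesKeyword(label: string, keyword: string): boolean {
  const haystack = ` ${normalize(label)} `;
  const needle = normalize(keyword);
  return needle.length > 0 && haystack.includes(` ${needle} `);
}

/** True if any word of the label starts with a risk term (e.g. "phishing123"). */
function hasRiskTerm(label: string): boolean {
  const words = normalize(label).split(" ");
  return RISK_TERMS.some((term) => words.some((word) => word.startsWith(term)));
}

/** A label counts as CEX-related if it matches a keyword and carries no risk term. */
function isCexLabel(label: string, keywords: string[]): boolean {
  if (hasRiskTerm(label)) return false;
  return keywords.some((keyword) => matchesKeyword(label, keyword));
}
```

Padding both sides with spaces keeps a short keyword such as "okx" from matching inside a longer token. The prefix check in `hasRiskTerm` is deliberately broad, so it would also reject a label like "Hackathon Fund"; a real filter might need an allowlist for such cases.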
@dawsbot Great repo btw, very much needed :)
Thanks for the PR, @elitex45! Sorry for my delay. What method do you recommend for me to test and verify that this all works?
It's exactly the same as before: start the Chrome instance in dev mode, then run pull:cex to fetch the CEX addresses. It follows the same steps as before, but focuses specifically on fetching CEX-related addresses. The other fix: when multiple explorers were selected at once, the code wasn't requesting a login on each explorer, which caused the run to fail. That is now fixed by opening each explorer's link so you can log in and let the script crawl it. I also added the other supported explorer links to extend the scope.
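The multi-explorer fix described above boils down to priming each explorer origin once before the first fetch against it. A rough sketch of that idea follows. `OriginPrimer`, `activate`, `PrimeFn`, and `pullAll` are hypothetical names, loosely analogous to the `setActiveOrigin()` call mentioned in the commit message, and the actual BrowserFetcher may work differently.

```typescript
// Hypothetical sketch of per-origin priming; not the PR's BrowserFetcher.
type PrimeFn = (origin: string) => Promise<void>;

class OriginPrimer {
  private readonly primed = new Set<string>();
  private readonly prime: PrimeFn;
  private active: string | null = null;

  constructor(prime: PrimeFn) {
    this.prime = prime;
  }

  /** Switch to the origin of `url`, priming it first if it has not been seen. */
  async activate(url: string): Promise<string> {
    const origin = new URL(url).origin;
    if (!this.primed.has(origin)) {
      // Assumed to open the explorer so the Cloudflare check / login can complete.
      await this.prime(origin);
      this.primed.add(origin);
    }
    this.active = origin;
    return origin;
  }

  /** Forget an origin so the next activate() primes it again (e.g. after a failed POST). */
  invalidate(origin: string): void {
    this.primed.delete(origin);
  }

  get activeOrigin(): string | null {
    return this.active;
  }
}

/** Prime once per chain, then run the pull for that chain. */
async function pullAll(
  chainUrls: string[],
  primer: OriginPrimer,
  pull: (origin: string) => Promise<void>,
): Promise<void> {
  for (const url of chainUrls) {
    const origin = await primer.activate(url);
    await pull(origin);
  }
}
```

Keying the set on `URL.origin` means several pages on the same explorer share one priming step, while a different explorer host always triggers its own.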
Summary
This PR adds a CEX-focused scraping flow and improves browser/origin handling for reliable multi-chain pulls.
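The commit message lists fetch retries and timeout handling as part of that reliability work. As an illustration only, a retry wrapper with a per-attempt timeout could look like the sketch below. `withRetry`, `RetryOptions`, `sleep`, and `fetchText` are hypothetical names, and the exponential backoff is an assumption rather than something the PR states.

```typescript
// Hypothetical sketch of retry + timeout handling; not taken from the PR.
interface RetryOptions {
  retries: number;   // extra attempts after the first one
  timeoutMs: number; // per-attempt timeout
  backoffMs: number; // base delay, doubled after each failed attempt
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  task: (signal: AbortSignal) => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= options.retries; attempt++) {
    const controller = new AbortController();
    // The timeout only takes effect if `task` honours the abort signal.
    const timer = setTimeout(() => controller.abort(), options.timeoutMs);
    try {
      return await task(controller.signal);
    } catch (err) {
      lastError = err;
    } finally {
      clearTimeout(timer);
    }
    if (attempt < options.retries) {
      await sleep(options.backoffMs * 2 ** attempt);
    }
  }
  throw lastError;
}

/** Example task: fetch a page as text, treating non-2xx responses as failures. */
async function fetchText(url: string): Promise<string> {
  return withRetry(
    async (signal) => {
      const res = await fetch(url, { signal });
      if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
      return res.text();
    },
    { retries: 3, timeoutMs: 15_000, backoffMs: 500 },
  );
}
```

Passing the `AbortSignal` into the task keeps the wrapper independent of how the request is made, so the same helper could wrap a plain `fetch` or a browser-driven navigation.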
What Changed
- Added CEX-only pull script: scripts/fetch-cex.ts
  - --no-10k-limit pagination mode
- Added scripts in package.json:
  - pull:cex
  - pull:cex:all
  - smoke:labelcloud
- Added scripts/smoke-labelcloud.ts for chain labelcloud reachability checks.
- Added new chain integrations:
- Improved browser fetching reliability:
  - setActiveOrigin()
- Updated pull loops (fetch-all.ts, fetch-cex.ts) to prime origin per chain.
Data Artifacts Included
- data/db.sqlite3
- data/json/accounts.json
- data/csv/accounts.csv
Validation
npx tsc -p tsconfig.json