chore: sync core lib and CLAUDE.md from agent-core#42
Conversation
There was a problem hiding this comment.
Code Review
This pull request attempts to mitigate potential Regular Expression Denial of Service (ReDoS) vulnerabilities across several files by bounding various quantifiers and character classes. However, the review feedback points out that several of the original regular expressions—specifically those matching markdown links, module exports, and bad example blocks—were already safe from ReDoS. Furthermore, introducing arbitrary length limits on these patterns can cause regressions by failing to match valid, large inputs (such as long URLs, large configuration exports, or extensive code blocks). It is recommended to revert these specific patterns to their original, safe forms as suggested.
| // ReDoS fix: bound the negated-class captures so the matcher is linear; | ||
| // bounds far exceed any realistic markdown link, so matches are unchanged. | ||
| const linkRegex = /\[([^\]]{1,2000})\]\(([^)]{1,4000})\)/g; |
There was a problem hiding this comment.
The original regular expression /\\\[([^\\\\]+)\\\]\\(([^)]+)\\)/g does not have a ReDoS vulnerability. Because the negated character classes [^\\\\]+ and [^)]+ are strictly bounded by the literal characters \\\\] and \\) respectively, there is no overlapping match state that can cause catastrophic backtracking.\n\nBy introducing arbitrary limits like {1,2000} and {1,4000}, this regex will fail to match valid markdown links that contain long URLs, such as those with extensive query parameters, tracking tokens, or inline base64 data URIs (which are very common in markdown image links).\n\nWe should revert this to the original safe pattern.
| // ReDoS fix: bound the negated-class captures so the matcher is linear; | |
| // bounds far exceed any realistic markdown link, so matches are unchanged. | |
| const linkRegex = /\[([^\]]{1,2000})\]\(([^)]{1,4000})\)/g; | |
| const linkRegex = /\[([^\]]+)\]\(([^)]+)\)/g; |
| // ReDoS fix: bound the \s* runs and capture length so the matcher stays linear; | ||
| // bounds exceed any realistic module.exports declaration so matches are unchanged. | ||
| const moduleExportsPattern = /module\.exports\s{0,1000}=\s{0,1000}\{([^}]{1,100000})\}/; |
There was a problem hiding this comment.
The original regular expression /module\\.exports\\s*=\\s*\\{([^}]+)\\}/ is completely safe from ReDoS. The negated character class [^}]+ is terminated by the literal \\}, meaning there is no ambiguity or overlapping paths for the regex engine to backtrack exponentially.\n\nLimiting the capture to 100,000 characters ({1,100000}) will cause the parser to fail on large, valid JavaScript files (such as large configuration files, localized translation dictionaries, or auto-generated asset maps) where the exported object exceeds 100KB.\n\nWe should revert this to the original safe pattern.
| // ReDoS fix: bound the \s* runs and capture length so the matcher stays linear; | |
| // bounds exceed any realistic module.exports declaration so matches are unchanged. | |
| const moduleExportsPattern = /module\.exports\s{0,1000}=\s{0,1000}\{([^}]{1,100000})\}/; | |
| const moduleExportsPattern = /module\.exports\s*=\s*\{([^}]+)\}/; |
| // ReDoS fix: bound the lazy [\s\S]*? bodies so an unterminated <bad-example> or | ||
| // ``` fence cannot drive polynomial backtracking; 50k chars covers any realistic | ||
| // example block, so the stripped regions are unchanged for real content. | ||
| const BAD_EXAMPLE_TAG_PATTERN = /<bad[_\- ]?example>[\s\S]{0,50000}?<\/bad[_\- ]?example>/gi; | ||
| const BAD_EXAMPLE_CODE_PATTERN = /```[^\n]{0,500}bad[^\n]{0,500}\n[\s\S]{0,50000}?```/gi; |
There was a problem hiding this comment.
The original regular expressions for stripping bad examples do not possess ReDoS vulnerabilities. Lazy matching over all characters ([\\s\\S]*?) is a linear scan and does not cause catastrophic backtracking because there are no nested quantifiers or overlapping active groups.\n\nBy capping the match at 50,000 characters, any large <bad-example> block or code block (which can easily exceed 50KB when demonstrating large prompts or complex codebases) will fail to be stripped. This will lead to false positives during cross-file analysis.\n\nWe should revert these to their original safe patterns.
| // ReDoS fix: bound the lazy [\s\S]*? bodies so an unterminated <bad-example> or | |
| // ``` fence cannot drive polynomial backtracking; 50k chars covers any realistic | |
| // example block, so the stripped regions are unchanged for real content. | |
| const BAD_EXAMPLE_TAG_PATTERN = /<bad[_\- ]?example>[\s\S]{0,50000}?<\/bad[_\- ]?example>/gi; | |
| const BAD_EXAMPLE_CODE_PATTERN = /```[^\n]{0,500}bad[^\n]{0,500}\n[\s\S]{0,50000}?```/gi; | |
| const BAD_EXAMPLE_TAG_PATTERN = /<bad[_\- ]?example>[\s\S]*?<\/bad[_\- ]?example>/gi; | |
| const BAD_EXAMPLE_CODE_PATTERN = /```[^\n]*bad[^\n]*\n[\s\S]*?```/gi; |
|
Superseded by latest sync (post ReDoS + prototype-pollution hardening). |
Automated sync of lib/ and CLAUDE.md from agent-core.