Lots More Bots#106

Open
donatj wants to merge 7 commits into master from bot-attack
donatj commented May 14, 2025

Owner

No description provided.

donatj changed the title from "Lots more Bots" to "Lots More Bots" on May 15, 2025

donatj commented May 15, 2025

Owner Author

My hesitance in merging this is twofold:

  • it benchmarks a fair bit slower per UA
  • it might cause unexpected behaviour for people who vary their HTML based on the detected UA, when what was previously reported as a browser is suddenly (and correctly) reported as a bot

donatj requested a review from Copilot on October 31, 2025 09:46

Copilot AI left a comment

Contributor

Pull Request Overview

This PR enhances bot detection in the user agent parser by adding support for numerous web crawlers and bots. The main changes implement a new pattern-based bot detection mechanism that can identify bots with URL references in their user agent strings, while maintaining backward compatibility for existing bot detection.

  • Added a new regex-based bot detection system that identifies bots by their characteristic (name/version; +http://...) pattern
  • Removed hardcoded bot names from the main browser regex and moved them to the new bot detection logic
  • Added constants for 11 new bot/crawler types that are commonly used
  • Added 116 new test cases for various bot user agents
  • Added browser constant exclusions for lesser-known bots to keep the API surface manageable

Reviewed Changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/UserAgentParser.php | Refactored bot detection logic with new regex pattern and added prev capture group to main browser regex |
| src/UserAgent/Browsers.php | Added constants for 11 new commonly-used bot types (archive.org_bot, Discordbot, Google-Read-Aloud, Google-Safety, Pinterestbot, WellKnownBot, wpbot, YandexImages, YandexMobileBot, YandexRCA, YandexUserproxy) |
| bin/constant_generator.php | Added exclusion list for 30 bot types to prevent cluttering the Browsers interface |
| bin/user_agent_sorter.php | Added error handling for JSON decode failures |
| tests/user_agents.dist.json | Added 116 new bot user agent test cases covering various crawlers and bots |
| README.md | Updated documentation with new browser constants |

Comment thread src/UserAgentParser.php
if(
(
empty($result[BROWSER][0])
|| ($result['prev'][0] !== '')

Copilot AI Oct 31, 2025

This condition will throw an 'Undefined index' notice if $result['prev'][0] doesn't exist. The regex on line 99 uses (?P<prev>.)? with a ? quantifier, meaning the capture group may not be set. This condition should use isset() or !empty() to check for existence before accessing the array element. Suggested fix: || (isset($result['prev'][0]) && $result['prev'][0] !== '')

Suggested change
- || ($result['prev'][0] !== '')
+ || (isset($result['prev'][0]) && $result['prev'][0] !== '')
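
The guard in this suggestion can be sketched in isolation. The pattern and the `hasPrev` helper below are hypothetical stand-ins, not the parser's real regex, and the sketch assumes the parser fills `$result` via `preg_match_all`. It only shows that `isset()` makes the access safe whether or not the `prev` entry exists:

```php
<?php
// Hypothetical pattern for illustration only; NOT the parser's real regex.
$pattern = '%(?P<prev>.)?(?P<browser>Chrome|Firefox)/(?P<version>[0-9.]+)%';

// Hypothetical helper: true only when a character was captured before the
// browser token. isset() guards the case where there is no match at all
// and $result['prev'] is therefore an empty array.
function hasPrev(string $pattern, string $ua): bool
{
    preg_match_all($pattern, $ua, $result);

    return isset($result['prev'][0]) && $result['prev'][0] !== '';
}

var_dump(hasPrev($pattern, 'Chrome/120.0'));         // nothing precedes the token
var_dump(hasPrev($pattern, 'Mozilla Chrome/120.0')); // a space precedes the token
var_dump(hasPrev($pattern, 'curl/8.0'));             // no match, no warning raised
```

Whether the unguarded access can actually emit a warning in the parser depends on how the surrounding condition short-circuits, which this sketch does not model.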

donatj commented Jun 9, 2026

Owner Author

@copilot merge master into this

Copilot AI commented Jun 9, 2026

Contributor

> @copilot merge master into this

Merged master into this branch in 57afd47 and re-ran PHPUnit locally successfully (1158 tests, 2699 assertions).

Copilot finished work on behalf of donatj June 9, 2026 10:27
github-actions Bot commented Jun 9, 2026


Benchmark Results

| Metric | Value |
| --- | --- |
| Before (base) | 40.415659s |
| After (PR) | 43.214714s |
| Delta | +2.799055s (+6.93%) |
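
For reference, the delta row follows directly from the two timings. Here is a minimal sketch of that arithmetic; the `formatDelta` helper is hypothetical and is not part of the benchmark workflow:

```php
<?php
// Hypothetical helper: formats the absolute and relative difference
// between a base timing and a PR timing, both in seconds.
function formatDelta(float $before, float $after): string
{
    $delta   = $after - $before;        // absolute slowdown in seconds
    $percent = $delta / $before * 100;  // slowdown relative to the base run

    return sprintf('%+.6fs (%+.2f%%)', $delta, $percent);
}

echo formatDelta(40.415659, 43.214714), "\n";
```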

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 4 out of 6 changed files in this pull request and generated 2 comments.

Comment on lines +1582 to +1586
"Mozilla\/5.0 (compatible; Baiduspider\/2.0; +http:\/\/www.baidu.com\/search\/spider.html": {
"platform": null,
"browser": "Baiduspider",
"version": "2.0"
},
Comment thread src/UserAgentParser.php
Comment on lines +123 to +125
%[(;]\s*(?P<browser>[^(/;]+)
(?:[:/ ]v?(?P<version>[0-9A-Z.]+)[^;)\s]*)?
;?(?:\s*robot;)?\s*\+https?:%x
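
To see what this hunk matches, the pattern can be run on its own against the Baiduspider user agent from the test data quoted earlier. This is a rough sketch: the regex is copied from the hunk as shown, while the `sketchDetectBot` wrapper and its return shape are assumptions, not the parser's actual API:

```php
<?php
// Hypothetical wrapper around the regex from the diff hunk. The real
// parser's surrounding logic and return format are not reproduced here.
function sketchDetectBot(string $ua): ?array
{
    // With the x flag, whitespace and newlines in the pattern are ignored,
    // except inside character classes such as [:/ ].
    $pattern = '%[(;]\s*(?P<browser>[^(/;]+)
        (?:[:/ ]v?(?P<version>[0-9A-Z.]+)[^;)\s]*)?
        ;?(?:\s*robot;)?\s*\+https?:%x';

    if (preg_match($pattern, $ua, $m) !== 1) {
        return null; // no "name/version; +http(s):" shaped token found
    }

    return [
        'browser' => trim($m['browser']),
        // preg_match omits a trailing group that did not participate.
        'version' => ($m['version'] ?? '') !== '' ? $m['version'] : null,
    ];
}

$bot = sketchDetectBot(
    'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html'
);
var_dump($bot);
```

The first candidate token, `compatible`, is rejected because it is not followed by a `+http` reference, so the match lands on `Baiduspider` with version `2.0`. A regular browser user agent with no `+http(s):` reference yields `null`.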
3 participants