feat: modernize langchain integration crawl tools by daveomri · Pull Request #31 · apify/langchain-apify

daveomri · 2026-04-29T08:12:33Z

Summary

Third PR on top of feat/modernize-langchain-integration; builds on the [native components PR](https://github.com/apify/langchain-apify/tree/feat/modernize-langchain-integration-native-components) and adds the Search & Crawling Actor tools layer: four new BaseTool subclasses wrapping search, maps, video, and e-commerce Actors. Upcoming PR will fold this together with social-media tools and documentation onto [feat/modernize-langchain-integration](https://github.com/apify/langchain-apify/tree/feat/modernize-langchain-integration) before merging to main.

New code: ~426 lines — Tests: ~337 lines

Note on scope: This PR is intentionally scoped to the four Search & Crawling Actor tools called out in US-4 (RAG Web Browser, Google Maps, YouTube, E-commerce). Social-media Actor tools and the integration documentation will land as follow-up PRs.

ApifyRAGWebBrowserTool ([_actor_tools.py](langchain_apify/_actor_tools.py))
- Wraps apify/rag-web-browser. Returns JSON with run metadata (run_id / status / dataset_id / timestamps) and items (crawled-page dicts). Distinct from ApifySearchRetriever (which returns LangChain Document objects); this tool returns JSON for agent tool-calling.
ApifyGoogleMapsTool ([_actor_tools.py](langchain_apify/_actor_tools.py))
- Wraps compass/crawler-google-places. Required query, optional max_results (default 10) and language (ISO code). Returns JSON with run metadata and items (place dicts).
ApifyYouTubeScraperTool ([_actor_tools.py](langchain_apify/_actor_tools.py))
- Wraps streamers/youtube-scraper. Required search_query, optional search_type: Literal['search', 'video', 'channel'] (default search), max_results (default 10). Tight Literal at the LLM boundary, loose str + runtime ValueError at the _client.py boundary so direct callers get the same protection.
ApifyEcommerceScraperTool ([_actor_tools.py](langchain_apify/_actor_tools.py))
- Wraps apify/e-commerce-scraping-tool. Required url, optional max_results (default 20). Bare-URL design intentionally keeps the LLM-facing surface minimal; selector hints can be added later if real users hit empty-result issues.
APIFY_SEARCH_TOOLS convenience list
- New list[type[BaseTool]] exported alongside APIFY_CORE_TOOLS and APIFY_ACTOR_TOOLS for selective agent binding: [ApifyRAGWebBrowserTool, ApifyGoogleMapsTool, ApifyYouTubeScraperTool, ApifyEcommerceScraperTool].
ApifyToolsClient additions ([_client.py](langchain_apify/_client.py))
- Three new methods (google_maps_search, youtube_scrape, ecommerce_scrape) and one rename + signature change: rag_web_search → rag_web_browser_search, now returning (run, items) like the other helpers so the tool layer can build _run_meta(run). All four reuse the existing run_actor_and_get_items plumbing — transport-error wrapping and _check_run_status come for free.
ApifySearchRetriever ([retrievers.py](langchain_apify/retrievers.py))
- Single call site updated to consume the new tuple return (_, items = self._client.rag_web_browser_search(...)); behaviour and Document shape are unchanged.
Backward compatible
- No changes to public API of any pre-existing class. ApifyActorsTool / ApifyDatasetLoader / ApifyWrapper untouched. The rag_web_search rename is internal — only the retriever consumed it, and that's updated in-tree.
Tests
- 35 new unit tests covering: input-mapping per helper (asserts Actor ID + run_input keys), youtube_scrape enum validation, happy-path JSON shape per tool, parametrized _TOOL_INVOCATIONS battery covering RuntimeError → ToolException, empty-dataset, handle_tool_error=True swallow, missing-token, plus inheritance / metadata / APIFY_SEARCH_TOOLS membership. Existing test_retrievers.py tests rewired for the new tuple-return helper.
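
The two-level validation described for ApifyYouTubeScraperTool (tight Literal for the LLM-facing schema, plain str plus a runtime ValueError in the client helper) could look roughly like the sketch below. It is an illustration only: the function signature, the `SearchType` alias, and the returned input keys are assumptions, not the code in _client.py.

```python
from typing import Literal, get_args

# Assumption: the allowed values come from the PR description; the alias name
# and the helper signature below are hypothetical.
SearchType = Literal['search', 'video', 'channel']
_ALLOWED_SEARCH_TYPES: tuple[str, ...] = get_args(SearchType)


def youtube_scrape(search_query: str, search_type: str = 'search', max_results: int = 10) -> dict:
    """Hypothetical client-side helper: accepts a plain str and validates at runtime."""
    if search_type not in _ALLOWED_SEARCH_TYPES:
        msg = f'search_type must be one of {_ALLOWED_SEARCH_TYPES}, got {search_type!r}'
        raise ValueError(msg)
    # A real helper would start the Actor run here. This sketch only returns the
    # input it would send; the key names are illustrative, not the Actor's schema.
    return {'query': search_query, 'type': search_type, 'limit': max_results}
```

A tool input schema can annotate its field with `SearchType` so the LLM only sees the three valid values, while direct callers of the helper still get a clear error for anything else.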

Review strategy

Merge strategy

This PR targets feat/modernize-langchain-integration, not main. It depends on the [native components PR](https://github.com/apify/langchain-apify/tree/feat/modernize-langchain-integration-native-components) being merged first — _actor_tools.py extends the file introduced there and consumes _run_meta / _ApifyGenericTool from tools.py. Once native components is merged into the integration branch, this PR will be rebased and opened for review. Social-media tools and docs will follow as separate PRs on the same integration branch before the final merge to main.

MQ37

Two issues I would like to address before merging, otherwise the PR looks good! Please see my comments

I think the serializer is duplicating data when trying to preserve the old flat format while introducing the new nested one. For example the run id is top level and also nested in some cases.
I do not understand the error=no_items check, where does this value come from?
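
To make the duplication concern concrete, here is a minimal sketch of a nested-only response envelope, where run metadata appears once under a single key rather than at both the top level and a nested level. The function names and the envelope keys (`run`, `items`, `run_id`, `started_at`) are assumptions for illustration and may not match what tools.py emits.

```python
import json
from datetime import datetime
from typing import Any


def run_meta(run: dict[str, Any]) -> dict[str, Any]:
    """Pick the run fields named in the PR summary; output key names are assumed."""
    started = run.get('startedAt')
    return {
        'run_id': run.get('id'),
        'status': run.get('status'),
        'dataset_id': run.get('defaultDatasetId'),
        # datetime objects are not JSON serializable, so convert them to ISO strings.
        'started_at': started.isoformat() if isinstance(started, datetime) else started,
    }


def build_envelope(run: dict[str, Any], items: list[dict[str, Any]]) -> dict[str, Any]:
    """Nested-only envelope: run metadata lives under 'run' and nowhere else."""
    return {'run': run_meta(run), 'items': items}


def to_json(run: dict[str, Any], items: list[dict[str, Any]]) -> str:
    """Serialize the envelope for an agent tool response."""
    return json.dumps(build_envelope(run, items))
```

With this shape a consumer reads the run id from one place only, so there is no question of which copy is authoritative.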

drobnikj

Looks good, please fix Kuba's comments, otherwise pre-approving.

jirispilka · 2026-06-22T00:53:37Z

+        """Convert Actor dataset items to LangChain Documents."""
+        docs: list[Document] = []
+        for item in items:
+            page_content = item.get('text') or item.get('markdown') or ''


Content order here is text or markdown, but ApifyRAGWebBrowserTool (same apify/rag-web-browser Actor) uses markdown or text. The source fallback order differs too: here it's crawledUrl or url or metadata.url, the tool does metadata.url or crawledUrl. Same Actor, two consumers, different output when both fields are present — worth aligning, ideally via one shared extract helper.

Generated by Claude Code
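
A shared extract helper along the lines suggested in this comment might look like the sketch below. The helper names are hypothetical, and the chosen precedence (markdown before text, metadata.url before crawledUrl before url) is an assumption: the comment asks for one consistent order but does not say which one should win.

```python
from typing import Any


def extract_content(item: dict[str, Any]) -> str:
    """Return page content, preferring markdown over text (assumed order)."""
    return item.get('markdown') or item.get('text') or ''


def extract_source(item: dict[str, Any]) -> str:
    """Return the page URL from the first populated field (assumed order)."""
    metadata = item.get('metadata') or {}
    return metadata.get('url') or item.get('crawledUrl') or item.get('url') or ''
```

If both the retriever and the tool call the same two functions, an item that carries both `text` and `markdown` yields the same content for either consumer.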

jirispilka

Nice, there is one issue with the Google scraper input, and I left a couple of nits.

When submitting the feedback, Claude somehow messed it up. Apologies for the duplicated comments.

MQ37

Cool, thank you! Address the comments from the guys and we can merge.

daveomri added 30 commits April 20, 2026 16:12

feat: implement apifyclient wrapper

8cad430

feat: removed redundant const file

2404b9c

feat: add few more input schemas, helpers and tool classes

b1a89a4

feat: export new tools from __init__

0aa9175

feat: add unit tests

4e46d36

feat: implement tests and introduce tools list

fc6ef12

fix: lint fix

cc5be9e

feat: enhance error handling and documentation for apify tools

c2b9cb6

fix: iso format fix

3edf126

feat: add apify run task and apify run task and get items tools with …

8c36edc

…input schemas

feat: introduce _ApifyGenericTool base class for Apify tools to strea…

026175a

…mline client handling and error management

feat: add _actor_tools.py file to define upcomming search and social …

110c971

…media tools for apify integration

fix: add try/except to match others

a08f63e

fix: update timeout constants and improve input schema descripiton in…

d028531

… Apify tools

fix: enhance error handling for missing dataset id in run_actor and r…

429a3ed

…un_task methods

fix: update apifygetdatasetitemstool to return a json object with ite…

b914e47

…ms and message for empty dataset

feat: add integration smoke tests for generic Apify tools to validate…

0f71181

… api interaction

feat: implement clamping for timeout, memory, and item limits in apif…

50c52f2

…y tools to enforce safety constraints

feat: clean up _actor_tools.py and tools.py for improved readibility …

ba179a6

…and maintability; update test cases for better formatting and error handling

feat: add three new tools to _client.py

da900ce

feat: implement apifygooglesearchtool and apifywebcrawlertool

ff6ffeb

feat: implement a apify search retrievel

6e8888c

feat: add apify crawl loader to document_loaders.py

b124ce1

feat: update __init__

029b9e1

feat: add unit tests

c7ee287

feat: add actor tools unit tests

ec60765

feat: add retrievers unit tests

c077186

feat: simplify apify crawl loader init and enhance unit tests

0b4ecbb

ref: align private scope conventions with langchain partner package s…

005294b

…tandards

ref: migrate auth to SecretStr + secret_from_env pattern

2f74c29

daveomri added 9 commits April 29, 2026 13:27

fix: send correct detailsUrls/maxProductResults to apify/e-commerce-s…

3db07fb

…craping-tool

fix: return flat [{url,title,content}] array per spec

c973123

feat: support category URLs via url_type parameter

6b825af

fix: use listingUrls (not categoryUrls) for category-mode

ddb4373

fix: use canonical searchQueries (array) field, not searchKeywords

c5607d8

fix: rename actor search group

250e1ac

fix: test fix

f4cf20e

fix: merge tools

1c7aa14

Merge branch 'feat/modernize-langchain-integration' into feat/moderni…

baee642

…ze-langchain-integration-crawl-tools

daveomri self-assigned this Jun 10, 2026

daveomri changed the title ~~Feat/modernize langchain integration crawl tools~~ feat: modernize langchain integration crawl tools Jun 10, 2026

daveomri marked this pull request as ready for review June 10, 2026 16:29

daveomri requested review from MQ37, drobnikj and jirispilka June 10, 2026 16:30

fix: handle datetime serialization in tool responses to prevent JSON …

1ac52df

…errors

MQ37 requested changes Jun 16, 2026

Comment thread langchain_apify/_actor_tools.py Outdated

Comment thread langchain_apify/tools.py Outdated

Comment thread langchain_apify/tools.py Outdated

drobnikj approved these changes Jun 17, 2026

Comment thread langchain_apify/_actor_tools.py Outdated

Comment thread langchain_apify/retrievers.py Outdated

Comment thread langchain_apify/_client.py Outdated

ref: emit nested-only tool response envelope and address review feedback

b0a86b1

daveomri requested a review from MQ37 June 18, 2026 11:14

jirispilka reviewed Jun 22, 2026

Comment thread langchain_apify/_client.py Outdated

jirispilka self-requested a review June 22, 2026 01:04

jirispilka approved these changes Jun 22, 2026

ref: simplify tool response structure by removing serialization funci…

41477d0

…ton and updating return format to json object

MQ37 approved these changes Jun 22, 2026

daveomri added 2 commits June 22, 2026 12:23

Merge branch 'feat/modernize-langchain-integration' into feat/moderni…

52d1e5e

…ze-langchain-integration-crawl-tools

fix: address connector review findings (search count, source order, h…

a0f2057

…elpers)

daveomri merged commit 2f7d2d9 into feat/modernize-langchain-integration Jun 22, 2026
5 checks passed

daveomri deleted the feat/modernize-langchain-integration-crawl-tools branch June 22, 2026 11:35

daveomri merged 84 commits into feat/modernize-langchain-integration from feat/modernize-langchain-integration-crawl-tools

5 participants
