Merged
84 commits
8cad430
feat: implement apifyclient wrapper
daveomri Apr 20, 2026
2404b9c
feat: removed redundant const file
daveomri Apr 20, 2026
b1a89a4
feat: add few more input schemas, helpers and tool classes
daveomri Apr 20, 2026
0aa9175
feat: export new tools from __init__
daveomri Apr 20, 2026
4e46d36
feat: add unit tests
daveomri Apr 20, 2026
fc6ef12
feat: implement tests and introduce tools list
daveomri Apr 21, 2026
cc5be9e
fix: lint fix
daveomri Apr 21, 2026
c2b9cb6
feat: enhance error handling and documentation for apify tools
daveomri Apr 21, 2026
3edf126
fix: iso format fix
daveomri Apr 21, 2026
8c36edc
feat: add apify run task and apify run task and get items tools with …
daveomri Apr 21, 2026
026175a
feat: introduce _ApifyGenericTool base class for Apify tools to strea…
daveomri Apr 21, 2026
110c971
feat: add _actor_tools.py file to define upcomming search and social …
daveomri Apr 21, 2026
a08f63e
fix: add try/except to match others
daveomri Apr 21, 2026
d028531
fix: update timeout constants and improve input schema descripiton in…
daveomri Apr 21, 2026
429a3ed
fix: enhance error handling for missing dataset id in run_actor and r…
daveomri Apr 21, 2026
b914e47
fix: update apifygetdatasetitemstool to return a json object with ite…
daveomri Apr 21, 2026
0f71181
feat: add integration smoke tests for generic Apify tools to validate…
daveomri Apr 21, 2026
50c52f2
feat: implement clamping for timeout, memory, and item limits in apif…
daveomri Apr 21, 2026
ba179a6
feat: clean up _actor_tools.py and tools.py for improved readibility …
daveomri Apr 22, 2026
da900ce
feat: add three new tools to _client.py
daveomri Apr 22, 2026
ff6ffeb
feat: implement apifygooglesearchtool and apifywebcrawlertool
daveomri Apr 22, 2026
6e8888c
feat: implement a apify search retrievel
daveomri Apr 22, 2026
b124ce1
feat: add apify crawl loader to document_loaders.py
daveomri Apr 22, 2026
029b9e1
feat: update __init__
daveomri Apr 22, 2026
c7ee287
feat: add unit tests
daveomri Apr 22, 2026
ec60765
feat: add actor tools unit tests
daveomri Apr 22, 2026
c077186
feat: add retrievers unit tests
daveomri Apr 22, 2026
0b4ecbb
feat: simplify apify crawl loader init and enhance unit tests
daveomri Apr 22, 2026
005294b
ref: align private scope conventions with langchain partner package s…
daveomri Apr 22, 2026
2f74c29
ref: migrate auth to SecretStr + secret_from_env pattern
daveomri Apr 23, 2026
6258b2b
fix: backward-compat fix
daveomri Apr 23, 2026
2905b67
fix: update stale doc string
daveomri Apr 23, 2026
3238c02
chore: removed redundant file
daveomri Apr 23, 2026
92df406
fix: extracted repeated code, fixed secretstr compatibility to apifyt…
daveomri Apr 23, 2026
3a0f666
fix: set min value to timeout, memory and items, add exlude and repr …
daveomri Apr 23, 2026
8614cfd
feat: added repr and exclude to apify api token
daveomri Apr 23, 2026
2bf130a
feat: add type checking to apify core tools list
daveomri Apr 23, 2026
98293d4
feat: add tests for clamped values and apify api token
daveomri Apr 23, 2026
863ed8d
fix: lint fix
daveomri Apr 23, 2026
70527e0
ref: update apify_api_token type to support SecretStr in document loa…
daveomri Apr 24, 2026
797b7f9
Merge branch 'feat/modernize-langchain-integration-core-tools' into f…
daveomri Apr 24, 2026
f005bc5
fix: turn off logger for ApifySearchRetrieval
daveomri Apr 24, 2026
dd08098
fix: fix lint errors
daveomri Apr 24, 2026
2804a5c
fix: tests fix
daveomri Apr 24, 2026
ea8b16e
chore: rename tools to match the task description
daveomri Apr 28, 2026
cd1eea1
fix: narrow except blocks in _client.py to SDK/transport errors
daveomri Apr 28, 2026
50c3583
fix: clamp memory_mbytes to Apify platform minimum (128 MB)
daveomri Apr 28, 2026
450728c
fix: narrow empty-dataset message in ApifyGetDatasetItemsTool
daveomri Apr 28, 2026
1360e92
ref: simplify ApifyToolsClient.__init__ to require explicit token
daveomri Apr 28, 2026
09b6c6e
docs: add module-level docstring to tools.py
daveomri Apr 28, 2026
a5bd7cc
ref: rename model_post_init parameter to
daveomri Apr 28, 2026
e0f15e8
Merge branch 'feat/modernize-langchain-integration-core-tools' into f…
daveomri Apr 28, 2026
23242c1
revert: restore env-fallback
daveomri Apr 28, 2026
8f9afe6
Merge branch 'feat/modernize-langchain-integration-core-tools' into f…
daveomri Apr 28, 2026
7ea3e8c
chore: drop placeholder section in _actor_tools.py
daveomri Apr 28, 2026
700e5ab
chore: align APIFY_ACTOR_TOOLS type hint with APIFY_CORE_TOOLS
daveomri Apr 28, 2026
c0dd11e
feat: constrain crawler_type to a Literal of valid Apify values
daveomri Apr 28, 2026
0189943
feat: clamp max_crawl_depth in ApifyWebCrawlerTool
daveomri Apr 28, 2026
6d2422d
feat: expose timeout_secs in ApifyGoogleSearchInput
daveomri Apr 28, 2026
2dfecd7
ref: accept SecretStr token in ApifyCrawlLoader
daveomri Apr 28, 2026
9c81785
docs: clarify ApifyCrawlLoader.lazy_load is not truly lazy
daveomri Apr 28, 2026
49dd4f0
ref: rewrite ApifySearchRetriever to use ApifyToolsClient
daveomri Apr 28, 2026
a060c14
fix: normalise locale codes to lowercase to match Apify Actor schema
daveomri Apr 28, 2026
a908467
fix: extract source URL from metadata.url for apify/rag-web-browser
daveomri Apr 28, 2026
b2290a7
feat: add Search & Crawling helpers to ApifyToolsClient (rag_web_brow…
daveomri Apr 29, 2026
c4d133b
feat: cover input mapping and enum validation for new ApifyToolsClien…
daveomri Apr 29, 2026
f5dd607
feat: add ApifyRAGWebBrowserTool, ApifyGoogleMapsTool, ApifyYouTubeSc…
daveomri Apr 29, 2026
5368645
feat: expose search tools and APIFY_SEARCH_TOOLS from langchain_apify
daveomri Apr 29, 2026
1392e0b
test: cover search tools (happy path + parametrized error / empty / h…
daveomri Apr 29, 2026
45a62d1
fix: lint fix
daveomri Apr 29, 2026
3db07fb
fix: send correct detailsUrls/maxProductResults to apify/e-commerce-s…
daveomri Apr 29, 2026
c973123
fix: return flat [{url,title,content}] array per spec
daveomri Apr 30, 2026
6b825af
feat: support category URLs via url_type parameter
daveomri Apr 30, 2026
ddb4373
fix: use listingUrls (not categoryUrls) for category-mode
daveomri Apr 30, 2026
c5607d8
fix: use canonical searchQueries (array) field, not searchKeywords
daveomri Apr 30, 2026
250e1ac
fix: rename actor search group
daveomri May 5, 2026
f4cf20e
fix: test fix
daveomri May 5, 2026
1c7aa14
fix: merge tools
daveomri May 5, 2026
baee642
Merge branch 'feat/modernize-langchain-integration' into feat/moderni…
daveomri Jun 10, 2026
1ac52df
fix: handle datetime serialization in tool responses to prevent JSON …
daveomri Jun 11, 2026
b0a86b1
ref: emit nested-only tool response envelope and address review feedback
daveomri Jun 18, 2026
41477d0
ref: simplify tool response structure by removing serialization funci…
daveomri Jun 22, 2026
52d1e5e
Merge branch 'feat/modernize-langchain-integration' into feat/moderni…
daveomri Jun 22, 2026
a0f2057
fix: address connector review findings (search count, source order, h…
daveomri Jun 22, 2026
27 changes: 20 additions & 7 deletions langchain_apify/__init__.py
@@ -3,7 +3,14 @@
from importlib import metadata
from typing import TYPE_CHECKING

from langchain_apify._actor_tools import ApifyGoogleSearchTool, ApifyWebCrawlerTool
from langchain_apify._actor_tools import (
    ApifyEcommerceScraperTool,
    ApifyGoogleMapsTool,
    ApifyGoogleSearchTool,
    ApifyRAGWebBrowserTool,
    ApifyWebCrawlerTool,
    ApifyYouTubeScraperTool,
)
from langchain_apify.document_loaders import ApifyCrawlLoader, ApifyDatasetLoader
from langchain_apify.retrievers import ApifySearchRetriever
from langchain_apify.tools import (
@@ -42,12 +49,18 @@
APIFY_SEARCH_TOOLS: list[type[BaseTool]] = [
    ApifyGoogleSearchTool,
    ApifyWebCrawlerTool,
    ApifyRAGWebBrowserTool,
    ApifyGoogleMapsTool,
    ApifyYouTubeScraperTool,
    ApifyEcommerceScraperTool,
]

__all__ = [
    # Existing components (backward-compatible)
    'ApifyActorsTool',
    'ApifyCrawlLoader',
    'ApifyDatasetLoader',
    'ApifySearchRetriever',
    'ApifyWrapper',
    # Core generic tools
    'ApifyGetDatasetItemsTool',
@@ -56,16 +69,16 @@
    'ApifyRunTaskAndGetDatasetTool',
    'ApifyRunTaskTool',
    'ApifyScrapeUrlTool',
    # Actor-specific tools
    # Search & crawling tools
    'ApifyGoogleSearchTool',
    'ApifyWebCrawlerTool',
    # Retriever
    'ApifySearchRetriever',
    # Loaders
    'ApifyCrawlLoader',
    'ApifyRAGWebBrowserTool',
    'ApifyGoogleMapsTool',
    'ApifyYouTubeScraperTool',
    'ApifyEcommerceScraperTool',
    # Tool group lists
    'APIFY_SEARCH_TOOLS',
    'APIFY_CORE_TOOLS',
    'APIFY_SEARCH_TOOLS',
    # Meta
    '__version__',
]
309 changes: 306 additions & 3 deletions langchain_apify/_actor_tools.py
@@ -8,29 +8,37 @@
from __future__ import annotations

import json
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, Literal

from langchain_core.tools import ToolException
from pydantic import BaseModel # noqa: TCH002
from pydantic import BaseModel, Field

from langchain_apify._client import (
    _DEFAULT_CRAWLER_TYPE,
    _DEFAULT_GOOGLE_MAX_RESULTS,
    _DEFAULT_MAX_CRAWL_DEPTH,
    _DEFAULT_MAX_CRAWL_PAGES,
    _DEFAULT_RAG_MAX_RESULTS,
    _DEFAULT_RUN_TIMEOUT_SECS,
)
from langchain_apify._types import CrawlerType  # noqa: TCH001  # runtime-needed: shared Literal alias
from langchain_apify._utils import _extract_content, _safe_title
from langchain_apify._utils import _extract_content, _extract_source, _safe_title
from langchain_apify.tools import (
    ApifyGoogleSearchInput,
    ApifyWebCrawlerInput,
    _ApifyGenericTool,
    _run_meta,
)

if TYPE_CHECKING:
    from langchain_core.callbacks import CallbackManagerForToolRun

# Per-tool result limits not shared via ``_client`` (Maps/YouTube/Ecommerce).
_DEFAULT_GOOGLE_MAPS_MAX_RESULTS = 10
_DEFAULT_YOUTUBE_MAX_RESULTS = 10
_DEFAULT_ECOMMERCE_MAX_RESULTS = 20


# ---------------------------------------------------------------------------
# Search & Crawling tools
# ---------------------------------------------------------------------------
@@ -173,3 +181,298 @@ def _run(
            if isinstance(item, dict)
        ]
        return json.dumps({'run': None, 'items': pages}, default=str)
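
The `default=str` argument in the `json.dumps` call above matters because run metadata and dataset items can carry `datetime` values, which the standard encoder rejects. A minimal sketch of that behaviour, using a made-up payload (the `crawled_at` key is illustrative only, not the Actor's actual schema):

```python
import json
from datetime import datetime, timezone

# Hypothetical payload: the keys are illustrative, not taken from a real Actor run.
envelope = {
    'run': None,
    'items': [
        {
            'url': 'https://example.com',
            'crawled_at': datetime(2026, 4, 28, 12, 0, tzinfo=timezone.utc),
        }
    ],
}

# Without a fallback encoder, json.dumps raises TypeError on datetime values.
try:
    json.dumps(envelope)
    plain_failed = False
except TypeError:
    plain_failed = True

# default=str converts any value the encoder cannot handle by calling str() on it.
encoded = json.dumps(envelope, default=str)
decoded = json.loads(encoded)
```

The trade-off is that the consumer receives a plain string rather than a typed timestamp, which is usually acceptable for text handed back to an agent.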


# ---------------------------------------------------------------------------
# Input schemas (US-4 Search & Crawling Actor tools)
# ---------------------------------------------------------------------------


class ApifyRAGWebBrowserInput(BaseModel):
    """Input schema for :class:`ApifyRAGWebBrowserTool`."""

    query: str = Field(description='Search query string.')
    max_results: int = Field(default=_DEFAULT_RAG_MAX_RESULTS, description='Maximum number of results to return.')


class ApifyGoogleMapsInput(BaseModel):
    """Input schema for :class:`ApifyGoogleMapsTool`."""

    query: str = Field(description='Search query (e.g. "coffee shops in Berlin").')
    max_results: int = Field(
        default=_DEFAULT_GOOGLE_MAPS_MAX_RESULTS, description='Maximum number of places to return.'
    )
    language: str | None = Field(
        default=None,
        description='Optional ISO language code for results (e.g. "en", "de").',
    )


class ApifyYouTubeScraperInput(BaseModel):
    """Input schema for :class:`ApifyYouTubeScraperTool`."""

    search_query: str = Field(
        description=('Keyword for "search" mode, or a video/channel URL for "video"/"channel" modes.'),
    )
    search_type: Literal['search', 'video', 'channel'] = Field(
        default='search',
        description='Scrape mode: search keyword, single video URL, or channel URL.',
    )
    max_results: int = Field(default=_DEFAULT_YOUTUBE_MAX_RESULTS, description='Maximum number of items to return.')


class ApifyEcommerceScraperInput(BaseModel):
    """Input schema for :class:`ApifyEcommerceScraperTool`."""

    url: str = Field(description='Product-detail URL or category / listing page URL to scrape.')
    url_type: Literal['product', 'category'] = Field(
        default='product',
        description=(
            'Type of page the URL points to: "product" for a product-detail page, '
            '"category" for a category / listing page.'
        ),
    )
    max_results: int = Field(
        default=_DEFAULT_ECOMMERCE_MAX_RESULTS, description='Maximum number of products to return.'
    )


# ---------------------------------------------------------------------------
# Tools (US-4 Search & Crawling Actor tools)
# ---------------------------------------------------------------------------


class ApifyRAGWebBrowserTool(_ApifyGenericTool):  # type: ignore[override]
    """Search the web and return content from top results.

    Wraps the ``apify/rag-web-browser`` Actor. Unlike
    :class:`ApifySearchRetriever` (which returns LangChain ``Document``
    objects for RAG pipelines), this tool returns a JSON envelope
    suitable for agent tool-calling.

    Args:
        apify_token: Apify API token. Falls back to the ``APIFY_TOKEN``
            environment variable when *None*.

    Returns:
        JSON object ``{"run": {...}, "items": [{"url", "title", "content"}]}``.

    Example:
        .. code-block:: python

            import os
            os.environ["APIFY_TOKEN"] = "your-apify-token"

            from langchain_apify import ApifyRAGWebBrowserTool

            tool = ApifyRAGWebBrowserTool()
            result = tool.invoke({"query": "what is LangChain?", "max_results": 3})
    """

    name: str = 'apify_rag_web_browser'
    description: str = (
        'Search the web and return a JSON envelope with crawled results.'
        ' Each item has keys: url, title, content.'
        ' Required: query (str) - the search query.'
        f' Optional: max_results (int, default {_DEFAULT_RAG_MAX_RESULTS}).'
        ' Returns keys: run, items.'
    )
    args_schema: type[BaseModel] = ApifyRAGWebBrowserInput

    def _run(
        self,
        query: str,
        max_results: int = _DEFAULT_RAG_MAX_RESULTS,
        _run_manager: CallbackManagerForToolRun | None = None,
    ) -> str:
        try:
            run, items = self._client.rag_web_search(
                query,
                max_results=self._clamp_items(max_results),
                timeout_secs=self.max_timeout_secs,
            )
        except RuntimeError as exc:
            raise ToolException(str(exc)) from exc
        results = [
            {
                'url': _extract_source(item),
                'title': _safe_title(item),
                'content': _extract_content(item),
            }
            for item in items
            if isinstance(item, dict)
        ]
        return json.dumps({'run': _run_meta(run), 'items': results}, default=str)
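
The list comprehension above flattens each raw dataset item into a `{url, title, content}` dict and silently skips anything that is not a dict. The real `_extract_source`, `_safe_title` and `_extract_content` helpers live in `_utils.py` and are not part of this diff, so the sketch below uses hypothetical stand-ins. The `metadata.url` lookup follows the commit message for a908467; the `metadata.title`, `markdown` and `text` keys are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any


def extract_source(item: dict[str, Any]) -> str:
    """Hypothetical stand-in for ``_extract_source``: prefer metadata.url, then url."""
    metadata = item.get('metadata')
    if isinstance(metadata, dict) and metadata.get('url'):
        return str(metadata['url'])
    return str(item.get('url', ''))


def safe_title(item: dict[str, Any]) -> str:
    """Hypothetical stand-in for ``_safe_title``: never raises, falls back to ''."""
    metadata = item.get('metadata')
    if isinstance(metadata, dict) and metadata.get('title'):
        return str(metadata['title'])
    return str(item.get('title', ''))


def extract_content(item: dict[str, Any]) -> str:
    """Hypothetical stand-in for ``_extract_content``: first non-empty text field."""
    for key in ('markdown', 'text'):  # assumed field names
        if item.get(key):
            return str(item[key])
    return ''


def normalise(items: list[Any]) -> list[dict[str, str]]:
    """Flatten raw items to {url, title, content}, dropping non-dict entries."""
    return [
        {'url': extract_source(item), 'title': safe_title(item), 'content': extract_content(item)}
        for item in items
        if isinstance(item, dict)
    ]


raw = [
    {'metadata': {'url': 'https://a.example', 'title': 'A'}, 'markdown': '# A'},
    'not-a-dict',
    {'url': 'https://b.example', 'text': 'B body'},
]
flat = normalise(raw)
```

Keeping the output shape fixed means an agent can rely on the same three keys whichever Actor version produced the underlying items.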


class ApifyGoogleMapsTool(_ApifyGenericTool):  # type: ignore[override]
    """Search Google Maps for places, reviews, and business details.

    Wraps the ``compass/crawler-google-places`` Actor.

    Args:
        apify_token: Apify API token. Falls back to the ``APIFY_TOKEN``
            environment variable when *None*.

    Returns:
        JSON object ``{"run": {...}, "items": [...]}`` where ``run`` holds
        ``run_id``, ``status``, ``dataset_id``, ``started_at``, ``finished_at``
        and ``items`` are place dicts.

    Example:
        .. code-block:: python

            import os
            os.environ["APIFY_TOKEN"] = "your-apify-token"

            from langchain_apify import ApifyGoogleMapsTool

            tool = ApifyGoogleMapsTool()
            result = tool.invoke({"query": "coffee shops in Berlin", "max_results": 5})
    """

    name: str = 'apify_google_maps'
    description: str = (
        'Search Google Maps places, reviews, and business details and return a JSON envelope.'
        ' Required: query (str) - the search query.'
        f' Optional: max_results (int, default {_DEFAULT_GOOGLE_MAPS_MAX_RESULTS}),'
        ' language (str|null - ISO code, e.g. "en").'
        ' Returns keys: run, items.'
    )
    args_schema: type[BaseModel] = ApifyGoogleMapsInput

    def _run(
        self,
        query: str,
        max_results: int = _DEFAULT_GOOGLE_MAPS_MAX_RESULTS,
        language: str | None = None,
        _run_manager: CallbackManagerForToolRun | None = None,
    ) -> str:
        try:
            run, items = self._client.google_maps_search(
                query,
                max_results=self._clamp_items(max_results),
                language=language,
                timeout_secs=self.max_timeout_secs,
            )
        except RuntimeError as exc:
            raise ToolException(str(exc)) from exc
        return json.dumps({'run': _run_meta(run), 'items': items}, default=str)
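
Every `_run` in this file passes `max_results` through `self._clamp_items(...)` before calling the client, so a model-supplied value cannot request an unbounded number of items. That helper is defined on `_ApifyGenericTool`, outside this diff. The sketch below shows what such clamping might look like; the function name and the bounds of 1 and 1000 are hypothetical.

```python
# Hypothetical bounds for illustration; the real limits are configured on the tool.
MIN_ITEMS = 1
MAX_ITEMS = 1000


def clamp_items(requested: int, minimum: int = MIN_ITEMS, maximum: int = MAX_ITEMS) -> int:
    """Force ``requested`` into the inclusive range [minimum, maximum]."""
    return max(minimum, min(requested, maximum))


too_small = clamp_items(0)
in_range = clamp_items(25)
too_large = clamp_items(50_000)
```

Clamping instead of raising keeps the tool call alive when an agent picks an out-of-range number, at the cost of returning fewer (or more) items than were requested.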


class ApifyYouTubeScraperTool(_ApifyGenericTool):  # type: ignore[override]
    """Scrape YouTube videos, channels, or search results.

    Wraps the ``streamers/youtube-scraper`` Actor.

    Args:
        apify_token: Apify API token. Falls back to the ``APIFY_TOKEN``
            environment variable when *None*.

    Returns:
        JSON object ``{"run": {...}, "items": [...]}`` where ``run`` holds
        ``run_id``, ``status``, ``dataset_id``, ``started_at``, ``finished_at``
        and ``items`` are video / channel dicts.

    Example:
        .. code-block:: python

            import os
            os.environ["APIFY_TOKEN"] = "your-apify-token"

            from langchain_apify import ApifyYouTubeScraperTool

            tool = ApifyYouTubeScraperTool()
            result = tool.invoke({
                "search_query": "langchain tutorial",
                "search_type": "search",
                "max_results": 5,
            })
    """

    name: str = 'apify_youtube_scraper'
    description: str = (
        'Scrape YouTube by keyword, video URL, or channel URL and return a JSON envelope.'
        ' Required: search_query (str - keyword for "search" mode, or a video/channel URL).'
        ' Optional: search_type (one of "search", "video", "channel"; default "search"),'
        f' max_results (int, default {_DEFAULT_YOUTUBE_MAX_RESULTS}).'
        ' Returns keys: run, items.'
    )
    args_schema: type[BaseModel] = ApifyYouTubeScraperInput

    def _run(
        self,
        search_query: str,
        search_type: Literal['search', 'video', 'channel'] = 'search',
        max_results: int = _DEFAULT_YOUTUBE_MAX_RESULTS,
        _run_manager: CallbackManagerForToolRun | None = None,
    ) -> str:
        try:
            run, items = self._client.youtube_scrape(
                search_query=search_query,
                search_type=search_type,
                max_results=self._clamp_items(max_results),
                timeout_secs=self.max_timeout_secs,
            )
        except (RuntimeError, ValueError) as exc:
            raise ToolException(str(exc)) from exc
        return json.dumps({'run': _run_meta(run), 'items': items}, default=str)
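
The `except (RuntimeError, ValueError)` clause above converts client-side failures into `ToolException`, the exception type LangChain tools use to report a failure back to the agent, and `from exc` keeps the original error attached as the cause. The sketch below illustrates the pattern without importing LangChain: `ToolException` is a local stand-in class, and `fake_youtube_scrape` is a hypothetical replacement for the client call.

```python
import json


class ToolException(Exception):
    """Local stand-in for ``langchain_core.tools.ToolException`` (assumption for this sketch)."""


def fake_youtube_scrape(search_type: str) -> list[dict]:
    """Hypothetical client call that rejects unknown modes with ValueError."""
    if search_type not in ('search', 'video', 'channel'):
        raise ValueError(f'Unsupported search_type: {search_type!r}')
    return [{'title': 'demo video'}]


def run_tool(search_type: str) -> str:
    try:
        items = fake_youtube_scrape(search_type)
    except (RuntimeError, ValueError) as exc:
        # Re-raise as the tool-level exception while preserving the original cause.
        raise ToolException(str(exc)) from exc
    return json.dumps({'run': None, 'items': items}, default=str)


ok_payload = json.loads(run_tool('search'))

try:
    run_tool('playlist')
    caught = None
except ToolException as exc:
    caught = exc
```

Catching only the narrow exception types, rather than a bare `Exception`, lets genuine programming errors surface instead of being reported to the agent as tool failures.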


class ApifyEcommerceScraperTool(_ApifyGenericTool):  # type: ignore[override]
    """Extract product or listing data from an e-commerce URL.

    Wraps the ``apify/e-commerce-scraping-tool`` Actor.

    Args:
        apify_token: Apify API token. Falls back to the ``APIFY_TOKEN``
            environment variable when *None*.

    Returns:
        JSON object ``{"run": {...}, "items": [...]}`` where ``run`` holds
        ``run_id``, ``status``, ``dataset_id``, ``started_at``, ``finished_at``
        and ``items`` are product / listing dicts.

    Example:
        .. code-block:: python

            import os
            os.environ["APIFY_TOKEN"] = "your-apify-token"

            from langchain_apify import ApifyEcommerceScraperTool

            tool = ApifyEcommerceScraperTool()
            result = tool.invoke({
                "url": "https://shop.example.com/category/123",
                "url_type": "category",
                "max_results": 20,
            })
    """

    name: str = 'apify_ecommerce_scraper'
    description: str = (
        'Extract product data from an e-commerce URL and return a JSON envelope.'
        ' Required: url (str) - product-detail or category / listing URL.'
        ' Optional: url_type (one of "product", "category"; default "product"),'
        f' max_results (int, default {_DEFAULT_ECOMMERCE_MAX_RESULTS}).'
        ' Returns keys: run, items.'
    )
    args_schema: type[BaseModel] = ApifyEcommerceScraperInput

    def _run(
        self,
        url: str,
        url_type: Literal['product', 'category'] = 'product',
        max_results: int = _DEFAULT_ECOMMERCE_MAX_RESULTS,
        _run_manager: CallbackManagerForToolRun | None = None,
    ) -> str:
        try:
            run, items = self._client.ecommerce_scrape(
                url,
                url_type=url_type,
                max_results=self._clamp_items(max_results),
                timeout_secs=self.max_timeout_secs,
            )
        except (RuntimeError, ValueError) as exc:
            raise ToolException(str(exc)) from exc
        return json.dumps({'run': _run_meta(run), 'items': items}, default=str)
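
The `url_type` argument decides which Actor input field receives the URL. The commit messages in this PR name `detailsUrls` for product pages, `listingUrls` for category pages, and `maxProductResults` for the limit, but `ecommerce_scrape` itself is not shown in this diff. The sketch below is therefore a guess at the mapping: the function name is hypothetical, and the `[{'url': ...}]` entry shape is an assumption.

```python
from __future__ import annotations

from typing import Any

# Field names taken from the commit messages; the mapping itself is a sketch.
_URL_FIELD_BY_TYPE = {
    'product': 'detailsUrls',
    'category': 'listingUrls',
}


def build_ecommerce_input(url: str, url_type: str, max_results: int) -> dict[str, Any]:
    """Hypothetical helper: build the Actor run input for one URL."""
    try:
        field = _URL_FIELD_BY_TYPE[url_type]
    except KeyError:
        # A ValueError here is the kind of error the tool converts to ToolException.
        raise ValueError(f'url_type must be "product" or "category", got {url_type!r}') from None
    return {
        field: [{'url': url}],  # assumed entry shape
        'maxProductResults': max_results,
    }


product_input = build_ecommerce_input('https://shop.example.com/p/1', 'product', 5)
category_input = build_ecommerce_input('https://shop.example.com/category/123', 'category', 20)
```

A lookup table keeps the two modes symmetric and makes an unsupported mode fail loudly, which may be why the tool's `_run` catches `ValueError` alongside `RuntimeError`.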