feat: Add SerpexWebSearch component for multi-engine web search #9937
- Add SerpexWebSearch fetcher component supporting Google, Bing, DuckDuckGo, Brave, Yahoo, and Yandex
- Implement automatic retry logic with exponential backoff
- Add comprehensive test suite with unit tests and integration tests
- Support configurable search engines, result counts, and time range filtering
- Return results as Haystack Document objects with rich metadata
- Include release notes for new component
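As an illustration of the retry behaviour named in this commit message, here is a minimal sketch of retry with exponential backoff. It is not the PR's code: `call_with_backoff`, `base_delay`, and the injectable `sleep` argument are hypothetical names chosen for this example, and only the `retry_attempts` default of 2 comes from the PR description.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_attempts: int = 2,  # default taken from the PR description
    base_delay: float = 1.0,  # assumed starting delay in seconds
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> T:
    """Call fn, retrying up to retry_attempts times with doubling delays."""
    for attempt in range(retry_attempts + 1):
        try:
            return fn()
        except Exception:
            # A real component would likely catch only transient errors
            # (timeouts, connection resets, 5xx responses).
            if attempt == retry_attempts:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2**attempt)  # waits 1x, 2x, 4x, ... base_delay
    raise AssertionError("unreachable unless retry_attempts is negative")
```

Passing `sleep=delays.append` in a test records the delays that would have been waited instead of actually sleeping.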
- Change 'organic_results' to 'results' (actual API response field)
- Change 'link' to 'url' for result URLs (actual API response field)
- Update test fixtures to match real API response structure
- Verified with live API testing using real SERPEX API key
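To make the field-name fix concrete, the following sketch parses a response that uses `results` and `url`. It is an assumption-laden illustration, not the PR's implementation: `Doc` is a stand-in for Haystack's `Document` so the snippet runs without Haystack installed, and the per-result keys `title`, `snippet`, and `position` are guessed from the metadata fields listed in the PR description.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Stand-in for a Haystack Document (hypothetical, for illustration only)."""

    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def parse_results(payload: Dict[str, Any]) -> List[Doc]:
    """Turn a search response into documents, reading 'results' and 'url'."""
    docs: List[Doc] = []
    # 'results' (not 'organic_results') is the field named in the fix commit.
    for index, item in enumerate(payload.get("results", []), start=1):
        docs.append(
            Doc(
                content=item.get("snippet", ""),  # assumed key
                meta={
                    "title": item.get("title", ""),  # assumed key
                    "url": item.get("url", ""),  # 'url', not 'link'
                    # Fall back to list order if the API sends no position.
                    "position": item.get("position", index),
                },
            )
        )
    return docs
```

A payload that still uses the old `organic_results` key yields an empty list here, which is the symptom the fix commit addresses.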
@divyeshradadiya is attempting to deploy a commit to the deepset Team on Vercel. A member of the Team first needs to authorize it.
hey @divyeshradadiya thanks for opening the PR! I'll be giving this a review this week. Some high-level comments I already wanted to leave are:
@pytest.mark.integration
@pytest.mark.skipif(not os.environ.get("SERPEX_API_KEY"), reason="SERPEX_API_KEY not set")
def test_run_with_different_engines(self):
Just wanted to ask if you were able to run these integration tests and check that they pass?
…nd time_range parameters
@divyeshradadiya could you run
Hi @divyeshradadiya, thanks a lot for your contribution and for working on the SerpexWebSearch component. After discussing this internally, we believe this addition would be better suited as a separate Haystack integration rather than being included directly in the core repository. Hosting it in your own repository will give you full control over development, updates, and releases, while still allowing the community to benefit from your work. We’d be happy to add it to our integrations page and help promote it so it gets visibility among Haystack users. If you’d like, we can share examples and guidance on how other integrations are structured.
@sjrl Sure! Could you please provide the documentation URL or implementation process?
Sure! Here is how to request that your integration be listed on our website. Here is a checklist for creating an integration, and here is an example repo and integration page of a recently added integration.
Add SerpexWebSearch Component for Multi-Engine Web Search Integration
Description
This PR introduces a new SerpexWebSearch component to Haystack's fetchers module, enabling seamless integration with the SERPEX API for fetching organic web search results from multiple search engines.
What does it do?
The SerpexWebSearch component:
- Returns results as Document objects with rich metadata (title, URL, position, snippet)
- Supports serialization (to_dict/from_dict)
Why is it needed?
Web search is a critical capability for RAG (Retrieval-Augmented Generation) pipelines and AI applications that need to ground responses with current, up-to-date information. This component:
Changes Made
New Files Added
- haystack/components/fetchers/serpex.py (203 lines)
  - SerpexWebSearch component class decorated with @component
  - run() method returning List[Document]
  - to_dict() and from_dict() for serialization
- test/components/fetchers/test_serpex.py (280 lines)
  - Integration tests (require the SERPEX_API_KEY environment variable)
- releasenotes/notes/add-serpex-web-search-fetcher-a1b2c3d4e5f6g7h8.yaml
Modified Files
- haystack/components/fetchers/__init__.py
  - Added SerpexWebSearch to exports
  - Updated the _import_structure dictionary
How did you test it?
Unit Tests
Results:
Integration Tests
Tested with real SERPEX API using provided API key:
Test Scenarios - All Passing ✅
Basic Google Search
Haystack Framework Search
Multi-Engine Support (DuckDuckGo)
Time Range Filtering
Technical Query
Manual Verification
✅ API Endpoint: https://api.serpex.dev/api/search
✅ Authentication: Bearer token correctly formatted
✅ Response Parsing: Correctly handles results array
✅ Document Structure: All required metadata fields present
✅ Error Handling: Proper exceptions raised and logged
✅ Resource Cleanup: __del__ method properly closes HTTP client
Code Quality Checks
✅ All checks passing
✅ No syntax errors
✅ Type hints complete
✅ Code style compliant
Implementation Details
API Integration
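A minimal sketch of how a request matching the description could be assembled, using only the standard library. The endpoint, the Bearer authentication scheme, and the allowed engine and time-range values come from this PR description; the function name `build_request`, the use of a GET request, and the query parameter names (`q`, `engine`, `num_results`, `time_range`) are assumptions made for illustration and may not match the real SERPEX API.

```python
import urllib.parse
import urllib.request
from typing import Optional

API_URL = "https://api.serpex.dev/api/search"  # endpoint named in the PR description
ENGINES = {"auto", "google", "bing", "duckduckgo", "brave", "yahoo", "yandex"}
TIME_RANGES = {"all", "day", "week", "month", "year"}


def build_request(
    api_key: str,
    query: str,
    engine: str = "google",
    num_results: int = 10,
    time_range: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a search request with Bearer authentication."""
    if engine not in ENGINES:
        raise ValueError(f"Unsupported engine: {engine!r}")
    if time_range is not None and time_range not in TIME_RANGES:
        raise ValueError(f"Unsupported time_range: {time_range!r}")

    # Parameter names below are assumed, not confirmed by the source.
    params = {"q": query, "engine": engine, "num_results": num_results}
    if time_range is not None:
        params["time_range"] = time_range

    url = f"{API_URL}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
```

Validating `engine` and `time_range` before building the URL keeps typos from turning into opaque API errors; sending the request would be a separate step, for example with `urllib.request.urlopen(req, timeout=10)`.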
Pipeline Integration
Component Parameters
Initialization:
- api_key (str, required): SERPEX API key from https://serpex.dev
- engine (str, optional): Default search engine - "auto", "google", "bing", "duckduckgo", "brave", "yahoo", "yandex" (default: "google")
- num_results (int, optional): Number of results (default: 10)
- timeout (int, optional): Request timeout in seconds (default: 10)
- retry_attempts (int, optional): Retry attempts for failed requests (default: 2)
Run Method:
- query (str, required): Search query
- engine (str, optional): Override default engine
- num_results (int, optional): Override result count
- time_range (str, optional): Filter by time - "all", "day", "week", "month", "year"
Output:
- Dict[str, List[Document]] with key "documents"
Notes for the Reviewer
Architecture
The component follows Haystack's established patterns:
- @component decorator for framework integration
- to_dict()/from_dict() for serialization
- @component.output_types() for output specification
Dependency Analysis
No new external dependencies added:
Testing Coverage
Performance Considerations
Security
Backwards Compatibility
✅ No breaking changes to existing Haystack APIs
✅ New component is additive only
✅ Follows existing fetcher patterns (LinkContentFetcher)
Checklist
feat:added
Related Issues
Enables web search integration requested in community for RAG pipeline support.
Commits
feat: Add SerpexWebSearch component for multi-engine web search (b74f358)
fix: Correct SERPEX API response field names (fab6ed4)
(results instead of organic_results, url instead of link)
Screenshots / Demo
Test Results
Example Output
Ready for merge! ✅ All tests passing, fully documented, production-ready.