Description
Describe the Bug
When using Firecrawl's WebSocket API with startCrawl() and watcher(), the done event intermittently fires before any document events are received, so the document events are missing and the data array is empty. The final status then shows completed with data: [] and an empty or zero total (printed as N/A by the script below), even though the crawl may have processed pages successfully. The problem is intermittent: on some runs the WebSocket events arrive correctly, on others the timing issue appears.
To Reproduce
Steps to reproduce the issue:
- Install Firecrawl JS SDK:
  npm install @mendable/firecrawl-js@4.4.1
- Create a script that uses WebSocket monitoring:
import Firecrawl from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-api-key' });

async function webSocketBugDemo() {
  const { id } = await firecrawl.startCrawl('https://www.firecrawl.dev/', {
    limit: 3,
  });
  console.log(`✅ Crawl started! Job ID: ${id}`);

  const watcher = firecrawl.watcher(id, {
    kind: 'crawl',
    pollInterval: 2,
    timeout: 60
  });

  let documentCount = 0;
  let doneEventReceived = false;

  return new Promise((resolve, reject) => {
    watcher.on('document', (doc) => {
      documentCount++;
      console.log(`📄 DOCUMENT EVENT #${documentCount}: ${doc.metadata?.title || 'Processing...'}`);
    });

    watcher.on('error', (err) => {
      console.error('❌ WebSocket error:', err?.error || err);
      reject(err);
    });

    watcher.on('done', (state) => {
      doneEventReceived = true;
      console.log('\n🔍 BUG DEMONSTRATION:');
      console.log(` Documents received via WebSocket: ${documentCount}`);
      console.log(` Data length in 'done' event: ${state.data?.length || 0}`);
      console.log(` Status: ${state.status}`);
      console.log(` Total: ${state.total || 'N/A'}`);

      if (documentCount === 0 && state.data?.length === 0) {
        console.log('\n⚠️ BUG CONFIRMED: No documents received via WebSocket AND no data in final status');
        console.log(' This indicates a timing issue where "done" event fires before "document" events');
      } else if (documentCount > 0 && state.data?.length === 0) {
        console.log('\n⚠️ PARTIAL BUG: Documents received via WebSocket but no data in final status');
      } else {
        console.log('\n✅ No bug detected in this run');
      }

      resolve(state);
    });

    watcher.start().catch(reject);
  });
}

webSocketBugDemo().then(() => {
  console.log('\n🎉 Bug demonstration completed!');
  process.exit(0);
}).catch(error => {
  console.error('💥 Error:', error);
  process.exit(1);
});
- Run the script multiple times and observe that the done event sometimes fires with empty data before any document events.
- The log output is inconsistent: some runs report Documents received: 0, Data length: 0, while other runs work correctly.
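Because the failure is intermittent, it may help to tally outcomes over several runs rather than judge a single one. The following is only a sketch: classifyRun and tallyRuns are hypothetical helpers, not part of the Firecrawl SDK, and runOnce is assumed to be a variant of the script above that resolves with both the document event count and the done state instead of calling process.exit().

```javascript
// Hypothetical helper (not part of the Firecrawl SDK): classify one run using
// the same three cases the reproduction script prints.
function classifyRun(documentCount, state) {
  const dataLength = state?.data?.length ?? 0;
  if (documentCount === 0 && dataLength === 0) return 'bug';
  if (documentCount > 0 && dataLength === 0) return 'partial';
  return 'ok';
}

// Hypothetical helper: call an async reproduction function `runs` times,
// one after another, and count how often each outcome occurs.
// Assumption: runOnce() resolves to { documentCount, state }.
async function tallyRuns(runOnce, runs) {
  const tally = { bug: 0, partial: 0, ok: 0 };
  for (let i = 0; i < runs; i++) {
    const { documentCount, state } = await runOnce();
    tally[classifyRun(documentCount, state)]++;
  }
  return tally;
}
```

Running the crawls sequentially keeps each run independent of the others, at the cost of a longer total wall-clock time.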
Expected Behavior
The WebSocket should consistently emit document events as pages are processed, and only emit the done event after all documents have been processed and the crawl is truly complete. The final status should always contain the actual processed data.
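The expected ordering can be written down as a small check over a recorded event log. This is a sketch under stated assumptions: checkEventOrder is a hypothetical helper, the log format ({ type: 'document' } and { type: 'done', state }) is invented for illustration, and minDocuments encodes the assumption that a crawl of a non-empty site should yield at least one page.

```javascript
// Hypothetical helper (not part of the Firecrawl SDK): inspect a recorded
// list of watcher events and return a list of ordering problems.
// Assumed log format: { type: 'document' } or { type: 'done', state }.
function checkEventOrder(events, minDocuments = 1) {
  const problems = [];
  let documents = 0;
  let doneSeen = false;

  for (const event of events) {
    if (event.type === 'document') {
      // A document arriving after done means done fired too early.
      if (doneSeen) problems.push('document event after done');
      documents++;
    } else if (event.type === 'done') {
      if (doneSeen) problems.push('done emitted more than once');
      doneSeen = true;
      if (documents < minDocuments) {
        problems.push(`done arrived after only ${documents} document events`);
      }
      // Assumption: the final state should carry one entry per document event.
      const dataLength = event.state?.data?.length ?? 0;
      if (dataLength !== documents) {
        problems.push(`done carried ${dataLength} documents, ${documents} document events seen`);
      }
    }
  }

  if (!doneSeen) problems.push('done never emitted');
  return problems;
}
```

To use it, the document and done handlers would each push an entry onto an array, and the check would run on that array once the done handler returns or the watcher times out.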
Screenshots
Script output example when bug occurs:
🚀 Demonstrating WebSocket event timing bug...
✅ Crawl started! Job ID: c4abf434-8384-4df4-b436-af486d18e77e
🔍 BUG DEMONSTRATION:
Documents received via WebSocket: 0
Data length in 'done' event: 0
Status: completed
Total: N/A
⚠️ BUG CONFIRMED: No documents received via WebSocket AND no data in final status
This indicates a timing issue where "done" event fires before "document" events
Environment (please complete the following information):
- OS: Linux (Amazon Linux 2023)
- Deployment Type: Cloud (firecrawl.dev)
- Firecrawl Version: 4.4.1
- Node.js Version: 22.x