[Browser Rendering] Crawl endpoint

ToriLindsay · ToriLindsay · commit abd8379f565b · 2025-10-30T10:37:53.000Z
diff --git a/src/content/docs/browser-rendering/rest-api/crawl-endpoint.mdx b/src/content/docs/browser-rendering/rest-api/crawl-endpoint.mdx
@@ -0,0 +1,196 @@
+---
+pcx_content_type: how-to
+title: /crawl - Crawl web content
+sidebar:
+  order: 11
+---
+
+import { Render } from "~/components";
+
+The `/crawl` endpoint automates scraping content from webpages: it starts from a single URL and follows links up to the depth you specify. The response can be returned as HTML, Markdown, or JSON.
+
+The `/crawl` endpoint respects the directives of `robots.txt` files, such as `crawl-delay` and [`content-signal`](https://contentsignals.org/). All URLs that `/crawl` is directed not to crawl are listed in the response with `"status": "disallowed"`.
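
To anticipate which URLs might come back as `disallowed`, you can evaluate a site's `robots.txt` rules locally before starting a crawl. The following is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` user agent, and the URLs are made up for illustration, and the endpoint's own handling of `robots.txt` (including `content-signal`) may differ from what this parser reports.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable rule set without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def check_urls(rules: RobotFileParser, user_agent: str, urls: list[str]) -> dict[str, bool]:
    """Return a mapping of URL -> whether the parsed rules allow fetching it."""
    return {url: rules.can_fetch(user_agent, url) for url in urls}


rules = load_rules(ROBOTS_TXT)
allowed = check_urls(
    rules,
    "ExampleBot",  # hypothetical user agent name
    ["https://example.com/docs/", "https://example.com/private/page"],
)
delay_seconds = rules.crawl_delay("ExampleBot")
```

Here `allowed` marks the `/private/` URL as not fetchable, and `delay_seconds` holds the `crawl-delay` value declared for the matching user agent.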
+
+## Endpoint
+
+```txt
+https://api.cloudflare.com/client/v4/accounts/<account_id>/browser-rendering/crawl
+```
+
+## Required fields
+
+You must provide `url`:
+
+- `url` (string)
+
+## Common use cases
+
+- Scraping online content to build a knowledge base of up-to-date information
+- Converting online content into LLM-friendly formats for Retrieval-Augmented Generation (RAG) applications and other AI systems
+
+## Basic usage
+
+Because a crawl takes some time to process, the workflow is split into two requests:
+
+1. A `POST` request that initiates the crawl and returns a job `id`.
+2. A `GET` request that returns the status or results of the crawl.
+
+:::note[Free plan limitation]
+If you are on a Workers Free plan, your crawl may fail if it reaches the [limit of 10 minutes per day](/browser-rendering/platform/pricing/). To avoid this, either [upgrade to a Workers Paid plan](/workers/platform/pricing/) or [set tighter timeouts](/browser-rendering/reference/timeouts/) to get the most out of your 10 minutes.
+:::
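
The two-request flow can be sketched as a polling loop: start the job once, then repeat the `GET` request until the job reports a final status. This is a minimal sketch, not an official client. The `fetch_result` and `wait_for_crawl` helpers, the polling interval, and the attempt limit are all hypothetical, and treating every status other than `complete` or `cancelled` as "still running" is an assumption.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.cloudflare.com/client/v4/accounts"

# Final statuses that appear on this page. Treating every other value as
# "still running" is an assumption of this sketch.
TERMINAL_STATUSES = {"complete", "cancelled"}


def fetch_result(account_id: str, api_token: str, job_id: str) -> dict:
    """Send the GET request for a crawl job and return the parsed JSON body."""
    url = f"{API_BASE}/{account_id}/browser-rendering/crawl/result/{job_id}"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_crawl(
    fetch: Callable[[], dict],
    interval_seconds: float = 5.0,  # hypothetical polling interval
    max_attempts: int = 60,         # hypothetical upper bound on polls
) -> dict:
    """Call fetch() until the job reaches a terminal status, then return its result."""
    for _ in range(max_attempts):
        result = fetch()["result"]
        if result["status"] in TERMINAL_STATUSES:
            return result
        time.sleep(interval_seconds)
    raise TimeoutError("crawl job did not finish within the polling budget")
```

A caller would pass a closure such as `lambda: fetch_result(account_id, api_token, job_id)` to `wait_for_crawl`. Keeping the HTTP call behind a function argument makes the loop easy to exercise without network access.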
+
+### Initiate the crawl job
+
+Send a `POST` request with the starting URL and any optional parameters:
+
+```bash
+# url (required): starts crawling from this URL
+# limit (optional): maximum number of pages to crawl (default is 10, maximum is 100,000)
+# depth (optional): maximum link depth to crawl from the starting URL
+# formats (optional): response format (default is HTML, other options are Markdown and JSON)
+curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
+  -H 'Authorization: Bearer <apiToken>' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "url": "https://developers.cloudflare.com/workers/",
+    "limit": 50,
+    "depth": 2,
+    "formats": ["markdown"]
+  }'
+```
+
+The response contains the `id` of the crawl job:
+
+```json output
+{
+  "result": {
+    "id": "c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e"
+  },
+  "success": true
+}
+```
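
The same request can be assembled in code. The sketch below builds the `POST` request with Python's standard library and pulls the job `id` out of a response shaped like the one above. The helper names `build_crawl_request` and `extract_job_id` are hypothetical, and the error handling is an assumption based only on the `success` field shown in the sample response.

```python
import json
import urllib.request


def build_crawl_request(
    account_id: str, api_token: str, start_url: str, **params
) -> urllib.request.Request:
    """Build (but do not send) the POST request that initiates a crawl job."""
    body = {"url": start_url, **params}  # e.g. limit=50, depth=2, formats=["markdown"]
    return urllib.request.Request(
        f"https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_job_id(response_body: dict) -> str:
    """Return result.id from a parsed response, or raise if the call did not succeed."""
    if not response_body.get("success"):
        raise RuntimeError(f"crawl request was not successful: {response_body}")
    return response_body["result"]["id"]
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and parsing the body with `json.load`.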
+
+### Request results of the crawl job
+
+Send a `GET` request with the job `id` to check the status of the crawl or retrieve its results:
+
+```bash
+curl -X GET 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/result/c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e' \
+  -H 'Authorization: Bearer <apiToken>'
+```
+
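+When the crawl is finished, the response has `"status": "complete"` and lists each crawled page under `entries`: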
+
+```json output
+{
+  "result": {
+    "id": "c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e",
+    "status": "complete",
+    "browserTimeSpent": 134.7,
+    "total": 50,
+    "completed": 50,
+    "entries": [
+      {
+        "url": "https://developers.cloudflare.com/workers/",
+        "status": "completed",
+        "markdown": "# Cloudflare Workers\nBuild and deploy serverless applications...",
+        "html": null,
+        "metadata": {
+          "title": "Cloudflare Workers · Cloudflare Workers docs",
+          "language": "en-US"
+        }
+      },
+      {
+        "url": "https://developers.cloudflare.com/workers/get-started/quickstarts/",
+        "status": "completed",
+        "markdown": "## Quickstarts\nGet up and running with a simple 'Hello World'...",
+        "html": null,
+        "metadata": {
+          "title": "Quickstarts · Cloudflare Workers docs",
+          "language": "en-US"
+        }
+      }
+    ]
+  },
+  "success": true
+}
+```
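
Once a result like the one above has been parsed, the entries can be split into usable pages and everything else. This is a minimal sketch that assumes each entry has the `url`, `status`, and `markdown` fields shown in the sample. The `summarize_crawl` helper is hypothetical, and grouping every non-`completed` entry (for example `disallowed`) as skipped is a design choice of this sketch.

```python
def summarize_crawl(result: dict) -> dict:
    """Split crawl entries into Markdown pages keyed by URL and a list of skipped URLs."""
    pages: dict[str, str] = {}
    skipped: list[tuple[str, str | None]] = []
    for entry in result.get("entries", []):
        if entry.get("status") == "completed" and entry.get("markdown"):
            pages[entry["url"]] = entry["markdown"]
        else:
            # Covers entries such as "disallowed" and entries without Markdown content.
            skipped.append((entry["url"], entry.get("status")))
    return {"pages": pages, "skipped": skipped}


sample_result = {
    "status": "complete",
    "entries": [
        {
            "url": "https://developers.cloudflare.com/workers/",
            "status": "completed",
            "markdown": "# Cloudflare Workers\nBuild and deploy serverless applications...",
        },
        # Hypothetical entry illustrating a URL blocked by robots.txt.
        {"url": "https://developers.cloudflare.com/private/", "status": "disallowed"},
    ],
}
summary = summarize_crawl(sample_result)
```

The `pages` mapping can then be written to files or passed to an indexing step, while `skipped` records which URLs produced no content and why.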
+
+### Cancel a crawl job
+
+Send a `DELETE` request with the job `id` to cancel a crawl job:
+
+```bash
+curl -X DELETE 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/result/c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e' \
+  -H 'Authorization: Bearer <apiToken>'
+```
+
+A successful cancellation returns a `200 OK` status code, and the job status is updated to `cancelled`.
+
+## Advanced usage
+
+:::note[Looking for more parameters?]
+Visit the [Browser Rendering API reference](/api/resources/browser_rendering/) for all available parameters.
+:::
+
+The following request uses optional parameters to control caching and which links the crawl follows:
+
+```bash
+# url (required): the URL to start crawling from
+# maxAge (optional): the maximum age of a cached resource that can be returned (in seconds)
+# options.includeExternalLinks (optional): if true, follows links to external domains (default is false)
+# options.includeSubdomains (optional): if true, follows links to subdomains of the starting URL (default is false)
+# options.includePatterns (optional): only visits URLs that match one of these patterns
+# options.excludePatterns (optional): does not visit URLs that match any of these patterns
+curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
+  -H 'Authorization: Bearer <apiToken>' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "url": "https://www.exampledocs.com/docs/",
+    "maxAge": 7200,
+    "options": {
+      "includeExternalLinks": true,
+      "includeSubdomains": true,
+      "includePatterns": [
+        ".*/api/v1/.*"
+      ],
+      "excludePatterns": [
+        ".*/learning-paths/.*"
+      ]
+    }
+  }'
+```
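
To reason about how `includePatterns` and `excludePatterns` interact, it can help to test candidate URLs locally. The sketch below is only an approximation of the documented behavior: it assumes the patterns are regular expressions matched against the whole URL, that an exclude match takes precedence over an include match, and that an empty include list allows every URL. None of these details are confirmed on this page, and `should_visit` is a hypothetical helper.

```python
import re


def should_visit(url: str, include_patterns: list[str], exclude_patterns: list[str]) -> bool:
    """Approximate the include/exclude filtering described in the request comments."""
    # Assumption: a URL matching any exclude pattern is never visited.
    if any(re.fullmatch(pattern, url) for pattern in exclude_patterns):
        return False
    # Assumption: when include patterns are given, the URL must match at least one.
    if include_patterns:
        return any(re.fullmatch(pattern, url) for pattern in include_patterns)
    return True


include = [".*/api/v1/.*"]
exclude = [".*/learning-paths/.*"]

candidates = [
    "https://www.exampledocs.com/docs/api/v1/users",
    "https://www.exampledocs.com/docs/api/v1/learning-paths/intro",
    "https://www.exampledocs.com/docs/guide",
]
decisions = {url: should_visit(url, include, exclude) for url in candidates}
```

Under these assumptions only the first candidate is visited: the second is excluded by `learning-paths`, and the third matches no include pattern.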
+
+### Render
+
+The following request sets `render` to `false`:
+
+```bash
+curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
+  -H 'Authorization: Bearer <apiToken>' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "url": "https://developers.cloudflare.com/workers/",
+    "render": false
+  }'
+```
+
+<Render
+  file="setting-custom-user-agent"
+  product="browser-rendering"
+/>
+
+<Render
+  file="faq"
+  product="browser-rendering"
+/>