---
pcx_content_type: how-to
title: /crawl - Crawl web content
sidebar:
order: 11
---

import { Render } from "~/components";

The `/crawl` endpoint automates scraping content from webpages, starting from a single URL and following links up to a specified page limit or depth. The response can be returned as HTML, Markdown, or JSON.

The `/crawl` endpoint respects the directives of `robots.txt` files, including `crawl-delay` and [`content-signal`](https://contentsignals.org/). All URLs that `/crawl` is directed not to crawl are listed in the response with `"status": "disallowed"`.
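
For example, a page blocked by `robots.txt` might appear in the crawl results as an entry like the following (an illustrative sketch; the full entry may include additional fields):

```json
{
  "url": "https://example.com/private/",
  "status": "disallowed"
}
```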

## Endpoint

```txt
https://api.cloudflare.com/client/v4/accounts/<account_id>/browser-rendering/crawl
```

## Required fields

You must provide `url`:

- `url` (string)

## Common use cases

- Scraping online content to build a knowledge base of up-to-date information
- Converting online content into LLM-friendly formats to train [Retrieval-Augmented Generation (RAG) applications](/reference-architecture/diagrams/ai/ai-rag/) and other AI systems

## Basic usage

Using the `/crawl` endpoint involves two separate steps:
1. [Initiate the crawl job](/browser-rendering/rest-api/crawl-endpoint/#initiate-the-crawl-job) — A `POST` request where you initiate the crawl and receive a response with a job `id`.
2. [Request results of the crawl job](/browser-rendering/rest-api/crawl-endpoint/#request-results-of-the-crawl-job) — A `GET` request where you request the status or results of the crawl.

:::note[Free plan limitation]
If you are on a Workers Free plan, your crawl may fail if it hits the [limit of 10 minutes per day](/browser-rendering/platform/pricing/). To avoid this, you can either [upgrade to a Workers Paid plan](/workers/platform/pricing/) or [set limits on timeouts](/browser-rendering/reference/timeouts/) to get the most out of the 10 minutes available to your crawl request.
:::

### Initiate the crawl job

Here are the basic parameters you can use to initiate your crawl job:
- `url` — (Required) Starts crawling from this URL
- `limit` — (Optional) Maximum number of pages to crawl (default is 10, maximum is 100,000)
- `depth` — (Optional) Maximum link depth to crawl from the starting URL
- `formats` — (Optional) Response format (default is HTML, other options are Markdown and JSON)

The API will respond immediately with a job `id` you will use to retrieve the status and results of the crawl job.

See the [advanced usage section below](/browser-rendering/rest-api/crawl-endpoint/#advanced-usage) for additional parameters.

Here is an example that uses the basic parameters:
```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{
  "url": "https://developers.cloudflare.com/workers/",
  "limit": 50,
  "depth": 2,
  "formats": ["markdown"]
}'
```

Here is an example of the response, which includes a job `id`:

```json output
{
"result": {
"id": "c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e"
},
"success": true
}
```
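
If you are scripting against the API, you can capture the job `id` directly from this response. Here is a minimal sketch in bash, assuming `jq` is installed and that `ACCOUNT_ID` and `API_TOKEN` are set in your environment:

```bash
# Initiate the crawl and store the job id for later status checks.
JOB_ID=$(curl -s -X POST "https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/browser-rendering/crawl" \
-H "Authorization: Bearer ${API_TOKEN}" \
-H 'Content-Type: application/json' \
-d '{"url": "https://developers.cloudflare.com/workers/", "limit": 50}' | jq -r '.result.id')
echo "Crawl job started: ${JOB_ID}"
```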

### Request results of the crawl job

Here is an example of how you would check the status or request the results of your crawl job with the job `id` you were provided:

```bash
curl -X GET 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/result/c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e' \
-H 'Authorization: Bearer <apiToken>'
```

Here is an example response:

```json output
{
  "result": {
    "id": "c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e",
    "status": "complete",
    "browserTimeSpent": 134.7,
    "total": 50,
    "completed": 50,
    "entries": [
      {
        "url": "https://developers.cloudflare.com/workers/",
        "status": "completed",
        "markdown": "# Cloudflare Workers\nBuild and deploy serverless applications...",
        "html": null,
        "metadata": {
          "title": "Cloudflare Workers · Cloudflare Workers docs",
          "language": "en-US"
        }
      },
      {
        "url": "https://developers.cloudflare.com/workers/get-started/quickstarts/",
        "status": "completed",
        "markdown": "## Quickstarts\nGet up and running with a simple 'Hello World'...",
        "html": null,
        "metadata": {
          "title": "Quickstarts · Cloudflare Workers docs",
          "language": "en-US"
        }
      }
      // ... 48 more entries omitted for brevity
    ]
  },
  "success": true
}
```
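
If you are scripting against the API, you can poll the result endpoint until the job reports `"status": "complete"`. Here is a minimal sketch in bash, assuming `jq` is installed and that `ACCOUNT_ID`, `API_TOKEN`, and `JOB_ID` (captured as in the sketch above) are set in your environment:

```bash
# Poll the crawl job until it finishes.
# "complete" is the finished state shown in the example response above;
# depending on the job, you may also want to handle cancelled or failed states.
while true; do
  STATUS=$(curl -s "https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/browser-rendering/crawl/result/${JOB_ID}" \
  -H "Authorization: Bearer ${API_TOKEN}" | jq -r '.result.status')
  echo "Job status: ${STATUS}"
  if [ "${STATUS}" = "complete" ]; then
    break
  fi
  sleep 5
done
```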

### Cancel a crawl job

If you need to cancel a job that is currently in progress, send a `DELETE` request with the job `id` you were provided:

```bash
curl -X DELETE 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/result/c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e' \
-H 'Authorization: Bearer <apiToken>'
```

A successful cancellation returns a `200 OK` status code, and the job status is updated to `cancelled`.

## Advanced usage

The `/crawl` endpoint has many parameters you can use to customize your crawl. For the full list, check the [API docs](https://developers.cloudflare.com/api/resources/browser_rendering/).

Here is an example that uses the additional parameters currently available, in addition to the [basic parameters shown in the example above](/browser-rendering/rest-api/crawl-endpoint/#initiate-the-crawl-job) and the [`render` parameter below](/browser-rendering/rest-api/crawl-endpoint/#choose-when-to-render-javascript):

```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{
  // Required: The URL to start crawling from
  "url": "https://www.exampledocs.com/docs/",

  // Optional: The maximum age of a cached resource that can be returned (in seconds)
  "maxAge": 7200,

  "options": {
    // Optional: If true, follows links to external domains (default is false)
    "includeExternalLinks": true,

    // Optional: If true, follows links to subdomains of the starting URL (default is false)
    "includeSubdomains": true,

    // Optional: Only visits URLs that match one of these patterns
    "includePatterns": [".*/api/v1/.*"],

    // Optional: Does not visit URLs that match any of these patterns
    "excludePatterns": [".*/learning-paths/.*"]
  }
}'
```

### Choose when to render JavaScript

Use the `render` parameter to control whether the `/crawl` endpoint spins up a headless browser and executes page JavaScript. The default is `render: true`. Set `render: false` to do a fast HTML fetch without executing JavaScript.

Use `render: true` when the page builds content in the browser. Use `render: false` when the content you need is already in the initial HTML response.

Crawls with `render: false` are billed under [Workers pricing](/workers/platform/pricing/), while crawls with `render: true` use a headless browser and are billed under typical [Browser Rendering pricing](/browser-rendering/platform/pricing/).

Here is an example of a request that uses the `render` parameter:

```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{
  // Required: The URL to start crawling from
  "url": "https://developers.cloudflare.com/workers/",

  // Optional: If false, only does a simple HTML fetch crawl (default is true)
  "render": false
}'
```

<Render
file="setting-custom-user-agent"
product="browser-rendering"
/>

<Render
file="faq"
product="browser-rendering"
/>