You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: src/content/docs/browser-rendering/rest-api/crawl-endpoint.mdx
+17-9Lines changed: 17 additions & 9 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -7,9 +7,9 @@ sidebar:
7
7
8
8
import { Render } from"~/components";
9
9
10
-
The `/crawl` endpoint automates the process of scraping content from webpages starting with a single URL and crawling to a specified depth of links. The response can be returned in either HTML, Markdown, or JSON.
10
+
The `/crawl` endpoint automates the process of scraping content from webpages starting with a single URL and crawling to a specified number or depth of links. The response can be returned in either HTML, Markdown, or JSON.
11
11
12
-
The `/crawl` endpoint respects the directives of `robots.txt` files, such as`crawl-delay` and [`content-signal`](https://contentsignals.org/). All URLs that `/crawl` is directed not to crawl are listed in the response with `"status": "disallowed"`.
12
+
The `/crawl` endpoint respects the directives of `robots.txt` files, including`crawl-delay` and [`content-signal`](https://contentsignals.org/). All URLs that `/crawl` is directed not to crawl are listed in the response with `"status": "disallowed"`.
13
13
14
14
## Endpoint
15
15
@@ -24,11 +24,11 @@ You must provide `url`:
24
24
## Common use cases
25
25
26
26
- Scraping online content to build a knowledge base of up-to-date information
27
-
- Converting online content into LLM-friendly formats to train Retrieval-Augmented Generation (RAG) applications and other AI systems
27
+
- Converting online content into LLM-friendly formats to train [Retrieval-Augmented Generation (RAG) applications](/reference-architecture/diagrams/ai/ai-rag/) and other AI systems
28
28
29
29
## Basic usage
30
30
31
-
Since the `/crawl` endpoint takes some time to process there are two separate steps:
31
+
There are two separate steps to the `/crawl` endpoint:
32
32
1.[Initiate the crawl job](/browser-rendering/rest-api/crawl-endpoint/#initiate-the-crawl-job) — A `POST` request where you initiate the crawl and receive a response with a job `id`.
33
33
2.[Request results of the crawl job](browser-rendering/rest-api/crawl-endpoint/#request-results-of-the-crawl-job) — A `GET` request where you request the status or results of the crawl.
34
34
@@ -38,7 +38,9 @@ If you are on a Workers Free plan, your crawl may fail if it hits the [limit of
38
38
39
39
### Initiate the crawl job
40
40
41
-
Here is an example of how to initiate a crawl job with `url`, `limit`, `depth`, and `formats` parameters. See the [advanced usage section below](/browser-rendering/rest-api/crawl-endpoint/#initiate-the-crawl-job) for additional parameters:
41
+
Here is an example of how to initiate a crawl job with `url`, `limit`, `depth`, and `formats` parameters. The API will respond immediately with a job `id` you will use to retriece the status and results of the crawl job.
42
+
43
+
See the [advanced usage section below](/browser-rendering/rest-api/crawl-endpoint/#initiate-the-crawl-job) for additional parameters:
42
44
43
45
```bash
44
46
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
@@ -120,7 +122,7 @@ Here is an example response:
120
122
121
123
### Cancel a crawl job
122
124
123
-
Here is an example of how to cancel a crawl job with the job `id` you were provided:
125
+
If you need to cancel a job that is currently in progress, here is an example of how to cancel a crawl job with the job `id` you were provided:
@@ -131,7 +133,9 @@ A successful cancellation will return a `200 OK` status code. The job status wil
131
133
132
134
## Advanced usage
133
135
134
-
The `/crawl` endpoint has many parameters you can use to customize your crawl. Here is an example that uses the additional parameters that are currently available, in addition to the [basic parameters shown in the example above](/browser-rendering/rest-api/crawl-endpoint/#initiate-the-crawl-job) and the [render parameter below](/browser-rendering/rest-api/crawl-endpoint/#render-a-simple-html-fetch):
136
+
The `/crawl` endpoint has many parameters you can use to customize your crawl. For the full list, check the [API docs[(https://developers.cloudflare.com/api/resources/browser_rendering/).
137
+
138
+
Here is an example that uses the additional parameters that are currently available, in addition to the [basic parameters shown in the example above](/browser-rendering/rest-api/crawl-endpoint/#initiate-the-crawl-job) and the [render parameter below](/browser-rendering/rest-api/crawl-endpoint/#render-a-simple-html-fetch):
135
139
136
140
```bash
137
141
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
@@ -165,9 +169,13 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser
165
169
}
166
170
```
167
171
168
-
### Render a simple HTML fetch
172
+
### Choose when to render JavaScript
173
+
174
+
Use the `render` parameter to control whether the `crawl` endpoint spins up a headless browser and executes page JavaScript. The default is `render: true`. Set `render: false` to do a fast HTML fetch without executing JavaScript.
175
+
176
+
Use `render: true` when the page builds content in the browser. Use `render: false` when the content you need is already in the initial HTML response.
169
177
170
-
With the `render` parameter, you have the option to use the `/crawl` endpoint do a simple HTML fetch crawl. This is best for crawls that you want completed quickly, when spinning up a full headless browser instance is not necessary. Crawls with the `render` parameter set as `false` are only charged according to [Workers pricing](/workers/platform/pricing/)and not Browser Rendering pricing.
178
+
Crawls with the `render: false` are billed under [Workers pricing](/workers/platform/pricing/), Crawls with `render: true` use a headless browser and are billed under typical Browser Rendering pricing.
171
179
172
180
Here is an example of a request that uses the `render` parameter:
0 commit comments