`src/content/docs/browser-rendering/rest-api/crawl-endpoint.mdx`
import { Render } from "~/components";
The `/crawl` endpoint automates the process of scraping content from webpages, starting with a single URL and crawling to a specified depth of links. The response can be returned in HTML, Markdown, or JSON.
The `/crawl` endpoint respects the directives of `robots.txt` files, such as `crawl-delay` and [`content-signal`](https://contentsignals.org/). All URLs that `/crawl` is directed not to crawl are listed in the response with `"status": "disallowed"`.
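
For example, a URL blocked by `robots.txt` could show up in the crawl results like this (an illustrative sketch; the `"status": "disallowed"` value comes from the behavior described above, while the surrounding fields are assumptions rather than the exact response schema):

```json
{
  "url": "https://example.com/admin/",
  "status": "disallowed"
}
```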
## Basic usage
Since the `/crawl` endpoint takes some time to process, there are two separate steps:
1. [Initiate the crawl job](/browser-rendering/rest-api/crawl-endpoint/#initiate-the-crawl-job) — A `POST` request where you initiate the crawl and receive a response with a job `id`.
2. [Request results of the crawl job](/browser-rendering/rest-api/crawl-endpoint/#request-results-of-the-crawl-job) — A `GET` request where you request the status or results of the crawl.
:::note[Free plan limitation]
If you are on a Workers Free plan, your crawl may fail if it hits the [limit of 10 minutes per day](/browser-rendering/platform/pricing/). To avoid this, you can either [upgrade to a Workers Paid plan](/workers/platform/pricing/) or [set limits on timeouts](/browser-rendering/reference/timeouts/) to make the most of the 10 minutes available to your crawl request.
:::
### Initiate the crawl job
Here is an example of how to initiate a crawl job with the `url`, `limit`, `depth`, and `formats` parameters (the values in the request body are illustrative). See the [advanced usage section below](/browser-rendering/rest-api/crawl-endpoint/#advanced-usage) for additional parameters:
```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
  -H 'Authorization: Bearer YOUR_API_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/",
    "limit": 10,
    "depth": 2,
    "formats": ["markdown"]
  }'
```
Here is an example of the response, which includes a job `id`:
```json output
{
  "success": true,
  "result": {
    "id": "c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e"
  }
}
```
### Request results of the crawl job
Here is an example of how to check the status or request the results of your crawl job with the job `id` you were provided:
```bash
curl -X GET 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/result/c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e' \
-H 'Authorization: Bearer YOUR_API_TOKEN'
```
Here is an abbreviated example of the response; the per-page fields are illustrative and vary with the requested `formats`:
```json output
{
  "success": true,
  "result": {
    "status": "completed",
    "data": [
      {
        "url": "https://example.com/",
        "status": "completed",
        "markdown": "..."
      },
      {
        "url": "https://example.com/private/",
        "status": "disallowed"
      }
    ]
  }
}
```
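
Because the crawl runs asynchronously, you may need to poll the results endpoint until the job finishes. Here is a minimal sketch of one way to do that in bash; the `"status": "completed"` check is an assumption based on the status values shown on this page, not a documented contract:

```bash
# Poll the result endpoint every 5 seconds until the job reports completion.
# The job id is the one returned when the crawl was initiated; the
# "status": "completed" match is an illustrative assumption.
JOB_ID='c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e'
until curl -s "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/result/${JOB_ID}" \
  -H 'Authorization: Bearer YOUR_API_TOKEN' | grep -q '"status": "completed"'; do
  sleep 5
done
```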
### Cancel a crawl job
Here is an example of how to cancel a crawl job with the job `id` you were provided.
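
The sketch below assumes cancellation is performed by sending a `DELETE` request for the job `id`; the method and path are assumptions for illustration, so consult the API reference for the exact route:

```bash
# Hypothetical cancellation request: the DELETE method and the /crawl/{id}
# path are assumptions for illustration, not confirmed by this page.
curl -X DELETE 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e' \
  -H 'Authorization: Bearer YOUR_API_TOKEN'
```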
A successful cancellation will return a `200 OK` status code. The job status will be updated to `cancelled`.
## Advanced usage
The `/crawl` endpoint has many parameters you can use to customize your crawl. Here is an example that uses the additional parameters currently available, in addition to the [basic parameters shown in the example above](/browser-rendering/rest-api/crawl-endpoint/#initiate-the-crawl-job) and the [`render` parameter below](/browser-rendering/rest-api/crawl-endpoint/#render-a-simple-html-fetch):
```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
  -H 'Authorization: Bearer YOUR_API_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    ...
  }'
```
### Render a simple HTML fetch
With the `render` parameter, you have the option to use the `/crawl` endpoint to do a simple HTML fetch crawl. This is best for crawls that you want completed quickly, when spinning up a full headless browser instance is not necessary. Crawls with the `render` parameter set to `false` are only charged according to [Workers pricing](/workers/platform/pricing/) and not Browser Rendering pricing.
Here is an example of a request that uses the `render` parameter:
```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
  -H 'Authorization: Bearer YOUR_API_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/",
    "render": false
  }'
```