Commit 7f9a35f: First draft
1 parent abd8379

1 file changed: +18 -19 lines

src/content/docs/browser-rendering/rest-api/crawl-endpoint.mdx

@@ -7,7 +7,7 @@ sidebar:
 
 import { Render } from "~/components";
 
-The `/crawl` endpoint automates the process of scraping content from webpages starting with a single URL and crawling to your specified depth of links. The response can be returned in either HTML, Markdown, or JSON.
+The `/crawl` endpoint automates the process of scraping content from webpages, starting with a single URL and crawling to a specified depth of links. The response can be returned in HTML, Markdown, or JSON.
 
 The `/crawl` endpoint respects the directives of `robots.txt` files, such as `crawl-delay` and [`content-signal`](https://contentsignals.org/). All URLs that `/crawl` is directed not to crawl are listed in the response with `"status": "disallowed"`.
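For context, a URL that `robots.txt` rules block might appear in the crawl results roughly as follows. This is a minimal sketch: the `"status": "disallowed"` value comes from the paragraph above, while the surrounding shape and field names are illustrative assumptions, not taken from the actual API schema.

```json
{
  "url": "https://example.com/blocked-by-robots/",
  "status": "disallowed"
}
```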

@@ -28,17 +28,17 @@ You must provide `url`:
 
 ## Basic usage
 
-Since the `/crawl` endpoint takes some time to process, it is split into two requests:
-1. A `POST` request where you initiate the crawl and receive a response with a `job_id`.
-2. A `GET` request where you request the status or results of the crawl.
+Since the `/crawl` endpoint takes some time to process, there are two separate steps (sketched end to end below):
+1. [Initiate the crawl job](/browser-rendering/rest-api/crawl-endpoint/#initiate-the-crawl-job): A `POST` request where you initiate the crawl and receive a response with a job `id`.
+2. [Request results of the crawl job](/browser-rendering/rest-api/crawl-endpoint/#request-results-of-the-crawl-job): A `GET` request where you request the status or results of the crawl.
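Taken together, the two steps amount to initiate-then-poll. Here is a minimal end-to-end sketch of that flow, assuming the job id is exposed at `.result.id`, the job status at `.result.status`, and a `running` value while the crawl is in progress; these paths and values are assumptions, not confirmed by this page, and `jq` is used for brevity:

```bash
# Sketch: initiate a crawl, then poll until it finishes.
# ACCOUNT_ID and API_TOKEN are placeholders; response field paths are assumed.
ACCOUNT_ID="your_account_id"
API_TOKEN="YOUR_API_TOKEN"
BASE="https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/browser-rendering/crawl"

# Step 1: POST to initiate the crawl and capture the job id.
JOB_ID=$(curl -s -X POST "${BASE}" \
  -H "Authorization: Bearer ${API_TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://developers.cloudflare.com/workers/"}' | jq -r '.result.id')

# Step 2: GET the job status until the crawl is no longer running.
while true; do
  STATUS=$(curl -s "${BASE}/result/${JOB_ID}" \
    -H "Authorization: Bearer ${API_TOKEN}" | jq -r '.result.status')
  echo "Job ${JOB_ID}: ${STATUS}"
  [ "${STATUS}" != "running" ] && break
  sleep 5
done
```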
 
 :::note[Free plan limitation]
 If you are on a Workers Free plan, your crawl may fail if it hits the [limit of 10 minutes per day](/browser-rendering/platform/pricing/). To avoid this, you can either [upgrade to a Workers Paid plan](/workers/platform/pricing/) or [set limits on timeouts](/browser-rendering/reference/timeouts/) to get the most out of your 10 minutes of crawl time.
 :::
 
 ### Initiate the crawl job
 
-text
+Here is an example of how to initiate a crawl job with the `url`, `limit`, `depth`, and `formats` parameters. See the [advanced usage section below](/browser-rendering/rest-api/crawl-endpoint/#advanced-usage) for additional parameters:
 
 ```bash
 curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
@@ -60,7 +60,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser
 }'
 ```
 
-text
+Here is an example of the response, which includes a job `id`:
 
 ```json output
 {
@@ -73,14 +73,14 @@ text
 
 ### Request results of the crawl job
 
-text
+Here is an example of how to check the status or retrieve the results of your crawl job, using the job `id` you were provided:
 
 ```bash
 curl -X GET 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/result/c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e' \
 -H 'Authorization: Bearer YOUR_API_TOKEN'
 ```
 
-text
+Here is an example response:
 
 ```json output
 {
@@ -120,24 +120,18 @@ text
 
 ### Cancel a crawl job
 
-text
+Here is an example of how to cancel a crawl job using the job `id` you were provided:
 
 ```bash
 curl -X DELETE 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/result/c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e' \
 -H 'Authorization: Bearer YOUR_API_TOKEN'
 ```
 
-A successful cancellation will return a 200 OK status code. The job status will be updated to cancelled.
+A successful cancellation returns a `200 OK` status code, and the job status is updated to `cancelled`.
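If you want to confirm that the cancellation took effect, one option is to re-query the result endpoint and inspect the reported status. A sketch, assuming the status is exposed at `.result.status` (not confirmed by this page) and using `jq` for brevity:

```bash
# Re-check the job; after a successful DELETE it should report a cancelled status
curl -s 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/result/c7f8s2d9-a8e7-4b6e-8e4d-3d4a1b2c3f4e' \
  -H 'Authorization: Bearer YOUR_API_TOKEN' | jq '.result.status'
```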
 
 ## Advanced usage
 
-:::note[Looking for more parameters?]
-Visit the [Browser Rendering PDF API reference](/api/resources/browser_rendering/subresources/pdf/methods/create/) for all available parameters.
-:::
-
-Here are...
-
-text
+The `/crawl` endpoint has many parameters you can use to customize your crawl. Here is an example that uses the additional parameters currently available, beyond the [basic parameters shown above](/browser-rendering/rest-api/crawl-endpoint/#initiate-the-crawl-job) and the [render parameter below](/browser-rendering/rest-api/crawl-endpoint/#render-a-simple-html-fetch):
 
 ```bash
 curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
@@ -171,16 +165,21 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser
 }
 ```
 
-### Render
+### Render a simple HTML fetch
 
-text
+With the `render` parameter, you have the option to use the `/crawl` endpoint to do a simple HTML fetch crawl. This is best for crawls that you want completed quickly, when spinning up a full headless browser instance is not necessary. Crawls with the `render` parameter set to `false` are charged only according to [Workers pricing](/workers/platform/pricing/), not Browser Rendering pricing.
+
+Here is an example of a request that uses the `render` parameter:
 
 ```bash
 curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
 -H 'Authorization: Bearer <apiToken>' \
 -H 'Content-Type: application/json' \
 -d '{
+// Required: The URL to start crawling from
 "url": "https://developers.cloudflare.com/workers/",
+
+// Optional: If false, only does a simple HTML fetch crawl (default is true)
 "render": false
 }'
 ```
