The Website component is a data connector that allows users to scrape websites. It supports a single task, Scrape Website.
# Release Stage
Alpha
# Configuration
The component configuration is defined and maintained here.
# Supported Tasks
# Scrape Website
Scrape the website contents.
| Input | ID | Type | Description | 
|---|---|---|---|
| Task ID (required) | task | string | TASK_SCRAPE_WEBSITE | 
| Query (required) | target_url | string | The root URL to scrape. Links are followed recursively: every page linked from this page is scraped, then every page linked from those pages, and so on. | 
| Allowed Domains | allowed_domains | array[string] | A list of domains that are allowed to be scraped. If empty, all domains are allowed. | 
| Max Number of Pages (required) | max_k | integer | The maximum number of pages to return. If set to 0, all pages are returned. If set to a positive integer, at most max_k pages are returned. | 
| Include Link Text | include_link_text | boolean | Whether to scrape the link and include the text of the link associated with this page in the 'link_text' field. | 
| Include Link HTML | include_link_html | boolean | Whether to scrape the link and include the raw HTML of the link associated with this page in the 'link_html' field. | 
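
As a rough illustration of the input fields above, the following sketch assembles and validates a task input as a plain Python dictionary. The function name `build_scrape_input`, the flat dictionary layout, and the rule that `target_url` must be an absolute http(s) URL are assumptions made for this example; how the component expects these fields to be wrapped in a request is not specified here.

```python
from urllib.parse import urlparse

TASK_SCRAPE_WEBSITE = "TASK_SCRAPE_WEBSITE"


def build_scrape_input(target_url, max_k, allowed_domains=None,
                       include_link_text=False, include_link_html=False):
    """Assemble a task input using the field IDs from the table above.

    Hypothetical helper: the validation rules are assumptions, not
    documented behavior of the component.
    """
    parsed = urlparse(target_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"target_url must be an absolute http(s) URL: {target_url!r}")
    if isinstance(max_k, bool) or not isinstance(max_k, int) or max_k < 0:
        raise ValueError("max_k must be a non-negative integer (0 means no limit)")
    return {
        "task": TASK_SCRAPE_WEBSITE,
        "target_url": target_url,
        # An empty list means every domain may be scraped.
        "allowed_domains": list(allowed_domains or []),
        "max_k": max_k,
        "include_link_text": include_link_text,
        "include_link_html": include_link_html,
    }


payload = build_scrape_input(
    "https://example.com", max_k=10, allowed_domains=["example.com"]
)
```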

| Output | ID | Type | Description | 
|---|---|---|---|
| Pages | pages | array[object] | The scraped webpages. |
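
To make the interaction between `target_url`, `allowed_domains`, and `max_k` concrete, here is a minimal sketch of a recursive link traversal over an in-memory link graph, so that it runs without network access. The breadth-first order, the exact-hostname match for allowed domains, and the `link` key in each page object are assumptions for illustration; only the `link_text` field name comes from the table above, and the component's actual traversal may differ.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(link_graph, target_url, max_k, allowed_domains=()):
    """Breadth-first walk over a {url: [(href, link_text), ...]} graph.

    The graph stands in for real HTTP fetching (an assumption of this sketch).
    """
    allowed = set(allowed_domains)
    pages = []
    seen = {target_url}
    queue = deque([(target_url, "")])  # the root page has no inbound link text
    while queue:
        url, link_text = queue.popleft()
        # An empty allow-list means every domain is allowed.
        if allowed and urlparse(url).hostname not in allowed:
            continue
        pages.append({"link": url, "link_text": link_text})
        # max_k == 0 means no limit on the number of pages.
        if max_k > 0 and len(pages) >= max_k:
            break
        for href, text in link_graph.get(url, []):
            if href not in seen:
                seen.add(href)
                queue.append((href, text))
    return pages


site = {
    "https://example.com/": [
        ("https://example.com/a", "A"),
        ("https://other.org/", "Other"),
    ],
    "https://example.com/a": [("https://example.com/b", "B")],
}
pages = crawl(site, "https://example.com/", max_k=0, allowed_domains=["example.com"])
print([p["link"] for p in pages])
```

With `max_k=0` and `allowed_domains=["example.com"]`, the sketch returns the three `example.com` pages and skips `https://other.org/`; setting `max_k=2` would stop after the root page and `https://example.com/a`.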