Scrape API /scrape
Low-level endpoint that reuses the same browser + proxy infra as the agent, but returns raw page text and accessibility trees. No planner, no tools—just data for your own models and pipelines.
Infra-Only Credits
No model/tool credits—just browser + proxy costs for maximum efficiency.
Raw Page Data
Get extracted text, accessibility trees, and element link records.
Composable Output
Feed results directly into your own LLM/RAG pipelines.
POST /scrape
Low-level endpoint for raw page text + accessibility tree.
Base URL: https://api.rtrvr.ai
Use /scrape for raw page data and /agent for full agent runs.
Use your API key in the Authorization header:
Authorization: Bearer rtrvr_your_api_key
Endpoint: https://api.rtrvr.ai/scrape
Use /agent when you want the full planner + tools engine, and /scrape when you just need raw page text + structure for your own models.
Open one or more URLs in our browser cluster and get back extracted text, the accessibility tree, and link metadata. The endpoint is designed to be:
- Cheap – infra-only credits (browser + proxy), no model usage.
- Predictable – stable schema for tab content + usage metrics.
- Composable – plug the result into your own LLM/RAG pipeline.
curl -X POST https://api.rtrvr.ai/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://example.com/blog/ai-trends-2025"]
}'
Each scrape uses a unified UserSettings profile stored in the cloud.
interface UserSettings {
extractionConfig: {
maxParallelTabs?: number;
pageLoadDelay?: number;
makeNewTabsActive?: boolean;
writeRowProcessingTime?: boolean;
disableAutoScroll?: boolean;
/**
* When true, only text content is returned from scrapes.
* The accessibility tree + elementLinkRecord are omitted.
*/
onlyTextContent?: boolean;
};
// Proxy Configuration
proxyConfig: {
mode: 'none' | 'custom' | 'default' | 'device';
customProxies: ProxySettings[];
selectedProxyId?: string;
selectedDeviceId?: string;
};
}
Two ways to control behavior:
1. Cloud profile: configure defaults in Cloud → Settings.
2. Per-request overrides: send settings in your request body.
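To make the relationship between the two concrete, here is a minimal client-side sketch of an override layered on top of profile defaults. The source only states that settings is merged on top of the stored profile; the one-level-deep merge, the helper name mergeSettings, and the SettingsOverride type are assumptions for illustration, not the documented server behavior. ProxySettings is not defined on this page, so it is treated as an opaque placeholder.

```typescript
// ProxySettings is not defined in this document; opaque placeholder.
type ProxySettings = unknown;

interface ExtractionConfig {
  maxParallelTabs?: number;
  pageLoadDelay?: number;
  disableAutoScroll?: boolean;
  onlyTextContent?: boolean;
}

interface ProxyConfig {
  mode: 'none' | 'custom' | 'default' | 'device';
  customProxies: ProxySettings[];
  selectedProxyId?: string;
  selectedDeviceId?: string;
}

interface UserSettings {
  extractionConfig: ExtractionConfig;
  proxyConfig: ProxyConfig;
}

// ASSUMPTION: overrides may omit fields inside each config group.
interface SettingsOverride {
  extractionConfig?: Partial<ExtractionConfig>;
  proxyConfig?: Partial<ProxyConfig>;
}

// ASSUMPTION: merge one level deep, so overriding a single field keeps the
// other fields of that group from the profile. The real merge may differ.
function mergeSettings(profile: UserSettings, override: SettingsOverride): UserSettings {
  return {
    extractionConfig: { ...profile.extractionConfig, ...override.extractionConfig },
    proxyConfig: { ...profile.proxyConfig, ...override.proxyConfig },
  };
}

// Illustrative profile values, not real defaults.
const profile: UserSettings = {
  extractionConfig: { maxParallelTabs: 4, onlyTextContent: false },
  proxyConfig: { mode: 'none', customProxies: [] },
};

const effective = mergeSettings(profile, {
  extractionConfig: { onlyTextContent: true },
  proxyConfig: { mode: 'default' },
});
```

Under this assumed merge, effective keeps maxParallelTabs from the profile while onlyTextContent and the proxy mode come from the request.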
The request body is a ScrapeApiRequest:
interface ScrapeApiRequest {
/**
* Optional stable id if you want to tie multiple scrapes together.
* Mostly useful for analytics/observability on your side.
*/
trajectoryId?: string;
/**
* One or more absolute URLs to load in the browser.
* Must be a non-empty array of non-empty strings.
*/
urls: string[];
/**
* Optional per-request settings override.
* Merged on top of the stored UserSettings profile (proxyConfig, extraction, etc.).
*
* Use extraction-related settings if you only want text content and don't need
* the accessibility tree + elementLinkRecord.
*/
settings?: Partial<UserSettings>;
/**
* Response size control for API callers.
*/
response?: {
/**
* Max bytes allowed for the inline JSON response.
* If the full response exceeds this, the full payload is stored in object storage
* and a StorageReference is returned under metadata.responseRef.
* Default: 1MB (1048576 bytes)
*/
inlineOutputMaxBytes?: number;
};
}
Parameters
- urls (string[], required): One or more absolute URLs to scrape. Must be a non-empty array.
- trajectoryId (string): Optional stable id for grouping scrapes together (analytics, observability).
- settings (Partial<UserSettings>): Optional per-request override merged on top of your cloud UserSettings profile.
- response.inlineOutputMaxBytes (number, default: 1048576): Maximum inline response size in bytes (1MB by default).
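The constraints on urls can be checked before a request is sent. The following is a small sketch of such a check; validateScrapeRequest is a hypothetical helper, and treating "absolute" as "starts with http:// or https://" is an assumption, since the page does not say which schemes are accepted.

```typescript
// Loose copy of the request shape above; settings is kept opaque here.
interface ScrapeApiRequest {
  trajectoryId?: string;
  urls: string[];
  settings?: Record<string, unknown>;
  response?: { inlineOutputMaxBytes?: number };
}

// Hypothetical helper: returns a list of problems, empty when urls looks valid.
function validateScrapeRequest(body: ScrapeApiRequest): string[] {
  const problems: string[] = [];
  if (!Array.isArray(body.urls) || body.urls.length === 0) {
    problems.push('urls must be a non-empty array');
    return problems;
  }
  body.urls.forEach((url, i) => {
    if (typeof url !== 'string' || url.trim() === '') {
      problems.push(`urls[${i}] must be a non-empty string`);
    } else if (!/^https?:\/\//i.test(url)) {
      // ASSUMPTION: "absolute" is approximated as an http(s) scheme prefix.
      problems.push(`urls[${i}] must be an absolute http(s) URL`);
    }
  });
  return problems;
}

const okProblems = validateScrapeRequest({ urls: ['https://example.com/pricing'] });
const badProblems = validateScrapeRequest({ urls: ['', '/relative/path'] });
```

Here okProblems is empty, while badProblems holds one message per rejected entry.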
The API response is a ScrapeApiResponse:
interface ScrapedTab {
tabId: number;
url: string;
title: string;
contentType: string;
status: "success" | "error";
error?: string;
/**
* Full extracted visible text (when available).
*/
content?: string;
/**
* JSON-encoded accessibility tree (stringified).
* Use this if you want a rich, structured view of the page for your own models.
* Every link node in the tree has a numeric 'id' field which is used as the key
* in elementLinkRecord.
*/
tree?: string;
/**
* Map of accessibility-tree element id -> href/URL for link elements.
* Only present when 'tree' is present.
*/
elementLinkRecord?: Record<number, string>;
}
interface ScrapeUsageData {
totalCredits: number;
browserCredits: number;
proxyCredits: number;
totalUsd: number;
requestDurationMs: number;
proxyPageLoads: number;
proxyTabsDataFetches: number;
usingBillableProxy: boolean;
}
interface ScrapeApiResponse {
success: boolean;
status: "success" | "error";
trajectoryId: string;
tabs?: ScrapedTab[];
usageData: ScrapeUsageData;
metadata?: {
inlineOutputMaxBytes: number;
durationMs: number;
outputTooLarge?: boolean;
responseRef?: StorageReference;
};
error?: string;
}
Tabs & content
- tabs (ScrapedTab[]): One tab per URL, in the same order as the input urls.
- tabs[].content (string): Full extracted visible text when available.
- tabs[].tree (string): JSON-encoded accessibility tree (stringified). Omitted when onlyTextContent=true.
- tabs[].elementLinkRecord (Record<number, string>): Lookup table mapping accessibility-tree element id → href/URL.
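Since tree arrives as a JSON string and elementLinkRecord is keyed by element id, the two can be joined on the client. The sketch below shows one way to do that. The page documents only that link nodes carry a numeric id, so the walker is deliberately generic and visits every nested object and array; the children, role, and name fields in the sample tree are hypothetical, and collectLinks is an illustrative helper.

```typescript
// Only the two fields needed for link resolution.
interface ScrapedTabLinks {
  tree?: string;
  elementLinkRecord?: Record<number, string>;
}

interface ResolvedLink {
  id: number;
  href: string;
}

// Walks any parsed JSON value and collects nodes whose numeric `id`
// has an entry in elementLinkRecord.
function collectLinks(tab: ScrapedTabLinks): ResolvedLink[] {
  if (!tab.tree || !tab.elementLinkRecord) {
    return []; // e.g. onlyTextContent=true, where both fields are omitted
  }
  const record = tab.elementLinkRecord;
  const found: ResolvedLink[] = [];

  const visit = (node: unknown): void => {
    if (Array.isArray(node)) {
      node.forEach((child) => visit(child));
      return;
    }
    if (node !== null && typeof node === 'object') {
      const obj = node as Record<string, unknown>;
      const id = obj['id'];
      if (typeof id === 'number') {
        const href = record[id];
        if (typeof href === 'string') {
          found.push({ id, href });
        }
      }
      Object.keys(obj).forEach((key) => visit(obj[key]));
    }
  };

  visit(JSON.parse(tab.tree));
  return found;
}

// HYPOTHETICAL tree shape, for illustration only.
const sampleTab: ScrapedTabLinks = {
  tree: JSON.stringify({
    id: 1,
    role: 'main',
    children: [
      { id: 7, role: 'link', name: 'Pricing' },
      { id: 9, role: 'text', name: 'Hello' },
    ],
  }),
  elementLinkRecord: { 7: 'https://example.com/pricing' },
};

const links = collectLinks(sampleTab);
```

With this sample, links contains a single entry pairing id 7 with https://example.com/pricing.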
Infra usage
- usageData.totalCredits (number): Total infra credits consumed by this scrape.
- usageData.browserCredits (number): Credits attributable to browser usage.
- usageData.proxyCredits (number): Credits attributable to proxy usage.
- usageData.requestDurationMs (number): End-to-end latency for the scrape request in ms.
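If you issue several scrapes, the usageData fields above can be rolled up for your own cost tracking. This is a minimal sketch; summarizeUsage is a hypothetical helper and the credit values in the example are invented for illustration, not real pricing.

```typescript
// Subset of ScrapeUsageData covering the fields listed above.
interface ScrapeUsageData {
  totalCredits: number;
  browserCredits: number;
  proxyCredits: number;
  requestDurationMs: number;
}

interface UsageSummary {
  totalCredits: number;
  browserCredits: number;
  proxyCredits: number;
  slowestRequestMs: number;
}

// Sums credits across responses and tracks the slowest request.
function summarizeUsage(usages: ScrapeUsageData[]): UsageSummary {
  const summary: UsageSummary = {
    totalCredits: 0,
    browserCredits: 0,
    proxyCredits: 0,
    slowestRequestMs: 0,
  };
  for (const usage of usages) {
    summary.totalCredits += usage.totalCredits;
    summary.browserCredits += usage.browserCredits;
    summary.proxyCredits += usage.proxyCredits;
    if (usage.requestDurationMs > summary.slowestRequestMs) {
      summary.slowestRequestMs = usage.requestDurationMs;
    }
  }
  return summary;
}

// Invented numbers, for illustration only.
const usageSummary = summarizeUsage([
  { totalCredits: 3, browserCredits: 2, proxyCredits: 1, requestDurationMs: 1200 },
  { totalCredits: 5, browserCredits: 4, proxyCredits: 1, requestDurationMs: 800 },
]);
```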
# Basic scrape using profile defaults
curl -X POST https://api.rtrvr.ai/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://example.com/blog/ai-trends-2025"],
"response": { "inlineOutputMaxBytes": 1048576 }
}'
# With per-request settings override
curl -X POST https://api.rtrvr.ai/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://example.com/blog/ai-trends-2025",
"https://example.com/pricing"
],
"settings": {
"extractionConfig": {
"onlyTextContent": true
},
"proxyConfig": {
"mode": "default"
}
},
"response": {
"inlineOutputMaxBytes": 1048576
}
}'
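The same calls can be prepared from TypeScript. The sketch below builds the URL, headers, and JSON body that the curl examples send, without performing any network I/O. buildScrapeRequest, ScrapeRequestBody, and PreparedRequest are hypothetical names introduced for illustration; only the endpoint, the two headers, and the body fields come from this page.

```typescript
// Loose request body type; settings is kept opaque in this sketch.
interface ScrapeRequestBody {
  urls: string[];
  trajectoryId?: string;
  settings?: Record<string, unknown>;
  response?: { inlineOutputMaxBytes?: number };
}

interface PreparedRequest {
  url: string;
  init: {
    method: 'POST';
    headers: Record<string, string>;
    body: string;
  };
}

// Builds the request pieces; sending them is left to the caller.
function buildScrapeRequest(apiKey: string, body: ScrapeRequestBody): PreparedRequest {
  return {
    url: 'https://api.rtrvr.ai/scrape',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

// Mirrors the per-request override example above.
const prepared = buildScrapeRequest('YOUR_API_KEY', {
  urls: [
    'https://example.com/blog/ai-trends-2025',
    'https://example.com/pricing',
  ],
  settings: {
    extractionConfig: { onlyTextContent: true },
    proxyConfig: { mode: 'default' },
  },
  response: { inlineOutputMaxBytes: 1048576 },
});
```

In a runtime with a global fetch, the result can be passed straight through as fetch(prepared.url, prepared.init), and the parsed JSON should then follow the ScrapeApiResponse shape described earlier.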