rtrvr.ai achieves SOTA Performance on Halluminate Web Bench

rtrvr.ai leads in Web Bench across task completion, speed, and cost — achieving 81.39% success rate while being 7-23x faster than competitors.

rtrvr.ai Team • August 29, 2025 • 10 min read

[Chart: Overall success rate (task completion rate on Halluminate Web Bench). rtrvr.ai: 81.39%; human-supervised reference: 76.5%.]

rtrvr.ai's Breakthrough Performance on the Halluminate Web Bench: Redefining AI Agent Capabilities

We're thrilled to announce that rtrvr.ai has achieved the #1 position on the Halluminate Web Bench, the industry's most comprehensive benchmark for AI web agents.

The Rise of AI Web Agents and the Need for Robust Benchmarks

AI web agents are transforming digital interaction by automating complex online tasks. However, the web's dynamic nature poses significant challenges for evaluating their performance reliably. This has created a critical demand for robust, standardized benchmarks like Halluminate's Web Bench, ensuring verifiable capabilities and accelerating trust in this emerging technology.

Understanding Halluminate's Web Bench

Halluminate's Web Bench offers a rigorous, comprehensive standard for evaluating AI browser agents by distinguishing between "READ" and "WRITE" tasks. Explore its methodology and current results at halluminate.ai/blog/benchmark.



📊 Key Performance Metrics

Overall Leadership

  • 81.39% overall success rate - highest among all tested agents
  • Surpasses OpenAI Operator + Human (76.5%) and Anthropic Sonnet 3.7 CUA (66.0%)
  • First agent to break the 80% threshold

Task Performance Breakdown

Read Tasks: 88.24% success rate

  • Best-in-class data extraction and information retrieval
  • Beats the human-supervised benchmark (79.0%) by more than 9 percentage points
  • 7.6 percentage points higher than Anthropic CUA (80.6%)

Write Tasks: 65.63% success rate

  • Leading performance in complex interactive tasks
  • 41% higher in relative terms than the runner-up (Skyvern 2.0 at 46.6%), a gap of about 19 percentage points
  • Approaching human-supervised benchmark (70.7%)
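
Comparisons between success rates can be stated either as an absolute difference in percentage points or as a relative improvement. The following is a minimal Python sketch that computes both from the success rates listed in this section; the function names are illustrative and are not part of any rtrvr.ai or Halluminate tooling.

```python
def point_gap(ours: float, theirs: float) -> float:
    """Absolute difference between two success rates, in percentage points."""
    return ours - theirs


def relative_gain(ours: float, theirs: float) -> float:
    """Relative improvement of `ours` over `theirs`, as a percentage of `theirs`."""
    return (ours - theirs) / theirs * 100


# Success rates (%) as reported in this post.
READ = {"rtrvr.ai": 88.24, "Operator + Human": 79.0, "Anthropic CUA": 80.6}
WRITE = {"rtrvr.ai": 65.63, "Skyvern 2.0": 46.6, "Operator + Human": 70.7}

if __name__ == "__main__":
    for label, table in (("read", READ), ("write", WRITE)):
        ours = table["rtrvr.ai"]
        for agent, rate in table.items():
            if agent == "rtrvr.ai":
                continue
            print(f"{label:5} vs {agent:17}: "
                  f"{point_gap(ours, rate):+6.2f} pts, "
                  f"{relative_gain(ours, rate):+6.1f}% relative")
```

Keeping the two measures separate avoids reading a gap in points as a relative gain, or the reverse.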

⚡ Speed Revolution

0.9 minutes average task completion — the fastest in the industry:

| Agent | Avg Time | Speed vs rtrvr.ai |
| --- | --- | --- |
| rtrvr.ai | 0.9 min | - |
| Browser Use Cloud | 6.35 min | 7x slower |
| OpenAI Operator | 10.1 min | 11x slower |
| Anthropic Sonnet 3.7 CUA | 11.81 min | 13x slower |
| Skyvern 2.0 | 12.49 min | 14x slower |
| Skyvern 2.0 on Browserbase | 20.84 min | 23x slower |
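
The "x slower" figures follow from dividing each agent's average time by rtrvr.ai's 0.9 minutes. As a quick check, here is a small Python sketch that recomputes them from the times in the table above; the dictionary and function names are illustrative only.

```python
# Average task completion times in minutes, copied from the table above.
AVG_MINUTES = {
    "rtrvr.ai": 0.9,
    "Browser Use Cloud": 6.35,
    "OpenAI Operator": 10.1,
    "Anthropic Sonnet 3.7 CUA": 11.81,
    "Skyvern 2.0": 12.49,
    "Skyvern 2.0 on Browserbase": 20.84,
}


def slowdown(agent: str, baseline: str = "rtrvr.ai") -> float:
    """How many times longer `agent` takes than `baseline`, on average."""
    return AVG_MINUTES[agent] / AVG_MINUTES[baseline]


if __name__ == "__main__":
    for agent in AVG_MINUTES:
        if agent != "rtrvr.ai":
            print(f"{agent}: {slowdown(agent):.1f}x slower")
```

Rounded to whole numbers, the ratios match the table, from 7x for Browser Use Cloud to 23x for Skyvern 2.0 on Browserbase.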

💰 Cost Efficiency

  • $0.12 average cost per task
  • Total evaluation cost: ~$40 for 4,000 credits (323 tasks)
  • Comparison: Halluminate reported testing costs of ~$3,000 per agent with human annotators
  • 25x more cost-effective than cloud-based alternatives
  • Powered by Gemini Flash for optimal price/performance
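
The per-task figure can be reproduced from the totals quoted above. This is a back-of-the-envelope sketch in Python; the variable names are illustrative, and the inputs are the approximate totals from this post rather than billing data.

```python
TOTAL_COST_USD = 40.0   # approximate total evaluation cost reported above
TOTAL_CREDITS = 4_000   # credits consumed over the whole run
TASKS = 323             # benchmark tasks

cost_per_task = TOTAL_COST_USD / TASKS        # dollars per task
credits_per_task = TOTAL_CREDITS / TASKS      # credits per task
cost_per_credit = TOTAL_COST_USD / TOTAL_CREDITS  # dollars per credit

if __name__ == "__main__":
    print(f"cost per task:    ${cost_per_task:.2f}")
    print(f"credits per task: {credits_per_task:.1f}")
    print(f"cost per credit:  ${cost_per_credit:.3f}")
```

Under these inputs, the run works out to roughly 12 credits and $0.12 per task.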

🏆 Complete Leaderboard

Overall Performance

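| Rank | Agent | Success Rate |
| --- | --- | --- |
| 🥇 1 | rtrvr.ai | 81.39% |
| - | OpenAI Operator + Human | 76.5% |
| 2 | Anthropic Sonnet 3.7 CUA | 66.0% |
| 3 | Skyvern 2.0 | 64.4% |
| 4 | Skyvern 2.0 on Browserbase | 60.7% |
| 5 | OpenAI Operator | 59.8% |
| 6 | Browser Use Cloud | 43.9% |
| 7 | Convergence AI | 39.9% |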

Read Tasks Performance

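| Rank | Agent | Success Rate |
| --- | --- | --- |
| 🥇 1 | rtrvr.ai | 88.24% |
| 2 | Anthropic Sonnet 3.7 CUA | 80.6% |
| 3 | Skyvern 2.0 on Browserbase | 75.6% |
| 4 | OpenAI Operator | 75.0% |
| 5 | Skyvern 2.0 | 74.2% |
| 6 | Browser Use Cloud | 63.2% |
| 7 | Convergence AI | 51.8% |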

Reference: Operator with Human Supervisor achieves 79.0%

Write Tasks Performance

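| Rank | Agent | Success Rate |
| --- | --- | --- |
| 🥇 1 | rtrvr.ai | 65.63% |
| 2 | Skyvern 2.0 | 46.6% |
| 3 | Anthropic Sonnet 3.7 CUA | 39.4% |
| 4 | Skyvern 2.0 on Browserbase | 33.6% |
| 5 | OpenAI Operator | 32.3% |
| 6 | Convergence AI | 13.1% |
| 7 | Browser Use Cloud | 11.4% |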

Reference: Operator with Human Supervisor achieves 70.7%


🔑 Why rtrvr.ai Dominates

Local-First Architecture

rtrvr.ai distinguishes itself through a fundamental architectural difference: its commitment to local operation. Unlike many leading agents that rely on remote cloud browsers, rtrvr.ai operates directly within the user's own browser as a Chrome Extension, vetted and tested by Google for a secure and sandboxed execution environment.

Key Benefits:

  • No bot detection issues - runs from user's own browser and local IP
  • Reuses authenticated sessions - works with your logged-in accounts
  • No CAPTCHA blocking - bypasses challenges that plague cloud agents
  • Maintains user privacy - no credential sharing with third parties
  • Works with paywalled content - access subscriptions seamlessly

DOM-Based Intelligence

Rather than relying solely on visual cues or screenshots, rtrvr.ai leverages the underlying HTML structure of webpages, providing a deeper and more robust understanding of content and elements.

Key Advantages:

  • Direct HTML structure interaction vs screenshot parsing
  • Handles pop-ups and overlays that block vision-based agents
  • Enables parallel multi-tab workflows
  • Works natively in any language (no OCR errors)
  • Highly accurate data scraping

Key Observation on Vision-Based Agents

During our evaluation, rtrvr.ai performed well even when common web elements like pop-ups and overlays appeared. It was able to close them or simply perform its action despite them. This contrasts sharply with many CUA (computer-using agent) and other vision-based agents, which often struggle with such elements. For a vision agent, a pop-up can completely obscure the underlying webpage, so the agent must first identify and close the pop-up before it can even "see" and interact with the intended content.

Collapsing Exponential Failure Rates

A particularly compelling benefit of the DOM-based approach is its ability to mitigate the "exponential failure rate" problem inherent in multi-step web automation. In complex workflows, the probability of overall success can decrease dramatically with each additional step if individual steps have independent failure rates. By parallelizing steps across multiple tabs, rtrvr.ai fundamentally re-architects this problem, making sophisticated, multi-step tasks significantly more robust and reliable.
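
To make the compounding concrete: if each step succeeds independently with probability p, a chain of n dependent steps succeeds with probability p^n. The Python sketch below compares a sequential chain, where one failure stops everything after it, with independent parallel items. It is a simplified model under stated assumptions (independent steps, equal per-step success rate), not a description of rtrvr.ai's internals, and the values of p and n are illustrative rather than measured.

```python
def chain_success(p: float, n: int) -> float:
    """Probability that n dependent steps all succeed (independent per-step success p)."""
    return p ** n


def expected_done_sequential(p: float, n: int) -> float:
    """Expected items finished when item i runs only if items 1..i-1 all succeeded."""
    return sum(p ** i for i in range(1, n + 1))


def expected_done_parallel(p: float, n: int) -> float:
    """Expected items finished when each item runs independently (e.g. in its own tab)."""
    return n * p


if __name__ == "__main__":
    p, n = 0.95, 10  # illustrative values, not measured figures
    print(f"all {n} chained steps succeed:   {chain_success(p, n):.3f}")
    print(f"expected items done, sequential: {expected_done_sequential(p, n):.2f}")
    print(f"expected items done, parallel:   {expected_done_parallel(p, n):.2f}")
```

With a 95% per-step success rate, a 10-step chain completes end to end only about 60% of the time. In this model, running the same 10 items independently yields about 9.5 finished items on average, versus about 7.6 when an early failure blocks the rest.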

AI Function Calling

rtrvr.ai empowers users through its "AI Function Calling" capability, allowing them to define and supply their own custom code or functions that the AI agent can autonomously invoke. This feature provides immense flexibility and extensibility, enabling users to tailor the agent's capabilities to virtually any external tool, API, or custom workflow.
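
The general pattern behind function calling can be sketched as a registry of user-supplied functions plus a dispatcher that executes a call the model emits by name. The Python example below is a generic illustration only: the `TOOLS` registry, the `tool` decorator, the `dispatch` helper, and the JSON call shape are all assumptions made for this sketch and do not reflect rtrvr.ai's actual interface.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry: maps a tool name to a user-supplied callable.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a user-defined function so an agent loop can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def to_usd(amount: float, rate: float) -> float:
    """Example user function: convert an amount with a caller-supplied exchange rate."""
    return round(amount * rate, 2)


def dispatch(call_json: str) -> Any:
    """Run a model-emitted call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


if __name__ == "__main__":
    print(dispatch('{"name": "to_usd", "arguments": {"amount": 10, "rate": 1.5}}'))
```

Rejecting unknown names before invoking anything keeps the agent limited to functions the user explicitly registered.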


🔍 Failure Mode Analysis

A critical component of our evaluation is the detailed breakdown of failure modes:

Agent vs. Infrastructure Errors

| Error Type | Percentage | Description |
| --- | --- | --- |
| Agent Errors | 96.61% | Internal AI logic and execution issues - can be directly addressed through AI improvements |
| Infrastructure Errors | 3.39% | External blocking and access issues - remarkably low due to local operation |

This extremely low percentage of infrastructure errors is a direct testament to rtrvr.ai's local, browser-extension design. Unlike cloud-based agents that frequently encounter obstacles such as bot detection, CAPTCHAs, and login authentication issues, rtrvr.ai's operation within the user's own browser effectively bypasses these common external barriers.

Why this matters: Having nearly all failures attributable to agent errors (rather than infrastructure) means development can focus entirely on enhancing core AI intelligence, reasoning, and robustness—rather than managing external factors like proxy rotations or CAPTCHA solving services.


🧪 Our Evaluation Methodology

Evaluation Setup

  • Security First: Credit cards were locked before evaluation to prevent unintended transactions
  • Pre-registered Accounts: Tasks assumed the agent was already logged into necessary accounts
  • Streamlined Task Management: rtrvr.ai's capability to ingest tasks and URLs directly from spreadsheet formats made benchmark setup remarkably easy

Key Learnings & Observations

Iterative Improvement ("Hill Climb"): We will continue "hill climbing" on the identified failure cases and expect dramatically better performance on future runs.

Agent's Tool Use (Googling): Although task goals were confined to navigation within specific websites, the agent occasionally resorted to Googling. We counted these runs as valid because of rtrvr.ai's robust URL navigation capabilities.

Networking and Posting Limits: Certain websites exhibited aggressive limits, occasionally flagging IP addresses, pointing to requirements for distributed testing setups or rotating IPs.

Agent Interaction Quirks Identified

  • Aggressive Scrolling: The agent sometimes scrolled more than a task required
  • No Hover Action: Current limitation preventing interaction with hover-dependent UI elements
  • Dropdown Bugs: Challenges with multi-step dropdown interactions
  • Crawl Functionality Limits: Multi-tab processing was purposefully limited during benchmarking for consistency

📝 Notes on Web Bench Design

While Halluminate's Web Bench is a significant step forward, our evaluation highlighted several considerations:

  • Language Limitations: The current benchmark lacks tasks involving non-English sites
  • Real-World Relevance: There's a disconnect between "top human visited websites" and the websites where users would most want to use AI agents
  • Task Design: Future benchmarks could be more complex and open-ended, explicitly encouraging agents to utilize their full suite of tools
  • Infrastructure Management: Running on personal machines resulted in IP flagging due to high request volumes

📺 See It In Action

Watch our complete benchmark evaluation playlist to validate the results yourself.


🚀 What This Means

rtrvr.ai's performance represents a fundamental breakthrough in AI web automation:

  • Enterprise-ready reliability with over 80% success rate
  • Production-ready speed, completing tasks in under a minute on average (0.9 minutes)
  • Accessible pricing making automation available to everyone
  • Beats human-supervised agents on read tasks (88.24% vs 79.0%)

The combination of superior accuracy, blazing speed, and cost efficiency makes rtrvr.ai the clear choice for businesses and developers seeking reliable web automation.


Get Started

Ready to experience the industry's leading AI web agent?

Install rtrvr.ai Chrome Extension →

View Complete Benchmark Data →


Works Cited

  • Web Bench: The Current State of Browser Agents - Halluminate
  • Web Bench - A new way to compare AI Browser Agents - Skyvern

Benchmark evaluation conducted June 2025 on Halluminate Web Bench v1.0 using 323 real-world tasks across read and write categories.
