# The highest accuracy web search for your AI
## A web API purpose-built for AIs
Powering millions of daily requests
### Highest accuracy
Production-ready outputs built on cross-referenced facts, with minimal hallucination.
### Predictable costs
Flex compute budget based on task complexity. Pay per query, not per token.
### Evidence-based outputs
Verifiability and provenance for every atomic output.
### Trusted
SOC 2 Type 2 certified, trusted by leading startups and enterprises.
Powering the best AIs using the web
## Highest accuracy at every price point
State of the art across several benchmarks
### HLE (Humanity's Last Exam)
| Series   | Model        | Cost (CPM) | Accuracy (%) |
| -------- | ------------ | ---------- | ------------ |
| Parallel | parallel     | 82         | 47           |
| Others   | exa          | 138        | 24           |
| Others   | tavily       | 190        | 21           |
| Others   | perplexity   | 126        | 30           |
| Others   | openai gpt-5 | 143        | 45           |
### About this benchmark
This [benchmark](https://lastexam.ai/) consists of 2,500 questions developed by subject-matter experts across dozens of subjects (e.g. math, humanities, natural sciences). Each question has a known solution that is unambiguous and easily verifiable, but requires sophisticated web retrieval and reasoning. Results are reported on a sample of 100 questions from this benchmark. Learn more in our [latest blog](https://parallel.ai/blog/introducing-parallel-search).
### Methodology
- **Evaluation**: Results are based on tests run using official Search MCP servers, provided as an MCP tool to OpenAI's GPT-5 model via the Responses API (sketched below). In all cases, the MCP tools were limited to only the appropriate web search tool. Answers were evaluated using an LLM as a judge (GPT-4.1).
- **Cost Calculation**: Cost reflects the average cost per query across all questions run, including both the search API call and LLM token cost.
- **Testing Dates**: Testing was conducted from November 3rd to November 5th.
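For concreteness, here is a minimal sketch of this evaluation loop. The MCP server URL, the web search tool name, and the judge prompt are illustrative assumptions, not the exact harness behind these numbers.

```python
# Minimal sketch of the benchmark harness described above (illustrative).
# The server URL, tool name, and judge prompt are placeholder assumptions.
from openai import OpenAI

client = OpenAI()

def answer_with_search(question: str) -> str:
    # GPT-5 with a Search MCP server attached as a tool, restricted to
    # only the web search tool via allowed_tools.
    resp = client.responses.create(
        model="gpt-5",
        tools=[{
            "type": "mcp",
            "server_label": "search",
            "server_url": "https://example.com/mcp",  # placeholder MCP server
            "allowed_tools": ["web_search"],           # hypothetical tool name
            "require_approval": "never",
        }],
        input=question,
    )
    return resp.output_text

def judge(question: str, answer: str, expected: str) -> bool:
    # LLM-as-judge grading with GPT-4.1 against the known solution.
    verdict = client.responses.create(
        model="gpt-4.1",
        input=f"Question: {question}\nExpected: {expected}\n"
              f"Answer: {answer}\nReply CORRECT or INCORRECT.",
    )
    return "CORRECT" in verdict.output_text.upper()
```

Accuracy is then the fraction of judged-correct answers, and the per-query cost sums the search API charges with the token usage of both model calls.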
### BrowseComp
| Series   | Model        | Cost (CPM) | Accuracy (%) |
| -------- | ------------ | ---------- | ------------ |
| Parallel | parallel     | 156        | 58           |
| Others   | exa          | 233        | 29           |
| Others   | tavily       | 314        | 23           |
| Others   | perplexity   | 256        | 22           |
| Others   | openai gpt-5 | 253        | 53           |
### About the benchmark
This [benchmark](https://openai.com/index/browsecomp/), created by OpenAI, contains 1,266 questions requiring multi-hop reasoning, creative search formulation, and synthesis of contextual clues across time periods. Results are reported on a sample of 100 questions from this benchmark. Learn more in our [latest blog](https://parallel.ai/blog/introducing-parallel-search).
### Methodology
- **Evaluation**: Results are based on tests run using official Search MCP servers, provided as an MCP tool to OpenAI's GPT-5 model via the Responses API. In all cases, the MCP tools were limited to only the appropriate web search tool. Answers were evaluated using an LLM as a judge (GPT-4.1).
- **Cost Calculation**: Cost reflects the average cost per query across all questions run, including both the search API call and LLM token cost.
- **Testing Dates**: Testing was conducted from November 3rd to November 5th.
### BrowseComp (Deep Research)
| Series   | Model      | Cost (CPM) | Accuracy (%) |
| -------- | ---------- | ---------- | ------------ |
| Parallel | Ultra      | 300        | 45           |
| Parallel | Ultra2x    | 600        | 51           |
| Parallel | Ultra4x    | 1200       | 56           |
| Parallel | Ultra8x    | 2400       | 58           |
| Others   | GPT-5      | 488        | 38           |
| Others   | Anthropic  | 5194       | 7            |
| Others   | Exa        | 402        | 14           |
| Others   | Perplexity | 709        | 6            |
CPM: cost in USD per 1,000 requests (e.g. 300 CPM = $0.30 per query).
### About the benchmark
This [benchmark](https://openai.com/index/browsecomp/), created by OpenAI, contains 1,266 questions requiring multi-hop reasoning, creative search formulation, and synthesis of contextual clues across time periods. Results are reported on a random sample of 100 questions from this benchmark. Read the [blog](https://parallel.ai/blog/deep-research-benchmarks).
### Methodology
- **Dates**: All measurements were made between 08/11/2025 and 08/29/2025.
- **Configurations**: For all competitors, we report the highest numbers we were able to achieve across multiple configurations of their APIs. The exact configurations are below.
  - GPT-5: high reasoning, high search context, default verbosity
  - Exa: Exa Research Pro
  - Anthropic: Claude Opus 4.1
  - Perplexity: Sonar Deep Research, reasoning effort high
### RACER
| Series   | Model      | Cost (CPM) | Win Rate vs Reference (%) |
| -------- | ---------- | ---------- | ------------------------- |
| Parallel | Ultra      | 300        | 82                        |
| Parallel | Ultra2x    | 600        | 86                        |
| Parallel | Ultra4x    | 1200       | 92                        |
| Parallel | Ultra8x    | 2400       | 96                        |
| Others   | GPT-5      | 628        | 66                        |
| Others   | O3 Pro     | 4331       | 30                        |
| Others   | O3         | 605        | 26                        |
| Others   | Perplexity | 538        | 6                         |
CPM: cost in USD per 1,000 requests.
### About the benchmark
This [benchmark](https://github.com/Ayanami0730/deep_research_bench) contains 100 expert-level research tasks designed by domain specialists across 22 fields, primarily Science & Technology, Business & Finance, and Software Development. It evaluates AI systems' ability to produce rigorous, long-form research reports on complex topics requiring cross-disciplinary synthesis. Results are reported from the subset of 50 English-language tasks in the benchmark. Read the [blog](https://parallel.ai/blog/deep-research-benchmarks).
### Methodology
- **Dates**: All measurements were made between 08/11/2025 and 08/29/2025.
- **Win Rate**: Calculated by comparing [RACE](https://github.com/Ayanami0730/deep_research_bench) scores in direct head-to-head evaluations against reference reports (sketched below).
- **Configurations**: For all competitors, we report the highest numbers we were able to achieve across multiple configurations of their APIs. The exact GPT-5 configuration is high reasoning, high search context, and high verbosity.
- **Excluded API Results**: Exa Research Pro (0% win rate), Claude Opus 4.1 (0% win rate).
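As a rough illustration of the win-rate calculation, assuming each report has already been scored by the RACE evaluator (the helper names here are hypothetical):

```python
# Hypothetical win-rate computation over pre-computed RACE scores.
# candidate_scores and reference_scores map task IDs to RACE scores
# produced by the deep_research_bench evaluator.

def win_rate(candidate_scores: dict[str, float],
             reference_scores: dict[str, float]) -> float:
    """Percentage of tasks where the candidate report outscores the reference."""
    wins = sum(
        candidate_scores[task] > reference_scores[task]
        for task in candidate_scores
    )
    return 100.0 * wins / len(candidate_scores)

# Example: a 96% win rate means beating the reference on 48 of the 50 tasks.
```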
### WISER-Atomic
| Series   | Model          | Cost (CPM) | Accuracy (%) |
| -------- | -------------- | ---------- | ------------ |
| Parallel | Core           | 25         | 77           |
| Parallel | Base           | 10         | 75           |
| Parallel | Lite           | 5          | 64           |
| Others   | o3             | 45         | 69           |
| Others   | 4.1 mini low   | 25         | 63           |
| Others   | gemini 2.5 pro | 36         | 56           |
| Others   | sonar pro high | 16         | 64           |
| Others   | sonar low      | 5          | 48           |
CPM: cost in USD per 1,000 requests.
### About the benchmark
This benchmark, created by Parallel, contains 121 questions intended to reflect real-world web research queries across a variety of domains. Read our blog [here](https://parallel.ai/blog/parallel-task-api).
### Steps of reasoning
- 50% Multi-Hop questions
- 50% Single-Hop questions
### Distribution
- 40% Financial Research
- 20% Sales Research
- 20% Recruitment
- 20% Miscellaneous
## Search, built for AIs
The most accurate search tool for bringing web context to your AI agents
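As a hedged sketch of what an agent-side call to such a search endpoint could look like (the endpoint path, parameters, auth header, and response shape below are assumptions for illustration, not the documented API contract):

```python
# Illustrative sketch only: the endpoint path, request fields, auth header,
# and response shape are assumptions, not the documented Parallel API.
import os
import requests

resp = requests.post(
    "https://api.parallel.ai/v1beta/search",      # assumed endpoint
    headers={"x-api-key": os.environ["PARALLEL_API_KEY"]},
    json={
        "objective": "Find the current CEO of each major EV battery maker",
        "max_results": 10,                         # assumed parameter
    },
    timeout=30,
)
resp.raise_for_status()
for result in resp.json().get("results", []):     # assumed response shape
    print(result)
```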
## The most accurate deep and wide research
Run deeper and more accurate research at scale, for the same compute budget
## Build a dataset from the web
Define your search criteria in natural language, and get back a structured table of matches
## Custom web enrichment
Bring existing data, define output columns to research, and get fresh web enrichments back
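Both this and the dataset workflow above follow the same pattern: natural-language research instructions plus a declared output schema. A minimal sketch of that pattern, where the endpoint, processor name, and request fields are assumptions rather than the documented API:

```python
# Illustrative enrichment sketch: the endpoint, processor name, and request
# fields are assumptions modeled on a generic task-style API.
import os
import requests

row = {"company": "Acme Robotics", "domain": "acme-robotics.example"}

task = {
    "input": row,
    "processor": "core",  # assumed tier name; choose per accuracy/cost needs
    "task_spec": {
        # Output columns to research, declared as a JSON schema.
        "output_schema": {
            "type": "object",
            "properties": {
                "ceo": {"type": "string", "description": "Current CEO"},
                "headcount": {"type": "integer", "description": "Employees"},
            },
            "required": ["ceo", "headcount"],
        }
    },
}

resp = requests.post(
    "https://api.parallel.ai/v1/tasks/runs",      # assumed endpoint
    headers={"x-api-key": os.environ["PARALLEL_API_KEY"]},
    json=task,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # structured enrichment for the input row
```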
## Monitor any event on the web
Continuously monitor for any changes on the web
## Towards a programmatic web for AIs
Parallel is building new interfaces, infrastructure, and business models for AIs to work with the web
