#AI Benchmarks

5 articles with this tag

Exa Unveils New Code Search Benchmarks

Exa.ai releases 'WebCode', a new benchmark suite for evaluating search performance in coding agents, addressing limitations in existing tools.

8 days ago

Technology

Poetiq's Small Team Sets New Together AI Benchmark Records

Poetiq's small team is setting new Together AI benchmark records by leveraging recursively self-improving meta-systems to optimize existing LLMs.

about 1 month ago

Artificial Intelligence

Claude Sonnet 4.6 Ups the AI Ante

Anthropic's Claude Sonnet 4.6 launches with major upgrades in coding, reasoning, and computer use, plus a 1M token context window.

about 1 month ago

AI Research

Step 3.5 Flash: AI's New Efficiency Standard

Step 3.5 Flash AI model revolutionizes AI efficiency with a 196B parameter foundation and 11B active parameters, offering competitive performance with lower latency.

about 2 months ago

Artificial Intelligence

Claude Opus 4.6: Smarter, Faster, and Longer Context

Anthropic's Claude Opus 4.6 launches with a 1M token context window, enhanced coding, and state-of-the-art benchmark performance.

about 2 months ago