#LLM Evaluation
3 articles with this tag
AI Research
LLMs Fail Esoteric Code Tasks
Frontier LLMs show a dramatic capability gap on a new benchmark built on esoteric programming languages, revealing a reliance on memorization over reasoning.
21 days ago
Artificial Intelligence
Balyasny's AI Engine
Balyasny Asset Management built a powerful AI research engine using OpenAI models, slashing analysis times and boosting investment team confidence.
26 days ago
Technology
Context-Aware Guardrails Tested
Mozilla.ai tested context-aware guardrails for LLMs in a humanitarian context, revealing crucial multilingual performance disparities and the need for robust, domain-specific safety policies.
about 2 months ago