#Model Alignment

2 articles with this tag

Jason Wolfe on OpenAI Model Specs & Behavior

Jason Wolfe from OpenAI discusses the concept of 'model specs' and their role in guiding AI behavior, supporting transparency, and advancing the ongoing pursuit of safe and beneficial AI.

6 days ago

AI Research

RLAIF: Unpacking the Latent Value Hypothesis

The latent value hypothesis explains RLAIF by positing that pretraining encodes human values as directions in the model's representation space, which constitutional prompts then activate.

27 days ago