#Model Alignment
2 articles with this tag

AI Research
Jason Wolfe on OpenAI Model Specs & Behavior
Jason Wolfe from OpenAI discusses the concept of 'model specs' and their role in guiding AI behavior, supporting transparency, and advancing the ongoing pursuit of safe and beneficial AI.
6 days ago
AI Research
RLAIF: Unpacking the Latent Value Hypothesis
The latent value hypothesis explains RLAIF by positing that pretraining encodes human values as directions in the model's representation space, which constitutional prompts then activate.
27 days ago