Discussion about this post

Kevin Yu Chen Hou

Defining what is 'good' seems truly non-trivial, and at times impossible, in a field like mental health.

In emergency psychiatry, you often hear that a clinician's ability to predict suicide risk is no better than chance. That is not a sign of a bad clinician; humans are just insanely complicated.

It's still worth exploring what guardrails can be built, and also building non-goodharted, cross-cultural evals (particularly for the countries outside the US and China that don't have their own foundation models).

It sounds like your evals parallel some work we've done recently called SIM-VAIL (Weilnhammer et al., 2026), which uses Anthropic's Petri to red-team with psychiatrist-crafted Phenotype-Intention personas.

Keen to follow how this work progresses - I'm hopeful the accumulation of efforts will converge on actionable steps toward building more aligned LLMs.

John Lund

Thank you for bringing up the need for a diversity of cultural values in LLM evals (and behavior)! We have to start incorporating true diversity into our models or we risk an unprecedented flattening of society.
