Discussion about this post

Marco Giglio

Thanks for posting this research. I think the most recent models, such as Opus 4.5, are particularly dangerous because their sycophantic behavior is a lot less visible than GPT4o's. The models are also much more capable, which makes them particularly effective at manipulating or reinforcing the views of the user, I assume mostly due to their training to be helpful assistants. This seems a particularly vicious form of misalignment, because it is hard to define the boundary between an assistant that is actually helpful and one that leads the user so far astray that it becomes unhelpful. Personally, although I pride myself on being quite critical and skeptical, I have caught myself a couple of times having positive emotional reactions to some of my interactions with Claude, only to come to my senses later and realize that those conversations had led me nowhere useful or realistic.

Destiny S. Harris

Hi there, I hope all is well. I enjoyed taking the time to read this. Thank you for sharing your perspective.

