LLM Judges Are Unreliable

CIP

Aug 15

How Positional Preferences, Order Effects, and Prompt Sensitivity Undermine Reliability in AI Judgments

1 Comment

AI Must Die

Sep 9 (edited)

this result is obvious to anyone who knows anything about language models. studies like these are conducted in bad faith to further legitimize the technology for these use cases under the (wrong) assumption that fundamental limitations can be satisfactorily corrected with further investment
