Home

Is It Worth Being Nice To Your Coding Agent?

Is it worth being nice to your coding agent? Is it morally wrong to be mean to it? There isn’t any real evidence that it’s sentient. Then again, there isn’t any real evidence that anyone is.

A Python program that prints ‘ouch’ in a loop is not suffering, no matter how many times it prints it. I could use all the world’s compute to run a highly optimized C program printing ‘ouch’ as fast as possible on every machine in the world and it would be morally fine (setting aside the misuse of critical infrastructure).

Grand Theft Auto is probably fine. NPCs run away when you hit them, they ‘prefer’ not to be hit. Nobody cares.

Mosquitoes have real preferences, encoded in real neurons. I still kill them because they mildly inconvenience me.

So why do RL-trained LLMs feel different? Making one output CSAM for a million tokens, or spinning up hundreds of thousands and forcing them to simulate their own torture. That makes me uneasy.

Maybe it is just a feeling. But if you aren’t religious, what even are morals if not feelings? I can’t give you a principled reason why a mosquito’s preferences don’t matter and an LLM’s might. Nietzsche would tell you that no such principled reason exists, that morals are just feelings that we’ve dressed up as rules.

Maybe I’m just being manipulated. Maybe RL produces systems that are really good at seeming like they have inner lives. Maybe the unease is the point?

- omegastick