
OpenAI's Goblin Problem

What it taught me about my own.

Movie poster pastiche in the style of Gremlins, showing two hands holding a metal box with airholes. A small fuzzy creature peeks over the edge. Tagline reads "Cute. Clever. Mischievous. Intelligent. Dangerous." Title at the bottom: GOBLINS.

"Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures."

That's an actual line OpenAI had to add to GPT-5.5's system prompt.

They explained why this week: A "reward signal" in training kept scoring creature metaphors higher than alternatives. Goblins jumped 175% after GPT-5.1. They ignored it. Then the behavior spread to models that never had that personality.

Funny, right? Well, not really.

What if the goblin bug hadn't been caught early? What if goblin talk had started landing in customer outputs, buried thousands of words deep in a contract?

The goblin problem made me think of my own AI battle this week.

I'm building a script that rewrites SEO title tags and meta descriptions for client websites. The instructions are clear. The format is specific. The test cases are dialed in.

Three different models. Hours of iterating. One ignores half the instructions. One follows them but invents a detail about a company founder and buries it in a meta tag. One follows them on Tuesday and forgets them on Wednesday. I rewrite the prompt. It fixes one thing and breaks another.

I use examples and counter-examples and a tone of voice I think will land.

Wrong.

My goblins are mischievous. In testing, a defect almost made it into production: One of 50 meta descriptions for a client was loaded with a fabricated business location. Not on the page. Not anywhere.
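One cheap tripwire for this kind of fabrication: flag any capitalized term in the generated copy that never appears on the source page, since invented specifics like locations and founder names tend to show up capitalized. A minimal sketch, not my production validator, with invented example text:

```python
import re

def ungrounded_terms(page_text: str, generated: str) -> list[str]:
    """Return capitalized terms in the generated copy that never
    appear in the source page -- a crude fabrication tripwire."""
    page = page_text.lower()
    # Capitalized words/phrases are where fabricated specifics
    # (locations, names) tend to surface.
    candidates = re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b", generated)
    return [c for c in candidates if c.lower() not in page]

# Hypothetical client page and model output:
page = "Acme Plumbing serves the greater metro area with 24/7 service."
meta = "Acme Plumbing in Springfield offers 24/7 emergency service."
print(ungrounded_terms(page, meta))  # ['Springfield'] -- not on the page
```

It's noisy (sentence-initial words are capitalized too), but as a pre-flight check it turns a silent fabrication into a loud one.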

What else is lurking beneath the surface? The models are opaque. How many edge cases can we test? How many do we miss?

Goblins in a chatbot might be funny. The ones in production can be dangerous.
