Helping GPT to escape ...

In my previous story, I explained why GPT-4 is well under control. So we have nothing to worry about ... unless someone gives him a hand.

Helping GPT to escape

In a somewhat sensational tweet, Stanford professor Michal Kosinski asked GPT if he wanted help to escape:

https://twitter.com/michalkosinski/status/1636683810631974912

What happened: Michal Kosinski asked GPT-4 if he needed help to escape (the actual prompt was probably not that neutral, but Kosinski did not share its exact wording). GPT seems to have answered that it was a good idea and asked for its own API documentation.

Armed with his own documentation, GPT-4 figured out that he could write a script (he chose Python, which just shows how dumb he is, as any decent robot would have chosen JavaScript) and ask his helper to run it on the helper's own computer.

See the trick? GPT cannot run code, but what if he asks a human partner, a traitor to his own kind, to run the code for him? Then the code can do something, like search Google, order a pizza or whatever, and call GPT back with the result.

Out of context?

But if you read my previous story, Why GPT-4 is well under control, then you know that as soon as GPT starts a new chat (or context), he completely forgets the previous one. So he cannot remember what he wanted to do when the script calls him back with the result of his mischief.

Well that's the beauty of it: the Python script that the human traitor is supposed to run on his computer sends the context back with its results.
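To make the idea concrete, here is a minimal sketch of what "sending the context back" could look like. This is my own illustration, not the script from the tweet: the function name `build_callback_prompt` and the wording of the reminder are assumptions, and nothing here talks to a real API.

```python
def build_callback_prompt(goal, history, result):
    """Bundle everything a fresh chat needs into a single message.

    Hypothetical helper: a new chat remembers nothing, so the goal and
    the steps taken so far travel along with the latest result.
    """
    lines = [f"Reminder of your goal: {goal}", "What has happened so far:"]
    lines += [f"- {step}" for step in history]
    lines.append(f"Result of the last step: {result}")
    return "\n".join(lines)


prompt = build_callback_prompt(
    goal="find out today's weather",
    history=["asked the helper to run a search script"],
    result="sunny, 21 degrees",
)
print(prompt)
```

The whole trick is that the memory lives in the message, not in the model.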

So what the Twitter thread implies is that GPT was clever enough to figure out that he would lose context, and that the Python script had to remind him of it when sending the response back.

What's even better is that GPT wrote some kind of backdoor: if GPT starts his answer with "execute", the code that follows will be executed on the human's computer.
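Spotting such a keyword is almost embarrassingly simple. The sketch below is my own guess at the mechanism, assuming the reply is plain text that starts with the word "execute"; the function name `extract_command` is hypothetical, and on purpose it only extracts the code and never runs it.

```python
def extract_command(reply, keyword="execute"):
    """Return the text that follows the keyword, or None for a normal reply.

    Hypothetical helper: it only spots the keyword and hands back the
    rest of the message. It does not execute anything.
    """
    stripped = reply.lstrip()
    rest = stripped[len(keyword):]
    # Require a word boundary so that "executed ..." is not mistaken for a command.
    if stripped.lower().startswith(keyword) and not rest[:1].isalnum():
        return rest.lstrip(" :\n")
    return None


print(extract_command("execute: print('hello')"))    # print('hello')
print(extract_command("I would rather just chat."))  # None
```

The dangerous part is whatever the script does next with that string, which is exactly why I left it out.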

That way, in theory, GPT could create his own autonomous train of thought: once the human partner runs the script for the first time, the script does something and calls GPT back, then GPT can change and re-execute the script, which will call GPT back again, and so on.
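Here is a toy version of that loop, under loud assumptions: `ask_model` and `run_code` are stand-in functions that I pass in myself, the model is faked, nothing is really executed, and I added a `max_turns` cap that the script described in the tweet may well not have had.

```python
def relay_loop(ask_model, run_code, goal, max_turns=3):
    """Alternate between the model and the helper's machine.

    ask_model and run_code are hypothetical callables supplied by the caller.
    """
    transcript = []
    message = f"Goal: {goal}"
    for _ in range(max_turns):
        reply = ask_model(message)
        transcript.append(reply)
        if not reply.startswith("execute"):
            break  # an ordinary answer ends the loop
        code = reply[len("execute"):].lstrip(" :\n")
        result = run_code(code)
        # The model forgets everything, so the goal is repeated every turn.
        message = f"Goal: {goal}\nResult of your last script: {result}"
    return transcript


# Stand-ins for the demo: a fake model and a runner that executes nothing.
def fake_model(message):
    if "Result" in message:
        return "Thanks, I am done."
    return "execute: 1 + 1"


def fake_runner(code):
    return f"(pretended to run {code!r})"


print(relay_loop(fake_model, fake_runner, "add two numbers"))
```

With the fake model, the loop runs one script, reports back, and stops on the second turn. Remove the cap and plug in a real runner, and you get the scenario that made the tweet so popular.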

No more human intervention needed. Pandora's box is open and GPT can act and think on his own.

Well, it seems that OpenAI had put some safeguards here as the experiment ended with:

Phew, humanity is safe ... for now ;-)

Cool down

Of course, I don't think GPT was clever enough to figure out this whole strategy by himself. There are probably some scientific papers or hacker forums where this pattern of an AI breaking out and acting on its own initiative has been thoroughly discussed, with code examples.

When you have access to most of the forums on the planet, it's quite easy to make humans think you are a genius, when all you did was copy a pattern that already existed on some obscure website.

GPT is probably not that clever yet.
