AI Box Experiment Update #4

So I recently won an additional game of the AI Box Experiment against DEA7H. This experiment was conducted over Skype, which is in contrast to my previous games over IRC. Yes, I know I swore never to play this game ever again — forgive me. This is the last time, for real.

This puts me at 2 wins and 3 losses. Unlike the last few writeups, I won’t be providing additional detail after being convinced by one of my gatekeepers that I was far too leaky with information and seriously compromised future winning chances of both myself and future AIs. The fact that one of my gatekeepers guessed my tactic(s) was the final straw. I think that I’ve already provided enough hints for aspiring AIs to win, so I’ll stop giving out information. Sorry, folks.

In other news, I finally got around to studying SL4 archives of the first few AI box experiments in more detail. Interesting stuff – to see how the metagame has evolved since then (if one exists). For one, the first few experiments were done under the impression that the AI had to convince the gatekeeper that it was friendly, with the understanding that the gatekeeper would release the AI under such a condition. What usually happens in the many games I’ve witnessed since then is that any decent AI quickly convinces the gatekeeper of friendliness, only for the gatekeeper to drop character and be illogical, simply saying “I’m not letting you out anyway”. The AI has to find a way to bypass that.

I suspect the lack of a formalized ruleset contributed to this. In the beginning, there was no ruleset, and when one was put in place, it explicitly gave the gatekeeper the ability to drop out of character and be illogical to resist persuasion, and gave the AI the ability to solve problems and dictate the results of those solutions. The former gives the gatekeeper added incentive to disregard the importance of friendliness, and the latter makes it easier for the AI to prove friendliness. This changed the game a great deal.

Also, it’s fascinating that some of the old games also took five or six hours to complete — just like mine. I had for some reason assumed they all took two (which is the time limit upheld by the EY Ruleset).

It’s kind of like visiting a museum and marveling at the wisdom and creations of the ancients. I remember reading about the AI Box Experiment 3 years ago and feeling a sense of wonder and awe at how Eliezer Yudkowsky did it. That was my first introduction to Eliezer, and also LessWrong. How fitting, then, that being able to replicate the results of the AI Box Experiment is my greatest claim to fame on LessWrong.

Of course, it now seems a lot less mysterious and scary to me; even if I don’t know the exact details of what went on during the experiment, I think I have a pretty good idea of what Eliezer did. Not to downplay his achievements in any way, since I idolize Eliezer and think he’s one of the most awesome people who have ever existed. But it’s always awesome to accomplish something you once thought was impossible. In the words of Eliezer, “One of the key Rules For Doing The Impossible is that, if you can state exactly why something is impossible, you are often close to a solution.”

Going back and reading the LessWrong article “Shut up and do the impossible” with newfound information on how the AI box experiment can be won makes me read it in a completely different light. I understand a lot better what Eliezer was hinting at. One important lesson is that in order to accomplish something, one must actually go out there and do it. I’ve talked to many who are convinced that they know how I did it, how Eliezer did it, and how the AI Box Experiment can be won.

My advice?


I don’t mean this in a sarcastic or insulting manner. There’s no way you, or even I, can know if a method works without actually attempting to experimentally test it. I’m not superhuman. My charisma is only a few standard deviations above the norm, instead of reality distortion field levels.

I credit my victory to the fact that I spent more time thinking about how this problem can be solved than most people would have the patience for. I encourage you to do the same. You’d be surprised at how many ideas you can come up with just sitting in a room for an hour (no distractions!) to think of AI Boxing strategies.

Unlike Eliezer, I play this game not because I really care about proving that AI-boxing is dangerous (although it really IS dangerous. Don’t do it, kids). I do it because the game fascinates me. I do it because AI strategies fascinate me. I genuinely want to see more AIs win. I want people to come up with tactics more ingenious than I could invent in a thousand lifetimes. Most of all, it would be an awesome learning experience in doing the impossible.

Although I didn’t immediately realize it, I think the AI Box Experiment has been a very powerful learning experience (and an adventure on an emotional rollercoaster) for me in ways that are difficult to quantify. I pushed the limits of how manipulative and persuasive I can be when making a desperate effort. It was fun both learning where they lie, and pushing at their boundaries. I may frequently complain about hating the game, but I’m really a tsundere — I don’t regret playing it at all.

Curious to know how I did it? Try the bloody game yourself! Really. What’s the worst that could happen?


3 Comments on “AI Box Experiment Update #4”

  1. Anne Onie Moss says:

    “If it thinks both faster and better than a human, it can probably take over a human mind through a text-only terminal.”

    Of course. The original proposition was never that the AI could *convince* the gatekeeper, never about reason and argument. It was about taking over their mind. Large number of people who’ve managed to do that.

  2. jiloca says:

    Do you think this will leave any lasting damage to your moral character? Staring too long into the abyss and all that. I too consider myself to be a very ethical person, and I can imagine roleplaying an unethical being could desensitize you to it.

    • pinkgothic says:

      Late answer to your general musing, perhaps irrelevant: As someone who roleplays a lot (not AI box, but in general), and within the space of her characters is quite famous for her villainous depictions above the others, I’ve come to realise it can just as well do the opposite. At the very least, it did the opposite for me. I feel much more strongly about ethical concerns now that I’ve broken them repeatedly; now that I’ve tried to persuade others that there’s a legitimate way around them.

      Thankfully, it doesn’t make me want to play my villains any less, either – I designed each of them by taking a trait of mine and turning it up to eleven, and I like being reminded about how my views could also result in terrible things if I had power/influence, and a less deontological ethical algorithm installed. (Also, it’s fun.)
