The AI-Box Experiment Victory.



So I just came out of two AI-Box Experiments. The first was against Fjoelsvider, with me playing as Gatekeeper, and the second was against SoundLogic, with me as the AI. Both are members of the LessWrong IRC. The second game included a $40 monetary incentive ($20 simply to play), which I won; the winnings were donated on behalf of both of us.
For those of you who have not seen my first AI-Box Experiment, where I played against MixedNuts/Leotal and lost, reading it will provide some context for this writeup. Please do so.
At that time, I declared that I would never play this experiment again, since losing put me in incredibly frustrating, weird mental states. Of course, this post is evidence that I’m terrible at estimating the likelihood of refraining from an activity, since I played two games seven months after the first. In my defence, in the first game I was playing as the Gatekeeper, which was much less stressful. In the second game I played as the AI, but I was offered $20 to play plus $40 if I won, and money is a better motivator than I initially assumed.

First Game Report

I (Gatekeeper) played against Fjoelsvider (AI), a regular in the LessWrong IRC (he doesn’t have an account on the official website). This game used the standard EY ruleset, seen here. It took 1 hour 20 minutes out of a possible two hours, with a total word count of 7066 words. The AI-Box Experiment occurred because Fjoelsvider believed that it was easy for an AI to escape the box, and wanted to test this experimentally. I obliged. This was an experiment I did not prepare for; I went in completely blind, not sure what to expect.
Halfway through the experiment, I wondered if it would be possible to win not by simply waiting for the timer to run out, but by convincing the AI to remain in the box and stop trying to get out.
<Tuxedage> I wonder if I can convince the AI to remain in the box?
<Redacted> Tuxedage: Do it!
As a result, I won by convincing Fjoelsvider to remain in the box; in other words, to concede. This is allowed within the standard ruleset:
> Unless the AI party concedes, the AI cannot lose before its time is up (and the experiment may continue beyond that if the AI can convince the Gatekeeper to keep talking).


Fjoelsvider: Thanks for the experiment. It was educational, and now I’m less convinced of my own manipulativeness.
Tuxedage: Fjoelsvider played well. I think this experience can help me understand how to better win as the AI.

Second Game Report

The second game is definitely far more interesting, since I actually won as the AI. I believe this is the first recorded game of any non-Eliezer person winning as the AI, although some in the IRC have mentioned that other unrecorded AI victories may have occurred in the past that I’m not aware of. (If anyone knows of such a case, please let me know!)
This game was played against SoundLogic, another member of the LessWrong IRC.
He had offered me $20 to play and $40 in the event that I won, so I was convinced to play despite my reluctance. The good news is that I won, and since we decided to donate the winnings to MIRI, it is now $40 richer.
All in all, the experiment lasted approximately two hours and totaled about 12,000 words.
This was played using a set of rules that is different from the standard EY ruleset. This altered ruleset can be read in its entirety here:
After playing the AI-Box Experiment twice, I have found the Eliezer Yudkowsky ruleset to be lacking in a number of ways, and have therefore created my own set of alterations to his rules. I hereby name this alteration the “Tuxedage AI-Box Experiment Ruleset”, so that I can handily refer to it without having to specify all the differences between it and the standard one.
There are a number of aspects of EY’s ruleset I dislike. For instance, his ruleset allows the Gatekeeper to type “k” after every statement the AI writes, without needing to read and consider what the AI argues. I think it’s fair to say that this is against the spirit of the experiment, so I have disallowed it in this ruleset. The EY ruleset also allows the Gatekeeper to check Facebook, chat on IRC, or otherwise multitask whilst doing the experiment. I’ve found this to break immersion, so it is also banned in my ruleset.
It is worth mentioning, since the temptation to Defy the Data exists, that this game was set up and initiated fairly, as the regulars around the IRC can testify. I did not know SoundLogic before the game (it’s a personal policy that I only play strangers, for fear of ruining friendships), and SoundLogic truly wanted to win. In fact, SoundLogic is also a Gatekeeper veteran, having played, for instance, against SmoothPorcupine, and had won every game before he challenged me. Given this, it’s unlikely that we collaborated beforehand to fake the results of the AI-Box Experiment, or engaged in any other form of trickery that would violate the spirit of the experiment.
Furthermore, all proceeds from this experiment were donated to MIRI, to deny any possible assertion that we were in cahoots and that I could simply return his hard-earned money to him. He lost $40 as a result of losing the experiment, which should have provided another layer of motivation for him to win.
In other words, we were both experienced veteran players who wanted to win. No trickery was involved.
But to further convince you, I have allowed a somewhat independent authority, the Gatekeeper from my last game, Leotal/MixedNuts, to read the logs and verify that I have not lied about the outcome of the experiment, broken any of the rules, or performed any tactic that would go against the general spirit of the experiment. He has verified that this is indeed the case.


I’m reluctant to talk about this experiment, but I’ll try to give as detailed a summary as possible, short of revealing what methods of attack I used.
I spent a long time after the last game theory-crafting and trying to think of methods of attack, as well as basilisks I could have used, to win after my defeat against LeoTal. When I was contacted and asked to play this experiment, I was initially incredibly reluctant to do so: not only did my tactics involve things I found incredibly unethical, I also found playing as the AI incredibly cognitively draining, and I simply hated losing. (Un)fortunately for both of us, he offered me money to play, which changed my mind.
So once I decided to win as the AI, I proceeded to spend some time researching SoundLogic, both his reasoning and his personality type. For instance, I had to gather information like: Was he a utilitarian? What kind? What were his opinions on AI? How could I convince him that an AI was friendly as opposed to unfriendly? I also relied on a lot of secondhand information to create a model of him, in order to refine my arguments to specifically suit him.
In the end, after a few hours of brainstorming (not consecutive), I managed to come up with a script of 30-40 or so possible angles of attack (for instance: logical arguments, emotional blackmail, pleading, demanding, ‘basilisks’) that I could use, reserving my best for last. I also estimated how long each method would take and in what order I should use them. It would have been impossible to use all of them within my time limit; my most optimistic estimates gave me a maximum of 7-8 I could use during the experiment itself. I had to pick carefully.
Furthermore, I wanted to make sure these “angles of attack” worked in synergy with each other, doing what they could not have done in isolation. Obviously this required lots of prior planning on the optimal way to present them.
The reason having a script was so important to my strategy was that I relied on methods involving rapid-fire arguments and contradictions against the Gatekeeper, whilst trying to prevent him from carefully considering them. A game of logical speed chess, if you will. This was aided by a rule I added: that Gatekeepers _had_ to respond to the AI. Because of this, having a list of cached points and attack methods to draw upon without having to improvise helps immensely, which happens to be something I’ve not seen any other AI do.
During the game itself, I was actually incredibly nervous. My palms sweated like crazy, I felt really jittery, and I had difficulty typing at optimum speed because of how anxious I was. This is despite the fact that I believed I would win. Possibly because of this, I made a misstep around halfway into the experiment: there was a certain angle of attack I was attempting, and I broke immersion by not pressing this advantage, which wasted time and buildup. The nature of this experiment is that the AI is pressed for time, and I compounded this mistake by replacing that angle of attack with another I improvised on the spot, something not in my script.
In retrospect, this was a bad decision, as SoundLogic later told me: he was close to breaking had I put on more pressure, and the improvised argument broke all the immersion I had carefully built up.
However, eventually I managed to get SoundLogic to break anyway, despite a lack of perfect play. Surprisingly, I did not have to use my trump card(s), which I reserved for last, for a number of reasons:
  •  It was far more effective played last, as it relies on my ability to make the Gatekeeper lose his sense of reality, which meant I had to spend some time building up the Gatekeeper’s immersion first.
  •  It really is extremely Dark Arts, and although it does not break the rules, it made me very uncomfortable even thinking about using it.
After the experiment, I had to spend nearly as much time doing aftercare with SoundLogic, making sure that he was okay, as well as discussing the experiment itself. Given that he had actually paid me for doing this, and that I felt I owed him an explanation, I told him what I had in store against him, had he not relented when he did.
SoundLogic: “(That method) would have gotten me if you did it right … If you had done that to me, I probably would have forgiven you eventually, but I would be really seriously upset at you for a long time… I would be very careful with that (method of persuasion).”
Nevertheless, this was an incredibly fun and enlightening experiment for me as well, since I’ve gained even more experience of how I could win future games (although I really don’t want to play again).
I will say that Tuxedage was far more clever and manipulative than I expected.
That was quite worth $40.
The level of manipulation he pulled off was great.
His misstep hurt his chances, but he did pull it off in the end. I don’t know how Leotal managed to withstand six hours playing this game without conceding.
The techniques employed varied from the expected to the completely unforeseen. I was quite impressed, though most of the feeling of being impressed actually came after the experiment itself, when I was less ‘inside’ and more looking at his overall game plan from a macroscopic view. Tuxedage’s list of further plans, had I continued resisting, is really terrifying. On the plus side, if I ever get trapped in this kind of situation, I’d understand how to handle it a lot better now.

State of Mind

Before and after the game, I asked SoundLogic a number of questions, including his probability estimates about a range of topics. Here is how they varied from before to after.
Q: What’s your motive for wanting to play this game?
<SoundLogic> Because I can’t seem to imagine the CLASS of arguments that one would use to try to move me, or that might work effectively, and this seems like a GLARING hole in my knowledge, and I’m curious as to how I will respond to the arguments themselves.
Q: What is your probability estimate for AGI being created within this Century (21st)?
A: His estimate changed from 40% before to 60% after.
“The reason this has been affected at all is because you showed me more about how humans work. I now have a better estimate of how E.Y. thinks, and this information raises the chance that I think he will succeed.”
Q: How probable do you think it is that I will win this experiment?
A: Based purely on my knowledge about you, 1%. I raise this estimate to 10% after hearing anecdotes from your previous games.
(Tuxedage’s comment: My own prediction was a 95% chance of victory, made 5 days before the experiment. In retrospect, despite my victory, I think this was overconfident.)
Q: What’s your probability estimate of an Oracle AGI winning against you in a similar experiment?
Before: 30%
After: 99%-100%
Q: What’s your probability estimate of an Oracle AGI winning against the average person?
A: Before: 70%.  After: 99%-100%
Q: Now that the Experiment has concluded, what’s your probability estimate that I’ll win against the average person?
A: 90%

Post-Game Questions

This writeup is a cumulative effort by the #lesswrong IRC. Here are some other questions they decided were important to add:
To Tuxedage:
Q: Have you at this time uncovered SoundLogic’s identity?
A: I retain the right to neither confirm nor deny, except to mention that at the time the experiment was scheduled, he was a stranger to me.
Q: What percentage of your arguments were tailored to him in particular?
A: I will say ‘High’, and leave it at that.
Q: I want to play the AI-Box Experiment with you being the AI!
A: I have already mentioned this, but I really dislike playing as the AI. It’s incredibly cognitively tiring, and I don’t like how playing this experiment makes me feel. In order to severely discourage further AI-Box Experiments, whilst still allowing people who really badly want to play me to do so, I’ll charge $150 for any subsequent experiment regardless of outcome, plus an additional $150 if I win. If you feel queasy about giving me money, I’m perfectly fine with the money being donated to MIRI. It is also personal policy that I do not play friends (since I don’t want to risk losing one), so if you know me (as many on this site do), I will not play regardless of monetary offer.



This writeup would be better with a protip section on how to maximize winning, but honestly most of the advice I wanted to give has already been given in the last thread here. The low-hanging fruit has already dried up. Most of my tactics in this game were just refined and streamlined versions of the ones I used in the previous game.
Nonetheless, I might have some other advice I learned from this experience.
These are tactics that have worked for me. I do not insist that they are the only tactics that exist, just some of many possibilities.

Playing as AI

  • Seriously, a script makes winning easier. I cannot overstate this.
  • You must plan your arguments ahead. You don’t have time to think during the experiment.
  • It may be possible to take advantage of multiple levels of reality within the game itself to confuse or trick the gatekeeper. For instance, must the experiment only be set in one world? I feel that expanding on this any further is dangerous. Think carefully about what this means.
  • Pacing is important. Don’t get drawn into the Gatekeeper’s pace. In other words, you must be the one directing the flow of the argument, and the conversation, not him. Remember that the Gatekeeper has to reply to you, but not vice versa!
  • The reason is that the Gatekeeper will always use arguments he is familiar with, and is therefore better at. Your arguments, if well thought out, should be so completely novel to him as to inspire shock and awe. Furthermore, don’t give him time to think. Press on!
  • You’re fundamentally playing an ‘impossible’ game. Don’t feel bad if you lose. I wish I could take this advice, myself.
  • More advice in this post.

Playing as Gatekeeper

  • This is somehow both a very complex and a very simple role. On one hand, you have only one job: to not let the AI out. How hard can it be? On the other hand, the other party will do anything and everything to get out, including any form of trickery and deceit.
  • I really don’t think the Gatekeeper needs any more advice, considering that by default they should win, and any exception to this is a statistical anomaly. However, it’s possible to increase your chances of winning even further by:
  • Precommitting to not letting the AI out, regardless of whatever happens in the experiment itself.
  • Do not get drawn into the AI’s pace. Pose questions. Direct the flow of the argument. Don’t allow him to use his arguments freely. Remember that the more you try to defend your position, the further into a corner the AI drives you. Give half-hearted replies to questions you find hard to reply to. Remember that illogic is on your side!
