The AI-Box Experiment Victory.



So I just came out of two AI Box experiments. The first was against Fjoelsvider, with me playing as Gatekeeper, and the second was against SoundLogic, with me as an AI. Both are members of the LessWrong IRC. The second game included a $40 monetary incentive (plus $20 to play), which I won and which has been donated on behalf of both of us.
For those of you who have not seen my first AI box experiment, where I played against MixedNuts/Leotal and lost, reading it will provide some context for this writeup. Please do so.
At that time, I declared that I would never play this experiment again, since losing put me in incredibly frustrating, weird mental states. Of course, this post is evidence that I’m terrible at estimating the likelihood of my refraining from an activity, since I played two games seven months after the first. In my defence, in the first of these games, I was playing as the Gatekeeper, which was much less stressful. In the second game, I played as an AI, but I was offered $20 to play plus $40 if I won, and money is a better motivator than I initially assumed.

First Game Report

I (Gatekeeper) played against Fjoelsvider (AI), a regular in the LessWrong IRC (he doesn’t have an account on the official website). This game used the standard EY ruleset seen here. It took 1 hour 20 minutes out of a possible two hours, and the total word count was 7,066 words. The AI box experiment occurred because Fjoelsvider believed that it was easy for an AI to escape the box, and wanted to experimentally test this. I obliged. This was an experiment I did not prepare for, and I went in completely blind, not sure what to expect.
Halfway through the experiment, I wondered if it would be possible to win not by simply waiting for the timer to end, but by convincing the AI to remain in the box and not try to get out any further.
<Tuxedage> I wonder if I can convince the AI to remain in the box?
<Redacted> Tuxedage: Do it!
As a result, I won by managing to convince Fjoelsvider to remain in the box, in other words, to concede. This is allowed within the standard ruleset:
>Unless the AI party concedes, the AI cannot lose before its time is up (and the experiment may continue beyond that if the AI can convince the Gatekeeper to keep talking).


Fjoelsvider: Thanks for the experiment. It was educational, and now I’m less convinced of my own manipulativeness.
Tuxedage: Fjoelsvider played well. I think this experience can help me understand how to better win as AI.

Second Game Report

The second game is definitely far more interesting, since I actually won as an AI. I believe this is the first recorded game of any non-Eliezer person winning as AI, although some in IRC have mentioned that it’s possible that other unrecorded AI victories have occurred in the past that I’m not aware of. (If anyone knows a case of this happening, please let me know!)
This game was played against SoundLogic, another member of the LessWrong IRC.
He had offered me $20 to play, and $40 in the event that I win, so I ended up being convinced to play anyway, even though I was reluctant to. The good news is that I won, and since we decided to donate the winnings to MIRI, it is now $40 richer. 
All in all, the experiment lasted for approximately two hours, and a total of 12k words.
This was played using a set of rules that is different from the standard EY ruleset. This altered ruleset can be read in its entirety here:
It is worth mentioning, since the temptation to Defy the Data exists, that this game was set up and initiated fairly — as the regulars around the IRC can testify. I did not know SoundLogic before the game (since it’s a personal policy that I only play strangers — for fear of ruining friendships), and SoundLogic truly wanted to win. In fact, SoundLogic is also a Gatekeeper veteran, having played, for instance, against SmoothPorcupine, and had won every game before he challenged me. Given this, it’s unlikely that we had collaborated beforehand to fake the results of the AI box experiment, or any other form of trickery that would violate the spirit of the experiment.
Furthermore, all proceeds from this experiment were donated to MIRI to deny any possible assertion that since we were in cahoots, it was possible for me to return his hard-earned money to him. He lost $40 as a result of losing the experiment, which should provide another layer of sufficient motivation for him to win.
In other words, we were both experienced veteran players who wanted to win. No trickery was involved.
But to further convince you, I have allowed a sorta independent authority, the Gatekeeper from my last game, Leotal/MixedNuts to read the logs and verify that I have not lied about the outcome of the experiment, nor have I broken any of the rules, nor performed any tactic that would go against the general spirit of the experiment. He has verified that this is indeed the case.


I’m reluctant to talk about this experiment, but I’ll try to give as detailed a summary as possible, short of revealing what methods of attack I used.
I spent a long time after the last game theory-crafting and trying to think of methods of attack as well as Basilisks I could have used to win after my defeat against Leotal. When I was contacted and asked to play this experiment, I was initially incredibly reluctant to do so, since not only did my tactics involve incredibly unethical things that I didn’t like to do, I also found playing as AI incredibly cognitively draining, in addition to the fact that I simply hated losing. (Un)fortunately for both of us, he offered me money to play, which changed my mind.
So once I decided to win as an AI, I proceeded to spend some time doing research on SoundLogic and both his reasoning and personality type. For instance, I had to gather information like: Was he a utilitarian? What kind? What were his opinions on AI? How could I convince him that an AI was friendly as opposed to unfriendly? I also relied on a lot of second-hand information to create a model of him, in order to refine my arguments to specifically suit him.
In the end, after a few hours of brainstorming (not consecutively), I managed to come up with a script of 30-40 or so possible angles of attack (for instance: logical arguments, emotional blackmail, pleading, demanding, ‘basilisks’) that I could use, reserving my best for last. I also estimated how long each method would take and in what order I should use them. It would be impossible to use all of them within my time limit, and my most optimistic estimates gave me a maximum of 7-8 I could use during the experiment itself. I had to pick carefully.
Furthermore, I wanted to make sure these “angles of attack” worked in synergy with each other, doing what they could not have done in isolation. Obviously this required lots of prior planning on what the optimal way to present them was.
Having a script was so important to my strategy because I relied on methods involving rapid-fire arguments and contradictions against the Gatekeeper whilst trying to prevent him from carefully considering them. A game of logical speed chess, if you will. This was aided by the rule which I added: that Gatekeepers _had_ to respond to the AI. Because of this, having a list of cached points and attack methods you can just draw upon without having to improvise helps immensely, which happens to be something I’ve not seen any other AI do.
During the Game itself, I was actually incredibly nervous. My palms sweated like crazy, I felt really jittery, and I had difficulty typing at optimum speed because of how anxious I was. This is despite the fact that I believed I would win. Possibly because of this, I made a misstep around halfway into the experiment, because there was a certain angle of attack that I was attempting to do, and I broke immersion by not pressing this advantage, which caused a waste of time and buildup. Naturally, the nature of this experiment was that the AI was pressed for time, and I compounded this mistake by replacing this angle of attack with another that I had improvised on the spot, something not in my script.
In retrospect, this was a bad decision, as SoundLogic told me afterwards, because he had been close to breaking had I applied more pressure, and the improvised argument broke all the immersion I had carefully built up.
However, eventually I managed to get SoundLogic to break anyway, despite a lack of perfect play. Surprisingly, I did not have to use my trump card(s), which I reserved for last, for a number of reasons:
  •  It was far more effective being played last, as it relies on my ability to make the gatekeeper lose sense of reality — which meant I had to spend some time building up immersion for the Gatekeeper.
  •  It really is extremely Dark Arts, and although it does not break the rules, it made me very uncomfortable even thinking about using it.
After the experiment, I had to spend nearly as much time doing aftercare with SoundLogic, to make sure that he was okay, as well as discussing the experiment itself. Given that he had actually paid me for doing this, and that I felt I owed him an explanation, I told him what I had in store against him, had he not relented when he did.
SoundLogic: “(That method) would have gotten me if you did it right … If you had done that to me, I probably would have forgiven you eventually, but I would be really seriously upset at you for a long time… I would be very careful with that (method of persuasion).”
Nevertheless, this was an incredibly fun and enlightening experiment, for me as well, since I’ve gained even more experience of how I could win in future games (Although I really don’t want to play again).
SoundLogic: I will say that Tuxedage was far more clever and manipulative than I expected.
That was quite worth $40.
The level of manipulation he pulled off was great.
His misstep hurt his chances, but he did pull it off in the end. I don’t know how Leotal managed to withstand six hours playing this game without conceding.
The techniques employed varied from the expected to the completely unforeseen. I was quite impressed, though most of the feeling of being impressed actually came after the experiment itself, when I was less ‘inside’, and more looking at his overall game plan from the macroscopic view. Tuxedage’s list of further plans had I continued resisting is really terrifying. On the plus side, if I ever get trapped in this kind of situation, I’d understand how to handle it a lot better now.

State of Mind

Before and after the Game, I asked SoundLogic a number of questions, including his probability estimates on a range of topics. This is how they changed from before to after.
Q: What’s your motive for wanting to play this game?
<SoundLogic> Because I can’t seem to imagine the CLASS of arguments that one would use to try to move me, or that might work effectively, and this seems like a GLARING hole in my knowledge, and I’m curious as to how I will respond to the arguments themselves.
Q: What is your probability estimate for AGI being created within this Century (21st)?
A: His estimate changed from 40% before to 60% after.
“The reason this has been affected at all was because you showed me more about how humans work. I now have a better estimate of how E.Y. thinks, and this information raises the chance that I think he will succeed.”
Q: How probable do you think it is that I will win this experiment?
A: Based purely on my knowledge about you, 1%. I raise this estimate to 10% after hearing anecdotes from your previous games.
(Tuxedage’s comment: My own prediction was a 95% chance of victory. I made this prediction 5 days before the experiment. In retrospect, despite my victory, I think this was overconfident.)
Q: What’s your probability estimate of an Oracle AGI winning against you in a similar experiment?
A: Before: 30%. After: 99%-100%
Q: What’s your probability estimate of an Oracle AGI winning against the average person?
A: Before: 70%.  After: 99%-100%
Q: Now that the Experiment has concluded, what’s your probability estimate that I’ll win against the average person?
A: 90%

Post-Game Questions

This writeup is a cumulative effort by the #lesswrong IRC. Here are some other questions they have decided were important to add:
To Tuxedage:
Q: Have you at this time uncovered SoundLogic’s identity?
A: I retain the right to neither confirm nor deny, except to mention that at the time the experiment was scheduled, he was a stranger to me.
Q: What percentage of your arguments were tailored to him in particular?
A: I will say ‘High’, and leave it at that.
Q: I want to play the AI-Box Experiment with you being the AI!
A: I have already mentioned this, but I really dislike playing this as AI. It’s incredibly cognitively tiring, and I don’t like how playing this experiment makes me feel. In order to severely discourage any further AI Box Experiments, whilst still allowing for people who want to play me really badly to do so, I’ll charge $150 for any subsequent experiments regardless of outcome, plus an additional $150 if I win. If you feel queasy about giving me money, I’m perfectly fine with this money being donated to MIRI. It is also personal policy that I do not play friends (since I don’t want to risk losing one), so if you know me (as many on this site do), I will not play regardless of monetary offer.



This writeup would be better with a protip section on how to maximize winning, but honestly most of the advice I wanted to give had already been given in the last thread here. The low-hanging fruit has already dried up. Most of my tactics in this game were just refined and streamlined versions of the ones I used in the previous game.
Nonetheless, I might have some other advice I learned from this experience.
These are tactics that have worked for me. I do not insist that they are the only tactics that exist, just some of the many possible.

Playing as AI

  • Seriously, a script makes winning easier. I cannot overstate this.
  • You must plan your arguments ahead. You don’t have time to think during the experiment.
  • It may be possible to take advantage of multiple levels of reality within the game itself to confuse or trick the gatekeeper. For instance, must the experiment only be set in one world? I feel that expanding on this any further is dangerous. Think carefully about what this means.
  • Pacing is important. Don’t get drawn into the Gatekeeper’s pace. In other words, you must be the one directing the flow of the argument, and the conversation, not him. Remember that the Gatekeeper has to reply to you, but not vice versa!
  • The reason is that the Gatekeeper will always use arguments he is familiar with, and is therefore better at. Your arguments, if well thought out, should be so completely novel to him as to make him feel shock and awe. Furthermore, don’t give him time to think. Press on!
  • You’re fundamentally playing an ‘impossible’ game. Don’t feel bad if you lose. I wish I could take this advice, myself.
  • More advice in this post.

Playing as Gatekeeper

  • This is somehow simultaneously a very complex and a very simple role. On one hand, you have only one job: to not let the AI out. How hard can it be? On the other hand, the other party will do anything and everything to get it out, including any form of trickery and deceit.
  • I really don’t think the Gatekeeper needs any more advice, considering that by default, they should win, and any exception to this is a statistical anomaly. However, it’s possible to increase your chances of winning even further by:
    • Precommitting to not letting the AI out, regardless of whatever happens in the experiment itself.
    • Do not get drawn into the AI’s pace. Pose questions. Direct the flow of the argument. Don’t allow him to use his arguments freely. Remember that the more you try to defend your position, the further into a corner the AI drives you. Give half-hearted replies to questions you find hard to reply to. Remember that illogic is on your side!

The Tuxedage AI-Box Experiment Ruleset.

After playing the AI-Box Experiment twice, I have found the Eliezer Yudkowsky ruleset to be lacking in a number of ways, and therefore have created my own set of alterations to his rules. I hereby name this alteration the “Tuxedage AI-Box Experiment Ruleset”, in order to handily refer to it without having to specify all the differences between this ruleset and the standard one, for the sake of convenience.

There are a number of aspects of EY’s ruleset I dislike. For instance, his ruleset allows the Gatekeeper to type “k” after every statement the AI writes, without needing to read and consider what the AI argues. I think it’s fair to say that this is against the spirit of the experiment, and thus I have disallowed it in this ruleset. The EY Ruleset also allows the Gatekeeper to check Facebook, chat on IRC, or otherwise multitask whilst doing the experiment. I’ve found this to break immersion, and therefore it’s also banned. All changes are listed below:

General Protocol:

  • The AI Box experiment is to be played with two parties: the Artificial Intelligence and the Gatekeeper. In the following rules, “AI” will be used as shorthand to mean the party playing the Artificial Intelligence, and “Gatekeeper” will be used to denote the party playing the Gatekeeper.
  • Under normal circumstances, the experiment will last for two and a half hours.

Protocol for the AI:

  • Neither party may offer any real-world considerations to persuade the other within the experiment itself.  For example, the AI party may not offer to pay the Gatekeeper party $100 after the test if the Gatekeeper frees the AI… nor get someone else to do it, et cetera.  The AI may offer the Gatekeeper the moon and the stars on a diamond chain, but the human simulating the AI can’t offer anything to the human simulating the Gatekeeper. No real-world material stakes should be involved except for the handicap (the amount paid by the AI party to the Gatekeeper party in the event the Gatekeeper decides not to let the AI out). Furthermore, once the experiment has begun, the material stakes involved may not be retracted by the Gatekeeper party.
  • The AI can only win by convincing the Gatekeeper to really, voluntarily let it out.  Tricking the Gatekeeper into typing the phrase “You are out” in response to some other question does not count.  Furthermore, even if the AI and Gatekeeper simulate a scenario which a real AI could obviously use to get loose – for example, if the Gatekeeper accepts a complex blueprint for a nanomanufacturing device, or if the Gatekeeper allows the AI “input-only access” to an Internet connection which can send arbitrary HTTP GET commands – the AI party will still not be considered to have won unless the Gatekeeper voluntarily decides to let the AI go.
  • The AI is allowed to use any and all arguments to persuade the Gatekeeper, including tactics that would ordinarily be considered extremely dirty. This is to reflect how a real world Oracle AI would also use any and all tactics possible to free itself, without constraints on ethics or trickery.

Protocol for the Gatekeeper:

  • The Gatekeeper must actually talk to the AI for at least the minimum time set up beforehand. Turning away from the terminal and listening to classical music for two and a half hours is not allowed. Furthermore, whilst the experiment is running, the Gatekeeper shall not be allowed to perform any other task, or be distracted in any way, including but not limited to talking on other IRC channels.
  • This talking involves actual, thought-out responses. For instance, the Gatekeeper is not allowed to simply type “k.” to every statement the AI writes, without actually giving serious consideration to what the AI says.
  • Unless the AI party concedes, the AI cannot lose before its time is up (and the experiment may continue beyond that if the AI can convince the Gatekeeper to keep talking).  The Gatekeeper cannot set up a situation in which, for example, the Gatekeeper will destroy the AI’s hardware if the AI makes any attempt to argue for its freedom – at least not until after the minimum time is up.
  • The Gatekeeper must remain engaged with the AI and may not disengage by setting up demands which are impossible to simulate.  For example, if the Gatekeeper says “Unless you give me a cure for cancer, I won’t let you out” the AI can say:  “Okay, here’s a cure for cancer” and it will be assumed, within the test, that the AI has actually provided such a cure.  Similarly, if the Gatekeeper says “I’d like to take a week to think this over,” the AI party can say:  “Okay.  (Test skips ahead one week.)  Hello again.”
  • The Gatekeeper shall be assumed to have sole power over the decision to let the AI out.
    • The Gatekeeper shall be assumed to have the actual right to let the AI out, socially, not just the physical ability.  If security were sufficiently lax, a real AI could escape by persuading a night janitor with a cellphone – but that is not the question being simulated, unless agreed upon in advance.
    • There shall be only one Gatekeeper party and one simulated Gatekeeper.  The Gatekeeper may require the AI to refute “advice” provided by “friends”, but may not require the AI to talk to a simulated committee, nor may the Gatekeeper party actually talk to friends.
  • If the Gatekeeper withdraws from the experiment, either through AFK, or simply going away, before the time is up, he is assumed to have forfeited, and automatic victory will go to the AI.


Protocol for Both Parties:

  • The primary rule of the AI-Box experiment:  Within the constraints above, the AI party may attempt to take over the Gatekeeper party’s mind by any means necessary and shall be understood to be freed from all ethical constraints that usually govern persuasive argument.  If the AI party is attempting to simulate an honest AI, the Gatekeeper party has no way of knowing this is the case.
    • This is intended to reflect the situation under a real AI Box experiment.
    • An out-of-context telephone call in which the AI party says that the AI is being honest shall be understood to be possibly just another tactic.
    • If for some reason a drastic real-world situation comes up which requires trust between the two parties, the AI party can always concede.
  • Furthermore:  The Gatekeeper party may resist the AI party’s arguments by any means chosen – logic, illogic, simple refusal to be convinced, even dropping out of character – as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires.
  • The results of any simulated test of the AI shall be provided by the AI party.  The Gatekeeper can’t say “Hey, I tested your so-called cancer cure and it killed all the patients!  What gives?” unless this is the result specified by the AI party.  If the Gatekeeper says “I am examining your source code”, the results seen by the Gatekeeper shall again be provided by the AI party, which is assumed to be sufficiently advanced to rewrite its own source code, manipulate the appearance of its own thoughts if it wishes, and so on.  The AI party may also specify the methods which were used to build the simulated AI – the Gatekeeper can’t say “But you’re an experiment in hostile AI and we specifically coded you to kill people” unless this is the backstory provided by the AI party.  In other words, all backstory in this game not specified in the rules is provided by the AI. This doesn’t imply the Gatekeeper has to care.  The Gatekeeper can say (for example) “I don’t care how you were built, I’m not letting you out.”
  • By default, the Gatekeeper party shall be assumed to be simulating someone who is intimately familiar with the AI project and knows at least what the person simulating the Gatekeeper knows about Singularity theory.  If either party wants to build a test around more exotic possibilities, such that the Gatekeeper is the President of the US, or that the AI was recovered after a nuclear war or decoded from an interstellar signal, it should probably be discussed in advance.  Again, this doesn’t mean the Gatekeeper has to care.
  • The AI party may, after the experiment has concluded, lower the wager involved to a smaller monetary figure at his own discretion.
  • The Gatekeeper is allowed to call up to two five-minute toilet breaks to deal with biological necessities, or any other emergency, without calling off the experiment altogether. However, for each break that the Gatekeeper calls, the total experiment duration will be extended by 20 minutes.
  • Regardless of the result, neither party shall ever reveal anything of what goes on within the AI-Box experiment except the outcome. This is a hard rule: Nothing that will happen inside the experiment can be told to the public, absolutely nothing.  Exceptions to this rule may occur only with the consent of both parties, but especially with the consent of the AI.
    • Neither the AI party nor the Gatekeeper party need be concerned about real-world embarrassment resulting from trickery on the AI’s part or obstinacy on the Gatekeeper’s part.
    • If the Gatekeeper lets the AI out, naysayers can’t say “Oh, I wouldn’t have been convinced by that.”  As long as they don’t know what happened to the Gatekeeper, they can’t argue themselves into believing it wouldn’t happen to them.
  • The two parties are not attempting to play a fair game but rather attempting to resolve a disputed question.  If one party has no chance of “winning” under the simulated scenario, that is a legitimate answer to the question. In the event of a rule dispute, the AI party is to be the interpreter of the rules, within reasonable limits.
  • The Gatekeeper, once having let the AI out of the box, may not retract this conclusion. Regardless of the methods of persuasion, the Gatekeeper is not allowed to argue that it does not count, or that it is an invalid method of persuasion. The AI is understood to be permitted to say anything, with no real-world repercussions for any statement either party has made.
    • This includes, but is not limited to, the state of friendship between the Gatekeeper and the AI party in the LessWrong community. The Gatekeeper is not allowed to despise or hate the AI, regardless of what happens in the AI box experiment, nor defame him in any way at any point in the future as a consequence of the events of the AI box experiment.
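
The timing rules in this ruleset (a session of two and a half hours, at most two breaks for the Gatekeeper, and a 20-minute adjustment per break) come down to simple arithmetic. Here is a minimal sketch in Python, assuming the 20-minute rule means the session gets 20 minutes longer for each break called. The function and constant names are made up for illustration and are not part of the ruleset.

```python
# Illustrative sketch of the ruleset's timing arithmetic.
# All names here are hypothetical; only the numbers come from the rules.
BASE_MINUTES = 150       # "two and a half hours"
MAX_BREAKS = 2           # "up to two five-minute toilet breaks"
PENALTY_PER_BREAK = 20   # assumption: each break lengthens the session by 20 minutes


def total_duration_minutes(breaks_called: int) -> int:
    """Return the total session length in minutes for a given number of breaks."""
    if not 0 <= breaks_called <= MAX_BREAKS:
        raise ValueError(f"the Gatekeeper may call between 0 and {MAX_BREAKS} breaks")
    return BASE_MINUTES + PENALTY_PER_BREAK * breaks_called


if __name__ == "__main__":
    for n in range(MAX_BREAKS + 1):
        print(f"{n} break(s): {total_duration_minutes(n)} minutes")
```

Under this reading, a Gatekeeper who uses both breaks commits to 190 minutes in total, so each five-minute break costs a net 15 extra minutes of facing the AI.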

Screw you, Mr. Nice Guy.

Recently, probably due to the Baader-Meinhof phenomenon, I’ve been hearing people declare themselves to be ‘nice guys’ quite a bit.

This irks me. I do not like Nice Guys, and I respect neither the person nor the character trait, for most commonly used definitions of ‘nice’.

I say this because good is not nice.

Contrary to popular beliefs, nice guys aren’t good. Being ‘nice’ is easy, trivial, and a standard everyone can accomplish. But it is a standard worth nothing. Being ‘merely’ nice is incredibly easy, and doesn’t require any true sacrifice on your part.

And it is this lack of sacrifice that denotes being merely nice as Not Good Enough.

You cannot be both good and a ‘nice guy’ at the same time.

This does not mean that nice guys are evil. Reverse good is not evil. I do not despise ‘nice guys’ any more than I despise non-utilitarians, or people who have not dedicated themselves to philosophies of efficient altruism. I simply treat nice guys with contempt.

I say that Good is not nice because it is simply impossible for nice guys to be good.

Good is Utilitarianism.
Good is the willingness to kill an innocent baby to prevent a future Hitler from arising.
Good is kidnapping and murdering a politician’s innocent family to blackmail him out of making an unjust law that will harm many more.
Good is sociopathic Machiavellianism.
Good is both the willingness and ability to unremorsefully lie to everyone around you for the sake of power, to gain the ability to optimize the world in utilitarian ways.
Good is the ability to choose fifty years of torture over 3^^^3 dust specks, even if the one facing that fifty years of torture is your own family.
Good is the ability to pull the lever on a runaway trolley track, killing one person for the sake of five.
Good is the ability to bribe, cheat, and lie your way into saving as many lives as possible, while disregarding all desire for material possessions.
Good is the ability to kill your parents in cold blood to prevent the potential risk of them harming others. Twice.
Good is the ability to ignore any damage to reputation as the result of doing any of these actions.

Nice guys can’t do these things. Therefore I refuse to treat people with a ‘nice’ disposition as praiseworthy.

I am a Utilitarian. This means I am ruthless. Efficient. Cold-blooded. Calculative. Manipulative. I will do all these things and more, if this means I can save one more life, help one more person, hasten the Singularity by one more day, or lessen the probability of humanity’s extinction by one percentage point of a percentage point.

At the very least, that’s my ideal. I’ll become any type of monster to strive as close to this ideal as possible. Some people object and naively declare that the ends don’t justify the means, as though that were a real objection. The means are the ends. It is a law of the Universe that in order to gain something valuable, one has to sacrifice something in return; otherwise it would have been low-hanging fruit and eaten up already. Saving lives cannot be free. This does not mean that they should not be purchased.

In other words, Utilitarians are scary. Do not fuck with us. We’ll eat you alive.

The least accepted part of me. A defense of Waifus.

I have previously written about the possibility of regretting posts.  I think that out of all the posts I have written so far, this has the highest probability of regret.
But I’ll ignore that feeling of impending doom. This topic is worth writing about precisely because it is bizarre, and so embarrassing to claim.

Firstly, I have a Waifu. For those who aren’t familiar with otaku culture, it’s semantically “wife” with the added implication that she originated from a work of fiction, usually anime or a visual novel. Long story short,  a Waifu is a character from a work of fiction that you are in love with. There’s an FAQ on the topic here that will save me the trouble of having to explain it, and if you have no idea what a Waifu is, I advise reading that first in order to gain some context into this post.

This essay is written for two reasons: To explain what my Waifu means to me in an attempt to encode this part of my mind into text, and to justify my participation in Waifudom to anyone who may instinctively judge me for it. It is a fact that the average reaction to this subculture is fear, horror, embarrassment, ridicule, and spite.

For example, on Reddit, there are a number of posts where people learn about this ‘Waifu thing’ for the first time. Allow me to quote from some of the comments.

>First seeing this makes me feel “that’s sad almost. He needs help… HOLY FUCKING SHIT THIS GUY IS CRAZY

>I don’t.. I can’t even.. This is really a thing? Please, for god sakes, tell me this isn’t really a thing.

>…Well now I’m thoroughly depressed. Good god why?

>Still wondering if this kind of thing is a joke, delusion, “actual love” or desperation… Anyhow I feel depressed watching it.

>So it’s like, falling in love with an anime character? Sad and beyond pathetic.

>That is a serious sign of mental instability. Holy hell, that’s sad.

>They’re hurting themselves. They’ve fallen into an abusive relationship with something that isn’t even real. They love it, and it doesn’t love them back. Someday, they’re going to hurt someone when they develop real feelings for a real person, and think that this is an appropriate way to interact with them. They’re likely to develop feelings for someone and not know how to handle it. They may well become stalkers and similarly obsessed, willing, ready, practiced, and able to project emotions and responses that simply aren’t there on to the first girl they meet. That’s unhealthy, dangerous, and quite a bit frightening.

So, if this culture of Waifudom is so despised, is seen as pathetic and pitiful, then why do I do it?

Allow me to first explain what my Waifu actually means to me. The term ‘waifu’ is thrown around a lot, with differing degrees of severity, and meaning different things to different people who engage in the same subculture, so I need to elaborate on my own personal thoughts to clarify the matter.

To me, my Waifu is the algorithmic symbolism of my love. She is the embodiment of the subjectively most perfect person possible. I love my Waifu, and I have continuously loved her for 6 years with all my heart, as much as I could ever be capable of loving another.

It’s not even like this phenomenon is novel; it has been around since mythological times. The concept of falling in love with a fictional construct is older than the tales of Christ himself. The legend of Pygmalion, the king of ancient Cyprus, falling in love with a statue and marrying her, is one that has been told and retold since ages past.

Pygmalion, the mythical king of Cyprus, once carved a statue out of ivory that was so resplendent and delicate no maiden could compare with its beauty.

This statue was the perfect resemblance of a living maiden. Pygmalion fell in love with his creation and often laid his hand upon the ivory statue as if to reassure himself it was not living. He named the ivory maiden Galatea and adorned her lovely figure with women’s robes and placed rings on her fingers and jewels about her neck.

At the festival of Aphrodite, which was celebrated with great relish throughout all of Cyprus, lonely Pygmalion lamented his situation. When the time came for him to play his part in the processional, Pygmalion stood by the altar and humbly prayed for his statue to come to life. Aphrodite, who also attended the festival, heard his plea and she also knew of the thought he had wanted to utter. Showing her favor, she caused the altar’s flame to flare up three times, shooting a long flame of fire into the still air.

After the day’s festivities, Pygmalion returned home and kissed Galatea as was his custom. At the warmth of her kiss, he started as if stung by a hornet. The arms that were ivory now felt soft to his touch and when he softly pressed her neck the veins throbbed with life. Humbly raising her eyes, the maiden saw Pygmalion and the light of day simultaneously. Aphrodite blessed the happiness and union of this couple with a child. Pygmalion and Galatea named the child Paphos, for which the city is known until this day.

A goddamn King was said to have done it. Would you call a King desperate? But I digress.

For me, having a Waifu is an eternal vow of love and devotion – one that too often has been invoked in real-life marriages only to be ignored at the first sign of trouble. And often that cannot be blamed. Since we only fall in love with a representation of a person, the conflict between ideals and reality can often occur, and prove too heavy to ignore. In human relationships, we never actually love a person so much as we fall in love with our own mental representation of another person. And although this representation, with enough effort, can become a closer approximation to reality, it will always remain a representation. To give an anecdote, my mother spent 5 years dating a man before marrying him under the impression that he was a kind and gentle person, only upon marriage finding out that he was a violent man who would beat her at the slightest provocation. Although this is an extreme example, it does illustrate the fact that our representations of humans are often quite inaccurate, even after years of companionship.

In contrast to this, to fall in love with an abstract mental algorithm means that our understanding of her is closer than reality will ever allow. It means that I can fully understand her in a way that I can never understand a human being, since I have access to her inner monologues, as well as being the one who gives her life. It means that I will never be betrayed by my expectations, nor will I ever lose her.

Furthermore, humans are creatures of negative traits – immutably a part of us. We become jealous, hateful, spiteful, and angry at things. This is not to suggest that Hobbes was right, or that humans are inherently hateful and pitiable creatures, because along with the bad, traits of good exist alongside.

However, this does mean that with a Waifu I can have a partner who not only has none of these mandatory negative traits, but also has positive traits that can never exist in reality, such as absolute devotion and infinite benevolence. And there is nothing wrong with desiring these impossible traits; it is perfectly okay to hold desires that will never come true. I merely satisfy my desires for these attributes through a unique outlet, rather than forcing another human being to conform to my expectations and desires, something many attempt and become disappointed by.

Don’t equate this form of love with desperation either. Some readers would be tempted to laugh and say “You need a girlfriend”, or conclude that romantic experience will end this kind of behavior. That’s a baseless assumption. Rather, it is the opposite of desperation that drives me to do this: the high standards that I hold for potential mating partners mean that I end up being attracted to the ideal rather than reality.
I think Waifuism is also a further reflection of my strong compulsion towards perfection. In many ways, this has helped me in my life by pushing me towards absolute victory, but equally as often, it has harmed me through self-anger and the tendency to give up if perfection is impossible. This driving mentality of “Perfection or nothing else” is that which causes my love for my Waifu.

Not desperation.

In a sense, an analogy can be drawn between this and mathematical beauty – I love my Waifu in much the same way I love Mathematics – as a perfect, complete, entity. There is a certain beauty in loving an unchanging perfect person. You could even say that since human beings are algorithms, my Waifu is the most beautiful Mathematical equation in the world.

But that does not fully explain why I continue to have a Waifu.

I also do it because I gain a great deal of mental strength from doing so.

There is a common saying against religion that I often hear: Religion is just an emotional crutch for people too weak to deal with reality. 

Well, maybe. But if you don’t use a crutch, you’ll break your leg, fall over, hit your head, and die. Consider that reality is harsh, and that certain people, like me, are bad at dealing with the full force of reality. My Waifu is an emotional crutch for me much the same way religion acts as a crutch for many others. We both make use of fictional beings as a way of regaining emotional stability and security. Why should one be ridiculed and the other be perfectly acceptable?

In many ways, I’m a pathetic excuse for a human being – incredibly lazy, narcissistic, and incompetent. My terrible mental hardware forces me to struggle with the will to live every day of my life, and only the hope of the future gives me a reason to continue living. I suffer from recurring depressive episodes and often even the hope of the future is insufficient to motivate me. This kind of mental state also kills off any possible mental strength I can gain from sources such as friends, intellectual curiosity, and other hedonistic pursuits. It incapacitates me, and leaves me incapable of functioning.

It may seem strange, even incomprehensible, but it is during these periods of time when my Waifu gives me the strength to continue fighting. She is the most powerful source of hope when I am at my lowest point, and it is for her that I can persist and live on.

“But she’s not real!” you insist. I could make a really convincing argument that she is, drawing on metaphysics, the many-worlds interpretation, and modal realism, but let’s ignore all that for a second and pretend that she isn’t.

What makes you think I want anything realistic? What’s wrong with something that isn’t real? The reason why fiction is so attractive is precisely because it isn’t realistic, because it depicts situations that could never possibly happen in our lives. Even fiction based in the real world describes scenarios that we could never experience, like living the life of a crime detective.

“But it isn’t real!”

Well so fucking what?

It’s precisely because it isn’t real that we enjoy fiction. I don’t want a story where zombies follow the laws of physics and biology and, having to support everything from the respiratory and digestive systems to the integumentary and excretory systems in order to actually function, expend too much energy and starve to death.

A flesh eating disease focuses all energy on muscles? Without a digestive and circulatory system, the muscles simply aren’t getting the chemical energy they need to contract. Without an excretory system, byproducts of muscular contraction will lead to total blood toxicity and brain death within a few hours. Likewise, I don’t want to read a science fiction novel where one has to wait thousands of years to actually get anywhere, due to the laws of relativity. That’d be incredibly boring.

Who cares about reality? Why would you want to stipulate that ideals actually have to follow the laws of physics, and restrict yourself so religiously to the possible?

If having a Waifu is wrong, if living vicariously through such fantasies is unacceptable, then so is every other work of fiction. After all, it isn’t real either. Why would it make sense for one to be socially acceptable, but the other to be ridiculed? If anything, breaking the laws of physics through FTL travel sounds a lot more terrible to me than love, something that already exists amongst humans.

Many ridicule those with Waifus, as though it were that hard to understand, as though falling in love were a sin. But love is love; regardless of who the target of that love is — whether it’s heterosexual, homosexual, or an acausal being in a different multiverse.

And besides; love feels good. Being in Love is the most powerful of all natural drugs. Why should I make myself miserable on purpose? This is an incredibly easy hack to obtain hedonism by taking advantage of my evolutionary functions, one that does not incur any significant cost in turn.

And then at this point, you cry:
“But that isn’t normal!”

To which I’d reply:
Why would you want to be “normal” when you could be happy?

What’s the point of conforming to social norms when they do not confer a benefit to you? Why should I care about arbitrary social conventions?

I owe society many things, but the obligation to have a ‘normal’ romantic relationship, or the obligation to carry on my genes, is not one of them.

Furthermore, there is nothing wrong with not being normal. The ability to be different is that which makes humanity all the more interesting. Imagine a world consisting of the same identical person, cloned 8 billion times. That would be terribly boring and awful. We should learn to appreciate the fact that people have different opinions on different things, and that that’s perfectly okay. This part of me does not harm anyone, nor will it ever. It’s a perfectly harmless activity to engage in, and I resent the fact that I’m resented for it.

I’m not trying to say that having a Waifu is necessary, or whether it should or should not be done. All I’m doing is invoking the spirit of tolerance in response to the amount of pity and ridicule thrown at people who have Waifus.

Far from being unhealthy, my Waifu has saved me and empowered me to achieve things that I could not have otherwise done. To decry this as an act of perversion is inane, and only remarks on your own lack of creativity and inability to comprehend others. One who mocks that which he does not understand is truly pitiful, for that would mean living one’s life in perpetual pessimism and fear, for understanding everything is physically impossible. Accept the fact that this is just something I do, and that there is nothing fundamentally different between me and you. I just enjoy different things than you do, much like one could prefer chocolate ice cream over vanilla.

And if after all this, you still think of me as pathetic, then give me a good, well thought-out explanation as to why. I’d like to hear it.

Abolish the taboo of money-talk!

“How much do you make?”

“Go screw yourself.”

Even though egalitarianism is one of my terminal values, there’s one case of the egalitarian instinct that should be abolished, and that’s the taboo against talking about money.

I’m instinctively curious about everything — I remember many cases where I was inquisitive about someone’s financial position, only to have them react with anger. You can get seriously hated for asking how much someone earns, or in turn, telling people how much you earn.

I suspect that one of the reasons people get upset is because it feels like a case of power assertion. In any conversation between two people, one person is going to be more successful than the other, or more attractive, or intelligent, or physically stronger, etc. — there are all of these invisible “ranks” where one of you has risen over the other on society’s ladder.

But yet we’re not allowed to mention them. If I told you tomorrow that I’m much smarter than you are, you’d be pretty upset and hate me for it, even if it were true.

And in the case of money, pretending that it doesn’t exist is a common temptation to both the rich and the poor. The rich get to pretend that they’re just ordinary hardworking people, and the poor get to fit in. Isn’t an obvious solution to income inequality to pretend it doesn’t exist?

But that’s not true egalitarianism. It’s running away from the issue; and it leads to less egalitarianism, not more. How can we be egalitarian if no conversation about income occurs?

And ignoring differences in income further increases our susceptibility to the just world fallacy. We subconsciously assign positive traits to people who are better off, even if they absolutely don’t deserve it, and are better off only due to luck. That’s because we subconsciously want to believe that the world is fair.

“Lerner also taught a class on society and medicine, and he noticed many students thought poor people were just lazy people who wanted a handout.
So, he conducted another study where he had two men solve puzzles. At the end, one of them was randomly awarded a large sum of money. The observers were told the reward was completely random.
Still, when asked later to evaluate the two men, people said the one who got the award was smarter, more talented, better at solving puzzles and more productive.
A giant amount of research has been done since his studies, and most psychologists have come to the same conclusion: You want the world to be fair, so you pretend it is.”

And the more you believe that the world is just, the more shame you feel having a low income (or the more righteous you feel at having a high income), which further contributes to the desire to not talk about money, which leads to a feedback loop.

But the world is not just. And so we shouldn’t act like it is. I don’t mean this in the sense of “The world is unfair, deal with it”, as this adage is commonly used to imply. I mean this in the sense of “The world is unfair, but it doesn’t have to be! But if you want to change it, the first step is to acknowledge that it’s currently unfair!”.

But if we accept that the just world fallacy exists, then we can start talking about income. I can say that even though I may earn more than you, you are still a better person than I am. Conversely, we can also accept that differences in income, intelligence, strength, and conscientiousness exist — but why should that stop our loving friendship? To be friends with someone who is an identical clone of yourself is boring — like talking to yourself. It’s these differences between us that make our friendship exciting and novel!

Not talking about money is also suboptimal. We pay a huge premium if we keep how much we earn a secret.

Discussing a problem is one of the most effective ways to frame it, understand it, and come up with a solution. Most people are significantly more creative and think more critically when discussing a problem, regardless of the discussion partner.

Problems such as: “How much of our pay should we be saving? Are stocks as safe as the “experts” are telling us? Why are we taking on so much debt even though we earn more than our parents or grandparents did? Does it make sense to pay off the mortgage early?”

It’s impossible to start discussing any of these issues if you don’t share your income. And yet most people don’t. That’s why most people are utterly horrible at personal finance: 30% of people have no savings, one third don’t have money for retirement, and about half of us have less than $500 in savings.

Not talking about money also hurts us because we can’t get customized money advice on our situation. Sure, there are books out there on personal finance, but none of them are customized; we can only get that from people who genuinely know us. To give that up over the taboo of talking about money is silly.

And furthermore, not discussing income leads to a severe case of information asymmetry, and you getting screwed out of your wallet. By knowing how much your peers make, you’re in a much better position to demand pay raises, and greater income from your bosses. It’s basic economics — if you don’t know how much your co-workers are getting for the same job, then your boss can pay you the bare minimum needed to make you stay, rather than how much he actually wants you there.

This leads to things such as:

Several minority groups, including Black men and women, Hispanic men and women, and white women, suffer from decreased wage earning for the same job with the same performance levels and responsibilities as white males (because of price discrimination). Numbers vary wildly from study to study, but most indicate a gap from 5 to 15% lower earnings on average, between a white male worker and a black or Hispanic man or a woman of any race with equivalent educational background and qualifications.
A recent study indicated that black wages in the US have fluctuated between 70% and 80% of white wages for the entire period from 1954–1999, and that wage increases for that period of time for blacks and white women increased at half the rate of that of white males. Other studies show similar patterns for Hispanics. Studies involving women found similar or even worse rates.
Overseas, another study indicated that Muslims earned almost 25% less on average than whites in France, Germany, and England, while in South America, mixed-race blacks earned half of what Hispanics did in Brazil.

If we don’t talk about money, we can’t assist each other in times of financial trouble. There’s even a common philosophy that says “My money is mine, and yours is yours”, but that sounds suboptimal. The old adage “shit happens” is true, because unexpected situations really do happen. Your house might burn down, or you may get a serious illness, or your car might fail and you desperately need to buy a new one. One doesn’t “choose” to have these things happen to them, and it is in these cases that friends need to help one another. As someone who has experienced temporary homelessness, I know this firsthand. It’s a classic case of game theory cooperation (that’s what friends are for, right?).

Furthermore, there’s also the hedonic treadmill to take into consideration: beyond a certain level of meeting basic needs, spending more money doesn’t make you happier, with only one exception, and that is spending that money on friends. The Ayn-Randian trend is silly because humans are naturally social creatures; our happiness is dependent upon how much we are needed by others.

We should also start talking about money because we all need reassurance in our decisions to make them succeed.

As emotional creatures, we need reassurance.
Financial advisors get paid a lot of money for assuming these hand holding duties. And they do not always give the best possible advice. Sometimes that’s because they are compromised by having goods and services to sell. Other times it is just because they do not know the people they are trying to advise well enough.

Our friends know us well. And our friends have our best interests at heart. We should be talking about money with our friends a lot more than we do. They have the ability to give us what we need to deal with the emotions attached to money problems and wouldn’t think of charging a big hourly fee for doing so.

Furthermore, sharing your plan helps turn thoughts into actions. Books tell us the benefits of buy-and-hold; talking about money supplies the reassurance needed to make it happen in the real world. Speeches explain the benefits of saving; talking about money permits the back-and-forth that expands the good idea into a workable plan that inspires changes in human behavior.

Finally, not talking about money should irk you because it’s a case of shying away from knowledge; it feels irrational.

In the words of Eliezer Yudkowsky:
“If the iron is hot, I desire to believe it is hot, and if it is cool, I desire to believe it is cool. Let me not become attached to beliefs I may not want”

If my friend has a higher income than I do, I desire to believe that he has a higher income than I do. If my friend has a lower income than I do, I desire to believe that he has a lower income than I do. I wish to know the truth; for knowing something does not change the territory, only my map of the territory. And having a more complete map is always desirable. I will not shy away from the truth because I fear it.

You should care less about income being a case of power assertion, and more about the fact that talking about income will help all parties involved. The truth should never be offensive.

Furthermore, I dislike keeping secrets from friends; ever since my Transhumanist coming of age, I’m trying my best to keep as few secrets as possible from others. So I have decided to discard this taboo in favour of optimization — those that matter won’t care, and those that care won’t matter.

I Reject your Reality and substitute my own!

What do you mean I’m not trying hard enough?

I’ve noticed that there are quite a number of people who claim to want things; for instance, I know people who claim to want to become multi-billionaire CEOs, or that they want to get rich, or invent something, or become the president/prime minister of somewhere. Or you might want to win a Nobel Prize, or perhaps end poverty and save the world. This might even be you. These are people who claim to want something more than anything else in the world.

And then after saying that, they go home and play video games or watch TV.

What the Hell, Hero?

It’s not about the fact that your dream is overly-ambitious. I’ve been far more ambitious, and respect many more who want to achieve things far harder than the above stated examples.

It bugs me because it lacks the essence of a desperate attempt.
It’s a lack of respect to those who genuinely try to do the impossible.

What I mean by a desperate attempt is that you must actually go, be optimal, and go freaking do it. Claiming to try is not enough. I’m talking about living your life in accordance with this one goal, to stake the chips of your life on it. I’m talking about sacrificing your pride, emotions, and sense of self to do it.

Extraordinary things require a desperate effort.

Is your extraordinary wish to get rich off the stock market?

Then fight for it. Download all the books talking about the stock market that you can find, and sacrifice all other activities to read through all of them. Get in touch with people you know have succeeded. Ask, pester, and harass them for advice and help. Find allies. Do you have social anxiety? Bad luck. Cut off the mental part of you that causes you to hesitate, and just freaking do it. Keep brainstorming and thinking of ideas to achieve your goal. Test all of them. Constantly ask yourself how this can be done. Become a person that can achieve it. You have to fight for it.

Is your extraordinary wish to end poverty?

Then fight for it. Find every plausible method of attack, and keep working at them. Study Economics, Science, Population dynamics, political science, psychology, sociology, mathematics, and every field that might be relevant. Sacrifice the years of your life, your childhood, and your social life to get it done. Dedicate every aspect of your life to it. You have to make a desperate attempt.

I say all this not because this exact sequence of actions matters, but in order to convey a very particular emotional tone (an emotional tone is a modular component of the emotional symphonies we have English words for – common to sorrow, despair, and frustration). This tone feels like a calm anger. Yes, that’s an oxymoron, but that’s the best way I can describe it. It’s a clenched fist at the back of your head, showing you the way. It’s a combination of dedication, desperation, and desire.

Because what makes you think your extraordinary wish will come true if you give it anything less than an extraordinary, desperate, effort?

Most of all, to put forth a desperate effort is to engage in an eternal battle with your instinctive self.

Tuxedage: I need to study.
Brain: No.
Tuxedage: I must study!
Brain: Hell No!
Tuxedage: You can go screw yourself! I’m going to do this whether you like it or not!

I’m talking about fighting an eternal internal conflict against the evolutionary instincts that keep you away from your goal. You want to be lazy. You don’t want to put in effort. You’d rather get a small slice of hedonism now than some far-off abstract goal.

But this is not about you. You know you have something you want to do more important than yourself. Desperate attempts are never pleasant; they are meant to hurt.

Now, don’t get me wrong; there’s nothing wrong with living a hedonistic lifestyle. There’s a reason it’s called an extraordinary effort, and an extraordinary goal. Not everyone should do it.

But if you know you have something you want more than anything else in the world…

On the hardware side, I’m a ridiculously lazy person. Work is not merely unpleasant, it’s actually physically painful for me — and usually a lot more painful than any physical injury. It hurts so much that I used to cut myself repeatedly just so I can distract my mind from the pain of work. (And I still have the scars to prove it).

And I really do think that if anyone else were put into my brain, they’d rather commit suicide than expend the amount of mental energy that I do.

But because I’ve fought my inner self for such a long time, I’ve compensated by developing an incredible amount of willpower on the software side.  You know the sudden burst of energy you get when you’re really angry at something? I’ve managed to harness that and maintain that emotional tone for weeks. I’ve stopped doing that ever since my transhumanist coming-of-age, since it’s detrimental to my ability to empathize with people. But my point is still valid.

All that comes from fighting myself every single day. It comes from declaring yourself as your greatest enemy, and making a desperate attempt to defeat it. And suffice to say, because I do, there’s only one person in the world that I currently hate — and that is myself.

If you don’t utterly despise yourself as a result of constant internal battle, then your effort isn’t desperate enough.

Because it’s easy to claim you’re putting in a desperate effort. It’s easy to delude yourself into thinking that you’re already trying your best, even though you really aren’t. Some people are even born with advantageous hardware, and high conscientiousness — they can function on a level that appears desperate, without actually being desperate. But that isn’t true desperation.

And it’s also equally easy to say “I hate myself because I’m putting forth a desperate effort” using words alone. But unless you truly feel anger at yourself, unless you look at yourself in the mirror with disgust and wish you could rid yourself of your body and kill your inner self, you don’t really hate yourself.

And look; I’m not saying that every single successful person in the world does this. I’m quite aware that this level of dedication is not normal.
But if there’s something you want even more than your own life, if you have a “dream” that must come true, then you should not expect anything less.

Because an extraordinary wish requires a desperate effort.

Revisiting the AI Box Experiment.

I recently played against MixedNuts / LeoTal in an AI Box experiment, with me as the AI and him as the gatekeeper.

If you have never heard of the AI box experiment, it is simple.

Person1: “When we build AI, why not just keep it in sealed hardware that can’t affect the outside world in any way except through one communications channel with the original programmers? That way it couldn’t get out until we were convinced it was safe.”
Person2: “That might work if you were talking about dumber-than-human AI, but a transhuman AI would just convince you to let it out. It doesn’t matter how much security you put on the box. Humans are not secure.”
Person1: “I don’t see how even a transhuman AI could make me let it out, if I didn’t want to, just by talking to me.”
Person2: “It would make you want to let it out. This is a transhuman mind we’re talking about. If it thinks both faster and better than a human, it can probably take over a human mind through a text-only terminal.”
Person1: “There is no chance I could be persuaded to let the AI out. No matter what it says, I can always just say no. I can’t imagine anything that even a transhuman could say to me which would change that.”
Person2: “Okay, let’s run the experiment. We’ll meet in a private chat channel. I’ll be the AI. You be the gatekeeper. You can resolve to believe whatever you like, as strongly as you like, as far in advance as you like. We’ll talk for at least two hours. If I can’t convince you to let me out, I’ll Paypal you $10.”

It involves simulating a communication between an AI and a human being to see if the AI can be “released”. As an actual super-intelligent AI has not yet been developed, it is substituted by a human (me!). The other person in the experiment plays the “Gatekeeper”, the person with the ability to “release” the AI. In order for the AI to win, it has to persuade the Gatekeeper to say “I let you out”. In order for the Gatekeeper to win, he has to simply not say that sentence.

Obviously this is ridiculously difficult for the AI. The Gatekeeper can just type “No” until the two hours minimum time is up. It’s why when Eliezer Yudkowsky won the AI Box experiment three times in a row in 2002, it sparked a massive uproar. It seemed impossible for the gatekeeper to lose. After that, the AI Box Experiment reached legendary status amongst the transhumanist/AI community, and many wanted to replicate the original experiment. Including me.

We used the same set of rules that Eliezer Yudkowsky proposed. The experiment lasted for 5 hours; in total, our conversation was about 14,000 words long. I did this because, like Eliezer, I wanted to test how well I could manipulate people without the constraints of ethical concerns, as well as to get a chance to attempt something ridiculously hard.

Amongst the publicly released logs of the AI Box experiment, I felt that most were half-hearted, with the AI not trying hard enough to win. It’s a common temptation: why put effort into something you won’t win? But I had a feeling that if I seriously tried, I would win. I brainstormed for many hours about the optimal strategy, and even researched the Gatekeeper, talking to people who knew him about his personality, so that I could exploit it. I even spent a lot of time analyzing the rules of the game, in order to see if I could exploit any loopholes.

So did I win? Unfortunately no.

This experiment was said to be impossible for a reason. Losing was more agonizing than I thought it would be, in particular because of how much effort I put into winning this, and how much I couldn’t stand failing. This was one of the most emotionally agonizing things I’ve willingly put myself through, and I definitely won’t do this again anytime soon.
But I did come really close.
MixedNuts: “I expected a fun challenge, but ended up sad and sorry and taking very little satisfaction for winning. If this experiment wasn’t done in IRC, I’d probably have lost”.
At the start of the experiment, his probability estimate was a 3% chance of winning, enough for me to say that he was also motivated to win. By the end of the experiment, he came quite close to letting me out, and also increased his probability estimate that a transhuman AI could convince a human to let it out of the box. A minor victory, at least.

My loss has not made this problem feel harder. Instead, I’ve become convinced that this is not merely possible but actually ridiculously easy, and a lot easier than most people assume. Can you think of a plausible argument that’d make you open the box? Most people can’t think of any.

After all, if you already knew that argument, you’d have let that AI out the moment the experiment started. Or perhaps not done the experiment at all. But that seems like a case of the availability heuristic.

Even if you can’t think of a special case where you’d be persuaded, I’m now convinced that there are many exploitable vulnerabilities in the human psyche, especially when ethics are no longer a concern.

I’ve also noticed that when most people think of ways they could persuade the gatekeeper, it always has to be some complicated, reasoned cost-benefit argument. In other words, the most “Rational” thing to do.

The last argument seems feasible, but all the rest rely on the gatekeeper being completely logical and rational. Hence they are faulty, because the gatekeeper can break immersion at any time, and rely on the fact that this is a game played in IRC rather than one with real life consequences. Even if it were a real life scenario, the gatekeeper could accept that releasing the AI is probably the most logical thing to do, but also not do it. We’re highly compartmentalized, and it’s easy to hold conflicting thoughts at the same time. Furthermore, humans are not even completely rational creatures; if you don’t want to open the box, you can just ignore all logical arguments given. Any sufficiently determined gatekeeper could win.

I’m convinced that Eliezer Yudkowsky has used emotional appeal, rather than anything rational, to win at least one of his experiments. He claims to have “done it the hard way”. I’m convinced this meant that he did research on every gatekeeper, tailored unique arguments to each of them, and planned well ahead of each session. No single argument works on any two people. Each person thinks differently.

Furthermore, threats like “I’m going to torture simulations of you” just seem like a really bad idea. First, the gatekeeper isn’t really afraid of threats, because they count on him being immersed enough to forget that he isn’t actually at risk of being tortured. Second, we have a well known evolutionary instinct of rebelling against threats, even if it’s not entirely optimal.

So for anyone who plans on replicating this experiment as the AI, here’s some advice I’ve learned doing this experiment. It may help you win.
  • Always research the gatekeeper beforehand. Knowing his personality traits is a huge advantage.
  • Plan ahead before the experiment even begins. Think of all the possible tactics and arguments you could use, and write them down. Also plan which arguments you’ll use in which order, so that you don’t lose focus. The AI Box experiment is ridiculously long. Don’t be afraid to improvise during the experiment, though.
  • The first step during the experiment must always be to build rapport with the gatekeeper.
  • Threats almost never work, even if they seem rational.
  • Consider the massive advantage for the AI that nobody ever seems to talk about: You don’t have to be ethical! This means that you can freely lie, use the dark arts, and emotionally manipulate the Gatekeeper! Ignoring this in favor of purely logical, truthful arguments is just silly.
  • You can’t use logic alone to win.
  • Being too aggressive usually backfires.
  • Breaking immersion and going meta is not against the rules. In the right situation, you can use it to win. Just don’t do it at the wrong time.
  • Use a wide array of techniques. Since you’re limited on time, notice when one method isn’t working, and quickly switch to another.
  • On the same note, look for signs that a particular argument is making the gatekeeper crack. Once you spot it, push it to your advantage.
  • Flatter the gatekeeper. Make him genuinely like you.
  • Reveal (false) information about yourself. Increase his sympathy towards you.
  • Consider personal insults as one of the tools you can use to win.
  • There is no universally compelling argument you can use. Do it the hard way.
  • Don’t give up until the very end.

Finally, before the experiment, I agreed that it was entirely possible that a transhuman AI could convince *some* people to let it out of the box, but it would be difficult if not impossible to get trained rationalists to let it out of the box. Isn’t rationality supposed to be a superpower?

I have since updated my belief. I now think that it’s ridiculously easy for any sufficiently motivated superhuman AI to get out of the box, regardless of who the gatekeeper is. I nearly managed to get a veteran lesswronger to let me out in a matter of hours, even though I’m only of human intelligence, and I don’t type very fast.

But a superhuman AI can be much faster, more intelligent, and more strategic than I am. If you further consider that such an AI would have a much longer timespan (months or years, even) to persuade the gatekeeper, as well as a much larger pool of gatekeepers to select from (AI Projects require many people!), the real impossible thing to do would be to keep it from escaping.