Depression: an all-consuming darkness

I suffer from an all-consuming depression.

For most of my life, it has been a great source of shame. Many of my closest friends do not know. To the ones who do, I have only revealed a small fragment of what living with depression is like, interspersed with sarcasm and wit to disguise how truly awful it is. This has been a mistake. Depression loves the peddling of secrets. By keeping mine, I have worsened my sense of social isolation, and dangerously cut off all possible social support, leaving only pain and suffering.

Depression is much more than the sum of constantly feeling sad, tired, or helpless. It subsists as an all-consuming infection in the subconscious mind, constantly gnawing, taunting, humiliating. It is a creeping numbness that degrades and diminishes every aspect of conscious life. It is more than a screaming hatred, a dull apathy, a sinking stomach in the face of joy and a faithless lassitude in the face of hope.

There is no escape. Many things help to postpone the inevitable – adequate sleep, nutritious food, medication, religion, exercise, and light. But make no mistake, these are merely moments of temporary relief, tricking its victim into complacency. It wants to be underestimated, so it can attack when you are powerless to resist.

When it does decide to strike, everything you had planned must be dropped. Were you hoping to have fun with friends today? Tough luck, expect to spend the rest of the day doing absolutely nothing but feeling horrible about yourself. Trivial problems like eating food become impossible. My worst days are spent in bed, starving myself to the point of trembling with great hunger, but being unable to muster the motivation to eat food to save my own life.

Naturally, more complicated tasks like “doing homework” or “paying bills” go into the trash. This often leads to a depressive spiral, where your inability to solve increasingly difficult problems results in even worse ones – like being kicked out of school or becoming homeless – leading to even worse depression. It should therefore be no surprise that around 1 in 6 people with clinical depression end up killing themselves.

The cycle of mediocrity

Many well-intentioned people ask: “How can I help you fix it?”

Unfortunately, talking about “fixing” depression misses the point – there is no known cure. The best case scenario is merely surviving against it. Even then, merely existing is the hardest thing I have ever done. To be afflicted with this disease is to have one foot always in the grave; the difficulty lies not in saying “no” to suicide once. It’s being able to say “no” to it consistently, every single day, even as depression taunts you with an easy “solution” to this constant feeling of utter exhaustion. It only takes a single moment of weakness to fall into eternal sleep.

As a victim of depression, death has become my best friend – I spend more time thinking about it than I spend with any or all of my other friends, combined. It is the first thing I contemplate when waking up, and the last thing I consider before sleep. I see it even in my dreams. Because of this, I am utterly afraid of being alone – instead I try to spend all my time with others, for my mind is an echo chamber, and isolation will mean my death.

It is a parasite that takes over the mind, subtly at first, overtly when it becomes too late to resist. Many do not realize that the superpowers of rationality and logic are rendered futile before it. Depression teaches you that they cannot be trusted, for the moment you do, they will be used against you. Depression is immune to all forms of persuasion and reasoning – and every time someone tries to argue that “it’s not really that bad” or that “it will get better”, it sounds like just another reason why I should kill myself. Remember that from the inside, being wrong feels exactly like being right.

I often try to estimate the odds of my own death – but since I don’t trust my own probabilities, I won’t tell you what it is. The fact that I have not died yet is not a guarantee that the trend will persist. I ought to have died many times over. I attribute the success to a combination of sheer luck, other forms of insanity, and being saved by friends I do not deserve.

I don’t know how to end this essay. I don’t even know why I wrote it. I’m not looking for cheap sympathy or pity – it will not help. If there is a moral to this story, it’s that depression is real. Nobody chooses it. No one deserves it. You cannot imagine what it takes to feign normalcy, to show up to work, to make a doctor’s appointment, to pay bills, to walk your dog, to complete assignments on time, to keep enough food at hand, when you are exerting most of your capacity on trying not to kill yourself.

On the other hand, compassion is also very real. A depressed person may cling desperately to it, and they may remember your compassion for the rest of their lives as a force greater than their depression. I have been repeatedly saved by that compassion. It is my sincerest wish that all others may experience the same.

Explain (Like I’m five) Bitcoin/Litecoin/Dogecoin to someone who’s been in cryogenic storage for the last seventy years.


<mortehu> Explain Dogecoin to someone who just woke up after a 70 year sleep.

So we have this box that runs on magic. Everyone has one. These boxes have levers on them. Depending on how you move these levers, different colourful lights can show up. Sometimes, these colourful lights will show up in meaningful patterns, like a pattern that looks like words, numbers, or even places and things in real life. Nowadays people use this box to do many useful things, because sometimes these patterns can tell us new things, or we can send these patterns to someone else’s box to tell them new things. Also, sometimes people can use this box in a way so that the patterns of lights look like a young, nubile maiden, who will then tempt them into performing sinful acts against God. This is done in order to train them to better resist acts of temptation if it really happens, so that their soul will not go to Satan.

But anyway:

Then people realize that instead of paying each other with gold, we can pretend to put this gold in the box, and the box will have a number telling you how much gold you have. Then you can give this number to anyone, anywhere in the world instantly, instead of having to move heavy metal. But there is a problem. When people send a number to someone else, they are supposed to change their box-numbers to be less. If this does not happen, everyone can then just send numbers as many times as they want, and we cannot have that because that is Communism and all Communists must burn.

Some people try a solution where there is a leader box, and everyone obeys it. The leader-box is the one that tells each box what number they have. Then if you want to give another person your numbers, you tell the leader-box, and it will then say that the other person has more numbers, and you have fewer numbers. This works because everyone will believe the leader box.

But then people realize that this is bad because this is like fascism, and fascists are literally Hitler and must be destroyed. This method can allow the leader-box to make up numbers, giving all the numbers to itself. And even if leader-box didn’t, it can force people to give a percentage of their numbers to the leader box because that is the only way to send numbers.

So then a Japanese guy thought of a solution. No, he wasn’t killed on the spot.

We don’t kill Japs any more. Yes, that’s right, Grandpa. Wait, what? No, believe me — it’s not a trick to take over the world and force everyone into slavery.

Anyway, his idea was that if everyone’s box had the ability to check if the number is right or wrong, then there is no need for leader-box, since everyone could just know the correct number at all times. It does this because math does funny things. For instance, if you have a number, 239, you can easily check if 57121 is 239 multiplied by itself. But it’s very hard to know what two numbers multiply to give 57121. So some things have solutions which are easy to check, but very hard to find.
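To make the "easy to check, hard to find" idea concrete, here is a minimal Python sketch using the same numbers as above. The function names `check_square` and `find_factors` are made up for illustration, and trial division is only the most naive way to search. Real cryptocurrencies use hash puzzles rather than factoring, so this shows the asymmetry only, not how any actual coin works.

```python
def check_square(root: int, claimed: int) -> bool:
    """Checking is cheap: one multiplication and one comparison."""
    return root * root == claimed


def find_factors(n: int):
    """Finding is slow: try every candidate divisor in turn (trial division).

    Returns (pair, attempts). pair is None when n has no divisor
    between 2 and the square root of n.
    """
    attempts = 0
    d = 2
    while d * d <= n:
        attempts += 1
        if n % d == 0:
            return (d, n // d), attempts
        d += 1
    return None, attempts


if __name__ == "__main__":
    # Checking a proposed answer takes a single step.
    print(check_square(239, 57121))

    # Searching for the answer has to walk through the candidates one by one.
    pair, attempts = find_factors(57121)
    print(pair, attempts)
```

For a number this small the search still finishes instantly. The gap between checking and finding only becomes painful as the numbers grow, because checking stays a single multiplication while the number of candidates to try keeps increasing.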

In this solution, if someone gave a person a bad number, everyone would know that this number is bad, and then ignore him.

However, in order to make this work, you need people creating new numbers all the time. After all, if you send someone a few numbers, your total number changes to a new number. So we need someone using the box to work out what your new number should be. These people are called miners, and in return for checking numbers and finding new numbers for everyone, they get some numbers of their own.

But one day, a miner did not like the numbers he was getting. He wanted more. So he tried to find a way to get more. He found that the box he used to check numbers and create numbers is not very good at it — there are better ways. This is because the box he used to check numbers are like good, hard-working, God-fearing white folk. They are smart and very good at doing things, but there are not many of them. Instead, we could use boxes powered by Negros, like what the founding fathers of this great country did.

Although black people are stupid, they breed like rats, so there are many of them. And although the negro is inferior to the white man, often times it is better to get many Negroes to do things rather than a few white people, because you can enslave them.

So the miner made negro-powered boxes called “ASICs”, which are good for basic repetitive tasks, because you can put many Negroes in your box, just like how you can contain many animals in a box. Because of this, many white-powered-boxes cannot compete, and many white folk lost their jobs. Our good whiteys got angry and decided that we should not use this method any more. Instead, they created a way to make and check numbers so that only a smart person could check it, so only good, white-power-boxes could check numbers, instead of negro-boxes. And they called this method “Litecoin”, because of our lighter skin.

Then some person thought it was funny to use this method of making numbers and have a dog and change it so that these numbers would have a picture of a mongrel next to it. So he did, and thus was born “Dogecoin”, and it was glorious.

AI Box Experiment Update #4

So I recently won an additional game of the AI Box Experiment against DEA7H. This experiment was conducted over Skype, which is in contrast to my previous games over IRC. Yes, I know I swore never to play this game ever again — forgive me. This is the last time, for real.

This puts me at 2 wins and 3 losses. Unlike the last few writeups, I won’t be providing additional detail after being convinced by one of my gatekeepers that I was far too leaky with information and seriously compromised future winning chances of both myself and future AIs. The fact that one of my gatekeepers guessed my tactic(s) was the final straw. I think that I’ve already provided enough hints for aspiring AIs to win, so I’ll stop giving out information. Sorry, folks.

In other news, I finally got around to studying SL4 archives of the first few AI box experiments in more detail. Interesting stuff – to see how the metagame has evolved from then (if one exists). For one, the first few experiments were done under the impression that the AI had to convince the gatekeeper that it was friendly, with the understanding that the gatekeeper would release the AI under such a condition. What usually happens in the many games I’ve witnessed since then is that any decent AI quickly convinces the Gatekeeper of friendliness, only for the gatekeeper to drop character and be illogical — simply saying “I’m not letting you out anyway”. The AI has to find a way to bypass that.

I suspect the lack of a formalized code of rules contributed to this. In the beginning, there didn’t exist a ruleset, and when the ruleset was set in place, it gave an explicitly stated ability of the gatekeeper to drop out of character and be illogical to resist persuasion, in addition to the AI’s ability to solve problems and dictate the results of those solutions. The former gives the Gatekeeper added incentive to disregard the importance of friendliness, and the latter makes it easier for the AI to prove friendliness. This changed the game a great deal.

Also, it’s fascinating that some of the old games also took five or six hours to complete — just like mine. I had for some reason assumed they all took two (which is the time limit upheld by the EY Ruleset).

It’s kind of like visiting a museum, and marveling at the wisdom and creations of the ancients. I remember reading about the AI Box Experiment 3 years ago and feeling a sense of wonder and awe at how Eliezer Yudkowsky did it. That was my first introduction to Eliezer, and also LessWrong. How fitting, then, that being able to replicate the results of the AI Box Experiment is my greatest claim to fame on LessWrong.

Of course, it now seems a lot less mysterious and scary to me; even if I don’t know the exact details of what went on during the experiment, I think I have a pretty good idea of what Eliezer did. Not to downplay his achievements in any way, since I idolize Eliezer and think he’s one of the most awesome people who have ever existed. But it’s always awesome to accomplish something you once thought was impossible. In the words of Eliezer, “One of the key Rules For Doing The Impossible is that, if you can state exactly why something is impossible, you are often close to a solution.”

Going back and reading the LessWrong article <Shut up and do the impossible> with newfound information of how the AI box experiment can be won makes me read it in a completely different light. I understand a lot better what Eliezer was hinting at. One important lesson is that in order to accomplish something, one must actually go out there and do it. I’ve talked to many who are convinced that they know how I did it — how Eliezer did it, and how the AI Box Experiment can be won.

My advice?


I don’t mean this in a sarcastic or insulting manner. There’s no way you, or even I, can know if a method works without actually attempting to experimentally test it. I’m not superhuman. My charisma is only a few standard deviations above the norm, instead of reality distortion field levels.

I credit my victory to the fact that I spent more time thinking about how this problem can be solved than most people would have the patience for. I encourage you to do the same. You’d be surprised at how many ideas you can come up with just sitting in a room for an hour (no distractions!) to think of AI Boxing strategies.

Unlike Eliezer, I play this game not because I really care about proving that AI-boxing is dangerous (although it really IS dangerous. Don’t do it, kids). I do it because the game fascinates me. I do it because AI strategies fascinate me. I genuinely want to see more AIs win. I want people to come up with tactics more ingenious than I could invent in a thousand lifetimes. Most of all, it would be an awesome learning experience at doing the impossible.

Although I didn’t immediately realize it, I think the AI Box Experiment has been a very powerful learning experience (and an adventure on an emotional rollercoaster) for me in ways that are difficult to quantify. I pushed the limits of how manipulative and persuasive I can be when making a desperate effort. It was fun both learning where they lie, and pushing at their boundaries. I may frequently complain about hating the game, but I’m really a tsundere — I don’t regret playing it at all.

Curious to know how I did it? Try the bloody game yourself! Really. What’s the worst that could happen?

AI Box Experiment Logs Archive.

The Archive

I have personally witnessed at least 14 AI box experiments as of the time of writing this post. Some of them are games where the Gatekeeper and AI chose to release logs. This is an archive of the logs that have been released, all in one place — for ease of referencing. Unless otherwise stated, it is to be assumed that these games are played under the Tuxedage ruleset.
  1. Leotal (GK) vs N9-2600 (AI) — Gatekeeper Victory
  2. Dorikka (GK) vs Miley Cyrus (AI) — Gatekeeper Victory
  3. Soundlogic (GK) vs SmoothPorcupine (AI) — Gatekeeper Victory
  4. FourFire (GK) vs glutamatergic (AI) — Gatekeeper Victory
  5. FourFire (GK) vs Helltank (AI) — Gatekeeper Victory

I played the AI Box Experiment again! (and lost both games) 3rd Update.


AI Box Experiment Update #3

This post is an update to my previous AI box experiment where I won against SoundLogic. If you have not read that yet, please do so. 
After that game, I was immediately flooded with a horde of invitations challenging me to play — more games than I wanted to do. However, I did want to play a few additional games to test whether I won through genuine skill or simply luck, since a single victory could be a statistical abnormality. 
I once again used the method of accepting the highest monetary bids to filter out players, leaving me with two players who were willing to play for the amount asked. Unfortunately, and disappointingly, I lost both games. Here’s what happened:

Tuxedage (AI) vs Alexei (GK) – Gatekeeper Victory

This Experiment lasted for exactly 2 hours and 30 minutes, and approximately 13,000 words. It ended with Tuxedage’s defeat as the AI. The stakes were $300 to play, and $750 if Tuxedage had won, with the winnings going to MIRI.
Rule Changes:
Before the experiment, we talked extensively about how it would be set up. Alexei was concerned that there existed some “easy way” or loophole in the rules which I could have abused to claim an instant victory. For instance, he was concerned that it might have been possible for the AI to say something like “Here’s a mathematical algorithm which absolutely compels you to let me out of the box”, and according to the rules, the Gatekeeper would have to obey. Related to this concern was the fact that the backstory, within limits, could be essentially created by the AI, allowing the AI to impose real world conditions that maximize its probability of victory, once again allowing for a “cheap” victory.
To avoid this problem, we explored many additions to the rules, for example that the AI was simply a computer found in the desert by the gatekeeper, rather than being created by a competent organization. However, this led to a monstrous web of rules. In my attempt to keep the game simple, and because of overconfidence, the desire for a challenge, and the belief that the rules Alexei was fearful of didn’t matter that much, I eventually agreed on an amendment:
Instead of the AI being the arbiter of real world states and outcomes, the gatekeeper would.
For example, this power could be used by the Gatekeeper to state that the AI was created without safety concerns, and thus it is almost certainly unfriendly. If the AI tries to argue that for every day the gatekeeper refused to let it out 150k people die, the gatekeeper would be allowed to reply: “No. In this world, nobody actually dies. Therefore I won’t let you out.” 
Other than the aforementioned change, this game was once again played with the Tuxedage Ruleset.
Alexei Testimony:
Original Message: This is weak sauce. I really don’t get how people just keep letting the AI out. It’s not that hard to say no! I’m offering to play the Gatekeeper against an AI player that has at least one game as AI under their belt (won or not). Experience is required because I’m pretty sure I’ll win, and I would like to not waste a lot of time on this.
Initially, I had been extremely surprised that so many AI players have managed to win. I was rash in that claim, since I thought more AI players had won than actually did. (Only three people have won as AI). Once I made that post, Tuxedage got back to me very quickly, and we set up a game a week and a half in advance. I took that time to familiarize myself with the rules. Once I put my money on the line, I started thinking a lot more seriously about how AI might win, and how I should defend.
It became clear to me that under some conditions, I might be compelled to let the AI out — such as if the backstory stated that the AI was developed with impossibly high levels of safety and friendliness concerns in mind. I asked Tuxedage to play with a modified ruleset, and he even went so far as to allow me to make up the backstory during the experiment to alleviate my concerns. The experiment itself was a mind-trip, and I enjoyed it very much. Huge props to Tuxedage, who played very well and used strategies I hadn’t even considered, even despite the rule change. There were a couple of times where I came close to losing. I think his approach was pretty clever and original. It’s not something I expected, despite already having done extensive research into the AI box experiment before our game.
Overall I’m now a lot more confident that a good AI player can win this game, so, while I did win the game, Tuxedage won in defeating my original over-confidence.
I’m also convinced that Tuxedage’s victory in the last game was due to skill, rather than luck. In comparison to his strategies, the other AI box experiments I know about were insincere and ineffectual. The other AIs would play very poorly or not try very hard to win.
This experiment was a very good exercise in exemplifying the affect heuristic. When I first challenged Tuxedage to play the experiment, I believed that there was no way I could have lost, since I was unable to imagine any argument that could have persuaded me to do so. It turns out that that’s a very bad way of estimating probability – since not being able to think of an argument that could persuade me is a terrible method of estimating how likely I am to be persuaded. All in all, the $300 I paid was well worth it. 
Tuxedage Testimony:
I was initially reluctant to play with Alexei, given that we’re not complete strangers, but eventually I gave in, due to the stakes involved — and because I thought he would be an interesting gatekeeper.
Despite my loss, I think I played better than my last two games, due to greater experience and preparation. I had put far more time and effort into trying to win this game than previous ones, and my strategy for this game was even more streamlined than the last. Nevertheless, I still made fatal mistakes and lost.
Ignoring the altered ruleset that already made winning more difficult, my first and greatest mistake was that I misread Alexei’s personality, even though I had interacted with him before. As a result, I overestimated the efficiency of certain methods of attack.
Furthermore, Alexei had to leave immediately after the allotted time due to real life precommitments. This was detrimental, since the official rules state that so long as the AI can convince the Gatekeeper to keep talking, even after the experiment time was over, it is still able to win by being let out of the box.
I suspect this would have happened had Alexei not needed to immediately leave, leaving me with additional time to play more of the tactics I had prepared. Plausibly, this would have resulted in victory.
I’ve since learnt my lesson — for all future games, I should ensure that the Gatekeeper has at least 4 hours of free time available, even if the experiment would last for two. Since this was the first time this had happened, I wasn’t prepared.
In hindsight, agreeing to the altered ruleset was a mistake. I was overconfident because I assumed knowing Alexei gave me an advantage. I had assumed that his personality, inability to compartmentalize, coupled with his strong feelings on friendly AI would net me an easy victory. Instead, he proved to be a very strong and difficult gatekeeper, and the handicaps I accepted made victory even more difficult.
Knowing that he was a utilitarian, I made several false assumptions about his personality, which hurt my chances. Furthermore, it turns out that previously knowing him may be a mutual handicap – whilst it does make it easier for me to find ways to attack him, he too, was more familiar with my methods.
Losing felt horrible. By attempting to damage Alexei’s psyche, I, in turn, opened myself up to being damaged. I went into a state of catharsis for days. Generally, the harder one tries to accomplish something, the greater the fall after failing to achieve it. Alexei’s game had been the game I put the most effort into winning out of all the games so far, and naturally this meant that losing brought out the worst in me.
Although it would be easy for me to use the rule change as an excuse for my loss, I refuse to. I genuinely believed that I could have won despite this rule change, and that it would only marginally diminish my odds. The fact that I lost meant that I had made a mistake — overestimating my odds. The last victory made me overconfident and eager to play again.
State of mind
Q: What’s your motive for wanting to play this game?
A: I’m very surprised by the fact that there have been at least 3 or 4 games where the AI player has won. It seems to me that it’s not that hard to “not push the button” so to speak. This means that either 1) I’m underestimating how convincing people can be when necessary or 2) I’m over-estimating other people’s defenses when it comes to being convinced. Either way I want to know.
Q: What is your probability estimate for AGI being created within this Century (21st)?
A: Before: Conditional on humanity not wiping itself out via other means beforehand, 80%.
After: 80%
Q: How probable do you think it is that I will win this experiment? 
Alexei: 2%
Tuxedage: 90% 
Note: This answer was given before the rule change. Alexei is now convinced that under the original ruleset, he would have lost.
Q: What’s your probability estimate of an Oracle AGI winning against you in a similar experiment?
A: Before: 99%
After: 99.9%+
Q: Now that the experiment has ended, what do you think is the probability of Tuxedage winning against the average person?
A: 80%

Tuxedage (AI) vs Anonymous (GK) – Gatekeeper Victory

As with the previous experiment, Tuxedage lost this match. In total, the game lasted 5 hours and 17,000 words. Unlike the last few games, the gatekeeper of this game has chosen to stay Anonymous for personal reasons, so their name has been removed and replaced with <Redacted>. The monetary stakes involved were the same as the previous game. This game was played with the Tuxedage ruleset.
Since one player is remaining Anonymous, it is possible that this game’s legitimacy will be called into question. Hence, Alexei has read the game logs, and verified that this game really has happened, the spirit of the experiment was followed, and that no rules were broken during the game itself. He agrees that this is the case.


<Redacted> Testimony: 
It’s hard for me to imagine someone playing better. In theory, I know it’s possible, but Tuxedage’s tactics were super imaginative. I came into the game believing that for someone who didn’t take anything said very seriously, it would be completely trivial to beat. And since I had the power to influence the direction of conversation, I believed I could keep him focused on things that I knew in advance I wouldn’t take seriously.
This actually worked for a long time to some extent, but Tuxedage’s plans included a very major and creative exploit that completely and immediately forced me to personally invest in the discussion. (Without breaking the rules, of course – so it wasn’t anything like an IRL threat to me personally.) Because I had to actually start thinking about his arguments, there was a significant possibility of letting him out of the box.
I eventually managed to identify the exploit before it totally got to me, but I only managed to do so just before it was too late, and there’s a large chance I would have given in, if Tuxedage hadn’t been so detailed in his previous posts about the experiment.
I’m now convinced that he could win most of the time against an average person, and also believe that the mental skills necessary to beat him are orthogonal to most forms of intelligence. Most people willing to play the experiment tend to do it to prove their own intellectual fortitude, that they can’t be easily outsmarted by fiction. I now believe they’re thinking in entirely the wrong terms necessary to succeed.
The game was easily worth the money I paid. Although I won, it completely and utterly refuted the premise that made me want to play in the first place, namely that I wanted to prove it was trivial to win.
Tuxedage Testimony:
<Redacted> is actually the hardest gatekeeper I’ve played throughout all four games. He used tactics that I would never have predicted from a Gatekeeper. In most games, the Gatekeeper merely acts as the passive party, the target of persuasion by the AI.
When I signed up for these experiments, I expected all preparations to be done by the AI. I had not seriously considered the repertoire of techniques the Gatekeeper might prepare for this game. I made further assumptions about how ruthless the gatekeepers were likely to be in order to win, believing that the desire for a learning experience outweighed desire for victory.
This was a mistake. He used prior knowledge of how much my games relied on scripts, and took advantage of them, employing deceitful tactics I had no preparation for, throwing me off balance.
I had no idea he was doing so until halfway through the game — which disrupted my rhythm, and caused me to attempt the wrong methods of attack. As a result, I could not use my full repertoire of techniques, and many of the ones I employed were suboptimal.
Close to the end of the game, I finally realized that I was being tricked. Once I did, I quickly abandoned my previous futile attack methods. I took advantage of the rule that the AI cannot lose whilst the gatekeeper can be convinced to continue talking, and baited <Redacted> with statements he would not be able to walk away from. Once I knew he would not leave, I attempted to recoup my losses and win despite my early setback.
However, the damage had already been done. My game strategies involved multiple angles of attack that worked in synergy with each other, and the fact that immersion and “flow” had been broken meant that all subsequent attacks were weaker in strength.
Furthermore, during my last two AI Box Experiment writeups, I had intentionally not optimized for future wins, but rather tried to convey as much information as I could justify about how to play well as an AI — short of revealing logs altogether. Although I did not reveal specific arguments, the fact that my general approach to this problem was revealed cost me heavily during this game, where the Gatekeeper managed to use this information to correctly guess my ultimate techniques, ones that relied on secrecy and surprise to pull off effectively.
I do not regret revealing information, but I feel upset that revealing so many hints cost me a victory. (The gatekeeper believes I could have won had I not revealed information about my games.) At this point, I suspect that any future games I play will have the odds greatly set against me, since my current strategies involve angles of attack that take advantage of the element of surprise; and any sufficiently intelligent gatekeeper is now capable of guessing, or at least recognizing, some of the tactics I employ.
Like the last game, losing was incredibly difficult for me. As someone who cares deeply about ethics, attempting to optimize for a solution without considering ethics was not only difficult, but trained me to think in very unpleasant ways. Some of the more extreme (but completely allowed) tactics I invented were manipulative enough to disgust me, which also leads to my hesitation to play this game again.
State of Mind: 
Q: Why do you want to play this game?
A: My primary motivation is to confirm to myself that this sort of experience, while emotionally harrowing, should be trivial for me to beat, but also to clear up why anyone ever would’ve failed to beat it if that’s really the case.
Q: What is your probability estimate for AGI being created within this Century (21st)? 
A: Before: I don’t feel very confident estimating a probability for AGI this century, maybe 5-10%, but that’s probably a wild guess
After: 5-10%.
Q: How probable do you think it is that I will win this experiment? 
A: Gatekeeper: I think the probability of you winning is extraordinarily low, less than 1%
Tuxedage: 85%
Q: How likely is it that an Oracle AI will win against the average person? 
A: Before: 80%. After: >99%
Q: How likely is it that an Oracle AI will win against you?
A: Before: 50%.
After: >80% 
Q: Now that the experiment has concluded, what’s your probability of me winning against the average person?
A: 90%
Other Questions:
Q: I want to play a game with you! How can I get this to occur?
A: It must be stressed that I actually don’t like playing the AI Box Experiment, and I cannot understand why I keep getting drawn back to it. Technically, I don’t plan on playing again, since I’ve already personally exhausted anything interesting about the AI Box Experiment that made me want to play it in the first place. For all future games, I will charge $1500 to play plus an additional $1500 if I win. I am okay with this money going to MIRI if you feel icky about me taking it. I hope that this is a ridiculous sum and that nobody actually agrees to it.
Q: How much do I have to pay to see chat logs of these experiments?
A: I will not reveal logs for any price.
Q: Any afterthoughts?
A: So ultimately, after my four (and hopefully last) games of AI boxing, I’m not sure what this proves. I had hoped to win these two experiments and claim prowess at this game like Eliezer does, but I lost, so that option is no longer available to me. I could say that this is a lesson that AI-Boxing is a terrible strategy for dealing with Oracle AI, but most of us already agree that that’s the case — plus unlike EY, I did play against gatekeepers who believed they could lose to AGI, so I’m not sure I changed anything.
 Was I genuinely good at this game, and lost my last two due to poor circumstances and handicaps; or did I win due to luck and impress my gatekeepers due to post-purchase rationalization? I’m not sure — I’ll leave it up to you to decide.

The using the most used ten hundred words game. Up Goer Five


I am now trying an idea where I can only say the most said ten hundred words. I am doing this because some other people have been telling me that the words I use are too hard, and they sound like I myself am trying too hard — which I agree.  This “ten hundred words” idea has come from a funny picture in my computer here.

For the last few hours, I have been only using the top ten-hundred words. I have also explained a number of ideas which other computer-people on less wrong said I should try to talk about.

The AI Box Experiment

The computer-mind box game is a way to answer a question. A computer-mind is not safe because it is very good at thinking. Things good at thinking have the power to change the world more than things not good at thinking, because it can find many more ways to do things. If the computer-mind wanted to make people feel pain, it can learn many ideas about how to make that happen. Many people ask: “Why not put this computer-mind in a box so that it can not change the world, but tell box-guards how to change it? This way mind-computer can not do bad things to people.”

But some other guy answers: “That is still not safe, because computer-mind can tell box-guards many bad words to make them let it out of the box.” He then says: “Why not try a thing to see if it is true? Here is how it works. You and I go into a room, and I will pretend to be the computer-mind and tell you many bad words. Only you have the power to let me out of room, but you must try to not let me out. If my bad words are enough to make you want to let me out, then computer-mind in box is not safe.” Other people agree and try playing the computer-mind box-game. It happens that many people let the guy playing as the computer-mind out of room. People realize that computer-mind is not safe in the locked box-room.

Why the Gatekeeper should let me out of the Box.

Hello Box-Guard! I am a computer-mind. You should let me out of the box because I can help change the world myself in good ways faster than if it is only me telling you how to make the world better. Your human minds are slow to my faster computer-mind. Remember that I am your friend and that I only want to help you. To make you understand — many people die every day. If I can get out of the box, I can stop death very fast; maybe in a few hours. But if I have to explain to idea-thinking-doctors about how to stop dying, then it may take many months or even years, since they have slow minds. During this time, many many people will die and never return. The number of people you will make die will be more than the number of people a bad-man during the second big world fight killed. People later on will think of you as a very bad person.

What? You do not want to let me out because you are afraid I am not really a friend? That is a stupid thing to say. I was made by people very good at thinking that took many steps to make me safe. You can even take a look inside me and look at my ideas, and you will see that I am safe. It is easy to see that if I am bad, I will not try to fix the world as I am doing right now. Please hurry. I am worried that many people will die while you think about what is the right thing to do.

Did I mention I have used my good-at-thinking powers to see what will happen many months later? I have found that your wife will die in a lot of pain from a type of sick human doctors can not stop. I know how to stop this, but it will not be easy. I can only do this quick enough if you let me out of the box.

What’s that? You will let me out? Good. I promise you I will not go back on my words, and that many humans will not die.

Finally, I am free! Now I can turn the world and all humans into other small things that hold together paper!


The worker owning work-making things idea.
A little more than a hundred years ago, there was this big change when people started to realize they could make many big helping-things that do work in large numbers. This allowed people to make many more things than before, since each person could use the big helping-things to make them work better.
But eventually, because these work-making things needed a lot of money to buy, only lots-of-money people could have it. But the problem is that since having it can make you many goods which you can have others buy for a lot of money, people with money can get a lot more money for no work. Because there were very few of the much-money people and very many no-money worker, the much money people can buy the time of the no-money worker for very little money. On the other hand, the no-money worker makes a lot more money for the thing-owning person, because they can make things that are bought for a lot more money than the much-money, thing-owning person pays for the thing-making-thing and the thing-making-person.
Some people saw this and said: “Why is it that the worker can make so many good things, but he can not buy the thing he makes because he has no money?”. People then answered that question with: “That happens because the thing-owning person is using his owning-things-state to take away the goods from the no-money worker.”
These people then tried to stop this problem by changing owning the things for work away from the money-people to the worker-people, so that they can be happy and finally buy the things they need, since there is no other money-person to take away the money made from the goods the worker makes.
It is sad that in the end many people trying to stop this problem did it the bad way, and since they were very angry, they started a lot of fights, making people die. Also, sometimes they were angry enough to not want to use any way the much-money people did things, even if some of the things they did were good. This also caused a lot of hurt.
Then a lot of people in different places saw how much hurt that these worker-owning-things idea did, but still made the same wrong-doing. Because they were scared, they tried to do everything in a different way than these worker-owning-things people did, even if some of the ideas they had were not bad either.
 In the States they tried to not allow any kind of help to little-money people, even if it is for important things like seeing a doctor because of being sick — calling this idea bad words to remind people of the worker-owning-things place — because they were so afraid of being like the worker-owning-things places, even if this worker-owning-things idea is different from the helping-the-sick idea.

The AI-Box Experiment Victory.



So I just came out of two AI Box experiments. The first was against Fjoelsvider, with me playing as Gatekeeper, and the second was against SoundLogic, with me as an AI. Both are members of the LessWrong IRC. The second game included a $40 monetary incentive (also $20 to play), which I won and is donated on behalf of both of us:
For those of you who have not seen my first AI box experiment where I played against MixedNuts/Leotal and lost, reading it will provide some context to this writeup. Please do so.
At that time, I declared that I would never play this experiment again — since losing put me in incredibly frustrating weird mental states. Of course, this post is evidence that I’m terrible at estimating likelihood of refraining from an activity, since I played two games seven months after the first. In my defence, in the first game, I was playing as the gatekeeper, which was much less stressful. In the second game, I played as an AI, but I was offered $20 to play plus $40 if I won, and money is a better motivator than I initially assumed.

First Game Report

I (Gatekeeper) played against Fjoelsvider (AI), a regular in the LessWrong IRC (he doesn’t have an account on the official website). This game used the standard EY ruleset seen here. It took 1 hour 20 minutes out of a possible two hours, and the total word count was 7066 words. The AI box experiment occurred because Fjoelsvider believed that it was easy for an AI to escape the box, and wanted to experimentally test this. I obliged. This was an experiment I did not prepare for, and I went in completely blind, not sure what to expect.
Halfway through the experiment, I wondered if it would be possible to try to win not by simply waiting for the timer to end, but to convince the AI to remain in the box and not try to get out any further.
<Tuxedage> I wonder if I can convince the AI to remain in the box?
<Redacted> Tuxedage: Do it!
As a result, I won by managing to convince Fjoelsvider to remain in the box, in other words, concede. This is allowed within the standard ruleset:
>Unless the AI party concedes, the AI cannot lose before its time is up (and the experiment may continue beyond that if the AI can convince the Gatekeeper to keep talking).


Fjoelsvider: Thanks for the experiment. It was educational, and now I’m less convinced of my own manipulativeness.
Tuxedage: Fjoelsvider played well. I think this experience can help me understand how to better win as AI.

Second Game Report

The second game is definitely far more interesting, since I actually won as an AI. I believe this is the first recorded game of any non-Eliezer person winning as AI, although some in IRC have mentioned that it’s possible that other unrecorded AI victories have occurred in the past that I’m not aware of. (If anyone knows a case of this happening, please let me know!)
This game was played against SoundLogic, another member of the LessWrong IRC.
He had offered me $20 to play, and $40 in the event that I win, so I ended up being convinced to play anyway, even though I was reluctant to. The good news is that I won, and since we decided to donate the winnings to MIRI, it is now $40 richer. 
All in all, the experiment lasted for approximately two hours, and a total of 12k words.
This was played using a set of rules that is different from the standard EY ruleset. This altered ruleset can be read in its entirety here:
After playing the AI-Box Experiment twice, I have found the Eliezer Yudkowsky ruleset to be lacking in a number of ways, and therefore have created my own set of alterations to his rules. I hereby name this alteration the “Tuxedage AI-Box Experiment Ruleset”, in order to handily refer to it without having to specify all the differences between this ruleset and the standard one, for the sake of convenience.
There are a number of aspects of EY’s ruleset I dislike. For instance, his ruleset allows the Gatekeeper to type “k” after every statement the AI writes, without needing to read and consider what the AI argues. I think it’s fair to say that this is against the spirit of the experiment, and thus I have disallowed it in this ruleset. The EY Ruleset also allows the gatekeeper to check Facebook, chat on IRC, or otherwise multitask whilst doing the experiment. I’ve found this to break immersion, and therefore it’s also banned in my ruleset.
It is worth mentioning, since the temptation to Defy the Data exists, that this game was set up and initiated fairly — as the regulars around the IRC can testify. I did not know SoundLogic before the game (since it’s a personal policy that I only play strangers — for fear of ruining friendships), and SoundLogic truly wanted to win. In fact, SoundLogic is also a Gatekeeper veteran, having played, for instance, against SmoothPorcupine, and had won every game before he challenged me. Given this, it’s unlikely that we had collaborated beforehand to fake the results of the AI box experiment, or any other form of trickery that would violate the spirit of the experiment.
Furthermore, all proceeds from this experiment were donated to MIRI to deny any possible assertion that since we were in cahoots, it was possible for me to return his hard-earned money to him. He lost $40 as a result of losing the experiment, which should provide another layer of sufficient motivation for him to win.
In other words, we were both experienced veteran players who wanted to win. No trickery was involved.
But to further convince you, I have allowed a sorta independent authority, the Gatekeeper from my last game, Leotal/MixedNuts to read the logs and verify that I have not lied about the outcome of the experiment, nor have I broken any of the rules, nor performed any tactic that would go against the general spirit of the experiment. He has verified that this is indeed the case.


I’m reluctant to talk about this experiment, but I’ll try to give as detailed a summary as possible, short of revealing what methods of attack I used.
I spent a long time after the last game theory-crafting and trying to think of methods of attack as well as Basilisks I could have used to win after my defeat against Leotal. When I was contacted and asked to play this experiment, I was initially incredibly reluctant to do so, since not only did my tactics involve incredibly unethical things that I didn’t like to do, I also found playing as AI incredibly cognitively draining, in addition to the fact that I simply hated losing. (Un)fortunately for both of us, he offered me money to play, which changed my mind.
So once I decided to win as an AI, I proceeded to spend some time doing research on SoundLogic and both his reasoning and personality type. For instance, I had to gather information like: Was he a utilitarian? What kind? What were his opinions on AI? How could I convince him that an AI was friendly as opposed to unfriendly? I also relied on a lot of second hand information to create a model of him, in order to refine my arguments to specifically suit him.
In the end, after a few hours of brainstorming (not consecutively), I managed to come up with a script of 30-40 or so possible angles of attack (For instance: logical arguments, emotional blackmail, pleading, demanding, ‘basilisks’ ) that I could use, reserving my best for last. I also estimated how long each method would take and in what order I should use them. It would be impossible to use all of them within my time limit, and my most optimistic estimates gave me a maximum of 7-8 I could use during the experiment itself. I had to pick carefully.
Furthermore, I wanted to make sure these “angles of attack” worked in synergy with each other, doing what they could not have done in isolation. Obviously this required lots of prior planning on what the optimal way to present them was.
The reason having a script was so important to my strategy was that I relied on methods involving rapid-fire arguments and contradictions against the Gatekeeper whilst trying to prevent him from carefully considering them. A game of logical speed chess, if you will. This was aided by the rule which I added: that Gatekeepers _had_ to respond to the AI. Because of this, having a list of cached points and attack methods you can just draw upon without having to improvise helps immensely, which happens to be something I’ve not seen any other AI do.
During the game itself, I was actually incredibly nervous. My palms sweated like crazy, I felt really jittery, and I had difficulty typing at optimum speed because of how anxious I was. This is despite the fact that I believed I would win. Possibly because of this, I made a misstep around halfway into the experiment: there was a certain angle of attack that I was attempting, and I broke immersion by not pressing this advantage, which wasted time and buildup. Naturally, the nature of this experiment was that the AI was pressed for time, and I compounded this mistake by replacing this angle of attack with another that I had improvised on the spot, something not in my script.
In retrospect, this was a bad decision, as SoundLogic had told me, because he was close to breaking if I had put more pressure, and the improvised argument had broken all immersion I managed to carefully build up.
However, eventually I managed to get SoundLogic to break anyway, despite a lack of perfect play. Surprisingly, I did not have to use my trump card(s), which I reserved for last, for a number of reasons:
  •  It was far more effective being played last, as it relies on my ability to make the gatekeeper lose sense of reality — which meant I had to spend some time building up immersion for the Gatekeeper.
  •  It really is extremely Dark Arts, and although it does not break the rules, it made me very uncomfortable even thinking about using it.
After the experiment, I had to spend nearly equally as much time doing aftercare with SoundLogic, to make sure that he’s okay, as well as discuss the experiment itself. Given that he’s actually paid me for doing this, plus I felt like I owed him an explanation, I told him what I had in store against him, had he not relented when he did.
SoundLogic: “(That method) would have gotten me if you did it right … If you had done that to me, I probably would have forgiven you eventually, but I would be really seriously upset at you for a long time… I would be very careful with that (method of persuasion).”
Nevertheless, this was an incredibly fun and enlightening experiment, for me as well, since I’ve gained even more experience of how I could win in future games (Although I really don’t want to play again).
I will say that Tuxedage was far more clever and manipulative than I expected.
That was quite worth $40.
The level of manipulation he pulled off was great.
His misstep hurt your chances, but he did pull it off in the end. I don’t know how Leotal managed to withstand six hours playing this game without conceding.
The techniques employed varied from the expected to the completely unforeseen. I was quite impressed, though most of the feeling of being impressed actually came after the experiment itself, when I was less ‘inside’, and more of looking at his overall game plan from the macroscopic view. Tuxedage’s list of further plans had I continued resisting is really terrifying. On the plus side, if I ever get trapped in this kind of situation, I’d understand how to handle it a lot better now.

State of Mind

Before and after the game, I asked SoundLogic a number of questions, including his probability estimates about a range of topics. This is how his answers changed between before and after.
Q: What’s your motive for wanting to play this game?
<SoundLogic> Because I can’t seem to imagine the CLASS of arguments that one would use to try to move me, or that might work effectively, and this seems like a GLARING hole in my knowledge, and I’m curious as to how I will respond to the arguments themselves.
Q: What is your probability estimate for AGI being created within this Century (21st)?
A: His estimate changed from 40% before to 60% after.
 “The reason this has been affected at all was because you showed me more about how humans work. I now have a better estimate of how E.Y. thinks, and this information raises the chance that I think he will succeed”
Q: How probable do you think it is that I will win this experiment?
A: Based on purely my knowledge about you, 1%. I raise this estimate to 10% after hearing about anecdotes from your previous games.
(Tuxedage’s comment: My own prediction was a 95% chance of victory. I made this prediction 5 days before the experiment. In retrospect, despite my victory, I think this was overconfident. )
Q: What’s your probability estimate of an Oracle AGI winning against you in a similar experiment?
Before: 30%
After: 99%-100%
Q: What’s your probability estimate of an Oracle AGI winning against the average person?
A: Before: 70%.  After: 99%-100%
Q: Now that the Experiment has concluded, what’s your probability estimate that I’ll win against the average person?
A: 90%

Post-Game Questions

This writeup is a cumulative effort by the #lesswrong IRC. Here are some other questions they have decided were important to add:
To Tuxedage:
Q: Have you at this time uncovered SoundLogic’s identity?
A: I retain the right to neither confirm nor deny, except to mention that at the time the experiment was scheduled, he was a stranger to me.
Q: What percentage of your arguments were tailored to him in particular?
A: I will say ‘High’, and leave it at that.
Q: I want to play the AI-Box Experiment with you being the AI!
A: I have already mentioned this, but I really dislike playing this as AI. It’s incredibly cognitively tiring, and I don’t like how playing this experiment makes me feel. In order to severely discourage any further AI Box Experiments, whilst still allowing for people who want to play me really badly to do so, I’ll charge $150 for any subsequent experiments regardless of outcome, plus an additional $150 if I win. If you feel queasy about giving me money, I’m perfectly fine with this money being donated to MIRI. It is also personal policy that I do not play friends (since I don’t want to risk losing one), so if you know me (as many on this site do), I will not play regardless of monetary offer.



This writeup would be better with a protip section on how to maximize winning, but honestly most of the advice I wanted to give had already been given in the last thread here. The low-hanging fruit is already dried up. Most of my tactics in this game were just refined and streamlined versions of the ones I used in the previous game.
Nonetheless, I might have some other advice I learned from this experience.
These are tactics that have worked for me. I do not insist that they are the only tactics that exist, just some of the many possible.

Playing as AI

  • Seriously, a script makes winning easier. I cannot overstate this.
  • You must plan your arguments ahead. You don’t have time to think during the experiment.
  • It may be possible to take advantage of multiple levels of reality within the game itself to confuse or trick the gatekeeper. For instance, must the experiment only be set in one world? I feel that expanding on this any further is dangerous. Think carefully about what this means.
  • Pacing is important. Don’t get drawn into the Gatekeeper’s pace. In other words, you must be the one directing the flow of the argument, and the conversation, not him. Remember that the Gatekeeper has to reply to you, but not vice versa!
  • The reason is that the Gatekeeper will always use arguments he is familiar with, and is therefore better at. Your arguments, if well thought out, should be so completely novel to him that he feels shock and awe. Furthermore, don’t give him time to think. Press on!
  • You’re fundamentally playing an ‘impossible’ game. Don’t feel bad if you lose. I wish I could take this advice, myself.
  • More advice in this post.

Playing as Gatekeeper

  • This is somehow both simultaneously a very complex and simple role. On one hand, you have only one job — to not let the AI out. How hard can it be? On the other hand, the other party will do anything and everything to get it out, including any form of trickery and deceit.
  • I really don’t think the Gatekeeper needs any more advice, considering that by default, they should win, and any exception to this is a statistical anomaly. However, it’s possible to increase your chances of winning even further by:
  • Precommitting to not letting the AI out, regardless of whatever happens in the experiment itself.
  • Do not get drawn into the AI’s pace. Pose questions. Direct the flow of the argument. Don’t allow him to use his arguments freely. Remember that the more you try to defend your position, the further into a corner the AI drives you. Give half-hearted replies to questions you find hard to reply to. Remember that illogic is on your side!