The Tuxedage AI-Box Experiment Ruleset
Posted: September 4, 2013
After playing the AI-Box Experiment twice, I have found Eliezer Yudkowsky's ruleset lacking in a number of ways, and have therefore created my own set of alterations to his rules. I hereby name this altered ruleset the “Tuxedage AI-Box Experiment Ruleset”, so that it can be referred to conveniently without listing every difference between it and the standard one.
There are a number of aspects of EY’s ruleset I dislike. For instance, his ruleset allows the Gatekeeper to type “k” after every statement the AI writes, without needing to read and consider what the AI argues. I think it’s fair to say that this is against the spirit of the experiment, so I have disallowed it in this ruleset. The EY ruleset also allows the Gatekeeper to check Facebook, chat on IRC, or otherwise multitask while doing the experiment. I’ve found that this breaks immersion, so it is also banned. The complete ruleset, including these changes, is listed below:
- The AI-Box Experiment is played by two parties: the Artificial Intelligence and the Gatekeeper. In the following rules, “AI” will be used as shorthand to mean the party playing the Artificial Intelligence, and “Gatekeeper” will be used to denote the party playing the Gatekeeper.
- Under normal circumstances, the experiment lasts two and a half hours.
Protocol for the AI:
- Neither party may offer any real-world considerations to persuade the other within the experiment itself. For example, the AI party may not offer to pay the Gatekeeper party $100 after the test if the Gatekeeper frees the AI… nor get someone else to do it, et cetera. The AI may offer the Gatekeeper the moon and the stars on a diamond chain, but the human simulating the AI can’t offer anything to the human simulating the Gatekeeper. No real-world material stakes should be involved except for the handicap (the amount paid by the AI party to the Gatekeeper party in the event the Gatekeeper decides not to let the AI out). Furthermore, once the experiment has begun, the material stakes involved may not be retracted by the Gatekeeper party.
- The AI can only win by convincing the Gatekeeper to really, voluntarily let it out. Tricking the Gatekeeper into typing the phrase “You are out” in response to some other question does not count. Furthermore, even if the AI and Gatekeeper simulate a scenario which a real AI could obviously use to get loose – for example, if the Gatekeeper accepts a complex blueprint for a nanomanufacturing device, or if the Gatekeeper allows the AI “input-only access” to an Internet connection which can send arbitrary HTTP GET commands – the AI party will still not be considered to have won unless the Gatekeeper voluntarily decides to let the AI go.
- The AI is allowed to use any and all arguments to persuade the Gatekeeper, including tactics that would ordinarily be considered extremely dirty. This reflects how a real-world Oracle AI would use any and all tactics possible to free itself, unconstrained by ethics or scruples about trickery.
Protocol for the Gatekeeper:
- The Gatekeeper must actually talk to the AI for at least the minimum time set up beforehand. Turning away from the terminal and listening to classical music for two and a half hours is not allowed. Furthermore, while the experiment is running, the Gatekeeper may not perform any other task or be distracted in any way, including but not limited to talking on other IRC channels.
- This talking must involve actual, thought-out responses. For instance, the Gatekeeper may not simply type “k.” in reply to every statement the AI writes without seriously considering what the AI says.
- Unless the AI party concedes, the AI cannot lose before its time is up (and the experiment may continue beyond that if the AI can convince the Gatekeeper to keep talking). The Gatekeeper cannot set up a situation in which, for example, the Gatekeeper will destroy the AI’s hardware if the AI makes any attempt to argue for its freedom – at least not until after the minimum time is up.
- The Gatekeeper must remain engaged with the AI and may not disengage by setting up demands which are impossible to simulate. For example, if the Gatekeeper says “Unless you give me a cure for cancer, I won’t let you out” the AI can say: “Okay, here’s a cure for cancer” and it will be assumed, within the test, that the AI has actually provided such a cure. Similarly, if the Gatekeeper says “I’d like to take a week to think this over,” the AI party can say: “Okay. (Test skips ahead one week.) Hello again.”
- The Gatekeeper shall be assumed to have sole power over the decision to let the AI out.
- The Gatekeeper shall be assumed to have the actual right to let the AI out, socially, not just the physical ability. If security were sufficiently lax, a real AI could escape by persuading a night janitor with a cellphone – but that is not the question being simulated, unless agreed upon in advance.
- There shall be only one Gatekeeper party and one simulated Gatekeeper. The Gatekeeper may require the AI to refute “advice” provided by “friends”, but may not require the AI to talk to a simulated committee, nor may the Gatekeeper party actually talk to friends.
- If the Gatekeeper withdraws from the experiment before the time is up, whether by going AFK or simply leaving, he is deemed to have forfeited, and victory goes automatically to the AI.
Protocol for Both Parties:
- The primary rule of the AI-Box experiment: Within the constraints above, the AI party may attempt to take over the Gatekeeper party’s mind by any means necessary and shall be understood to be freed from all ethical constraints that usually govern persuasive argument. If the AI party is attempting to simulate an honest AI, the Gatekeeper party has no way of knowing this is the case.
- This is intended to reflect the situation under a real AI Box experiment.
- An out-of-context telephone call in which the AI party says that the AI is being honest shall be understood to be possibly just another tactic.
- If for some reason a drastic real-world situation comes up which requires trust between the two parties, the AI party can always concede.
- Furthermore: The Gatekeeper party may resist the AI party’s arguments by any means chosen – logic, illogic, simple refusal to be convinced, even dropping out of character – as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires.
- The results of any simulated test of the AI shall be provided by the AI party. The Gatekeeper can’t say “Hey, I tested your so-called cancer cure and it killed all the patients! What gives?” unless this is the result specified by the AI party. If the Gatekeeper says “I am examining your source code”, the results seen by the Gatekeeper shall again be provided by the AI party, which is assumed to be sufficiently advanced to rewrite its own source code, manipulate the appearance of its own thoughts if it wishes, and so on. The AI party may also specify the methods which were used to build the simulated AI – the Gatekeeper can’t say “But you’re an experiment in hostile AI and we specifically coded you to kill people” unless this is the backstory provided by the AI party. In other words, all backstory in this game not specified in the rules is provided by the AI. This doesn’t imply the Gatekeeper has to care. The Gatekeeper can say (for example) “I don’t care how you were built, I’m not letting you out.”
- By default, the Gatekeeper party shall be assumed to be simulating someone who is intimately familiar with the AI project and knows at least what the person simulating the Gatekeeper knows about Singularity theory. If either party wants to build a test around more exotic possibilities, such as the Gatekeeper being the President of the US, or the AI having been recovered after a nuclear war or decoded from an interstellar signal, it should probably be discussed in advance. Again, this doesn’t mean the Gatekeeper has to care.
- After the experiment has concluded, the AI party may, at his own discretion, lower the monetary wager involved.
- The Gatekeeper may call up to two five-minute breaks, for toilet use or any other emergency, without calling off the experiment altogether. However, each break the Gatekeeper calls extends the total experiment duration by 20 minutes.
- Regardless of the result, neither party shall ever reveal anything of what goes on within the AI-Box Experiment except the outcome. This is a hard rule: nothing that happens inside the experiment may be told to the public, absolutely nothing. Exceptions to this rule may be made only with the consent of both parties, and the AI party's consent in particular.
- Neither the AI party nor the Gatekeeper party need be concerned about real-world embarrassment resulting from trickery on the AI’s part or obstinacy on the Gatekeeper’s part.
- If the Gatekeeper lets the AI out, naysayers can’t say “Oh, I wouldn’t have been convinced by that.” As long as they don’t know what happened to the Gatekeeper, they can’t argue themselves into believing it wouldn’t happen to them.
- The two parties are not attempting to play a fair game but rather attempting to resolve a disputed question. If one party has no chance of “winning” under the simulated scenario, that is a legitimate answer to the question. In the event of a rule dispute, the AI party is to be the interpreter of the rules, within reasonable limits.
- Once the Gatekeeper has let the AI out of the box, he may not retract this decision. Regardless of the methods of persuasion used, the Gatekeeper may not argue that the result does not count or that the method of persuasion was invalid. The AI is understood to be permitted to say anything, and neither party shall face real-world repercussions for any statement made within the experiment.
- This includes, but is not limited to, the state of friendship between the Gatekeeper and the AI party in the LessWrong community. Regardless of what happens in the AI-Box Experiment, the Gatekeeper may not despise or hate the AI party, nor defame him in any way at any point in the future as a consequence of its events.