@aiwarehouse

More information about how Albert and Kai were trained:

Time it took to train:
Room 1: 12h 30m (though I stopped the recording after Albert broke the game)
Room 2: 13h 40m
Room 3: 1d 20h 2m
Final Battle: 6h 48m (this wasn’t shown but was needed since the agents weren’t used to seeing other teammates)

We continue training on top of the previous brains, meaning that by the end of the video Albert and Kai have both trained for 3 days and 5 hours.
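As a quick check of that cumulative figure, the stage times listed above can be summed with Python's standard `datetime.timedelta`. Because each stage continues from the previous brains, the total is just the sum of the stages.

```python
from datetime import timedelta

# Training time per stage, as listed above.
stages = {
    "Room 1": timedelta(hours=12, minutes=30),
    "Room 2": timedelta(hours=13, minutes=40),
    "Room 3": timedelta(days=1, hours=20, minutes=2),
    "Final Battle": timedelta(hours=6, minutes=48),
}

# Each stage builds on the previous brains, so the times simply add up.
total = sum(stages.values(), timedelta())
print(total)  # 3 days, 5:00:00
```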


Thank you so much for watching! These short videos take literally hundreds of hours to make. If you want to help us make them faster, please consider becoming a channel member! By becoming a member, your name can be in future videos, you can see behind-the-scenes content that doesn’t fit in the regular videos, and you can use stickers of Albert, Kai and some other characters our team made in comments (more coming) :D


NOTES
When I mention it took x days to train, that’s in-game time, which is much larger than the displays indicate, since there are 200 copies training simultaneously.

This is a very long comment going over more of the details of how Albert and Kai work, issues they’ve had, unexpected results, etc.


THE BASICS: 
Albert and Kai were trained using reinforcement learning, meaning they were rewarded for doing things correctly and punished for doing them incorrectly (the reward is just an increase to their score, and the punishment is a decrease). After each attempt, the actions they took are analyzed and the weights in their neural networks (brains) are adjusted using an algorithm called MA-POCA, to try to prioritize the actions that led to the most reward. The agents start off making essentially random decisions until Kai accidentally tags Albert in the first room and is rewarded; then, as mentioned above, the weights in his neural network are adjusted to try to replicate that reward (it wasn’t quite this simple for this video, since we use self-play to train both agents at the same time; more on that later). This leads to Kai learning that tagging Albert is good, and since Albert is punished when he’s tagged, it also leads to Albert learning that getting tagged by Kai is bad. This process continues for tens of millions of steps until one of the agents consistently loses, or the agents can counter each other well enough that it’s a draw.
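To make the "reward makes the rewarded action more likely" idea concrete, here is a toy sketch. It deliberately swaps in plain REINFORCE, a much simpler policy-gradient method, instead of MA-POCA, which is considerably more involved. The two actions and their rewards are hypothetical: a chaser that starts out 50/50 random and gets +1 whenever it picks "chase" (standing in for a tag).

```python
import math
import random

# Hypothetical toy setup: NOT MA-POCA, just plain REINFORCE on a one-step task.
ACTIONS = ["wander", "chase"]
REWARD = {"wander": -1.0, "chase": 1.0}  # assumption: chasing leads to a tag


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0]  # the "weights": equal logits mean random 50/50 behavior
    for _ in range(episodes):
        probs = softmax(logits)
        a = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        r = REWARD[ACTIONS[a]]
        # Gradient of log pi(a) w.r.t. the logits is one_hot(a) - probs.
        # Scaling it by the reward pushes rewarded actions up, punished ones down.
        for i in range(len(logits)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * r * grad
    return softmax(logits)


probs = train()
print(dict(zip(ACTIONS, (round(p, 3) for p in probs))))
```

After a couple thousand attempts, almost all of the probability ends up on "chase", which mirrors how Kai comes to prefer tagging once it has been rewarded for it.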


REWARD FUNCTION: 
Albert and Kai are given two types of rewards: group rewards and individual rewards. When Albert gets tagged, he’s punished with a -1 group reward and Kai is rewarded with a +1 group reward, and vice versa, encouraging Kai to tag Albert and Albert to avoid being tagged by Kai. Additionally, Albert is given an individual reward of +0.001 for each frame he’s alive (0.6 total in a room lasting 10s, i.e. 600 frames), and Kai is given -0.001 per frame, to encourage Kai to tag Albert as quickly as possible. When we introduce the grabbable cubes, we also give Albert an individual reward of +1 the first time he picks up a cube, to make sure he actually starts using it (without this, the rewards were too infrequent for Albert to learn to use the cube effectively).
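The reward rules above can be tallied per episode with a small function. This is a sketch under a few assumptions: "vice versa" is read as Albert getting +1 and Kai -1 when Albert survives to the time limit, the room runs at 60 frames per second (which is what 0.001 per frame adding up to 0.6 in 10 s implies), and group and individual rewards are simply added together for illustration, even though a trainer may treat the two kinds of reward as separate signals. `episode_returns` is a hypothetical name.

```python
PER_FRAME = 0.001  # individual reward per frame alive: +0.001 Albert, -0.001 Kai
CUBE_BONUS = 1.0   # one-time bonus the first time Albert picks up a cube


def episode_returns(frames, albert_tagged, cube_picked_up=False):
    """Hypothetical total reward for (Albert, Kai) over one episode.

    Assumption: when Albert survives until the time limit, Albert gets +1
    and Kai gets -1 (our reading of "vice versa").
    """
    albert_group = -1.0 if albert_tagged else 1.0
    kai_group = -albert_group
    albert_individual = PER_FRAME * frames + (CUBE_BONUS if cube_picked_up else 0.0)
    kai_individual = -PER_FRAME * frames
    return albert_group + albert_individual, kai_group + kai_individual


# A 10 s room at an assumed 60 fps is 600 frames, and Albert survives all of it.
albert, kai = episode_returns(frames=600, albert_tagged=False)
print(round(albert, 3), round(kai, 3))  # 1.6 -1.6
```

Because Kai loses 0.001 every frame, a tag after 3 s (180 frames) is worth more to him than a tag after 9 s, which is exactly the pressure to tag quickly described above.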


BRAIN: 
Albert and Kai’s brains are neural networks with 4 layers each (one input layer, 2 hidden layers and one output layer).
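The four-layer layout (input, two hidden layers, output) can be sketched as a tiny forward pass in plain Python. All sizes here are hypothetical (12 observation values, 32 units per hidden layer, and four output heads with 3, 3, 2 and 2 choices), ReLU is an assumed activation, and the weights are random rather than trained. Picking the highest-scoring option per head is the deterministic choice; during training, actions are typically sampled from the heads' probabilities instead.

```python
import random


def dense(inputs, weights, biases, activate=True):
    """Fully connected layer: each output is w . x + b, then ReLU if activate."""
    out = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        out.append(max(0.0, z) if activate else z)
    return out


def random_layer(n_in, n_out, rng):
    """Randomly initialized (weights, biases) for an n_in -> n_out layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


# Hypothetical sizes, not the ones used in the video.
N_OBS, HIDDEN = 12, 32
BRANCHES = [3, 3, 2, 2]  # move, turn, jump, grab

rng = random.Random(42)
layer1 = random_layer(N_OBS, HIDDEN, rng)   # input -> hidden 1
layer2 = random_layer(HIDDEN, HIDDEN, rng)  # hidden 1 -> hidden 2
heads = [random_layer(HIDDEN, n, rng) for n in BRANCHES]  # output layer


def act(observation):
    """Observation vector -> 2 hidden layers -> one number per output head."""
    h = dense(observation, *layer1)
    h = dense(h, *layer2)
    action = []
    for head in heads:
        scores = dense(h, *head, activate=False)
        action.append(scores.index(max(scores)))
    return action


print(act([0.1] * N_OBS))
```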

The agents collect information about the scene through direct values and raycasts. Every 5 frames they’re fed data about their position in the room, the opponent’s position, velocity, direction, etc., and they also collect information through raycasts (a simplified version of eyes). The agents’ eyes (raycasts) can differentiate between walls, ground, moveableObjects and Kai/Albert.
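One way to picture how direct values and raycasts end up as a single input to the brain is to flatten everything into one list of numbers. This sketch assumes each ray is encoded as a one-hot of what it hit plus the hit distance as a fraction of the ray length. The tag strings (with "agent" standing in for Kai/Albert), the function names and the argument layout are all hypothetical.

```python
# Hypothetical tag names for the things the rays can tell apart.
TAGS = ["wall", "ground", "moveableObject", "agent"]  # "agent" = Kai/Albert
OBSERVE_EVERY = 5  # from the text: data is collected every 5 frames


def encode_ray(hit_tag, hit_fraction):
    """One-hot of what the ray hit, plus hit distance as a fraction (0..1) of ray length.

    A ray that hits nothing (hit_tag is None) is all zeros with distance 1.0.
    """
    one_hot = [1.0 if tag == hit_tag else 0.0 for tag in TAGS]
    return one_hot + [hit_fraction if hit_tag is not None else 1.0]


def build_observation(own_pos, opp_pos, opp_vel, facing, rays):
    """Concatenate the direct values and the encoded raycasts into one flat vector."""
    obs = [*own_pos, *opp_pos, *opp_vel, *facing]
    for hit_tag, hit_fraction in rays:
        obs.extend(encode_ray(hit_tag, hit_fraction))
    return obs


def should_observe(frame):
    return frame % OBSERVE_EVERY == 0


obs = build_observation(
    own_pos=(0.0, 0.0, 0.0), opp_pos=(1.0, 0.0, 2.0), opp_vel=(0.0, 0.0, -1.0),
    facing=(0.0, 1.0), rays=[("wall", 0.4), (None, 1.0), ("agent", 0.2)],
)
print(len(obs))  # 26
```

Here 11 direct values plus 3 rays of 5 numbers each give 26 inputs. More rays simply make the vector longer, which is why the input layer size depends on how many "eyes" an agent has.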

The agents’ brains (neural networks) take the data collected from direct values and raycasts and use it to predict 4 numbers that control how the respective agent moves. An example output of one of the neural networks is [1, 2, 0, 1], which would be interpreted as [1=move forward, 2=turn right, 0=don’t jump, 1=try to grab], so the agent controlled by this network would try to move forward while turning right and grabbing.
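Decoding that output is a per-position lookup. Only the four values in the example above are confirmed; the other entries in these tables (for instance 2 = move backward, 1 = turn left) are guesses added to complete the sketch, and `describe` is a hypothetical helper.

```python
# Only the values used in the example above are confirmed; the rest are guesses.
MOVE = {0: "don't move", 1: "move forward", 2: "move backward"}
TURN = {0: "don't turn", 1: "turn left", 2: "turn right"}
JUMP = {0: "don't jump", 1: "jump"}
GRAB = {0: "don't grab", 1: "try to grab"}
BRANCHES = [MOVE, TURN, JUMP, GRAB]


def describe(action):
    """Translate the network's 4 output numbers into readable actions."""
    if len(action) != len(BRANCHES):
        raise ValueError(f"expected {len(BRANCHES)} numbers, got {len(action)}")
    return [branch[value] for branch, value in zip(BRANCHES, action)]


print(describe([1, 2, 0, 1]))
# ['move forward', 'turn right', "don't jump", 'try to grab']
```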

The fact that we have two agents training simultaneously complicates things a bit. Normally we’re able to just update an agent’s brain every x steps, but if we did that for both brains at the same time, they would struggle to develop multiple strategies: reinforcement learning tends to be best at finding a single solution, so the winner would dominate and the loser would be stuck doing the same strategy over and over. We tackle this issue with something called self-play. With self-play, we technically only train one agent at a time, swapping which one is being trained every 100k steps. When we’re training Albert, we use a recent model of Kai’s brain as his opponent, and to avoid there only being one strategy, we store 10 recent brains to use as opponents, swapping them out every couple thousand steps so that Albert learns to beat all of them and not just one. This results in a much more general AI that’s hard to exploit.
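The self-play schedule described above can be sketched as a small bookkeeping class. The 100k-step role swap and the pool of 10 recent brains come from the text. The 2,000-step opponent swap is an assumed value for "every couple thousand steps", brains are stand-in strings, and, to keep things short, a snapshot is only saved when roles swap (a real setup would likely save them more often). `SelfPlaySchedule` and its methods are hypothetical names.

```python
import random
from collections import deque

SWAP_LEARNER_EVERY = 100_000  # from the text: switch which agent is training
POOL_SIZE = 10                # from the text: recent brains kept as opponents
SWAP_OPPONENT_EVERY = 2_000   # assumption for "every couple thousand steps"


class SelfPlaySchedule:
    """Decides which agent learns and which frozen brain it plays against."""

    def __init__(self, seed=0):
        self.learner = "Albert"
        self.rng = random.Random(seed)
        # deque(maxlen=...) drops the oldest snapshot once 10 are stored.
        self.pools = {"Albert": deque(maxlen=POOL_SIZE),
                      "Kai": deque(maxlen=POOL_SIZE)}
        self.opponent = None

    def other(self):
        return "Kai" if self.learner == "Albert" else "Albert"

    def update(self, step, latest_brains):
        """Call once per training step; latest_brains maps name -> current brain."""
        if step > 0 and step % SWAP_LEARNER_EVERY == 0:
            # Simplification: snapshot the learner's brain only when roles swap.
            self.pools[self.learner].append(latest_brains[self.learner])
            self.learner = self.other()
            self.opponent = None
        if self.opponent is None or step % SWAP_OPPONENT_EVERY == 0:
            pool = self.pools[self.other()]
            # Before any snapshot exists, play against the opponent's latest brain.
            self.opponent = self.rng.choice(pool) if pool else latest_brains[self.other()]
        return self.learner, self.opponent


schedule = SelfPlaySchedule()
print(schedule.update(1, {"Albert": "A0", "Kai": "K0"}))        # ('Albert', 'K0')
print(schedule.update(100_000, {"Albert": "A1", "Kai": "K0"}))  # ('Kai', 'A1')
```

Sampling the opponent from a pool, rather than always using the newest brain, is what keeps the learner from overfitting to a single strategy.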


UNEXPECTED BEHAVIORS:
In room 1, Albert manages to break out of the room by exploiting a small hole in the hitbox near the top of the room, which was there because I didn’t make the hitboxes on the walls tall enough. Though Albert used it to escape, I’m not convinced he would actually learn to do it consistently. The challenge with this video is that it can be difficult to interpret the agents’ behaviors: Albert could be making certain unexpected moves to exploit Kai’s poorly trained brain and get him to make bad moves, or he could just be making these unexpected moves because he hasn’t trained enough. Albert was able to find the hole a few times; however, he wasn’t able to do it consistently. This could be because he didn’t train long enough, because his observations don’t make it easy to detect when he can jump out, or because Kai quickly learned to counter him getting to the display in time.

In room 2, Albert also manages to glitch out of the room, and he was able to do this consistently. We coded the cube-grabbing functionality as rigorously as possible, even making the grab automatically detach if the force exerted is too high, and I couldn’t find a single way of exploiting it in testing, but Albert certainly had no trouble finding one.

Albert also had a couple of moments of throwing the cubes at Kai and spinning with a cube to throw Kai out of the room. We didn’t even consider this a possibility before training; AI is able to come up with some really clever solutions to problems.


OTHER
Thank you so much to our amazing team that helped make this video! Jonas helped set up the character controls, Tyler helped create the clean grabbing functionality, Catt helped edit, and Andrew and Steve helped solve any issues we ran into while making the video. If you want to meet our team and talk to all of us, join our Discord server! :) https://discord.gg/qDRtuFe5gp

@Mar_Marine

"Albert, you can't escape"

Albert: "Okay, I'll force Kai to escape."

@CarlosFerdinandPermana

"Kai, that was aggressive"
Albert like 2 mins later: throws Kai out of the map

@MultiFlash009

Kai occasionally obliterating Albert's dead body shows that AI is capable of learning gamer rage

@jamesdrave

10:17 bro clutched the 5v1

@what_d1245

Albert: "while you struggled on foolish pursuits, i studied the cube"

@BaconBalling

The 1v5 went crazy you can’t even lie

@zTareks

7:06 "Now Kai's frustrated"
*explodes in frustration*

@zoahking

10:00 did he just WALL JUMP?! I don’t think you noticed how cool that was

@The_GamingChef

*1 vs 5
Albert: "I like those odds"

@scooble_

Albert did not merely "learn to play tag," he unlocked Ultra Instinct.

@levels23

6:25 Albert makes a wall, Kai breaks it, and Albert proceeds to send a block back at mach 10 speeds

@Sm1leyNinja

1:10 Albert learnt to break Kai’s ankles

@fufo654

i like how Kai kept emoting on Albert whenever he tagged him, very human

@achintyaagrawal9819

Albert constantly throwing himself outside has to be the most hysterical strategy of all.

@MercenaryMuse

Best part:  Albert learned fairly early on that the best escape strategy for avoiding mortality is ascension to a higher realm.

@RainwonkYT

9:46 Four of them instantly died of not knowing what to do, but the last one went full sweat mode, and activated his ULTRA INSTINCT. He has avenged his fallen comrades.

@timelineee

Albert: I'm not locked in here with you,  you're locked in here with me.

@じゅげむ-s6b

8:30 even without human intervention, THEY INVENTED TBAGGING LOLLL

@kosuken

albert perfectly understood that "to confuse your opponent you must first confuse yourself"