My Adventure with Machine Learning #1
In this newsletter I will be talking about my adventures creating my first AI algorithm from scratch in Unity3D.
I had always wanted to create an AI algorithm, but had no idea how they worked. Then I saw a video in which someone created an AI robot controlled by an array of numbers, and I suddenly had an idea for how I could create my own machine learning algorithm with the knowledge I had. At the time I had no idea how neural networks worked, so I decided to just start on this project.
I wanted to create an unsupervised learning algorithm, so that I could let the algorithm teach itself rather than checking the output and telling it what it did wrong every iteration. I decided to create an agent that would be trained to make its way around a track as fast as possible. I chose this because I knew I could place checkpoints along the track to automatically set the agent’s reward/punishment, and I also wanted a program that could be dropped into a different environment and perform similarly well.
For days, whenever I had an idea for the algorithm, I wrote down small snippets of the code I would need on paper. Soon I had 3–5 post-it notes on my desk, so I decided to start working on the project.
First, I modelled the track in Fusion 360, and exported it as a .obj for Unity.
After that, I created the code for the AI. There are a few steps to how the AI works:
- First, the main code looks for any saved information from previous AI training. If none is found, the array (named “current”) defaults to 0, 0, 0, 0, 0, 0, 0. The first number in the array is reserved for the agent’s score, and the remaining six control how it behaves.
- Then one of those behaviour numbers is selected at random. One agent (alpha) has that number increased by the set increment value, and the other agent (beta) has it decreased by the increment.
- The two agents then go through the track. They are rewarded depending on how many checkpoints they pass, and punished if they pass checkpoints backwards (I used the Vector3.Dot function to work out which side of a checkpoint the agent was on).
- After 20 seconds, the agent with the lower score is deleted and the “current” array is set to the controller array of the agent with the higher score.
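The steps above boil down to a simple hill-climbing loop. Here is a minimal Python sketch of that logic outside Unity; the increment value, the dummy fitness function, and the helper names are my own assumptions, since the actual project code isn't shown here:

```python
import random

INCREMENT = 0.1    # assumed step size; above I just call it "the set increment value"
NUM_WEIGHTS = 6    # six behaviour-controlling numbers after the score slot

def load_or_default():
    # The real project loads saved training data if any exists; here we only default.
    # Index 0 holds the agent's score, indices 1..6 control its behaviour.
    return [0.0] * (1 + NUM_WEIGHTS)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def crossed_forwards(checkpoint_forward, movement):
    """The Vector3.Dot trick: a positive dot product between the checkpoint's
    forward vector and the agent's movement means it crossed the right way."""
    return dot(checkpoint_forward, movement) > 0

def run_track(weights):
    """Stand-in for the 20-second Unity run, which scores checkpoints passed
    (reward forwards, punish backwards). Here it's a dummy fitness: the sum
    of the weights."""
    return sum(weights)

def training_step(current):
    # Pick one behaviour value at random (never the score slot at index 0).
    i = random.randrange(1, len(current))
    alpha, beta = current[:], current[:]
    alpha[i] += INCREMENT    # alpha: nudged up
    beta[i] -= INCREMENT     # beta: nudged down
    alpha[0] = run_track(alpha[1:])
    beta[0] = run_track(beta[1:])
    # The higher scorer survives and becomes the new "current".
    return alpha if alpha[0] >= beta[0] else beta

current = load_or_default()
for _ in range(130):         # 130 iterations, matching the training run shown later
    current = training_step(current)
```

With the dummy fitness, alpha always wins, so each iteration simply raises one weight by the increment; in Unity the fitness comes from the actual lap around the track.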
This worked, and I was happy with how it turned out. The agents gradually got better at going around the track, but I noticed a tendency to only turn right (the agents start on the short straight section of the track and go clockwise). Since I had hoped to later test the agents on different tracks, I tried to fix this issue. I moved the starting point to a place on the track where the first turn is a left turn, but this reduced the agents’ performance. I eventually gave up, and it remained an issue. However, as you will see later, this tendency to turn right did not stop the agents from successfully making a left turn!
I wasn’t satisfied with this track, and I wanted something different. I didn’t want to model a new track in Fusion 360 and place the checkpoints manually every time I wanted to try something new, so I used Sebastian Lague’s Bézier Path Creator tool to place small units of track, each with its own checkpoint, along a path to form a track.
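For anyone curious what “placing units along a path” means mathematically: a Bézier path tool evaluates a cubic polynomial per segment, and each track unit gets its position from the curve and its orientation from the curve’s tangent. A rough Python sketch of that idea (the function names and unit count are mine, not the tool’s API):

```python
import math

def bezier_point(p0, p1, p2, p3, t):
    """Cubic Bezier position at parameter t, the curve a path tool
    evaluates internally for each segment."""
    u = 1.0 - t
    return tuple(u**3 * a + 3*u**2*t * b + 3*u*t**2 * c + t**3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))

def bezier_tangent(p0, p1, p2, p3, t):
    """Derivative of the cubic Bezier; its direction orients each track unit."""
    u = 1.0 - t
    return tuple(3*u**2 * (b - a) + 6*u*t * (c - b) + 3*t**2 * (d - c)
                 for a, b, c, d in zip(p0, p1, p2, p3))

def place_track_units(control_points, n_units):
    """Drop n_units track pieces (each carrying a checkpoint) along one
    Bezier segment, returning (position, yaw-in-degrees) pairs."""
    p0, p1, p2, p3 = control_points
    units = []
    for i in range(n_units):
        t = i / max(1, n_units - 1)
        pos = bezier_point(p0, p1, p2, p3, t)
        tx, _, tz = bezier_tangent(p0, p1, p2, p3, t)
        yaw = math.degrees(math.atan2(tx, tz))   # Unity-style yaw about the y axis
        units.append((pos, yaw))
    return units
```

A straight segment with evenly spaced control points, for example, produces evenly spaced units all facing the same way; a curved segment bends and rotates them automatically.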
This worked perfectly for my needs, as you can see here.
The walls of the track are a little bit rough, because I used boxes for the walls of the units. I wanted to keep the visual and collider geometry as simple as possible since it was going to be copied into the world hundreds of times.
After playing around with this for a bit, I decided on a track design, and then let the algorithm run. It was better than I was hoping for!
The above video demonstrates the AI’s performance after 130 iterations of training.
Interesting behaviour I noticed:
- The AI seemed not to like turning left. Instead, it figured out a way to turn left by spinning right! I found this very interesting, because the AI only ended up using three of its six inputs properly; the weights for the other inputs were zero or almost zero.
- The green rays allow the AI to shift left or right, without turning (my intention was to give the AI some way to centre itself on the track). At one point in the training, the algorithm figured out how to exploit this feature to get an extra boost off the wall.
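I haven’t shown how the six numbers actually map the ray sensors to movement, so the following is purely a guess at what a weighted-ray controller of this kind could look like. The key property it illustrates is real, though: each trained number scales one ray reading, so a near-zero weight effectively mutes that input, which is how three of the six inputs could end up unused.

```python
def agent_controls(rays, weights):
    """Hypothetical mapping from six ray-sensor readings to movement commands.
    rays: normalized distances from six raycasts; weights: the six trained numbers.
    A weight of (almost) zero mutes its sensor entirely."""
    turn  = weights[0] * rays[0] - weights[1] * rays[1]  # left/right steering rays
    shift = weights[2] * rays[2] - weights[3] * rays[3]  # the "green" sideways-shift rays
    speed = weights[4] * rays[4] + weights[5] * rays[5]  # forward rays set the throttle
    return turn, shift, speed
```

Under a scheme like this, zeroing the steering weights while keeping the shift and forward weights would reproduce the “turn left by spinning right” style of behaviour: the agent still moves, just not via the inputs you expected it to use.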
This project was very interesting, and I enjoyed the process, including all the problem solving. It was definitely worth it to see the program figure out how to navigate a track on its own!
For next time:
- I will try to have more agents for faster and more accurate learning.
- I will experiment with a more optimized method of generating the track, such as removing the floor pieces of the units.