How to get started using Unity3D Machine Learning Agents
A few years ago I started learning about Unity’s Machine Learning package and how to use it to create AI that can act intelligently. I followed a tutorial from the Unity website about creating and training ML hummingbirds to collect nectar from flowers. If you want to see it, click here: https://learn.unity.com/course/ml-agents-hummingbirds. In this article, I am going to go through the basics of getting started with Unity ML-Agents, so that hopefully you can do the same after reading it rather than spending days working through the tutorial. Some programming experience is recommended.
Keep in mind that this is a general overview of how to create a machine learning project, not step-by-step instructions for a specific one. The aim of this article is to give you a good understanding of the general process, but you will most likely need to do your own research on particular implementation details, especially if any information here is out of date by the time you read it. For the same reason, imitation learning, optimizing model rewards, and some other in-depth topics will not be covered in this article.
Step 1 : Downloads
First of all, we need to set up the basic project. Download and install Unity if you have not done so already. You can access the download for Unity Hub here: https://unity.com/download. Make sure to create an account first. There is some confusing licensing stuff too, but you will figure it out eventually.
You first need to install Unity Hub, and then inside Unity Hub, sign into your Unity account and activate a licence for your device. Then click on the ‘Installs’ tab. Make sure you go into settings (the gear icon in the top right corner) and set the install location to the folder where you want to install the actual Unity application. Then click ‘Add’ and select the version of Unity that you want (preferably the latest LTS version). On the next page you can add ‘Modules’, which add support for exporting to various platforms. I would recommend, at the very least, adding build support for the platform you are on (e.g. if you are on macOS, add the “Mac Build Support” module). WebGL is nice to have as well; you can host WebGL builds on GitHub Pages to allow anyone to access them.
Installing the module will take quite a while, so come back to it in a bit. Make sure that you have set a download location! I made this mistake the first time I tried installing modules, and the download mysteriously disappeared immediately after being installed, every single time…
Once you’ve got your modules set up, click on the Projects tab and create a new project. Choose 3D or 2D depending on what you’re hoping to do. Don’t use HDRP or URP; those pipelines use more performance-intensive, better-looking rendering methods, and for ML training it’s best to keep the graphics very simple so that as little processing power as possible is spent on rendering. It is possible to upgrade the project later if you choose. Name your project and choose where you want to save it. The project will be saved in a folder with the name of your project. I would recommend creating a folder dedicated to just Unity projects. Click ‘Create’ and Unity should prepare the project and open it up.
When the project is open, you will need to install the ML-Agents package from Unity’s Package Manager (Window > Package Manager). Once that is installed, the project is set up for machine learning, and you can begin the fun part!
Step 2 : Setup
Create your universe. Decide what you want to create and what the agent needs to do. For a first project, I would recommend something simple like balancing a ball on a flat surface (two inputs: pitch and roll) or driving a car around a track (two inputs: forward/back and left/right).
There are two main ways you can go about this. One option is to create the game world and all the human controls first, and add the ML components afterwards.
The other option is to integrate the AI into the world right from the start. This is pretty much required for very complex agents, like a multi-legged robot, which would be incredibly difficult to control manually.
This first half a step should not be neglected. I am compressing pretty much all of game dev into half a step. Do not underestimate how long this could take.
Now you need to add the ML components to your player object to turn it into an agent. You must add a Behaviour Parameters component, a Decision Requester component, and an agent script. The agent script should be your own script, where you will write the agent’s code in the coding section below. In this script, you must change the class to derive from Agent instead of the default MonoBehaviour.
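As a rough sketch, the shell of such a script might look like this (the class name CarAgent is just a placeholder, and the exact namespaces can differ between ML-Agents versions):
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Derive from Agent instead of MonoBehaviour
public class CarAgent : Agent
{
    // The overrides described below (Initialize, Heuristic, OnActionReceived,
    // OnEpisodeBegin, CollectObservations) will go in here.
}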
Now you are ready for the next step.
Step 3 : Code
The next step is to write all the code for the AI. For this example, I wrote some code for an ML race car. I will not include the full code here, because there is an abundance of examples on the internet already; one more would have little benefit.
Here’s a general overview of the process: first, the environment is initialized and the agent is spawned. The agent collects observations of its environment, then the neural network processes them and decides what actions to perform. This repeats every training step until the episode’s time runs out or you call EndEpisode() to end the training episode immediately (for example, if an AI car crashes into a wall or falls off the edge of the world).
What you need to do is override a few functions. These functions are Heuristic (manual control), OnActionReceived (convert actions into real behaviour), Initialize (essentially the same as Start() once the agents start training), OnEpisodeBegin (respawn agents when their episode ends, for example), and CollectObservations (give the agent information about its environment).
The first one is Initialize, which runs at the very start of the program. Here you can initialize variables, spawn in agents, etc.
public override void Initialize()
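For example, a minimal sketch (the rb and startPosition fields are placeholders I am assuming for a simple car agent) could be:
private Rigidbody rb;          // placeholder: the car's physics body
private Vector3 startPosition; // placeholder: where the agent begins

public override void Initialize()
{
    // Cache components and remember where the agent started
    rb = GetComponent<Rigidbody>();
    startPosition = transform.position;
}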
Then you have the Heuristic. This lets you use your own pretrained neural network (your brain :P) to control the AI.
public override void Heuristic(in ActionBuffers actionsOut)
In my agent, for example, I did something like this. vertical and horizontal are values from the arrow keys to control the wheels, and brake is for the brakes.
var continuousActionsOut = actionsOut.ContinuousActions;
continuousActionsOut[0] = verticalInput;   // forward/back
continuousActionsOut[1] = horizontalInput; // steering left/right
continuousActionsOut[2] = brakeInput;      // braking
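Where those input values come from is up to you; a minimal sketch, assuming the default Unity input axes and a space-bar brake (none of which are specified anywhere above), could be:
void Update()
{
    // Read the arrow keys / WASD through Unity's default input axes
    verticalInput = Input.GetAxis("Vertical");
    horizontalInput = Input.GetAxis("Horizontal");
    // Placeholder: hold space to brake
    brakeInput = Input.GetKey(KeyCode.Space) ? 1f : 0f;
}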
These values are then sent to OnActionReceived, which converts those actions into real behaviour.
public override void OnActionReceived(ActionBuffers actions)
In my case, I just extract the control values and then use them to control the car.
vertical = actions.ContinuousActions[0];
horizontal = actions.ContinuousActions[1];
brake = actions.ContinuousActions[2];
// These values are then used within FixedUpdate() for physics stuff
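What happens inside FixedUpdate() is entirely up to your game. Purely as an illustration (the driveForce, turnSpeed, baseDrag, and brakeDrag fields below are made-up placeholders, and a real car would more likely use WheelColliders):
void FixedUpdate()
{
    // Placeholder physics: push the car forward/back and turn it
    rb.AddForce(transform.forward * vertical * driveForce);
    transform.Rotate(0f, horizontal * turnSpeed * Time.fixedDeltaTime, 0f);
    // More drag while braking (crude stand-in for real brakes)
    rb.drag = Mathf.Lerp(baseDrag, brakeDrag, brake);
}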
During training, Heuristic is not used at all. The neural network sends information to the action buffer instead of you.
Another important one is OnEpisodeBegin. This is called for every agent each time its training episode ends, whether because the time runs out or because EndEpisode() is called.
public override void OnEpisodeBegin()
Here you can respawn the agents, reset a score, or anything you want. The sky’s not the limit. Only processing power!
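A minimal sketch, reusing the rb and startPosition placeholders from the Initialize example above, might be:
public override void OnEpisodeBegin()
{
    // Put the agent back where it started and remove any leftover motion
    transform.position = startPosition;
    transform.rotation = Quaternion.identity;
    rb.velocity = Vector3.zero;
    rb.angularVelocity = Vector3.zero;
}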
public override void CollectObservations(VectorSensor sensor)
Here, you can add ‘observations’ for the agent. If you are using a Ray Perception Sensor component on your agent, its rays are used automatically without adding anything here. However, you may find it useful to provide the AI with more information about its environment, such as its position, rotation, speed, etc. It’s best to keep it simple, to avoid overloading the agent with too much information.
To give the agent custom information, you need to add the following code:
sensor.AddObservation(yourData);
sensor.AddObservation(moreData);
sensor.AddObservation(usefulData);
Inside sensor.AddObservation(), you can pass a Vector3, an int, or a float.
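As an illustration, a car agent could report its velocity and which way it is facing (again just a sketch, reusing the placeholder rb field from earlier):
public override void CollectObservations(VectorSensor sensor)
{
    // Velocity converted into the agent's own frame of reference
    sensor.AddObservation(transform.InverseTransformDirection(rb.velocity));
    // The direction the car is pointing
    sensor.AddObservation(transform.forward);
}
Keep in mind that the vector observation ‘Space Size’ in the Behaviour Parameters component has to match the total number of values you add here (a Vector3 counts as three floats).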
Those are almost all of the most important functions. If the code above does not work, check online for other people having the same problems. Given the current pace of AI development, the code in this article will likely stop working with future versions of Unity ML-Agents eventually.
The most important function is AddReward(float). This single function is the most powerful determinant of how well your model will perform. Correctly balancing rewards is a surprisingly complex topic, so much so that I may write another article going into depth about how to optimize it. However, the main idea is as follows: when the agent does something good, reward it. When the agent does something undesirable, punish it (using a negative reward). The agent will do everything it can to maximize that score. Treat your agent as an extremely clever toddler, and be careful not to punish it too much. Too much punishment, and it might learn that the best way to maximize reward is to sit still and do nothing.
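For a race car, that idea might translate into something like the following sketch (the "Checkpoint" and "Wall" tags and the reward values are invented for illustration; your own scene will differ):
void OnTriggerEnter(Collider other)
{
    // Good behaviour: passed a checkpoint on the track
    if (other.CompareTag("Checkpoint"))
        AddReward(0.5f);
}

void OnCollisionEnter(Collision collision)
{
    // Undesirable behaviour: hit a wall, so punish and restart
    if (collision.gameObject.CompareTag("Wall"))
    {
        AddReward(-1f);
        EndEpisode();
    }
}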
Tuning this will take a while. For now, make your best guess on how the agent should be rewarded.
Onto the next step!
Step 4 : Python environment
In order to set up your Python environment, you need to use the terminal (or CMD on Windows). For this article I will use macOS commands, which are similar on Linux. Some terminal experience is recommended, but ml-agents was actually one of the very first things I used the terminal for.
Optionally, you can choose to run the commands in a sandbox, separating the packages you install from the rest of your system. To do so, first install Miniconda, then run “conda create -n environmentName python=3.7” to create a new environment, and enter it with “conda activate environmentName”. Do this before beginning the steps below. You can exit with “conda deactivate”. Any packages you install inside the conda environment will only exist in there and will not interfere with the rest of your system. This is not required, but can be useful.
Now the very first thing to do is to choose a folder on your computer in which to store the training files. Once you have decided on this folder, create a new file there called someRandomName.yaml. In this file, you need to set up some parameters for your agent. These are too extensive to go over in detail, so I will instead link to the documentation explaining what they all do. An example of how your file might look is shown below:
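Something roughly along these lines (the behaviour name CarAgent and every value here are only illustrative; check the ml-agents documentation for what each parameter does and for sensible values):
behaviors:
  CarAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 512
      buffer_size: 10240
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 128
      num_layers: 2
    max_steps: 1000000
    time_horizon: 64
    summary_freq: 10000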
The name of your model here is very important. It must match the agent’s “Behaviour Name” in the Behaviour Parameters component inside Unity. The Python process in the terminal connects to the agent inside Unity over the local network; the terminal does the difficult part of optimizing the neural network, while Unity simulates the results.
In my experience, the most critical parameters to understand are hidden_units and num_layers. I strongly recommend this video by 3Blue1Brown for some background understanding of how neural networks work.
Next, install ml-agents in the terminal with pip3 install mlagents.
Note: You may need to later reinstall the mlagents python package or Unity package or both to find the versions that work together.
Now, running the training program is a bit trickier. In the terminal, navigate to the folder you chose, which contains your someRandomName.yaml file. This is done using the cd command.
If the path to the yaml configuration file is /users/me/path/to/AI-Stuff/someRandomName.yaml, then the command you need would be one of the following:
cd /users/me/path/to/AI-Stuff
cd ~/path/to/AI-Stuff
where the tilde (~) is a shortcut for your user account’s home directory.
Now we can finally start training with
mlagents-learn ./someRandomName.yaml --run-id agent_01
agent_01 is the name of the training session; you can change this to train different versions of the AI under different names and keep previous ones for future reference. This will create a new folder in your selected folder with the name that you provided in --run-id. If you stop training for any reason, you can resume training the same model by simply adding --resume to the end of the previous command (you can access command history by pressing the up arrow on your keyboard).
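Using the example command above, resuming would look like:
mlagents-learn ./someRandomName.yaml --run-id agent_01 --resume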
Once the program is ready to start training, press play in the Unity editor. If all goes well, your agent(s) will now be behaving chaotically :)
Step 5 : Optimize
Training takes time. I typically leave my computer running overnight to train the model, and sometimes more than a day.
If you have just one agent in your scene, it will run at a very high speed, trying to make the most of the processing power available, but it can only do so much. To speed things up, create several copies of the agent and its environment, spaced apart so that the agents cannot interact with each other. If your environment is static, like a race track, you can instead use a single environment and make all the cars invisible to each other and non-colliding using Unity’s physics layers. If you want the agents to compete with each other, then you can put all of them in a single environment, but then you must be very careful about how many agents share it.
With more agents in your world, training will be quicker, since all of the agents share the same neural network brain and learn from each other’s experiences. There is a balance between not enough and too many agents, which depends on the complexity of your system and the available processing power.
To see how your agent is doing, open a new terminal window (activate the conda environment again in this window if you’re using one) and navigate to the same folder using cd. Then run tensorboard --logdir TheModelName and press enter. Now, go to the link provided. This will show you a graph of all the agents trained so far and in progress. There are several buttons and sliders to fiddle with the display.
After a while, it is possible that your AI is doing exactly what you want it to! This is the ideal scenario. However, it is more likely that the agent has figured out a loophole to farm rewards. Some of my experiences include agents applying negative braking to accelerate faster and doing donuts to maintain speed near the centre of the lane. Another time, a robotic arm found that the best way to get rewards was to keep its hand close to the target without touching it, in order to rapidly gain points. Another agent figured out how to do ridiculous acrobatics on one leg to stand as tall as possible, using the other leg to improve stability.
Once this happens, fix the reward system and train again. Keep fine-tuning the rewards and retraining the agent until you are happy with the result. This can take several days of work. Be patient :)
That’s all for now. There is a lot more to AI, even just within Unity3D ML-Agents, that I did not cover in this general overview. It will take time to master the basics. A first project could take anywhere from a few hours to a few weeks to set up. I strongly recommend doing your own research to improve your future projects. As mentioned above, I may write another article explaining in depth some methods for optimizing reward functions. Good luck!
Below are some demos I created a while ago for you to compete against the trained AI model in an arcade style race around a simple track.
Original : Better graphics, tire smoke, controllable camera.
Performant : Easier to control given the automatic camera. Highly optimized graphics for low end devices (got 35 fps on a 10 year old PC!) and a slightly different track to show that the agent can generalize to a new track it has never seen.