Optimizing characters in Unity
In Empire of Ember I wanted the player to feel what it would be like to experience a massive siege battle, in a city you designed yourself, in first person, with a fully destructible environment.
The player can lay out city buildings, walls, and decorations however he or she wishes. The combined structure is connected through joints calculated automatically at runtime. It should use realistic physics, so you could destroy the corner of a tall and heavy tower, crushing enemy units beneath. Or cut through a wall, with the help of magic, reaching the archers laying down fire on your armies below.
The AI should be smart at a high level, and also fight in detail on the level of the individual unit. Each unit uses utility AI to select the best target, pick up weapons, use special attacks, flee, or just roam around. Units should be able to destroy buildings section by section. In addition, they should provide an interesting realtime challenge, with tactical behaviors such as dodging, blocking, and switching between multiple ranged or melee attacks.
I was having a hard time getting Unity to perform even with 35 units and that was just combat on an empty map. I noted in my forum post “Old games such as Mount and Blade could handle 100 units at a time years ago on much worse hardware, so I know at least in principle this is possible.” Part of the bottleneck is that Unity is single threaded for the game loop, which can’t be fixed short of switching engines.
Despite this and other engine limitations, I’ve managed to bump up the AI unit count to 70, holding a consistent 30 FPS with the city falling apart around me.
Problem: Animated ragdolls are slow, even if they aren’t doing anything useful.
Detail: Each of the AI units in EoE has a full ragdoll setup. When animated, this would cause each of the kinematic colliders to move each frame, causing an overhead of about 3 milliseconds per frame for 64 units. The cost is unhelpfully categorized under “Physics” which made it very hard to track down.
Fix: Disable as much of the ragdoll as you can. In my case, all joints and colliders in the character’s skeleton are disabled, except those that lead to a held weapon or shield. However, when the player shoots an arrow, the arrow has a physics trigger with a component called EnableAIHighResolutionColliders.cs. This will fully activate the skeleton, so the player can perform headshots, legshots, and so on. I switch layers between AI units in ragdoll and not in ragdoll. I also switch collisionDetectionMode between the slower CollisionDetectionMode.ContinuousSpeculative and the faster CollisionDetectionMode.Discrete as needed.
Problem: Pathfinding “Is my target reachable?” checks become slow when multiplied by dozens of units.
Fix: Caching. On every pathfinding check for a particular destination, I added a cache structure called PathReachableCache. It would store the results of the last reachability check, and if not enough time had passed, return the stored results. Previously I would also reuse the cache if the target did not change position from the last check, but that would not work with a dynamic battlefield.
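As a rough illustration of the time-based caching described above, here is a minimal sketch in plain C#. It is not the actual PathReachableCache from the game: the per-destination dictionary, the generic key, and the names IsReachable, maxAgeSeconds, and expensiveCheck are all assumptions made for the example. In Unity, the `now` argument would typically be Time.time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: remembers the last reachability result per destination
// and reuses it until the entry is older than maxAgeSeconds.
public sealed class PathReachableCache<TKey>
{
    private struct Entry
    {
        public bool Reachable;
        public float Timestamp;
    }

    private readonly Dictionary<TKey, Entry> entries = new Dictionary<TKey, Entry>();
    private readonly float maxAgeSeconds;

    public PathReachableCache(float maxAgeSeconds)
    {
        this.maxAgeSeconds = maxAgeSeconds;
    }

    // now: current game time in seconds.
    // expensiveCheck: the real pathfinding query; only runs on a miss or a stale entry.
    public bool IsReachable(TKey destination, float now, Func<TKey, bool> expensiveCheck)
    {
        Entry cached;
        if (entries.TryGetValue(destination, out cached) &&
            now - cached.Timestamp < maxAgeSeconds)
        {
            return cached.Reachable;
        }

        bool reachable = expensiveCheck(destination);
        entries[destination] = new Entry { Reachable = reachable, Timestamp = now };
        return reachable;
    }
}
```

Because the entry expires purely on age, a result can be briefly out of date after the battlefield changes; the maximum age is the knob that trades accuracy for speed. To stay allocation-free, the delegate passed as expensiveCheck should be created once and reused rather than written as a capturing lambda at every call site.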
Problem: Memory allocation is slow in general, worse when multiplied by dozens of units.
Fix: Preallocate memory, pool objects, and use struct instead of class. Multiplied by dozens of units, Unity might spend an entire extra millisecond per frame just allocating memory. I had to find basically every “new” whatever in the AI combat controller and make the item a member variable. In certain cases I also created object pools for certain classes. In other cases I changed class to struct. A struct is a value type, so a local struct typically lives on the stack instead of the heap and doesn’t incur the allocation and garbage collection overhead.
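To make the "turn every new into a member variable" idea concrete, here is a small sketch. The TargetSelector class, its method, and the capacity of 64 are hypothetical and only illustrate the pattern: the list is allocated once with the object and cleared on each call, so repeated calls do not create garbage as long as the capacity is not exceeded.

```csharp
using System.Collections.Generic;

// Hypothetical example: a scratch list kept as a member instead of
// "new List<int>()" inside the method on every call.
public sealed class TargetSelector
{
    // Allocated once. Clear() empties the list but keeps its backing array.
    private readonly List<int> candidates = new List<int>(64);

    // Returns how many entries in 'distances' are within maxDistance.
    public int CountCandidatesInRange(float[] distances, float maxDistance)
    {
        candidates.Clear();
        for (int i = 0; i < distances.Length; i++)
        {
            if (distances[i] <= maxDistance)
            {
                candidates.Add(i);
            }
        }
        return candidates.Count;
    }
}
```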
Problem: Instantiating prefabs is slow.
Fix: Pooling and preallocation. Every object of a particular type has a component with System.Guid that identifies the object. Objects that are no longer needed are disabled and go to the pool rather than being destroyed. In some cases I would preallocate objects as well. For example, a 1/8 second hitch is very noticeable during gameplay, but not during a loading screen. So I allocate a certain number of all decals during the loading screen, enough that it’s unlikely I would need to create a new instance during gameplay. To prevent the pool getting filled with stuff from other levels, the pool is an object in the current playable scene, so when the player changes scenes the pool goes away too.
Problem: Garbage collection is slow, and if it is not done the game can crash from running out of memory.
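The pooling and preallocation described above could look roughly like the following engine-independent sketch. SimplePool, its factory delegate, and the CreatedCount counter are assumptions for illustration, not the game's actual pool. In Unity the factory would wrap Instantiate, Get would re-enable the object, and Release would disable it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical generic pool: instances are created up front (for example
// during a loading screen) and handed back instead of being destroyed.
public sealed class SimplePool<T> where T : class
{
    private readonly Stack<T> free;
    private readonly Func<T> create;

    // How many instances were ever created; useful for tuning the preallocation size.
    public int CreatedCount { get; private set; }

    public SimplePool(Func<T> create, int preallocate)
    {
        this.create = create;
        free = new Stack<T>(preallocate);
        for (int i = 0; i < preallocate; i++)
        {
            free.Push(CreateNew());
        }
    }

    // Reuses a pooled instance when one is available, otherwise creates a new one.
    public T Get()
    {
        return free.Count > 0 ? free.Pop() : CreateNew();
    }

    // Returns an instance to the pool for later reuse.
    public void Release(T item)
    {
        free.Push(item);
    }

    private T CreateNew()
    {
        CreatedCount++;
        return create();
    }
}
```

If CreatedCount grows beyond the preallocated amount during play, that is a hint the initial size was too small for that object type.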
Fix: Call GC.Collect(); before gameplay start on every new scene where performance matters.
Problem: A* Pathfinding Project’s RVO is slow when there are many dozens of units, consuming the same CPU time as pathfinding itself.
Fix: Upgrade. The latest beta version of A* Pathfinding Project appears to have fixed this via the Burst compiler. RVO no longer shows up in the significant list of costs.
Problem: Physics checks allocate memory, which is slow.
Fix: Turn on Reuse Collision Callbacks in Project Settings / Physics. Do not use the regular overlap functions; instead use the non-alloc versions (such as Physics.OverlapBoxNonAlloc), which write into a preallocated and shared destination in memory.
Problem: Vision checks between many AI units and many other AI units have complexity O(n^2). Also, the AI needs to see building colliders, of which there are potentially thousands.
Fix: Process the complexity in stages, with cost limits. Stage 1: The AI performs 6 overlap box checks, in an expanding approximate cone. In the first pass, the AI only sees the AI layer. In the second pass, the AI only sees the buildings layer. The results are limited to 128 units that are relevant to gameplay. Stage 3: Sort units, up to a limit, by distance squared. Stage 4: Only the closest 16 are raycast. The raycast vision check is cached for some seconds, so typically fewer than 16 raycasts will be performed.
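The sort-and-limit stages can be sketched without any engine code. The example below assumes the overlap checks have already produced a list of candidates; the Candidate struct, the VisionFilter class, and SelectClosest are hypothetical names. The limits of 128 and 16 come from the text above. It sorts by squared distance (no square root needed) using preallocated arrays and returns only the closest candidates, which are the ones that would go on to the raycast stage.

```csharp
using System;

// Hypothetical candidate produced by the earlier overlap checks.
public struct Candidate
{
    public int Id;
    public float X, Y, Z;
}

public sealed class VisionFilter
{
    public const int MaxCandidates = 128; // cap on units considered
    public const int MaxRaycasts = 16;    // cap on units that get a raycast

    // Preallocated scratch buffers, reused on every call.
    private readonly float[] distSq = new float[MaxCandidates];
    private readonly Candidate[] sorted = new Candidate[MaxCandidates];

    // Writes the ids of the closest candidates into 'closest' and returns how many were written.
    public int SelectClosest(Candidate[] seen, int seenCount,
                             float ox, float oy, float oz, int[] closest)
    {
        int count = Math.Min(seenCount, MaxCandidates);
        for (int i = 0; i < count; i++)
        {
            float dx = seen[i].X - ox;
            float dy = seen[i].Y - oy;
            float dz = seen[i].Z - oz;
            distSq[i] = dx * dx + dy * dy + dz * dz;
            sorted[i] = seen[i];
        }

        // Sorts the candidates by their squared distance, ascending.
        Array.Sort(distSq, sorted, 0, count);

        int take = Math.Min(Math.Min(count, MaxRaycasts), closest.Length);
        for (int i = 0; i < take; i++)
        {
            closest[i] = sorted[i].Id;
        }
        return take;
    }
}
```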
Problem: AI decision making is slow.
Fix: One AI will perform a decision per turn. This is decided round-robin, with some exceptions where an AI can promote itself (for example, first combat update of the level, or just took damage). If an AI unit takes less than a certain time limit to perform its decision calculations, the next AI unit will be processed, up to a maximum time limit. We are using Behavior Designer at a cost of about 1 millisecond per unit, and this could probably be improved to half that time if we used a custom solution.
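A minimal sketch of the round-robin scheduling with a time budget follows. DecisionScheduler and its members are hypothetical, and the self-promotion exceptions mentioned above are left out to keep it short. Each call to Tick always processes at least one unit, then continues with the next units only while the elapsed time is under the budget.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical scheduler: one decision per tick, more if time allows.
public sealed class DecisionScheduler
{
    private readonly List<Action> units = new List<Action>();
    private readonly Stopwatch stopwatch = new Stopwatch();
    private int next; // index of the unit whose turn is next

    public void Add(Action decide)
    {
        units.Add(decide);
    }

    // Runs decisions round-robin until the budget is spent or every unit
    // has been processed once. Returns the number of units processed.
    public int Tick(double budgetMilliseconds)
    {
        if (units.Count == 0) return 0;

        int processed = 0;
        stopwatch.Restart();
        do
        {
            units[next]();
            next = (next + 1) % units.Count;
            processed++;
        }
        while (processed < units.Count &&
               stopwatch.Elapsed.TotalMilliseconds < budgetMilliseconds);

        return processed;
    }
}
```

The index persists between ticks, so a unit that was skipped because the budget ran out is first in line on the next tick.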
Problem: AI tactical decision making (block, dodge, attack selection, move to point selection) is expensive and reaction based, requiring a call to Update() every game tick.
Fix: I implemented a limiter on tactical decisions, where the AI will sleep for varying amounts of time depending on the circumstance. For example, a melee attacker whose target is far away might sleep for a whole second, while one in close range may sleep for only 0.15 seconds. A sleeping AI isn’t actually sleeping, but will skip most operations aside from movement and facing its target. When the AI decides if it should block, dodge, etc., it operates on a % chance per time interval check, and will account for the time spent sleeping if it had in fact done so. When the AI is fighting the player the sleep times are much lower, so the player would never be able to tell the difference.
// Chance of at least 1 success in n tries, each try having a probability p, is
// s = 1 - (1 - p)^n
// where s is the chance of success over the whole time interval.
// Solving for p gives p = 1 - (1 - s)^(1/n)
float GetReactionProbablityForTimeInterval(float timeInterval, float chancePerTimeInterval, float deltaTime)
{
    if (chancePerTimeInterval < .001f) return 0.0f;
    float frameTime = deltaTime;
    float n = timeInterval / frameTime; // number of checks per time interval
    float s = chancePerTimeInterval;
    return 1.0f - (float)Math.Pow(1.0f - s, 1.0f / n);
}
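As a sanity check on the formula, the sketch below restates it in double precision and adds a helper that compounds the per-check probability back up. ReactionMath, PerCheckProbability, and Compound are hypothetical names. It also assumes that accounting for sleep means passing the time slept as deltaTime, which is one reading of the description above.

```csharp
using System;

public static class ReactionMath
{
    // Per-check probability p such that checks made every deltaTime seconds
    // compound to chancePerInterval over timeInterval: p = 1 - (1 - s)^(1/n).
    public static double PerCheckProbability(double timeInterval, double chancePerInterval, double deltaTime)
    {
        double n = timeInterval / deltaTime;
        return 1.0 - Math.Pow(1.0 - chancePerInterval, 1.0 / n);
    }

    // Chance of at least one success in n independent checks of probability p.
    public static double Compound(double p, double n)
    {
        return 1.0 - Math.Pow(1.0 - p, n);
    }
}
```

For example, a 50% chance per second checked every 0.25 seconds gives n = 4 and a per-check probability of roughly 0.159; four such checks compound back to 50%. An AI that slept for the full second and passes deltaTime = 1 gets n = 1 and rolls the full 50% in a single check, so sleeping does not change the overall reaction rate.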
Problem: Just having root motion turned on in Animator is slow, even if the animation has no root motion.
Fix: I turn off root motion during frames in which the AI is not actually performing an attack. This is most of the time, since walking, standing still, and waiting between attacks are not attacks.
Problem: It takes a lot of raycasts to determine if a target can be shot.
Detail: Each ranged unit in the game has one or more of 5 possible attack vectors: direct fire, plus a high trajectory shot and a low trajectory shot, each of which can be fired at min or max shot velocity. These all require raycasts to make sure nothing is in the path of the shot.
Fix: Caching. The direct fire attack vector uses one raycast. The trajectory shot vectors use one raycast from the shot origin to the apex of the shot, and another from the apex to the target. The results are cached and only updated every few seconds.
Problem: Grounded checks are slow and unreliable.
Detail: characterController.isGrounded is broken. If I were to rely on it, AI units would enter ragdoll every 5 steps.
Fix: Caching and a custom raycast. I cache the results of the last IsGrounded check from my own code. Every second or so this is recalculated. I use a raycast from the feet of the character to the ground and check that the distance squared is under the max step height of the character. This sounds simple, but I tried many other methods before settling on this one. For example, Physics.CheckCapsule doesn't work reliably with large terrains due to floating point inaccuracy, even with a radius as large as 0.4 meters.
Problem: Need to optimize layer intersections.
Detail: If two AI characters are able to collide with each other through layers, this slows down physics. In addition, Unity doesn't handle this case in a useful way, causing the characters to stand on top of each other.
Fix: AI characters collide with the minimum possible other layers. Most notably, they don't collide with each other, which would have complexity O(n^2). Instead, we rely on RVO from A* Pathfinding Project, so although they can technically overlap, they rarely actually try to do so and it looks the same as if they couldn't. The player is a special case. If the player tries to move into any AI, we subtract that part of the movement vector from the intended input. Conversely, we do the same thing if the AI happens to move into the player, by offsetting the position (both with normal movement and with root motion movement).
Problem: MonoBehaviour calls are slow en masse.
Fix: For certain updates that are performed frequently, I implement an interface equivalent such as
public interface IOUCM_Update
All objects that implement this interface are put into an array. I loop through the array with an index rather than foreach (foreach is slower than using an index). In order to add to or remove from the array, I have to put hooks in OnEnable() and OnDisable() for said objects, which will notify the OptimizedUnityCallbackManager singleton that the array needs to be updated.
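The array-based update manager described above might be sketched like this. The names IManagedUpdate, ManagedUpdate, ManagedUpdateRunner, and TickCounter are stand-ins invented for the example; they are not the members of IOUCM_Update or OptimizedUnityCallbackManager, which are not shown in this post. In Unity, Register and Unregister would be called from OnEnable() and OnDisable(), and Tick from the single Update() of the manager.

```csharp
using System;

// Hypothetical stand-in for the author's update interface.
public interface IManagedUpdate
{
    void ManagedUpdate(float deltaTime);
}

// Hypothetical stand-in for the manager singleton.
public sealed class ManagedUpdateRunner
{
    private IManagedUpdate[] items = new IManagedUpdate[16];
    private int count;

    public int Count { get { return count; } }

    public void Register(IManagedUpdate item)
    {
        if (count == items.Length)
        {
            Array.Resize(ref items, items.Length * 2);
        }
        items[count++] = item;
    }

    public void Unregister(IManagedUpdate item)
    {
        int index = Array.IndexOf(items, item, 0, count);
        if (index < 0) return;
        count--;
        items[index] = items[count]; // move the last item into the gap
        items[count] = null;
    }

    // One call per frame replaces many individual Update() callbacks.
    public void Tick(float deltaTime)
    {
        for (int i = 0; i < count; i++)
        {
            items[i].ManagedUpdate(deltaTime);
        }
    }
}

// Tiny example component used to show the manager working.
public sealed class TickCounter : IManagedUpdate
{
    public int Ticks;

    public void ManagedUpdate(float deltaTime)
    {
        Ticks++;
    }
}
```

The swap-remove keeps removal cheap but does not preserve order, and unregistering an object from inside its own ManagedUpdate would cause the item moved into its slot to be skipped for that frame. A production version would need to defer removals or iterate backwards.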
Also, I removed the Update() function from components that don't actually need it.