Unity ECS

ECS testing review: system testing vs. world testing


5argon / Sirawat Pitaksarit

Dec 29, 2019 • 22 min read

In Unity ECS, you will have a much easier time testing your game programmatically (i.e. with edit mode tests) because you are no longer tied to GameObjects. Everything is data, and the data lives in a World that is easily accessible from code. It is as if you can "see" the game running just by looking at the data in your code editor.

Edit mode tests alone therefore cover much more of a game than before, since data controls everything and you can test the data. In the old GameObject paradigm, edit mode tests left you worrying about how things would behave at runtime.

It is logical to think that you can test each system as one unit, then move on to the next system as you write more. The library provides a public .Update() method on systems that allows this approach. Let's review it.

Are they "unit test"?

I don't think it is useful to argue over programming terms. You can simply learn more tools that help you finish your project, and choose whether to use them. But I have come across too many "this is not THAT" arguments that ended up not being constructive. (Similar to "esports is not sports!")

Some may view a system in ECS as a unit. However, one could also say that a stronger test is one that does not touch or pass through the ECS library at all, and therefore this whole "ECS testing" is already at the integration level.

By that definition, your unit test asmdef should not have to reference Unity.Entities, because that is literally integrating with another assembly. You must be able to test the things you use in OnUpdate inside the system, or inside a job's Execute, without calling system.Update at all, before you can finally call it a unit test. But you can always go further. If your math logic uses Unity.Mathematics, would you have to mock that library too so you could truly test in isolation? In the end you define your own level of unit, and it is not productive to judge whether the level someone else tests at is or is not a unit test.

Call it what you like. ("You skipped unit tests! Therefore this article is a bunch of bullshit.") This article describes "the kind of test that runs through the ECS library". Whether to skip your version of unit tests, and whether to use these kinds of tests, is up to you. But these tests have some strengths that you may want.

You may even have your "true unit tests" in place that render all this ECS testing irrelevant, and if so, great! I think ideally it should be that way. But I am just too lazy/undisciplined to design ECS code that is unit testable without the ECS library or without mocking. For example, if I start writing math logic inside Entities.ForEach, it won't be testable without including the ECS library, unless I separate the calculation into an internal static method or similar and give the test assembly access with [InternalsVisibleTo]. I have to balance how I like the code to look, how fast I can pump out tests, how happy I am writing that code, the time available, and coverage. All of these matter more to me than conforming to programming terms.

So from now on, when I say just "unit" without "test" (e.g. "as a unit"), don't take the word too seriously; this article is not about that. For example, when I say "use system/world/data as a unit" I don't mean using them to write "unit tests". (I realize this is a holy word and would trigger someone whose definition doesn't match what they studied.) Think of a "unit" as each test method instead: no matter what kind of test you write, you will have test methods. Think of it as me using them to write "tests". Just tests. No power levels. No offense.

I will always touch the ECS library in these tests. You can imagine that I already have more granular tests, or that I am a lazy, bad programmer who is OK with seeing results passed back from the ECS library. Neither is related to this article. This article explains what happens and what you get when you write these kinds of tests. (Not "instead" of anything.)

Or simply think "edit mode test", as the Test Runner tab says. As long as they can run in edit mode, I think they are good enough, unit or not. (They run instantly, so you can run a lot of them to check your program.)

My setup

By now you probably realize I hate terminology and best practices and prefer to just call my test a test. So I prefer an approach that is not exactly pretty or proper but works with good velocity. It's not a good feeling when you can't whip up a test fast enough after finishing a feature to "pin" that feature down, because you can't wait to implement the next one.

That's why I write tests that just go through the ECS library: separating pieces out for testing would make the code look less like ECS (e.g. you can no longer dump things freely into OnUpdate or ForEach). No mocks or stubs for me. Just create data and run the real thing.

Write all ECS code in a separate UPM package. It should be purely data. If the library deals with rendering, make another UPM package whose asmdef requires the data-only package.
Make every IComponentData internal except the ones you intend outside code to add/remove/set. There are many tags that your systems add and remove among themselves; keep those internal.
The test assembly gets [InternalsVisibleTo] so it can check internal tags or skip steps. For example, a system may expect a prior system to tag something before it begins work. In the test you can just add that tag from test code and try it out.
Systems are all internal; the UPM package expects them to be added via the default world initialization assembly scan. (Less flexible, but simpler. Plus you can now make the system you intended to be added manually )

System testing


System testing simply means you make a World containing only that system; because you still hold a reference to that system, you can call .Update() on it. Calling Update on the World would do the same, because there is only one system in it. Then you use that World's EntityManager to assert the entity state after the update.

See an example setup here: https://github.com/5argon/EcsTesting/blob/master/SystemTestBase.cs
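To make the shape of a system test concrete without a Unity project, here is a toy Python model, not Unity's API: `World`, `DamageSystem`, and the `Health`/`Damage` components are all hypothetical stand-ins. The world contains only the system under test; the test keeps a reference to that system, updates it directly, then asserts on the entity data.

```python
class World:
    """Toy stand-in for an ECS World: each entity is a dict of component name -> value."""

    def __init__(self):
        self.entities = []
        self.systems = []

    def add_system(self, system):
        self.systems.append(system)
        return system  # the test keeps this reference

    def create_entity(self, **components):
        entity = dict(components)
        self.entities.append(entity)
        return entity

    def update(self):
        for system in self.systems:
            system.update(self)


class DamageSystem:
    """Hypothetical system: subtracts Damage from Health, then removes Damage."""

    def update(self, world):
        for entity in world.entities:
            if "Health" in entity and "Damage" in entity:
                entity["Health"] -= entity.pop("Damage")


def test_damage_system_alone():
    world = World()
    system = world.add_system(DamageSystem())  # the only system in this world
    enemy = world.create_entity(Health=10, Damage=3)
    system.update(world)  # same effect as world.update() with a single system
    assert enemy == {"Health": 7}


test_damage_system_alone()
```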

I tested by going through each system like this for about a year, but I rarely do it nowadays. I have observed several differences from OOP that you may want to consider:

Advantages

Testing a system lets you see how it transforms input entities in isolation, confirming the complicated logic inside. It is easy to write. A system sounds like a unit for testing the same way a class's methods do in OOP: preparing the right Entity and then updating the system feels just like selecting a method with the right overload and invoking it.

The need for tests that fail

A system unit test never goes red when you add any number of other systems, whether they are related (via [UpdateBefore/After/InGroup]) or not. That may be a good or a bad thing: a test that never breaks may not be strong; it may simply not be doing its job of catching your regressions. You could say that the world regressed and not the system, so this evergreen test is still alright. But do you really want to keep it around?

System unit tests also often go red in ways that aren't useful. For example, you remove a tag component the system used to add, because you know a substitute later in the frame will tag it. The unit test you wrote to confirm that this one system tags entities is now red, and you must fix the test because tagging is, of course, no longer part of this system. Then I realized I never really wanted to know whether this specific system could tag entities, but whether any system in the world could. This kind of red test is not catching your mistake; it is just housework. If it were a world test, the test asserting that the world tags entities after a round of updates would stay green, which is a useful green.
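The tag scenario above, sketched as a toy Python model (hypothetical names throughout, not Unity's API). After the refactor, `CollectSystem` no longer adds the `Collected` tag because a new, later `CollectedTagSystem` does. The old system test goes red, while the world test, which only asks whether anything in the world tagged the coin, stays green.

```python
class World:
    """Toy world: systems run in list order; entities are dicts."""

    def __init__(self, systems):
        self.systems = list(systems)
        self.entities = []

    def create_entity(self, **components):
        entity = dict(components)
        self.entities.append(entity)
        return entity

    def update(self):
        for system in self.systems:
            system.update(self)


class CollectSystem:
    """After the refactor: still awards score, but no longer adds the Collected tag."""

    def update(self, world):
        for e in world.entities:
            if e.get("Touched") and "Collected" not in e:
                e["Score"] = e.get("Score", 0) + 1


class CollectedTagSystem:
    """New system, updating later in the frame, that now owns the Collected tag."""

    def update(self, world):
        for e in world.entities:
            if e.get("Touched"):
                e["Collected"] = True


def system_test_collect_system_tags_coin():
    world = World([CollectSystem()])  # only the system under test
    coin = world.create_entity(Touched=True)
    world.update()
    return "Collected" in coin


def world_test_touched_coin_is_collected_once():
    world = World([CollectSystem(), CollectedTagSystem()])  # "everything"
    coin = world.create_entity(Touched=True)
    world.update()
    world.update()  # a second frame must not award score again
    return coin.get("Collected") is True and coin["Score"] == 1


print(system_test_collect_system_tags_coin())       # False: red, yet nothing regressed
print(world_test_touched_coin_is_collected_once())  # True: still green
```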

I then felt the need to write a similar integration test whose only difference is that the system is now among others. For example, after unit testing a system with a couple of [UpdateBefore] attributes, you also want to test a world with this system plus those systems to see how they work together. Beyond that, I want the complete world, to check again that this one system still works. This creates redundant tests just to fully confirm one system's function.

When anything breaks, often all three versions of the test break together, increasing maintenance work. At this point, skipping the unit test and going straight to an integration test that tests the world for the purpose of checking a single system may be a better idea: it gives similar confidence with less work. But the test will be more difficult to write.

It feels like all the unit tests do is cement the system against change, and you must drill out the cement (fix tests that failed, but not because of a regression) for every little change. Systems are really good at letting you change one without affecting others, so it is a shame that the flexibility to modify system code is reduced because every system is coupled to its unit tests. I feel the tests should refer only to data (IComponentData) and the World, to free the systems up.

In comparison with OOP's method calls

In OOP, unit testing all public methods of a class instance looks really promising, as it covers everything the class could ever do. The arguments you pass to methods in your unit test are fed in directly and reflect real use, because the methods are called the same way. The essence of why unit testing methods in OOP is reliable is the method arguments: they provide an insulated environment that translates the test result to real use.

Putting input entities in the world and running the system update, hoping the system will take them in directly, doesn't work the same spoon-fed way as arguments to public methods, because in the real environment the world is very likely shared with other systems. Unit testing a system is therefore much weaker than unit testing a class in OOP, because you have assumed the system has exclusive use of the world.

Systems have relationships in the design

An extreme case of the previous point: your EntityCommandBufferSystem is impossible to unit test. By definition it waits for commands from prior systems, and it depends strongly on its update position, whether stated on itself or implied by other systems declaring Before/After relative to it. On its own it is a modular system that does nothing at all. Would you still view it as a unit? Or as the same unit as the system that uses it? What if multiple systems use it?

For the same reason, a unit test of any normal system that uses an EntityCommandBufferSystem (often with [UpdateBefore(ECBS)] in the design) is incomplete, since the commands never play back. If you view a system as a modular unit that can stand on its own, then the update attributes sitting on top of it are also part of that unit. A real unit test should be able to test them too, but it can't. Of course the system can stand on its own, but that doesn't mean it works as intended on its own.
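A toy Python model of why the command never plays back in a system test. `EndFrameCommandBufferSystem`, `CommandBuffer`, and `FreezeSystem` are hypothetical stand-ins for an ECBS and a system that defers its structural change to it; this is not Unity's API. With only `FreezeSystem` in the world, the queued tag is never applied; with the ECBS updating after it, it is.

```python
class World:
    """Toy world: systems run in list order; entities are dicts."""

    def __init__(self, systems):
        self.systems = list(systems)
        self.entities = []

    def create_entity(self, **components):
        entity = dict(components)
        self.entities.append(entity)
        return entity

    def update(self):
        for system in self.systems:
            system.update(self)


class CommandBuffer:
    def __init__(self):
        self.commands = []

    def add_component(self, entity, name, value=True):
        self.commands.append((entity, name, value))  # deferred, not applied yet


class EndFrameCommandBufferSystem:
    """Does nothing on its own except play back buffers that earlier systems filled."""

    def __init__(self):
        self.buffers = []

    def create_command_buffer(self):
        buffer = CommandBuffer()
        self.buffers.append(buffer)
        return buffer

    def update(self, world):
        for buffer in self.buffers:
            for entity, name, value in buffer.commands:
                entity[name] = value
        self.buffers.clear()


class FreezeSystem:
    """Tags entities below zero degrees, through the command buffer system."""

    def __init__(self, ecbs):
        self.ecbs = ecbs

    def update(self, world):
        buffer = self.ecbs.create_command_buffer()
        for e in world.entities:
            if e.get("Temperature", 0) < 0:
                buffer.add_component(e, "Frozen")


def ice_after_one_frame(include_ecbs):
    ecbs = EndFrameCommandBufferSystem()
    systems = [FreezeSystem(ecbs)] + ([ecbs] if include_ecbs else [])
    world = World(systems)
    ice = world.create_entity(Temperature=-5)
    world.update()
    return ice


print("Frozen" in ice_after_one_frame(include_ecbs=False))  # False: system test
print("Frozen" in ice_after_one_frame(include_ecbs=True))   # True: with the ECBS
```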

To test this kind of system, you can't unit test it; you must perform an integration test instead. To keep the unit as small as possible, the world should have just enough systems that are really related, no more, no less. But following this correct best practice was very tiring, with little gain compared to a world test where I just pour everything in. (The gain: it is easier to get a useful green test, though I lose much more time setting up the test.)

I did that before, and it is a pain to prop up a world with one system plus all its ECBSs. I even tried writing reflection code that scans all the UpdateBefore attributes of a system, hoping to automatically create a world for unit testing that system, but sometimes the relationship is declared on the ECBS as an UpdateAfter, etc.

The best way to end all these problems is the outrageous approach of just testing on a world with everything, which we will get to in a moment. I still remember a [SetUp] of 30+ lines cherry-picking systems one by one for the tests. It made me not want to refactor systems, since I always forgot to come back to the test and add the new system that would restore the old function. Suddenly I couldn't use all those lightbulb refactor commands my text editor provided. I don't want to go through that again.

Sometimes a system doesn't work as intended because its relationships are wrong. Systems may be modular in concept, but their design often concerns their update order relative to others, so a traditional unit test doesn't cover all of the design. This is unlike OOP, where unit tests translate well and give you much higher confidence.

Is the system really a good unit?

In my first year learning ECS, I was under the impression that it would be much easier to test than GameObjects, since I saw that systems could be developed piece by piece, and therefore each piece must be a good unit for testing. (It is true that it is much easier than with GameObjects.) But when I actually got to writing the tests, I started to realize something.

In my opinion, a system may not be the correct "unit" it appears to be at first glance. The fact that I needed a ConstantDeltaTimeSystem to properly unit test systems that use Time already says that systems are never meant to work alone, even though they can be designed piece by piece and added to the world one by one.

Only if a system has [DisableAutoCreation], meaning it is really designed to be updated on its own in a world containing only it (like those GameObject conversion systems), do I agree that the system is a perfect unit.

In OOP I used to go through the whole public interface of everything and make sure the tests covered it all, which gave me full confidence. It is understandable why unit tests are important there. But here, going through every EntityQuery of a system, ramming in Entity combinations, and updating it no longer gives full confidence, since in real use it is never updated alone. Many more problems await this one system once it is together with everything else.

World testing


World testing is when you write a test with a specific objective about a single aspect of a single system, as in a system unit test, but you don't call Update on just that system.

Instead, the world is now fully populated, as at runtime: the default world initialization code adds all systems (yours, Unity's, anything assembly reflection can reach), sorted into ComponentSystemGroups and then by their relative update order within each group.

Then, after you set up the starting Entity data, you run Update on the World. (I remember you couldn't do this in earlier preview versions, but now you can run Update on the world.) This starting data should cause the system you are interested in to pass its OnUpdate criteria; ideally only that one, though that may no longer be the case. Extraneous updates may or may not affect the end result. You assert the result as usual, but you always think in terms of the world.

See a setup example here: https://github.com/5argon/EcsTesting/blob/master/WorldTestBase.cs
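A toy Python model of that population step, assuming a module-level registry in place of assembly reflection. The `system` decorator, `GROUP_ORDER`, and every system name here are hypothetical stand-ins for `[UpdateInGroup]`, `[UpdateBefore]`, and `[UpdateAfter]`, not Unity's implementation. Systems are sorted by group first, then topologically inside each group using the standard library's `graphlib` (Python 3.9+). The test only creates data and updates the whole world; it never names a system.

```python
from graphlib import TopologicalSorter

REGISTRY = []  # toy stand-in for "every system assembly reflection can find"
GROUP_ORDER = ["Initialization", "Simulation", "Presentation"]


def system(group, update_before=(), update_after=()):
    """Hypothetical decorator mimicking [UpdateInGroup]/[UpdateBefore]/[UpdateAfter]."""
    def register(cls):
        cls.group = group
        cls.update_before = tuple(update_before)
        cls.update_after = tuple(update_after)
        REGISTRY.append(cls)
        return cls
    return register


def sorted_systems(classes):
    ordered = []
    for group in GROUP_ORDER:
        members = {c.__name__: c for c in classes if c.group == group}
        graph = TopologicalSorter({name: () for name in members})
        for name, cls in members.items():
            for other in cls.update_after:
                if other in members:
                    graph.add(name, other)  # `other` must run before `name`
            for other in cls.update_before:
                if other in members:
                    graph.add(other, name)  # `name` must run before `other`
        ordered.extend(members[name] for name in graph.static_order())
    return ordered


class World:
    def __init__(self, systems):
        self.systems = systems
        self.entities = []

    @classmethod
    def create_default(cls):
        # Like default world initialization: every registered system, sorted.
        return cls([system_cls() for system_cls in sorted_systems(REGISTRY)])

    def create_entity(self, **components):
        entity = dict(components)
        self.entities.append(entity)
        return entity

    def update(self):
        for s in self.systems:
            s.update(self)


@system("Simulation", update_after=("MoveSystem",))
class HitSystem:
    def update(self, world):
        for e in world.entities:
            if e.get("Position", 0) >= 10:
                e["Hit"] = True


@system("Simulation")
class MoveSystem:
    def update(self, world):
        for e in world.entities:
            if "Position" in e:
                e["Position"] += e.get("Speed", 0)


@system("Initialization")
class SpawnSystem:
    def update(self, world):
        pass  # placeholder: a real game would do work here


def test_bullet_reaching_wall_hits_it():
    world = World.create_default()
    bullet = world.create_entity(Position=8, Speed=2)
    world.update()
    assert bullet == {"Position": 10, "Speed": 2, "Hit": True}


test_bullet_reaching_wall_hits_it()
```

Because `HitSystem` declares that it runs after `MoveSystem`, the bullet moves to 10 and is hit within the same update; a newly decorated system would join this world automatically without touching the test.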

The new unit is data combinations

Instead of a system, the unit is now a particular combination of starting data. It causes a set of data transformations over rounds of world updates: "If this set of data exists, it becomes that data after the world updates."

The things that do this are systems, with an "s", working together in the order you designed to cause mutation (changes), emission (more data than the input), filtering (less data than the input), or generation (new data unrelated to the input). This is why I can't bring myself to see a single system as the unit to test, when data is the first-class thing to think about in data-oriented design.
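One way to make "data combinations as the unit" concrete is a table-driven world test. This toy Python model uses hypothetical `PoisonSystem`, `RegenSystem`, and `DeathSystem` classes, not Unity's API. Each row reads "if this data exists, it becomes that data after one world update", and no row mentions a system.

```python
class World:
    """Toy world: systems run in list order; entities are dicts."""

    def __init__(self, systems):
        self.systems = list(systems)
        self.entities = []

    def create_entity(self, **components):
        entity = dict(components)
        self.entities.append(entity)
        return entity

    def update(self):
        for system in self.systems:
            system.update(self)


class PoisonSystem:
    def update(self, world):
        for e in world.entities:
            if e.get("Poisoned"):
                e["Health"] -= 1


class RegenSystem:
    def update(self, world):
        for e in world.entities:
            if e.get("Regen") and not e.get("Poisoned"):
                e["Health"] += 1


class DeathSystem:
    def update(self, world):
        for e in world.entities:
            if e.get("Health", 1) <= 0:
                e["Dead"] = True


# (starting data, data after one world update)
CASES = [
    ({"Health": 5}, {"Health": 5}),
    ({"Health": 5, "Regen": True}, {"Health": 6, "Regen": True}),
    ({"Health": 1, "Poisoned": True}, {"Health": 0, "Poisoned": True, "Dead": True}),
    ({"Health": 1, "Poisoned": True, "Regen": True},
     {"Health": 0, "Poisoned": True, "Regen": True, "Dead": True}),
]


def run_world(start):
    world = World([PoisonSystem(), RegenSystem(), DeathSystem()])  # "everything"
    entity = world.create_entity(**start)
    world.update()
    return entity


for start, expected in CASES:
    assert run_world(start) == expected, (start, run_world(start))
```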

Naming the test method

The test should reference data (IComponentData and friends) and the World, but no system class names. The test method name may faintly refer to a system class name, but make sure that when you rename the system a bit or dice it up further, the test method name still makes sense, so you don't have to fix the test right down to its name.

The test is still named as if you are testing a single system, but it should read like "how would a world with that system transform this data" instead of "if this kind of entity exists and the system updates alone, what would happen".

For example, if you have a system named CpuAutoplayProcessorSystem, don't give the test method a name tied strongly to that system, like AutoplayProcessorDestroys2PInputEntities, as you would in a system unit test. (There you would create some 2P input entities, update the system, and see those entities disappear. That succeeds as a unit test, but isn't quite enough here.) Instead, name it something like PlayerTwoControlDisabledInCpuMode while checking the same thing.

In your mind, you know that what makes or breaks this test is whether CpuAutoplayProcessorSystem works, but you are now testing from the perspective of the world. Much later, as your code base grows, you may have PlayerTwoDisablerSystem and CpuBehaviourSystem as separate systems, and the previously written test still stays green and still sounds right.
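A toy Python sketch of that naming advice. The three system names come from the example above, but their bodies, the `CpuMode`/`PlayerInput`/`Driver` components, and the helpers are hypothetical, and none of this is Unity's API. The two system lists stand in for the auto-populated world before and after the split; the same behaviour-named test passes for both.

```python
class World:
    """Toy world: systems run in list order; entities are dicts."""

    def __init__(self, systems):
        self.systems = list(systems)
        self.entities = []

    def create_entity(self, **components):
        entity = dict(components)
        self.entities.append(entity)
        return entity

    def update(self):
        for system in self.systems:
            system.update(self)


def in_cpu_mode(world):
    return any(e.get("CpuMode") for e in world.entities)


def destroy_player_two_inputs(world):
    world.entities = [e for e in world.entities if e.get("PlayerInput") != 2]


def let_cpu_drive_player_two(world):
    for e in world.entities:
        if e.get("Player") == 2:
            e["Driver"] = "Cpu"


class CpuAutoplayProcessorSystem:
    """Before the split: one system does both jobs."""

    def update(self, world):
        if in_cpu_mode(world):
            destroy_player_two_inputs(world)
            let_cpu_drive_player_two(world)


class PlayerTwoDisablerSystem:
    def update(self, world):
        if in_cpu_mode(world):
            destroy_player_two_inputs(world)


class CpuBehaviourSystem:
    def update(self, world):
        if in_cpu_mode(world):
            let_cpu_drive_player_two(world)


def remaining_player_inputs(systems):
    world = World(systems)
    world.create_entity(CpuMode=True)
    world.create_entity(PlayerInput=1)
    world.create_entity(PlayerInput=2)
    world.update()
    return sorted(e["PlayerInput"] for e in world.entities if "PlayerInput" in e)


def test_player_two_control_disabled_in_cpu_mode():
    before_split = [CpuAutoplayProcessorSystem()]
    after_split = [PlayerTwoDisablerSystem(), CpuBehaviourSystem()]
    for systems in (before_split, after_split):
        assert remaining_player_inputs(systems) == [1]


test_player_two_control_disabled_in_cpu_mode()
```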

It is not a unit test after all?

It may be controversial to say anything is better than a unit test, because unit tests were so strong in OOP. (As explained, testing an OOP object's public interface and all allowed constructor combinations pretty much covers everything it has to offer.) This is not a unit test, and you don't need to listen to me if you feel offended.

Maybe the system simply wasn't the right unit, and a set of data is. I may privately view this data and this ever-growing world as a unit (you don't have to mind me), but to make you comfortable I will call it an integration test, since it integrates all systems together from the viewpoint of someone who views a system as the unit.

Integration testing sounds inferior to unit testing and no replacement for it, but as I said in the previous section, unit testing systems in data-oriented design leaves a lot more to be desired than it does in OOP. I think I need to break the golden rule of 100% unit test coverage.

It makes writing the test harder

Testing this way is harder: you are trying to test whether a single system works, yet the test runs on the world. That makes it hard to write an assertion about the end result, and harder to think of a starting data state that covers just enough of this particular system's function.

Over time the test occasionally "spills" over into other systems, but usually the assertions don't look at the results of those spills. (Other world tests do look at them and go red appropriately.) The tests are still written in a small scope like unit tests; they just update a world where other systems coexist.

You have less control over whether the input entity gets modified before it actually arrives at the system you are targeting; your only control is a careful Entity setup at the start of the test. The input must be designed for the world rather than for a single system.

Benefits

However, it creates a dynamic test that keeps changing as your code grows (in my opinion a positive thing). If these tests stay green as each system is added and as [UpdateBefore/After] attributes change around, you have strong confidence. One test may aim at a single system, but it lightly covers all other systems, because you are updating the world.

Integration testing can give strong confidence about system order and about how a system receives data from, or is interfered with by, unexpected prior systems. I caught tons of bugs involving the wrong ComponentSystemGroup, even before getting to wrong [UpdateBefore/After].
It catches bugs where prior systems, added later, modify the starting data; these are usually command buffer systems. Often you "give in" somewhere in the code (or simply try to optimize with persistent native containers) and start relying on baked/sorted/ordered entity data, and somehow the order you thought would never change is not the same. World testing can catch these.
It allows checks on system refactoring, such as when the job code gets long (too many ForEach calls, or an OnCreate so massive that you run out of ideas for naming the query variables, etc.) and you split part of it into a new system. If the refactor was correct, the world test stays green, confirming a successful refactor, while the unit test locked to a single system goes red because it is now missing the function you expected. The function is not missing; it was just refactored into a new system.
Newly added systems are automatically included in every written test, thanks to the assembly reflection that adds them to the test world. Cherry-picking related systems for each test manually sounds more like best practice, but it was frustrating and tiring. Usually when unit testing a system, you then want all of its [UpdateBefore/After] systems in the world too, because they are part of its definition and you want to see their interaction. This got annoying, so in the end I just test the world rather than each system, but with the objective of testing a specific function of one system in each test.
Now you can write one test that gives you about the same confidence as a unit test plus integration tests over multiple systems. No more fixing 2-3 nearly identical red tests.

Changed filters

Worthy of its own section. World tests allow you to test changed filters, which are part of a single system's definition yet are difficult to unit test if we view a system as the unit.

Changed filters can get you free performance! If a system does its work while its source data hasn't changed, the mutation destination is written with the same values, which is wasted work. There are cases where slapping on a changed filter makes the logic wrong, such as a system that resets junk produced by other systems (which the world relies on) according to source data that didn't change. (In that case, I suggest redesigning a bit so you can use a changed filter.) You can use .WithChangeFilter<T>() with the Entities.ForEach lambda too. So stick them in wherever you can; but that's the problem: you are not sure where you can. The tests should help assure you of this performance.

The change version has to be recorded as part of the world update chain, inside prior systems. You cannot fake it with the EntityManager from test code.
When sticking with unit testing systems, you could handpick some other system to Update, then Update the system under test to see whether the changed filter works. But picking even one other system already feels like leaving the unit.
When handpicking, you need to think about which system actually runs earlier. A system that runs later may only trigger the change in the next round of updates in the real game, in which case it may be a bug, since the change detection comes a frame late. But you will not be able to catch this bug in your unit test.
Changed filters are often on simple but heavy-duty systems. Think of just 3-5 lines of ForEach code in OnUpdate. By adding a changed filter, you add a massive chance for the system to fail as you continue building the world. Without world tests you will be afraid of using them.
Remember that changed filters work even better in a chain. For example, because a prior system didn't see a change, it did nothing. Then, because of that, all the later systems expecting changes to the type written by the system that didn't run will not run either. That means there are many more combinations to test of whether a change finally propagates to your system. You can't afford to even think through, let alone hand-build, a world for each changed-chain scenario.
World tests on changed filters evolve as you add more systems, with each system's position in the world accounted for automatically.
With world testing, you can even write a test named like "this input data does nothing to system X, because system Y before it didn't cause the change". The name mentions two systems and still aims to test one, yet neither class name appears in the test code, since we left the world auto-populated.
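A toy Python model of change versions, loosely patterned on the idea behind change filtering rather than Unity's actual implementation: every system update gets a fresh global version, every write stamps the component with it, and the filtered `TurretSystem` only works on entities whose `Direction` was written after its own previous update. `AimSystem`, `TurretSystem`, and the components are hypothetical. Putting the writer after the reader makes the turret react one frame late, which is the kind of bug a world test catches and a lone system test cannot.

```python
class Entity:
    def __init__(self):
        self.values = {}    # component name -> value
        self.versions = {}  # component name -> version of its last write


class World:
    def __init__(self, systems):
        self.systems = list(systems)
        self.entities = []
        self.version = 1  # toy global system version

    def write(self, entity, name, value):
        entity.values[name] = value
        entity.versions[name] = self.version  # any write counts as a change

    def create_entity(self, **components):
        entity = Entity()
        for name, value in components.items():
            self.write(entity, name, value)
        self.entities.append(entity)
        return entity

    def update(self):
        for system in self.systems:
            self.version += 1  # each system update runs under a fresh version
            system.update(self)


class AimSystem:
    """Writer: consumes a pending AimInput and writes Direction."""

    def update(self, world):
        for e in world.entities:
            if "AimInput" in e.values:
                world.write(e, "Direction", e.values.pop("AimInput"))


class TurretSystem:
    """Reader with a changed filter on Direction."""

    def __init__(self):
        self.last_version = 0

    def update(self, world):
        for e in world.entities:
            if e.versions.get("Direction", 0) > self.last_version:
                world.write(e, "Rotation", e.values["Direction"])
        self.last_version = world.version


def rotation_per_frame(systems):
    world = World(systems)
    turret = world.create_entity(Direction=0, Rotation=0)
    world.update()                  # frame 1: freshly created data counts as changed
    turret.values["AimInput"] = 90  # the player aims before frame 2 (not filtered)
    rotations = []
    for _ in range(2):              # frames 2 and 3
        world.update()
        rotations.append(turret.values["Rotation"])
    return rotations


print(rotation_per_frame([AimSystem(), TurretSystem()]))  # [90, 90]: same frame
print(rotation_per_frame([TurretSystem(), AimSystem()]))  # [0, 90]: one frame late
```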

Without world tests covering much of your game, you will not want to add changed filters, since adding one seems to break random things while the per-system unit tests stay green. If you add one successfully and a bug occurs later, you will have to suspect the changed filters on many systems at once (and try commenting them out to see whether things work again, etc.).

This is yet more evidence that system unit tests can't cover everything a system has to offer, unlike unit tests of a class's public interface in OOP. Systems were made to work together, with relationships.

EntityCommandBufferSystem

I have mentioned this already, but testing the world lets you naturally see the results of EntityCommandBuffer playback.

One best practice is to target BeginInitializationEntityCommandBufferSystem when possible. That is, expect the result to be available next frame, regardless of the update position of the system queuing the command. Every other ECBS introduces a sync point within the frame. This pattern is quite common.

A system that tags its targets so it won't have to work on them again: it's a no-brainer to queue that tag to BeginInitializationEntityCommandBufferSystem if you know the tag is used exclusively by this system. No one will use the tag until the next frame anyway. In the test, you want to check that the system keeps working or stops properly. A world test would update the world twice. In a system test, updating the system twice will not play back the command, so the system really will work twice, and you cannot verify the behaviour.
A hard-to-see detail: for example, you have a scheme where your current score is an aggregation of separate Entities with ScoreToken. So when you collect an item you create a ScoreToken and expect the UI to change according to the newly introduced entity (along with playing sounds, etc.). Creating an entity means a sync point. If you want the UI to update this very frame, you may use EndSimulationEntityCommandBufferSystem or BeginPresentationEntityCommandBufferSystem to ensure it comes before the UI. However, not many players would notice the score UI updating one frame later, and it doesn't affect gameplay so much that you need it now. Then it is a good idea to use BeginInitializationEntityCommandBufferSystem: you likely collect coins repeatedly throughout the game, and it would be a shame to complete jobs mid-frame every time you do.

With world testing, when you know the functionality you are testing is expected to arrive one frame later, you can simply call world.Update() twice. Then you can assert the create/remove/destroy. (These need a sync point; a set does not.)
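A toy Python model of the "tag it so I don't work twice" pattern (hypothetical names, not Unity's API). Here `BeginInitializationCommandBufferSystem` simply sits first in the world and plays back commands queued during the previous frame. Updating the whole world twice shows the work happening once; updating the lone system twice shows it working twice, which is why a system test cannot verify that it stops.

```python
class World:
    """Toy world: systems run in list order; entities are dicts."""

    def __init__(self, systems):
        self.systems = list(systems)
        self.entities = []

    def create_entity(self, **components):
        entity = dict(components)
        self.entities.append(entity)
        return entity

    def update(self):
        for system in self.systems:
            system.update(self)


class BeginInitializationCommandBufferSystem:
    """Toy ECBS at the very start of a frame: plays back the previous frame's commands."""

    def __init__(self):
        self.pending = []

    def add_tag(self, entity, name):
        self.pending.append((entity, name))

    def update(self, world):
        for entity, name in self.pending:
            entity[name] = True
        self.pending.clear()


class ProcessOnceSystem:
    """Works on untagged entities, then queues a Processed tag for the next frame."""

    def __init__(self, ecbs):
        self.ecbs = ecbs

    def update(self, world):
        for e in world.entities:
            if "Processed" not in e:
                e["WorkCount"] = e.get("WorkCount", 0) + 1
                self.ecbs.add_tag(e, "Processed")


def work_count_after_two_updates(whole_world):
    ecbs = BeginInitializationCommandBufferSystem()
    system = ProcessOnceSystem(ecbs)
    world = World([ecbs, system] if whole_world else [system])
    entity = world.create_entity()
    world.update()
    world.update()
    return entity["WorkCount"]


print(work_count_after_two_updates(whole_world=True))   # 1: stopped after tagging
print(work_count_after_two_updates(whole_world=False))  # 2: tag never played back
```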

Time

Time in ECS flows from system to system. There are SetTime, PushTime, and PopTime for use in a system, so that later-running systems receive the new time. Testing the world can ensure your push is properly countered by a pop, and that your new push sits well with the other existing pushes and pops.
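A toy Python model of a time stack, loosely mirroring the SetTime/PushTime/PopTime idea rather than Unity's exact API; all class names are hypothetical. A slow-motion pair of systems pushes half the delta time and later pops it. The world test checks that systems inside the pair see the scaled time, systems after it see the original, and the stack is balanced after several frames; forgetting the pop would make the stack grow and the first assertion fail.

```python
class World:
    def __init__(self, systems):
        self.systems = list(systems)
        self.time_stack = [0.0]  # toy stand-in for the world's current time data

    @property
    def delta_time(self):
        return self.time_stack[-1]

    def set_time(self, delta_time):
        self.time_stack[-1] = delta_time

    def push_time(self, delta_time):
        self.time_stack.append(delta_time)

    def pop_time(self):
        self.time_stack.pop()

    def update(self):
        for system in self.systems:
            system.update(self)


class FixedStepTimeSystem:
    def update(self, world):
        world.set_time(1 / 60)


class SlowMotionBeginSystem:
    def update(self, world):
        world.push_time(world.delta_time * 0.5)


class SlowMotionEndSystem:
    def update(self, world):
        world.pop_time()


class RecordDeltaTimeSystem:
    def __init__(self):
        self.seen = []

    def update(self, world):
        self.seen.append(world.delta_time)


def run_frames(frames):
    inside, after = RecordDeltaTimeSystem(), RecordDeltaTimeSystem()
    world = World([FixedStepTimeSystem(), SlowMotionBeginSystem(), inside,
                   SlowMotionEndSystem(), after])
    for _ in range(frames):
        world.update()
    return world, inside.seen, after.seen


world, inside, after = run_frames(3)
assert len(world.time_stack) == 1    # every push was countered by a pop
assert inside == [1 / 60 * 0.5] * 3  # slowed-down systems see half the time
assert after == [1 / 60] * 3         # later systems see the original time
```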

It is also possible to speed-hack time in tests by adding a system at the front of your test world that changes the time. This is useful for getting the game into an exact state and seeing whether something happens, or for catching bugs related to frame skips. For example, if you designed holding down a button to kill incoming enemies because it activates a killzone in front of you, would you still kill an enemy that skips through your character when the game lags? If it skips to exactly your character's position, would you take damage?
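A toy Python model of that speed hack (hypothetical systems and numbers, not Unity's API). `ForcedDeltaTimeSystem` sits at the front of the test world and overrides the delta time. At a normal 1/60 s step the enemy walks into the killzone and dies; with a lag spike of 0.5 s per frame it jumps from 2.5 straight to 0, skipping the zone, and the player takes damage. A world test that pins these outcomes down makes the frame-skip behaviour a deliberate decision instead of an accident.

```python
class World:
    def __init__(self, systems):
        self.systems = list(systems)
        self.entities = []
        self.delta_time = 0.0
        self.player_damaged = False

    def create_entity(self, **components):
        entity = dict(components)
        self.entities.append(entity)
        return entity

    def update(self):
        for system in self.systems:
            system.update(self)


class ForcedDeltaTimeSystem:
    """Placed first in the test world so every later system sees this delta time."""

    def __init__(self, delta_time):
        self.delta_time = delta_time

    def update(self, world):
        world.delta_time = self.delta_time


class EnemyMoveSystem:
    def update(self, world):
        for e in world.entities:
            if not e.get("Dead"):
                e["X"] -= e["Speed"] * world.delta_time


class KillzoneSystem:
    """The killzone in front of the player covers 1 <= X <= 2."""

    def update(self, world):
        for e in world.entities:
            if 1.0 <= e["X"] <= 2.0:
                e["Dead"] = True


class PlayerDamageSystem:
    def update(self, world):
        for e in world.entities:
            if not e.get("Dead") and e["X"] <= 0.0:
                world.player_damaged = True


def outcome(delta_time, frames=300):
    world = World([ForcedDeltaTimeSystem(delta_time), EnemyMoveSystem(),
                   KillzoneSystem(), PlayerDamageSystem()])
    enemy = world.create_entity(X=10.0, Speed=5.0)
    for _ in range(frames):
        world.update()
    return enemy.get("Dead", False), world.player_damaged


print(outcome(1 / 60))  # (True, False): killed inside the zone
print(outcome(0.5))     # (False, True): lag skips the zone, player is hit
```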

Difficulties

Previously written world tests appear to go red more often as you develop other areas. But in my opinion that shows the tests are doing their job. A system test stays green, but I would rather have this kind of red, because it covers more than a single system.
A system test on a door may check that it is rectangular, with a knob and usable hinges. I'm pretty sure those tests would be very strong in the wrong way, since they never really regress. By the time you fit the door into the house and find it too tight, you need a broader test with the house, which at the same time confirms that the door is a rectangle anyway. This test will seldom go red as either the door or the house changes, but I'm glad when it does.
It is hard to predict the end result when all systems are present. You have to carefully set input data so that everything except what you are testing is neutral (0, false, null, etc.). This is also great when some newer system breaks what you thought was neutral and insignificant to the system you are aiming to test.
You may need to design the code to fit this testing method, the same way there are testable and untestable OOP designs. Unit testing one system in isolation "just works" in most cases, since its difficulty does not scale with however big your game becomes.

You can remedy some of these disadvantages by developing systems in separate UPM packages. The world test then has a more limited set of systems, while still relieving you from propping up a different world for each test.

Will it scale?

It is not like you must choose a faction; you can still write system tests. It is just that I now prefer world tests first. It never hurts to try, and if it ends up badly I can always come back and delete this article.

Fall back to single-system tests when you can't handle a state where every system unpredictably changes everything. When the world gets crowded, separating UPM packages may be the way to keep testing like this, since ECS scans only the available assemblies to populate the world. You can open a new Unity project containing just one UPM package to perform a world test with a more limited set of assemblies.

Of course, with tons of time you can do both system unit tests and world tests. But a world test is only a little harder, and it covers what the unit test did really well plus many more benefits. It is tempting to cut your dev time in half by ignoring system unit tests altogether. (Or by simply calling this world-but-pinpoint-a-system test a new kind of unit test, I don't know.)

Whether it really scales to bigger games has yet to be proven. I am afraid I will never be working on AAA games like some of you, so I cannot assure you that this world test approach is the go-to for everyone. I have kept trying it for about 2-3 months and it works well in my game so far, meaning I can still set up good starting data and calculate assertion results for real.

By "for real" I mean without cheating by copying the actual value from the assertion error into the test. You should, however, use that value as a hint and think from the beginning about whether it is possible to arrive at it. I find that about 40% of the time, that value is the correct one for the test. And don't worry: instead of getting lazy and blindly editing the test to match, you often get a lightbulb moment from the assertion: "Oh right right! It was because of that thing I just added! Fine." (If you don't get a lightbulb, commit more often and think about what changed since the last commit where everything was green.) That said, never fix the test until you can walk through the code and arrive at that value on your own.

If you can't, it may be a sign that this kind of test is showing its weakness, or that your system code is too long. World tests also let you freely dice your system code up into multiple systems that are easier to reason about, which actually helps you fix the red tests. With system unit tests, doing this would conversely make green tests go red. Green world tests are therefore a very good sanity check for any kind of refactor you want to do.

Job testing

As a bonus, here's another "unit" you could consider. If you use old-school IJob, IJobForEach, IJobChunk, etc. structs instead of Entities.ForEach baked into the system code, you can test those jobs as units!

Just separate them into a new file; then your test can new up the struct, fill in the required data, schedule it, and complete it. (Remember that you can create an EntityQuery from an EntityManager without any system, as input for jobs that need one.) In my opinion a job is a better unit than a system.
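A toy Python analogue of testing a job as a unit (not Unity's job system). `ApplyVelocityJob` is a hypothetical stand-in for a job struct: plain data plus an `execute(index)` method. `run_job` stands in for scheduling and completing by simply running every index on the current thread. The test builds the struct with data and checks the output, with no world or system involved.

```python
from dataclasses import dataclass


@dataclass
class ApplyVelocityJob:
    """Toy job struct: input/output data plus a per-index execute method."""
    delta_time: float
    velocities: list
    positions: list

    def execute(self, index):
        self.positions[index] += self.velocities[index] * self.delta_time


def run_job(job, length):
    # Stand-in for schedule-then-complete: run every index right here.
    for index in range(length):
        job.execute(index)


def test_apply_velocity_job():
    job = ApplyVelocityJob(delta_time=0.5, velocities=[2.0, -4.0], positions=[0.0, 1.0])
    run_job(job, length=2)
    assert job.positions == [1.0, -1.0]


test_apply_velocity_job()
```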