In Unity ECS, you will have a much easier time testing your game programmatically (i.e. with edit mode tests) because you are no longer tied to GameObject. Everything is data, and that data lives in a World that is easily accessible from code. It is as if you can "see" the game running just by looking at all the data in your code editor.
Edit mode tests therefore cover much more than before, since data controls everything and you can test the data. In the old GameObject paradigm, an edit mode test would leave you worrying about how things would behave at runtime.
It is logical to think that you can test a system as one unit, then move on to the next system as you make more. The library provides a public .Update() method on each system that allows this approach. Let's review it.
System testing simply means you make a World containing only that system; then, because you are still holding a reference to that system, you can call .Update() on it. Updating the World would do the same because there is only one system in there. Then you may use the EntityManager of that World to assert entity state after the update.
See an example setup here: https://github.com/5argon/EcsTesting/blob/master/SystemTestBase.cs
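The setup described above could look roughly like the sketch below. It assumes a pre-1.0 Entities package (SystemBase, World.GetOrCreateSystem) and the NUnit-based Unity Test Framework. Counter and IncrementSystem are hypothetical names made up for illustration; they are not taken from the linked repository.

```csharp
using NUnit.Framework;
using Unity.Entities;

// Hypothetical component and system, made up for this illustration.
public struct Counter : IComponentData
{
    public int Value;
}

[DisableAutoCreation] // keep this sample out of automatically populated worlds
public class IncrementSystem : SystemBase
{
    protected override void OnUpdate()
    {
        // Run on the main thread so the result is ready right after Update().
        Entities.ForEach((ref Counter counter) => { counter.Value += 1; }).Run();
    }
}

public class IncrementSystemTests
{
    World world;

    [SetUp]
    public void SetUp()
    {
        // A fresh world that contains nothing until the test adds a system.
        world = new World("System Test World");
    }

    [TearDown]
    public void TearDown()
    {
        world.Dispose();
    }

    [Test]
    public void IncrementSystemAddsOneToCounter()
    {
        // The world holds only this one system, and we keep a reference to it.
        var system = world.GetOrCreateSystem<IncrementSystem>();
        var em = world.EntityManager;
        var entity = em.CreateEntity(typeof(Counter));
        em.SetComponentData(entity, new Counter { Value = 41 });

        system.Update();

        Assert.AreEqual(42, em.GetComponentData<Counter>(entity).Value);
    }
}
```

Note how the test names the system class both in its method name and in its body, which is the coupling discussed in the rest of this article.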
I tested the traditional way, with the system as the smallest unit like this, for about a year, but not so much nowadays. I observed several differences from OOP that you may want to consider:
Unit testing a system lets you see how the system transforms input entities in isolation, confirming the complicated logic inside. It is easy to write. A system sounds like a unit for testing the same way a class's methods sound like a unit in OOP: when you prepare the right Entity and then update the system, it feels just like you have selected a method with the right overload and invoked it.
The need for tests that fail
A unit test never goes red when you add any number of other systems, whether they are related (through [UpdateBefore/After/InGroup]) or not. This may be a good or a bad thing. A test that never breaks is not necessarily strong; it may simply not be doing its job, which is to catch your regressions. You could say that the world regressed and not the system, so this evergreen test is still all right. But do you really want to keep it around?
System unit tests often go red in a way that is not useful. For example, you remove a tag component the system used to add, because you know a substitute system later in the frame will add that tag instead. The unit test you wrote to confirm that this one system tags entities is now red, and you must fix the test because, of course, tagging is no longer a part of this system. Then I wondered why I wrote that unit test in the first place. I never really wanted to know whether this specific system could tag entities, but rather whether any system in the world could. This kind of red test is not catching your mistake; it is housework. If this were a world test, the test about the world tagging entities after a round of updates would stay green, which is a useful green.
I then felt the need to write a similar integration test where the only difference is that the system is now among others. For example, after unit testing a system that has a couple of [UpdateBefore] attributes, you also want to test a world with this system plus those systems to see how they go together. And beyond that, I now want the complete world, to see again whether this one system still works. This creates redundant tests just to fully confirm its function.
When anything breaks, often all 3 versions of the test break together, increasing maintenance work. At this point, skipping the unit test and going straight to an integration test, where you test the world for the purpose of checking a single system, may be a better idea that gives similar confidence with less work. But the test will be more difficult to write.
It feels like all the unit tests do is cement the system against changes, and you must drill out the cement (fix tests, but not because of a regression) for every little change. Systems are really good at allowing you to change one without affecting the others. It is a shame that the flexibility of modifying system code is reduced because all systems are coupled with their unit tests. I feel the tests should refer only to data (IComponentData) and the World, to free the systems up.
In comparison with OOP's method calls
In OOP, unit testing all public methods of a class instance looks really promising, as it covers everything the instance could ever perform. The arguments you pass to the methods in your unit test are fed in directly and reflect real use, because the methods are called the same way in production. Unit testing methods in OOP is reliable essentially because of method arguments: they provide an insulated environment that lets the test result translate to real use.
Putting input entities in the world and then running the system update, in the hope that the system takes them in directly, does not work in the same spoon-fed way as arguments to public methods, because in a real environment the world is very likely shared with other systems. Unit testing a system is therefore much weaker than unit testing a class in OOP, because you assume the system has exclusive use of the world.
Systems have relationships in their design
An extreme case of the previous point: your EntityCommandBufferSystem is impossible to unit test. Its very definition is to wait for commands from prior systems, and it depends strongly on its update position, which is either stated on itself or implied by other systems having Before/After attributes relating to it. It is a modular system of purely nothing. Will you still view it as a unit? Or view it as the same unit as the system that uses it? What if multiple systems use it?
For the same reason, a unit test of any normal system that uses an EntityCommandBufferSystem (often with [UpdateBefore(ECBS)] in its design) is incomplete, since the commands cannot be played back. If you view a system as a modular unit that can stand on its own, then the update attributes stuck on its head are also a part of this unit. A real unit test should be able to test them too, but it can't. Of course the system can stand on its own, but that doesn't mean it works as intended on its own.
To unit test this kind of system, you must stop unit testing and perform an integration test instead. To still keep the unit as small as possible, the world should have just enough systems that are really related, no more, no less. But following this best practice was very tiring, with little gain compared to a world test where I just pour everything in. (The gain is that a useful green test is easier to achieve, but I lose much more time setting up the test.)
I did that before, and it is a pain to prop up a world with one system plus all its ECBSs. I even tried to write reflection code that scans every UpdateBefore of a system, hoping to automatically create a world for unit testing that system, but sometimes the relationship is declared on the ECBS itself instead.
The best way to end all these problems is an outrageous approach: just test on a world with everything, which we will get to in a moment. I still remember that [SetUp] with 30+ lines of code cherry-picking systems one by one for the tests. It made me not want to refactor systems, since I always forgot to come back to the test and add the new system that would restore the old function. Suddenly I couldn't use all those lightbulb refactor commands the text editor provided me. I don't want to go through that again.
Sometimes a system does not work as intended because its relationships are wrong. Systems may be modular in concept, but their design often involves an update order relative to other systems. A traditional unit test therefore does not cover all of a system's design, unlike in OOP, where unit tests translate well and give you much higher confidence.
Is the system really a good unit?
In my first year of learning ECS, I was under the impression that it would be much easier to test than GameObject, since I saw that systems could be developed piece by piece, and therefore each piece must be a good unit for testing. (It is true that it is much easier than GameObject.) But it was only when I actually got to write the tests that I started to realize something.
In my opinion, the system may not be the correct "unit" that it appears to be at first glance. The fact that I need ConstantDeltaTimeSystem to properly unit test systems that use Time already says that systems were never meant to work alone, even though they can be designed alone, piece by piece, and added to the world one by one.
Only if a system has [DisableAutoCreation], which means it is really designed to have Update called on its own in a world containing only it, like those game object conversion systems, do I agree that the system is a perfect unit.
In OOP, I used to go through the public interface of everything and make sure the tests covered it all, giving me full confidence. It is understandable why unit tests are important there. But here, going through every EntityQuery of a system, ramming in Entity combinations, and updating it no longer gives full confidence, because in real use the system is not going to be updated alone. Many more problems await this one system when it is together with everything else.
World testing is when you write a test with a specific objective about a single aspect of a single system, as in a system unit test, but you don't call Update on just that system.
Instead, the world is now fully populated as at runtime: the default world initialization code places all systems (yours, Unity's, anything assembly reflection can reach), sorted into their ComponentSystemGroup and then by their relative update order within the group.
Then, after you set up the starting Entity data, you update the World. (I remember you couldn't do this in earlier preview versions? Now you can run Update on the world.) This starting data causes the system you are interested in to pass its OnUpdate criteria, ideally only that one, but that may not be the case now. Extraneous updates may or may not affect the end result. You assert the result as usual, but you always think of the world.
See a setup example here: https://github.com/5argon/EcsTesting/blob/master/WorldTestBase.cs
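As a rough sketch of that idea (not the contents of the linked file), a world test fixture might look like this. It assumes a pre-1.0 Entities package where DefaultWorldInitialization.GetAllSystems and AddSystemsToRootLevelSystemGroups are available; exact signatures vary between preview versions. Health, Poisoned and PoisonDamageSystem are hypothetical names for illustration.

```csharp
using NUnit.Framework;
using Unity.Entities;

// Hypothetical data, made up for this illustration.
public struct Health : IComponentData
{
    public int Value;
}

public struct Poisoned : IComponentData { }

// Hypothetical system. It has no [DisableAutoCreation], so assembly
// reflection picks it up together with every other system.
public class PoisonDamageSystem : SystemBase
{
    protected override void OnUpdate()
    {
        Entities
            .WithAll<Poisoned>()
            .ForEach((ref Health health) => { health.Value -= 1; })
            .Run();
    }
}

public class WorldTests
{
    World world;

    [SetUp]
    public void SetUp()
    {
        world = new World("World Test World");
        // Populate the world with every system reflection can reach,
        // placed in the root groups and sorted like the default runtime world.
        var systems = DefaultWorldInitialization.GetAllSystems(WorldSystemFilterFlags.Default);
        DefaultWorldInitialization.AddSystemsToRootLevelSystemGroups(world, systems);
    }

    [TearDown]
    public void TearDown()
    {
        world.Dispose();
    }

    [Test]
    public void PoisonedEntityLosesHealthAfterWorldUpdate()
    {
        var em = world.EntityManager;
        var entity = em.CreateEntity(typeof(Health), typeof(Poisoned));
        em.SetComponentData(entity, new Health { Value = 10 });

        // One round of update over all root groups, not over a single system.
        world.Update();

        Assert.AreEqual(9, em.GetComponentData<Health>(entity).Value);
    }
}
```

The test body mentions only data and the World. In a real project other systems may also touch the same components, which is exactly what this kind of test is meant to reveal.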
The new unit is data combinations
Instead of the system, the unit is now a particular combination of starting data. It causes a set of data transformations over rounds of world updates: "If this set of data exists, it becomes that data after the world updates".
The things doing that are systems, with an "s", working together in the order you designed to cause mutation (changes), emission (more data than the input), filtering (less data than the input), or generation (new data unrelated to the input). This is why I can't bring myself to see a system as a single unit to test when data is the first-class object to think about in data-oriented design.
Naming the test method
The test should refer to data (IComponentData and friends) and the World, but to no system class name. The test method name may faintly refer to a system class name, but make sure that when you rename the system a bit or dice it up further, the test method name still makes sense, so you don't have to fix the test down to its name.
The test is still named as if you are testing a single system, but it should read like "how would a world with that system transform the data" instead of "if this kind of entity exists and the system updates alone, what would happen".
For example, if you have a system named CpuAutoplayProcessorSystem, don't give the test method a name tied strongly to the system, such as AutoplayProcessorDestroys2PInputEntities, as you would in a system unit test. (There you create some 2P input entities, update the system, and see those entities disappear. Successful as a unit test, but not quite enough here.) Instead, use something like PlayerTwoControlDisabledInCpuMode while checking the same thing.
In your mind, you know that what makes or breaks this test is going to be CpuAutoplayProcessorSystem working, but you are now testing from the perspective of the world. Much later, when your code base grows, you may have a separate CpuBehaviourSystem, and the previously written test still stays green and still sounds right.
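To make the contrast concrete, here is a sketch of the two naming styles side by side. The components CpuMode and PlayerTwoInput and the body of CpuAutoplayProcessorSystem are hypothetical stand-ins, since the real ones are not shown in this article, and the code assumes a pre-1.0 Entities package.

```csharp
using NUnit.Framework;
using Unity.Entities;

// Hypothetical components; the real ones are not shown in the article.
public struct CpuMode : IComponentData { }
public struct PlayerTwoInput : IComponentData { }

// Hypothetical stand-in for the article's CpuAutoplayProcessorSystem.
public class CpuAutoplayProcessorSystem : SystemBase
{
    EntityQuery cpuModeQuery;
    EntityQuery playerTwoInputQuery;

    protected override void OnCreate()
    {
        cpuModeQuery = GetEntityQuery(ComponentType.ReadOnly<CpuMode>());
        playerTwoInputQuery = GetEntityQuery(ComponentType.ReadOnly<PlayerTwoInput>());
    }

    protected override void OnUpdate()
    {
        // In CPU mode, player two's input entities are removed.
        if (!cpuModeQuery.IsEmptyIgnoreFilter)
            EntityManager.DestroyEntity(playerTwoInputQuery);
    }
}

public class AutoplayTests
{
    static int CountPlayerTwoInputs(EntityManager em)
    {
        return em.CreateEntityQuery(typeof(PlayerTwoInput)).CalculateEntityCount();
    }

    // Unit style: the method name and the body are tied to one system class.
    [Test]
    public void AutoplayProcessorDestroys2PInputEntities()
    {
        using (var world = new World("Unit Test World"))
        {
            var system = world.GetOrCreateSystem<CpuAutoplayProcessorSystem>();
            world.EntityManager.CreateEntity(typeof(CpuMode));
            world.EntityManager.CreateEntity(typeof(PlayerTwoInput));

            system.Update();

            Assert.AreEqual(0, CountPlayerTwoInputs(world.EntityManager));
        }
    }

    // World style: only data and the World appear; no system class is named.
    [Test]
    public void PlayerTwoControlDisabledInCpuMode()
    {
        using (var world = new World("World Test World"))
        {
            var systems = DefaultWorldInitialization.GetAllSystems(WorldSystemFilterFlags.Default);
            DefaultWorldInitialization.AddSystemsToRootLevelSystemGroups(world, systems);
            world.EntityManager.CreateEntity(typeof(CpuMode));
            world.EntityManager.CreateEntity(typeof(PlayerTwoInput));

            world.Update();

            Assert.AreEqual(0, CountPlayerTwoInputs(world.EntityManager));
        }
    }
}
```

If the destroy logic later moves into another system, only the first test needs rewriting; the second should keep compiling and, if the refactor is correct, stay green.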
It is not a unit test after all?
Trigger warning. It may be controversial when I say anything is better than a unit test, because unit tests were so strong in OOP. (As explained, testing an OOP object's public interface and all allowed constructor combinations pretty much covers everything it has to offer.) This is no unit test, and you don't need to listen to me if you feel offended.
Maybe it is simply that the system wasn't the right unit, and a set of data is. I may view this data and this ever-growing world as a unit on my own (you don't have to mind me), but to make you comfortable I am going to call it an integration test, since it integrates all systems together from the viewpoint of someone who views the system as a unit.
Integration testing sounds like something inferior to unit testing and no replacement for it, but as I said in the previous section, unit testing systems in data-oriented design leaves a lot to be desired, unlike in OOP. I think I need to break the golden rule of 100% unit test coverage.
It makes writing the test harder
Testing this way is harder because you are trying to test whether a single system works, yet the test runs on the world. That makes it hard to write an assertion about the end result, and harder to think of a starting data state that covers just enough of one function of this particular system.
The test occasionally "spills" over to some other systems as time goes on, but usually the assert will not look at the results of those spillages. (There will be other world tests that do look at them and go red appropriately.) The tests are still written with a small scope like a unit test; they just update a world where other systems coexist.
You have less control over whether the input entity will be modified before it actually arrives at the system you are targeting. That control comes from a careful Entity setup at the start of the test. An input must be designed for the world rather than for a single system.
It does, however, create a dynamic test that keeps changing as your code grows (in my opinion this is a positive thing). If these tests stay green as each system is added or as [UpdateBefore/After] attributes change around, then you have strong confidence. One test may aim at a single system, but it lightly covers all other systems by the fact that you are updating the world.
- Integration testing can give strong confidence about system order and about how systems receive data from, or are interfered with by, unexpected prior systems. I caught tons of bugs about the wrong ComponentSystemGroup, even before arriving at wrong
- It catches bugs where prior systems added later modify the starting data; these are usually command buffer systems. Often you "give in" somewhere in the code (or simply try to optimize with persistent native containers) and start relying on baked/sorted/ordered entity data, and somehow the order you thought would never change is not the same anymore. World testing can catch these.
- It allows checks on system refactoring, such as when the job code is getting long (too much in OnCreate in one system, to the point that you run out of ideas for naming the query variables, etc.) and you separate part of it into a new system. If the refactor was correct, the world test stays green, confirming a successful refactor, while the unit test that was locked to a single system goes red because it is now missing the function you expected before. The function is not missing; it was just moved to a new system.
- Newly added systems are automatically included in all written tests, thanks to the assembly reflection that adds them to the test world. Cherry-picking related systems manually for each test sounds more like best practice, but it was frustrating and tiring. Usually when unit testing a system, you then want all the systems named in its [UpdateBefore/After] in the world with it too, because they are in its definition and you want to see their interaction. This got annoying, so in the end I just test the world rather than each system, but with the objective of testing a specific function of one system in each test.
- Now you can write one test that gives you about the same confidence as a unit test plus integration tests over multiple systems. No more fixing 2-3 red tests that are almost the same.
Worthy of its own section: world tests allow you to test the component changed filter, which is considered part of the definition of a single system, yet is difficult to unit test if we view the system as a unit.
A changed filter can get you free performance! If a system does its work while its source data hasn't changed, the mutation destination is written with the same value, so the work is useless. There are cases where slapping on a changed filter makes the logic wrong, such as when your system, driven by source data that didn't change, was resetting junk produced by other systems, and that reset is what made the world work. (In this case, I suggest you redesign a bit so you can use a changed filter.) You can use .WithChangeFilter with the ForEach lambda too. So stick them in wherever you can. But that's the problem: you are not sure where you can, and the tests should help assure you so you can get this performance.
- The change version has to be recorded as a part of the world update chain, inside prior systems. You cannot fake it with EntityManager from the test code.
- When sticking with unit testing systems, you could handpick some other system, run its Update, and then run Update on the system you are unit testing to see if the changed filter works. But the act of picking one other system already feels outside of a unit.
- When handpicking, you need to think about which system actually runs earlier. A system that runs later may trigger the change only in the next round of updates in a real run, which may be a bug in the real game since the change detection comes a frame late. But you will not be able to catch this bug in your unit test.
- A changed filter often sits on a simple but heavy-duty system. Think just 3-5 lines of OnUpdate. By adding a changed filter, you add a massive chance for the system to fail as you continue building the world. If you don't have world tests, you will be afraid of using them.
- Remember that changed filters work even better in a chain. For example, because a prior system didn't catch a change, it did nothing. Because of that, all later systems that were expecting a changed type from the system that didn't run will also not run. That is to say, there are many more combinations to test for whether the change finally propagates to your system or not. You can't afford to even think through and hand-build a world for each changed-chain scenario.
- World tests on changed filters evolve as you add more systems, with each system's position in the world accounted for automatically.
- With world testing, you can even write a test named like "this input data does nothing to system X, because system Y prior didn't cause the change". It is possible to name 2 systems, still aim to test 1 system, and yet have none of their class names in the test code, since we leave the world auto-populated.
Without world tests covering a lot of your game, you will not want to add changed filters, since adding one seems to break random things while the per-system unit tests stay green. If you add one successfully and some bug occurs later, you will have to suspect the changed filters on many systems at once (and try commenting them out to see whether things come back to work, etc.).
This is yet more evidence that a system unit test cannot cover everything a system has to offer, unlike unit testing a class's public interface in OOP. Systems were made to work together with relationships.
- Previously written world tests appear to go red more often as you develop other areas. But in my opinion that shows the tests are serving their purpose. A system test stays green, but I would rather have this kind of red because it covers more than a single system.
A system test on a door may check that it is a rectangle with knobs and usable hinges. I'm pretty sure those tests will be very strong in the wrong way, since those things never really regress. By the time you fit the door into the house and find it too tight, you need a more overall test with the house, which at the same time confirms that the door must be a rectangle anyway. This test will go red now and then as either the door or the house changes, but I am glad when it does.
- It is hard to predict the end result when all systems are present. You have to set up the input data carefully so that most things, except what you are testing, are neutral (0, null, etc.). This is also great when some newer system breaks what you thought was neutral and insignificant to the system you are aiming to test.
- You may need to design the code so it fits this testing method, the same way there are both testable and untestable OOP designs. Unit testing only 1 system in isolation "just works" in most cases, since it does not scale in difficulty however big your game becomes.
You can remedy some of these disadvantages by developing systems in separate UPM packages. The world test then has a more limited set of systems while still relieving you from propping up your world differently for each test.
Will it scale?
It is not as if you must choose a faction; you can still write system tests. It is just that I now prefer world tests first. It never hurts to try, and if it ends up badly I can still come back and delete this article.
Fall back to single-system tests when you can no longer handle a state where every system unpredictably changes everything. When the world is crowded, separating into UPM packages may be the solution to continue testing this way, since ECS scans only the available assemblies to populate the world. You can open a new Unity project with just one UPM package to perform a world test, so the assemblies are more limited.
Of course, with tons of time you can do both system unit tests and world tests. But a world test is only a little harder to write, and it covers what the unit test did really well plus brings many more benefits. It is tempting to cut your dev time in half by ignoring system unit tests altogether. (Or by simply calling this world-but-pinpoint-a-system test a new kind of unit test, I don't know.)
Whether the difficulty really scales that fast on bigger games has yet to be proven. I am afraid I will never be working on AAA games like some of you, so I cannot assure you that this world test approach is the go-to for everyone. I have been trying it for about 2-3 months and it works well in my game so far, meaning I can still set up good starting data and calculate the assertion result for real.
By "for real", I mean without cheating by taking the actual value from the assertion error to fix the test. You should, though, use that value as a hint and think from the beginning about whether it is possible to arrive at it. I see that about 40% of the time, that value is the correct expectation. And don't worry: instead of getting lazy and blindly editing the test according to it, you often get a lightbulb moment from the assertion: "Oh right right! It was because of that thing I just added! Fine." (If you don't get a lightbulb, commit more often and just think about what changed since the last commit where everything was green.) That said, never fix the test until you can walk through the code and arrive at that value on your own.
If you can't, it may be a sign that this kind of test is showing its weakness, or maybe your system code is too long. World tests also allow you to freely dice up your system code into multiple systems, making them easier to reason about, which actually helps in fixing the red tests. With system unit tests, doing this would conversely make green tests go red. Green tests are then a very good sanity check for any kind of refactor you want to do.
As a bonus, here's another "unit" you could consider. If you write your jobs as an old-school struct instead of Entities.ForEach baked into the system code, you can test those jobs as a unit!
Just separate them into a new file; then your test just news up the struct, adds the required data, and schedules and completes it. (Remember that you can create an EntityManager without any system, as an input to these jobs when they want it.) It is a better unit than the system, in my opinion.
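A minimal sketch of that idea follows. It uses a plain IJob over a NativeArray to keep things short; an entity-based job would additionally need a World and EntityManager in the test to supply its data. DoubleValuesJob is a hypothetical job made up for illustration.

```csharp
using NUnit.Framework;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical job, kept in its own file so a test can construct it directly.
public struct DoubleValuesJob : IJob
{
    public NativeArray<int> Values;

    public void Execute()
    {
        for (int i = 0; i < Values.Length; i++)
            Values[i] = Values[i] * 2;
    }
}

public class DoubleValuesJobTests
{
    [Test]
    public void DoublesEveryValue()
    {
        var values = new NativeArray<int>(new[] { 1, 2, 3 }, Allocator.TempJob);
        try
        {
            // New the struct, add the required data, then schedule and complete it.
            new DoubleValuesJob { Values = values }.Schedule().Complete();

            CollectionAssert.AreEqual(new[] { 2, 4, 6 }, values.ToArray());
        }
        finally
        {
            // Native containers are not garbage collected and must be disposed.
            values.Dispose();
        }
    }
}
```

No system class and no update order are involved here, which is why the job behaves like a method with arguments in OOP.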