Batched operations on EntityManager
Jobs aren't always the fastest for all operations. EntityManager operations mostly have to cause structural changes on the main thread. Let's see how we can speed this up.
Latest update : Entities 0.14 | 19/08/2020 (https://docs.unity3d.com/Packages/com.unity.entities@0.14)
Don't schedule a job just to queue EntityCommandBuffer
It is true that worker threads are great, but the mistake is thinking that queueing commands into an EntityCommandBuffer inside a job makes the work itself faster. It may feel like you are doing things in jobs efficiently, however you are just delaying them for later. EntityCommandBuffer is for remembering what to do, then working on it later, simply because those operations are not possible in a job. Unity's "foolproof" C# Job System requires structural changes to happen on the main thread to ensure safety.
When "played back", it would be like EntityManager doing DestroyEntity one by one on the main thread. It is not like they are being destroyed in a job. (Pay attention to the words "command buffer". The commands are not being executed just yet.)
No, EntityCommandBuffer.ParallelWriter won't help you much either; you are just remembering what to do in parallel. (And ParallelWriter blocks the other threads on write if they clash.)
The purpose of EntityCommandBuffer is to defer EntityManager commands, not to speed them up. Therefore the job is a complete waste of time if you schedule it just to add/remove/delete entities via EntityCommandBuffer. You are better off just doing it without the job.
However, if you are already in a job that is working on something else and you are in just the right if branch to decide what command to do, that's the right spot to queue up commands to EntityCommandBuffer.
Just use EntityManager in the main thread
Even iterating and destroying with EntityManager directly, naively, will be faster, because you didn't schedule a job!
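To make the point concrete, here is a minimal sketch (imagined inside a SystemBase.OnUpdate; the Dead tag component and the ecbSystem field are my own made-up names) contrasting the wasteful job with the direct call:

```csharp
// Wasteful: a job scheduled only to enqueue destroy commands.
// The actual destruction still happens on the main thread at playback.
var ecb = ecbSystem.CreateCommandBuffer().AsParallelWriter();
Entities
    .WithAll<Dead>()
    .ForEach((Entity e, int entityInQueryIndex) =>
    {
        ecb.DestroyEntity(entityInQueryIndex, e); // just remembering, not destroying
    })
    .ScheduleParallel();
ecbSystem.AddJobHandleForProducer(Dependency);

// Better: just do it on the main thread, batched per chunk.
EntityManager.DestroyEntity(GetEntityQuery(ComponentType.ReadOnly<Dead>()));
```

The second line achieves the same end state without any scheduling overhead or deferred sync point.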
Use the NativeArray<Entity> overloads
The strength of these methods is that you can put whatever Entity you want in the NativeArray<Entity>: entities you handpicked into an array you allocated yourself, entities from ToEntityArray of an EntityQuery, or a NativeSlice subset of a NativeArray<Entity>.
//Add-remove types on handpicked entities.
//Adding has no data, they will start on default of that type.
AddComponent(NativeArray<Entity> entities, ComponentType componentType)
AddComponent<T>(NativeArray<Entity> entities)
RemoveComponent(NativeArray<Entity> entities, ComponentType componentType)
RemoveComponent<T>(NativeArray<Entity> entities)
//Mass-create. One fills the length of input array, the other returns a new array based on your count.
CreateEntity(EntityArchetype archetype, NativeArray<Entity> entities)
CreateEntity(EntityArchetype archetype, int entityCount, Allocator allocator)
//Like create variant but clones components. Has built-in behavior
//to some special components such as Prefab and LinkedEntityGroup
Instantiate(Entity srcEntity, NativeArray<Entity> outputEntities)
Instantiate(Entity srcEntity, int instanceCount, Allocator allocator)
Instantiate(NativeArray<Entity> srcEntities, NativeArray<Entity> outputEntities)
//Dumbly copy even Prefab or LinkedEntityGroup
CopyEntities(NativeArray<Entity> srcEntities, NativeArray<Entity> outputEntities)
DestroyEntity(NativeArray<Entity> entities)
The insides of these methods are Burst-compiled. Don't underestimate them!
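A minimal usage sketch (the Health and Stunned components are hypothetical names of mine) of handpicking entities and batch-operating on them:

```csharp
// Grab all entities matching a query, then operate on a handpicked subset.
var query = EntityManager.CreateEntityQuery(typeof(Health));
using (var entities = query.ToEntityArray(Allocator.TempJob))
{
    // Take only the first half as a contiguous subset.
    var half = entities.GetSubArray(0, entities.Length / 2);

    // One batched call instead of a loop of per-entity calls.
    EntityManager.AddComponent<Stunned>(half);
}
```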
Analysis : CreateEntity
The characteristic of this method is simple, since there is no Entity input. It starts by finding existing chunks of the target archetype with some space left, works on that contiguous space, then deducts from the creation count you want. If there is no existing chunk, you pay for allocating a fresh chunk. Therefore performance depends on the archetype's chunk capacity relative to your creation count, plus how many existing chunks there are.
Analysis : Add/RemoveComponent
If you can picture ECS memory in your head: unlike setting a component, adding/removing a component is not "just" anything. It's a huge operation, because the chunk needs to be rearranged entirely so that there is more capacity behind every contiguous data type. Picture the chunk: AAA _ _ BBB _ _ CCC _ _. If you remove C, even though it came last, the chunk now has to become something like AAA _ _ _ _ BBB _ _ _ _. Notice that even BBB can't stay in its old location; those slots are now the spare capacity for the A component.
- If you added/removed a zero-sized component (tag component), you can save time since the new archetype is layout-compatible. If that destination Chunk already exists, it is just a matter of moving a section of a memory block to a new place!
- If the entity count is 10 or below, it loops in a for loop and removes them one by one on the main thread. "Just doing it" sequentially, without caring about chunks and whatnot, is better at this scale because the intelligent sorting would outweigh the benefit.
- If over 10 entities, it enters a batching operation. You can input arbitrary Entity jumbled in the array, but the algorithm inside figures out which Chunk each one is in and at which position, then sorts them and outputs the work in terms of chunk-startIndex-count. The actual worker can then go faster the more contiguous the entities you want to add/remove are.

Now at least you know how to speed up this operation: include entities that came from the same chunk and are contiguous as much as possible. Still, it is troublesome for the algorithm to have to accept a wild NativeArray<Entity>.
Analysis : Instantiate/Copy
Instantiate(Entity srcEntity, NativeArray<Entity> outputEntities)
Instantiate(Entity srcEntity, int instanceCount, Allocator allocator)
Instantiate(NativeArray<Entity> srcEntities, NativeArray<Entity> outputEntities)
The futuristic version of MonoBehaviour instantiation. The hope of mankind. All three variants use the same innards, even the version that instantiates a single entity, though that one is not fitting for this article's topic.

Increasing instanceCount is not entirely O(n) on the expensive part; it can work on multiple instances at the same time, up to the chunk's boundary. For example, if your chunk has a capacity of 100, instantiating 1200 Entity will loop-allocate 12 times to clone them. (Not 1 time, and not 1200 times.) That's some of the magic of contiguous memory!
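For example, a mass instantiation (the prefabEntity here is assumed to come from elsewhere, e.g. a converted GameObject prefab) looks like this:

```csharp
// Clone 1200 instances of a source entity in one call.
// With a chunk capacity of 100, this fills 12 chunks,
// not 1200 per-entity copies.
NativeArray<Entity> instances =
    EntityManager.Instantiate(prefabEntity, 1200, Allocator.Temp);

// ... position them, set per-instance data, etc. ...
instances.Dispose();
```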
Use ExclusiveEntityTransaction
But if you wanna destroy (or do other EntityManager operations) in a job for real, then use ExclusiveEntityTransaction.
ExclusiveEntityTransaction is like an inverse of what normally occurs. Normally, to do things to EntityManager we have to "come back" to the main thread for a moment (at EntityCommandBufferSystem, at ComponentSystem, etc.).
With ExclusiveEntityTransaction, we can "lock the EntityManager" for one thread to work on and prevent the main thread from using it. At the same time, the main thread of that world can go on and do other things which do not touch EntityManager.
The main thread of another world can touch its own EntityManager though! Remember that EntityManager is a singleton per world, not per Unity. It manages the Entity in one world.
So the use of ExclusiveEntityTransaction is heavily geared towards having multiple worlds. It renders one World nearly unusable (its EntityManager becomes busy in-job), but your other worlds may still work with their own EntityManager and the remaining worker threads. Now you see something that only multiple worlds can achieve! Worker threads do work stealing automatically and are a shared resource for all Worlds.

But to make multiple worlds useful to each other requires more careful planning of how to communicate.
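A minimal sketch of the usual pattern (the workerWorld variable and DestroyInJob struct are my own names): begin the transaction, schedule a job that uses it, register the job as the transaction dependency, and end the transaction when you need the world back:

```csharp
struct DestroyInJob : IJob
{
    public ExclusiveEntityTransaction Transaction;
    [DeallocateOnJobCompletion] public NativeArray<Entity> ToDestroy;

    public void Execute()
    {
        // Real structural changes, off the main thread.
        for (int i = 0; i < ToDestroy.Length; i++)
            Transaction.DestroyEntity(ToDestroy[i]);
    }
}

// Lock this world's EntityManager for job use.
var transaction = workerWorld.EntityManager.BeginExclusiveEntityTransaction();
var handle = new DestroyInJob { Transaction = transaction, ToDestroy = doomed }.Schedule();
workerWorld.EntityManager.ExclusiveEntityTransactionDependency = handle;

// ... later, when you need the world on the main thread again:
workerWorld.EntityManager.ExclusiveEntityTransactionDependency.Complete();
workerWorld.EntityManager.EndExclusiveEntityTransaction();
```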
Cross-world entity operations
Overcome the limitations of EntityManager with multiple worlds. Unlock full power with ExclusiveEntityTransaction. What's left is the final glue: moving entities from one EntityManager to another. (That is, from one World to another.)
MoveEntitiesFrom
public void MoveEntitiesFrom(EntityManager srcEntities)
public void MoveEntitiesFrom(out NativeArray<Entity> output, EntityManager srcEntities)
public void MoveEntitiesFrom(out NativeArray<Entity> output, EntityManager srcEntities, NativeArray<EntityRemapUtility.EntityRemapInfo> entityRemapping)
public void MoveEntitiesFrom(EntityManager srcEntities, NativeArray<EntityRemapUtility.EntityRemapInfo> entityRemapping)
public void MoveEntitiesFrom(EntityManager srcEntities, EntityQuery filter)
public void MoveEntitiesFrom(EntityManager srcEntities, EntityQuery filter, NativeArray<EntityRemapUtility.EntityRemapInfo> entityRemapping)
public void MoveEntitiesFrom(out NativeArray<Entity> output, EntityManager srcEntities, EntityQuery filter)
That's a ton of overloads...
Moving Entity from the other world will not retain their Index and Version, since each World's EntityManager manages its own increasing indexes. It would be disastrous if we moved worlds and an entity randomly overrode an Entity already occupying the same index.
Therefore this operation performs a remapping. With that in mind, it is now clear that the first 4 overloads just differ in: do you want the remapping result in output? And do you want to provide your own "remapping workspace" entityRemapping, or let the method allocate and then destroy one completely inside? (You may reuse the workspace array if you have a lot of moves to do.)
The last 2 overloads allow you to move only some entities, not all. The "some" is selected as relevant chunks with an EntityQuery. Remember that the EntityQuery must be made from the world owning the things you want to move! EntityQuery is not interchangeable between worlds.
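A sketch of a filtered move (the world names and the ReadyToSpawn component are my own; note the query is created from the source world):

```csharp
// The query MUST come from the source world's EntityManager.
var srcManager = loadingWorld.EntityManager;
var dstManager = mainWorld.EntityManager;
var readyQuery = srcManager.CreateEntityQuery(typeof(ReadyToSpawn));

// Matching chunks are handed over wholesale; the rest stay behind.
dstManager.MoveEntitiesFrom(srcManager, readyQuery);
```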
From a rough look, these methods run a job that cuts off the queried chunk pointers (or everything, if there is no EntityQuery) and hands them to the second EntityManager. No copying or anything. This way you can "get rid" of entities quickly, or just set up for ExclusiveEntityTransaction. This is why moving has a huge advantage over copying.
Though don't overestimate it: the Entity are technically new in the destination, even though the method is called "move", and they need some reservation. "Entities are just indexes", yes, but they need the right place for their data that the index can point to.
Also, the filtered version (the EntityQuery version) is not just filtered in the entity-selection phase. There are many more subtle details in moving that all need to be filtered as well. The deepest code path of the unfiltered and filtered versions is completely different, though the code is similar! They are heavily jobified, and the jobs without a filter are certainly leaner.
CopyEntitiesFrom
public void CopyEntitiesFrom(EntityManager srcEntityManager, NativeArray<Entity> srcEntities, NativeArray<Entity> outputEntities = default)
public void CopyAndReplaceEntitiesFrom(EntityManager srcEntityManager)
Not to be confused with CopyEntities, which operates within the same manager.
The 1st overload... actually calls that CopyEntities on the same manager. The copied entities are then tagged with an IsolateCopiedEntities component, and the rest you may already be able to deduce: it uses the same code as MoveEntitiesFrom with an EntityQuery (one that has been told about the IsolateCopiedEntities component).
Remember how the "move" versions could be considered "copy", since Entity are created anew and then we move the component data over? This 2nd overload, CopyAndReplaceEntitiesFrom, is a version on steroids. Since "replace", unlike "move", does not have to care about remapping anymore, the operation can retain everything, down to the Entity identity from the source world. A code comment says it could be used for deterministic rollback: we can easily back up and restore an entire world. It's quite a special method we've got here.
Use the EntityQuery overload of EntityManager
Any time you use EntityManager with an EntityQuery instead of an Entity or an array of them, you are doing the operation to everything in a whole chunk at the same time instead of to each entity one by one.
Remember that EntityQuery can return multiple chunks, if you have so many Entity that they span several chunks in one EntityQuery. (I like to think of the type as being called "ChunkQuery".)
Here’s all of them you can use.
//Add just the type without data, all data starts at default for that struct.
//If tag component, no need to reshape the chunk's memory arrangement.
AddComponent(EntityQuery entityQuery, ComponentType componentType)
public void AddComponent<T>(EntityQuery entityQuery)
//Add with tailor-made data for each entity matched in the EQ.
//Requires extreme accuracy of your target entity for each data!
//Length mismatch will throw.
AddComponentData<T>(EntityQuery entityQuery, NativeArray<T> componentArray) where T : struct, IComponentData
//If removing tag component, no need to reshape the chunk's memory arrangement.
RemoveComponent(EntityQuery entityQuery, ComponentType componentType)
RemoveComponent<T>(EntityQuery entityQuery)
//You can add/remove multiple types at once too.
AddComponent(EntityQuery entityQuery, ComponentTypes types)
RemoveComponent(EntityQuery entityQuery, ComponentTypes types)
//It is natural that chunk component operations can be done to multiple chunks at the same time.
AddChunkComponentData<T>(EntityQuery entityQuery, T componentData) where T : unmanaged, IComponentData
RemoveChunkComponentData<T>(EntityQuery entityQuery)
//Swaps SCD value for the whole chunk without any data movement.
//Similar performance to tag component add-remove.
AddSharedComponentData<T>(EntityQuery entityQuery, T componentData) where T : struct, ISharedComponentData
//Very good one as it just throws the matched chunks away in big units.
DestroyEntity(EntityQuery entityQuery)
//There is a NativeArray<Entity> overload but no EntityQuery overload for CreateEntity, for a very logical reason.
Versus the NativeArray<Entity> overloads? These are much better, since things like add and remove really happen on the whole chunk and not a single entity is moving. (The data inside each entity will still need to be moved around, however, if the component you add/remove is not a tag, i.e. it has size.)
As for AddSharedComponentData, why is there no NativeArray version? Because this kind of component actually stays on the chunk, not on any individual entity. Having an EntityQuery version is natural. If a NativeArray version existed, it would likely be no better than looping over the array and doing it one by one.
AddChunkComponentData and RemoveChunkComponentData are obviously a chunk thing, so it is only logical to have just the EntityQuery version and not a NativeArray<Entity> one.
Analysis
You can read the analysis for the NativeArray version above: in the very end of each, they want to work in terms of an array of "chunk + start index + count". Now, though, there is no preprocessing required to get that shape of workload, since EntityQuery already gets you chunks! The start index is 0. The count is the whole chunk. Therefore, I think it is quite safe to assume the EntityQuery version is faster.
To illustrate the difference: both the EntityQuery and NativeArray<Entity> versions of DestroyEntity end up using the same DestroyBatch in EntityComponentStoreCreateDestroyEntities.cs. However, that method wants 3 things: which Chunk, a start index, and a count. Via EntityQuery, of course, it is a chunk starting at 0 and running to the end of the chunk. Easy! In the case of NativeArray<Entity>, you may have input some completely arbitrary entities, not in the same chunk, or maybe some of them in the same chunk. There is code that tries to figure out contiguous entities in the array so it can build chunk-start-count batches. (Now you know how to at least use the NativeArray<Entity> version fast, if you must.) The point is that operating in chunks should be faster in most cases; it is just too rough for some surgical operations.
Read more about why tag component has better performance on some of these operation here : https://gametorrahod.com/tag-component/.
Together with the ISharedComponentData filter
It even works when the EntityQuery has been .SetFilter-ed. So you can do a selective mass operation based on your SharedComponentData or your .SetFilterChanged criteria. Just don't forget to .ResetFilter when you want the query to go back to normal.
As an example, I have 10000 things to show, but only a subset (1000) of them is visible and processed at any given time (governed by a Process tag component), and this subset moves forward from 0, 1000, 2000, 3000, ... until the end.
So, for each 1000 entities I add a Group ISharedComponentData with an integer 1~10. When it is time to remove all Process from the previous 1000 entities and add it to the next 1000 at once, I can achieve that with 1 RemoveComponent and 1 AddComponent on a filtered EntityQuery (set filter, do remove, set new filter, do add, reset the filter if needed) instead of 2000 iterations.
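A sketch of advancing that window (Group, Process, and the currentWindow variable are my own names; in recent Entities versions the filter API is spelled SetSharedComponentFilter/ResetFilter):

```csharp
var query = EntityManager.CreateEntityQuery(typeof(Group));

// Remove Process from the previous window of 1000 entities...
query.SetSharedComponentFilter(new Group { Value = currentWindow });
EntityManager.RemoveComponent<Process>(query);

// ...and add it to the next window, both as chunk-level operations.
query.SetSharedComponentFilter(new Group { Value = currentWindow + 1 });
EntityManager.AddComponent<Process>(query);

query.ResetFilter();
```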
Main article about SCD and its filter ability : https://gametorrahod.com/everything-about-isharedcomponentdata#filtering
EntityCommandBuffer with EntityQuery
You can plan a deferred multi-chunk operation. The EntityQuery required is a class, so this is only for out-of-job EntityCommandBuffer use.
EntityManager with EntityQuery is already fast. But remember that it creates a sync point right in the middle of the frame that completes all jobs. If you don't need the operation's result right now, it is 100% better to play it back later, still in chunks. Nothing is better than a chunk-based operation that also doesn't disturb running jobs. A sync point completes ALL jobs (not just jobs related to the operation that caused the sync, unfortunately), so if you decide to do this, you are troubling all other systems too.
- You don't need the result at all this frame, because it signifies the completed work of this system or the cleanup of this system. A typical scenario is a system that works on newcomer entities once, then tags them so the next round they are not worked on anymore (it has ComponentType.Exclude of that tag). You know this is the only system that cares about this tag, therefore you should make your EntityCommandBuffer from BeginInitializationEntityCommandBufferSystem. This target is the best considering the sync point position; in other words, the commands become effective next frame. It is also common for a system that "consumes" some message entities to make them disappear by destroying them, throwing whole chunks away. You can queue that destroy command to the begin-init target.
- If you kind of need it for various systems and haven't thought carefully about it yet, think now whether those systems could be classified into PresentationSystemGroup or not. If they can, then target your playback at EndSimulationEntityCommandBufferSystem. Keep in mind that if you do this, no jobs can survive the border from simulation to presentation in the update flow. The best position for a sync point is always going to be the beginning of the next frame. (There was an EndPresentationEntityCommandBufferSystem available at some point in time, but since that is the same in meaning as BeginInitializationEntityCommandBufferSystem, Unity removed it.)
- The last resort is to introduce your own EntityCommandBufferSystem. Usually you do this when you want to put [UpdateAfter(YourECBS)] on systems in the simulation group, because you can't afford to not have this data available for more systems to compute on (those systems may then use EndSimulationEntityCommandBufferSystem this time), finally producing something for the folks in the presentation group.
It is also usable in Entities.ForEach or Job.WithCode that ends with .Run(), together with .WithStructuralChanges().
Deferred query timing
But beware that a deferred EntityQuery command considers the query at playback time, not at command-enqueue time. (The command really remembers the query, not the query result.) For example, you just used ForEach and used WithStoreEntityQueryInField to take the query out, to use with an EntityCommandBuffer just below that targets the begin-init system. You expect the things you just worked on in the ForEach above to receive the command. But that may not be the case: by the time you arrive at the next frame, the entities may have received more changes and no longer match the query!
Here's a "tag flashing" pattern I use. Usually, when you tag entities as a message so a later system does something once, the message receiver has to remove the tag to signify the message has been received. If the tag is designed to be consumed by multiple systems, it can be a hassle to decide whose responsibility it is to remove it, and that reduces modularity. Regardless, the EntityQuery overload is ideal for this kind of pattern, since it is cheap to tag whole chunks.
Another tactic is that the tagger cleans it up the next frame, so the message only works for one round. That is also a bit of a hassle. Instead, I can leave the removal work to ECB playback: have the tagger enqueue the removal right at the line where I tag, targeting begin-init to minimize the job-completion impact. The message then lasts until the beginning of the next frame and "automatically disappears". It is now impossible to forget to remove the tag, and everything later than this system in the same frame is ensured to receive the message exactly once.
//Imagine a deserializer system that produces fresh entities.
//All entities just deserialized receive the Deserialized tag.
//But there is more work to be done, which could bloat this
//deserializer system. Instead, I have more systems that UpdateAfter
//this one to add in their own "post-deserialize" tasks. Therefore,
//I would like them to be able to just `RequireForUpdate` the
//`PostProcessNeeded` tag.
EntityManager.AddComponent<PostProcessNeeded>(deserializedQuery);
var ecb = ecbs.CreateCommandBuffer();
ecb.RemoveComponent<PostProcessNeeded>(deserializedQuery);
...
EntityManager.RemoveComponent<Deserialized>(deserializedQuery);
It sounds like a neat idea to "flash" the PostProcessNeeded tag onto everything I just worked on in deserializedQuery, except that I forgot that I remove the Deserialized that makes the query work right after. At the beginning of the next frame, the tag won't be cleaned up, since the query no longer matches. The tag stays, and systems keep working on PostProcessNeeded every frame.
Either use a non-EntityQuery version, so the enqueued command remembers each individual Entity instead (they are then free to change archetype as they like and the command no longer cares, but it is no longer a batched command), or also use the ECB for the RemoveComponent of Deserialized with the EntityQuery overload.
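A corrected sketch of the pattern (same hypothetical names as the snippet above): defer the Deserialized removal through the same ECB, so at playback the query still matches when PostProcessNeeded is stripped:

```csharp
EntityManager.AddComponent<PostProcessNeeded>(deserializedQuery);

var ecb = ecbs.CreateCommandBuffer();
// Playback order next frame: the query still matches while
// PostProcessNeeded is removed, THEN Deserialized goes away.
ecb.RemoveComponent<PostProcessNeeded>(deserializedQuery);
ecb.RemoveComponent<Deserialized>(deserializedQuery);
```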
Where is the batched set?
So far everything deals with component-type addition or removal. If it is an add, they all start with the default value. That's understandable, since how could the API know which individual Entity wants which value of your new component? It's true that all Set variants of EntityManager have no EntityQuery overload.
The closest is the AddComponentData<T>(EntityQuery entityQuery, NativeArray<T> componentArray) where T : struct, IComponentData overload, which does take data (though it is not a set, it is an add). But its implementation just gets the entity array of that query, then iterates, setting each data at the same index. You'd think there must be a more "batched" way to do this.
These methods are available on EntityQuery:
public void CopyFromComponentDataArray<T>(NativeArray<T> componentDataArray)
public void CopyFromComponentDataArrayAsync<T>(NativeArray<T> componentDataArray, out JobHandle jobhandle)
Typical usage of EntityQuery.CopyFromComponentDataArray
This is a method on EntityQuery instead of EntityManager that can potentially do the equivalent of a batched set. It seems to be designed symmetrically with ToComponentDataArray, so that a NativeArray of equal length that was copied out can be applied back with modifications. The apply-back is quite efficient because the API schedules a parallel unsafe job for you. Each job bluntly copies a linear run of memory across the boundaries of each Entity. The only thing it can't cross into is another chunk, which the other threads are probably working on in parallel.
But this apply-back is also quite brittle. Remember that the ECS database may be segmented into multiple chunks, but a NativeArray is linear. It is thanks to the entityOffset of each chunk, coming into the IJobChunk inside this CopyFromComponentDataArray method, that the mapping can be gymnastics-ed to line up perfectly with the linearized NativeArray. That's how it knows which value goes to which Entity without even needing a dictionary-style NativeHashMap<Entity, T> for the apply-back. If you just made the array from ToComponentDataArray, then apply it back before any chunk changed, you are automatically good to go.
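A round-trip sketch (the Speed component is a hypothetical name of mine), under the assumption that no structural change happens between the copy out and the copy back:

```csharp
var query = EntityManager.CreateEntityQuery(typeof(Speed));

// Copy out: one linear array spanning all matched chunks.
NativeArray<Speed> speeds = query.ToComponentDataArray<Speed>(Allocator.TempJob);

// Batch-modify linearly; indexes map back to the same entities
// as long as no structural change happens in between.
for (int i = 0; i < speeds.Length; i++)
    speeds[i] = new Speed { Value = 0f }; // fast "reset" of the whole query

// Copy back with a parallel per-chunk job.
query.CopyFromComponentDataArray(speeds);
speeds.Dispose();
```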
Advanced usage of EntityQuery.CopyFromComponentDataArray
However, suppose you now want to use CopyFromComponentDataArray without a prior ToComponentDataArray. First of all, you will have to maintain a NativeArray that has the same length as the linearized entities of that EntityQuery.
Of course, you could do a ToComponentDataArray to kickstart this, then proceed to fill it with the same repeating value for a fast reset, or with different values as you like. But its length will only be up to date at that moment. For this solution to work, you must be very disciplined that no new entity of this archetype is added (CopyFromComponentDataArray throws on a length mismatch). If done correctly, the NativeArray of components is your portal to batch-setting multiple components efficiently. You must know which element in the array goes to which Entity, though, to perform this gymnastics.
I think this is quite dangerous, because even if the length matches, if entities switch places by whatever phenomenon (like Entity.Version reuse), the copy back will pass but may not land on the Entity you are expecting.
Bonus : memory deserialization
There is still one more way, even faster than anything else, and it is only possible because of the beauty of data-oriented design. Unity can serialize chunk memory as-is to a file. If we deserialize this file and put the memory back, you instantly get everything back, in a way that even EntityManager would be confused about what just happened. It's almost like I could take my brain out and put it in another person later, and it instantly functions perfectly like me. This direct memory-loading approach is also the basis of the ECS SubScene system.
ECS serialization is currently not documented much, maybe because it is still not stable, but you can access it from SerializeUtility. For now I don't have time to benchmark this vs. the previous best performer CopyFromComponentDataArray, but to have memory to deserialize in the first place means the values must be known before runtime. It may be less flexible.