Jobs aren't always the fastest for all operations.
Most EntityManager operations cause structural changes, and structural changes have to happen on the main thread. Let's see how we can speed them up.
Latest update : Entities 0.14 | 19/08/2020
Don't schedule a job just to queue
Worker threads are great, but a common mistake is to schedule a job only to queue commands into an EntityCommandBuffer. It may feel like you are doing things efficiently in a job, but you are just delaying them.
EntityCommandBuffer exists to remember what to do and carry it out later, simply because those operations are not possible inside a job. Unity's "foolproof" C# Job System requires structural changes to happen on the main thread to ensure safety.
When the buffer is "played back", it is like calling DestroyEntity one by one on the main thread. The entities are not being destroyed in a job. (Pay attention to the words "command buffer": the commands are not executed yet.)
EntityCommandBuffer.ParallelWriter won't help you much either; you are just remembering what to do in parallel. (And ParallelWriter blocks the other thread on write if they clash.)
The purpose of EntityCommandBuffer is to defer EntityManager commands, not to speed them up. Therefore, scheduling a job just to add, remove, or destroy entities via EntityCommandBuffer is a complete waste of time. You're better off doing it without the job.
However, if you are already in a job that is working on something else, and you are in just the right if branch to decide which command to issue, that's the right spot to queue up commands for the EntityManager to run later on the main thread.
Even iterating and destroying with EntityManager directly, naively, will be faster because you didn't schedule a job!
The strength of the methods below is that you can pass them any NativeArray<Entity>: one you allocated yourself and filled with handpicked Entity, one from an EntityQuery, or a NativeSlice subset of …
//Add-remove types on handpicked entities.
//Adding has no data, they will start on default of that type.
AddComponent(NativeArray<Entity> entities, ComponentType componentType)
AddComponent<T>(NativeArray<Entity> entities)
RemoveComponent(NativeArray<Entity> entities, ComponentType componentType)
RemoveComponent<T>(NativeArray<Entity> entities)

//Mass-create. One fills the length of input array, the other returns a new array based on your count.
CreateEntity(EntityArchetype archetype, NativeArray<Entity> entities)
CreateEntity(EntityArchetype archetype, int entityCount, Allocator allocator)

//Like create variant but clones components. Has built-in behavior
//to some special components such as Prefab and LinkedEntityGroup
Instantiate(Entity srcEntity, NativeArray<Entity> outputEntities)
Instantiate(Entity srcEntity, int instanceCount, Allocator allocator)
Instantiate(NativeArray<Entity> srcEntities, NativeArray<Entity> outputEntities)

//Dumbly copy even Prefab or LinkedEntityGroup
CopyEntities(NativeArray<Entity> srcEntities, NativeArray<Entity> outputEntities)
DestroyEntity(NativeArray<Entity> entities)
The internals of these methods are Burst-compiled, so don't underestimate them!
Analysis : CreateEntity
This method is simple to characterize since there is no Entity input. It starts by finding existing chunks of the target archetype that still have space left, fills that contiguous space, then deducts that amount from the number of entities you want to create. If no such chunk exists, you pay for allocating a new, fresh chunk. Therefore, performance depends on how your creation count compares to the archetype's chunk capacity (that is, how many chunks the creation spans) and on how many existing chunks still have free space.
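To make that arithmetic concrete, here is a toy model in plain C# (no Unity APIs; CreateEntityModel and its method are hypothetical names). It represents each chunk only by how many entities it holds, tops up chunks that still have free space, and then counts how many fresh chunks must be allocated. It is a sketch of the behavior described above, not Unity's implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model of batched CreateEntity; not Unity's code.
// Each element of chunkCounts is how many entities one chunk of the target archetype holds.
public static class CreateEntityModel
{
    // Returns how many fresh chunks had to be allocated to create creationCount entities.
    public static int Create(List<int> chunkCounts, int capacity, int creationCount)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        int remaining = creationCount;

        // 1. Fill the contiguous free space of existing chunks first.
        for (int i = 0; i < chunkCounts.Count && remaining > 0; i++)
        {
            int take = Math.Min(capacity - chunkCounts[i], remaining);
            if (take <= 0) continue; // chunk already full
            chunkCounts[i] += take;
            remaining -= take;
        }

        // 2. Pay for fresh chunks for whatever is left.
        int newChunks = 0;
        while (remaining > 0)
        {
            int take = Math.Min(capacity, remaining);
            chunkCounts.Add(take);
            remaining -= take;
            newChunks++;
        }
        return newChunks;
    }
}
```

With a capacity of 100 and existing chunks holding 100 and 70 entities, creating 250 entities first fills the 30 free slots and then allocates 3 new chunks (100, 100 and 20).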
Analysis : Add/RemoveComponent
If you can picture ECS memory in your head, you'll see that, unlike setting a component, adding or removing a component is not a trivial operation. It's a big deal because the chunk's memory has to be moved and rearranged so that every contiguous component array gets its new capacity. Picture the chunk:
AAA _ _ BBB _ _ CCC _ _
If you remove C, even though it came last, the chunk now has to look something like:
AAA _ _ _ _ BBB _ _ _ _
Notice that even BBB can't stay in its old location; those slots are now the spare capacity for A.
- If you add/remove a zero-sized component (tag component), you can save time since the new archetype is layout-compatible. If that destination Chunk already exists, it is just a matter of moving a section of a memory block to a new place!
- If the entity count is 10 or below, it loops in a for loop and processes the entities one by one on the main thread.
- If there are more than 10 entities, it enters the batching operation. (Batching is troublesome and overkill when the amount is low, which is why the threshold exists.) You can pass arbitrary Entity jumbled in the array, but the algorithm inside figures out which Chunk each one is in and at which position, then sorts them and outputs them as chunk-startIndex-count ranges. The actual worker is then faster the more contiguous entities you give it to add/remove.
Now you know how to speed up this operation: include entities that come from the same chunk and are contiguous as much as possible. Still, it is troublesome for the algorithm to accept a wild NativeArray<Entity>. With 10 or fewer entities, "just doing it" sequentially without caring about chunks is better, because the cost of the intelligent sorting outweighs its benefit.
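The batching step can be sketched in plain C#. This is a hypothetical model, not Unity's code: EntityLocation and ChunkRange are made-up stand-ins for "which chunk and slot an entity lives in" and for the chunk-startIndex-count work item, and the threshold of 10 is the value described above for Entities 0.14.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for "this entity lives in chunk C at slot I".
public readonly struct EntityLocation
{
    public readonly int Chunk;
    public readonly int IndexInChunk;
    public EntityLocation(int chunk, int indexInChunk) { Chunk = chunk; IndexInChunk = indexInChunk; }
}

// Hypothetical work item: a contiguous run of slots inside one chunk.
public readonly struct ChunkRange
{
    public readonly int Chunk;
    public readonly int StartIndex;
    public readonly int Count;
    public ChunkRange(int chunk, int startIndex, int count) { Chunk = chunk; StartIndex = startIndex; Count = count; }
}

public static class BatchModel
{
    // Threshold as described for Entities 0.14; treat it as version-specific.
    public const int BatchThreshold = 10;

    // Sort by chunk, then by slot, and merge neighbors into chunk-startIndex-count runs.
    public static List<ChunkRange> ToRanges(IEnumerable<EntityLocation> entities)
    {
        var ranges = new List<ChunkRange>();
        foreach (var e in entities.OrderBy(x => x.Chunk).ThenBy(x => x.IndexInChunk))
        {
            if (ranges.Count > 0)
            {
                var last = ranges[ranges.Count - 1];
                if (last.Chunk == e.Chunk && last.StartIndex + last.Count == e.IndexInChunk)
                {
                    // Next slot in the same chunk: grow the current run.
                    ranges[ranges.Count - 1] = new ChunkRange(last.Chunk, last.StartIndex, last.Count + 1);
                    continue;
                }
            }
            ranges.Add(new ChunkRange(e.Chunk, e.IndexInChunk, 1));
        }
        return ranges;
    }

    // Small inputs: one work item per entity. Larger inputs: one per contiguous run.
    public static int WorkItems(IReadOnlyList<EntityLocation> entities) =>
        entities.Count <= BatchThreshold ? entities.Count : ToRanges(entities).Count;
}
```

Feeding it 8 jumbled entities from slots 0 to 7 of one chunk plus 4 entities from slots 3, 4, 9 and 10 of another yields only 3 ranges, while an input of 10 or fewer entities is simply handled one by one.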
Analysis : Instantiate/Copy
Instantiate(Entity srcEntity, NativeArray<Entity> outputEntities)
Instantiate(Entity srcEntity, int instanceCount, Allocator allocator)
Instantiate(NativeArray<Entity> srcEntities, NativeArray<Entity> outputEntities)
The futuristic version of MonoBehaviour instantiation. The hope of mankind. All three variants use the same innards. So does the version that instantiates a single entity, but that one doesn't fit this article's topic.
instanceCount is not entirely O(n) on the expensive part: it can work on multiple instances at the same time, up to the chunk's boundary. For example, if your chunk capacity is 100, instantiating 1200 Entity will loop over chunk allocation 12 times to clone them. (Not 1 time, and not 1200 times.) That's some of the magic of contiguous memory!
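The chunk-at-a-time idea behind the 1200 / 100 = 12 figure can be modeled in plain C#. This is a hypothetical sketch (InstantiateModel is not a Unity type), and it assumes every clone lands in a fresh chunk, ignoring any partially filled chunk that could be topped up first.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model of chunk-at-a-time cloning; not Unity's code.
public static class InstantiateModel
{
    // Returns the size of each batch clone. One batch = allocate one fresh chunk,
    // then replicate the source's component data into its contiguous arrays.
    public static List<int> CloneBatches(int instanceCount, int chunkCapacity)
    {
        if (chunkCapacity <= 0) throw new ArgumentOutOfRangeException(nameof(chunkCapacity));
        var batches = new List<int>();
        for (int remaining = instanceCount; remaining > 0; )
        {
            int batch = Math.Min(chunkCapacity, remaining);
            batches.Add(batch);
            remaining -= batch;
        }
        return batches;
    }
}
```

CloneBatches(1200, 100) produces 12 batches of 100, and CloneBatches(1250, 100) produces 13, the last one holding 50: the loop count scales with chunks, not with instances.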
But if you want to destroy entities (or perform other EntityManager operations) in a job for real, then use ExclusiveEntityTransaction.
ExclusiveEntityTransaction is like the inverse of what normally occurs. Normally, to do things to the EntityManager we have to "come back" to the main thread for a moment (at a sync point). With ExclusiveEntityTransaction, we can "lock the EntityManager" for one thread to work on and prevent the main thread from using it. Meanwhile, the main thread of that world can go on and do other things that do not touch its EntityManager.
The main thread of another world can still touch its own EntityManager, though! Remember that EntityManager is a singleton per world, not per Unity. It manages the Entity of one world.
So ExclusiveEntityTransaction is heavily geared towards having multiple worlds. It renders one World nearly unusable (its EntityManager is busy in a job), but your other worlds can keep working with their own EntityManager and the remaining worker threads. Now you see something that only multiple worlds can achieve! Worker threads do work stealing automatically and are a shared resource for all worlds.
But making multiple worlds useful to each other requires more careful planning of how they communicate.
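The locking rule can be illustrated with a toy model in plain C#. Everything here is hypothetical (ToyEntityManager, BeginExclusive, EndExclusive, Transaction); in Unity the real entry points are EntityManager.BeginExclusiveEntityTransaction() and EndExclusiveEntityTransaction(). The sketch only shows the ownership rule: while a transaction is open, that world's main-thread API refuses to work, a worker thread mutates through the transaction, and a second world is unaffected.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy of "lock one world's manager for one thread"; not Unity's code.
public sealed class ToyEntityManager
{
    private readonly HashSet<int> _entities = new HashSet<int>();
    private bool _exclusive;

    public ToyEntityManager(int count)
    {
        for (int i = 0; i < count; i++) _entities.Add(i);
    }

    public int Count => _entities.Count;

    // Main-thread API: refuses to work while a transaction owns this manager.
    public void DestroyEntity(int entity)
    {
        if (_exclusive)
            throw new InvalidOperationException("Locked by an exclusive transaction.");
        _entities.Remove(entity);
    }

    public Transaction BeginExclusive()
    {
        if (_exclusive) throw new InvalidOperationException("A transaction is already open.");
        _exclusive = true;
        return new Transaction(this);
    }

    public void EndExclusive() => _exclusive = false;

    // The only path allowed to mutate the manager while it is locked.
    public sealed class Transaction
    {
        private readonly ToyEntityManager _owner;
        internal Transaction(ToyEntityManager owner) => _owner = owner;

        public void DestroyEntity(int entity) => _owner._entities.Remove(entity);
    }
}
```

In a typical run, world A opens a transaction and hands it to a Task that destroys 50 entities, while a main-thread DestroyEntity on world A throws and a DestroyEntity on world B goes through normally. After the Task finishes and EndExclusive is called, world A is usable on the main thread again.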
Cross-world entity operations
Multiple worlds let you overcome the limitations of a single EntityManager, and ExclusiveEntityTransaction unlocks their full power. What's left is the final glue: moving entities from one EntityManager to another. (That is, from one World to another.)
public void MoveEntitiesFrom(EntityManager srcEntities)
public void MoveEntitiesFrom(out NativeArray<Entity> output, EntityManager srcEntities)
public void MoveEntitiesFrom(out NativeArray<Entity> output, EntityManager srcEntities, NativeArray<EntityRemapUtility.EntityRemapInfo> entityRemapping)
public void MoveEntitiesFrom(EntityManager srcEntities, NativeArray<EntityRemapUtility.EntityRemapInfo> entityRemapping)
public void MoveEntitiesFrom(EntityManager srcEntities, EntityQuery filter)
public void MoveEntitiesFrom(EntityManager srcEntities, EntityQuery filter, NativeArray<EntityRemapUtility.EntityRemapInfo> entityRemapping)
public void MoveEntitiesFrom(out NativeArray<Entity> output, EntityManager srcEntities, EntityQuery filter)
That's a ton of overloads...
Entities from the other world will not retain their original index and Version, since each EntityManager manages its own increasing indexes. It would be disastrous if moving between worlds randomly overwrote an Entity that already occupies the same index in the destination.
Therefore this operation performs a remapping (which deserves an article of its own). Knowing that, it is clear that the first 4 overloads are just asking: do you want the moved entities back in output? And do you want to provide your own "remapping workspace" in entityRemapping, or let the method allocate one and dispose of it internally? (You may reuse the workspace array if you have a lot of moves to do.)
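Why remapping is needed can be shown with a toy model in plain C#. ToyEntity, RemapModel and its methods are hypothetical stand-ins, loosely inspired by the role of EntityRemapUtility.EntityRemapInfo: moved entities receive fresh indexes after the destination's existing ones, and every Entity reference stored in moved component data has to be patched through the same table. In this toy, a reference that doesn't match a moved entity (wrong index or stale version) becomes Null, since its old index would mean nothing in the destination world.

```csharp
using System.Collections.Generic;

// Hypothetical entity handle: index + version, shaped like Unity's Entity.
public readonly struct ToyEntity
{
    public readonly int Index;
    public readonly int Version;
    public ToyEntity(int index, int version) { Index = index; Version = version; }
}

public static class RemapModel
{
    public static readonly ToyEntity Null = new ToyEntity(0, 0);

    // Give each moved entity a fresh index in the destination, starting at nextFreeIndex.
    // Version 1 is an arbitrary choice for this toy.
    public static Dictionary<ToyEntity, ToyEntity> BuildRemap(IEnumerable<ToyEntity> moved, int nextFreeIndex)
    {
        var remap = new Dictionary<ToyEntity, ToyEntity>();
        foreach (var e in moved)
            remap[e] = new ToyEntity(nextFreeIndex++, 1);
        return remap;
    }

    // Patch an Entity reference found inside moved component data.
    public static ToyEntity Patch(ToyEntity reference, Dictionary<ToyEntity, ToyEntity> remap) =>
        remap.TryGetValue(reference, out var target) ? target : Null;
}
```

Moving three entities into a destination that already uses indexes 0 to 4 maps source (1, v1) to (6, v1), while a stale reference (1, v2) is nulled instead of silently pointing at an unrelated entity.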
The last 3 overloads allow you to move only some entities, not all. The "some" is chosen by selecting the relevant chunks with an EntityQuery. Remember that the EntityQuery must be created from the world that owns the entities you want to move! EntityQuery is not interchangeable between worlds.
Judging roughly from the source, these methods run a job that cuts off the queried chunk pointers (or all of them if there is no EntityQuery) and hands them to the second EntityManager. No copying of component data. This way you can "get rid of" entities quickly, or set things up for ExclusiveEntityTransaction. This is why moving has a huge advantage over copying.
Don't overestimate it, though: the Entity are technically new in the destination even though the method is called "move", and they need some reservation. "Entities are just indexes", yes, but they need the right place for their data that the index can point to.
Also, the filtered (EntityQuery) version is not just filtered in the entity selection phase. There are many more subtle details in moving that all need to be filtered as well. The deepest code paths of the unfiltered and filtered versions are completely different, even though the code looks similar! Both are heavily jobified, and the jobs without a filter are certainly leaner.
public void CopyEntitiesFrom(EntityManager srcEntityManager, NativeArray<Entity> srcEntities, NativeArray<Entity> outputEntities = default)
public void CopyAndReplaceEntitiesFrom(EntityManager srcEntityManager)
Not to be confused with CopyEntities, which works within the same manager.
The 1st method actually calls that CopyEntities within the source manager first. The copies are then tagged with the IsolateCopiedEntities component and, as you may have already deduced, the rest uses the same code as moving with an EntityQuery (one that selects the IsolateCopiedEntities tag).
Remember how the "move" versions could be considered a "copy", since each Entity is created anew and then we move the component data over. The 2nd method, CopyAndReplaceEntitiesFrom, is a version on steroids. Since "replace", unlike "move", doesn't have to care about remapping anymore, the operation can retain everything down to Entity identity from the source world. The code comment says it could be used for deterministic rollback: we can easily back up and restore an entire world. It's quite a special method we've got here.
EntityQuery overloads of EntityManager
Any time you use an EntityQuery instead of an Entity or an array of them, you are operating on everything in whole chunks at the same time instead of on each entity one by one.
An EntityQuery can return multiple chunks if you have so many Entity that they span several chunks. (I like to think that the type name is called "…")
Here are all of the ones you can use.
//Add just the type without data, all data starts at default for that struct.
//If tag component, no need to reshape the chunk's memory arrangement.
AddComponent(EntityQuery entityQuery, ComponentType componentType)
AddComponent<T>(EntityQuery entityQuery)

//Add with tailor-made data for each entity matched in the EQ.
//Requires extreme accuracy of your target entity for each data!
//Length mismatch will throw.
AddComponentData<T>(EntityQuery entityQuery, NativeArray<T> componentArray) where T : struct, IComponentData

//If removing tag component, no need to reshape the chunk's memory arrangement.
RemoveComponent(EntityQuery entityQuery, ComponentType componentType)
RemoveComponent<T>(EntityQuery entityQuery)

//You can add/remove multiple types at once too.
AddComponent(EntityQuery entityQuery, ComponentTypes types)
RemoveComponent(EntityQuery entityQuery, ComponentTypes types)

//It is natural that chunk component operations can be done to multiple chunks at the same time.
AddChunkComponentData<T>(EntityQuery entityQuery, T componentData) where T : unmanaged, IComponentData
RemoveChunkComponentData<T>(EntityQuery entityQuery)

//Swaps SCD value for the whole chunk without any data movement.
//Similar performance to tag component add-remove.
AddSharedComponentData<T>(EntityQuery entityQuery, T componentData) where T : struct, ISharedComponentData

//Very good one as it just throws the matched chunks away in a big unit.
DestroyEntity(EntityQuery entityQuery)

//There is a NativeArray<Entity> overload but no EntityQuery overload for CreateEntity, for a very logical reason.
How do these compare to the NativeArray<Entity> overloads? These are much better, since things like add and remove really happen on the whole chunk and no entity has to be moved individually. However, the data inside each entity still has to be moved around if the component you add/remove is not a tag (has size).
As for AddSharedComponentData, why is there no NativeArray version? Because this kind of component actually lives on the chunk and not on any individual entity, so having an EntityQuery version is natural. If a NativeArray version existed, it likely wouldn't be any better than looping over the array and doing it one by one.
AddChunkComponentData and RemoveChunkComponentData are obviously chunk things, so it is only logical to have just an EntityQuery version and not a NativeArray<Entity> one.
If you read the analysis of the NativeArray versions above, at the very end of each they want to work in terms of an array of "chunk + start index + count". With an EntityQuery, no preprocessing is required to get that shape of workload, since the EntityQuery already gives you chunks! The start index is 0 and the count is the whole chunk. Therefore, I think it is quite safe to assume the EntityQuery version is faster.
To illustrate the difference, the EntityQuery and NativeArray<Entity> versions of DestroyEntity both end up in the same method in EntityComponentStoreCreateDestroyEntities.cs. That method wants 3 things: which Chunk, the start index, and the count. With an EntityQuery, of course, it is a whole chunk, from 0 to the end of the chunk. Easy! With a NativeArray<Entity>, you may have passed in completely arbitrary entities, none of them in the same chunk, or maybe some of them in the same chunk? There is code that tries to figure out the contiguous entities in the array so it can process them as chunk-start-length together. (Now you know how to at least use the NativeArray<Entity> version fast, if you must.) But the point is that operating on whole chunks should be faster in most cases; it is just too coarse for some surgical operations.
Read more about why tag components have better performance in some of these operations here: https://gametorrahod.com/tag-component/.
It even works when the EntityQuery has been .SetFilter-ed. So you can do selective mass operations based on your SharedComponentData or your .SetFilterChanged criteria. Just don't forget to .ResetFilter when you want the query to go back to normal.
As an example, I have 10000 things to show, but only a subset (1000) of them is visible and processed at any given time (governed by the Process tag component), and this subset moves forward from 0, 1000, 2000, 3000, ... until the end.
So for each 1000 entities I add an ISharedComponentData with an integer 1~10. When it is time to remove Process from the previous 1000 entities and add it to the next 1000 at once, I can achieve that with 1 RemoveComponent and 1 AddComponent on a filtered EntityQuery (set the filter, do the remove, set the new filter, do the add, reset the filter if needed) instead of 2000 iterations.
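The operation count in that example can be sketched with a toy model in plain C#. ToyScdWorld is hypothetical, and it simplifies each group of 1000 entities into a single "chunk" keyed by its shared value and whether it has Process; a real group of 1000 may span several chunks, but the point stays the same: the work is one operation per matched chunk group, not one per entity.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy: one "chunk" per (shared value, has Process tag) combination,
// storing only how many entities it holds. Not how Unity stores data.
public sealed class ToyScdWorld
{
    public readonly Dictionary<(int Group, bool Process), int> Chunks =
        new Dictionary<(int Group, bool Process), int>();

    // Counts chunk-level structural operations performed.
    public int Operations { get; private set; }

    public ToyScdWorld(int groups, int perGroup)
    {
        for (int g = 1; g <= groups; g++) Chunks[(g, false)] = perGroup;
    }

    // Like Add/RemoveComponent<Process> on a query filtered to one shared value:
    // the whole matched chunk changes archetype in a single operation.
    public void SetProcess(int group, bool process)
    {
        if (Chunks.TryGetValue((group, !process), out int count))
        {
            Chunks.Remove((group, !process));
            Chunks[(group, process)] = count;
            Operations++;
        }
    }

    public int CountWithProcess() => Chunks.Where(kv => kv.Key.Process).Sum(kv => kv.Value);
}
```

Starting with group 1 visible and then advancing the window to group 2 takes three operations in total, and each advance is exactly two operations no matter how many entities a group holds.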
Main article about SCD and its filter ability : https://gametorrahod.com/everything-about-isharedcomponentdata#filtering
You can also plan a deferred multi-chunk operation with an EntityCommandBuffer. The EntityQuery it requires is a class, so this is only for out-of-job use.
Operating with an EntityQuery directly on the EntityManager is already fast. But remember that it creates a sync point right in the middle of the frame, which completes all jobs. If you don't need the operation's result right now, it is 100% better to play it back later in chunks. Nothing's better than a chunk-based operation that also doesn't disturb running jobs. A sync point completes ALL jobs (unfortunately, not just the jobs related to the operation that caused the sync). Therefore, if you decide to do it immediately, you are troubling all other systems too.
- You don't need the result in this frame at all, because it signifies the completed work or cleanup of this system. A typical scenario is a system that works on newcomer entities once, then tags them so they are not worked on again next round (its query has a ComponentType.Exclude of that tag). You know this is the only system that cares about this tag, therefore you should make your command buffer play back at BeginInitializationEntityCommandBufferSystem. This target is the best considering the sync point position. In other words, make it effective the next frame. It is also common for a system that "consumes" some message entities to make them disappear by destroying them, throwing whole chunks away. You can queue that destroy command to the begin init target.
- If you kind of need the result for various systems that you haven't thought carefully about yet, think now about whether those systems could be classified into PresentationSystemGroup or not. If they can, target your playback at EndSimulationEntityCommandBufferSystem. Keep in mind that if you do this, no job can survive the border from the simulation to the presentation update flow. The best position for a sync point is always going to be the beginning of the next frame. (There was an EndPresentationEntityCommandBufferSystem available at some point in time, but since it means the same thing as BeginInitializationEntityCommandBufferSystem, Unity removed it.)
- The last resort is to introduce your own EntityCommandBufferSystem. You usually do this when you want to put [UpdateAfter(YourECBS)] on systems in the simulation group, because you can't afford to not have this data available for more systems to compute on (which may then use EndSimulationEntityCommandBufferSystem this time), and finally produce something for the folks in the presentation group.
It is also usable in Jobs.WithCode that ends with .Run(), and with …
Deferred query timing
But beware that a deferred EntityQuery command considers the query at playback time, not at enqueue time. (The command really remembers the query, not the query result.) For example, say you just used ForEach with WithStoreEntityQueryInField to take the query out for use with an EntityCommandBuffer right below it that targets begin init. You expect the things you just worked on in the ForEach above to receive the command. But that may not be the case if, by the time the next frame arrives, those entities have received more changes and no longer match the query!
Here's a "tag flashing" pattern I used. Usually, when you tag something as a message so that a later system does something once, the message receiver has to remove the tag to signify the message has been received. If the tag is designed to be consumed by multiple systems, it can be a hassle to decide whose responsibility the removal is, and it reduces modularity. Regardless, the EntityQuery overload is ideal for this kind of pattern since tagging a chunk is cheap.
Other tactics include having the tagger clean the tag up the next frame, so the message only works for one round. That may also be a bit of a hassle. Instead, I can leave the removal to ECB playback: the tagger enqueues the removal right at the line where it tags, targeting begin init to minimize the job completion impact. Therefore the message lasts until the beginning of the next frame and "automatically disappears". It is now impossible to forget to remove the tag, and every system that runs later than this one in the same frame is guaranteed to receive the message once.
//Imagine a deserializer system that produces fresh entities.
//All entities just deserialized receive the Deserialized tag.
//But there is more work to be done, and it could bloat this deserializer system.
//Instead, I have more systems that UpdateAfter this one to add in their own
//"post-deserialize" task. Therefore, I would like them to be able to
//just `RequireForUpdate` the `PostProcessNeeded` tag.
EntityManager.AddComponent<PostProcessNeeded>(deserializedQuery);
var ecb = ecbs.CreateCommandBuffer();
ecb.RemoveComponent<PostProcessNeeded>(deserializedQuery);
...
EntityManager.RemoveComponent<Deserialized>(deserializedQuery);
It sounds like a neat idea to "flash" the PostProcessNeeded tag component onto everything I just worked on in deserializedQuery, except that I forgot that right after, I remove the Deserialized tag that makes the query match. At the beginning of the next frame, the tag isn't cleaned up since the query no longer matches. The tag stays, and systems keep working on PostProcessNeeded every frame.
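Either use the non-EntityQuery version, so that the enqueued command remembers each individual Entity instead (they are then free to change archetype as they like and the command no longer cares, but it is no longer a batched command), or also use the ECB for the Deserialized RemoveComponent with the same query, enqueued after the PostProcessNeeded removal, so the query still matches when that first command plays back.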
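The difference between "remember the query" and "remember the entities" can be reproduced with a toy command buffer in plain C#. ToyCommandBuffer, TagEntity and TagFlashingDemo are hypothetical; tags are strings and a query is just a predicate. The three scenarios mirror the text: the buggy flash, fix 1 (capture the entities at enqueue time), and fix 2 (defer the Deserialized removal too, enqueued after the first command).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy entity: a bag of tag names.
public sealed class TagEntity
{
    public readonly HashSet<string> Tags = new HashSet<string>();
}

// Hypothetical toy command buffer: commands are closures run in enqueue order at Playback().
public sealed class ToyCommandBuffer
{
    private readonly List<Action> _commands = new List<Action>();

    // "EntityQuery" flavor: stores the query, which is evaluated only at playback.
    public void RemoveTag(Func<TagEntity, bool> query, List<TagEntity> world, string tag) =>
        _commands.Add(() => { foreach (var e in world.Where(query).ToList()) e.Tags.Remove(tag); });

    // "Entity array" flavor: stores the entities that matched at enqueue time.
    public void RemoveTag(IEnumerable<TagEntity> entities, string tag)
    {
        var captured = entities.ToList();
        _commands.Add(() => { foreach (var e in captured) e.Tags.Remove(tag); });
    }

    public void Playback()
    {
        foreach (var command in _commands) command();
        _commands.Clear();
    }
}

public static class TagFlashingDemo
{
    // Stand-in for deserializedQuery.
    static readonly Func<TagEntity, bool> Deserialized = e => e.Tags.Contains("Deserialized");

    static List<TagEntity> MakeWorld(int n) =>
        Enumerable.Range(0, n).Select(_ =>
        {
            var e = new TagEntity();
            e.Tags.Add("Deserialized");
            e.Tags.Add("PostProcessNeeded"); // already "flashed" by the immediate add
            return e;
        }).ToList();

    static int StillTagged(List<TagEntity> world) => world.Count(e => e.Tags.Contains("PostProcessNeeded"));

    // Bug: deferred query-based removal, but Deserialized is removed immediately.
    public static int Buggy(int n)
    {
        var world = MakeWorld(n);
        var ecb = new ToyCommandBuffer();
        ecb.RemoveTag(Deserialized, world, "PostProcessNeeded");
        foreach (var e in world) e.Tags.Remove("Deserialized"); // immediate, like EntityManager
        ecb.Playback(); // the query matches nothing anymore
        return StillTagged(world);
    }

    // Fix 1: remember individual entities (in Unity, no longer a batched chunk command).
    public static int CaptureEntities(int n)
    {
        var world = MakeWorld(n);
        var ecb = new ToyCommandBuffer();
        ecb.RemoveTag(world.Where(Deserialized), "PostProcessNeeded");
        foreach (var e in world) e.Tags.Remove("Deserialized");
        ecb.Playback();
        return StillTagged(world);
    }

    // Fix 2: defer the Deserialized removal too, enqueued after the first command.
    public static int DeferBoth(int n)
    {
        var world = MakeWorld(n);
        var ecb = new ToyCommandBuffer();
        ecb.RemoveTag(Deserialized, world, "PostProcessNeeded");
        ecb.RemoveTag(Deserialized, world, "Deserialized");
        ecb.Playback();
        return StillTagged(world);
    }
}
```

Buggy leaves every entity tagged with PostProcessNeeded, while both fixes clear it. Fix 2 only works because playback runs commands in enqueue order, so the PostProcessNeeded removal still sees Deserialized on the entities.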
Where is the batched set?
So far everything deals with adding or removing component types. An add always starts with the default value. That's understandable, though: how could the API know which individual Entity wants which value of your new component? It's true that none of the Set variants of EntityManager have a batched overload.
The closest is the AddComponentData<T>(EntityQuery entityQuery, NativeArray<T> componentArray) where T : struct, IComponentData overload, which can carry data (though it is an add, not a set). But its implementation just gets the entity array of that query, then iterates, setting the data at the same index for each entity. You'd think there must be a more "batch" way to do this.
These methods are available on EntityQuery:
public void CopyFromComponentDataArray<T>(NativeArray<T> componentDataArray)
public void CopyFromComponentDataArrayAsync<T>(NativeArray<T> componentDataArray, out JobHandle jobhandle)
Typical usage of CopyFromComponentDataArray
This is a method on EntityQuery instead of EntityManager, and it can do what is potentially equivalent to a batched set.
It seems to be designed symmetrically with ToComponentDataArray, so that a NativeArray of equal length that was copied out can be applied back with modifications. The apply-back is quite efficient because the API schedules a parallel unsafe job for you. Each job bluntly copies linear memory across the boundaries of each Entity. The only boundary it can't cross is into another chunk, which the other threads are probably working on in parallel.
But this apply-back is also quite brittle. Remember that the ECS database may be segmented into multiple chunks, but a NativeArray is linear. It is thanks to the entityOffset of each chunk coming into the IJobChunk inside CopyFromComponentDataArray that the chunks can be gymnastic-ed to map perfectly onto the linearized NativeArray. That's how it knows which value goes to which Entity without needing a dictionary-style NativeHashMap<Entity, T> to apply back. If you just made the array with ToComponentDataArray, then apply it back before any chunk changes, you are automatically good to go.
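The entityOffset trick can be sketched in plain C#. CopyBackModel is hypothetical and models chunks as separate arrays: a prefix sum of chunk entity counts gives each chunk's starting position in the linear array, so each chunk can copy its own slice in parallel without any Entity-to-value dictionary. It only lands values on the right entities if the chunk layout hasn't changed since the array was made.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical toy: each chunk is its own array of component values. Not Unity's code.
public static class CopyBackModel
{
    // Prefix sum of chunk entity counts: where each chunk's slice starts in the linear array.
    public static int[] EntityOffsets<T>(IReadOnlyList<T[]> chunks)
    {
        var offsets = new int[chunks.Count];
        int running = 0;
        for (int c = 0; c < chunks.Count; c++)
        {
            offsets[c] = running;
            running += chunks[c].Length;
        }
        return offsets;
    }

    static int TotalCount<T>(IReadOnlyList<T[]> chunks)
    {
        int total = 0;
        foreach (var chunk in chunks) total += chunk.Length;
        return total;
    }

    // Counterpart of ToComponentDataArray: linearize every chunk into one array.
    public static T[] ToLinear<T>(IReadOnlyList<T[]> chunks)
    {
        var offsets = EntityOffsets(chunks);
        var linear = new T[TotalCount(chunks)];
        for (int c = 0; c < chunks.Count; c++)
            Array.Copy(chunks[c], 0, linear, offsets[c], chunks[c].Length);
        return linear;
    }

    // Counterpart of CopyFromComponentDataArray: one parallel work item per chunk,
    // each bluntly copying its contiguous slice back. Throws on length mismatch.
    public static void CopyFromLinear<T>(IReadOnlyList<T[]> chunks, T[] linear)
    {
        if (linear.Length != TotalCount(chunks))
            throw new ArgumentException("Array length does not match the entity count.");
        var offsets = EntityOffsets(chunks);
        Parallel.For(0, chunks.Count, c => Array.Copy(linear, offsets[c], chunks[c], 0, chunks[c].Length));
    }
}
```

With chunks of 3, 2 and 1 entities the offsets are 0, 3 and 5, so editing element 3 of the linear array changes the first entity of the second chunk after the copy-back.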
Advanced usage of CopyFromComponentDataArray
This time, you are going to use CopyFromComponentDataArray without a prior ToComponentDataArray. First of all, you will have to maintain a NativeArray that has the same length as the linearized entities of that query.
Of course, you could call ToComponentDataArray to kickstart this, then overwrite it with the same repeating value for a fast reset, or with different values as you like. But its length is only up to date at that moment. For this solution to work, you must be very disciplined that no new entity of this archetype is added. (CopyFromComponentDataArray throws on length mismatch.) If done correctly, the NativeArray of components is your portal to batch set many components efficiently. You must know which element in the array goes to which Entity, though, to perform this gymnastic.
I think this is quite dangerous, because even if the length matches, entities may switch places through whatever phenomenon (like Entity.Version reuse), and the copy-back will pass but may not land on the Entity you are expecting.
Bonus : memory deserialization
There is still one more way, even faster than anything else, and it is only possible because of the beauty of data-oriented design. Unity can serialize chunk memory as-is to a file. If we deserialize this file and put the memory back, you instantly get everything back, in a way that would leave even EntityManager confused about what just happened. It's almost as if I could take my brain out, put it into another person later, and it would instantly function perfectly like me. This direct memory loading approach is also the basis of the ECS SubScene system.
ECS serialization is currently not well documented, maybe because it is still not stable, but you can access it from SerializeUtility. For now I don't have time to benchmark it against the previous best performer, CopyFromComponentDataArray. However, having memory to deserialize in the first place means the values must be known before runtime, so it may be less flexible.