Memory awareness in data-oriented design (Part 1 : Overview)

When programming data-oriented, you need to take the word "data" more literally. At every single step, you think about and know where your data is and how the machine accesses it.

This sounds daunting, and you may even think it shouldn't be your job. (Shouldn't it be the compiler's? Why not spend the time "making an actual game", as the BaaS providers' slogans go?)

But that's OOP. The code is tuned for humans: easier problem mapping, easier teaching, and so on. You feel like you are making games, and the performance is left to whatever the compiler can extract from that code.

Here we are instead programming for the machine. Data is not a model of the problem domain; the mental shift you need to make is that your game is this data. So in data-oriented design you are also making games, not wasting time. And the game comes out fast because the machine likes it, and therefore it is fun, and your players will like it too. Always think about how the machine will use your data.

This series is in 2 parts. This first part touches only lightly on the Unity Entities (ECS) package; it is mostly about problems that data-oriented design can solve only if you know what your data looks like. It therefore applies to any data-oriented library, e.g. should Unreal come up with one in the future, or if you want to do DOD with plain arrays in Unreal, or even in non-game work in any programming language.

The 2nd part will lean heavily on the Unity ECS API, to highlight exactly what kind of memory you get from it. (So you are then fully aware, like the title says.)

ECS : It helps you deal with DATA


I would like to add emphasis on data. It may sound like ECS helps us map our problem more easily, like in OOP. And while it sometimes serves that purpose, you shouldn't rely on it.

Use data as it is. It doesn't have to represent a thing. In OOP, you can stop thinking about the inner level as soon as you have finished abstracting your theoretical object. Not here: you never stop thinking about data, no matter how beautifully an ECS library tries to convince you to take a break from it.

Real abilities of E, C, and S

  • E - Entity : Coming from OOP, you immediately think "alright, so this is my new thing".
    That works, but you should rather think of an Entity as just an index to specific data. You are the data modeler now, not an imaginary world builder anymore.
    By thinking in OOP, you may feel uneasy when an actual thing needs more than 1 Entity to work efficiently. In reality, you may choose to use multiple Entities just because you want the data placed differently. This is perfectly valid data-oriented design, while in OOP the code would smell, since a little fragment of what should have been the same object now lives inside something else.
  • C - Component : Coming from OOP, you also think this would be each attribute in the object, except detachable and reusable in other objects. This one works OOP-style the most, I think.
    But don't let that distract you from the actual fact that components of the same type are linearized one after another before moving to the next type. This is the most important property of the C when you design data. Try not to design your little world-in-the-code like before.
  • S - System : Coming from OOP, this would be your new game loop that somehow knows all objects (now Entities, in OOP thinking) in the game, can query them freely, and then lets you work on the components of each object. Neat! In fact, I know many people who wanted to use an ECS library not for performance but for this convenience of freely slotting in systems and having them work on any data.
    But again! The real property of a system is that it can grab a specific section of memory efficiently and then perform reads or writes. (When reading, it may even run in parallel with other systems.) You need to picture your cache as this happens. And it doesn't have to be related to an Entity at all, so the OOP thinking above is not a perfect model. The system just hands you linear memory of the components you want; it is just that the Entity component is also available nearby if you want it. Surprise! Entity is not a thing anymore. It is one of the data. It is actually a C! Though the API has so many utilities around Entity that it deserves a special place.

I came into DOTS kind of thinking there would be a convenient API that abstracts things away and helps me deal with actual problems more easily.

No, those APIs deal with data exclusively. ECS is one level above pure data-oriented design, since we don't have to malloc memory or build machine-friendly arrays of everything ourselves, and it helps with mixing and matching data to solve the problem. But it stops there. The API does not try to convince you that data is a thing.

E is actually not a thing. C is actually not data stored in that thing. (The word "in" alone is so OOP!) S is actually not getting a thing. They sometimes work that way depending on how you think. Rather, E and C are both data, and S gets that data. Everything is data! I know the official documentation may say otherwise, but the best mindset is to be aware of what actually happens; then you can adopt whatever abstraction you want (E is a thing, etc.) while keeping performance a top priority.

3 hidden performances that were there all along


This is, in my opinion, the top appeal of using ECS: I can get more out of my player's same device. The phone could do much more, but our game sucks, so it couldn't. This article aims at maximizing all three of these by being aware of data.

Caches are used "accidentally"

In OOP, as we check this field to write that field of whatever object, we make a bunch of random accesses. When access is not random, the cache is happy. (Caches are explained in the next section.) In short, this time we are going to really use them as we code, not "accidentally" use them like in OOP. (Your code comments could even mention these kinds of things, if you are the type who likes to write comments to talk to yourself.)

Threads are cold

Moore's law has slowed down, and instead more threads are being added. It would be a shame not to use them. Game engines like Unity or Unreal often do not let us do game-specific work on threads explicitly.

Therefore threads are often used only by engine features that your game "accidentally" touches. The engine may request a number of threads (worker threads) up front and keep reusing them without requesting more.

In Unity, that might be culling, UI layout, object transforms, CPU particles, or some audio mixing work. Unreal seems able to thread your game better, since Blueprint is a codegen system that generates code you didn't have to write, and that generated code can be wired up to run on threads safely. In Unity, the main chunk of work in the MonoBehaviour scripts we write manually is unfortunately main-thread only.

This changes with the C# Job System, which finally lets you explicitly borrow worker threads to perform any job, with tight restrictions so it remains fool-proof to use. Unreal doesn't offer manual thread usage like this yet, other than spawning your own threads from C++. It is a nice step toward a fully threaded game made with a game engine.

Threads are good because they work in parallel, but that is also why they are hard to use automatically: it is difficult for a game engine to know which parts of the data can be parallelized or worked on at the same time, and which need to wait for which. (Read/write dependencies.) See the "data" appearing now? Data-oriented design therefore enables threading: we can orient the data into a good shape for sharing work. No OOP concept can handle this line of thinking.

It then depends on the ECS library's strategy how we do that. The most primitive, no-library approach is possible. For example, a summing task over int[] nums could be made "data-oriented" by splitting it into int[] nums1 and int[] nums2 (for no OOP reason; we are doing data-oriented design), then sending one to the first thread and the other to the 2nd thread. When both are done, add the two partial sums together. Safe threading! But manual labor. What if your game could automatically be like this all over the place?

With Unity ECS, we section ALL data into chunks. Chunks give clear boundaries of data, which increases threading potential (and also makes queries faster). Even inside a chunk there are clear boundaries of linear component memory, and we also have a strong typing system that lets the ECS backend know which type is being read or written at which moment.

All of this combined allows systems to work on data on threads automatically and safely. Data-oriented design enables threading, since successful threading depends on how safe it is to work on which memory area. I assume you know roughly what a chunk is in Unity ECS; if not yet, you can visit this article or the official documentation.

Assembly code is bad

When we write in OOP, the compiler can only do so much. It never gets to see a tight for loop over closely located memory like it does in data-oriented design. Many defensive instructions get emitted without the programmer even knowing, since that flexibility is OOP's strength in the first place. The best code for a human is merely passable code for the compiler. The logic remains correct after passing through the compiler; the performance, not so much.

In data-oriented design we are already coding for the machine. The compiler can see right away that the code can be optimized further: a for loop can be vectorized to loop in bigger units with SIMD instructions.

The Unity Burst compiler, along with some other constraints, solves this. We won't worry about this much in this article. For the in-depth workings of how your code becomes assembly in Unity ECS, check out this article.

Next, we need a primer for the first hidden performance: the cache.

Cache primer


Caches are small but damn fast, and there are multiple levels of them before you arrive at RAM (slow). Caches are baked into your CPU, and you can't buy cache sticks to upgrade them the way you can with RAM sticks. (Though there is something similar.) They should be treasured as much as a high-clocked CPU. You can think of RAM as junk compared to them.

When reading or writing data, of course we choose the fastest destination: the closest cache is checked first. But its size cannot hold all the memory locations our app needs, so it instead holds just a part of a bigger cache, and eventually of RAM. (All the variables your app allocates should fit in RAM somehow, or else you fall back to the VERY SLOW disk.)

This "part of" relationship therefore needs some rules for who gets to stay and who must go. Basically, when you request access to a memory address, that memory gets to stay.

Write : At the application's start you would write first, or else there would be nothing to read. This write may be data read from disk (not scratch memory). It goes into the closest cache, which is still empty. If you write too much, older data needs to be moved out into a larger cache, and eventually RAM, to make space for new writes.

When we want to read something, we also check the fastest cache for that memory. If it is still there, read it. If not, find it in a larger cache and eventually in RAM.

A read or write needs a size. (You don't read 1 bit or 1 byte at a time.) Of course there are many variable sizes, like int or long, which suggest each read or write could be that granular. But for efficiency, transfers happen in a fixed-size block: the cache line. A cache line is typically 64 bytes. (This article assumes that size from now on, though it is not always the case.) It is then up to you to access the small part you wanted after requesting the whole block.

The mentioned blocks are lined up in a grid of fixed boundaries. If you read an int (4 bytes), you don't always get those 4 bytes plus the 60 bytes after them; imagine instead that the int sits somewhere in the middle of a grid cell, and you get the whole 64-byte cell that contains it.

Therefore, if you read an int in a struct that contains a float or bool right next to it, and you also need that float or bool a bit later, you get them for free. (Free as in already guaranteed to be in the fastest cache by the previous read/write.)

This is where you waste a lot of free reads in OOP, since you keep picking things out with the . notation without caring about memory. You only care about objects and how they work, not how they perform.

The hardware may even predict and read further ahead than you asked for (beyond the cache line you got for free), in places like a for loop, because it knows you are going to need it. Accessing memory locations next to each other, and using them fully, gets you performance.

There are many cache replacement policies, but some patterns emerge: if you access something often, it is likely to be fast (the replacement rules). If you access something close to the previous thing, it is also likely to be fast (the cache line and prefetching). These are temporal and spatial locality.

In ECS, we will exploit all of this to the best of our abilities: use the fastest cache, cause less cache replacement, abuse the localities. By using the library with memory awareness, we can get the performance.

Upcoming in part 2 : Unity ECS

With all this knowledge in place, the next article will use the Unity Entities package API while being aware of what's happening at every line of code. Then we will top it up with some example situations and see if you can reason about the memory.

If you want to know more about data-oriented design in general without diving into the memory produced by Unity's API yet, check out this very good book :

Or you could read this previous article. I don't think it was clear enough, which led me to write this 2-part article as an upgrade. (I'm still deciding whether to delete that one.) But I would still like to say a similar thing: in DOD, you are designing memory areas.