Ayfid: This is not true. Components that are not accessed are not loaded at all.

Not loaded by your code, sure, but they are loaded into the CPU cache because the memory is adjacent. That eats up a significant amount of cache with memory you aren’t accessing, which in turn makes accessing the rest even slower; at times the cache lines pulled into the closer caches won’t even hold a single entity, etc., etc.
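
To make the cache point concrete, here is a toy back-of-the-envelope sketch. It assumes a naive interleaved per-entity record; real archetype chunks lay components out SoA within each chunk, but the same arithmetic governs how much unwanted data sits between the values a loop actually reads. All names and sizes here are illustrative, not taken from any real engine.

```rust
// Hypothetical sizes: how many bytes a position-only loop drags through
// cache under an interleaved layout versus a dense per-component pool.

const POSITION: usize = 12;          // float3
const OTHER_COMPONENTS: usize = 116; // everything else stored alongside it
const CACHE_LINE: usize = 64;

// Interleaved: each entity's record spans position + neighbours, so a
// cache line holds well under one useful position.
fn bytes_touched_interleaved(entities: usize) -> usize {
    entities * (POSITION + OTHER_COMPONENTS)
}

// Dense pool: only positions are adjacent, so loads are ~100% useful.
fn bytes_touched_dense(entities: usize) -> usize {
    entities * POSITION
}

fn main() {
    let n = 10_000;
    println!(
        "interleaved: {} KiB, dense: {} KiB, positions per cache line: {} vs {}",
        bytes_touched_interleaved(n) / 1024,
        bytes_touched_dense(n) / 1024,
        CACHE_LINE / (POSITION + OTHER_COMPONENTS), // 0: record spans > a line
        CACHE_LINE / POSITION                       // 5
    );
}
```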

Ayfid: Component changes do not happen frequently in heavily modded minecraft games.

Vanilla Minecraft (Java) does not use an ECS. Interestingly, vanilla MC Bedrock/PE does use EnTT, though I’ve no clue in what form they use it (I would expect quick component swapping in a variety of areas, especially in the AI to determine actions based on various states, unless they went the Java-edition way, where everything is tested every single tick, yay efficiency…).

Ayfid: A tiny percentage of the total entities in a scene will go through a state transition which is well represented via a component change within an individual frame, whereas every entity is iterated over numerous times. A typical FPS game, for example, does not have more than a fraction of a percent of its active entities go through some kind of major state transition which will cause it to change its behaviour, within a frame.

It’s not just behaviour changes; components also often hold various data like what an entity is targeting, what’s targeting it, what navmesh path to use, lots of events, etc., etc. A lot of that can change extremely rapidly on a lot of entities. With many components in use, that means a lot of data being needlessly copied.

Ayfid: More importantly, I think you are overestimating the performance differences between the two designs when it comes to these kinds of use cases:

I’ve used both styles in the past (mostly between the late 90s and early 2010s; real life has slowed down my work on such things for the past six or so years). The archetype style is faster in certain styles of engines, but those engines are very inflexible: once you start stepping outside their design, the performance hit becomes rather extreme. And if you are going to follow that pattern anyway, it’s often better just to use the dataflow pattern directly, as referenced in my prior post, since it has even less overhead.

Ayfid: In cases where your component add/remove will move the entity in or out of an (owned) group, the work swapping the entity’s components around the group boundary is fundamentally very similar to the work required to move the entity’s components between archetypes. In this case, the two designs converge into performing essentially the same operation.

They are definitely not similar. In an archetype-based design you are moving all of an entity’s components; in a group-based design you are only moving the components referenced by the group that are not already in order (and if something keeps getting enabled/disabled, it’s often already in order anyway), which is significantly fewer components.

Ayfid: You only gain fast component add/remove performance in cases where the component does not belong to a group. In such cases, you only retain high iteration performance in loops which only access that one component - something which will be extremely rare if you are using these high-frequency components as marker tags for state transitions. If a loop requires access to any other component, then it will perform indirection. The per-entity lookup into an indirection table, followed by an index into the actual component data array, will be significantly slower than a linear iteration through Option components with a branch inside the loop body. In comparison to that, insertions and deletions are also actually far slower than setting the option.

Not at all; adding and removing components is still extremely fast even when they are in groups. Only the groups the component belongs to are touched, and only the components owned by those groups are moved, which is generally a significantly smaller amount of memory than in an archetype model.

Such an indirection is not that expensive either (it is in mine because I’m not using unsafe code; in an actual release, unsafe code would be used in a few of these parts to remove useless bounds checks). In the standard group view there are no conditionals, just a couple of offsets (additions) and a lookup in the index and then the main array. Specs, for example, does the lookups faster thanks to use of unsafe on certain calls (and I could get faster than it, as this system knows more about the structure). Even then, secondary index lookups are comparatively rare, as the hot paths should always be owned groups, a pattern that holds among users of EnTT (both games and engines based on it).
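
For illustration, the secondary-index lookup in a sparse-set pool boils down to two array reads. This is a hypothetical minimal sketch, not EnRS or EnTT code; note that in safe Rust both indexing operations carry bounds checks, which is exactly the overhead unsafe code would shave off.

```rust
// Minimal sparse-set lookup sketch (hypothetical, not the real implementation).
// When a group guarantees the entity has the component, the lookup is just:
// sparse[entity] -> dense index -> data. No conditionals, pure offsets.

struct Pool<T> {
    sparse: Vec<usize>, // entity id -> index into `dense`
    dense: Vec<T>,      // packed component data
}

impl<T> Pool<T> {
    // No existence check: the caller (a group) already guarantees membership.
    // In safe Rust, each [] still performs a bounds check.
    fn get_unchecked(&self, entity: usize) -> &T {
        &self.dense[self.sparse[entity]]
    }
}

fn main() {
    // Entity 0 lives at dense slot 2, entity 1 at slot 0, entity 2 at slot 1.
    let pool = Pool { sparse: vec![2, 0, 1], dense: vec![10, 20, 30] };
    println!("{}", pool.get_unchecked(0)); // 30
}
```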

Ayfid: So there is not really any realistic situation where the grouped design gives you better performance when you are using high-frequency marker types to transition entity state. Either you have good iteration performance, and insertion times converge in both designs, or you would be better off just using Option.

There are quite a significant number of realistic situations where the grouped design is faster when using high-frequency marker types. (Do note: for a ZST marker type, the secondary index ‘is’ the store recording whether it exists; there is no other data stored, so there is no indirection either, and the same applies if you only want to know whether a component exists without accessing its data.) There is significantly less data to copy on change, and iteration is still faster than in an archetype style due to the significantly lower stride length (stride length == data size in a grouped ECS, while stride length is significantly larger than data size in an archetype ECS, which is why the iteration benchmarks always show grouped faster than archetype), among other design choices that can be made in certain situations and can’t be in archetype styles.
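
A minimal sketch of the ZST-marker point: with a zero-sized tag there is no component data at all, so the sparse set’s membership record is the whole component, and add/remove is a couple of index writes plus a swap-remove. Hypothetical code, not the actual EnRS implementation.

```rust
// Sketch: for a zero-sized marker component, the sparse set *is* the component.

const ABSENT: usize = usize::MAX;

struct MarkerSet {
    sparse: Vec<usize>, // entity id -> dense slot, or ABSENT
    dense: Vec<usize>,  // packed entity ids carrying the marker
}

impl MarkerSet {
    fn new(capacity: usize) -> Self {
        MarkerSet { sparse: vec![ABSENT; capacity], dense: Vec::new() }
    }
    fn insert(&mut self, entity: usize) {
        if self.sparse[entity] == ABSENT {
            self.sparse[entity] = self.dense.len();
            self.dense.push(entity);
        }
    }
    fn remove(&mut self, entity: usize) {
        let slot = self.sparse[entity];
        if slot != ABSENT {
            // swap-remove: O(1), no other component data to shuffle around
            let last = *self.dense.last().unwrap();
            self.dense.swap_remove(slot);
            self.sparse[last] = slot;
            self.sparse[entity] = ABSENT;
        }
    }
    fn contains(&self, entity: usize) -> bool {
        self.sparse[entity] != ABSENT
    }
}

fn main() {
    let mut burning = MarkerSet::new(8);
    burning.insert(3);
    burning.insert(5);
    burning.remove(3);
    println!("{} {}", burning.contains(3), burning.contains(5)); // false true
}
```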

Ayfid: It is actually easier to mix usage of Options for high-frequency state marking and component addition/removal for low-frequency states in order to perform better entity culling in the loop with an archetype model than a group model, because you don’t need to worry about any combination of states pushing you out of the fast path due to incompatible group definitions (not to mention the combinatorial explosion of such group definitions).

Using Option has a three-fold speed hit:

1. You are iterating over things you don’t care about.
2. You are adding a conditional to check whether it is None or Some when no conditional is needed (and a potentially very unpredictable conditional at that, so that’s awesome for blowing out the CPU pipeline).
3. You have yet more data that you don’t care about to stride over in every single other iteration elsewhere.
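
A small sketch of the two styles, with a made-up Damage component for illustration: the Option version branches on every slot and strides over dead entries, while a dense pool of only the damaged entities needs neither.

```rust
// Illustrative types; not from any real engine.
#[derive(Clone, Copy)]
struct Damage(u32);

// Style 1: Option per entity. A branch inside the loop, dead slots loaded.
fn total_damage_option(damage: &[Option<Damage>]) -> u32 {
    damage
        .iter()
        .map(|d| match d {
            Some(dmg) => dmg.0,
            None => 0, // wasted work for every undamaged entity
        })
        .sum()
}

// Style 2: dense pool of just the damaged entities. No branch, no dead data.
fn total_damage_dense(damage: &[Damage]) -> u32 {
    damage.iter().map(|d| d.0).sum()
}

fn main() {
    let sparse = vec![None, Some(Damage(4)), None, None, Some(Damage(6))];
    let dense = vec![Damage(4), Damage(6)];
    assert_eq!(total_damage_option(&sparse), total_damage_dense(&dense));
    println!("both: {}", total_damage_dense(&dense)); // both: 10
}
```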

Ayfid: You most certainly would not want to do that! By attaching additional components onto an entity, you are associating both data and new behaviour with that entity. New behaviour which is not needed. Entities which don’t rotate are now all having their transforms calculated with a much more expensive computation than they need. All your entities with a position are now having their acceleration and velocity integrated, even though most entities in a scene are static and will have 0 velocity. The extra work being performed here will totally overshadow any performance differences between the two ECS architectures.

Transformations already include rotation; that’s part of being a transformation. Unless you meant a Translation component (linear translation without rotation or scale), in which case you have to pack it into a transformation matrix anyway for rendering, which just makes that step slower (and translating a transformation matrix isn’t any slower than translating a vector in entity space). I’ve never seen rotation separated from translation; I’ve only ever seen it packed into a transformation matrix (generally with a changed flag, in non-ECS engines or ECSs that lack update indicators, for uploading to the GPU on change, and that step is much more costly when the values are separated).

Entities with only a position would not have acceleration and velocity integrated. Combining Position, Orientation, and Scale into a single Transformation component (say, a 4x4 matrix) does not mean also combining in Velocity and Acceleration (and those two would themselves generally be combined into a single PhysicsMotion component or so). Making a group of Position/Orientation/Transform just means you have all the information needed to render the object; making another group of Position/Orientation/Transform/Velocity/Acceleration just means you can run physics over everything that needs physics (otherwise remove the Velocity/Acceleration).
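
As a sketch of the split described above (the component names Transform and PhysicsMotion are hypothetical), only entities that actually own both components would ever enter the integration loop:

```rust
// Spatial state packed into one component; motion kept separate so that
// static entities never carry, or pay for, physics data.

struct Transform {
    matrix: [[f32; 4]; 4], // position + orientation + scale packed together
}

struct PhysicsMotion {
    velocity: [f32; 3],
    acceleration: [f32; 3],
}

// The physics system runs only over the subgroup owning both components;
// render-only entities have a Transform and nothing else.
fn integrate(motion: &mut PhysicsMotion, transform: &mut Transform, dt: f32) {
    for i in 0..3 {
        motion.velocity[i] += motion.acceleration[i] * dt;
        transform.matrix[3][i] += motion.velocity[i] * dt; // translation row
    }
}

fn main() {
    let mut t = Transform { matrix: [[0.0; 4]; 4] };
    let mut m = PhysicsMotion { velocity: [1.0, 0.0, 0.0], acceleration: [0.0; 3] };
    integrate(&mut m, &mut t, 1.0);
    println!("x = {}", t.matrix[3][0]); // x = 1
}
```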

Where you say:

All your entities with a position are now having their acceleration and velocity integrated, even though most entities in a scene are static and will have 0 velocity.

This makes absolutely no sense, or I’m confused about what you are saying. A grouping of Position/Orientation/Transform/Velocity/Acceleration does not make Position/Orientation/Transform slower, and you absolutely should not have Velocity/Acceleration on things that are not moving at all. Remember that subgroups are just that: a group within a group. Since Position/Orientation/Transform already contains everything the physics set needs, the Position/Orientation/Transform/Velocity/Acceleration subgroup can simply keep its entities sorted together within the outer group (a sorted region within another group, with as many layers of that as you want). It does not mean they are connected in any other way.
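
The ‘sorted region within another group’ idea can be sketched as index boundaries over one packed entity array. This is a simplified illustration; real implementations such as EnTT maintain this ordering across per-component pools.

```rust
// One packed array, with index boundaries marking nested groups.
// Entities [0, physics_len) have the full physics set;
// entities [0, render_len) have at least the render set.

struct Groups<'a> {
    entities: &'a [u32],
    render_len: usize,  // outer group: Position/Orientation/Transform
    physics_len: usize, // nested subgroup: ... + Velocity/Acceleration
}

impl<'a> Groups<'a> {
    fn renderable(&self) -> &[u32] {
        &self.entities[..self.render_len]
    }
    fn physical(&self) -> &[u32] {
        &self.entities[..self.physics_len]
    }
}

fn main() {
    // 5 renderable entities; only the first 2 also simulate physics.
    let g = Groups { entities: &[7, 3, 9, 1, 4], render_len: 5, physics_len: 2 };
    println!("{:?} {:?}", g.physical(), g.renderable()); // [7, 3] [7, 3, 9, 1, 4]
}
```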

Considering you are saying things like:

All your entities with a position are now having their acceleration and velocity integrated, even though most entities in a scene are static and will have 0 velocity.

This makes me think there is a misunderstanding of how grouping works. Entities that don’t move shouldn’t have such physics components at all, and I don’t understand why you would require them on everything with a position.

Ayfid: Additionally, if you placed all position/orientation/transform into a single component, you are going to cause most loops which access this component to pull in far more data than they need. Most logic which needs position, for example, only needs the float3 translation - not the orientation and certainly not a full 4x4 32bit float transformation matrix. That matrix alone is an entire cache line.

For something like, say, a block position in Minecraft, you’d absolutely not have rotation (well, maybe a 24-state rotation for rendering and interaction purposes, but that’s parallel to this), and as such it would be its own component, distinct from entities that have a full transformation (like a zombie). Even if for some reason someone gave an integral block position a full floating-point transformation matrix, the data stride still easily fits into cache, with load times of less than 100 ns per cache line, which happens much faster than the work done over that data, considering it’s only 64 bytes. Though you definitely should not combine those, as they are conceptually very different in the game.

In addition, even on my 13-year-old desktop CPU the L1 cache stride is 128 bytes and takes 3 cycles to fill, much faster than it matters, and newer CPUs are much, much better, especially for aligned data like this.

Ayfid: Splitting data into granular pieces so that you only load data you need, and only iterating over the entities which actually need the computation you are performing, are perhaps the central tenets of data-oriented design that the ECS tries to facilitate. If your ECS is pushing you to organise your code sub-optimally in order to stay on its fast path, then that is a critical failure of the ECS’s design.

Which is precisely what the archetype model gets wrong: all data is packed together with huge stride sizes. This shows in benchmark case 8:

Entity-Iterate-10000/EnRS-OwningGroup/8 time: [71.216 us 71.800 us 72.502 us]
Entity-Iterate-10000/Legion/8 time: [106.27 us 107.23 us 108.25 us]

(107.23 : 71.8, so the owned groups are almost 50% faster.)

This should be an entirely optimal case for Legion; the reasons mine is faster are:

1. A significantly smaller stride length, so I can load into cache exactly what I want.
2. I don’t have to test the component sets on the chunks for disparate archetypes (not an issue in this test, since every entity has identical components). What happens with 2^5 = 32 different archetypes is shown by case 7:

Entity-Iterate-10000/EnRS-OwningGroup/7 time: [1.6935 us 1.6998 us 1.7059 us]
Entity-Iterate-10000/Legion/7 time: [3.5821 us 3.6087 us 3.6373 us]

(3.6087 : 1.6998; mine is roughly 2.12x as fast, i.e. about 112% faster.)

This is a very important bit about archetype ECS’s:

And as you add more archetypes (of which there would be thousands, if not tens of thousands, in some of my engines), it will just get slower and slower uniformly throughout, even without doing anything differently. Archetype count doesn’t matter for owned groups; what matters is how well the programmer structures the data (which isn’t hard at all once you’re used to it). By comparison, even the slowest of all accesses in a grouped system (a secondary index on a completely unsorted pool) is still constant time (as are all the other accesses).

Ayfid: Excepting my above objection, which largely applies to this group layout, this is only possible in a very limited case where you can arrange your two loops such that one is running over a superset of the other. What about the other hundred loops in the game which access position? What about your AI code, gameplay logic, rendering, etc? A component can only be owned by one group, and all other cases perform indirection (some worse than others) and so are on the slow path. It is impossible to optimise for all cases, but the grouped design will eke out a small amount of extra performance in one (or two) loops at the expense of performing terribly everywhere else.

Excepting my above objection, which largely applies to this group layout, this is only possible in a very limited case where you can arrange your two loops such that one is running over a superset of the other.

This is an extremely common case in my experience and that of others, across many games and engines (I invite you to the EnTT Gitter chat). Sure, the programmer has to actually think about how to structure their data, but they have to do that anyway, and in exchange they get more speed and more direct code.

What about the other hundred loops in the game which access position? What about your AI code, gameplay logic, rendering, etc?

This is actually why the transformation components usually sit at the ‘bottom’ of a set of groups: you can then group over them with a huge variety of other components, getting perfect iteration with perfect strides the whole way. You can structure a surprisingly large amount of the engine this way.

Ayfid: A component can only be owned by one group, and all other cases perform indirection (some worse than others) and so are on the slow path.

The ‘owned’ case is very common. You can have many subgroups, not just a single layer but potentially many. In addition, a secondary-index lookup across groups involves only a single indirection for anything accessed through that group, not one per component. In my benchmarks, the owned-group test and the indirect test are literally the two extremes, the best- and worst-case scenarios; in reality performance falls between them, and most often very near the owned-group side, as the hot paths in the program should always be owned groups. It is under the programmer’s control, rather than just hoping something else takes care of it with an access cost that keeps growing as the archetype count and stride sizes increase.

It is impossible to optimise for all cases, but the grouped design will eke out a small amount of extra performance in one (or two) loops at the expense of performing terribly everywhere else.

Not one or two; a significant number of the loops will have full owned performance. If only one or two out of the usual hundreds of iterations are owned, then the engine is designed entirely backwards. Take, for example, a Minecraft-like engine: you wouldn’t have, say, a tube block constantly asking the world for the block at the next position; that would be extremely slow. Not even Minecraft mods do that, as it is dreadfully slow; they cache a pointer to the tile entity that sits beside them, just as you’d do in an ECS-style version of the engine: you’d hold the entity that sits next to you and update it when it changes, just as MC mods do now.
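
The neighbour-caching pattern described here can be sketched as follows. Tube and the method names are hypothetical, and in a real ECS the refresh would be driven by a block-changed event or observer rather than called directly.

```rust
// Instead of looking the adjacent block up by position every tick, a tube
// stores the entity id of its neighbour and refreshes it only when told
// the neighbourhood changed. Entity is just an id here.

type Entity = u32;

struct Tube {
    cached_neighbour: Option<Entity>,
}

impl Tube {
    // Cheap hot path: no world/position lookup per tick.
    fn neighbour(&self) -> Option<Entity> {
        self.cached_neighbour
    }
    // Called only from a block-changed event, not every tick.
    fn on_neighbour_changed(&mut self, new: Option<Entity>) {
        self.cached_neighbour = new;
    }
}

fn main() {
    let mut tube = Tube { cached_neighbour: None };
    tube.on_neighbour_changed(Some(42));
    println!("{:?}", tube.neighbour()); // Some(42)
}
```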

Ayfid: They are an order of magnitude slower in your own benchmark.

Yes, one order, and that is because I’m using completely safe Rust even where I absolutely know the accesses have no bounds issues but the compiler does not (confirmed by looking at the assembly). There are multiple things in these paths I can speed up with unsafe code that is entirely sound; I’m under #![forbid(unsafe_code)] right now to speed things up as much as I can that way first, and only then will I add unsafe code in areas I know to be entirely safe. The fact that it’s already this fast while hitting secondary indexes it doesn’t need to is still astounding to me. The C++ version is still a lot faster on those accesses; at worst I should be able to match its speed (although my owned-group access is faster than in C++, likely due to Rust’s aliasing guarantees). Even if they were never optimized, they’d still be entirely sufficient for the non-hot paths; otherwise, groups should be used.

Ayfid: That is certainly not how any of it is described in the EnTT blog posts. If you do not keep the components that are part of a group sync’d in the same order for each entity, then you cannot iterate through the slices together. You would need to look up the correct index in an indirection table of the sparse set and then perform a random access. Doing so throws away your performance, just the same as if an archetypal ECS somehow had a unique layout for literally every entity.

That is certainly not how any of it is described in the EnTT blog posts.

Wait, what? Can you link that post? I know his blogs talk about a lot of ideas and not necessarily his implementation, but when his posts speak of EnTT specifically he’s usually quite clear about it, so definitely link the article (and where in the article) and I’ll tell him about it.

If you do not keep the components that are part of a group sync’d in the same order for each entity, then you cannot iterate through the slices together.

The ‘entire’ pools aren’t kept synced; in my and EnTT’s implementations, only the ‘end’ of the packed pools is kept sorted. Only the part the group cares about is sorted at all; the rest can be sorted an entirely different way, or left unsorted.

just the same as if an archetypal ECS somehow had a unique layout for literally every entity.

Which can happen with enough permutations of enough components. ^.^

Ayfid: To my understanding, this is what unowned groups do. The only advantage of unowned components in a group vs not being in a group at all was, as far I as could see, that the iterator does not need to iterate over all components of one type (whichever has the fewest instances) and perform the indirection check against all other components. Instead the iterator can assume that all entities in the group have all components. The accesses are still out of order, though, so this still performs poorly.

Purely unowned groups are fairly rare; generally you will have some combination of owned and unowned (and excluded) components. But even in the pure unowned case, the group still knows beyond any shadow of a doubt that the pools contain the entities, so the secondary index lookup becomes pure math: no conditionals, no CPU pipelining failures. (In my purely safe version there are bounds checks, though the rest of my lookup code has no conditionals in those cases; I haven’t shown unowned benchmarks yet, as I still need to make the coding API pretty for them.) That becomes ‘almost’ as fast as pure iteration for low component counts. The worst case is around a 100-cycle load per component access, when accessing components in a huge pool that doesn’t fit in cache, in a pattern that only touches what isn’t cached; in real life that is actually crazy rare, and it’s normally fairly close to owned iteration times, within a multiple or two on average.

So even pure unowned groups are a great deal faster than the worst case of an indirection lookup plus an existence test every time, like my Indirect test does (quite literally the worst-case access in entirely safe Rust, with lots of conditionals, tests, and lookups all over the place; I’m honestly surprised at how fast it already is, and it definitely has quite a number of optimization opportunities via unsafe code).

Ayfid: No, that is not extremely common. It is common to have entities with maybe 50 or so components because, say, your AI navigation code alone uses a set of 12 components (and there are a few modules interacting with that entity at a similar scale). However, most entities with this AI navigation use those same 12 components, or only 2 or 3 different variants. The number of unique layouts is not anywhere near that large, and you typically have a healthy number of entities in each chunk, with only a very small number of poorly occupied chunks.

For the AI systems I’ve dealt with in the past there were around 150 components just for handling AI, nowhere near 12. (Even Minecraft, which is not an ECS, allocates almost a hundred classes to handle its comparatively meager AI; each mob type has a different ‘archetype’ of them, which in modded worlds means many hundreds in most setups. In some cases, like Infernal Mobs, it can be thousands to tens of thousands of ‘archetypes’, as the effects it adds are dozens of different AI handlers that can be attached in any permutation, randomly, to any spawned entity.)

Ayfid: Take for example your average FPS. You might have an entity which represents, say, a chair prop. That chair will likely have a transformation matrix component, a model, perhaps a handle to a rigid body in the physics engine, and maybe a component which indicates the material for the sound engine to play the correct sound effects if the player bumps into it. That is about it; a handful of components. Entities like this make up the majority of entities in the scene. There are going to be many other props, in different locations and with different models and physics representations, all with the same layout. That layout also virtually never changes. Maybe if the player shoots the chair, it breaks. This would likely be implemented by deleting the entity and replacing it with a few “chair parts” entities.

I’m not much for FPS games, but I would imagine a chair as a static entity with no functionality. Sure, it’s common, but it’s not where the cost is going to be, as it does nothing until, say, it’s interacted with or receives a physics event or so (at which point I’d imagine you’d ‘add’ physics functionality to it, rather than giving it physics from the start and letting it sit there eating its tiny bit of the physics simulation; but eh, I’m not an FPS person, so I’m unsure what is common there). And conveniently, these kinds of things would be in an owned subgroup, so they’d be iterated perfectly, no chunk jumps or anything needed (although why they’d need to be iterated over at all, I’m unsure…).

Ayfid: Even highly dynamic entities do not typically go through component layout changes. Say your game is a multiplayer game with 20 players. They are all firing machine guns simultaneously. You represent every bullet fired with an entity (which is totally reasonable). Each frame, you might have at most 20 new bullets fired, with more likely just 1 or 2. You create a handful of bullet entities. All of the active bullet entities update their positions, they update their transforms, they are rendered by a particle system, the audio engine plays the next frame of their audio and spatialises it at their current position. Some of those bullets hit something in this frame. New “bullet hit event” entities are spawned and those bullets are deleted. At no point was a component added or removed from an existing entity. However, multiple loops in multiple different modules iterated through all of those bullets.

Even highly dynamic entities do not typically go through component layout changes.

Highly dynamic entities are generally swapping components on and off all the time for various state handling. For an FPS I’d imagine that, for example, there is a Health component that holds max health, and a Damaged component that only exists when the entity is damaged, holding how much damage it has taken and perhaps when it last took damage. There could be a system that iterates over entities that have the Health and Damaged components but not a Regenning component and checks the time; if it’s been more than, say, 2 seconds (or whatever an FPS’s delay before HP regeneration is nowadays), it adds a Regenning component. Another system then operates over entities with Health, Damaged, and Regenning components, healing the damage until the entity is fully healed, at which point it removes the Damaged and Regenning components.
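
As a sketch of that state machine (all component names are the hypothetical ones from the description, and the stores are plain HashMaps for brevity rather than a real ECS’s packed pools):

```rust
use std::collections::HashMap;

type Entity = u32;

struct World {
    health: HashMap<Entity, f32>,         // max health
    damaged: HashMap<Entity, (f32, f32)>, // (damage taken, time of last hit)
    regenning: HashMap<Entity, ()>,       // marker component, no data
}

// System 1: Health + Damaged, not Regenning -> start regen after 2 seconds.
fn start_regen(world: &mut World, now: f32) {
    for (&e, &(_dmg, last_hit)) in &world.damaged {
        if world.health.contains_key(&e)
            && !world.regenning.contains_key(&e)
            && now - last_hit > 2.0
        {
            world.regenning.insert(e, ());
        }
    }
}

// System 2: Health + Damaged + Regenning -> heal; remove both when done.
fn apply_regen(world: &mut World, heal: f32) {
    let regen: Vec<Entity> = world.regenning.keys().copied().collect();
    for e in regen {
        let fully_healed = match world.damaged.get_mut(&e) {
            Some((dmg, _)) => {
                *dmg -= heal;
                *dmg <= 0.0
            }
            None => false,
        };
        if fully_healed {
            world.damaged.remove(&e);
            world.regenning.remove(&e);
        }
    }
}

fn main() {
    let mut w = World {
        health: HashMap::from([(1, 100.0)]),
        damaged: HashMap::from([(1, (30.0, 0.0))]),
        regenning: HashMap::new(),
    };
    start_regen(&mut w, 3.0); // 3s since last hit -> Regenning added
    apply_regen(&mut w, 30.0); // heals fully -> both components removed
    println!("{}", w.damaged.is_empty() && w.regenning.is_empty()); // true
}
```

The point being: neither system ever touches an entity that lacks the components it asks for, so no per-entity branching on state is needed.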

And that example is just one of many, many I would foresee on just a normal FPS player entity. All kinds of events, states, and actions should be components, so systems only operate on entities they are interested in; you want to remove as many conditional checks as possible, especially the more unpredictable ones, and reduce memory loading (which an archetype ECS fails at horribly, due to loading huge amounts of unwanted data into the cache because of its stride lengths).

Ayfid: I am not sure what kind of games you build for these toy games; they sound like they might be doing something rather interesting. But they are certainly not typical.

I tend to make sandboxes like heavily modded Minecraft (though mostly in 2D; I’ve only ever made one 3D one, with procedurally generated cubic planets and solar systems and such, down to life on each), and long ago I made Factorio-style things. Mostly I’ve made self-running simulations; I like watching things unfold and evolve under their own code and genetic algorithms. ^.^

So yeah, perhaps not typical, but intensely fun for me and my friends and family. I need to get my old CVS server running again to pull off its old projects and migrate my SVN server to git someday…

Ayfid: Allocating memory in pages is something orthogonal to grouped vs archetypal.

Quite true.