-
-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Contiguous access #21984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Contiguous access #21984
Conversation
|
This pr just enables slices from tables to be returned directly when applicable, it doesn't implement any batches and it doesn't ensure any specific (other than rust's) alignment (yet these slices may be used to apply simd).
This pr doesn't deal with any alignments but (as of my understanding) you can always take sub-slices which would meet your alignment requirements. And just referring to the issue #21861, even without any specific alignment the code gets vectorized.
No, the returned slices do not have any specific (other than rust's) alignment requirements. |
|
The solution looks promising to solve issue #21861. If you want to use SIMD instructions explicitly, alignment is something you usually have to manage yourself (with an aligned allocator or a peeled prologue). Auto-vectorization won’t “update” the alignment for you – it just uses whatever alignment it can prove and otherwise emits unaligned loads. From that perspective, a contiguous slice is already sufficient; fully aligned SIMD is a separate concern on top of that. |
hymm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a full review, but onboard with the general approach in this pr. Overall this is fairly straightforward. I imagine we'll eventually want to have some simd aligned storage, but in the meantime users can probably align their components manually.
|
You added a new example but didn't add metadata for it. Please update the root Cargo.toml file. |
|
It looks like your PR has been selected for a highlight in the next release blog post, but you didn't provide a release note. Please review the instructions for writing release notes, then expand or revise the content in the release notes directory to showcase your changes. |
hymm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a review of the changes to ThinSlicePtr. I'll try to review more later.
crates/bevy_ptr/src/lib.rs
Outdated
| #[cfg(debug_assertions)] | ||
| assert!(len <= self.len, "tried to create an out-of-bounds slice"); | ||
|
|
||
| // SAFETY: The caller guarantees `len` is not greater than the length of the slice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this safety comment you should also be asserting why all of these bullets points are met for from_raw_parts. The caller is only covering for the last bullet point. https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html#safety
data must be non-null, valid for reads for len * size_of::() many bytes, and it must be properly aligned.
data must point to len consecutive properly initialized values of type T.
The memory referenced by the returned slice must not be mutated for the duration of lifetime 'a, except inside an UnsafeCell.
The total size len * size_of::() of the slice must be no larger than isize::MAX, and adding that size to data must not “wrap around” the address space. See the safety documentation of pointer::offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be good now
crates/bevy_ptr/src/lib.rs
Outdated
| } | ||
|
|
||
| /// Casts the slice to another type | ||
| pub fn cast<U>(&self) -> ThinSlicePtr<'a, U> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this needs to be unsafe. This has a safety requirement that T and U have the same layout and valid bit representations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless converted to reference, the pointer's type doesn't have any impact. The NonNull::cast is safe as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made cast unsafe (and renamed to cast_unchecked) due to get_unchecked, as_slice_unchecked and as_mut_unchecked not having the requirements of ensuring that the bytes under slice's pointer are valid.
crates/bevy_ptr/src/lib.rs
Outdated
| /// # Safety | ||
| /// | ||
| /// - There must not be any aliases to the slice | ||
| /// - `len` must be less or equal to the length of the slice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this sound? we're basically converting a &[UnsafeCell<T>] to a &mut [T]. Feels like that shouldn't be sound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any tests that mutate the change ticks? I would expect miri to catch this if we do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UnsafeCell<T> has the same in-memory representation as the underlying type T, hence the both slices must be represented the same as well. A non-mutable reference to UnsafeCell<T> is sound to be converted to a mutable reference of the underlying type T in an unsafe environment. One of the tests I added should now also change the ticks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is considered UB. This comment on UnsafeCell says that you should only get a *mut T from a UnsafeCell through get or raw_get. https://doc.rust-lang.org/std/cell/struct.UnsafeCell.html#memory-layout
edit: I guess it might be fine since we're going through UnsafeCell::raw_get?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I genuinely don't know whether it is even considered to be safe (de facto it is safe for now) to cast &[UnsafeCell<T>] to &[T] (It might be a problem when &[bool] is represented as a bitset (which is in theory possible in the future versions of rust) because &[UnsafeCell<T>] (according to the current semantics) doesn't allow the outer type to change the memory owned by the cell, thus &[UnsafeCell<T>] is not a bitset but an array of u8 each of which represent a single bool). I don't know how reasonable it is to actually take that into the consideration.
I changed the casting thing (which shouldn't have been a problem) but it still might be not fully safe in the future at least (because of the thing I said).
To actually solve this safety thing (I mentioned first) for sure, we would have to introduce a new type which wraps UnsafeCell<T> and then implements Deref<T> and DerefMut<T> (for mutable references) (both of which calls UnsafeCell's methods, which always have defined behavior) and acts like a reference to T but is an UnsafeCell<T> (basically a transparent impl of UnsafeCell<T>), which (as to my knowledge) should also work in all possible versions of rust, but then all slices of components (i.e., &[T]) will have to be &[UnsafeCellRef<T>], which adds another abstraction layer (The same for mutable slices of components &mut [T] to &mut [UnsafeCellRef<T>]) (UnsafeCellRef for now is a placeholder name of course).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To solve the thing you mentioned I made cast (called cast_unchecked previously) method only work for UnsafeCell<T>'s slices. The method now uses UnsafeCell::raw_get method.
It is also not marked unsafe, because as for know the program assumes that &[UnsafeCell<T>] is always the same as &[T], which is true (at least for now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the question is whether I need to introduce this new transparent type or it is sufficient to convert &[UnsafeCell<T>] to &[T]?
The wrapper type shouldn't mess up with how the compiler auto-vectorizes code
crates/bevy_ecs/src/query/iter.rs
Outdated
| } | ||
| } | ||
|
|
||
| /// Returns a contiguous iter or [`None`] if contiguous access is not supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way to make the check on the compilation stage is to make Component::STORAGE_TYPE constant be a sealed type (like Component::Mutability does) which would allow to implement ContiguousQueryData only for table components without sparse set components (Basically components are the only reason [as to my knowledge] why it has to be checked in the runtime).
Is it worth to make the Component trait more complex or is it okay for it to stay that way (i.e., be checked in the runtime)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it okay for it to stay that way (i.e., be checked in the runtime)?
I think QueryBuilder and transmute_lens will force this to be a runtime check no matter what. It's valid to create a Query<&DenseComponent> that uses sparse iteration by building with a sparse filter or transmuting from a query with a sparse filter. So it's not possible to know for certain at compile time that a query is dense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall implementation looks good, I just have a few nits and some ideas about framing.
Maybe this could be better framed as iterating over component storages, rather than specifically accessing slices of components? I think then we could support sparse components and wildcard queries as well, with a bit of work. The returned query items might also be better as typed lenses over Column (or over ReadFetch/WriteFetch maybe?), rather than transmuted into slices? That might help clean up ContiguousColumnTicks etc.
I also think the "Contiguous" naming, while accurate, adds a bit of a learning curve when we already teach users about how the ecs storage works. I think naming the things iter_storages(), StorageQuery, etc might make it them fit with the engine more generally? I care much less about this though and I'm not a maintainer, so feel free to ignore my opinion here 🙂
benches/benches/bevy_ecs/iteration/iter_simple_contiguous_avx2.rs
Outdated
Show resolved
Hide resolved
The problem with sparse components is that their iteration corresponds more to iterating over pointers to the actual data (i.e., Wildcard queries (I assume these are queries for all components of a single entity) may only be implemented for table components, because as I said either sparse set components would have to be copied or references to them would have to be put into some kind of a vector. So the current implementation uses slices because it is kind of a "first class citizen" within rust so it has a lot of functions implemented for it (slices). It might be better to make some kind of a wrapper over
In the current implementation "Contiguous" might say more (in my opinion) than just "Storage" because the whole point is that data returned lies contiguously. I have no idea of naming conventions within the engine though. Maybe "Column" might be an even better naming, because we effectively return an iterator over columns (But I find contiguous to be the one that describes the reasoning behind adding the feature in the first place the best) |
chescock
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, cool, this seems like it will be really powerful!
| /// } | ||
| /// ``` | ||
| /// | ||
| /// ## Adding contiguous items |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section is a little hard to understand without context. It might be more clear to phrase it as something like "Supporting contiguous iteration" and then include a doc link to the contiguous_iter method.
| /// ## Adding contiguous items | ||
| /// | ||
| /// To create contiguous items additionally, the struct must be marked with the `#[query_data(contiguous(target))]` attribute, | ||
| /// where the target may be `all`, `mutable` or `immutable` (see the table above). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to support creating contiguous items for only mutable or only readonly versions? Do we have any use cases where one can be contiguous and the other can't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all types T within the engine, if type T implements ContiguousQueryData then type T::ReadOnly does implement the trait as well, holds true. But:
- We cannot be sure that it is possible for all possible types implementing
ContiguousQueryDatafrom without the engine. - I did it to give an option to reduce amount of structs created by the macro.
| #[cfg(debug_assertions)] | ||
| unreachable!(); | ||
| #[cfg(not(debug_assertions))] | ||
| core::hint::unreachable_unchecked(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a SAFETY comment to the unreachable_unchecked() calls? It's safe because fetch_contiguous has a safety requirement that set_table is called first, right?
| /// - The first element of [`ContiguousQueryData::Contiguous`] tuple represents all components' values in the set table. | ||
| /// - The second element of [`ContiguousQueryData::Contiguous`] tuple represents all components' ticks in the set table. | ||
| unsafe impl<T: Component<Mutability = Mutable>> ContiguousQueryData for &mut T { | ||
| type Contiguous<'w, 's> = (&'w mut [T], ContiguousComponentTicks<'w, true>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be a named type instead of a tuple? Users will interact with this type during iteration, and there's no good place to document the meaning of tuple fields in an output type like this.
These are essentially slice versions of Ref and Mut, so maybe they could be named RefSlice and MutSlice?
crates/bevy_ecs/src/query/iter.rs
Outdated
| } | ||
| } | ||
|
|
||
| /// Returns a contiguous iter or [`None`] if contiguous access is not supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it okay for it to stay that way (i.e., be checked in the runtime)?
I think QueryBuilder and transmute_lens will force this to be a runtime check no matter what. It's valid to create a Query<&DenseComponent> that uses sparse iteration by building with a sparse filter or transmuting from a query with a sparse filter. So it's not possible to know for certain at compile time that a query is dense.
crates/bevy_ecs/src/query/iter.rs
Outdated
| // `tables` belongs to the same world that the cursor was initialized for. | ||
| // `query_state` is the state that was passed to `QueryIterationCursor::init` | ||
| // SAFETY: Refer to [`Self::next`] | ||
| unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move the unsafe block to cover just the unsafe calls? When it's broad like this, it's hard to match up the safety comments to the actual calls. I think it's set_table and fetch_contiguous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be good now
crates/bevy_ecs/src/query/iter.rs
Outdated
| query_state: &'s QueryState<D, F>, | ||
| ) -> Option<D::Item<'w, 's>> { | ||
| if self.is_dense { | ||
| // NOTE: If you are changing this branch's code (the self.is_dense branch), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you merge this with the giant "NOTE" above that lists all the other places that need to be updated together?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be good
If we're bikeshedding, how about |
Objective
Enables accessing slices from tables directly via Queries.
Fixes: #21861
Solution
One new trait:
ContiguousQueryDataallows to fetch all values from tables all at once (an implementation for&Treturns a slice of components in the set table, for&mut Treturns a mutable slice of components in the set table as well as a struct with methods to set update ticks (to match thefetchimplementation))A method
as_contiguous_iterinQueryItermaking possible to iterate using these traits.Macro
QueryDatawas updated to support contiguous items whencontiguous(target)attribute is added (a target can beall,mutableandimmutable, refer to thecustom_query_paramexample)Testing
sparse_set_contiguous_querytest verifies that you can't usenext_contiguouswith sparse set componentstest_contiguous_query_datatest verifies that returned values are validbase_contiguousbenchmark (file is namediter_simple_contiguous.rs)base_no_detectionbenchmark (file is namediter_simple_no_detection.rs)base_no_detection_contiguousbenchmark (file is namediter_simple_no_detection_contiguous.rs)base_contiguous_avx2benchmark (file is namediter_simple_contiguous_avx2.rs)Showcase
Examples
contiguous_query,custom_query_paramExample
Benchmarks
Code for
basebenchmark:Iterating over 10000 entities from one table and increasing a 3-dimensional vector from component
Positionby a 3-dimensional vector from componentVelocitybypass_change_detection()methodUsing contiguous 'iterator' makes the program a little bit faster and it can be further vectorized to make it even faster
Things to think about
offsetparameter inContiguousQueryData