Missed autovectorization for slice.iter.fold, works for slice.iter.copied.fold

I was looking into optimizing a function that checks whether all values in a slice are in range. It is not that surprising that the version with `all` does not get vectorized, because it returns early (although in theory Rust should be allowed to read more elements from the slice before breaking). It is surprising, however, that adding `copied` before folding makes a difference in autovectorization.

Sample code (https://rust.godbolt.org/z/5eznWbMcf):

```rust
pub fn check_range_all(keys: &[u32], max: u32) -> bool {
    keys.iter().all(|x| *x < max)
}

pub fn check_range_fold(keys: &[u32], max: u32) -> bool {
    keys.iter().fold(true, |a, x| a && *x < max)
}

pub fn check_range_copied_fold(keys: &[u32], max: u32) -> bool {
    keys.iter().copied().fold(true, |a, x| a && x < max)
}
```

- `check_range_all` compares one element per loop iteration; adding `copied` does not change the assembly at all (the two functions are merged)
- `check_range_fold` unrolls the check 8 times and each iteration is branchless, but it does not use any vector instructions
- `check_range_copied_fold` uses AVX instructions and checks 32 elements per loop iteration
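For comparison, here is a minimal sketch of a variant that keeps an early exit while still giving the compiler a branchless inner loop. It folds with `copied` over fixed-size blocks and only breaks between blocks. The name `check_range_chunked` and the block size of 32 are assumptions made for illustration and are not part of the godbolt sample above. Whether this actually gets vectorized depends on the compiler version and target features, so the generated assembly would need to be checked.

```rust
/// Hypothetical variant, not part of the original sample code.
/// Returns true if every key is strictly less than `max`.
pub fn check_range_chunked(keys: &[u32], max: u32) -> bool {
    // Assumed block size, chosen only for illustration.
    const CHUNK: usize = 32;

    let mut chunks = keys.chunks_exact(CHUNK);
    for chunk in &mut chunks {
        // Branchless reduction over one full block, using `copied`
        // as in `check_range_copied_fold`.
        let ok = chunk.iter().copied().fold(true, |a, x| a & (x < max));
        if !ok {
            // Early exit happens only at block boundaries.
            return false;
        }
    }

    // Elements left over when the length is not a multiple of CHUNK.
    chunks
        .remainder()
        .iter()
        .copied()
        .fold(true, |a, x| a & (x < max))
}
```

The result should match `check_range_all` for any input; the only difference is that up to a full block of elements is read before the function returns `false`.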
