Rust 2019

The Rust team encouraged people to write blog posts reflecting on Rust in 2018 and proposing goals and directions for 2019. Here’s mine.

This post is knowingly and blatantly focused on the niche that is immediately relevant to my work. I don’t pretend that it represents any kind of overall big picture.

Rust in 2018

In my Rust 2018 post, I had these items:

simd-Style SIMD
Rust bool in FFI is C _Bool
Debug Info for Code Expanded from Macros
Non-Nightly Benchmarking
GUI for rr replay
Tool for Understanding What LLVM Did with a Given Function

As far as I know, the kind of tool I wanted for understanding what LLVM did does not exist in a form that avoids extracting a minimized case, together with its dependencies, for copying and pasting into rust.godbolt.org. After one goes through the effort of making a Compiler Explorer-compatible extract, the tool is great, though. I don’t know if the feature existed a year ago, but Compiler Explorer now has tooltips that explain what assembly instructions do, so I’d rate this old wish half fulfilled. (Got the asm explanations but didn’t get to avoid manually extracting the code under scrutiny.)

I’ve been told that GUIs for rr exist and work. However, I got stuck with cgdb (launch with rr replay --debugger=/usr/bin/cgdb --no-redirect-output; thanks to Thomas McGuire and David Faure of KDAB for that incantation), because it has worked well for me, and the Python+browser front end that was recommended to me did not work right away. (I should try again.)

Also, Rust bool is now documented to have size_of 1 and the proposal to make the compiler complain about bool in FFI has been abandoned. 🎉

Cool Things in 2018 That I Did Not Ask For

Looking back at 2018 beyond what I wrote in my Rust 2018 post, I am particularly happy about these features making it to non-nightly Rust:

Non-lexical lifetimes
align_to
chunks_exact

Non-lexical lifetimes is a huge boost for the ergonomics of the language. I hope the people who previously turned away from Rust due to the borrow checker will be willing to try again.

align_to makes it easier to write more obviously correct optimizations that look at byte buffers one register at a time. A bit disappointingly, the previous sentence cannot say “safe code”, because align_to is still unsafe. It would be nice if there were a safe version with a trait bound restricting it to types for which all bit patterns are defined, with primitive integers and SIMD vectors with primitive integer lane types implementing the relevant marker trait. (I.e. exposing endianness would be considered safe like integer overflow is considered safe.)
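A minimal sketch of the kind of register-at-a-time code this refers to, assuming an ASCII check as the task. The function name is_ascii_words and the mask constant are illustrative, not taken from any particular crate. The unsafe block is needed only because align_to itself is unsafe; the reinterpretation is sound here because every bit pattern is a valid usize, and the check does not depend on endianness.

```rust
/// Returns true if every byte in `bytes` has its high bit clear (is ASCII).
/// Illustrative helper; the name is an assumption made for this sketch.
pub fn is_ascii_words(bytes: &[u8]) -> bool {
    // A word with the high bit of every byte set,
    // e.g. 0x8080_8080_8080_8080 on 64-bit targets.
    const HIGH_BITS: usize = usize::MAX / 255 * 0x80;

    // SAFETY: every bit pattern is a valid usize, so viewing initialized
    // bytes as usize values cannot produce an invalid value.
    let (head, middle, tail) = unsafe { bytes.align_to::<usize>() };

    // Unaligned prefix and suffix are checked one byte at a time;
    // the aligned middle is checked one register-sized word at a time.
    head.iter().all(|&b| b < 0x80)
        && middle.iter().all(|&w| w & HIGH_BITS == 0)
        && tail.iter().all(|&b| b < 0x80)
}
```

How the buffer is split between head, middle and tail depends on its alignment at run time, so the byte-wise paths have to be correct on their own.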

I expect chunks_exact to be relevant to writing safe SIMD code.
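A small sketch of why chunks_exact fits this use, assuming 16-byte chunks as a stand-in for a 128-bit SIMD register. The function sum_bytes is hypothetical, and the per-chunk scalar sum stands where a real SIMD operation would go.

```rust
/// Sums all bytes of `bytes`, walking the buffer 16 bytes at a time.
/// Hypothetical example; the inner sum is a placeholder for SIMD work.
pub fn sum_bytes(bytes: &[u8]) -> u64 {
    let mut chunks = bytes.chunks_exact(16);
    let mut total = 0u64;

    for chunk in &mut chunks {
        // Every chunk yielded by chunks_exact(16) has exactly 16 elements,
        // so this copy cannot panic on a length mismatch.
        let mut lanes = [0u8; 16];
        lanes.copy_from_slice(chunk);
        total += lanes.iter().map(|&b| u64::from(b)).sum::<u64>();
    }

    // Up to 15 leftover bytes that did not fill a whole chunk.
    total += chunks.remainder().iter().map(|&b| u64::from(b)).sum::<u64>();
    total
}
```

Unlike chunks, the exact variant never yields a short final chunk, so the loop body deals with one fixed length and the tail is handled separately via remainder().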

Carry-Overs from 2018

Some items from a year ago are not done.

Non-Nightly Benchmarking

The library support for the cargo bench feature has been in the state “basically, the design is problematic, but we haven’t had anyone work through those issues yet” since 2015. It’s a useful feature nonetheless. Like I said a year ago, it’s time to let go of the possibility of tweaking it for elegance and just let users use it on non-nightly Rust.

Debug Info for Code Expanded from Macros

No news on this RFC.

Portable SIMD

A lot of work has been done on this topic in the past year, which is great. Thank you! Instead of the design of the simd crate, the design and implementation is proceeding in the packed_simd crate. I hope that packed_simd with its into_bits feature enabled becomes core::simd / std::simd and available on non-nightly Rust in 2019.

A year ago I wished that core::arch / std::arch did not become available on non-nightly Rust before core::simd / std::simd out of concern that vendor-specific SIMD shipping before portable SIMD would unnecessarily skew the ecosystem towards the incumbent (Intel). I think it is too early to assess if the concern was valid.

New Items

In addition to reiterating the old items, I do have some new ones, too.

Compiling the Standard Library with User Settings

At present, when you compile a Rust artifact, your own code and the crates your code depends on get compiled, but the standard library is taken as a pre-compiled library. This is problematic especially with SIMD functionality moving to the standard library.

32-bit CPU architectures like x86, ARM, PowerPC and MIPS introduced SIMD during the evolution of the instruction set architecture. Therefore, unlike in the case of x86_64, aarch64 and little-endian POWER, generic 32-bit targets cannot assume that SIMD support is present. If you as an application developer decide to scope your application to support only recent enough 32-bit CPUs that you can assume SSE2/NEON/AltiVec/MSA to be present and want to use packed_simd / std::simd to use the SIMD capability of the CPU, you are going to have a bad time if the Rust standard library has been compiled with the assumption that the SIMD unit does not exist.

For 32-bit x86 and SSE2 Rust solves this by providing two targets: i586 without SSE2 and i686 with SSE2. Currently, the ARMv7 (both Thumb2 and non-Thumb2) targets are without NEON. I am hoping to introduce Thumb2+NEON variants in 2019.

Adding targets won’t scale, though. For example, even in the x86_64 case you might determine that it is OK for your application to require a CPU that supports SSSE3, which is relevant to portable SIMD by providing arbitrary shuffling as a single instruction. (At present, the SSE2 shuffle generation back end for LLVM misses even some seemingly obvious cases like transposing each of the eight pairs of lanes in u8x16 by lane-wise shifting by 8 in both directions in a u16x8 interpretation and bitwise ORing the results.)
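To make the shuffle in the parenthetical concrete, here is a scalar model of it, assuming a plain [u16; 8] array as a stand-in for a u16x8 vector. This is not the packed_simd API, and swap_byte_pairs is a name made up for the sketch.

```rust
/// Swaps the two bytes inside each 16-bit lane, which is the same as
/// transposing each adjacent pair of lanes in the u8x16 view of the data.
/// Scalar model only; a real version would operate on a SIMD vector type.
pub fn swap_byte_pairs(lanes: [u16; 8]) -> [u16; 8] {
    let mut out = [0u16; 8];
    for (o, &lane) in out.iter_mut().zip(lanes.iter()) {
        // Shift by 8 in both directions and OR the results together.
        *o = (lane << 8) | (lane >> 8);
    }
    out
}
```

Each lane needs only two shifts and an OR, so the operation is expressible with SSE2-level instructions even without a general byte shuffle.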

I hope that in 2019, Cargo gains the Xargo functionality of being able to compile the standard library with the same target feature settings that are used for compiling the user code and the crate dependencies.
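For reference, the features that user code is compiled to assume are visible at compile time through cfg!(target_feature = "..."). The sketch below only reports a few of them; the function name is hypothetical. Passing something like -C target-feature=+ssse3 to rustc changes what this returns for your own crate, but it does not change how the pre-compiled standard library was built, which is the mismatch described above.

```rust
/// Lists which of a few SIMD features this crate was compiled to assume.
/// Hypothetical helper; only the three features named here are inspected.
pub fn assumed_simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(target_feature = "sse2") {
        features.push("sse2");
    }
    if cfg!(target_feature = "ssse3") {
        features.push("ssse3");
    }
    if cfg!(target_feature = "neon") {
        features.push("neon");
    }
    features
}
```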

Better Integer Range Analysis for Bound Check Elision

Currently, LLVM only elides the bound checks when indexing into slices if you’ve made the most obvious comparison previously between the index and the slice length. For example:

```
if i < slice.len() {
    slice[i] // bound check elided
}
```

Pretty much anything more complex results in a bound check branch, and the performance effect is measurable when it happens in the innermost loop. I hope that rustc and LLVM will do better in 2019. Specifically:

LLVM should become able to eliminate the second check in code like:
```
if a + C < b {
    if a + D < b {
        // ...
    }
}
```
…if a, b, C, and D are all of type usize, a and b are run-time variables, C and D are compile-time constants such that D <= C and a + C can be proven at compile time not to overflow.
LLVM should become able to figure out that a + C didn’t overflow if it was written as a.checked_add(C).unwrap() and execution continued to the second check.
rustc should become able to tell LLVM that a small constant added to slice.len() or a value previously checked to be less than slice.len() does not overflow by telling LLVM to assume that the maximum possible value for a slice length is quite a bit less than usize::max_value().
Since a slice has to represent a possible allocation, the maximum possible value for len() is not usize::max_value(). On 64-bit platforms, rustc should tell LLVM that the usize returned by len() is capped by the number of bits the architecture actually uses for the virtual address space, which is lower than 64 bits. I’m not sure if Rust considers it permissible for 32-bit PAE processes to allocate more than half the address space in a single allocation (it seems like a bad thing to allow in terms of pointer difference computations, but it looks like glibc has at least in the past allowed such allocations), but even if it is considered permissible, it should be possible to come up with a slice size limit by observing that a slice cannot fill the whole address space, because at least the stack size and the size of the code for a minimal program have to be reserved.
LLVM should become able to figure out that if a: ufoo and a >= C, then a - C < ufoo::max_value() + 1 - C and, therefore, indexing with a - C into an array whose length is ufoo::max_value() + 1 - C does not need a bound check. (Where C is a compile-time constant.)
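The patterns in the list above look roughly like this in safe code. The constants C and D, the functions read_pair and lookup, and the 128-entry table are all assumptions made for illustration. The code is correct whether or not the bound checks get elided; the wish is only about whether the optimizer can prove them redundant.

```rust
const C: usize = 3;
const D: usize = 1; // D <= C

/// Reads slice[a + D] and slice[a + C] if a + C is in bounds.
pub fn read_pair(slice: &[u8], a: usize) -> Option<(u8, u8)> {
    // checked_add rules out wrap-around, so a + D cannot overflow either.
    let last = a.checked_add(C)?;
    if last < slice.len() {
        // Since D <= C, a + D <= a + C < slice.len(), so ideally neither
        // index below would need its own bound check.
        Some((slice[a + D], slice[last]))
    } else {
        None
    }
}

/// Looks up a non-ASCII byte in a table that only covers 0x80..=0xFF.
pub fn lookup(table: &[u16; 128], b: u8) -> Option<u16> {
    if b >= 0x80 {
        // b - 0x80 is at most 0x7F, which is always less than 128, so this
        // index is in range for every possible u8 that reaches this branch.
        Some(table[usize::from(b - 0x80)])
    } else {
        None
    }
}
```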

likely() and unlikely() for Plain if Branch Prediction Hints

The issue for likely() and unlikely() has stalled on the observation that they don’t generalize to if let, match, etc. They would work for plain if, though. Let’s have them for plain if in 2019, even if the if let, match, etc., cases remain unaddressed for now.
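Until something like that ships, a workaround sometimes used on non-nightly Rust is to route the rare path through a #[cold] function. The sketch below is that approximation, not the API proposed in the issue; the function names are stand-ins, and whether it changes the generated code is up to the optimizer.

```rust
/// Empty function marked cold so that calling it hints that the
/// surrounding branch is rarely taken.
#[cold]
#[inline(never)]
fn cold_path() {}

/// Stand-in for the proposed unlikely(): returns its argument unchanged.
#[inline(always)]
pub fn unlikely(condition: bool) -> bool {
    if condition {
        cold_path();
    }
    condition
}

/// Stand-in for the proposed likely(): returns its argument unchanged.
#[inline(always)]
pub fn likely(condition: bool) -> bool {
    if !condition {
        cold_path();
    }
    condition
}

/// Example use in a plain if: non-ASCII bytes are assumed to be rare.
pub fn first_non_ascii(bytes: &[u8]) -> Option<usize> {
    for (i, &b) in bytes.iter().enumerate() {
        if unlikely(b >= 0x80) {
            return Some(i);
        }
    }
    None
}
```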

No LTS

Rust has successfully delivered on “stability without stagnation” to the point that Red Hat has announced Rust updates for RHEL on a 3-month frequency instead of Rust getting stuck for the duration of the lifecycle of a RHEL version. That is, contrary to popular belief, the “stability” part works without an LTS. At this point, doing an LTS would be a strategic blunder that would jeopardize the “without stagnation” part.