Gamedev, graphics, open source. Shuffling bytes and swizzling vectors since 2004...
27 August 2022
I have just finished up a linear algebra maths library in Rust and it’s available on crates.io. It contains the usual implementation of vectors, matrices and quaternions but also tons of useful intersection, distance functions, point tests, graphs, utility functions and ergonomic decisions to hopefully make this fun and nimble to use for gamedev and graphics coding. I have been spending small chunks of time writing functions and tests over the summer, it has been quite enjoyable and I have learned a lot more about the Rust programming language, especially going into more detail with traits and trait bounds than I have previously, and also my first real work with macros.
There are already many other Rust maths libraries available on GitHub or Crates.io. This website are we game yet has a list of gamedev libraries for Rust. I tried a few of them, such as cgmath and nalgebra in my graphics engine for a short while but I ended up wanting more and being interested in how I would implement one myself. A lot of maths libraries out there; for all the languages you can think of, usually implement vectors, matrices and quaternions. What is less common is a comprehensive collection of intersection tests, distance and utility functions… in fact SIMD support is probably more common in a maths library than a ray triangle intersection function. I had already been through this process and, over a number of years, had accumulated a decent set of functionality in my C++ library, so I decided to essentially port that functionality to Rust. My initial plan was to use an existing Rust library for the vector, matrix and quaternion implementations and then just implement the intersection and utility functions, but after a while I wanted to make changes to try and get my Rust library stylistically closer to the C++ one. At first I wasn’t sure if what I wanted would be possible, but in the end I am happy with the results. You can take a look at the full documentation or readme, which give a detailed overview of the feature set.
As a gamedev with a strong focus towards graphics a maths library is an essential tool to have in your toolbox. Since I started coding and working on games I slowly built up a maths library in C++, at this point it has had many changes and I have accumulated a lot of functions over the years from different sources, books, websites, and blogs. One of the biggest influences was this blog post by Nathan Reed. He outlined how to make a vector library that is templated by both type and size to allow n-dimensional vectors and only needing to implement one set of functions for the entire thing. Prior to this I had never been a huge fan of templates due to the escalation of complexity once you start nesting and combining them, and in general the unreadable hard to follow C++ standard library. But here was a really concrete example of something a little more complex than a container class which added value. I adopted this approach and started to enjoy template meta-programming. I went to great lengths to implement these swizzles which look and feel just like writing maths code in a shader.
vec4f swizz = v.wzyx; // construct from swizzle
swizz = v.xxxx; // assign from swizzle
swizz.wyxz = v.xxyy; // assign swizzle to swizzle
vec2f v2 = swizz.yz; // construct truncated
swizz.wx = v.xy; // assign truncated
swizz.xyz *= swizz2.www; // arithmetic on swizzles
vec2 v2 = swizz.xy * 2.0f; // swizzle / scalar arithmetic
For writing maths code, shaders are somewhat of a gold standard to me, functions such as dot
and cross
are built in and the maths code is just part of the language. With my C++ library I wanted the same feeling so you can do stuff like this, with function overloads and the ability to operate on different sized vectors and scalars with function calls:
float m = min(f, f); // min of float
float3 m3 = min(v, v); // min of float3
float dp3 = dot(v, v); // dot on float3
float dp4 = dot(v4, v4); // dot on float4
// and so on..
Writing graphics algorithms, gameplay code, or procedural generation code can build up quickly and become quite verbose. This is why I like commonly used functions such as dot
or cross
just to be in scope. I have seen things in maths libraries which end up with vector.dotProduct(other)
all over the place and find this sort of thing hard to read and follow. A hill I will die on (and I know it’s an unpopular opinion in some circles but here we go anyway) is that single letter / short variable names are OK. They can help with readability if they make the code more compact, and with comments you can make the overall algorithm more readable where the focus is on the operations instead of variable names. Having a million myVector.dotProduct(someOtherVector)
generates so much noise and could be equivalent to dot(v, o)
… A lot of mathematical notation is just single greek symbols so it comes with territory, and check out a lot of stuff on shadertoy, this is full of single letter variables too… if you hate that kind of thing maybe don’t check shadertoy you might have a heart attack.
I set out to see if it was possible to get what I wanted out of Rust. The aim here was to get something as close as possible to my C++ library that looked and felt like a shader language. This would also make it easier to port code between my C++ code base, shaders and Rust, but also make something that is ergonomic and fun to use. At this point I want to stress that most of this work was to get something that looks and feels how I wanted it to. I am not worrying about SIMD from the start, I will look into it in the future but I was happy with a scalar implementation to begin with; my C++ library is also a simple scalar implementation. I do have interest in performance so wanted to keep an eye on it, but the primary focus was ergonomics. For really heavy computations I would be inclined to use compute shaders or write specialised SIMD routines, but this library is aimed more toward game mechanics or procedural generation.
The other maths libraries around take different approaches to the internals of a vector struct. Some of them implemented concrete types of Vec2
, Vec3
and Vec4
, while some others go for an entirely n-dimensional approach for wider linear algebra. For the kind of thing I am working on I want to be able to access .x
or .y
members; I tend to do a lot of this for gameplay code, flatting movement onto an xz-plane by setting vec.y = 0.0
so this made the n-dimensional approach less appealing. I do only need 2, 3 and 4 dimensional vectors so having 3 implementations isn’t too bad, but it is a bit of repetition so I wanted to try and consolidate this like I did with C++ templates. It is possible to get the n-dimensional style vector Index
operator [i]
to get the best of both worlds.
// this allows for generic sized vectors of type `T`. But no acces to members such as v.x, v.y, v.z.
pub struct VecN<T, const N: usize> {
v: [T; N]
}
In C++ it’s possible to use an un-named union to have a vector which is both a struct of array and a struct of members. This way for 1-4 dimensional vectors you can access .xyzw
data members.
template <typename T>
struct Vec<3, T>
{
union {
T v[3];
struct
{
T x, y, z;
};
struct
{
T r, g, b;
};
swizzle_v3;
};
}
Rust has this feature proposal for anonymous unions, so in future this could become possible and allow diect access to data array or members via a union. But for the time being that is not possible, so we can use the Index
operator.
/// this allows for fixed sized vectors access to members such as v.x, v.y, v.z
pub struct Vec3<T> {
x: T,
y: T,
z: T
}
impl<T> IndexMut<usize> for Vec3<T> {
fn index_mut(&mut self, i: usize) -> &mut T {
match i {
0 => x
1 => y
2 => z
_ => z // clamp out of bound access?
}
}
}
fn test(v: Vec3) {
// access like array
let fx = v[0];
// access as member
let fx = v.x;
}
Debug performance is important and I wanted to try and keep the codegen as simple and as lean as possible. Coming from a C++ background I have a natural intuition toward compiler code generation in different scenarios. I set up a small example on godbolt to illustrate this of 2 dot product functions, one which is directly operating on a float and another which uses a template function. Before writing this I expected both to come out the same because all of the template work is done at compile time. The un-optimised version is not too far from optimisation level 1.
I was not prepared for what came when I started to do the same in Rust, again here is an example on godbolt. Even switching to very basic implementations I found un-optimised rust code generation when using generics to be significantly more bloated than a more direct implementation, even though all of the generics are being handled at compile time. In the Rust version you can switch the compiler optimization level to 1 and see that the resulting code generation for the dot product functions is identical… In the C++ example both result in the same code generation, regardless of optimisation level.
I was a little disappointed about the complexity of the code generation in debug builds, I am yet to build anything large scale so time will tell. I decided in the end to continue on the generic path and just to suck up the need for optimisation, I will profile the code in some real world scenarios when I get the chance.
I discovered that macros could be used to generate the concrete implementations for fixed sized vectors. This acted a bit like C++ templates which was somewhat of a surprise. Until this point I had associated Rust generics to be the equivalent to C++ templates, but in reality I came to understand that C++ templates are quite different. A C++ template won’t compile until it is instantiated, this allows you to write any old code inside a templated function:
struct TestStruct {
float member;
};
template<typename T>
float function(T s) {
return s.member;
}
Say you have a struct member s.member
for every T
you want to instantiate - you can ensure that the struct has a member called member
and it’s type is a float
everything will be OK. If you have a struct that does not have a member
then you will get a compile error when you try to create an instance of T
, but until that time comes you can implement the template function.
Rust generics don’t allow this - you cannot access raw data members and you need to create traits, associated methods or functions and supply trait bounds to access them in generic functions. This prevents you from creating functions that do not satisfy trait bounds in the first place. Rust declarative macros work a bit more like C++ templates; you can write any old code inside the macro and then you only know if it works or fails to compile when you try to instantiate the macro.
This was my first time using macro; at first they were quite tricky to get my head around but I eventually got used to the syntax. I tried to use them as much as possible but found when needing nested repetitions things got quite complicated, so I did opt for some manual work instead of packing everything inside macros:
// macro implementation of vec struct for Vec2, Vec3 and Vec4
macro_rules! vec_impl {
($VecN:ident { $($field:ident, $field_index:expr),* }, $len:expr, $module:ident) => {
#[derive(Debug, Copy, Clone)]
#[repr(C)]
pub struct $VecN<T> {
$(pub $field: T,)+
}
//... more macro code in / implementations in here
}
}
vec_impl!(Vec2 { x, 0, y, 1 }, 2, v2);
vec_impl!(Vec3 { x, 0, y, 1, z, 2 }, 3, v3);
vec_impl!(Vec4 { x, 0, y, 1, z, 2, w, 3 }, 4, v4);
// manual implementation of dot products for Vec2, Vec3 and Vec4
/// trait for dot product
pub trait Dot<T> {
/// vector dot-product
fn dot(a: Self, b: Self) -> T;
}
impl<T> Dot<T> for Vec2<T> where T: Number {
fn dot(a: Self, b: Self) -> T {
a.x * b.x + a.y * b.y
}
}
impl<T> Dot<T> for Vec3<T> where T: Number {
fn dot(a: Self, b: Self) -> T {
a.x * b.x + a.y * b.y + a.z * b.z
}
}
impl<T> Dot<T> for Vec4<T> where T: Number {
fn dot(a: Self, b: Self) -> T {
a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w
}
}
I ran into trouble with repetitions and horizontal operations on a vector. Having to add +
to chain together repetitions means on something like a dot product you get v.x + v.y + v.z +
; the trailing +
kills compilation. For some other things, such as struct initialization, this is OK because Rust allows trailing ,
. I had a look into the ?
operator to run a repetition only once but couldn’t get it to work, maybe I am doing something wrong. I will revisit this at some point to try and get a better result. For the Eq
op I also had the same issue; wanting to chain horizontal checks with &&
here inside the macro I just whack a true on the end and then close the parenthesis:
macro_rules! vec_impl {
($VecN:ident { $($field:ident, $field_index:expr),* }, $len:expr, $module:ident) => {
impl<T> Eq for $VecN<T> where T: Eq {}
impl<T> PartialEq for $VecN<T> where T: PartialEq {
fn eq(&self, other: &Self) -> bool {
$(self.$field == other.$field &&)+
true // redundant check just to get it to compile
}
}
}
}
For the dot product the same could be achieved by adding a T::zero()
at the start or the end to make the repetition work. Now because it’s a zero and should be a const then this would be optimised away, but maybe not in debug? I did some tests to see - indeed it ended up generating 42 more assembly instructions in an un-optimised build and even a few extra instructions in optimisation level 1. In the end I decided to create and implement a Dot
trait to avoid this and so I did not have to rely on compiler optimisation; at this point though with the amount of code that gets generated from the trait implementation it did feel a little like pissing in the wind but it’s better than nothing?
Another hurdle was to get around Rust’s lack of function overloading. Of the other libraries I trialled I noticed they implemented the dot
as so: v.dot(x)
so the dot function implemented as an associated method dot(self, other: Self)
. While it’s a small difference, it is not what I wanted, which is dot(v, x)
which could be implemented as an associated function dot(a: Self, b: Self)
. The main problem with this then becomes how can I call dot(x, v)
without having to qualify it for the vector type Vec3::dot(x, v)
. The Vec3::
adds to code verbosity and I was keen to eliminate this.
Generics come to the rescue here by allowing a generic dot
to be implemented that can take any width of vector. For any types implementing the Dot
trait (which is all of the vector sizes I care about) we can make a single function which uses the Dot
as a trait bound.
/// returns the vector dot product between a . b
pub fn dot<T: Number, V: VecN<T>>(a: V, b: V) -> T {
V::dot(a, b)
}
It then dawned on me, I could share traits between vectors and scalar numerical types so that I could implement common functions such as min, max, clamp
etc. In order to do this the number of traits exploded a little, but the end result was the ability to have tons of useful generic functions that could be called on any types (trait bounds permitting) which gave the same look and feel as a shader language.
/// returns the maximum of a and b
pub fn max<T: Number, V: NumberOps<T>>(a: V, b: V) -> V {
V::max(a, b)
}
At first I was slightly confused why Rust didn’t implement its own numerical traits, I noticed a few crates which implemented traits for numerical types and operations but I wasn’t sure which ones I should use or why. There was also historical mention of numerical traits in rust, I later discovered they were removed from the standard library. In a bid to navigate this minefield I just decided to implement my own traits to add only exactly what I needed. I had to implement a Base
trait (implemented by both vectors and scalars). Number
for floats and ints, Float
for floats, SignedNumber
for signed types (signed integers and floats). Along with those operations that can be performed on those types NumberOps
, SignedNumberOps
and FloatOps
. The base type traits are really just aggregations of arithmetic ops so they can be used as trait bounds inside other traits. The Ops
types I added supply traits for things such as floor
or ceil
and round
on floats. min
, max
and clamp
on numbers etc. The vectors also implement the NumberOps
, FloatOps
and SignedNumberOps
where the base T
used in Vec<T>
supports the operations.
This is where I became very familiar with where
clauses. It’s really quite cool to implement NumberOps
for Vec<T>
where T: NumberOps
or FloatOps
for Vec<T> where T: FloatOps
. So we are saying for any vector of i32
we get number ops and for any vector of float
we get both number ops and float ops. After implementing the various combinations of traits this gives the flexibility to supply trait bounds to generic functions, which allows me to use scalar or vector types!
let f : f32 = min(1.0, 2.0);
let v = min(vec2f(1.0, 1.0), vec2f(2.0, 2.0));
I found the From
trait to be quite useful. We can implement From
multiple times with generic arguments From<T>
so I used ‘From’ to construct different sized vectors from one another; truncating them when assigning to smaller sizes or extending with zeros to larger sizes. One thing that isn’t possible to do though is to allow the function to take multiple values. The trait expects a single parameter passed into the fn from(other: T)
function.
Tuples can provide almost the same functionality, when I thought of this idea I thought it was a cool hack! Just adding the need to supply an extra pair of parentheses to allow From
multiple values. By using tuples in From
functions I was able to create various combinations of constructors for vectors of different sizes combined together or with scalar values.
let v4 = Vec4f::from((v2, v2)); // vec4 from 2x v2's
let v3 = Vec3f::from((v2, 1.0)); // vec3 from 1x v2 and 1x scalar
let v2 = Vec2f::from((5.0, 6.0)); // vec2 from 2x scalars
let v4 = Vec4f::from((v2, 0.0, 1.0)); // vec4 from 1x v2 and 2x scalars
let v4 = Vec4f::from(v2); // vec4 from vec2 (splat 0's)
let v2 = Vec2f::from(v4); // vec2 from vec4 (truncate)
// construct rows from tuples
let m3v = Mat3f::from((
vec3f(1.0, 2.0, 3.0),
vec3f(4.0, 5.0, 6.0),
vec3f(7.0, 8.0, 9.0)
));
Tuples also worked well to construct matrices from rows of vectors, or scalar values.
After ironing out the main structure of the API with all of the numerical traits, operations, vector and scalar combinations I started to do most of the grunt work implementing functions. For a maths library there are quite a lot of things you need to implement, but one thing that is quite nice about them is they are very small and very unit testable pieces of work. I went pure TDD here in a lot of cases, making the test first and then implementing functionality. Not strictly in all cases (I did write some functions first and then the tests later, soz… sue me!), but due to the nature of the code and iterating with tests this process was really enjoyable. In total there are currently 110 tests covering most areas, there are still a few missing pieces I will be adding over time and I hope to get some code coverage tools working to aid that process. You can take a look at the current tests here.
I am lucky enough to have a few different work laptops, one of which is the M1 MacBook Air that I had previously just been using for building and testing compatibility with the M1 and x86 builds. As I was going away a few times over the summer I decided to bring this laptop with me as it was small and lightweight compared to my MacBook Pro 16”. This was a revelation, I was able to code in all sorts of places: planes, trains and even by the pool! Combining this with the easily unit testable nature of maths code I made light work of the whole thing. I spent time on holiday, just a few minutes here and there, to implement another couple of tests and another couple of functions, over a 9-week period I implemented this whole thing, chipping away at it a small piece at a time.
For a lot of the tests I wrote some simple examples by hand, this included all of the vector and matrix arithmetic, constructors and so forth. These kinds of things are fairly easy to write down on paper. For intersection tests things get a bit more interesting; it’s easy to come up with a few trivial example tests (such as a line intersection with 2 axis aligned lines), which I added, but for more cases I already had a pretty comprehensive set that I had generated from this visual demo of my C++ library. I made interactive 3D samples of all of the available intersection tests, verified their correctness visually and then used some code generation injection to generate the tests. I made a python script that was able to convert the C++ test code into Rust test code.
A great benefit of having tests is the ability to refactor, as things progressed I saw new opportunities to make things more generic, which required refactoring some traits. I also was able to maintain good consistency in the API after introducing small issues such as in functions point_inside_aabb
where I had ordered the arguments as (aabb_min: V, aabb_max: V, p: V)
this doesn’t read so well as the arguments are in the opposite order to the function name. Some of these inconsistencies came from my C++ maths library implementation which is in use in a few projects making it more difficult to refactor, it was nice here to unify all of these details.
Vector swizzling is a handy feature in shader language and getting some support into this Rust library feels like coming full-circle. I wrote my first Rust program permute whilst on holiday in 2019; the program can output all permutation combinations from some given inputs and an output format. I used the library in 2019 to generate C++ template code for vector swizzles in my C++ maths library.
I adapted the source slightly to output swizzles for Rust. I couldn’t quite get the swizzles to the same shader-style degree as the C++ implementation. It might be possible in future with support for [unnamed unions], but for the time being I generated traits and functions to return swizzled vectors of various sizes and a collection of set
methods as well.
// swizzling
let wxyz = v4.wxyz(); // swizzle
let xyz = v4.xyz(); // truncate
let xxx = v4.xxx(); // and so on..
let xy = v3.yx(); // ..
// mutable swizzles
let mut v = Vec4f::zero();
x.set_xwyz(v); // set swizzle
x.set_xy(v.yx()); // assign truncated
x.set_yzx(v.zzz()); // etc..
In order to support left hand side scalar multiplication with vectors (as supported in shader languages) I had to implement arithmetic on foreign types to make this sort of thing possible:
// multiplying a scalar by vector results in a vector
float3 v = 1.0 * float3(1.0, 2.0, 3.0);
Initially I tried to do this once for all vectors of type <T>
but this was not permitted, resulting in the following error:
impl<T> Add<Vec2<T>> for T {
// ^ type parameter `T` must be covered by another type when it appears before the first local type (`Vec2<T>`)
}
I had to implement the ops for each primitive type I wanted. A macro allows for a single implementation but here I had to commit to concrete vector types I may want, I used a macro here to optimise the process.
/// macro to stamp out all arithmetic ops for lhs scalars
macro_rules! vec_scalar_lhs {
($VecN:ident { $($field:ident),+ }, $t:ident) => {
impl Add<$VecN<$t>> for $t {
type Output = $VecN<$t>;
fn add(self, other: $VecN<$t>) -> $VecN<$t> {
$VecN {
$($field: self + other.$field,)+
}
}
}
// other ops go here...
}
}
I also added shorthand constructors which look like glsl
- again here I needed to stamp out a concrete implementation, so I committed to the following types for both constructors and for left hand side scalar arithmetic:
vecf
= 32-bit float
vecd
= 64-bit float
veci
= 32-bit signed integer
vecu
= 32-bit unsigned integer
I also added a macro which creates From
traits between these various primitive types so you can cast between vector of int to vector of float.
macro_rules! vec_cast {
($VecN:ident { $($field:ident),+ }, $t:ident, $u:ident) => {
impl From<$VecN<$u>> for $VecN<$t> {
fn from(other: $VecN<$u>) -> $VecN<$t> {
$VecN {
$($field: other.$field as $t,)+
}
}
}
impl From<$VecN<$t>> for $VecN<$u> {
fn from(other: $VecN<$t>) -> $VecN<$u> {
$VecN {
$($field: other.$field as $u,)+
}
}
}
}
}
Some people may not require all of these types, so I have exposed the macros to create the constructors and arithmetic operation implementations for primitive types and added a feature in the Cargo.toml
to disable these features should they wish.
During this process I discovered many small inconsistencies and gaps in my C++ library so I took the opportunity to note these down and will revisit that when I get a chance.
There is still some more work to do in order to complete the project. I intend on using the library now to create a graphical demo in Rust using my in-progress graphics library, which can showcase the maths library’s features in a visual way, much like my C++ libraries live demo. This process was taking a while so I decided at this point to publish the project and add the graphical demo later so that I could write up this blog post while thoughts were fresh in my mind.
The final step was to publish on crates.io, the process is very simple… I added metadata to the Cargo.toml
file, added a readme with some examples, added document comments, fixed all cargo clippy
warnings and finally hit publish. The culmination of roughly 10 weeks of consistent work. It felt satisfying.