03 March 2023
Following on from the part 2 post a little while ago, I have been continuing work on my graphics engine hotline in Rust. My recent focus has been on plugins, multi-threaded command buffer generation, and hot reloading for Rust code, `hlsl` shader code and `pmfx` render configs. I have made decent progress, to the point where there is something quite usable and structured in a way I am relatively happy with. This leg of the journey has been by far the most challenging though, so I wanted to write about my current progress and detail some of the issues I have faced.
Here are the results of my first sessions actually using the engine. I created these primitives and used the hot reloading system to iterate on them to get perfect vertices, normals, and uv-coordinates. My intention is to use this tool for graphics demos and procedural generation, so the focus is on making a live coding environment and not an interactive GUI editor. The visual `client` provides some feedback and information to the user, but it is not really an editor as such; the editing happens in source code and data files, and the results are reflected in the `client`. In time I may decide to add more interactive editor features, but for now it's all about coding. Here's a demo video of some of the features:
I am using a single screen with vscode on the left and hotline on the right. By launching the hotline client executable from the vscode terminal, hot reloading errors are printed in the terminal, and the line number links are clickable to jump straight to the offending lines.
I had previously created `gfx`, `os` and `av` abstraction APIs that currently have Windows-specific backend implementations, but the APIs are designed to make it easy to add more platforms in the future. At this time I am trying to push as far ahead as possible on a single platform, because I already spent a lot of time working on cross-platform support in my C++ game engine and in the engines I have worked on for my day job. Cross-platform maintenance can become time consuming, so for a little while I have decided to focus purely on feature development.
I have also been on a few side quests that have fallen under the umbrella of this graphics engine project, but I wrote about those separately. They were implementing imgui with viewports and docking, and maths-rs, a linear algebra library I have been working on while away from my Windows desktop machine. A few people asked about maths-rs and why I don't just use an existing library. I simply wanted something to work on using my laptop in the spare time I had, and a maths library was the first thing I thought of. This is a downside of having a Windows-only engine: my opportunity to work on it is limited to being chained to a machine in my house. I already had a C++ maths library that I ported a lot of the code from, but while I was there I improved the API consistency, added more overlap, intersection and distance functions to both libraries, and added more tests to assist porting to `maths-rs`. Now I'm at the point where I can use all the libraries to start building graphics demos.
You can use hotline as a library and use the `gfx`, `os`, `av`, `imgui`, `pmfx`, and any other modules it provides. It is now available on crates.io. To my dismay, the name "hotline" on the crates.io registry was already taken as a placeholder by someone else. The same happened with maths, so for both my projects on crates.io I had to use the names `hotline-rs` and `maths-rs`. It's a bit disappointing that people claim names and then never produce any code. I'd be fine with someone claiming the name first if they actually had a decent, usable package.
I had some trouble with crates.io because the package size exceeded the lofty limit of 10mb! Most of my repository size was a result of some executables I am using to build data and shaders. I have these tools from prior work, so I am still using the Python-based build system (built into executables with help from PyInstaller). To reduce the repository size I moved the executables and data files for the hotline examples into their own GitHub repository, hotline-data, which is cloned inside `hotline` as part of a `cargo build`.
This data feature is optional and enabled by default. It means you can bring your own build system or use the one provided, and more importantly the feature can be disabled for package builds / publishing to crates.io.
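Mechanically this is just a default Cargo feature that the build step checks before pulling the data repo. A minimal sketch, assuming a feature name like `build_data` (the real name may differ):

```toml
# sketch of the optional data feature; the feature name is illustrative
[features]
default = ["build_data"]
build_data = []   # when enabled, the build step clones/updates hotline-data
```

Building with `--no-default-features` then skips the clone, so the verification build for publishing can run without the data present.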
I also had some compilation issues when publishing the package to crates.io, because currently Windows is the only supported platform. The core APIs are generic using compile-time traits, but samples and plugins need to instantiate a concrete type of GPU `Device` or operating system `Window`, and there are no macOS or Linux backends yet. In time I would like to add a stub implementation for each module; I worked around it for now by marking the entire files that require concrete types as Windows-only.
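Gating at file/module granularity is just a `cfg` attribute on the module declarations; roughly like this (module names illustrative):

```rust
// lib.rs sketch: anything that instantiates a concrete Device or App
// only compiles on Windows; other platforms still build the generic core
#[cfg(target_os = "windows")]
pub mod gfx_platform { /* alias the d3d12 backend types here */ }

#[cfg(target_os = "windows")]
pub mod os_platform { /* alias the win32 backend types here */ }
```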
The main work I have been focusing on for the last few weeks is making live reloadable code work through a plugin system, where plugins can be loaded dynamically at run time with no modifications required to the `client` executable. The `client` provides a very thin wrapper around a main loop; it creates some core resources such as the `os::App`, the `gfx::Device` and so forth. It provides a core loop that will submit command lists and swap buffers, and makes it easy to hook in your own update or render logic. With the `client` running, `plugins` can be dynamically loaded from `dylibs`, and code changes can be detected, causing the library to be rebuilt and reloaded with the client still running. I am using hot-lib-reloader to assist with the lib reloading, although I need to bypass some of its cool features like the `hot_functions_from_file` macro, because I wanted to remove the dependency on the `client` knowing about the plugins.
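Underneath, loading a plugin boils down to libloading. Here is a minimal sketch (not hotline's actual wrapper), where `create` is the `#[no_mangle]` entry point generated by the `hotline_plugin!` macro shown later:

```rust
use std::ffi::c_void;

// sketch: open a plugin dylib and call its `create` entry point, keeping
// the Library alive alongside the opaque instance pointer it produced
fn load_plugin(path: &str) -> Result<(libloading::Library, *mut c_void), libloading::Error> {
    unsafe {
        let lib = libloading::Library::new(path)?;
        let instance = {
            let create: libloading::Symbol<fn() -> *mut c_void> = lib.get(b"create")?;
            create()
        };
        // dropping `lib` would unload the code `instance` points into
        Ok((lib, instance))
    }
}
```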
Creating a new plugin is quite easy; first you need a dynamic library crate type. The plugins directory in hotline has a few different plugins that can be used as examples, but you basically just need a `Cargo.toml` like this:
```toml
[package]
name = "ecs"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["rlib", "dylib"]

[dependencies]
hotline-rs = { path = "../.." }
```
Inside a dynamic library plugin you can choose to get hooked into a few core function calls from the client each frame by implementing the `Plugin` trait:
```rust
use hotline_rs::prelude::*;

pub struct EmptyPlugin;

impl Plugin<gfx_platform::Device, os_platform::App> for EmptyPlugin {
    fn create() -> Self {
        EmptyPlugin {
        }
    }

    fn setup(&mut self, client: Client<gfx_platform::Device, os_platform::App>)
        -> Client<gfx_platform::Device, os_platform::App> {
        println!("plugin setup");
        client
    }

    fn update(&mut self, client: Client<gfx_platform::Device, os_platform::App>)
        -> Client<gfx_platform::Device, os_platform::App> {
        println!("plugin update");
        client
    }

    fn unload(&mut self, client: Client<gfx_platform::Device, os_platform::App>)
        -> Client<gfx_platform::Device, os_platform::App> {
        println!("plugin unload");
        client
    }

    fn ui(&mut self, client: Client<gfx_platform::Device, os_platform::App>)
        -> Client<gfx_platform::Device, os_platform::App> {
        println!("plugin ui");
        client
    }
}

hotline_plugin![EmptyPlugin];
```
The `hotline_plugin!` macro creates a c-abi wrapper around the `Plugin` trait. I initially tried returning a `Box<dyn Plugin>` from the plugin library to the main client executable so the trait functions could be called, but when trying to look up the functions in the `vtable` the memory seemed to be garbage. After some investigation this appears to be because Rust does not have a stable ABI, so I created the macro to work around it: the plugin is allocated on the heap, an FFI pointer is passed back to the client, and that FFI pointer is then passed into the c-abi functions the macro generates. I expected that with the same compiler I wouldn't need to make the wrapper API, but I was unable to get it working.
```rust
#[macro_export]
macro_rules! hotline_plugin {
    ($input:ident) => {
        // c-abi wrapper for `Plugin::create`
        #[no_mangle]
        pub fn create() -> *mut core::ffi::c_void {
            let ptr = new_plugin::<$input>() as *mut core::ffi::c_void;
            unsafe {
                let plugin = std::mem::transmute::<*mut core::ffi::c_void, *mut $input>(ptr);
                let plugin = plugin.as_mut().unwrap();
                *plugin = $input::create();
            }
            ptr
        }
        // ..
    }
}
```
Not all plugins have to implement the `Plugin` trait; plugins can extend others in custom ways. I started on a basic `ecs` that uses bevy_ecs and the bevy `Scheduler` to distribute work onto different threads. The reason for this plugin-ception kind of approach is to be able to edit the core `ecs` while the client is running, as well as extension plugins, while also trying to make the whole thing as flexible as possible, allowing the freedom to implement totally different types of plugins and work on them with hot reloading.
To load functions from other libraries you can access the `libs` currently loaded in the hotline `client`. These are just a wrapper around libloading that allows you to retrieve a `Symbol<T>` by name.
```rust
/// Finds available demo names inside ecs-compatible plugins by calling
/// the function `get_demos_<lib_name>` in each lib to disambiguate
fn get_demo_list(&self, client: &PlatformClient) -> Vec<String> {
    let mut demos = Vec::new();
    for (lib_name, lib) in &client.libs {
        unsafe {
            let function_name = format!("get_demos_{}", lib_name);
            let list = lib.get_symbol::<unsafe extern fn() -> Vec<String>>(function_name.as_bytes());
            if let Ok(list_fn) = list {
                let mut lib_demos = list_fn();
                demos.append(&mut lib_demos);
            }
        }
    }
    demos
}
```
The core `ecs` provides some functionality to create `setup`, `update` or `render` systems. You can add your own system functions inside different `plugins` and have the `ecs` plugin locate these systems to build schedules for different `demos`. All system stages get dispatched concurrently on different threads, so in time it's likely more stages will be added alongside `setup`, `update` and `render`. Defining custom systems is quite straightforward; these are just `bevy_ecs` systems:
```rust
// update system which takes hotline resources `main_window`, `pmfx` and `app`
#[no_mangle]
fn update_cameras(
    app: Res<AppRes>,
    main_window: Res<MainWindowRes>,
    mut pmfx: ResMut<PmfxRes>,
    mut query: Query<(&Name, &mut Position, &mut Rotation, &mut ViewProjectionMatrix), With<Camera>>) {
    let app = &app.0;
    for (name, mut position, mut rotation, mut view_proj) in &mut query {
        // ..
    }
    // ..
}
```
Rendering systems get generated from render graphs specified through the `pmfx` system, and hook themselves into a `bevy_ecs` system function call.
```rust
// render system which takes hotline resource `pmfx` and a `pmfx::View`
#[no_mangle]
pub fn render_meshes(
    pmfx: &bevy_ecs::prelude::Res<PmfxRes>,
    view: &pmfx::View<gfx_platform::Device>,
    mesh_draw_query: bevy_ecs::prelude::Query<(&WorldMatrix, &MeshComponent)>) -> Result<(), hotline_rs::Error> {
    // ..
}
```
In order to dynamically locate and call these functions, we need to supply a bit of boilerplate to look the functions up by name.
```rust
/// Register demo names for this plugin, which is called `ecs_demos`
#[no_mangle]
pub fn get_demos_ecs_demos() -> Vec<String> {
    demos![
        "primitives",
        "draw_indexed",
        "draw_indexed_push_constants",
        // ..
    ]
}

/// Register plugin system functions
#[no_mangle]
pub fn get_system_ecs_demos(name: String, view_name: String) -> Option<SystemDescriptor> {
    match name.as_str() {
        // setup functions
        "setup_draw_indexed" => system_func![setup_draw_indexed],
        "setup_primitives" => system_func![setup_primitives],
        "setup_draw_indexed_push_constants" => system_func![setup_draw_indexed_push_constants],
        // render functions
        "render_meshes" => render_func![render_meshes, view_name],
        // I had to add this `std::hint::black_box`!
        _ => std::hint::black_box(None)
    }
}
```
I hope to use `#[derive()]` macros to reduce the need for the boilerplate code, but I haven't really looked into it in much detail yet. I had to add `std::hint::black_box` around the `None` case in the `get_system_` functions; I am getting away with calling these functions without a c-abi wrapper here, so that might be the reason. Everything is working for the time being, but I am prepared to address this if need be.
Another core engine feature I have been working on is `pmfx`, a high-level, platform-agnostic graphics API that builds on top of the lower-level `gfx` API. The idea here is that the `gfx` backends are fairly dumb wrapper APIs and `pmfx` brings that low-level functionality together in a way which is shared amongst different platforms. `pmfx` is also a data-driven rendering system where render pipelines, passes, views, and graphs can be specified in jsn config files to make light work of configuring rendering. This is not new code, and it's something I have worked on and used in other code bases, but it is currently undergoing an overhaul to bring it more in line with modern graphics API architectures. The main pmfx-shader repository contains the data side of all of this.
So how does it work? You can write regular `hlsl` shaders and then supply `pmfx` files, which are used to create `pipelines`, `views`, `textures` (and render targets) and more. `views` are like render passes but with a bit more detail, such as a function that can be dispatched into a render pass with a camera, for example.
```
textures: {
    main_colour: {
        ratio: {
            window: "main_window",
            scale: 1.0
        }
        format: "RGBA8n"
        usage: ["ShaderResource", "RenderTarget"]
        samples: 8
    }
    main_depth(main_colour): {
        format: "D24nS8u"
        usage: ["ShaderResource", "DepthStencil"]
        samples: 8
    }
}
views: {
    main_view: {
        render_target: [
            "main_colour"
        ]
        clear_colour: [0.45, 0.55, 0.60, 1.0]
        depth_stencil: [
            "main_depth"
        ]
        clear_depth: 1.0
        viewport: [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
        camera: "main_camera"
    }
    main_view_no_clear(main_view): {
        clear_colour: null
        clear_depth: null
    }
}
```
The `pmfx` config files supply useful defaults to minimise the number of members that need initialising to set up render state, and `pmfx` can parse `hlsl` files, with extra context provided through `pipelines`, to generate shader reflection info, descriptor layouts, and more that is yet to come.
```
pipelines: {
    mesh_debug: {
        vs: vs_mesh
        ps: ps_checkerboard
        push_constants: [
            "view_push_constants"
            "draw_push_constants"
        ]
        depth_stencil_state: depth_test_less
        raster_state: cull_back
        topology: "TriangleList"
    }
}
```
You can supply render graphs, which are built at run-time with automatic resource transitions and barriers inserted based on dependencies. This is still in the early stages because my use cases are currently quite simple, but in time I expect it to grow a lot more:
```
render_graphs: {
    mesh_debug: {
        grid: {
            view: "main_view"
            pipelines: ["imdraw_3d"]
            function: "render_grid"
        }
        meshes: {
            view: "main_view_no_clear"
            pipelines: ["mesh_debug"]
            function: "render_meshes"
            depends_on: ["grid"]
        }
        wireframe: {
            view: "main_view_no_clear"
            pipelines: ["wireframe_overlay"]
            function: "render_meshes"
            depends_on: ["meshes", "grid"]
        }
    }
}
```
You can take a look at a simple example of pmfx supplied with the hotline repository. Based on this file, a reflection info file is generated, and the `hlsl` source is recompiled into byte code with `DXC`. You can supply compile-time flags that are evaluated and will generate shader permutations. Shaders which share the same source code, even though the permutation flags may differ, are hashed and re-used, so that as few shaders as possible are generated and compiled. `pmfx` also carefully tracks all shaders and render states so only minimal changes get reloaded.
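The de-duplication idea is simple to illustrate. This is a sketch of the concept rather than pmfx's actual code (`ShaderBlob` and `compile_hlsl` are stand-ins):

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct ShaderBlob(Vec<u8>); // stand-in for compiled byte code
fn compile_hlsl(_src: &str) -> ShaderBlob { ShaderBlob(Vec::new()) } // stub

// hash the fully preprocessed source (permutation defines applied), and
// only invoke the compiler when the hash has not been seen before
fn get_or_compile<'a>(cache: &'a mut HashMap<u64, ShaderBlob>, src: &str) -> &'a ShaderBlob {
    let mut hasher = DefaultHasher::new();
    src.hash(&mut hasher);
    cache.entry(hasher.finish()).or_insert_with(|| compile_hlsl(src))
}
```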
The Rust side of `pmfx` uses serde to serialise and deserialise json into `hotline_rs::gfx` structures so they can be passed straight to the `gfx` API. `pmfx` tracks the source files that shaders and `pmfx` render configs depend on, and re-builds are triggered when changes are detected.
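As a sketch of the approach (the struct below is illustrative, not one of hotline's actual `gfx` info types), serde's derive lets the converted json map straight onto a creation-info struct:

```rust
use serde::Deserialize;

// illustrative cut-down of a texture creation-info struct that serde can
// fill directly from the (jsn -> json converted) pmfx document
#[derive(Deserialize)]
struct TextureInfo {
    format: String,
    usage: Vec<String>,
    #[serde(default = "default_samples")]
    samples: u32, // optional in the config, defaulted here
}

fn default_samples() -> u32 { 1 }

fn parse_texture(json: &str) -> Result<TextureInfo, serde_json::Error> {
    serde_json::from_str(json)
}
```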
All of the render config states and objects are exported along with hashes, and these hashes can be used to check for changes against the live resources in use inside the `client`. Only changed resources get re-compiled and reloaded. With checks to minimise shader rebuilds and checks on reloads, this should hopefully mitigate compilation costs where combinatorial explosion can occur due to having many shader permutations.
The `pmfx` API can be used to load `pipelines` and `render_graphs`, and those resources can then be found by name. Ownership of the resources remains with `pmfx` itself, and render systems can borrow the resources for a short time on the stack to pass them into command buffers.
```rust
// load and create resources
let pmfx_bindless = asset_path.join("data/shaders/bindless");
pmfx.load(pmfx_bindless.to_str().unwrap())?;
pmfx.create_pipeline(&dev, "compute_rw", swap_chain.get_backbuffer_pass())?;
pmfx.create_pipeline(&dev, "bindless", swap_chain.get_backbuffer_pass())?;

// borrow resources (we need to get a pipeline built for a compatible render pass)
let fmt = swap_chain.get_backbuffer_pass().get_format_hash();
let pso_pmfx = pmfx.get_render_pipeline_for_format("bindless", fmt).unwrap();
let pso_compute = pmfx.get_compute_pipeline("compute_rw").unwrap();

// use resources in command buffers
cmdbuffer.set_compute_pipeline(&pso_compute);
// ..
cmdbuffer.set_render_pipeline(&pso_pmfx);
```
Views are a `pmfx` feature that starts to lean into `bevy_ecs`, which contains entities such as `cameras` and `meshes` that can be used to render world views. A view contains a command buffer that can be generated each frame; views also have a camera (view constants) that can be bound for the pass, and a render pass to render into. This is all passed to a `bevy_ecs` system function, which is dispatched on the CPU concurrently with any other render systems. Each view has its own command buffer and the jobs are read-only (aside from writing the command buffer), so they can safely be dispatched on different threads at the same time. You can build command buffers and make draw calls like this:
```rust
#[no_mangle]
pub fn render_meshes(
    pmfx: &bevy_ecs::prelude::Res<PmfxRes>,
    view: &pmfx::View<gfx_platform::Device>,
    mesh_draw_query: bevy_ecs::prelude::Query<(&WorldMatrix, &MeshComponent)>) -> Result<(), hotline_rs::Error> {
    let pmfx = &pmfx.0;
    let fmt = view.pass.get_format_hash();
    let mesh_debug = pmfx.get_render_pipeline_for_format(&view.view_pipeline, fmt)?;
    let camera = pmfx.get_camera_constants(&view.camera)?;

    // setup pass
    view.cmd_buf.begin_render_pass(&view.pass);
    view.cmd_buf.set_viewport(&view.viewport);
    view.cmd_buf.set_scissor_rect(&view.scissor_rect);
    view.cmd_buf.set_render_pipeline(&mesh_debug);
    view.cmd_buf.push_constants(0, 16 * 3, 0, gfx::as_u8_slice(camera));

    for (world_matrix, mesh) in &mesh_draw_query {
        view.cmd_buf.push_constants(1, 16, 0, &world_matrix.0);
        view.cmd_buf.set_index_buffer(&mesh.0.ib);
        view.cmd_buf.set_vertex_buffer(&mesh.0.vb, 0);
        view.cmd_buf.draw_indexed_instanced(mesh.0.num_indices, 1, 0, 0, 0);
    }

    // end / transition / execute
    view.cmd_buf.end_render_pass();
    Ok(())
}
```
Resource transitions are an important part of modern graphics APIs, and I am aiming to make them as smooth as possible. In the `pmfx` file you can provide `render_graphs`, which will automatically insert transitions based on state tracking. As mentioned above, all of the view render functions are dispatched concurrently on the CPU, but on the GPU they get executed in a specific order based on the render graph's dependencies, with the appropriate transitions inserted in between. This is still quite bare bones because I am not doing anything overly complicated yet, but I expect this aspect of `pmfx` to require a lot more attention as the project progresses. `pmfx` also provides the ability to insert resolves for MSAA resources, which are like a special kind of transition.
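The ordering part can be sketched as a depth-first walk over the `depends_on` lists (this is an illustration of the idea, not hotline's implementation):

```rust
use std::collections::{HashMap, HashSet};

// derive a GPU execution order in which every node runs after its
// dependencies; cycles are assumed to be invalid configs
fn execution_order(graph: &HashMap<String, Vec<String>>) -> Vec<String> {
    fn visit(node: &str, graph: &HashMap<String, Vec<String>>,
             seen: &mut HashSet<String>, order: &mut Vec<String>) {
        if !seen.insert(node.to_string()) {
            return; // already scheduled
        }
        for dep in graph.get(node).into_iter().flatten() {
            visit(dep, graph, seen, order);
        }
        order.push(node.to_string());
    }
    let (mut seen, mut order) = (HashSet::new(), Vec::new());
    for node in graph.keys() {
        visit(node, graph, &mut seen, &mut order);
    }
    order
}
```

For the `mesh_debug` graph above this always yields `grid` before `meshes` before `wireframe`, and the state tracker can then diff resource usage between adjacent passes to emit barriers.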
This leg of the project has been by far the most challenging. I started to hit more difficulties with memory ownership than I had encountered until now, and the addition of `bevy_ecs` and multi-threading introduces different scenarios to handle. Mostly the difficulties are to do with memory ownership, borrowing and mutability. Sometimes the borrow checker can be brutal, and a small refactoring task can send you down a wormhole you didn't expect.
Refactoring in general I have found more difficult at times in Rust than in any other language. I tend to start things quite quickly and get something working; this typically means creating separate objects on the stack inside `main`, which keeps the data in a shape that avoids overlapping mutability / borrowing issues.
When performing what initially seems like a simple refactor, to bring that code more in line with where you want it to be or where your mental model is, you can hit a load of borrow checker errors, and it turns out to be a more challenging task than you thought due to the mutual exclusion property of mutable references. Or, just trying to move something to a thread, maybe some types can't be `Send`, and then you have to re-think how your data is grouped together or how it is synchronised across threads.
Until this point I had mostly been dealing with objects on the stack that were quite easy to either move or pass by reference through the call stack. Here are some examples of more complicated memory ownership:
I have a couple of places where I am using lifetimes, but I have tried to steer away from them as much as possible. The current place I am using them is when passing `info` structures to create resources from a module backend.
```rust
/// Information to create a pipeline through `Device::create_render_pipeline`,
/// where the shaders will be visible in the current stack
pub struct RenderPipelineInfo<'stack, D: Device> {
    /// Vertex Shader
    pub vs: Option<&'stack D::Shader>,
    /// Fragment Shader
    pub fs: Option<&'stack D::Shader>,
    // ..
}

// the shader lifetimes last long enough to pass to `Device::create_render_pipeline`
let vsc_info = gfx::ShaderInfo {
    shader_type: gfx::ShaderType::Vertex,
    compile_info: None
};
let vs = device.create_shader(&vsc_info, &vsc_data)?;

let psc_info = gfx::ShaderInfo {
    shader_type: gfx::ShaderType::Fragment,
    compile_info: None
};
let fs = device.create_shader(&psc_info, &psc_data)?;

let pso = device.create_render_pipeline(&gfx::RenderPipelineInfo {
    vs: Some(&vs),
    fs: Some(&fs),
    // ..
})?;
```
I have considered moving the objects that need lifetimes out of the structure and just passing them to the function instead, so that lifetimes are not needed, but that means you lose the ability to have defaults, so I'm not sure. There is also a situation to handle when passing a resource into a command buffer for use by the GPU, where the resource can be dropped before it is used, but I will cover that later.
Overlapping mutability has been tricky to get around at times; that is, taking 2 mutable references to data that overlaps. It's interesting to have to tackle this because it's not something you need to think about in C or C++, yet it is happening all the time: load-hit-stores occur when memory is aliased by two pointers passed as function arguments, because they cannot be guaranteed to be different at compile time. Using `restrict` was something I did in the past when thinking about performance, but it's not really something I think all that much about these days because the memory-aliasing concept is quite abstracted. Rust forbids it, and as a result you end up in difficult situations when grouping data in different ways.
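A tiny example of the difference: what `restrict` merely promises in C, Rust enforces at compile time, and getting two mutable views into one buffer requires a provably disjoint split:

```rust
// two &mut borrows of the same Vec are rejected outright
fn add_halves(v: &mut Vec<f32>) {
    // let (a, b) = (&mut v[0], &mut v[4]); // error[E0499]: cannot borrow `v` as mutable more than once
    let (a, b) = v.split_at_mut(4); // OK: compiler-verified non-overlapping halves
    for (x, y) in a.iter_mut().zip(b.iter()) {
        *x += *y;
    }
}
```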
It's a natural instinct to want to bundle things together; maybe coming from a C background with context passing has got me leaning this way. But in `hotline` one of the more difficult scenarios I hit was when creating the `Client`. It felt natural to me that the `Client` could bundle together some common core functionality and be something passed around between plugins. But trouble arises when you need to borrow 2 members at the same time.
```rust
// call plugin ui functions
for plugin in &mut self.plugins {
    // ..
    self = ui_fn(self, plugin.instance, imgui_ctx); // cannot move `self` because it is borrowed as mutable (`for plugin in &mut self.plugins`)
}
```
I was able to work around this particular instance by moving the plugins into another vector, but I also then had to separate out which members were part of the `Plugin`, so that the `libs` could be accessed inside plugin functions to allow them to find functions to call.
```rust
// take the plugin mem so we can decouple the shared mutability between client and plugins
let mut plugins = std::mem::take(&mut self.plugins);

// call plugin ui functions
for plugin in &mut plugins {
    // ..
    self = ui_fn(self, plugin.instance, imgui_ctx); // now we can move `self`
}
```
This illustrates to me how data ownership and grouping is quite a different beast in Rust to what I am used to.
I found myself breaking algorithms apart: finding the data that needs to be mutated in one pass, gathering the results, and then iterating over the results in a separate pass to separate the mutability.
```rust
// iterate over `pmfx_tracking` to check for changes, and reload data
for (_, tracking) in &mut self.pmfx_tracking {
    let mtime = fs::metadata(&tracking.filepath).unwrap().modified().unwrap();
    if mtime > tracking.modified_time {
        // perform a reload
        self.shaders.remove(shader); //!! this is not possible as `self` is already borrowed (`for (_, tracking) in &mut self.pmfx_tracking`)
        // ..
        // update modified time
        tracking.modified_time = fs::metadata(&tracking.filepath).unwrap().modified().unwrap();
    }
}
```
My instinct initially went for imperative-style loops, but since then I have started to adopt the iterator patterns using `filter`, `map`, `fold`, and `collect`. This means that creating a mutable collection and inserting into it can be replaced with building an immutable collection.
```rust
// first collect the paths that need reloading
let reload_paths = self.pmfx_tracking.iter_mut().filter(|(_, tracking)| {
    fs::metadata(&tracking.filepath).unwrap().modified().unwrap() > tracking.modified_time
}).map(|tracking| {
    tracking.1.filepath.to_string_lossy().to_string()
}).collect::<Vec<String>>();

// iterate over the paths we want to reload
for reload_filepath in reload_paths {
    if !reload_filepath.is_empty() {
        // repeat similarly inside, collecting resources that need updating first
        // find textures that need reloading
        let reload_textures = self.textures.iter().filter(|(k, v)| {
            self.pmfx.textures.get(*k).map_or_else(|| false, |src| {
                src.hash != v.0
            })
        }).map(|(k, _)| {
            k.to_string()
        }).collect::<HashSet<String>>();
        // ..
        // reloading happens outside of any iterator tied to self (self here is mutable)
        self.recreate_textures(device, &reload_textures);
    }
}
```
I have started to think this way a bit more, but it’s a little alien to me when I can achieve the same thing with a simple loop.
For the plugin libs, and for `bevy_ecs` particularly, I need to pass the hotline modules as resources to the `ecs` systems. In the end I settled on moving the entire hotline `Client` into the plugin functions, into the ecs `World`, and back out again. The core hotline modules also need to be wrapped up to be a `Resource` for `bevy_ecs`.
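The wrappers are just newtypes, which is why the code below reaches through `.0`. A sketch of their shape (the real definitions live in the ecs code, and the derive shown assumes a bevy_ecs version with `#[derive(Resource)]`):

```rust
use bevy_ecs::prelude::*;
use hotline_rs::prelude::*;

// sketch: newtype wrappers so the hotline modules satisfy bevy_ecs's
// `Resource` requirements
#[derive(Resource)]
pub struct DeviceRes(pub gfx_platform::Device);

#[derive(Resource)]
pub struct AppRes(pub os_platform::App);
```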
This feels quite nice in a way: in each plugin you have full ownership of `hotline` and can do what you like, which makes it possible to do things like asynchronous system updates through the bevy `Scheduler` and let that have full control.
```rust
// move hotline resources into the world
self.world.insert_resource(session_info);
self.world.insert_resource(DeviceRes(client.device));
self.world.insert_resource(AppRes(client.app));
self.world.insert_resource(MainWindowRes(client.main_window));
self.world.insert_resource(PmfxRes(client.pmfx));
self.world.insert_resource(ImDrawRes(client.imdraw));
self.world.insert_resource(UserConfigRes(client.user_config));

// update systems
self.schedule.run(&mut self.world);

// move resources back out
client.device = self.world.remove_resource::<DeviceRes>().unwrap().0;
client.app = self.world.remove_resource::<AppRes>().unwrap().0;
client.main_window = self.world.remove_resource::<MainWindowRes>().unwrap().0;
client.pmfx = self.world.remove_resource::<PmfxRes>().unwrap().0;
client.imdraw = self.world.remove_resource::<ImDrawRes>().unwrap().0;
client.user_config = self.world.remove_resource::<UserConfigRes>().unwrap().0;
self.session_info = self.world.remove_resource::<SessionInfo>().unwrap();
```
It requires a small amount of hokey-cokey to do so, which is also a little strange, but I kind of like it. I had to break out the different modules inside the client to avoid overlapping mutability when using the `SwapChain`, `Device` and `CmdBuf`.
I am aware I could wrap everything in an `Arc` to get interior mutability, and that might remove the need for the moves, but I decided to try and use them only where necessary. I am using `Arc` and `Mutex` anywhere inter-thread synchronisation is necessary. I quite like lockless data structures in C, and I will take a look at tokio when I get a chance, but for now I went with a heavy-handed approach in a few places just to get the program structured how I would like. I have a `Reloader` and `ReloadResponder` setup that watches files, flags when changes have occurred, and triggers a rebuild and reload. The `ReloadResponder` is also the only place using `dyn` dispatch; there is still more work to do in that area, as I struggled trying to achieve the kind of polymorphic behaviour I would implement in C++.
Another place using an `Arc` is in `pmfx`, because `Views` need to be mutable in render functions and they are located within a `HashMap`, so it's not possible to borrow mutably from `pmfx` itself without interior mutability. This led me to go with an `Arc`; however, a move should be viable, because only a single render system will ever write to a single view, so it does feel unnecessary to require an `Arc` here, but I also had to work with `bevy_ecs` systems, which forced my hand slightly. A `RefCell` might also be a better option in this scenario.
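The shape of the problem looks roughly like this (field types simplified):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

struct View { /* cmd_buf, render pass, camera name, .. */ }

struct Pmfx {
    views: HashMap<String, Arc<Mutex<View>>>,
}

impl Pmfx {
    // hand out a cloned Arc rather than a borrow tied to `&self`; the
    // render system then locks the view while it writes its command buffer
    fn get_view(&self, name: &str) -> Option<Arc<Mutex<View>>> {
        self.views.get(name).cloned()
    }
}
```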
Within a multi-buffered GPU rendering system, while the CPU is building command buffers for the current frame, a previous frame is being executed concurrently on the GPU. This introduces issues: if we decide to `drop` a resource on the CPU side, it may still be in use on the GPU, and dropping it will cause a D3D12 validation error and can lead to a device removal. I encountered this issue first with textures used for videos in the `av` API, so I added a `destroy_texture` function that passes ownership of the texture to the GPU `Device`, and a `cleanup_resources` function checks that resources are no longer in use before `dropping` them. This goes a little against Rust's memory model, with the need to explicitly `drop` at the right time.
```rust
// swap the texture to None and pass ownership of the texture to the device, where it will be cleaned up safely
let mut none_tex = None;
std::mem::swap(&mut none_tex, &mut self.texture);
if let Some(tex) = none_tex {
    device.destroy_texture(tex);
}
```
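On the `Device` side the idea is deferred destruction. The frame bookkeeping below is my own sketch of it, not hotline's actual internals:

```rust
struct Texture; // placeholder resource type

struct Device {
    current_frame: u64,   // frame the CPU is currently recording
    completed_frame: u64, // last frame the GPU has fully executed
    zombies: Vec<(u64, Texture)>,
}

impl Device {
    // take ownership now, drop later
    fn destroy_texture(&mut self, texture: Texture) {
        self.zombies.push((self.current_frame, texture));
    }

    // drop anything retired on a frame the GPU has already finished
    fn cleanup_resources(&mut self) {
        let completed = self.completed_frame;
        self.zombies.retain(|(retired, _)| *retired > completed);
    }
}
```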
In some places where full reloads are taking place, there is a useful function on the `SwapChain` which can wait for the last submitted frame to complete on the GPU; any `drops` that happen after that are guaranteed to be safe before any new frames are submitted.
```rust
// check if we have any reloads available
if self.reloader.check_for_reload() == ReloadState::Available {
    // wait for the last GPU frame so we can drop the resources
    swap_chain.wait_for_last_frame();
    self.reload(device);
    self.reloader.complete_reload();
}
```
This solution is much nicer because the `drop` can just happen naturally. It might not always be possible or desirable to perform this hard sync with the GPU, so in future I expect to have to use the `destroy` functions more (and add them for different resource types).
Build times are currently the biggest problem; a `plugin` takes around 6 seconds to build, with a little extra to complete the reload, so live code editing does not feel hugely responsive. Reloading shaders or render configs is very fast though, which balances it out a bit if you work across code and shaders, and is all the more reason to use more GPU-driven techniques / compute. The build times in full debug builds are much slower, but because of an issue with more than 65535 symbols being exported from a plugin, which is not supported by the MSVC toolchain, I am forced to switch to O1 optimization for debug, and that has similar performance to release.
```
= note: LINK : fatal error LNK1189: library limit of 65535 objects exceeded
```
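The opt-level switch itself is a standard Cargo profile override, something along these lines:

```toml
# Cargo.toml: optimise debug builds enough to stay under the symbol limit
[profile.dev]
opt-level = 1
```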
This post has lots of detailed info about build times. I have tried to profile the build with the `cargo build -Z timings` option, but it is only available on the `nightly` channel. Switching to nightly made it possible to run with the flag, but I couldn't see any output `cargo-timings` files; I wonder if `-Z timings` is not available on Windows? As a result I am shooting in the dark a little here, but I have done some exploration to figure out what might work best.
I tried to separate the plugins out over more libs so that the core hotline lib did not have to depend on `bevy_ecs`. This didn't make much of a positive difference because it created the requirement for an additional plugin with shared code. The attempt ended up with 5 build artefacts in total, and 13-second build times:
- `hotline_rs.dll`
- `client.exe`
- `ecs_base.dll`
- `ecs.dll`
- `ecs_demos.dll`
Each library or executable that requires building adds a noticeable constant cost, which seems to be link time. So reducing the number of libs actually helped improve build times, and adding more only increased them. I moved the `ecs_base` plugin into `hotline`, which reduces the number of build artefacts and brings me to 6-second build times. If I build a single lib and executable, the build time is around 10 seconds, so the live building is an improvement, if still not where I would like it to be; maybe this can improve in time.
For an end user, the desired result is that they would not need to modify the `client`, the `hotline` lib or the core `ecs`, and would only work inside a plugin such as `ecs_demos`. That has about a 6-second build time, which is not too bad; however, when working on the core engine itself, care needs to be taken to make sure the plugins and the core libs stay in sync, and that means building more artefacts. Building from clean and switching between release and debug also adds a cost, so keeping the number of libs down was the way to go.
Due to the build times being fairly long, I had to ensure that builds were not being triggered erroneously when they did not need to be. With shaders inside the main repository, `cargo build` thinks that `hotline` needs rebuilding when shaders have changed, even though this should not affect any of the libs or executables. This is particularly painful because when modifying a shader the client is able to rebuild and reload the shaders and associated pipelines very quickly, but the next time code is modified in a plugin it causes the plugin to rebuild the main `lib` as well, taking the total build time to about 10 seconds. If the shaders are excluded from the package in `Cargo.toml` then the issue does not occur, but the data is necessary to ship to users.
Due to other constraints I ended up moving the data into a separate repository, which mitigates the issue of rebuilding the main library. For now the problem is kept at bay, but it is a bit more work to maintain changes in the data repository, and I need to find a more long-term solution. Even editing the `todo.txt` file I have inside the repository causes a cargo build to take a few seconds. I know you can exclude directories from the package, which resolves the issue, but I would also like these things to be published to crates.io.
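For reference, the packaging exclusion mentioned above is a one-liner (the paths here are illustrative):

```toml
[package]
# keep non-code files out of the package, and out of cargo's
# rebuild-triggering fingerprint
exclude = ["data/", "todo.txt"]
```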
Debugging in general is more difficult in the `plugin` environment; if you are attached to the debugger, `plugin` rebuilds will fail because the `.pdb` is locked. So sometimes it means resorting to `println!` debugging when you need to debug the hot reload process itself.
I added support for serialisation of the basic program state, which is synchronised between release and debug builds. It keeps the camera position and the currently selected demos, so even after a full restart you are right back where you left off. This makes those times where something goes terribly wrong, or when you need to edit the core engine, just a little bit easier.
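The persisted state is tiny; roughly this kind of struct (the field names are my guesses, not hotline's actual `SessionInfo`), serialised with serde on shutdown and loaded on boot:

```rust
use serde::{Deserialize, Serialize};

// sketch of the persisted session state; field names are illustrative
#[derive(Serialize, Deserialize, Default)]
struct SessionInfo {
    active_demo: String,
    camera_position: [f32; 3],
    camera_zoom: f32,
}
```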
For all the convenience that hot reloaded plugins aim to provide, developing in that environment is quite tricky, so currently it feels like plugins are an extra burden to carry around. It would be nice to be able to switch between statically linked and dynamically linked plugins; that would also be a good option for a final packaged build of an application, where the hot reloading is not required. This is something I will look into when I get a chance.
I have spent quite a lot of time handling errors and propagating them in a way that allows the client to continue running gracefully should something go wrong. It's quite easy to get things working quickly by just using `unwrap` to panic if something is missing, fixing the issue, and leaving the unwrap there. That's how I tend to like working in other code bases: if something is missing, assert, then fix it before moving on. In certain situations, like a game for instance, missing data should not be present in a final build, so having all of the code to gracefully handle it always felt like extra baggage. But in some situations, like this code base and in tools, you need to allow things to go wrong and interactively resolve them.
Luckily, Rust is really good at error handling and actively wants you to do it, even in cases where I would previously hit a panic for a missing shader or some other data with a typo or an incorrect path in it. When I hit the panic I would know exactly where, and returning `Result` from a function allows the use of `?`, which makes a significant improvement to the readability of the code by reducing the need to unwrap. Here's a bloated, messy initial setup, partly down to not being sure how to handle errors in `bevy_ecs` systems:
```rust
#[no_mangle]
pub fn render_meshes(
    pmfx: bevy_ecs::prelude::Res<PmfxRes>,
    view_name: String,
    mesh_draw_query: bevy_ecs::prelude::Query<(&WorldMatrix, &MeshComponent)>) {
    // this is just the code needed to get gfx resources and unwrap them to use in command buffer generation
    let arc_view = pmfx.get_view(&view_name);
    if arc_view.is_none() {
        return;
    }
    let arc_view = arc_view.unwrap();
    let view = arc_view.lock().unwrap();
    let fmt = view.pass.get_format_hash();
    let mesh_debug = pmfx.get_render_pipeline_for_format("mesh_debug", fmt);
    if mesh_debug.is_none() {
        return;
    }
    let mesh_debug = mesh_debug.unwrap();
    let camera = pmfx.get_camera_constants(&view.camera);
    if camera.is_none() {
        return;
    }
    let camera = camera.unwrap();
    // ..
}
```
And with proper result propagation it looks much better. I was also able to unwrap the `View` and pass it into the function instead of fetching it by name, because now I am calling the function from a closure, which gives a bit more control.
```rust
#[no_mangle]
pub fn render_meshes(
    pmfx: &bevy_ecs::prelude::Res<PmfxRes>,
    view: &pmfx::View<gfx_platform::Device>,
    mesh_draw_query: bevy_ecs::prelude::Query<(&WorldMatrix, &MeshComponent)>) -> Result<(), hotline_rs::Error> {
    let fmt = view.pass.get_format_hash();
    let mesh_debug = pmfx.get_render_pipeline_for_format(&view.view_pipeline, fmt)?;
    let camera = pmfx.get_camera_constants(&view.camera)?;
    // ..
    Ok(())
}
```
There are still lots of combinations and things to test when it comes to error handling, so I have added some initial tests to try to catch things that might go wrong. But I foresee this as ongoing work, and I need to get into the habit of thinking about it earlier, instead of quickly getting something working and refactoring later.
I'm pretty happy with the overall program structure, and the stability has been great too; I think that's a good affirmation that all of the hard work playing ball with the borrow checker pays off in the long run. I still think there's a lot to think about in terms of memory ownership; this is one area I'm less certain about than anything I have worked on for a long time. Next up I will be starting to add lighting and shadows: first I need to add these concepts into the `ecs`, and then I plan to work on clustered lighting and virtual shadow maps. There are a few bits of `pmfx` I need to add and hook up to make that possible, but in general the graphics side of the engine is really coming along.
I posted about this on both twitter and mastodon; I was keen to move to mastodon but am still finding much more engagement on twitter. Give me a follow if you're interested, and check out the GitHub or crates.io pages.