Skip to content

The Fast Texture Loading Rewrite#1031

Open
Wartori54 wants to merge 31 commits intoEverestAPI:devfrom
Wartori54:ftl-rewrite
Open

The Fast Texture Loading Rewrite#1031
Wartori54 wants to merge 31 commits intoEverestAPI:devfrom
Wartori54:ftl-rewrite

Conversation

@Wartori54
Copy link
Member

@Wartori54 Wartori54 commented Nov 30, 2025

As the title suggests, this PR attempts to rewrite the entirety of the good old FTL, into a more maintainable and future proof codebase, splitting code into a new class TextureContentHelper to keep file sizes small.
The main intention of this PR is to make VirtualTexture fully thread safe, and re-enginneer an implementation of FTL.

For context FTL has simply the following goal: decode the texture data from disk to a CPU buffer asynchronously, then upload to GPU memory on the main thread. This speeds up loading times and fixes some crashes on Nvidia gpus. Theres a writeup about most of the significant FTL details at the top of the VirtualTexture file diving deep into this and other topics.

As for performance, it should be on par, this PR does not have any intentional speed improvements (in fact, it may be slower due to the heavier synchronization in VirtualTexture but it should be fairly small). I have some plans and some ongoing tests to improve performance and bound memory usage even lower, but those will land in a follow up PR once ready that will build off of this one. (Runtime atlasing is still on the TODO list, I did not forget!)

Finally CI will fail, specifically the SJ TAS test since this is breaking for CollabUtils2. Luckily that mod has been updated already to be compatible and the only required extra steps for local testing are: clone the repo and dotnet build -c Release; mostly because it has not yet been released on gamebanana.
An update has been pushed to CollabUtils2 making it compatible with the changes of this PR.

@maddie480-bot maddie480-bot added the 1: review needed This PR needs 2 approvals to be merged (bot-managed) label Nov 30, 2025
Copy link
Member

@microlith57 microlith57 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first pass, picking the low hanging fruit first! nothing here is blocking

@Wartori54
Copy link
Member Author

@microlith57 all feedback has been addressed!

@maddie480
Copy link
Member

/azp run

Copy link
Contributor

@DashingCat DashingCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on Linux and works as expected, without significant changes in loading times compared to the previous "Fast Texture Loading" implementation.

[SettingNeedsRelaunch]
[SettingInGame(false)]
[SettingIgnore] // TODO: Show as advanced setting.
public bool? FastTextureLoadingPoolUseGC { get; set; } = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe FastTextureLoadingPoolUseGC should be a bool instead of a bool?, as both null and false are equivalent when using the property in Celeste.Mod.mm/Mod/Helpers/TextureContentHelper.cs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind having a default third state is the ability to change what it actually defaults to if we ever need to, because if the default were to be false we would not be able to switch all the users who didn't manually disable the feature (since we could not distinguish that).
And if you look further _FastTextureLoading and ThreadedGL have been doing this for a while now.

@Wartori54 Wartori54 added the rewrite An existing feature needs a rewrite label Jan 8, 2026
Copy link
Member

@Popax21 Popax21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't have the time (or setup) to test this locally / give it more than a passing glance at times, but still gonna leave my feedback here; in total this is definitively a step in the right direction, but I am still against merging this in its current state, and think a more "radical" rewrite is needed to fully address all the shortcomings of the old system.

// Flush the main thread queue to make sure all tasks related to FTL are completed
// Note: this will often be empty already but on extreme scenarios (massive amount of textures or
// very old hardware) it may not
MainThreadHelper.Schedule(() => MainThreadHelper.Boost = 0, true).AsTask().Wait();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not actually "flush" the queue, since tasks enqueued before this one may enqueue continuations which would run after this one.


namespace Monocle {
// We may have concurrent usage of this class due to FTL
[MakeAllMethodsSynchronized]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not exactly sure what this is protecting against, but this seems like an incredibly jank workaround at the best of times. If the issue is FNA not being able to cope with concurrent asset creation, add explicit locks / checks to ensure it only happens from the main thread. If the issue is something Monocle-internal, add explicit locks to said accesses. Relying on implicit synchronization like that should be an absolute last resort, and only with thorough documentation why it is needed & the best solution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only concern here is the static member assets which is a simple List<T>. Making VirtualTexture completely thread-safe implies that methods called CreateVirtualTexture should also be thread-safe to me, and this was the easiest way out to make it safe, other than patching every single method here to add locking around that list. Changing the type of assets to a thread-safe collection felt risky due to it being from the vanilla game already.

I will clarify the motivation behind this change in the comment.

mod.LoadContent(firstLoad);

patch_VirtualTexture.StopFastTextureLoading();
// There's no need to stop Ftl if it started in the first place ;)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment will make little sense outside of blames once the PR lands.

* -ade
*/
if (patch_VirtualTexture.FtlToggle) return true;
if (CoreModule.Settings.FastTextureLoading ?? Environment.ProcessorCount >= 4) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire logic should probably be revisited given that it seems to have mainly been intended for / designed to prevent OOMs on 32bit .NET FW installs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that since it had not caused any trouble for a long time (that we know of) it might be worth to not worry about and roll with it. It is just a heuristic, maybe we can fine tune it after the rewrite is settled.

inGC = CoreModule.Settings.FastTextureLoadingPoolUseGC ?? false;
} else {
inGC = false;
// Looking for a cleaner way, adding an Initialize method would make ugly the other methods and fields (need to check if init-ed on each call)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yuck.
(also a bunch of typos in the comments here)

}
return false;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stopped giving this a look at this point; first impression is that is replacing one really jank, obtuse system, for another extremely jank, obtuse system, that retains a bunch of smelly code fragments from the old system while also reinventing the wheel a bunch (custom memory management without any actual justification for it, reinventions of System.Buffers.[Array|Memory]Pool<T>, trying to support both GCed and unmanaged memory even tho the only performance difference would be during (de)allocation and not during actual texture loading, etc etc).

IMHO this is definitely the wrong path / approach, and an FTL rewrite should not include any attempts at whatever this is when the STL already supports a lot of what we need. If it turns out that we do need custom memory management for performance reason this is definitively not the way to do it, and instead it should actually be a system that gets dedicated pages and applies some sort of simple bump/slab allocator tuned for our use case, not just further divide system heap chunks handed out by AllocHGlobal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May you point out the smelly code fragments that you are referring to? As far as I'm aware everything I kept from the old code base is just code that already was good as is (reading the ".data" file format, reading a png through FNA's APIs, redirecting to the correct loader method when loading from a path), or was the FTL heuristics, which I kept as is because of what I said in my other comment.
Design wise, one of the old similarities to the old system is defining a solid upper bound to the FTL memory usage: for PNG textures we cannot really know how memory will be used because stb_image wont allow us to pass in a custom allocator or get any insight on it, for the .data textures we use that memory manager to utilize preallocated buffers and not rely on the GC to allocate the memory for us. This felt like a nice way to guarantee that FTL can be as aggressive as possible while also not overloading the system, compared limiting the amount of textures loaded asynchronously at any point in time, which wont be nearly as optimal. (But, if you come up with any other way to limit FTL's aggressiveness I'm open to discussing that.)
Limiting the maximum amount of memory used by FTL also motivates writing my own simplistic allocator, AFAIR I could not find anything in the standard library that fulfilled all my requirements so I just made my own. (As a side note, currently most textures that are loaded are in a PNG format, but I was planning and hoping for that to change with followup PRs about per-mod atlasing to load all mod textures from a single atlas, which would be stored in a ".data" file for decoding speed and to allow better memory management, this is still on the testing phase though.)
And with that came wanting to be able to switch between managed or unmanaged memory, in case that could matter for performance (reducing GC stress?) but benchmarks turned out to be very similar in either case. I can strip this functionality away if you think it is just code bloat, I kept it in case it could matter for other systems.

public int Width { get; internal set; }
public int Height { get; internal set; }

// Making Width and Height virtual is a breaking change, so lets just add new virtual properties and make the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of replacing the properties with virtual ones (which kills performance on hot paths), instead introduce a virtual HandleSizeChange() method that gets called from within the non-virtual setters (if that's even needed, I am 99% sure that the size of assets does not actually change post-creation).

//
// Finally, this class is also tasked with the headless mode loading optimizations, where all textures which can be preloaded
// will have its Texture2D set to a 1x1 texture, this is purely for performance’s sake. Textures which cannot be preloaded will
// be loaded as usual.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also definitively want to introduce a helper method somewhere to "force" the load of a lazy-loaded texture as well.

IMO the whole API surface here needs some simplifying / nailing down: ideally we want to give modders a way to say "hey, don't load this (set of) texture(s) ahead of time; I'll tell you when you need to load it" (with a logged warning + stutter from on-demand lazy loading when they forget to do so). In a hypothetical future world this API could then also be extended to allow for auto-atlas-ification of these texture sets, which should AFAICT also hold a much bigger performance boost than lazy loading should provide (since the overhead of uploading textures over the bus is almost certainly gonna be our bottleneck on discrete GPUs, so minimizing the amount of unique textures is a much bigger deal - I assume this PR was tested on those, and not just on iGPUs where texture uploads are really cheap because of unified memory).

Copy link
Member Author

@Wartori54 Wartori54 Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already an API for that, currently it is an event that's fired on each VirtualTexture creation that allows any mod to force the lazy loading of the texture, it was mainly designed to be directly compatible with CU2.
I'm open to suggestions on this API, are you referring to some sort of similar system to MonoMod's using-DetourConfigContext (as seen here)?

}
}

private static extern void orig_cctor();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this MonoMod override now that it's unused.

/// <exception cref="AggregateException">Thrown if the reload happened asynchronously and there was an exception during it.</exception>
[MonoModLinkFrom("Microsoft.Xna.Framework.Graphics.Texture2D Monocle.VirtualTexture::Texture")]
public Texture2D Texture_Safe {
public Texture2D? Texture_Safe {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comments to TextureContentHelper; while this is no doubts better than the old (undocumented) code, it still replaces one jank, obtuse system with another similarly jank, obtuse, but this time documented / slightly better structured system. IMO it still inherits a lot of design baggage related to somehow constructing its own mechanisms for synchronizing concurrent loads without making proper use of the available STL primitives.

I don't have the time to give all of this a full proper read, so I don't doubt that in practice it's gonna be more complicated than I will describe, but it should be possible to boil down all of this sync / locking logic down into a single [Value]Task stored in the VirtualTexture that is running either on the main thread or an FTL thread(pool) / might just be a Task.FromResult, which should greatly simplify the logic / remove a ton of jank.

@maddie480-bot maddie480-bot added 2: changes requested This PR cannot be merged because changes were requested (bot-managed) and removed 1: review needed This PR needs 2 approvals to be merged (bot-managed) labels Jan 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2: changes requested This PR cannot be merged because changes were requested (bot-managed) rewrite An existing feature needs a rewrite

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants