Texture Streaming #949

Open: bjornbytes wants to merge 5 commits into dev from texture-streaming

@bjornbytes (Owner) commented May 28, 2026
  • Add stream flag to lovr.graphics.newTexture/newModel. This will load the texture(s) asynchronously on a separate GPU queue when possible, avoiding interrupting the rendering work happening on the main queue.
  • Add Texture/Model:isReady to check if the asynchronous transfer is complete and the texture is ready to use. It is an error to use a Texture before it's ready.
    • You can use Model:isReady to see if all of its textures are ready, or you can check individual textures and draw the ready ones with Pass:drawPart to do a progressive per-mesh load.
  • Vulkan details:
    • core/gpu texture creation takes a CPU pointer instead of a GPU buffer.
    • Texture upload uses VK_EXT_host_image_copy when available/optimal (for all textures).
    • Otherwise, it falls back to a separate transfer queue when available (and the texture is streaming).
    • Otherwise, falls back to doing the transfer on the graphics queue (original behavior).
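The fallback order in the Vulkan details above can be summarized as a small decision function. This is only a sketch: `UploadPath`, `DeviceCaps`, and `chooseUploadPath` are hypothetical names rather than LÖVR's actual core/gpu code, and "available/optimal" for host image copy is collapsed into a single boolean.

```c
#include <stdbool.h>

/* Hypothetical names; not taken from LÖVR's core/gpu. */
typedef enum {
  UPLOAD_HOST_IMAGE_COPY, /* VK_EXT_host_image_copy: the CPU writes texels directly, no staging buffer */
  UPLOAD_TRANSFER_QUEUE,  /* async copy on a separate transfer queue (streaming textures only) */
  UPLOAD_GRAPHICS_QUEUE   /* original behavior: copy recorded on the main graphics queue */
} UploadPath;

typedef struct {
  bool hostImageCopy; /* extension supported and judged optimal for this texture */
  bool transferQueue; /* device exposes a separate transfer-capable queue */
} DeviceCaps;

/* Mirrors the fallback order: host copy for every texture, then the
   transfer queue for streaming textures, then the graphics queue. */
UploadPath chooseUploadPath(DeviceCaps caps, bool stream) {
  if (caps.hostImageCopy) return UPLOAD_HOST_IMAGE_COPY;
  if (caps.transferQueue && stream) return UPLOAD_TRANSFER_QUEUE;
  return UPLOAD_GRAPHICS_QUEUE;
}
```

In this shape the stream flag only changes the outcome on the transfer-queue branch; with host image copy every texture takes the same path, matching the "(for all textures)" note.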

The main reason to do this is that WebGPU really wants a CPU pointer for texture data.

Still unclear if this is worth it or a good idea, but it sure is cool!

@DonaldHays (Contributor)

Does this pull the textures off disk asynchronously, too, or is it just the GPU upload that's asynchronous?

@bjornbytes (Owner, Author)

This change just affects the GPU upload. You can wrap texture creation in a task too:

```lua
lovr.task.start(function()
  texture = lovr.graphics.newTexture('file', { stream = true })
end)
```

This gets pretty much all the overhead off of the main CPU thread / GPU queue, including the file read, image decode, GPU memory allocation, and GPU transfer.

@bjornbytes (Owner, Author)

The performance story here is a little more muddled than I hoped:

  • VK_EXT_host_image_copy
    • Can be slower on the CPU, because it does the texture swizzling on the CPU instead of the GPU.
    • When a host image copy texture is created on a background thread, this is a really good way to upload textures.
      • There's a question of whether LÖVR should encourage you to just put your newTexture call in a task, or whether the graphics module should engage in internal heroics to offload texture creation onto worker threads.
    • One of the main wins is that it doesn't require a staging buffer, which reduces peak memory usage and can avoid some of the costly overhead of vkAllocateMemory/vkFreeMemory (which appears to be very real on NVIDIA, see below).
  • Using a separate transfer queue
    • Pretty clear win on AMD iGPU. Texture uploads on graphics queue cause stutters, transfer queue is stutter free. Still profiling to quantify/understand the benefits better.
    • On NVIDIA, not good. Submission to the transfer queue can be insanely slow (~50ms), even when only 1 thread is doing submits.
      • In the current architecture, this causes hitches because the transfers are submitted alongside the main graphics queue submit.
      • I think this might actually be caused by GPU memory allocation, but I'm not entirely sure.
      • LÖVR might need to make its staging buffer allocation more sophisticated to get better performance on NV.
      • Or the transfer submit could be moved off the main thread. However, a previous version of this branch was doing that and I was still seeing tens-of-ms stalls on the graphics vkQueueSubmit. I originally thought it was due to transfer submit contending with graphics submit, but maybe it has something to do with memory management in the driver.
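One possible shape for "more sophisticated staging buffer allocation" is a single long-lived ring buffer that uploads suballocate from, so vkAllocateMemory/vkFreeMemory are not called per texture. The sketch below covers only the bookkeeping (no Vulkan calls). `StagingRing`, `ringAlloc`, `ringReclaim`, and tagging each allocation with a GPU "tick" are assumptions for illustration, not something this PR describes implementing, and nothing here has been measured against the NVIDIA behavior above.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_MAX_ALLOCS 64

/* Hypothetical bookkeeping for one persistently mapped staging buffer. */
typedef struct {
  uint64_t end;  /* virtual offset just past this allocation */
  uint64_t tick; /* GPU tick that must complete before the bytes can be reused */
} RingAlloc;

typedef struct {
  uint64_t capacity; /* size of the staging buffer in bytes */
  uint64_t head;     /* virtual offset of the next free byte */
  uint64_t tail;     /* virtual offset of the oldest byte still in flight */
  RingAlloc allocs[RING_MAX_ALLOCS]; /* FIFO of in-flight allocations */
  uint32_t first, count;
} StagingRing;

void ringInit(StagingRing* r, uint64_t capacity) {
  *r = (StagingRing){ .capacity = capacity };
}

/* Frees allocations in FIFO order once the GPU reports their tick as done.
   Assumes ticks passed to ringAlloc never decrease. */
void ringReclaim(StagingRing* r, uint64_t completedTick) {
  while (r->count > 0 && r->allocs[r->first].tick <= completedTick) {
    r->tail = r->allocs[r->first].end;
    r->first = (r->first + 1) % RING_MAX_ALLOCS;
    r->count--;
  }
}

/* On success writes the physical offset into the buffer and returns true.
   Returns false when the space is still owned by in-flight uploads. */
bool ringAlloc(StagingRing* r, uint64_t size, uint64_t tick, uint64_t* offset) {
  if (size == 0 || size > r->capacity || r->count == RING_MAX_ALLOCS) return false;
  if (r->count == 0) r->head = r->tail = 0; /* idle: restart at offset 0 */
  uint64_t start = r->head;
  uint64_t phys = start % r->capacity;
  if (phys + size > r->capacity) start += r->capacity - phys; /* skip the gap so the copy stays contiguous */
  if (start + size - r->tail > r->capacity) return false;     /* would overwrite in-flight data */
  uint32_t slot = (r->first + r->count) % RING_MAX_ALLOCS;
  r->allocs[slot] = (RingAlloc){ .end = start + size, .tick = tick };
  r->count++;
  r->head = start + size;
  *offset = start % r->capacity;
  return true;
}
```

A failed ringAlloc means earlier uploads still own the space; a real allocator would then have to wait on the GPU, grow the buffer, or fall back to a one-off allocation, which is the kind of stall this approach would be trying to avoid.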

If I were deciding whether this can/should be merged, the story might go like this:

  • Host image copy is good. It can make lovr.graphics.newTexture take a little longer on the CPU, but this is okay because it's already slow. lovr.task.start is a good tool for solving this problem, and it's fine to add a little extra pressure to wrap texture creation in tasks.
  • Transfer queue is good on some drivers, bad/neutral on other drivers, and the code is kinda complicated. The bad driver thing can probably be fixed in the future by improving LÖVR's memory allocation or threading. So it's probably okay to merge, tentatively/reluctantly.
  • There's a question of whether { stream = true } makes sense to expose.
    • With host image copy, it's a no-op, unless LÖVR does the internal threading heroics and marks the texture as ready once a background thread finishes the copy.
    • It's still a useful signal for people to provide to LÖVR:
      • false is "I literally need this texture this frame, don't use a transfer queue and/or please wait for the host copy to finish before rendering"
      • true is "this can wait, I don't want frame stutters, transfer queue is good here and/or don't block on the host image copy"
    • On the other hand, you could actually use the task system for it:
      • newTexture is async.
      • If you call it outside of a task, you get a synchronous texture.
      • Inside a task, you'll get a streaming texture, but you can .wait if you ever want to block on it.
    • If { stream = true } is not exposed, all textures either need to be synchronous (transfer queue code makes no sense) or async-with-implicit-stall (too much unnecessary magical machinery, not worth implementing).
    • Given this is a pretty niche feature, it's probably fine to defer/change the exact API design later and just leave { stream = true } as an experimental flag to play with???
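Whichever API shape wins, Texture:isReady and Model:isReady need some completion signal to compare against. A minimal sketch, assuming each upload is tagged with a monotonically increasing tick and the transfer queue (or a background host-copy thread) publishes the last tick it finished. `TransferQueue`, `Texture`, `Model`, `uploadTick`, and `completedTick` are hypothetical names, not LÖVR's internals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: last tick the GPU (or a host-copy worker) reported as finished.
   If another thread updates it, real code would need an atomic here. */
typedef struct {
  uint64_t completedTick;
} TransferQueue;

typedef struct {
  uint64_t uploadTick; /* 0 = uploaded synchronously, ready immediately */
} Texture;

typedef struct {
  Texture** textures;
  size_t textureCount;
} Model;

/* A texture is ready once the tick its upload was submitted on has completed. */
bool textureIsReady(const TransferQueue* q, const Texture* t) {
  return t->uploadTick <= q->completedTick;
}

/* A model is ready only when every one of its textures is ready. */
bool modelIsReady(const TransferQueue* q, const Model* m) {
  for (size_t i = 0; i < m->textureCount; i++) {
    if (!textureIsReady(q, m->textures[i])) return false;
  }
  return true;
}
```

Checking individual textures with textureIsReady is what would enable the progressive per-mesh drawing mentioned in the PR description, while modelIsReady covers the all-or-nothing case.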
