I have a few doubts about Metal rendering, with regard to both correctness and efficiency. The following is an overview of the processing pipeline in question:
**Writer side, driven by `AVCaptureOutput`**

1. A `CVPixelBuffer` is acquired from `captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection)`.
2. An `MTLTexture`, `textureA`, backed by this pixel buffer is created using `CVMetalTextureCacheCreateTextureFromImage`.
3. A command buffer is created from command queue #1. A number of "filter" shaders are encoded into this command buffer, each of which reads from `textureA`, performs some calculations, and writes back to `textureA`, in series. One of these filters relies on an auxiliary `MTLTexture` to store intermediate results; that texture is created once at startup and reused.
4. The same command buffer is also used to write the contents of `textureA` into another `MTLTexture`, `textureB`. This texture is not backed by a pixel buffer, so it can be kept around as long as necessary without holding onto the provided pixel buffer (which would cause `AVCaptureOutput` to drop frames). A reference to this texture is stored in a shared location, and a marker is set to signify that new data is available to be accessed by the "reader" (described below).
5. The original `CVPixelBuffer` (modified by way of `textureA`) is passed to `AVAssetWriterInputPixelBufferAdaptor.append()` for writing to a video file.
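For concreteness, the writer-side steps might be sketched roughly as follows. This is only an illustration of the structure described above: `encodeFilters` and `publish` are hypothetical stand-ins for the app's own filter encoding and shared-slot logic, and the pixel format is assumed to be BGRA.

```swift
import AVFoundation
import CoreVideo
import Metal

// Sketch of the writer side (steps 2–5 above). `encodeFilters` and `publish`
// are assumed stand-ins for app-specific logic, not real API.
func handleCapturedFrame(pixelBuffer: CVPixelBuffer,
                         textureCache: CVMetalTextureCache,
                         queue1: MTLCommandQueue,
                         textureB: MTLTexture,
                         adaptor: AVAssetWriterInputPixelBufferAdaptor,
                         time: CMTime,
                         encodeFilters: (MTLCommandBuffer, MTLTexture) -> Void,
                         publish: (MTLTexture) -> Void) {
    // Step 2: wrap the pixel buffer in a Metal texture via the texture cache.
    var cvTexture: CVMetalTexture?
    CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, textureCache, pixelBuffer, nil, .bgra8Unorm,
        CVPixelBufferGetWidth(pixelBuffer), CVPixelBufferGetHeight(pixelBuffer),
        0, &cvTexture)
    guard let cvTexture, let textureA = CVMetalTextureGetTexture(cvTexture),
          let commandBuffer = queue1.makeCommandBuffer() else { return }

    // Step 3: in-place filter passes on textureA (encoded by the caller's closure).
    encodeFilters(commandBuffer, textureA)

    // Step 4: copy textureA into the longer-lived textureB, then publish it.
    if let blit = commandBuffer.makeBlitCommandEncoder() {
        blit.copy(from: textureA, to: textureB)
        blit.endEncoding()
    }
    commandBuffer.commit()
    publish(textureB)

    // Step 5: append the (GPU-modified, possibly still in-flight) pixel buffer.
    adaptor.append(pixelBuffer, withPresentationTime: time)
}
```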
**Reader side, driven by `MTKViewDelegate.draw(in:)`**

1. Check whether an updated `textureB` is available, and if so, use it in the following steps. Also mark that this data has been handled so it isn't rendered again on the next `draw` invocation if no new data has been provided by the writer in the interim. If no new data is available, do nothing.
2. Using a command buffer created from a different command queue #2, encode a few more filters that modify this texture.
3. Using the same command buffer, present the results to the view's drawable, if one is available.
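The reader side, likewise, might look roughly like this sketch; `takeLatestTexture` and `encodeReaderFilters` are hypothetical stand-ins for the shared slot and the reader's own filter passes, not real API.

```swift
import MetalKit

// Sketch of the reader side (steps 1–3 above). The two closures are assumed
// stand-ins for the shared-slot check and the reader's filter encoding.
func drawLatestFrame(in view: MTKView,
                     queue2: MTLCommandQueue,
                     takeLatestTexture: () -> MTLTexture?,
                     encodeReaderFilters: (MTLCommandBuffer, MTLTexture) -> Void) {
    // Step 1: only proceed if the writer has published a fresh textureB.
    guard let texture = takeLatestTexture(),
          let commandBuffer = queue2.makeCommandBuffer() else { return }

    // Step 2: reader-side filters, encoded on command queue #2.
    encodeReaderFilters(commandBuffer, texture)

    // Step 3: present to the view's drawable, if one is available.
    if let drawable = view.currentDrawable {
        // (a final pass drawing `texture` into drawable.texture would go here)
        commandBuffer.present(drawable)
    }
    commandBuffer.commit()
}
```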
This structure was chosen to account for the fact that the `MTKView`'s frame rate may be either slower or faster than the rate at which pixel buffers are delivered to and handled by the writer. It also aims to keep the "unimportant" reader steps out of the way of the much more important video-processing steps performed by the writer.
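The "shared location plus new-data marker" can be thought of as a single-slot mailbox. The following is my own generic sketch of that idea, kept independent of Metal (in the pipeline, `T` would be an `MTLTexture`):

```swift
import Foundation

// A minimal single-slot "latest value" mailbox, like the shared location the
// writer publishes textureB into. Generic sketch, not tied to Metal.
final class LatestSlot<T> {
    private let lock = NSLock()
    private var value: T?
    private var fresh = false

    // Writer side: overwrite the slot and mark it fresh.
    func publish(_ newValue: T) {
        lock.lock(); defer { lock.unlock() }
        value = newValue
        fresh = true
    }

    // Reader side: return the value only if it has not been consumed yet,
    // so the same frame is not rendered twice.
    func takeIfFresh() -> T? {
        lock.lock(); defer { lock.unlock() }
        guard fresh else { return nil }
        fresh = false
        return value
    }
}
```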
My questions are as follows:

1. Should `textureB` be created anew with each repetition of the process? That certainly works, but it seems wasteful. Maybe it should be triple-buffered? At the other extreme, what would be the implications of always using the same texture for `textureB`? I can only assume that would be bad, but would it just lead to unnecessary stalls, with the writer's GPU commands unable to proceed while the reader is accessing the shared texture, or would there be correctness issues as well? This leads to the next question:
2. Will Metal's built-in hazard tracking still ensure that reading and modifying `textureB` from the reader (command queue #2) only happens when modification by the writer (command queue #1) is not in progress? (Conceptually similar to this unanswered question.) If not, is there a better approach to achieve a similar result? Am I better off just using one command queue for both the reader and the writer?
3. The single reused auxiliary texture in step (3) of the writer side must mean that each new frame has to wait for the previous frame's commands to finish, correct? Everything currently happens fast enough that this shouldn't matter, but in general, is it good practice to triple-buffer these helper textures?
4. Are there any other fundamental misunderstandings or issues evident in this setup?
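To make the triple-buffering contemplated in question 1 concrete, here is my own generic sketch of the arrangement I have in mind: a ring of three slots guarded by a semaphore, so a buffer is never reused while it may still be in flight. In the real pipeline the slots would be three pre-allocated `MTLTexture`s and `release()` would be called from a command buffer completion handler.

```swift
import Foundation

// Sketch of a triple-buffer ring. Generic here; in the pipeline the slots
// would be MTLTextures for textureB (or the auxiliary texture in question 3).
final class TripleBufferRing<T> {
    private let slots: [T]
    private var index = 0
    private let available = DispatchSemaphore(value: 3)

    init(slots: [T]) {
        precondition(slots.count == 3, "triple buffering needs exactly 3 slots")
        self.slots = slots
    }

    // Block until a slot is safe to reuse, then hand it out round-robin.
    func acquire() -> T {
        available.wait()
        let slot = slots[index]
        index = (index + 1) % slots.count
        return slot
    }

    // Called when the GPU (or the reader) is done with a previously
    // acquired slot, making it reusable.
    func release() {
        available.signal()
    }
}
```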
Bonus question: is `AVAssetWriter` (or `AVAssetWriterInputPixelBufferAdaptor`) doing something internally to wait for the GPU to finish modifying any textures backed by the pixel buffer I'm feeding it? It seems as though it might be, unless I am missing some side effect of another part of my render pipeline. Even if I forgo a `waitUntilCompleted` or `addCompletedHandler` before step (5) in the writer, I don't get unmodified (or partially modified) pixel buffers written to the output video file, as one might expect. To verify, I added a hugely expensive step to one of the shaders that modifies each incoming pixel buffer, and set up log entries to record when each pixel buffer is appended and when the command buffer modifying it actually completes. The log looks something like this, showing that received pixel buffers are appended to the `AVAssetWriter` well before their respective GPU commands finish, yet the output file has all the rendering effects:
```
Appending <CVPixelBuffer 0x282b6acb0> // <--- Happens almost immediately after pixel buffer is received
Finished rendering <CVPixelBuffer [some other address]>
Finished rendering <CVPixelBuffer [some other address]>
Appending <CVPixelBuffer [some other address]>
Finished rendering <CVPixelBuffer [some other address]>
Appending <CVPixelBuffer [some other address]>
Finished rendering <CVPixelBuffer [some other address]>
Finished rendering <CVPixelBuffer 0x282b6acb0> // <--- Happens almost a half second later, once the extremely slow shader finishes executing, yet the output file contains the results of this rendering, not the unmodified pixel buffer!
```
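For reference, the instrumentation producing a log like the one above is roughly the following sketch; the function name and parameters are just illustrative packaging around the two `print` calls.

```swift
import AVFoundation
import Metal

// Sketch: log GPU completion time vs. append time for one frame.
func finishFrame(commandBuffer: MTLCommandBuffer,
                 pixelBuffer: CVPixelBuffer,
                 adaptor: AVAssetWriterInputPixelBufferAdaptor,
                 presentationTime: CMTime) {
    // Fires only once the GPU has actually finished all encoded work.
    commandBuffer.addCompletedHandler { _ in
        print("Finished rendering \(pixelBuffer)")
    }
    commandBuffer.commit()

    // Logged and appended immediately, without waiting on the GPU.
    print("Appending \(pixelBuffer)")
    adaptor.append(pixelBuffer, withPresentationTime: presentationTime)
}
```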
Thanks in advance to anyone who reads and helps with any part of this. Happy to provide clarification on any points.