I have a few questions about Metal rendering, with regard to both correctness and efficiency. The following is an overview of the processing pipeline in question:
Writer side, driven by `AVCaptureOutput`:

1. A `CVPixelBuffer` is acquired from `captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection)`.
2. An `MTLTexture`, `textureA`, backed by this pixel buffer is created using `CVMetalTextureCacheCreateTextureFromImage`.
3. A command buffer is created from command queue #1. A number of "filter" shaders are encoded into this command buffer, each of which reads from `textureA`, performs some calculations, and writes back to `textureA`, in series. One of these filters relies on an auxiliary `MTLTexture` to store intermediate results; this texture is created once at startup and reused.
4. The same command buffer is also used to write the contents of `textureA` into another `MTLTexture`, `textureB`. This texture is not backed by a pixel buffer, so it can be kept around as long as necessary without holding onto the provided pixel buffer (which would cause `AVCaptureOutput` to drop frames). A reference to this texture is stored in a shared location, and a marker is set to signify that new data is available to the "reader" (described below).
5. The original `CVPixelBuffer` (modified by way of `textureA`) is passed to `AVAssetWriterInputPixelBufferAdaptor.append()` for writing to a video file.
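A condensed sketch of the writer path, assuming a delegate class that already owns `textureCache`, `commandQueue1`, `textureB`, `pixelBufferAdaptor`, and the filter-encoding helpers (all of these names are placeholders, and error handling is elided):

```swift
import AVFoundation
import Metal

func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    // (1) Acquire the pixel buffer from the sample buffer.
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    // (2) Wrap it in a Metal texture via the texture cache.
    var cvTexture: CVMetalTexture?
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)
    _ = CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, textureCache, pixelBuffer, nil,
        .bgra8Unorm, width, height, 0, &cvTexture)
    guard let cvTexture, let textureA = CVMetalTextureGetTexture(cvTexture) else { return }

    // (3) Encode the in-place filter passes on queue #1.
    guard let commandBuffer = commandQueue1.makeCommandBuffer() else { return }
    encodeFilters(into: commandBuffer, texture: textureA) // placeholder

    // (4) Copy textureA into the pixel-buffer-independent textureB,
    //     then publish it for the reader.
    if let blit = commandBuffer.makeBlitCommandEncoder() {
        blit.copy(from: textureA, to: textureB)
        blit.endEncoding()
    }
    commandBuffer.commit()
    sharedState.publish(textureB) // placeholder

    // (5) Hand the (GPU-modified) pixel buffer to the asset writer.
    _ = pixelBufferAdaptor.append(
        pixelBuffer,
        withPresentationTime: CMSampleBufferGetPresentationTimeStamp(sampleBuffer))
}
```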
Reader side, driven by `MTKViewDelegate.draw(in:)`:

1. Check whether an updated `textureB` is available, and if so, use it in the following steps. Also mark that this data has been handled so it isn't rendered again in the next `draw` invocation if no new data has been provided by the writer in the interim. Otherwise, do nothing.
2. Using a command buffer created from a different command queue, #2, encode a few more filters that modify this texture.
3. Using the same command buffer, present the results to the view's drawable, if one is available.
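The reader side, sketched under the same assumptions (`sharedState`, `commandQueue2`, and the drawing helpers are placeholders):

```swift
import MetalKit

func draw(in view: MTKView) {
    // (1) Only proceed if the writer has published a new textureB;
    //     takeLatestTexture() also clears the "new data" marker.
    guard let texture = sharedState.takeLatestTexture() else { return } // placeholder

    guard let drawable = view.currentDrawable,
          let commandBuffer = commandQueue2.makeCommandBuffer() else { return }

    // (2) Reader-only filters, encoded on queue #2.
    encodeReaderFilters(into: commandBuffer, texture: texture) // placeholder

    // (3) Draw the result into the view's drawable and present it.
    if let descriptor = view.currentRenderPassDescriptor,
       let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor) {
        drawTexturedQuad(encoder, source: texture) // placeholder
        encoder.endEncoding()
    }
    commandBuffer.present(drawable)
    commandBuffer.commit()
}
```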
This structure was chosen to account for the fact that the `MTKView`'s frame rate may be either slower or faster than the rate at which pixel buffers are delivered to and handled by the writer. It also seeks to keep the "unimportant" reader steps away from the much more important video processing performed by the writer.
My questions are as follows:
- Should `textureB` be created anew with each repetition of the process? That certainly works, but it seems wasteful. Maybe it should be triple-buffered? At the other extreme, what would be the implications of always using the same texture for `textureB`? I can only assume that would be bad, but would it just lead to unnecessary stalls, in which the writer's GPU commands cannot proceed while the reader is accessing the shared texture, or would there also be correctness issues? This leads to the next question:
- Will Metal's built-in hazard tracking still ensure that reading and modifying `textureB` from the reader (command queue #2) only happens while modification by the writer (command queue #1) is not in progress? (Conceptually similar to this unanswered question.) If not, is there a better approach that achieves a similar result? Am I better off just using one command queue for both the reader and the writer?
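For context, the triple-buffered variant I have in mind would look something like this: a small ring of `textureB` instances so the writer can move on to the next slot while the reader may still be using the previous one (a hypothetical sketch; `TextureRing` is not an existing API):

```swift
import Metal

// Hypothetical ring of textureB instances, cycled by the writer.
final class TextureRing {
    private var textures: [MTLTexture] = []
    private var index = 0

    init(device: MTLDevice, descriptor: MTLTextureDescriptor, count: Int = 3) {
        for _ in 0..<count {
            textures.append(device.makeTexture(descriptor: descriptor)!)
        }
    }

    /// Called by the writer once per frame: returns the next destination texture.
    func nextWriteTarget() -> MTLTexture {
        index = (index + 1) % textures.count
        return textures[index]
    }
}
```

The writer would blit into `ring.nextWriteTarget()` each frame and publish that reference to the shared location, instead of reusing a single `textureB`.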
- The single reused auxiliary texture in step (3) of the writer side means that each new frame must wait for the previous frame's commands to finish, correct? Everything currently happens fast enough that this shouldn't matter, but in general, is it good practice to triple-buffer such helper textures?
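By triple-buffering here I mean the standard in-flight-frames pattern: a small ring of helper textures guarded by a semaphore (a sketch, assuming `auxTextures` is filled with `maxFramesInFlight` textures at startup):

```swift
import Metal
import Dispatch

let maxFramesInFlight = 3
let frameSemaphore = DispatchSemaphore(value: maxFramesInFlight)
var auxTextures: [MTLTexture] = [] // assume populated at startup
var frameIndex = 0

func renderFrame(commandQueue: MTLCommandQueue) {
    // Block if all in-flight frames are still executing on the GPU.
    frameSemaphore.wait()
    let aux = auxTextures[frameIndex]
    frameIndex = (frameIndex + 1) % maxFramesInFlight

    guard let commandBuffer = commandQueue.makeCommandBuffer() else {
        frameSemaphore.signal()
        return
    }
    // ... encode filters that use `aux` for intermediate results ...
    commandBuffer.addCompletedHandler { _ in
        // The GPU is done with this frame's aux texture; free up one slot.
        frameSemaphore.signal()
    }
    commandBuffer.commit()
}
```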
- Are there any other fundamental misunderstandings or issues evident in this setup?
Bonus question: is `AVAssetWriter` (or `AVAssetWriterInputPixelBufferAdaptor`) doing something internally to wait for the GPU to finish modifying any textures backed by the pixel buffer I'm feeding it? It seems as though it might be, unless I am missing some side effect of another part of my render pipeline. Even if I forgo a `waitUntilCompleted` or `addCompletedHandler` before step (5) in the writer, I don't get unmodified (or partially modified) pixel buffers written to the output video file, as one might expect.

To verify, I added a hugely expensive step to one of the shaders that modifies each incoming pixel buffer, and set up log entries to record when each pixel buffer is appended and when the command buffer modifying that pixel buffer actually completes. The log looks something like the following, showing that received pixel buffers are appended to the `AVAssetWriter` well before their respective GPU commands finish, yet the output file has all the rendering effects:
```
Appending <CVPixelBuffer 0x282b6acb0> // <--- Happens almost immediately after pixel buffer is received
Finished rendering <CVPixelBuffer [some other address]>
Finished rendering <CVPixelBuffer [some other address]>
Appending <CVPixelBuffer [some other address]>
Finished rendering <CVPixelBuffer [some other address]>
Appending <CVPixelBuffer [some other address]>
Finished rendering <CVPixelBuffer [some other address]>
Finished rendering <CVPixelBuffer 0x282b6acb0> // <--- Happens almost a half second later, once the extremely slow shader finishes executing, yet the output file contains the results of this rendering, not the unmodified pixel buffer!
```
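The instrumentation producing that log amounts to the following, where `commandBuffer`, `pixelBuffer`, `adaptor`, and `presentationTime` are the objects already described in the writer steps (a sketch, not verbatim code):

```swift
// Log when the GPU actually finishes this frame's filter work.
commandBuffer.addCompletedHandler { _ in
    print("Finished rendering \(pixelBuffer)")
}
commandBuffer.commit()

// Note: no waitUntilCompleted() here; the append happens right away,
// typically before the completed handler above fires.
print("Appending \(pixelBuffer)")
_ = adaptor.append(pixelBuffer, withPresentationTime: presentationTime)
```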
Thanks in advance to anyone who reads and helps with any part of this. Happy to provide clarification on any points.




