Playing AAC stream in AVAudioEngine


The challenge is scheduling an AAC buffer in an AVAudioPlayerNode, which only consumes uncompressed linear PCM buffers.

The function playAudio takes a Data object containing a chunk of the AAC stream being sent to the client, normally a few kilobytes in size. The process goes like this:

  1. Write the AAC chunk to an AVAudioCompressedBuffer.
  2. Create a PCM buffer and convert the AVAudioCompressedBuffer into it.
  3. Schedule the PCM buffer on the player node.
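
Step 2 needs a PCM buffer large enough to hold the decoded audio. The sizing arithmetic can be sketched roughly as follows, assuming 1024 frames per AAC packet and the 24 kHz source and 48 kHz output rates that appear in the code in this post. `estimatedOutputFrames` is a hypothetical helper, not an AVFoundation API, and the real converter may return slightly fewer frames per call because of decoder priming and resampler latency.

```swift
import Foundation

/// Hypothetical helper: estimates how many PCM frames a decoder may produce
/// for a given number of compressed packets after sample rate conversion.
func estimatedOutputFrames(packetCount: Int,
                           framesPerPacket: Int,
                           sourceSampleRate: Double,
                           outputSampleRate: Double) -> Int {
    // Frames carried by the packets at the source sample rate.
    let sourceFrames = Double(packetCount * framesPerPacket)
    // Scale by the rate ratio and round up so the buffer is never too small.
    return Int((sourceFrames * outputSampleRate / sourceSampleRate).rounded(.up))
}

// One 1024-frame packet at 24 kHz, rendered at 48 kHz.
let frames = estimatedOutputFrames(packetCount: 1,
                                   framesPerPacket: 1024,
                                   sourceSampleRate: 24_000,
                                   outputSampleRate: 48_000)
print(frames) // prints 2048
```

Under these assumptions a frameCapacity of 1024 would hold only half of one decoded packet at the output rate.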

This is my attempt. However, it does not work: no audio comes out of the speakers at all. I added debugging statements throughout the functions to figure out what is going on.

    func preparePlayer() {
        guard let engine = audioEngine else {
            print("Audio engine is not initialized")
            return
        }

        // Initialize a player node and attach it to the engine
        playerNode = AVAudioPlayerNode()

        engine.attach(playerNode!)
        print("Output Node Format: \(engine.outputNode.outputFormat(forBus: 0))")

        pcmFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 48000, channels: 1, interleaved: false)
        
        engine.connect(playerNode!, to: engine.outputNode, format: pcmFormat)

        var asbd = AudioStreamBasicDescription()
        asbd.mSampleRate = 24000  // 24 kHz
        asbd.mFormatID = kAudioFormatMPEG4AAC // AAC
        asbd.mChannelsPerFrame = 1  // Mono
        asbd.mBytesPerPacket = 0  // Varies (compressed format)
        asbd.mFramesPerPacket = 1024 
        asbd.mBytesPerFrame = 0  // Varies (compressed format)
        asbd.mBitsPerChannel = 0  // Varies (compressed format)

        // Create an AVAudioFormat with the AudioStreamBasicDescription
        sourceFormat = AVAudioFormat(streamDescription: &asbd)
        
        // Initialize the audio converter with the source (AAC) and destination (PCM) formats
        audioConverter = AVAudioConverter(from: sourceFormat!, to: pcmFormat!)
    }
    
    func playAudio(audioData: Data) {
        guard let playerNode = self.playerNode else {
            print("Player node is not initialized")
            return
        }
        guard let engine = audioEngine else {
            print("Audio engine is not initialized")
            return
        }
        
        let compressedBuffer = AVAudioCompressedBuffer(format: sourceFormat!, packetCapacity: 1024, maximumPacketSize: audioConverter!.maximumOutputPacketSize)
        compressedBuffer.byteLength = AVAudioPacketCount(audioData.count)
        print("audioData contains \(audioData.count) counts")
        let middleIndex = audioData.count / 2
        let middleRangeAudioData = middleIndex..<(middleIndex + 10)
        print("Middle bytes of audioData: \(Array(audioData[middleRangeAudioData]))")

        
        audioData.withUnsafeBytes {
            compressedBuffer.data.copyMemory(from: $0.baseAddress!, byteCount: audioData.count)
        }
        print("compressedBuffer contains \(compressedBuffer.packetCount) packet counts")
        print("compressedBuffer contains \(compressedBuffer.byteCapacity) byte capacity")
        print("compressedBuffer contains \(compressedBuffer.byteLength) valid bytes")
        print("compressedBuffer contains \(compressedBuffer.packetCapacity) packet capacity")
        let bufferPointer = compressedBuffer.data.bindMemory(to: UInt8.self, capacity: audioData.count)
        let bufferBytes = Array(UnsafeBufferPointer(start: bufferPointer, count: audioData.count))
        let middleRangeCompressedBuffer = middleIndex..<(middleIndex + 10)
        print("Middle bytes of compressedBuffer: \(Array(bufferBytes[middleRangeCompressedBuffer]))")


        // Create a PCM buffer
        let pcmBuffer = AVAudioPCMBuffer(pcmFormat: pcmFormat!, frameCapacity: 1024)

        let inputBlock: AVAudioConverterInputBlock = { inNumPackets, outStatus in
            outStatus.pointee = AVAudioConverterInputStatus.haveData
            return compressedBuffer
        }

        var error: NSError?
        let conversionResult = audioConverter!.convert(to: pcmBuffer!, error: &error, withInputFrom: inputBlock)
        if conversionResult == .error {
            print("Conversion failed with error: \(String(describing: error))")
        } else {
            print("Conversion successful")
            print("buffer contains \(pcmBuffer?.frameLength ?? 123456) frames")
        }
                
        if let frameLength = pcmBuffer?.frameLength, frameLength > 0 {
            let channelCount = pcmFormat?.channelCount ?? 0
            for channel in 0..<channelCount {
                if let channelData = pcmBuffer?.floatChannelData?[Int(channel)] {
                    let channelDataPointer = UnsafeBufferPointer(start: channelData, count: Int(frameLength))
                    let firstFewSamples = Array(channelDataPointer.prefix(10))
                    print("First few samples of pcmBuffer in channel \(channel): \(firstFewSamples)")
                }
            }
        }

        
        if !engine.isRunning {
            do {
                try engine.start()
            } catch {
                print("Error starting audio engine: \(error)")
            }
        }

        playerNode.scheduleBuffer(pcmBuffer!, completionHandler: nil)
    }

This is an example debugging log:

    audioData contains 1369 counts
    Middle bytes of audioData: [250, 206, 86, 76, 254, 10, 221, 187, 190, 243]
    compressedBuffer contains 0 packet counts
    compressedBuffer contains 4096 byte capacity
    compressedBuffer contains 1369 valid bytes
    compressedBuffer contains 1024 packet capacity
    Middle bytes of compressedBuffer: [250, 206, 86, 76, 254, 10, 221, 187, 190, 243]
    Conversion successful
    buffer contains 0 frames
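
Note that the "Conversion successful" line is printed for every status other than .error, so it does not show whether the converter actually had data to return. A small sketch for logging the exact status is shown here; `describe` is a hypothetical helper name, while the four cases are those of AVAudioConverterOutputStatus.

```swift
import AVFoundation

/// Hypothetical helper: returns a readable name for a converter output status.
func describe(_ status: AVAudioConverterOutputStatus) -> String {
    switch status {
    case .haveData:    return "haveData"     // output buffer holds converted audio
    case .inputRanDry: return "inputRanDry"  // not enough input to produce output
    case .endOfStream: return "endOfStream"  // input block signalled end of stream
    case .error:       return "error"        // see the NSError out-parameter
    @unknown default:  return "unknown"
    }
}
```

In playAudio this could be logged with print("Conversion status: \(describe(conversionResult))") right after the convert call.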

The converter reports no error, but the PCM buffer contains 0 frames. Working backwards, this is probably because the compressedBuffer reports 0 packets. The audio bytes do get written to the compressedBuffer, but that is where things stop making sense to me. I have almost always worked with PCM audio before; since it is linear, I can work out how many frames a buffer holds from the audio settings, no big deal. With compressed audio that is not possible, because the bitrate is generally variable. Maybe the issue is related to the 0 packet count, or maybe it is something else.
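
One reading of the log is that the code sets byteLength but never sets packetCount or fills packetDescriptions, so the converter sees a buffer with no packets in it. In AAC the number of frames per packet is fixed (1024 in the ASBD above); what varies is the byte size of each packet, which is what the packet descriptions record. The sketch below fills in that metadata under a strong assumption that may not hold for this stream: each chunk is exactly one raw AAC packet with no ADTS or other container header. `makeCompressedBuffer` is a hypothetical helper name.

```swift
import AVFoundation

/// Hypothetical helper: copies one raw AAC packet into a compressed buffer
/// and fills in the packet metadata that the converter reads.
/// Assumption: `chunk` is exactly one raw AAC packet with no ADTS header.
func makeCompressedBuffer(from chunk: Data, format: AVAudioFormat) -> AVAudioCompressedBuffer? {
    guard !chunk.isEmpty else { return nil }

    // Room for a single packet as large as this chunk.
    let buffer = AVAudioCompressedBuffer(format: format,
                                         packetCapacity: 1,
                                         maximumPacketSize: chunk.count)

    // Copy the compressed bytes into the buffer's storage.
    chunk.withUnsafeBytes { (bytes: UnsafeRawBufferPointer) -> Void in
        guard let base = bytes.baseAddress else { return }
        buffer.data.copyMemory(from: base, byteCount: chunk.count)
    }

    // Tell the buffer how many bytes and packets are valid.
    buffer.byteLength = UInt32(chunk.count)
    buffer.packetCount = 1

    // Describe where the packet starts and how many bytes it spans.
    // mVariableFramesInPacket is 0 because the frame count per packet is fixed.
    buffer.packetDescriptions?.pointee = AudioStreamPacketDescription(
        mStartOffset: 0,
        mVariableFramesInPacket: 0,
        mDataByteSize: UInt32(chunk.count))

    return buffer
}
```

If the stream is ADTS-framed, or a chunk carries several packets or a partial one, the packet boundaries would have to be found first and one description written per packet, so this sketch would not be enough on its own.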
