ios – Swift SIMD operands slower than a simple while loop, array times a scalar multiplication


I have a buffer of bytes, and I want to multiply each byte by another byte such as 0x20. One way is to simply iterate over the buffer and multiply each byte. This is obviously suboptimal, since SIMD should be able to do this much faster. But in my tests, using SIMD in Swift is much slower.

On a MacBook Pro M1 Max:
SIMD: 180ms for 100k iterations (operating on 64 bytes at a time)
Loop: 35ms for 6.4M iterations (operating on a single byte at a time)

Here is the code:

import Foundation

var iteration = 0 // outer loop counter (was used below without being declared)
Data(repeating: 0x20, count: 6_400_000).withUnsafeBytes { bufferPointer in
    // 100K iterations of the outer loop
    // Empty while loop takes about 2ms
    while(iteration < 6_400_000 / SIMD64<UInt8>.scalarCount) {
        let assumed = bufferPointer.assumingMemoryBound(to: SIMD64<UInt8>.self)
        let batch = assumed[0] // Will use the same batch all the time for testing purposes

        // This takes 180ms for 100k iterations (6_400_000 bytes / 64 bytes size of the simd)
        let spaceMask = batch &* 0x20
        /*
         Looking to do all these operations much faster, they are all slow
           let spaceMask = batch .== 0x20
           let result = batch &* 0x20
           let tabMask = batch .== 0x09
           let combinedMask = (spaceMask .| tabMask)._storage
       */
        
        // Using this loop, it takes 35ms total, running 6.4 million iterations in total
        var i = 0
        while(i < 64) {
            let batchNumber = batch[i] &* 0x20
            i += 1
        }

        iteration += 1

    }
}

I expected the SIMD version to be at least 10x faster than the while loop, but instead it is about 5 times slower.
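
For reference, here is a minimal sketch of the operation described above: multiplying every byte of a buffer by a scalar, 64 bytes per step, with a scalar loop for the leftover tail. The function name multiplyBytes is hypothetical and not part of the original code. The sketch assumes Swift 5.7 or later for loadUnaligned(fromByteOffset:as:), and it uses wrapping multiplication (&*) as in the question. It only illustrates the intended computation and says nothing about how fast it runs.

```swift
// Hypothetical helper, not from the original post: multiplies every byte of
// `bytes` by `factor` with wrapping arithmetic, 64 bytes per SIMD step.
// Assumes Swift 5.7+ (loadUnaligned and unaligned storeBytes).
func multiplyBytes(_ bytes: [UInt8], by factor: UInt8) -> [UInt8] {
    var output = [UInt8](repeating: 0, count: bytes.count)
    let width = SIMD64<UInt8>.scalarCount            // 64 lanes
    let factorVector = SIMD64<UInt8>(repeating: factor)
    // Largest multiple of 64 that fits in the input.
    let vectorEnd = bytes.count - bytes.count % width

    bytes.withUnsafeBytes { src in
        output.withUnsafeMutableBytes { dst in
            var offset = 0
            while offset < vectorEnd {
                // Array storage is not guaranteed to be 64-byte aligned,
                // so use the unaligned load instead of binding the memory.
                let batch = src.loadUnaligned(fromByteOffset: offset,
                                              as: SIMD64<UInt8>.self)
                dst.storeBytes(of: batch &* factorVector,
                               toByteOffset: offset,
                               as: SIMD64<UInt8>.self)
                offset += width
            }
            // Scalar tail for the bytes that do not fill a whole vector.
            for i in vectorEnd..<bytes.count {
                dst[i] = src[i] &* factor
            }
        }
    }
    return output
}

// Usage: 130 bytes = two full 64-byte vectors plus a 2-byte tail.
let result = multiplyBytes([UInt8](repeating: 0x20, count: 130), by: 3)
print(result.count, result[0], result[129]) // prints "130 96 96"
```

Note that with the values used in the question, 0x20 &* 0x20 wraps to 0 in a UInt8 (32 * 32 = 1024), which is why the usage line above multiplies by 3 instead.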
