10x faster implementation of swapEndiannessInplace that does not mutate the original buf array.
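The sketch below shows one way to do a byte swap that writes into a fresh buffer and leaves the input untouched. It assumes the function reverses the bytes inside fixed-size words. The name `swapEndianness` and the `wordSize` parameter are hypothetical and are not taken from the actual change. The sketch does not try to reproduce the change's implementation or its 10x speedup.

```typescript
// Hypothetical sketch (name and signature are assumptions): returns a copy of
// `buf` in which the byte order of every `wordSize`-byte word is reversed.
// The input array is only read, never written.
function swapEndianness(buf: Uint8Array, wordSize: number): Uint8Array {
  if (!Number.isInteger(wordSize) || wordSize <= 0) {
    throw new RangeError("wordSize must be a positive integer");
  }
  if (buf.length % wordSize !== 0) {
    throw new RangeError("buffer length must be a multiple of wordSize");
  }
  const out = new Uint8Array(buf.length); // fresh output buffer
  for (let word = 0; word < buf.length; word += wordSize) {
    // Copy this word's bytes into the output in reverse order.
    for (let i = 0; i < wordSize; i++) {
      out[word + i] = buf[word + wordSize - 1 - i];
    }
  }
  return out;
}

const input = Uint8Array.from([0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08]);
const swapped = swapEndianness(input, 4);
// swapped holds 4, 3, 2, 1, 8, 7, 6, 5; input still holds 1, 2, 3, 4, 5, 6, 7, 8
```

Allocating a separate output array is what keeps the caller's `buf` intact. The trade-off is one extra allocation for each call.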