Apple Silicon Docker amd64/x86 emulation via Rosetta is fast(er)

Photo by Ian Taylor / Unsplash

Summary: The Phoronix 7zip compression test runs around 2-4 times faster under Rosetta than under QEMU.

Recently, Docker released a beta feature for Docker Desktop that allows x86/AMD64 images to be run via Rosetta rather than emulated with QEMU. QEMU gets the job done, but the performance overhead of emulating an AMD64 container on ARM64 is costly. From personal experience, I've seen Docker builds take around 5-10 times longer when they target a different architecture than the host. This problem is much more apparent on Apple Silicon Macs, which use ARM64 rather than the more common x86/AMD64. Apple's solution is Rosetta, a dynamic binary translator that converts Intel/x86 instructions to Apple Silicon/ARM64 instructions on the fly with a small performance hit. I've heard that translated code runs at around 80% of the speed of native Apple Silicon code.

After learning about the new option, I decided to test the performance difference between QEMU and Rosetta emulation in Docker. I used the Phoronix Test Suite, a collection of tools and benchmarks for profiling computer performance. I made a simple Dockerfile and script to run Phoronix, which you can check out in the repository linked below. I chose the Prime Sieve test since it runs relatively quickly and is CPU-bound. Between the two tests, I switched Docker's beta emulation setting from QEMU (the default) to Rosetta.
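For a quick check outside of Phoronix, the same comparison can be approximated by timing an amd64 container from the host before and after switching the emulation setting. The following is only a minimal sketch, not the script from my repository: it assumes the `docker` CLI is on your PATH, and the function names are made up for illustration. Note that it measures wall-clock time including container startup, whereas Phoronix reports the time of the test itself.

```python
import statistics
import subprocess
import time


def docker_amd64_command(image, args):
    """Build a `docker run` command that forces the linux/amd64 platform."""
    return ["docker", "run", "--rm", "--platform", "linux/amd64", image, *args]


def time_runs(command, repeats=3):
    """Run `command` `repeats` times; return wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command exits with a non-zero status.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, sample standard deviation); needs at least two runs."""
    return statistics.mean(durations), statistics.stdev(durations)
```

To use it, build a command with `docker_amd64_command("your-image", ["your", "benchmark"])` (the image and arguments are placeholders), pass it to `time_runs`, and compare the output of `summarize` with the setting on QEMU and then on Rosetta.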

The QEMU container performed far worse than the Rosetta container. Results are reported as the time taken to execute the test, in seconds. The Docker virtualization engine was given all 8 cores, 4 GB of memory, and 1 GB of swap. The native run was not resource-limited and also ran across all 8 cores. Summary results for the tests are below:

Apple Silicon ran the test in around 26 seconds on average. Rosetta took on average 32 seconds. QEMU took around 254 seconds.

Here is a plot showing how the two emulation methods compare to native execution, relative to the native run time.

Prime sieve runtime compared to running the test natively. Rosetta took roughly 20% longer than native, and QEMU took roughly 9.6 times as long as native.

And here are the actual averages and standard deviations of the test runs, in seconds:

  • Apple M1: avg 26.476, stdev 0.642
  • Rosetta 2: avg 32.276, stdev 0.152
  • QEMU: avg 253.651, stdev 0.364
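To make the comparison concrete, here is a small sketch that turns the averages above into ratios against the native run. The variable and function names are my own, and the 80% figure is the hearsay number mentioned earlier, not something I measured.

```python
# Average runtimes in seconds, copied from the list above.
averages = {"Apple M1": 26.476, "Rosetta 2": 32.276, "QEMU": 253.651}


def relative_to_native(averages, native="Apple M1"):
    """Express each average runtime as a multiple of the native runtime."""
    base = averages[native]
    return {name: seconds / base for name, seconds in averages.items()}


ratios = relative_to_native(averages)
for name, ratio in ratios.items():
    extra_time = (ratio - 1) * 100
    print(f"{name}: {ratio:.2f}x native runtime ({extra_time:+.0f}% time)")

# If translated code ran at 80% of native speed, the runtime would be
# 1 / 0.80 = 1.25x native, which is close to the Rosetta ratio here.
expected_rosetta_ratio = 1 / 0.80
print(f"Expected at 80% speed: {expected_rosetta_ratio:.2f}x")
```

This works out to about 1.22x for Rosetta and about 9.58x for QEMU, so Rosetta lands close to the 80% figure while QEMU takes nearly ten times as long as native.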

Run the tests for yourself:

GitHub - patthomasrick/Docker-Phoronix: Run Phoronix in Docker

Below are additional tests comparing 7zip compression and decompression speeds between Rosetta and QEMU (no native baseline).

Phoronix - compress-7zip output, MacOS Docker backed by QEMU - phoronix-qemu.txt
Phoronix - compress-7zip output, MacOS Docker backed by Rosetta - phoronix-rosetta.txt