Why don’t podcasts use VBR MP3s? Because iOS and macOS don’t accurately seek them
Filed as Apple bug (Radar) 27848317. The problem, in short:
AVFoundation, the low-level audio/video framework in iOS and macOS, does not accurately seek within VBR MP3s, making VBR impractical to use for long files such as podcasts. Jumping to a timestamp in an hour-long VBR podcast can result in an error of over a minute, without the listener even knowing because the displayed timecode shows the expected time.
Why VBR?
VBR encoding is far more space-efficient and better-sounding than constant-bitrate (CBR) encoding. The difference is especially pronounced in podcasts, where VBR makes most podcasts 20–50% smaller AND better-sounding than the 64 kbps CBR encoding that most podcasters are forced to use today.
VBR could save podcast listeners massive amounts of data transfer over time. (And therefore money, and battery life, and precious storage space on phones.)
Without accurate seeking, streaming and web audio players don’t work properly, including share-at-timestamp links that are becoming key drivers of the sharing and spreading of podcasts.
Why can’t podcasters use it?
I explained how MP3s work, and why this is a problem, on Accidental Tech Podcast last week — see that? That’s a share-at-timestamp link, and if that file was VBR, it wouldn’t seek to the correct time.
See for yourself: here’s that same podcast in VBR. Note that the file is 25% smaller and the theme song (at 1:22:47 in the original file) sounds way nicer in the VBR version. But if you seek to the same timestamp as the above share link — 1:24:30 — you’ll hear the wrong audio. The player will say 1:24:30, but you’re actually hearing the audio at 1:25:16.
That’s 46 seconds off, and that’s enough to break timestamp sharing, and that’s enough to ensure that nobody ever uses VBR files, and podcasts keep transferring more bytes to sound worse for the foreseeable future.
We fixed this in the same year the Backstreet Boys released “I Want It That Way”
Three simple solutions for accurate VBR stream-seeking have existed for almost twenty years. Each embeds a seek-offset table at the start of a VBR MP3 for precise seeking:
- “MLLT” ID3 tag, circa 1999 (preferred, most precise)
- Fraunhofer VBRI frame, circa 2003 (moderately precise)
- Xing/LAME frame (too imprecise for long files)
But AVFoundation supports none of them. VBRI and legacy Xing frames are read, but only the duration is used from each, not the seek table. MLLT tags are seemingly ignored.
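To make the seek-table idea concrete, here's a rough sketch in Python. It is not the actual MLLT, VBRI, or Xing byte layout: the table format, the function names `build_seek_table` and `seek`, and the bitrate profile are all hypothetical, chosen only to show how a list of (time, byte offset) pairs lets a player jump to the right byte in a file whose bitrate changes.

```python
from bisect import bisect_right

def build_seek_table(bytes_per_second, step=10):
    """Record a (time, byte offset) pair every `step` seconds.

    An encoder knows these offsets as it writes the file; this is a
    simplified stand-in for a real embedded seek table.
    """
    table, offset = [(0, 0)], 0
    for second, size in enumerate(bytes_per_second, start=1):
        offset += size
        if second % step == 0:
            table.append((second, offset))
    return table

def seek(table, timestamp):
    """Byte offset for `timestamp` (whole seconds), interpolating between entries."""
    timestamp = max(0, timestamp)
    times = [t for t, _ in table]
    i = bisect_right(times, timestamp) - 1   # last entry at or before timestamp
    t0, b0 = table[i]
    if i + 1 == len(table):                  # at or past the final entry
        return b0
    t1, b1 = table[i + 1]
    return b0 + (timestamp - t0) * (b1 - b0) // (t1 - t0)

# Hypothetical hour-long episode: 57 minutes of speech at 56 kbps
# (7000 bytes/s), then 3 minutes of music at 96 kbps (12000 bytes/s).
profile = [7000] * 3420 + [12000] * 180
table = build_seek_table(profile)

speech_seek = seek(table, 1800)  # 30:00, inside the speech
music_seek = seek(table, 3425)   # 57:05, inside the music
```

In this made-up file both lookups land on the exact byte, because the table records where the audio actually is instead of guessing from an average. In general the accuracy is limited by how far apart the table entries are, which is why a coarse table gets less useful as files get longer.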
It appears that AVFoundation simply estimates byte offsets with the simple ratio of (timestamp / duration) × totalBytes, but that assumes the bitrate is constant across the whole file, which is an unsafe assumption for VBR encoding. (ABR maintains an average bitrate over the whole file, but doesn’t improve the size-to-quality ratio enough over CBR for most podcasts.)
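Here's a small Python sketch of why that ratio goes wrong. It is only an illustration of the estimate as I've described it, not AVFoundation's actual code; the function names and the bitrate profile (mostly 56 kbps speech with 96 kbps music at the end) are made up for the example.

```python
from bisect import bisect_right
from itertools import accumulate

def linear_offset(timestamp, duration, total_bytes):
    """Byte offset guessed by the ratio (timestamp / duration) * totalBytes."""
    return int(timestamp / duration * total_bytes)

def time_at_offset(offset, bytes_per_second):
    """Playback time in seconds that a byte offset really corresponds to."""
    cumulative = list(accumulate(bytes_per_second))  # bytes used by the end of each second
    second = bisect_right(cumulative, offset)        # whole seconds that end at or before offset
    if second >= len(bytes_per_second):
        return float(len(bytes_per_second))
    start = cumulative[second - 1] if second else 0
    return second + (offset - start) / bytes_per_second[second]

# Hypothetical hour-long episode: 57 minutes of speech at 56 kbps
# (7000 bytes/s), then 3 minutes of music at 96 kbps (12000 bytes/s).
profile = [7000] * 3420 + [12000] * 180
total = sum(profile)

target = 1800                                        # listener seeks to 30:00
guess = linear_offset(target, len(profile), total)   # byte the ratio picks
actual = time_at_offset(guess, profile)              # where that byte really is
error = actual - target                              # roughly 64 seconds late
```

The louder, higher-bitrate music at the end raises the file's average bitrate, so the ratio overshoots everywhere in the lower-bitrate speech. The player still displays 30:00 while playing audio from about 31:04.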
Supporting either MLLT or VBRI at the AVFoundation level (therefore affecting Safari, HTML5 <audio>, Apple’s Podcasts app, and more) would instantly make VBR podcasts practical, allowing much smaller files and better sound without sacrificing shareability and stream-seeking.
I’ll be adding MLLT support to Overcast, but without a way to embed podcasts in the web player to preserve share-at-timestamp links, VBR files will continue to be practically unusable for podcasters.
Know anyone in engineering at Apple? I’d appreciate any attention you can draw to this issue, which I’ve filed as bug 27848317.