ffmpeg, this underrated tool
April 29, 2021
A few days ago, I needed to convert a recording from .mkv to .mp4. A Google search turns up a bunch of free online conversion tools, but uploading an hour-long video just to change its format, and then downloading it back, seems like overkill.
Then I remembered a CLI I had heard about a while ago: ffmpeg. As per the official documentation:
FFmpeg is the leading multimedia framework, able to decode, encode, transcode, mux, demux, stream, filter and play pretty much anything that humans and machines have created. It supports the most obscure ancient formats up to the cutting edge. No matter if they were designed by some standards committee, the community or a corporation.
Lots of quick edits and format conversions can be achieved through ffmpeg, limiting the need for advanced tools like Adobe Premiere Pro or Media Encoder. You can change the format, trim a video, add subtitles, adjust audio, convert a video to a gif 🙂 (or even a gif to a video 🙃). All of this from the command line.
The complete documentation can be overwhelming, but many GitHub gists list the most commonly used commands in an easy-to-read fashion; see this ffmpeg cheat sheet.
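To give a flavor of what such cheat sheets contain, here is a minimal sketch of two common edits wrapped in shell functions. The function names (trim_clip, to_gif) and all file names are placeholders of mine, not part of ffmpeg; the flags themselves (-ss, -t, -c copy, -vf) are standard ffmpeg options.

```shell
# Sketch only: function names and file names are placeholders.

# Cut a clip of DURATION seconds starting at START (e.g. 00:01:00),
# copying the streams instead of re-encoding them.
# Usage: trim_clip INPUT START DURATION OUTPUT
trim_clip() {
  ffmpeg -ss "$2" -i "$1" -t "$3" -c copy "$4"
}

# Convert a video to a GIF at 10 frames per second, scaled to 480 px wide
# (a height of -1 keeps the aspect ratio).
# Usage: to_gif INPUT OUTPUT
to_gif() {
  ffmpeg -i "$1" -vf "fps=10,scale=480:-1" "$2"
}
```

For instance, trim_clip input.mp4 00:01:00 30 clip.mp4 would keep 30 seconds starting at the one-minute mark. Because the streams are copied rather than re-encoded, the cut snaps to the nearest keyframe, so the start may be slightly off.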
In my case, it turned out I only needed to run:
ffmpeg -i input.mkv output.mp4
Even if a cheat sheet solves the task at hand, it's worth investing some time in understanding the concepts at play when dealing with audio/video formats.
Below are some concepts that helped me decipher the ffmpeg doc.
Container
A container format is a file format that enables multiple data streams to be embedded into a single file. A codec is required to decode the content of the container (several codecs, actually, since audio and video are each encoded with their own codec).
For example, mp4 and mkv are both container formats.
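To see what a container actually holds, ffprobe (a tool that ships with ffmpeg) can list its streams. The sketch below wraps that call in a function; the name list_streams is my own placeholder, and the file you pass in is up to you.

```shell
# Print one line per stream found in the container, showing the stream
# index, the codec name and the stream type (video, audio, subtitle...).
# Usage: list_streams FILE   (FILE is a placeholder for any media file)
list_streams() {
  ffprobe -v error \
    -show_entries stream=index,codec_name,codec_type \
    -of csv=p=0 "$1"
}
```

Running list_streams input.mkv on a typical recording should show at least one video line and one audio line, each with a different codec.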
Muxer / Demuxer
aka multiplexing / demultiplexing
- multiplexing: combining several signals into a single one (e.g., going from a video signal and an audio signal to a single output signal)
- demultiplexing: separating a single signal into several sub-signals (e.g., going from a single input to separate audio and video signals)
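In ffmpeg terms, both directions can be sketched with the -map option, which selects streams, and -c copy, which moves them without re-encoding. The function names and file names below are placeholders of mine. I suggest Matroska outputs (.mkv, .mka) in the usage because that container accepts most codecs.

```shell
# Demux: copy the first audio stream of INPUT into its own file.
# Usage: extract_audio INPUT OUTPUT   (e.g. OUTPUT=audio.mka)
extract_audio() {
  ffmpeg -i "$1" -map 0:a:0 -c copy "$2"
}

# Demux: copy the first video stream of INPUT, leaving everything else out.
# Usage: extract_video INPUT OUTPUT   (e.g. OUTPUT=video.mkv)
extract_video() {
  ffmpeg -i "$1" -map 0:v:0 -c copy "$2"
}

# Mux: combine the video stream of one file with the audio stream of another.
# Usage: mux VIDEO AUDIO OUTPUT
mux() {
  ffmpeg -i "$1" -i "$2" -map 0:v:0 -map 1:a:0 -c copy "$3"
}
```

Calling extract_audio and extract_video on the same recording, then mux on the two results, should give back a file with the same two streams as the original.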
Some popular codecs for audio...
- mp3
- aac
...and for videos
- h264
- h265
- vp9
- av1
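The difference between container and codec shows up directly on the command line. The sketch below contrasts changing only the container with re-encoding the streams. The function names are placeholders, and the second function assumes your ffmpeg build includes the libx264 encoder, which is common but not guaranteed.

```shell
# Remux: change only the container and keep the encoded streams as they are.
# This works only if the target container supports the existing codecs.
# Usage: remux INPUT OUTPUT
remux() {
  ffmpeg -i "$1" -c copy "$2"
}

# Transcode: re-encode the video as H.264 and the audio as AAC.
# Assumes the local ffmpeg build ships the libx264 encoder.
# Usage: transcode_h264_aac INPUT OUTPUT
transcode_h264_aac() {
  ffmpeg -i "$1" -c:v libx264 -c:a aac "$2"
}
```

Remuxing is usually much faster and lossless since nothing is re-encoded; transcoding is the fallback when the target container or player does not accept the original codecs.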
✨ Bonus: Adaptive bitrate / Adaptive streaming
Quick refresher: the bitrate is the number of bits used to encode one second of video. For a given codec, the higher the video quality, the higher its bitrate.
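A quick back-of-the-envelope calculation shows what a bitrate means in practice. The sketch below estimates a file size from a bitrate and a duration; the function name and the 5000 kbit/s figure are illustrative assumptions, not measurements.

```shell
# Estimate the size in megabytes of a stream, given its bitrate in kbit/s
# and its duration in seconds.
# kbit/s * s = kbit; divided by 8 = kilobytes; divided by 1000 = megabytes.
# Shell arithmetic is integer-only, so the result is rounded down.
estimate_size_mb() {
  echo $(( $1 * $2 / 8 / 1000 ))
}

estimate_size_mb 5000 3600   # one hour at 5000 kbit/s: prints 2250
```

So an hour-long video at 5000 kbit/s weighs roughly 2.25 GB, which is why the bitrate matters so much on a slow connection.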
One challenge when streaming videos to mobile devices is the variation in network quality. Let's say you start watching a 1080p video in the city center with a good 4G connection. Then, you take the train to the countryside. The network degrades to a capricious 3G and the video buffers or stops.
Adaptive bitrate got your back!
With adaptive bitrate, the server exposes the video in multiple qualities: 240p, 720p, 1080p, 4K, etc. The client downloads a table (often called a manifest) that maps each quality to the video chunks encoded at that quality, then starts playing the best quality it can afford given the available bandwidth. In the city center, the client consumes video chunks from the 1080p URL; on the train, as the network degrades, the device switches to video chunks from the 720p URL. Pretty cool!
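The selection step can be sketched in a few lines of shell. This is a simplified illustration, not how any particular player works: the bitrate ladder below uses made-up numbers, the function name is a placeholder, and real clients also take things like buffer level into account.

```shell
# Hypothetical bitrate ladder: "<quality> <bitrate in kbit/s>", lowest first.
LADDER="240p 400
720p 2500
1080p 5000
2160p 16000"

# Print the highest quality whose bitrate fits within the measured
# bandwidth (in kbit/s). Falls back to the lowest quality if nothing fits.
# Usage: pick_rendition BANDWIDTH_KBPS
pick_rendition() {
  bandwidth_kbps=$1
  best=""
  while read -r label kbps; do
    if [ -z "$best" ]; then
      best=$label                      # lowest rung is the fallback
    fi
    if [ "$kbps" -le "$bandwidth_kbps" ]; then
      best=$label                      # this rung still fits, keep climbing
    fi
  done <<EOF
$LADDER
EOF
  echo "$best"
}

pick_rendition 6000   # good 4G in the city center: prints 1080p
pick_rendition 3000   # weaker network on the train: prints 720p
```

The client would re-run this kind of check as its bandwidth estimate changes, and fetch the next chunk from whichever quality comes out.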
See this great article for more details on adaptive streaming:
Laukens, N. (2011). Adaptive Streaming — a brief tutorial. Retrieved May 16th, 2021, from https://tech.ebu.ch/docs/techreview/trev_2011-Q1_adaptive-streaming_laukens.pdf.