Run audio encoding tasks in serial or in parallel on AMD64?
Q I'm running Ubuntu 6.10 64-bit on AMD64 and I do a ton of audio encoding. I set up a small test to see what was more effective: encoding four directories' worth of FLAC files (four files to each directory, all the same size) to Ogg Vorbis in serial or in parallel. I wrote two Bash scripts to attempt to measure the performance. The first script takes around nine minutes to execute (just over two minutes per directory) while the second script also takes roughly nine minutes, even though each folder contains nine minutes' worth of encoding. I'm sure that there's a point at which running all of the tasks in parallel runs slower than running them one at a time. Watching the output from top shows four instances of flac running, each taking approximately 20% of the CPU's capacity when running in parallel. While running in serial, a single flac process uses much more CPU power.
Are there any benchmarks or guidelines to follow? Without further testing I'm left wondering whether I could be saving a lot of my time one way or the other when I need to encode tons of files.
A There is some overhead in running tasks in parallel, because of the extra task switching and memory management involved, but this is insignificant for small numbers of tasks. Had you tried to run 20 or 30 encoding processes in parallel you would have noticed a reduction in speed, especially if you started to use swap space.
Encoding files from hard disk to hard disk places a heavy load on the CPU and memory while demanding little of your disks - this is what the techies call a 'compute-bound' or 'CPU-bound' task. On the other hand, ripping data from a CD or DVD is largely dependent on the speed of the transfer while asking little of the CPU - this is called 'IO-bound'. So running two CPU-bound processes, or two IO-bound processes, in parallel is likely to have little benefit over running them in serial, because both compete for the same resource, but running one of each in parallel will give a large speed benefit. If the audio that you're encoding is coming from optical discs, or any other source that gives relatively slow transfer speeds, you will see a great improvement from running the processes in parallel, as in the following:
- Rip track 1
- Encode track 1 in the background
- Rip track 2
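The steps above can be sketched in Bash. This is only an illustration of the overlap, not how any particular ripper does it: rip_track, encode_track and rip_and_encode are hypothetical names, and the function bodies are stand-ins that you would replace with your own ripping and encoding commands.

```shell
#!/usr/bin/env bash
# Sketch: overlap an IO-bound rip with a CPU-bound encode.
# rip_track and encode_track are hypothetical stand-ins (assumptions);
# replace their bodies with your real ripper and encoder commands.

rip_track() {        # $1 = track number; the IO-bound step
    sleep 0.1        # stand-in for reading the disc
    echo "audio data for track $1" > "track$1.wav"
}

encode_track() {     # $1 = track number; the CPU-bound step
    sleep 0.1        # stand-in for the encoder
    cp "track$1.wav" "track$1.ogg" && rm "track$1.wav"
}

rip_and_encode() {   # $1 = number of tracks on the disc
    local n
    for ((n = 1; n <= $1; n++)); do
        rip_track "$n"         # foreground: the drive reads one track at a time
        encode_track "$n" &    # background: encode while the next track rips
    done
    wait                       # let the remaining encodes finish
}
```

Calling rip_and_encode 12 would process a 12-track disc. The sketch assumes ripping is slower than encoding; if it were faster, the background encoders would pile up and compete for the CPU.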
There are a number of CD ripper/encoders that do just this, including my favourites: Grip (www.nostatic.org/grip) for GUI operation; and Abcde (www.hispalinux.es/~data/abcde.php) for console use. If your audio files are already on your hard disk, you may as well keep the number of encoding processes low, but be sure to use at least two - a single process will always be subject to interruption.
The only really useful benchmark is one that closely mirrors your own usage, which normally means running your own tasks and timing them, as you have already done. Bear in mind that your encoding will take place in the background, so unless you do a huge amount, or each job is urgent, you could easily spend more time on benchmarking than you would save by improving your machine's performance.
You have already established that there is little discernible difference for a small number of processes. Higher numbers will not improve things - unless you're running multiple multi-core processors.
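If you do want to repeat the timing with different numbers of processes, a minimal sketch along these lines may help. The names encode_one and run_jobs are hypothetical, and the body of encode_one is a stand-in for whatever encoder command you actually use.

```shell
#!/usr/bin/env bash
# Sketch: run the same list of files with a chosen number of parallel jobs.
# encode_one is a hypothetical stand-in (assumption); replace its body
# with your real FLAC-to-Ogg encoding command.

encode_one() {              # $1 = input file
    sleep 0.1               # stand-in for the real encoding work
    : > "${1%.flac}.ogg"    # create an empty output file as a marker
}

run_jobs() {                # $1 = max simultaneous jobs; remaining args = files
    local max=$1 running=0 f
    shift
    for f in "$@"; do
        encode_one "$f" &
        if (( ++running >= max )); then
            wait            # simple batching: wait for the whole batch to end
            running=0
        fi
    done
    wait                    # wait for any jobs left in the final batch
}
```

With real encoding commands in place, comparing `time run_jobs 1 *.flac` with `time run_jobs 2 *.flac` and `time run_jobs 4 *.flac` in Bash would show where extra processes stop helping on your machine. The batching here is deliberately crude: each batch waits for its slowest job before the next one starts.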