How it works: Linux audio explained

There's a problem with the state of Linux audio, and it's not that it doesn't always work. The issue is that it's overcomplicated. This soon becomes evident if you sit down with a piece of paper and try to draw the relationships between the technologies involved with taking audio from a music file to your speakers: the diagram soon turns into a plate of knotted spaghetti. This is a failure because there's nothing intrinsically more complicated about audio than any other technology. It enters your Linux box at one point and leaves at another.

If you've had enough of this mess and want to understand just how all the bits fit together, we're here to help - read on to learn exactly how Linux audio works!

If we were drawing the OSI model used to describe the networking framework that connects your machine to every other machine on the network, we'd find clear strata, each with its own domain of processes and functionality. There's very little overlap in layers, and you certainly don't find end-user processes in layer seven messing with the electrical impulses of the raw bitstreams in layer one.

Yet this is exactly what can happen with the Linux audio framework. There isn't even a clearly defined bottom level, with several audio technologies messing around with the kernel and your hardware independently. Linux's audio architecture is more like the layers of the Earth's crust than the network model, with lower levels occasionally erupting on to the surface, causing confusion and distress, and upper layers moving to displace the underlying technology that was originally hidden.

The Open Sound System, for example, used to be found at the kernel level talking to your hardware directly, but it's now a compatibility layer that sits on top of ALSA. ALSA itself has a kernel-level stack and a higher API for programmers to use, mixing drivers and hardware properties with the ability to play back surround sound or an MP3 codec. When most distributions stick PulseAudio and GStreamer on top, you end up with a melting pot of instability with as much potential for destruction as the San Andreas fault.

Here's a simplified view of the audio layers typically used in Linux. The deeper the layer, the closer to the hardware it is.

ALSA

INPUTS: PulseAudio, Jack, GStreamer, Xine, SDL, ESD

OUTPUTS: Hardware, OSS

As Maria von Trapp said, "Let's start at the very beginning." When it comes to modern Linux audio, the beginning is the Advanced Linux Sound Architecture, or ALSA. This connects to the Linux kernel and provides audio functionality to the rest of the system. But it's also far more ambitious than a normal kernel driver; it can mix, provide compatibility with other layers, create an API for programmers and work at such a low and stable latency that it can compete with the ASIO and CoreAudio equivalents on the Windows and OS X platforms.

ALSA was designed to replace OSS. However, OSS isn't really dead, thanks to a compatibility layer in ALSA designed to enable older, OSS-only applications to run. It's easiest to think of ALSA as the device driver layer of the Linux sound system. Your audio hardware needs a corresponding kernel module, prefixed with snd_, and this needs to be loaded and running for anything to happen. This is why you need an ALSA kernel driver for any sound to be heard on your system, and why your laptop was mute for so long before someone thought of creating a driver for it. Fortunately, most distros will configure your devices and modules automatically.

ALSA is responsible for translating your audio hardware's capabilities into a software API that the rest of your system uses to manipulate sound. It was designed to tackle many of the shortcomings of OSS (and most other sound drivers at the time), the most notable of which was that only one application could access the hardware at a time. This is why a software component in ALSA needs to manage audio requests and understand your hardware's capabilities.

If you want to play a game while listening to music from Amarok, for example, ALSA needs to be able to take both of these audio streams and mix them together in software, or use a hardware mixer on your soundcard to the same effect. ALSA can also manage up to eight audio devices and sometimes access the MIDI functionality on hardware, although this depends on the specifications of your hardware's audio driver and is becoming less important as computers get more powerful.
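
To give a flavour of what the ALSA API looks like from a programmer's point of view, here's a minimal sketch in C using alsa-lib. It's illustrative only: the "default" device name, the tone and the parameters are assumptions for the example, and error handling is stripped back to almost nothing.

    #include <alsa/asoundlib.h>
    #include <math.h>

    int main(void)
    {
        snd_pcm_t *pcm;
        /* "default" normally routes through dmix (or PulseAudio), so
           other applications can keep playing while this one runs */
        if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0) < 0)
            return 1;
        /* 16-bit stereo at 44.1kHz, resampling allowed, 0.5s buffering */
        snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                           SND_PCM_ACCESS_RW_INTERLEAVED, 2, 44100, 1, 500000);
        static short buf[44100 * 2];      /* one second of stereo audio */
        for (int i = 0; i < 44100; i++) { /* fill it with a 440Hz tone */
            short v = (short)(3000 * sin(2 * M_PI * 440 * i / 44100.0));
            buf[2 * i] = buf[2 * i + 1] = v;
        }
        snd_pcm_writei(pcm, buf, 44100); /* count is in frames, not bytes */
        snd_pcm_drain(pcm);
        snd_pcm_close(pcm);
        return 0;
    }

Compile it with gcc -o beep beep.c -lasound -lm and the tone comes out of whatever your 'default' device resolves to.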

This screenshot of Alsa Mixer shows off everything that's wrong with Linux audio - it really doesn't need to be this complicated.

Where ALSA does differ from the typical kernel module/device driver is in the way it's partly user-configurable. This is where the complexity in Linux audio starts to appear, because you can alter almost anything about your ALSA configuration by creating your own config file - from how streams of audio are mixed together and which outputs they leave your system from, to the sample rate, bit-depth and real-time effects.
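
As a sketch of what such a config file can look like - the device names and numbers here are made up for illustration - a ~/.asoundrc might redefine the default device so that every stream is mixed in software by dmix before it reaches the first soundcard:

    pcm.!default {
        type plug              # converts rate/format automatically
        slave.pcm "softmix"
    }

    pcm.softmix {
        type dmix              # ALSA's software mixing plugin
        ipc_key 1024           # any unique integer
        slave {
            pcm "hw:0,0"       # first device on the first card
            rate 48000
            format S16_LE
        }
    }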

ALSA's relative transparency, efficiency and flexibility have helped to make it the standard for Linux audio, and the layer that almost every other audio framework has to go through in order to communicate with the audio hardware.

PulseAudio

INPUTS: GStreamer, Xine, ALSA

OUTPUTS: ALSA, Jack, ESD, OSS

If you're thinking that things are going to get easier with ALSA safely behind us, you're sadly mistaken. ALSA covers most of the nuts and bolts of getting audio into and out of your machine, but you must navigate another layer of complexity. This is the domain of PulseAudio - an attempt to bridge the gap between hardware and software capabilities, local and remote machines, and the contents of audio streams. It does for networked audio what ALSA does for multiple soundcards, and has become something of a standard across many Linux distros because of its flexibility.

As with ALSA, this flexibility brings complexity, but the problem is compounded by PulseAudio because it's more user-facing. This means normal users are more likely to get tangled in its web. Most distros keep its configuration at arm's length; with the latest release of Ubuntu, for example, you might not even notice that PulseAudio is installed. If you click on the mixer applet to adjust your soundcard's audio level, you get the ALSA panel, but what you're really adjusting is a virtual device: the audio passes from ALSA into PulseAudio and back out through ALSA on its way to the hardware.

At first glance, PulseAudio doesn't appear to add anything new to Linux audio, which is why it faces so much hostility. It doesn't simplify what we have already or make audio more robust, but it does add several important features. It's also the catch-all layer for Linux audio applications, regardless of their individual capabilities or the specification of your hardware.

PulseAudio is powerful, but often derided for making Linux audio even more complicated.

If all applications used PulseAudio, things would be simple. Developers wouldn't need to worry about the complexities of other systems, because PulseAudio brings cross-platform compatibility. But this is one of the main reasons why there are so many other audio solutions. Unlike ALSA, PulseAudio can run on multiple operating systems, including other POSIX platforms and Microsoft Windows. This means that if you build an application to use PulseAudio rather than ALSA, porting that application to a different platform should be easy.
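
To illustrate how simple the application side can be, here's a hedged sketch using PulseAudio's 'simple' client API. The application and stream names are arbitrary inventions for the example, and a real program would check the error codes:

    #include <pulse/simple.h>

    int main(void)
    {
        /* 16-bit little-endian stereo at 44.1kHz */
        pa_sample_spec spec = { PA_SAMPLE_S16LE, 44100, 2 };
        int error;
        /* NULL for server and device means "use the defaults" */
        pa_simple *s = pa_simple_new(NULL, "demo-app", PA_STREAM_PLAYBACK,
                                     NULL, "playback", &spec,
                                     NULL, NULL, &error);
        if (!s)
            return 1;
        static short silence[44100 * 2]; /* one second of stereo silence */
        pa_simple_write(s, silence, sizeof(silence), &error);
        pa_simple_drain(s, &error);      /* block until it has played out */
        pa_simple_free(s);
        return 0;
    }

PulseAudio decides where the stream actually ends up - the local soundcard via ALSA, or another machine on the network.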

But there's a symbiotic relationship between ALSA and PulseAudio because, on Linux systems, the latter needs the former to survive. PulseAudio configures itself as a virtual device connected to ALSA, like any other piece of hardware. This makes PulseAudio more like Jack, because it sits between ALSA and the desktop, piping data back and forth transparently. It also has its own terminology. Sinks, for instance, are the final destination. These could be another machine on the network or the audio outputs on your soundcard courtesy of the virtual ALSA. The parts of PulseAudio that fill these sinks are called 'sources' - typically audio-generating applications on your system, audio inputs from your soundcard, or a network audio stream being sent from another PulseAudio machine.

Unlike Jack, applications aren't directly responsible for adding and removing sources, and you get a finer degree of control over each stream. Using the PulseAudio mixer, for example, you can adjust the relative volume of every application passing through PulseAudio, regardless of whether that application features its own slider or not. This is a great way of curtailing noisy websites.

GStreamer

INPUTS: Phonon

OUTPUTS: ALSA, PulseAudio, Jack, ESD

With GStreamer, Linux audio starts to look even more confusing. This is because, like PulseAudio, GStreamer doesn't seem to add anything new to the mix. It's another multimedia framework and gained a reasonable following of developers in the years before PulseAudio, especially on the Gnome desktop. It's one of the few ways to install and use proprietary codecs easily on the Linux desktop. It's also the audio framework of choice for GTK developers, and you can even find a version handling audio on the Palm Pre.

GStreamer slots into the audio layers above PulseAudio (which it uses for sound output on most distributions), but below the application level. GStreamer is unique because it's not designed solely for audio - it supports several formats of streaming media, including video, through the use of plugins.

MP3 playback, for example, is normally added to your system through an additional codec download that attaches itself as a GStreamer plugin. The commercial Fluendo MP3 decoder, one of the only officially licensed codecs available for Linux, is supplied as a GStreamer plugin, as are its other proprietary codecs, including MPEG-2, MPEG-4 and H.264.
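
Here's a rough sketch of how a GStreamer pipeline is assembled in C - the filename is a placeholder, and decodebin will only handle MP3 if a suitable plugin (such as Fluendo's) is installed:

    #include <gst/gst.h>

    int main(int argc, char *argv[])
    {
        gst_init(&argc, &argv);
        /* decodebin picks whichever decoder plugin claims the stream */
        GstElement *pipeline = gst_parse_launch(
            "filesrc location=song.mp3 ! decodebin ! audioconvert "
            "! audioresample ! autoaudiosink", NULL);
        gst_element_set_state(pipeline, GST_STATE_PLAYING);
        /* block until the song ends or something fails (missing codec) */
        GstBus *bus = gst_element_get_bus(pipeline);
        GstMessage *msg = gst_bus_timed_pop_filtered(bus,
            GST_CLOCK_TIME_NONE, GST_MESSAGE_ERROR | GST_MESSAGE_EOS);
        if (msg)
            gst_message_unref(msg);
        gst_element_set_state(pipeline, GST_STATE_NULL);
        gst_object_unref(bus);
        gst_object_unref(pipeline);
        return 0;
    }

Swapping autoaudiosink for pulsesink, alsasink or jackaudiosink is how the same pipeline targets the different layers described in this article.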

Jack

INPUTS: PulseAudio, GStreamer, ALSA

OUTPUTS: OSS, FFADO, ALSA

Despite the advantages of flexible sound servers such as PulseAudio, they all pipe audio between applications on the assumption that it will proceed directly to the outputs. Jack is the middle layer - the audio equivalent of remote procedure calls in programming, enabling audio applications to be built from a variety of components.

The best example is a virtual recording studio, where one application is responsible for grabbing the audio data and another for processing the audio with effects, before finally sending the resulting stream through a mastering processor to be readied for release. A real recording studio might use a web of cables, sometimes known as jacks, to build these connections. Jack does the same in software.

Jack is a recursive acronym for 'Jack Audio Connection Kit'. It's built for low latency, which means there's no undue processing performed on the audio that might impede its progress. But for Jack to be useful, an audio application has to be specifically designed to handle Jack connections. As a result, it's not a simple replacement for the likes of ALSA and PulseAudio, and needs to be run on top of another system that will generate the sound and provide the physical inputs.
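
What 'specifically designed to handle Jack connections' means in practice is registering ports and a process callback with the Jack server. The sketch below is illustrative only - the client and port names are invented for the example, and error handling is omitted:

    #include <jack/jack.h>
    #include <string.h>
    #include <unistd.h>

    jack_port_t *in_port, *out_port;

    /* Called by the Jack server for every block of frames: copy input
       straight to output - a do-nothing effect in the signal chain */
    static int process(jack_nframes_t nframes, void *arg)
    {
        memcpy(jack_port_get_buffer(out_port, nframes),
               jack_port_get_buffer(in_port, nframes),
               nframes * sizeof(jack_default_audio_sample_t));
        return 0;
    }

    int main(void)
    {
        jack_client_t *client = jack_client_open("passthru",
                                                 JackNullOption, NULL);
        in_port  = jack_port_register(client, "in", JACK_DEFAULT_AUDIO_TYPE,
                                      JackPortIsInput, 0);
        out_port = jack_port_register(client, "out", JACK_DEFAULT_AUDIO_TYPE,
                                      JackPortIsOutput, 0);
        jack_set_process_callback(client, process, NULL);
        jack_activate(client);
        /* wire the client into the graph - the same connections could
           be dragged with the mouse in a patchbay such as QjackCtl */
        jack_connect(client, "system:capture_1", "passthru:in");
        jack_connect(client, "passthru:out", "system:playback_1");
        sleep(30);              /* pass audio through for 30 seconds */
        jack_client_close(client);
        return 0;
    }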

With Jack, you can connect the audio output from applications to the audio input of others manually - just like in a real recording studio.

With most Jack-compatible applications, you're free to route the audio and inputs in whichever way you please. You could take the output from VLC, for example, and pipe it directly into Audacity to record the stream as it plays back.

Or you could send it through JackRack, an application that enables you to build a tower of real-time effects, including pinging delays, cavernous reverb and voluptuous vocoding.

This versatility is fantastic for digital audio workstations. Ardour uses Jack for internal and external connections, for instance, and the Jamin mastering processor can only be used as part of a chain of Jack processes. It's the equivalent of having full control over how your studio is wired. Its implementation has been so successful on the Linux desktop that you can find Jack being put to similar use on OS X.

FFADO

INPUTS: Jack

OUTPUTS: Audio hardware

In the world of professional and semi-professional audio, many audio interfaces connect to their host machine using a FireWire port. This approach has many advantages. FireWire is fast and devices can be bus powered. Many laptop and desktop machines have FireWire ports without any further modification, and the standard is stable and mostly mature. You can also take FireWire devices on the road for remote recording with a laptop and plug them back into your desktop machine when you get back to the studio.

But unlike USB, where there's a standard for handling audio without additional drivers, FireWire audio interfaces need their own drivers. The complexities of the FireWire protocol mean these can't easily create an ALSA interface, so they need their own layer. Originally, this work fell to a project called FreeBOB. This took advantage of the fact that many FireWire audio devices were based on the same hardware.

FFADO is the successor to FreeBOB, and opens the driver platform to many other types of FireWire audio interface. Version 2 was released at the end of 2009, and includes support for many units from the likes of Alesis, Apogee, ART, CME, Echo, Edirol, Focusrite, M-Audio, Mackie, Phonic and Terratec. Which devices do and don't work is rather random, so you need to check before investing in one, but many of these manufacturers have helped driver development by providing devices for the developers to use and test.

Another neat feature in FFADO is that some of the DSP mixing features of the hardware have been integrated into the driver, complete with a graphical mixer for controlling the balance of the various inputs and outputs. This is different to the ALSA mixer, because it means audio streams can be controlled on the hardware with zero latency, which is exactly what you need if you're recording a live performance.

Unlike other audio layers, FFADO will only shuffle audio between Jack and your audio hardware. There's no back door to PulseAudio or GStreamer, unless you run those against Jack. This means you can't use FFADO as a general audio layer for music playback or movies unless you're prepared to mess around with installation and Jack. But it also means that the driver isn't overwhelmed by support for various different protocols, especially because most serious audio applications include Jack support by default. This makes it one of the best choices for a studio environment.

Xine

INPUTS: Phonon

OUTPUTS: PulseAudio, ALSA, ESD

We're starting to get into the niche geology of Linux audio. Xine is a little like the chalk downs; it's what's left after many other audio layers have been washed away. Most users will recognise the name from the very capable DVD movie and media player that most distributions still bundle, despite its age, and this is the key to Xine's longevity.

When Xine was created, the developers split it into a back-end library to handle the media, and a front-end application for user interaction. It's the library that's persisted, thanks to its ability to play numerous containers, including AVI, Matroska and Ogg, and dozens of the formats they contain, such as AAC, FLAC, MP3, Vorbis and WMA. It does this by harnessing the powers of many other libraries. As a result, Xine can act as a catch-all framework for developers who want to offer the best range of file compatibility without worrying about the legality of proprietary codecs and patents.

Xine can talk to ALSA and PulseAudio for its output, and there are still many applications that can talk to Xine directly. The most popular are the Gxine front-end and Totem, but Xine is also the default back-end for KDE's Phonon, so you can find it locked to everything from Amarok to Kaffeine.
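
As a hedged sketch of how an application can drive the xine library directly (the file path is a placeholder and all error checking is skipped):

    #include <xine.h>
    #include <unistd.h>

    int main(void)
    {
        xine_t *xine = xine_new();
        xine_init(xine);
        /* "auto" lets xine-lib choose ALSA or PulseAudio for output */
        xine_audio_port_t *ao = xine_open_audio_driver(xine, "auto", NULL);
        xine_stream_t *stream = xine_stream_new(xine, ao, NULL);
        /* the library works out the container and codec by itself */
        if (xine_open(stream, "/home/user/song.ogg"))
            xine_play(stream, 0, 0);   /* from the start, right now */
        sleep(10);                     /* let it play for a while */
        xine_close(stream);
        xine_dispose(stream);
        xine_close_audio_driver(xine, ao);
        xine_exit(xine);
        return 0;
    }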

Phonon

INPUTS: Qt and KDE applications

OUTPUTS: GStreamer, Xine

Phonon was designed to make life easier for developers and users by removing some of the system's increasing complexity. It started life as another level of audio abstraction for KDE 4 applications, but it was considered such a good idea that Qt developers made it their own, pulling it directly into the Qt framework that KDE itself is based on.

This had great advantages for developers of cross-platform applications. It made it possible to write a music player on Linux with Qt and simply recompile it for OS X and Windows without worrying about how the music would be played back, the capabilities of the sound hardware being used, or how the destination operating system would handle audio. This was all done automatically by Qt and Phonon, passing the audio to the CoreAudio API on OS X, for example, or DirectSound on Windows. On the Linux platform (and unlike the original KDE version of Phonon), Qt's Phonon passes the audio to GStreamer, mostly for its transparent codec support.

Phonon support is being quietly dropped from the Qt framework. There have been many criticisms of the system, the most common being that it's too simplistic and offers nothing new, although it's likely that KDE will hold on to the framework for the duration of the KDE 4 lifecycle.

The rest of the bunch

There are many other audio technologies, including ESD, SDL and PortAudio. ESD is the Enlightenment Sound Daemon, and for a long time it was the default sound server for the Gnome desktop. Eventually, Gnome was ported to use libcanberra (which itself talks to ALSA, GStreamer, OSS and PulseAudio) and ESD was dropped as a requirement in April 2009. Then there's aRts, the KDE equivalent of ESD, although it wasn't as widely supported and seemed to cause more problems than it solved. Most people have now moved to KDE 4, so it's no longer an issue.

SDL, on the other hand, is still thriving as the audio output component in the SDL library, which is used to create hundreds of cross-platform games. It supports plenty of features, and is both mature and stable.

PortAudio is another cross-platform audio library that adds SGI, Unix and BeOS to the mix of possible destinations. The most notable application to use PortAudio is the Audacity audio editor, which may explain its sometimes unpredictable sound output and the quality of its Jack support.

And then there's OSS, the Open Sound System. It hasn't been a core Linux audio technology since version 2.4 of the kernel, but there's just no shaking it. This is partly because so many older applications are dependent on it and, unlike ALSA, it works on systems other than Linux. There's even a FreeBSD version. It was a good system for 1992, but ALSA is nearly always recommended as a replacement.

OSS defined how audio would work on Linux and, in particular, the way audio devices are accessed through device files such as /dev/dsp and configured with ioctl calls. ALSA features an OSS compatibility layer to enable older applications to stick with OSS without abandoning the current ALSA standard.
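
For historical flavour, this is roughly what the OSS interface looked like to a program - a sketch with error checking omitted. The device is just a file: you configure it with ioctl calls and play audio with a plain write(), which is also why only one application could use it at a time:

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/soundcard.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/dsp", O_WRONLY); /* grabs the device exclusively */
        int fmt = AFMT_S16_LE, channels = 2, rate = 44100;
        ioctl(fd, SNDCTL_DSP_SETFMT, &fmt);  /* 16-bit little-endian samples */
        ioctl(fd, SNDCTL_DSP_CHANNELS, &channels);
        ioctl(fd, SNDCTL_DSP_SPEED, &rate);  /* 44.1kHz */
        char buf[8192] = {0};                /* a short block of silence */
        write(fd, buf, sizeof(buf));         /* playback is just a write() */
        close(fd);
        return 0;
    }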

The OSS project has experimented with open source and proprietary development, and is still being actively developed as a commercial endeavour by 4 Front Technologies. Build 2002 of OSS 4.2 was released in November 2009.

First published in Linux Format magazine

Your comments

Very Clear

Dear TuxRadar,

Thanks for this! This is a great and clear article. It has amazed me how complex the Linux system has become, although this is understandable in the FOSS world, where development is driven by scratching itches rather than a super-system-architect! However, to achieve stability and broad desktop support, clear architecture is helpful, if not essential.

Still, there is some fantastic software here. Jack is amazing. ALSA and OSS, GStreamer et al too. Shame Phonon is not thriving quite so well (news for me).

I hope this article spurs some kind of greater effort like the cross-desktop project which seems to be unifying the desktop needs, but directed into harmony in the acoustic subsystems of Linux! (Hey, there's a good name for a project! I am around all week if you want to thank me!)

I guess it will fall to the distros to plumb this together and make it just work.

My greatest problem is not really the complexity of the sound system but the flaky hardware I try to run Linux on. My Toshiba Satellite A70 was a pain to get the sound working on. The troubleshooting I tried was frustrating. I got it working again with OSS4, but in the end, quite simply, the hardware was faulty. But with a software stack so complex, this took me months to figure out!!! Ah well.

But thanks for this article.

Kevin

Appreciation

+1

Excellent

Linux audio has always confused me immensely, and I'm probably not the only one. This is the clearest written article I've ever seen explaining it!

Informative

Where does sound input (such as a microphone or "line-in") come into play? I am having mic trouble under Karmic, and the ALSA mixer and the PulseAudio stuff don't seem to help. I've tried every check-box and configuration option, but it's a no go. I'm not asking for specific troubleshooting steps, I just want to understand where the microphone fits into the "big picture".

PS The mic works when I boot to Windows, so I know that it does work.

thank you

Thanks for writing this! It helps a lot!

Thanks

Great article! Thanks.

Crisp & Clear Article

I am a regular here and this article is so informative that I used it in one of my forum posts :D

wow! cool!

very interesting. thanks!

the future?

Good article, thanks Graham. Would just be interested to know in the longer term what the plans are - what can we expect to see developers concentrating their efforts on as regards audio in the years to come? Will there be any prospect of consolidating all of these great ideas into a simpler standard which will provide greater ease of use and fewer compatibility issues eventually? Also, I would be interested to know what you regard as the best distro for music production in terms of user friendliness.

thanks for this I am a new

Thanks for this. I am a new convert to the world of Linux, and information like this helps out a lot.

finally a clear explanation!

thanks for writing this!
now I can fwd it to fellow laptop musicians who are more baffled than me by Linux audio ;)

Wow

What a mess!

Nice article

Hi, thanks for the nice article. I was surprised to learn that Qt dropped Phonon, and I see the Linux audio stack is too complex.

An Idea

I'm sure that this has been tried before, but perhaps it's worth trying again:

Break it up!

Merge FFADO with the part of ALSA that talks to the built-in hardware, and we have complete audio hardware compatibility - just an audio pipe that exposes hardware capabilities to the next layer.

Above hardware should sit a re-incarnation of Jack (say, "reJackD"), which handles the patchbay, mixing, and (at the user's discretion) JackRack, together.

Codecs (MP3, Ogg, etc) should be added individually (like Gstreamer) as pre-patchbay (for decoding) and post-patchbay (for encoding) filters. Thus, any type of audio stream needing to go somewhere, could be sent to a specific filter in the patchbay, and the output from that can go wherever it needs to go: hardware outputs, encoders, recording software, etc.

The final user interface can expose output hardware volume, hardware balance, and input hardware sensitivity sliders by default. That makes perfect sense, and is all that the average desktop user would need. (Individual applications have their own volume sliders which affect only that application's stream.) These controls should directly influence controls in reJackD (which will get the job done either via software or hardware as applicable), not any intermediaries.

Additional controls available in reJackD - mixer, patchbay, and perhaps even JackRack - could be available to power users, via an "advanced" tab in the volume control properties.

For compatibility with existing systems, pulseaudio, oss, etc., could be modeled as "compatibility inputs" in reJackD's patchbay. Future development of audio programs should encourage a library, say, "libsound" that will route audio directly to the correct pre-patchbay filter, from where default settings (routing, mixing, etc) in reJackD would take over.

For Cross-platform compatibility, libsound could provide different types of output for different versions.

libsound should be desktop-agnostic, and be used in both Gnome and KDE (and others). Desktop sounds (alerts, clicks, etc.) should be treated like just another input for reJackD.

Advantages of this would be:

1. unified, layered sound architecture for Linux,
2. all the power and goodness of Jack,
3. infinite flexibility,
4. uniform common-sense interface for users,
5. Cross-platform compatibility for audio interfacing applications,
6. Codecs dealt with centrally, reducing duplication (no need for each program to have its own MP3 codec, for example).

Disadvantages, well, I don't know!
I'm new to software development (well, brand new). Any pointers where I can send this proposal for serious consideration?

Continued

By the way, thanks for the article!

Looking at the idea, it would take a lot of cross-project collaboration... but that's what open source is all about, right? mutual sharing and improvement...

Liked it

Great article I really liked it. And it was just the right length, after reading this and what I have heard you guys and others say in Linux podcasts my ideal sound architecture would probably look like this for me as a desktop user:
GStreamer
PulseAudio
ALSA
Hardware

For a sound studio:
Jack
ALSA
Hardware

What do you think of this very small and amateur analysis?

By the way thanks for a great podcast and an awesome magazine.

Re: An Idea

I think it would be awesome to have a unified sound system that did everything and that everyone uses... but history shows that cross-project collaboration usually doesn't work out. People want it THEIR way, and some of them will just never compromise. I have no idea where to send your idea... maybe go on IRC and start chatting people up... but I would love it if it came to fruition.

Still confused . . .

I'm still confused . . . who's ALSA?

ALSA

Advanced Linux Sound Architecture, I believe.

This article is crap. I will

This article is crap.
I will explain this in 10 seconds.

You have RAW audio data and a sound card. For the sound card you need a driver; this is what ALSA and OSS provide.
RAW>>>ALSA(Sound card)

Now, RAW data can be compressed, either lossy or lossless, so it eats much less space. So you have CODEC libraries: either monolithic all-in-one like MPlayer, flexible like GStreamer, or some weird mix like Xine (which can only decode).
Compressed(like .ogg)>>>Decoders(Mplayer, Xine, Gstreamer; sound now RAW)>>>ALSA(Sound card).

Now imagine you have several applications trying to output sound at once (two audio players, say). One of them can lock the card and prevent the others from getting it. This is only partially mitigated with ALSA, as it has a very basic mixing module - dmix - but it's unable to mix audio streams of different sample rates. Also, with pure ALSA nothing prevents an app from reserving the audio card for itself (Adobe Flash plugin, anyone?). So you need a sound server - PulseAudio. Even if you don't need a networked sound server, Pulse mixes, sets volumes and prevents takeover. It is a must-have unless you have only ONE audio application running. With Pulse, audio streams are also easy to manipulate, including channels, output to several cards etc. Pulse incorporates a lot of ways to capture audio output from applications. Libraries that allow applications to handle sound, like OpenAL and SDL, and decoders like MPlayer, GStreamer and Xine should always be built with PulseAudio output, and HAVE ALSA OUTPUT REMOVED! Pulse also has the ability to capture an app trying to output to ALSA directly, but this makes the audio travel App>>ALSA>>Pulse>>ALSA instead of App>>Pulse>>ALSA. The famous crackling is a result of this misconfiguration. Build Pulse with ALSA support, remove ALSA from everywhere else and add a Pulse sink instead. This is how Pulse is set up.

Compressed(like .ogg)>>>Decoders>>>>PulseAudio>>>ALSA(one or many soundcards) or networking

ESD is not discussed, since it does what Pulse does - better mixing, but only for apps that support it (few) - and that's it. It's being replaced by Pulse. Deprecated.

Phonon is just a universal sound interface for Qt apps. It's nothing more than this. Set it to PulseAudio as its default sink, or leave it alone if you use ONLY KDE apps for sound output. Qt is not Linux- or ALSA-only, hence the need for Phonon as a universal backend. It does for Qt what SDL does for games.

JACK is a nice thing that allows you to route audio streams between applications in the weirdest ways possible. Apps have to be JACK-enabled. Only musicians need this, and you can combine JACK with Pulse; JACK handles routing between apps and drops the final stream to Pulse.

Pretty much that's it.
GStreamer is the most advanced coder/decoder stack.
PulseAudio is the most advanced mixer and point-and-click sound manager.
ALSA is the driver stack.

SDL and OpenAL are just sound playback libraries. Make sure they have ALSA or PulseAudio as output (depending on what you use) and forget them.
Phonon is the Qt/KDE audio gateway, nothing more.

One more: ALSA vs OSS. OSSv4

One more: ALSA vs OSS.
OSSv4 features a decent software mixer, so by itself it does what ALSA+Pulse do. So if you use OSS instead of ALSA, you pretty much don't need Pulse (unless you want per-application and per-soundcard point-and-click volume controls). I'm not aware whether it can prevent sound card stealing, though. And it has a BAD reputation for going closed source and paid in the past. Pulse is not resource-heavy, nor high-latency, by the way.

Small correction

Xine will output to Jack. Although this option has been around for years, Ubuntu hasn't had it compiled in, but will do so from Lucid Lynx onwards.

phonon->xine->jack->ffado works very well for me.

"Crap" looks like good hierarchy

Crap's layout arranges the pieces into a usable, understandable structure.

Pity about the distracting title.

Great pyramid diagram at the top, btw. <g>

Pulse Audio

Whatever PulseAudio is SUPPOSED to do, all it actually seems to have achieved (in Ubuntu anyway) is to utterly screw up audio.
A couple of releases ago everything worked perfectly.
With Lucid I now have to unmute ALSA at every reboot to hear my TV tuner, and DVDs will sometimes refuse to output surround sound in either Xine or VLC unless sound output is set to ANALOGUE?! (Apparently PulseAudio is completely unable to do surround sound.) Additionally, many applications (Xine, VLC, Amarok) have to be configured individually to output sound, and there aren't even basic utilities like a universal graphic equalizer for output.
Sound is THE weak point of Linux.

Yeah, after 3 years of fighting it, my MacBook Pro is on the way

*nix that works!

Linux Audio, should have just WORKED, by now ...

Hey, thanks for this GREAT article explaining Linux Audio...
---------------------------------

One thing is for sure.
Linux audio in general is nothing but a multiple-piece pile of eventually-will-break KAKA, and it appears to be full of egocentric dev/programmer fools. Why? Why is this?
Because unless "they" think of it, or invent it, well, it just can't be any good, right?!
Where is the leadership in Linux's audio woes?
This is the same mindset that Mr Linus Gates-Torvalds took over ZFS! Torvalds is sometimes just a big baby.
..., alsa, pulseaudio, oss, esd, ..., grandma's panties, ...
So after PA fails, they should name their next audio fiasco "Pride-Pulse" - as in "PeePee", "pride before the FALL!"
There is a complicated (but getting blurrier each day) divide between "real" professional Linux audio enthusiasts (like the users of JACKd...) and most ordinary users.
Basically the average consumer just needs to click on an app and sound is supposed to be heard, properly, ALL THE TIME - Windows has been doing this flawlessly since before Linux was born. And Apple has been doing it even longer.
Linux (mostly the fools I mentioned above) is destroying the viability of Linux ever becoming a good alternative Windows-replacement OS.
For krist's sake, people. It's "audio", y'know, 20-to-20kHz. Can we not even get this right in Linux after 18 freakin' YEARS?!
Linux didn't seem to get it right with esd, alsa, ..., and simply adding yet another complication-"bandaid" like PulseAudio isn't going to fix it all.
ALSA probably came close, but OSS seemed to have the BETTER "universal" quality overall, to me, and I'm talking about average consumer-level needs and wants.
I have no idea where OSS, or any of them, are really going now though.
Until Linux devs realise that there have to be STANDARDS at the interface level between audio hardware and the firmware/driver/software just in front of it, it'll never really work for "everything".
Every audio application will HAVE to talk a certain amount of agreed-upon language for all to work.
Running in different directions at the same time is gonna get the rest of us NOWHERE, as this past "Linux audio" has laughingly proved.
Oh yeah, and when things get real pushy, let's blame those "Windows-only proprietary" audio chipmakers, because it's all their fault for needing to make a living.
Naaa, I say this time, as in the past, Linux has no one to blame but itself.
"OPEN" hardware, anyone?
Happy now?!

Linux Audio is truly a mess.

However, this article does make it clear how the various systems interact with each other.
Thank you for taking the time to post this.

Linux needs to have a single audio system that works out of the box.

Linux Audio definitely needs some standards.

hey, Great article, as in you nailed it in the HEAD !
:)
and, "XINE" is a 4-letter word that rhymes with "PUKE", -err umm in whatever level lamguage that it fails to talk to, at other levels of audio application layers, from which, it fails to draw a ....
yup, I now get the picture of Linux Audio.
:(

Sound needs a different way of thinking.

My belief is that the real problem is simply "locking".

If all these underlying layers stopped locking access to the hardware we wouldn't have these problems.

Surely at the hardware level all that is happening is that a byte of data is being copied to a memory location and an interrupt called?

Why does each low-level "sound manager" have to lock access to the device each time it wants to send its output? Why does it need exclusive access?

Why do we need the "sound server" model? Sure it fits in with the rest of the way Linux/Unix functions, but do we REALLY NEED that for sound? I mean it's not system critical, if the hardware is too busy when an app sends a byte or IRQ request, then just drop it, forget it, don't try to queue it because by then it's too late anyway. Queuing creates the distorted (slow) sounds we sometimes hear and the clicking and popping.

It's not like the disk read/write stack, missing bytes are of no interest for desktop audio use.

So basically drop the network model of thinking for audio, it's just not needed and has so far proven itself a failure.

Re: One more: ALSA vs OSS. OSSv4

"OSSv4 features decent software mixer; so it does by itself what ALSA+Pulse do. So if you use OSS instead of ALSA, you pretty much does not need Pulse(unless you want per-application and per-soundcard point-and-click volume controls). Im not aware if it can prevent sound card steal though. And it has BAD reputation for going closed source and paid in the past. Pulse is not resource heavy, nor latent btw."
- Per-application volume control is provided by the GTK-based ossxmix
- OSSv4 has been free software for three years, but I agree with you

Misinformation

1) JACK does not "drop it to Pulse" unless you purposefully create this as a rather weird configuration.

2) Queuing is not what creates "clicks", "pops" or "cracks". In fact, queuing, AKA buffering, AKA latency, is one of the simplest, easiest and most widely used techniques to avoid these phenomena.

3) OSS performs mixing in the kernel, which is a violation of Linux kernel design policy. Note that the X server, which mixes together the output of multiple display-using applications, is not implemented in the kernel (though it uses a set of kernel space drivers). There are no technical reasons to do mixing in the kernel - the motivation is mostly pragmatic, since the kernel is the point of access that no apps can choose to workaround. Oh, except with OSS, where they can avoid it if they don't want it. Go figure.

4) Flatfish wrote "Linux needs to have a single audio system that works out of the box". Which box? Should the "linux" that runs on your android cell phone have the same audio system as my 8-core studio system with 256 channels of input and output? Should a home theater system have the same audio system as an embedded linux used to drive a software synthesizer? And before you mention OS X, even Apple have abandoned the "single audio system" - the iPhone and its cousins have a system which although superficially similar to CoreAudio on "actual macs", is really fundamentally different from a programming perspective.

How can that be?

I'm currently in a state where some apps play audio and some others don't. Sometimes, after boot, I have to restart ALSA or PulseAudio manually, sometimes not.
Sometimes I update and I have no audio; sometimes I have no audio and then it comes back after an update.

It's disappointing that in 2010 Linux still can't have a standard, working, audio subsystem.

I understand how it doesn't work reliably (and your article helped), but I can't understand why people can't work together on a solution when in other (most?) areas the same development model (open source) has worked so well.

I suppose there's simply not much interest, probably geeks are deaf...

JMTC

Thank you!

This is a great article, and has answered MANY questions I've had for years now!

Regards,
Jack

OSSv4

I wish you'd give OSS the attention it deserves. Try it on a couple of installations and see how it just works. I will never go back to the ALSA mess.

FireWire *does* have an audio standard.

FireWire most certainly *does* have an audio standard. In fact, FireWire had a working audio standard back before the USB spec was even finalized. The main standard is called AV/C, and FireWire camcorders have spoken it since 1995. Many FireWire audio interfaces speak it, too.

There's also the mLAN standard. Most of the audio interfaces that don't speak AV/C speak one version of mLAN or another.

FireWire interfaces that don't conform to one of those specs are relatively rare....

Great article

Request for some brief finer details of audio inside the kernel - ALSA -> I2S -> McBSP/MSP

Thanks for this great article.

Yes, Linux Audio is a mess, and yes, many make it WORSE

People with comments like:

>I wish you give OSS the attention it deserves. Try it on a
> couple of installations and see how it just works. I will
> never go back to the ALSA mess.

This is the same as if I would say:

Why oh why do you slight ESD? ESD is still very useful. Why do you say it is dying?

Sarcastically, you could say: "Why do you want the Linux audio infrastructure to become more simple? Choice is important; it makes sure nobody, not even experts that invest years in getting to know it, can use and understand it."

Get this: we've got too much choice in Linux audio. Instead of 20 half-baked systems, we need 3 working ones.

But people are stupid, incredibly stupid: they add on more. Phonon is an example. Why take PulseAudio, which is already the de facto standard? Nope, we will invent a new one. ANOTHER ONE, for God's sake!!!
Another one that contributes to the mess that is already impossible.

ALSA might be bad, it might have its problems, as does PulseAudio, but remember: Linux audio is going nowhere when NOBODY IS USING IT, or better said: when nobody CAN use it.

Send some non-expert on the half-baked quest for "OSS4 is supposed to be good" and he or she will soon, after hours, chuck Linux audio to the curb. Why shouldn't they? It does not work in their eyes, at least not without putting 13212 hours into it.

Don't get me wrong. I love Linux audio for what it does: Jack sound beats pretty much anything you get on Windows. It has logging, and if something goes wrong, you can actually fix it. In Windows, if you've got problems, you are going to reinstall your soundcard driver and then the whole OS. That is why I despise Windows and its users: they are idiots and nitwits. People that throw away a washing machine or freezer because... it is broken...
EVER HEARD OF FIXING STUFF? These people waste a lot of money and time insisting on their stupidity, claiming that things need to be simple. And then get overcharged by their mechanic, lawyer, dentist, doctor or whatever. Note: knowing as little as possible is a very dangerous and expensive proposition. No wonder you are always broke.

Anyway, please, people that do Linux audio: concentrate on what is good for the user and the user community as a whole.
Forget ideologies. Linux audio is so messed up as it is, ideologies can only hurt at this point. Most people are not smart enough to use Linux audio, much less invest this much time into it, fighting off the dumb standards (everything other than ALSA, Pulse and Jack is toast and so over).

Think about what is good for Linux, for somebody that just wants to use audio and make music on Linux.

pulseaudio is conquering linux audio, thankfully.

>I'm currently in a state where some apps play audio, some other not. Sometimes, after boot, have to restart ALSA or pulseaudio manually, sometimes not.
Sometimes I update and I have no audio, sometimes I have no audio, then it comes back after and update.

Well, you are not alone, trust me! It is a huge problem.

>It's disappointing that in 2010 Linux still can't have a standard, working, audio subsystem.

Yes, you said it right. But there is good news: PulseAudio seems to be the new audio standard. Too bad that you still need the Jack audio server for pro audio, so we still need PulseAudio and Jack to work together. Currently they don't, so either one of them grabs the soundcard and bingo...

>I understand how it doesn't work reliably (and your article helped), but I can't understand why people can't work together at a solution while in other (most?) areas, the same development model (opensource) worked so well.

>I suppose there's simply not much interest, probably geeks are deaf...

Oh there is, I am a super geek, but you know, it is HARD, super HARD!!! There are tons of applications that don't support PulseAudio or Jack yet. VMware Player, for example, still insists on using ALSA or OSS (which shows how out of touch and super-yesterday VMware as a company is on Linux).
So VMware Player will not work in any new Linux, because all new Linuxes use PulseAudio, which is really a huge improvement. And Pulse even restarts now after it crashes, which is very nice and useful.

So we are getting there and Ubuntu, as in everything on the desktop, is leading the way.

So we are working to improve Linux audio, but unfortunately there are IDIOTS that still invent new sound systems, probably living in their own world and assuming that sooner or later everybody will use THEIR system. These are the same type of idiots that believe that another file format for audio is useful. Not that we already have way too many of them, only about 23982938 of them.
They definitely don't care about the user. They write software for... well, for just writing it. These are the same people that write software and then refuse to document it. Well, amazing: so now you've got something that you cannot use because... well, you don't know how...

Great article!!!

But, after being so negative:
Congrats on the great article. I like it very much; it explains Linux audio briefly and to the point, and shows the relationships between the different systems. Very nicely done.

A little work done in cleaning up the mess and making it more transparent!

Pulse Audio sucks.

The reason PA gets hostility is not just because it doesn't add anything new to the table, it's that it loves to BREAK SOUND for no reason other than to stroke its author's ego.

Pulse Audio is so hated because it is utter crap, and should never have been adopted so widely in the first place.

why the hostility?

Pulse isn't that bad at all; learn to use it first before you say it's crap. I've been using it since it was available and never had a problem with it. It makes using programs like projectM and Audacity a total cinch. It is true, though, that there are too many layers, but I see a problem with making, say, Jack the lowest level before the hardware: how would it know all the different soundcards? ALSA has the knowledge already. Surely it would take a long time and a lot of people to write new drivers??? Is that how it works? The good thing about Linux is that if you don't like it, you can code your own stuff, or give the coders of the programs your views on how it can be improved, and if you do it nicely, they might listen.

Man you roxxx

Really, really, really (and much more) thanks for this explanation! The clearest I've read on this subject ;) Thanks!

can anyone describe where DAC (digital-to-analog) is happening

I'm trying to figure out how to capture/record what I am hearing before it goes through the DAC.

I've seen many examples, but they are resampling analogue instead of just grabbing slices off the sound card (like what myFairTunes was doing).

cheers

Great Article

After scouring the internet for good explanations and details on audio under Linux, I finally understand. This serves as a great foundation.

Now, on to my next step. Which is, to actually get JACK to work.

I just installed openSUSE 11.4 KDE 4.6 and even though JACK installed with no problems, it just will not boot up and run.

So, I hope I can find an article here on properly configuring audio and troubleshooting when it does not work. Then, same thing with JACK because I am determined to use Ardour!

Another vote for Mess, at least as far as Firewire is concerned

I just want to plug in my FireWire audio interface and play music. Occasionally I want to take an analogue input from the same box and record it.

For some reason, this is not even thought about, let alone possible.

I can use the motherboard's built-in audio and ... it just works.

I can use my ancient RME PCI card ... and it just works. (In fact it is nice that it is supported, when even RME are not supporting it past Win-XP.)

But my Echo Audio Firewire device? No: I am not interested in the fact that Jack will connect anything to anything for me, I just want to listen to music!

My computing life started with Unix. I love that there are hundreds of commands that I may never use, but hey, they are there. One day, Jack might be just what I need, but I do not want to be forced to use it, just because my device is FireWire. It's like being forced to learn C just because you want to write a simple shell script. Unix never was like that.

Good job "messing" up ppls minds

This was a truly great article!
Now I finally understand how ALSA, OSS, FFADO, Pulse, Jack, GStreamer and Xine all work together while they run concurrently on my system. Thanks to that great pyramid diagram, I finally understand :D

And it's truly such a mess! All I want to do is play some music, and there's all those confusing options I've got to choose from. Where should I start?

(For an actually realistic explanation of audio, read the comment titled "This article is crap" it sums everything up beautifully)

BeOS Got it Right

Copy it. They had sub-5ms latency on 90s hardware with no aliasing or glitches. Copy what BeOS did and end the misery; it did everything all of these systems do, and it did it better.

Maybe it's actually not so terrible

Probably the best solution is the one I think major sound architecture developers and distros have created: out of the box normal sound works fine for normal users of normal hardware. For fancy users with fancy hardware with fancy apps, there may be some configuration struggle, perhaps epic. But the end result is that the possibilities are quite versatile from low level dummy to studio quality. All kinds of quirky things are possible if not easy (e.g. piping sound over a network). I think that keeps to the spirit of Linux in general. Easy things (if not proprietary) are easy and the very, very hard things are possible, if only barely. Also let's not forget that tremendous progress has been made and the good work of sound devs is not likely finished.

An operating system with no sound......

The biggest problem with everything Linux is that it's in the end always excused with the naive idea that everyone who uses it should be interested enough to be a total geek, aka a programmer, themselves. A little bit like cops starting to walk and talk like criminals because they move in that area so much they lose perspective.

I'm no programmer and never will be. How can it be that a basic function such as sound output doesn't work in Linux!!

I fuckin' hate Linux!! It's good at going out on the net quickly, bla bla, but what the fuck, come on!!!!!!! I don't care about four desktops in a cube structure or how many free programs there are. I'm talking about a car without wheels here :). An operating system with no sound......

It's exactly things like

It's exactly things like ALSA and friends that give Linux a bad name and frighten people away.
