Thursday, December 15, 2022

Let me show you my ARSE.

Arbitrarily Routing Sound Engine

I, and I'm sure a lot of other people, have a need that's seriously not being met by current systems on the market. I am referring to Windows-based systems here, as I'm sure you could probably do something like this with JACK on Linux, but ain't nobody got time for that.

What I need is a system that will allow me to map any number of audio inputs to any number of audio outputs simultaneously on Windows. So, for example:

  • Microphone, Discord audio, Game audio, Music and Overlay Audio routed to one input of OBS
  • Microphone, Discord, Game and Overlay Audio (but no Music) routed to a second input of OBS, for VOD capture
  • Microphone routed to Discord for chat (possibly game audio routed there too, so participants can "watch" without double audio confusedness/weirdness on their part)
  • Microphone, Discord, Game and Music all being captured simultaneously, in separate tracks, for later combining manually as part of an edited clip or highlight reel for YouTube
  • And, of course, Discord, Game, Music and Overlay all being played through headphones for the user to hear
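To make the list above a bit more concrete, here is a minimal sketch of the routing table behind it. Every name in it (ROUTES, TRACKS, mix, render, the source and sink labels) is made up for illustration, and it assumes audio arrives as equal-length blocks of float samples between -1.0 and 1.0. It is not how any existing product works, just one way to picture "any inputs to any outputs".

```python
# Hypothetical routing table: each output (sink) lists the sources mixed into it.
ROUTES = {
    "obs_live": ["mic", "discord", "game", "music", "overlay"],
    "obs_vod": ["mic", "discord", "game", "overlay"],  # same as live, minus music
    "discord_out": ["mic"],
    "headphones": ["discord", "game", "music", "overlay"],
}

# Sources that are also captured unmixed, one track each, for later editing.
TRACKS = ["mic", "discord", "game", "music"]


def mix(source_names, frames):
    """Sum one block of samples from each named source and clamp to [-1, 1]."""
    length = len(next(iter(frames.values())))
    out = [0.0] * length
    for name in source_names:
        for i, sample in enumerate(frames[name]):
            out[i] += sample
    return [max(-1.0, min(1.0, s)) for s in out]


def render(frames):
    """Produce every mixed output plus the separate multi-track capture."""
    outputs = {sink: mix(sources, frames) for sink, sources in ROUTES.items()}
    tracks = {name: list(frames[name]) for name in TRACKS}
    return outputs, tracks
```

The point of the design is that the same source block feeds several outputs at once without any output knowing about the others, and the multi-track capture is just the unmixed sources written out side by side.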

This would also work well for podcasters, being able to capture multiple microphones and/or call-in guests, along with soundboard audio, in a multi-track session for later mixing and combining as necessary, without having to worry about sync or external sources. You could also have plugins like an auto-ducker, so when someone talks, one or more other sources have their volume cut, like DJs on the radio.
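As a rough idea of what an auto-ducking plugin might do, here is a small sketch. The Ducker class, its parameter names and its default values are all assumptions of mine rather than anything taken from a real plugin, and it works on one block of float samples at a time: measure how loud the voice is, then ease the music gain towards a lower level while the voice is above a threshold.

```python
import math


def rms(block):
    """Root-mean-square level of one block of samples."""
    if not block:
        return 0.0
    return math.sqrt(sum(s * s for s in block) / len(block))


class Ducker:
    """Hypothetical ducking plugin: lowers the music while the voice is active."""

    def __init__(self, threshold=0.05, ducked_gain=0.25, smoothing=0.5):
        self.threshold = threshold      # voice level that triggers ducking
        self.ducked_gain = ducked_gain  # music gain while someone is talking
        self.smoothing = smoothing      # fraction of the gap closed per block
        self.gain = 1.0                 # current music gain

    def process(self, voice_block, music_block):
        """Return the music block scaled by the smoothed ducking gain."""
        talking = rms(voice_block) > self.threshold
        target = self.ducked_gain if talking else 1.0
        # Move part of the way towards the target so the change is not abrupt.
        self.gain += (target - self.gain) * self.smoothing
        return [s * self.gain for s in music_block]
```

A real plugin would want separate attack and release times and probably a hold period so the music does not pump between words, but the control flow would look much the same.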

Existing Solutions

Currently it seems the best option is the virtual mixer software Voicemeeter (and its increasingly complex variations, Banana and Potato). While they can work, they're very limiting: there's no multi-channel recording, and they rely on having several virtual devices that you manually send audio to or consume audio from. While Banana and Potato do have recording built in, it mixes the inputs down to a single output, which is not ideal. Software like this seems to be based on the same paradigm as hardware mixers, mixing inputs down onto a number of output buses. While this works for some needs, it's not as flexible as it could be, and with a little work I think this could be a lot better.

Presentation (User)

So what does the user see? This is where it could get difficult. One of the easiest visual metaphors would be a patch board. Each (system) output would have a "default" option, so new programs/devices would automatically be hooked up to them, but you could also manually grab a "cable" and route an input to an output, or unplug a routed option if you don't want it.

To simplify presentation, by default, things that make sound would be stacked up on the left, devices that consume sound would be stacked on the right, and any plugins could be dragged into the space in the middle to sit between them. "Cables" then run between the sound makers and sound consumers, or from makers to plugins, and plugins to consumers.
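Underneath the patch board picture there is really just a small graph of "cables". The sketch below is one way it could be modelled. PatchBoard and its methods are hypothetical names, and the node labels are only examples. New sound makers get plugged into a default consumer automatically, and you can unplug that cable or run a new one through a plugin.

```python
class PatchBoard:
    """Hypothetical model of the patch board: nodes joined by directed cables."""

    def __init__(self, default_consumer):
        self.default_consumer = default_consumer
        self.cables = set()  # each cable is a (from_node, to_node) pair

    def add_maker(self, maker):
        """A new program or device is wired to the default output automatically."""
        self.plug(maker, self.default_consumer)

    def plug(self, src, dst):
        self.cables.add((src, dst))

    def unplug(self, src, dst):
        self.cables.discard((src, dst))

    def reaches(self, src, dst, _seen=None):
        """True if audio from src can arrive at dst, directly or via plugins."""
        seen = _seen if _seen is not None else set()
        if src == dst:
            return True
        seen.add(src)
        return any(
            self.reaches(nxt, dst, seen)
            for (start, nxt) in self.cables
            if start == src and nxt not in seen
        )


# Example: the game keeps its default cable, the mic is rerouted through a gate.
board = PatchBoard(default_consumer="headphones")
board.add_maker("game")
board.add_maker("mic")
board.unplug("mic", "headphones")
board.plug("mic", "noise_gate")
board.plug("noise_gate", "obs")
```

Here the game still goes to the headphones by default, while the mic only reaches OBS, and only after passing through the plugin sitting in the middle.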

This is still going to be a complex system, so definitely not for beginners, but it would be immensely powerful, and would allow a great deal of flexibility, both for live presentation and for mixing and correction afterwards. For instance, the multi-channel capture would allow for easy editing, removal or replacement of content flagged by ContentID, since you'd have a good clean capture to re-assemble the audio from. For easier control, especially while other things are happening on the system (in a game, say), you could have some method to hook up and listen for MIDI control codes, so you can map inputs or outputs to things like the KORG nanoKONTROL2.
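For the MIDI side, the messages involved are simple. A Control Change message is three bytes: a status byte of 0xB0 plus the channel number, a controller number, and a value from 0 to 127. The sketch below decodes one and turns it into a gain. The FADER_MAP table is purely hypothetical, since the controller numbers a nanoKONTROL2 actually sends depend on how the device is configured, and reading the bytes from a real MIDI port is left out entirely.

```python
def parse_control_change(message):
    """Return (channel, controller, value) for a Control Change, else None."""
    if len(message) != 3:
        return None
    status, controller, value = message
    if status & 0xF0 != 0xB0:  # upper nibble 0xB marks a Control Change
        return None
    if controller > 127 or value > 127:  # data bytes are 7-bit
        return None
    return status & 0x0F, controller, value


# Hypothetical mapping from controller number to the source it controls.
FADER_MAP = {0: "mic", 1: "discord", 2: "game"}


def apply_message(message, gains):
    """Return a new gains dict updated by one MIDI message, if it is mapped."""
    parsed = parse_control_change(message)
    if parsed is None:
        return gains
    _channel, controller, value = parsed
    source = FADER_MAP.get(controller)
    if source is None:
        return gains
    updated = dict(gains)
    updated[source] = value / 127  # scale 0..127 to 0.0..1.0
    return updated
```

Something like apply_message(bytes([0xB0, 1, 64]), gains) would then set the Discord level to roughly half without ever having to alt-tab out of the game.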

Presentation (Programmatic)

So, how would this appear to running programs? Well, ideally there would be one input and one output (unless more were required, as in OBS' case). Each program would just see a single default device to output to, and a single default device to input from, and the engine would handle routing sources appropriately. The "devices" would accept the best quality option available (channel count, bit depth, and sample rate) and internally the audio data would be converted as needed (mono microphones upmixed to stereo, 7.1 audio downmixed to stereo for headphones, etc.).
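The channel conversions mentioned above are mostly simple arithmetic. Here is a sketch of the two examples, with some loud assumptions: the function names are mine, the 7.1 frames are assumed to be ordered front left, front right, centre, LFE, back left, back right, side left, side right, and the downmix weights (about 0.707 for the centre and surround channels, LFE dropped) are one common choice rather than the only correct one. Sample-rate and bit-depth conversion are not covered.

```python
import math

ATTENUATION = math.sqrt(0.5)  # about 0.707, roughly a 3 dB cut


def mono_to_stereo(samples):
    """Upmix by copying each mono sample to both left and right."""
    return [(s, s) for s in samples]


def surround71_to_stereo(frames):
    """Downmix 7.1 frames (FL, FR, C, LFE, BL, BR, SL, SR) to stereo pairs."""
    # Normalise so that full-scale input on every channel cannot clip.
    norm = 1.0 / (1.0 + 3 * ATTENUATION)
    stereo = []
    for fl, fr, c, _lfe, bl, br, sl, sr in frames:
        left = (fl + ATTENUATION * (c + bl + sl)) * norm
        right = (fr + ATTENUATION * (c + br + sr)) * norm
        stereo.append((left, right))
    return stereo
```

The engine would run conversions like these at the edge of each "device", so everything in the middle could work in a single internal format.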

Potential Problems

The main hurdle here would be getting the audio out of Windows before it mixes it internally, as this system would have to bypass most of the internals of the Windows sound system and present a unique audio feed to each listening program. I am unfamiliar with the internal workings of Windows audio, but if programs like OBS can capture audio from a specific program, it seems to me there must be a method to do so, even if it means making a "dummy" device that throws away any audio data it gets from Windows and hooking into the programs themselves.

Conclusions

This would be a fantastic idea, and I'd love to see it implemented, but I cannot do it personally: it's coding on a much lower level than I've done before, it involves hooking into the deep and arcane wizardry of the Windows audio subsystem, which I haven't even looked at, let alone touched, and it is frankly way beyond me. But it's a good idea IMHO, and I'm sure someone could make a small fortune making it or something like it. I frankly wouldn't be surprised if someone has already made something like this, and if they have I'd love to know about it.

Meanwhile, feel free to use this idea to make the next killer audio app, and when you do just chuck my name in the credits somewhere. :D