Tuesday, July 21, 2015

Sidetone project 1

With PPQT pretty much out of my hair (until my OCD drives me back to do another translator or such) I turned to another project, a small one I call "Sidetone".

Sidetone is the sound of one's own voice in a telephone handset. I say handset because I learned the term when working for PacBell in the 60s. Picture a telephone handset:

You listen at the one end; you speak into the other; and a small amount of the sound of your own voice is repeated in the listening end. It makes the phone sound "live". Before a connection is made, or after the connection drops, there's no sidetone and your voice sounds flat, muffled, dead.

I'm starting to spend shifts taking calls on the Recovering From Religion hotline, for which I wear a Plantronics headset. And it offers no sidetone. It's annoying. Of course I can hear myself; my voice is transmitted through the air and through my skull bones. And I know the mic is working because I can go to the Sound Preference pane and see the VU indicator bouncing as I speak. But I sound muffled to myself, which is no surprise, since I have padded things over my ears.

I want sidetone! Checking around the web, I find some indications that in Windows, it is possible to get the audio driver to provide sidetone. It's an obvious feature for the system's audio to offer. Given you have a single USB or Bluetooth device that has both an input side and an output side, how hard would it be to direct an attenuated copy of the input signal back to the output? But this is not a feature offered by the Mac OS Sound Preference pane.
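To make "an attenuated copy" concrete: for 16-bit signed PCM audio, attenuating just means multiplying every sample by a gain below 1. Here is a minimal sketch in plain Python, assuming native-endian 16-bit samples. The function name `attenuate` and the gain of 0.25 are my own illustration, not anything a real audio driver exposes.

```python
from array import array

def attenuate(pcm_bytes, gain=0.25):
    """Return a quieter copy of 16-bit signed PCM audio.

    Assumes native-endian 16-bit samples and a gain between 0.0
    (silent) and 1.0 (unchanged); both are illustrative assumptions.
    """
    samples = array('h')            # 'h' = signed 16-bit integers
    samples.frombytes(pcm_bytes)
    # Scale each sample; int() truncates toward zero.
    quiet = array('h', (int(s * gain) for s in samples))
    return quiet.tobytes()

# Four sample values, including the 16-bit extremes.
loud = array('h', [1000, -2000, 32767, -32768]).tobytes()
soft = array('h')
soft.frombytes(attenuate(loud, 0.25))
print(list(soft))   # [250, -500, 8191, -8192]
```

The arithmetic is trivial. The hard part of sidetone is getting the samples from input to output quickly enough.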

So I started picturing a simple little Mac app that would (somehow) take audio in and dribble a little sidetone back out. I thought maybe I could write a real "grown-up" app using Xcode. I knew that Mac OS had something called Core Audio, and I knew Xcode has lots of example programs. So I tried to find some example that I could maybe manipulate to do what I wanted.

And I did: it's CAPlayThrough. But Oh. My. Word. What a monster. CAPlayThrough.cpp alone is over 800 lines (excluding the lengthy don't-sue-us prolog) and there are four other .cpp files and a bunch of .h files as well.

And the capper? It doesn't work very well! I had Xcode build it and run it, and told it to take input from the mic and write output to the earphones. When I speak into the mic, I sound as if I were in a good-sized barrel or a small cave; there is a latency of at least 0.1 second. That's not good for sidetone; sidetone has to be near-zero latency.
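A back-of-envelope check on that number: audio moves in buffers, and a buffer of N frames cannot be handed on until N / sample-rate seconds after its first sample was captured. The sketch below does that arithmetic. The 44100 Hz rate is a common default that I am assuming, and the buffer sizes are illustrative guesses, not values measured from CAPlayThrough.

```python
def buffer_latency_ms(frames, sample_rate=44100):
    """Milliseconds of audio held in one buffer of `frames` frames."""
    return 1000.0 * frames / sample_rate

def frames_for_latency(ms, sample_rate=44100):
    """Whole number of frames that fit in `ms` milliseconds."""
    return int(sample_rate * ms / 1000.0)

# Hypothetical buffer sizes, small to large.
for frames in (64, 512, 4410):
    print('{:5d} frames -> {:6.2f} ms'.format(frames, buffer_latency_ms(frames)))

print(frames_for_latency(100))   # 4410
```

So a delay of 0.1 second at 44.1 kHz means about 4,410 frames are in flight, summed over every buffer between the mic and the earphone. For sidetone that total would need to shrink to a small fraction of that.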

Then I poked around the Python docs and PyPI for a while. Audio support is not one of the "batteries included" in Python. There are several libraries to interface to PortAudio, an open-source package. But it wasn't immediately clear how well PortAudio was integrated with Mac Core Audio. And the PyAudio interface package, like some other audio modules I looked at, depends on numpy. Which tells me they are storing and retrieving audio samples as numpy arrays. Probably I'm being unfair but that smells of latencies to me.

Then it occurred to me to look at good old (Py)Qt. What does Qt have to offer in the audio arena?

Quite a lot, it turns out. I'm not sure at this point if it is going to be possible to do what I want, but the facilities are certainly simple. The following program, displayed in full, lists all the available audio devices and their characteristics.

from PyQt5.QtMultimedia import QAudioDeviceInfo,QAudio
from PyQt5.QtWidgets import QApplication

app = QApplication([])

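def show_info( mode, dev_list ):
    # Print a count for this mode ('input' or 'output'), then each
    # device's name and its preferred audio format.
    print('Found {} {} devices'.format( len(dev_list), mode) )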

    for audio_info in dev_list :
        print('\n\nname:', audio_info.deviceName())
        preferred = audio_info.preferredFormat()
        print('\tpreferred channel count:', preferred.channelCount())
        print('\tpreferred sample rate:', preferred.sampleRate() )
        print('\tpreferred sample size:', preferred.sampleSize() )
        #print('\tsupported sample rates')
        #for rate in audio_info.supportedSampleRates() :
            #print( '\t\t{}'.format(rate) )
        #print('\tsupported sample sizes')
        #for size in audio_info.supportedSampleSizes() :
            #print( '\t\t{}'.format(size) )

dev_list = QAudioDeviceInfo.availableDevices( QAudio.AudioInput )
show_info( 'input', dev_list )

dev_list = QAudioDeviceInfo.availableDevices( QAudio.AudioOutput )
show_info( 'output', dev_list )

That took about 15 minutes to put together, most of it spent in Qt Assistant finding the names of the classes. Here's some sample output.

Found 2 input devices

name: Built-in Microphone
 preferred channel count: 2
 preferred sample rate: 44100
 preferred sample size: 24

name: Plantronics .Audio 648 USB
 preferred channel count: 2
 preferred sample rate: 44100
 preferred sample size: 16

Found 2 output devices

name: Built-in Output
 preferred channel count: 2
 preferred sample rate: 44100
 preferred sample size: 24

name: Plantronics .Audio 648 USB
 preferred channel count: 2
 preferred sample rate: 44100
 preferred sample size: 16

So this looks very encouraging. I'm going to try to cobble together some actual audio input and output next.
