python-data-extra-scipy.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Signal processing using Scipy\n",
    "\n",
    "In this notebook we will explore how the [scipy](https://scipy.org/scipylib/) package can be used to perform signal processing.\n",
    "\n",
    "Scipy stands for *sci*entific *py*thon, and is a package with a vast range of modules for a range of scientific computing problems. From linear algebra to optimization; from integration to interpolation: it is the go-to Python package for traditional scientific computing problems. Scipy is built from numpy's `ndarray`, and scipy interfaces with `matlotlib` for visuals. So when working with scipy it is essential we also have numpy and matplotlib imported too."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Scipy is a large package with many submodules. It is standard to just import the modules we will be working with on a givgen task. In this notebook, we'll be using the module `signal`, which give access to methods related to signal processing. We will also need some helper functions for reading in signals, from the `scipy.io` module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy import signal\n",
    "from scipy.io import wavfile"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will also run the following code to tell Python we don't want it to display warnings when we run code. This is because the functions we will use from `scipy.io` often produce spurious warnings which are not of concern to us here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.simplefilter(\"ignore\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Working with signals\n",
    "A signal is a series of measurements over time, related to some phenmomenon. It could be the light emitted from a chemical reaction, sound transmitted through the air, or the heat of an organism throughout some biological process. When we work with signals in python, we are considering digital signals: samples of a signal taken at discrete time-steps. Today we will be working with a familiar signal: we'll be working with audio signals (i.e. those coming from sound waves). \n",
    "\n",
    "### Reading in a wav audio file\n",
    "Let's see how we can represent an audio signal in python. We are going to use the `wavfile.read` helper to read in a recording of a guitar chord (you can listen to it by clicking on the file in the Jupyter folder)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "chord = wavfile.read(\"data/chord-11.wav\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This function returns a tuple. The first element is the sample rate, and the second is the actual data from the file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "chord"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Plotting the data from the wav with `plt` will give the charactersitic plot of the sound wave that we're used to."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(chord[1])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To check we're doing things correctly, let's write that data back to another file. By opening the original file and the file we've written, you should convince yourself that the numpy array created by `wavfile.read` really does represent the audio signal of the wav file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "wavfile.write(\"data/test.wav\", chord[0], chord[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Converting to Mono\n",
    "Many sound signals are multi-channel, and often we store audio in stereo. In fact, the audio signal `chord` is in stereo. The audio data is a list of pairs. Each pair represents a point in time: the first value is the data from the L channel, and the secon is the data from the R channel."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "chord[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For simplicity, we will work only with mono signals for the remainder of this notebook. Your first task is to write a function `to_mono`, which takes a tuple representing a sound signal, and returns another sound signal by averaging the data fro each time-step. The data part of your sound signal should be an array of `np.int16` data.\n",
    "\n",
    "_(Hint: remember that a signal is represented as a tuple: so your function must return a tuple including the sample frequency!)_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, replace `chord` with its mono version. Plot this signal, and write it to a wav file. Does it sound similar to the file you started with?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Power Spectral Density\n",
    "The [Power Spectral Density](https://en.wikipedia.org/wiki/Spectral_density) (psd) is a way of estimating how a signal breaks into its component parts. With audio signals, it tells us how we could have built up our original signal by combining certain elementary (sin and cosine) sound waves. It's a bit like trying to \"unmix\" coloured paint to figure out how much red, blue and yellow went into a given colour. The process is quite complicated, so for the purposes of this notebook we will just understand it at this high level: given a signal, the PSD gives us an estimate of its decomposition into elementary waves, which is like a kind of fingerprint unique to that signal. The PSD is related to the Fourier transform, which you may have heard of - in fact, we can use the Fourier transform to estimate PSD.\n",
    "\n",
    "In `scipy.signal`, the `periodogram` method computes the PSD of a signal. We need to supply the signal data, as well as the sample rate. Then `signal.periodogram` returns an array of frequencies, and a corresponding array representing the Power Spectral Density: how much of each pure frequency went into the original signal."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "psd = signal.periodogram(chord[1], fs = chord[0])\n",
    "psd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(psd[0], psd[1])\n",
    "plt.xlabel(\"Frequency (Hz)\")\n",
    "plt.title(\"Power Spectral Density\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Comparing Yes and No\n",
    "Suppose we want to classify between recordings of the word \"yes\", and the word \"no\". Of course, speech recognition is a huge problem and we would need much more than just the PSD to tackle it in general. But if we know in advance that a signal will be saying \"yes\" or \"no\" (for example suppose we were creating an automatic phone system for a big company, that asks the caller a question and routes them according to their answer). Then it turns out that the PSD is in fact enough to differentiate between them.\n",
    "\n",
    "Let's see this in action."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Obtain mono versions of the recordings\n",
    "yes = to_mono(wavfile.read(\"data/yes_male.wav\"))\n",
    "no = to_mono(wavfile.read(\"data/no_male.wav\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get psds\n",
    "yes_psd = signal.periodogram(yes[1], fs=yes[0])\n",
    "no_psd = signal.periodogram(no[1], fs=no[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# PSD for yes\n",
    "plt.plot(yes_psd[0], yes_psd[1])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# PSD for no\n",
    "plt.plot(no_psd[0], no_psd[1])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that the signal for \"no\" does not contain as much high-frequency as the signal for \"yes\". We will try to create a classifier based on this feature."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Developing a Feature\n",
    "We will now develop a feature based on the observation above. Given a signal, a feature is a number we can compute whose value is significantly different based on whether a recording is a \"yes\" or a \"no\". We will then use such a feature to classify our recordings.\n",
    "\n",
    "One possibility is to add up the PSD for low frequencies, and divide it by the PSD for high frequencies. Then a recording of the word \"no\" should have a higher value than a recording of the word \"yes\". This feature is not sensitive to the length of the recording, and neither is it affected by different volume leves since it is a ratio.\n",
    "\n",
    "The easiest way to do this is to choose a hard cutoff between what we regard as high vs low frequency. Then to compute our feature, we just add up the PSD for all frequencies below/above the cutoff, and take their ratio. Looking at the plots above, we can try placing the line between low and high frequency at 5000 Hz.\n",
    "\n",
    "Write a function `get_feature` which accepts the PSD of a signal, and returns the feature."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we need to try our function on some known recordings, to find a threshold value for the feature which distinguishes \"yes\" from \"no\". In `data/yes` and `data/no` are different recordings of the words yes and no: these recordings are our training data. The paths of these files are tored in the csv `data/scipy_training_files.csv`.\n",
    "\n",
    "Use numpy to obtain two lists, `yes_paths` and `no_paths`, from the file `scipy_training_files.csv`. The lists should contain the file paths of the training data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we need to choose a threshold value, below which the feature indicates a \"yes\" and above which it indicates a \"no\". We will decide a threshold value by eye in this exercise.\n",
    "\n",
    "You should write code to:\n",
    "- Read in all of the files with filepaths given in `yes_paths`\n",
    "- Compute the feature value for each recording\n",
    "- Repeat the above for the `no_paths` folder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now use `plt` to produce an overlaid plot of two histograms - one counting the feature values for \"yes\" recordings, the other counting the feature values for \"no\" recordings. To make things easier to interpret, you should use the same bins for each histogram - at first, set the bins to be of size 0.01 on the range from 0 to 3. Passing a list of the endpoints of the bins to `plt.hist` will control the size of the bins used.\n",
    "\n",
    "Your plot should have a legend, and will be easier to read if you set the parameter `alpha=0.5` in each call to `plt.hist` (this makes the plots appear slightly transparent)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we can see, there are some \"Yes\" outliers at the upper end of the scale, making it difficult to see what's happening at the low end. We can copy the above code and change the range of our binning to restrict the plot to the low end of the scale."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Having inspected the above data, determine what you believe to be a good threshold value for this calculation. To choose an exact value, you may wish to inspect the values given in `y_feats` and `n_feats`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Testing Your Classification\n",
    "Write a function `classify_yesno` which takes in a sound (a bitrate/data tuple), and classifies whether this is the word \"yes\" or the word \"no\". You should return the string `\"Y\"` for yes, and `\"N\"` for no."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now run the code cell below. It will test your classifier against test data, and count True Positives (how many \"yes\" recordings were correctly classified as such), True Negatives (how many \"no\" recordings were properly classified), Fale Positives ( where \"no\" was classified as \"yes\"), and False Negatives (where \"yes\" was classified as \"no\")."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "TP = 0\n",
    "FP = 0\n",
    "TN = 0\n",
    "FN = 0\n",
    "\n",
    "paths = np.genfromtxt(\"data/scipy_test_files.csv\", dtype=np.str, delimiter=\",\")\n",
    "yes_tests = paths[0]\n",
    "no_tests = paths[1]\n",
    "\n",
    "for s in yes_tests:\n",
    "    sound = to_mono(wavfile.read(s))\n",
    "    if classify_yesno(sound) == \"Y\":\n",
    "        TP += 1\n",
    "    else:\n",
    "        FN += 1\n",
    "        \n",
    "for s in no_tests:\n",
    "    sound = to_mono(wavfile.read(s))\n",
    "    if classify_yesno(sound) == \"N\":\n",
    "        TN += 1\n",
    "    else:\n",
    "        FP += 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This data can be summarised in a confusion matrix. In this matrix, the (0, 0) entry of the confusion matrix shows True Positives, and the (1, 1) entry shows True Negatives. Since we tested our classifier on 8 samples of each type, then the closer our matrix looks to $$\\begin{bmatrix}8 & 0\\\\ 0 & 8 \\end{bmatrix}$$ the better our classifier is performing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "confusion_matrix = np.reshape(np.array([TP, FP, FN, TN]), (2, 2))\n",
    "\n",
    "confusion_matrix"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We could also compute a summary statistic of the classifier's performance. One such statistic is the accuracy: that is the proportion of the recordings correctly classified. it is calculated as $$\\frac{TP + TN}{TP + TN + FP + FN}$$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "acc = (TP + TN)/(TP + FP + TN + FN)\n",
    "\n",
    "acc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In reality, we'd likely train and test on much more data. We would also train the classifier using some form of predictive model, rather than by eyeballing the data. Packages such as [scikit-learn](https://scikit-learn.org/stable/index.html) have easy-to-use inbuilt functions for building and testing classifiers (in fact, sklearn has its own project notebook as part of this course)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "This notebook was based on a project from \"Enhance your DSP Course with these Interesting Projects\" by D. Hoffbeck, in proceedings AC 2012-3836, the American Society for Engineering Education, 2012. \n",
    "\n",
    "Training files were obtained from freesound. Licensing info and accreditation is given in scipy_training_licensing.txt.\n",
    "\n",
    "Testing files were recorded by the author."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}