python-data-3-numpy.ipynb 41.6 KiB
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Notebook 3 - NumPy\n",
    "[NumPy](http://numpy.org) short for Numerical Python, has long been a cornerstone of numerical computing on Python. It provides the data structures, algorithms and the glue needed for most scientific applications involving numerical data in Python. All computation is done in vectorised form - using vectors of several values at once instead of singular values at a time. NumPy contains, among other thigs:\n",
    "* A fast and efficient multidimensional array object `ndarray`.\n",
    "* Mathematical functions for performing element-wise computations with arrays or mathematical operations between arrays.\n",
    "* Tools for reading and manipulating large array data to disk and working with memory-mapped files.\n",
    "* Linear algebra, random number generation and Fourier transform capabilities.\n",
    "\n",
    "For the rest of the course, whenever array is mentioned it refers to the NumPy ndarray.\n",
    "<br>\n",
    "\n",
    "## Table of contents\n",
    "- [The ndarray](#ndarray)\n",
    "    - [Creating arrays](#creating)\n",
    "    - [Data Types](#data)\n",
    "    - [Arithmetic Operations](#arithmetic)\n",
    "    - [Indexing and Slicing](#indexing)\n",
    "    - [Transposing and Swapping Axis](#transposing)\n",
    "- [Universal Functinos](#universal)\n",
    "- [Other useful operations](#other)\n",
    "- [File IO](#file)\n",
    "- [Liear algebra](#linear)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Why NumPy?\n",
    "Is the first question that anybody asks when they find out about it. \n",
    "\n",
    "Some people might say: *I don't care about speed, I want to spend my time researching how to cure cancer, not optimise coputer code!*\n",
    "\n",
    "That's perfectly reasonable, but are you willing to wait a lot more for your experiment to finish? I definiately don't want to do that. Let's see how much faster NumPy really is!\n",
    "\n",
    "to show that we'll be using the magic command `%timeit` which you can read more about [here](https://ipython.readthedocs.io/en/stable/interactive/magics.html) and don't worry about the details now, they will clear up later.\n",
    "\n",
    "Let's have a look at generating a vector of 10M random values and then summing them all up using the Python way and using the NumPy way!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running normal python sum()\n",
      "838 ms ± 25.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n",
      "Running numpy sum()\n",
      "7.89 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "x = np.random.randn(10000000) # generate random numbers\n",
    "\n",
    "print(\"Running normal python sum()\")\n",
    "%timeit sum(x)\n",
    "\n",
    "print(\"Running numpy sum()\")\n",
    "%timeit np.sum(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**WOW** that was a difference of more than a **100 times** and that was just for a single summing operation. Imagine if you had several of those running all the time!\n",
    "\n",
    "Are you onboard with Numpy then? Let's proceed.."
   ]
  },
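  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`%timeit` only works inside IPython and Jupyter. As a rough sketch of the same comparison in a plain Python script, the standard library `timeit` module can be used instead. The array size (1M instead of 10M) and the number of repeats below are assumptions chosen only to keep the run short, and the exact timings will differ from machine to machine."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import timeit\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "# assumption: a smaller vector than above, just to keep this sketch quick\n",
    "x = np.random.randn(1_000_000)\n",
    "\n",
    "# run each call once per repeat and keep the fastest of three repeats\n",
    "t_python = min(timeit.repeat(lambda: sum(x), number=1, repeat=3))\n",
    "t_numpy = min(timeit.repeat(lambda: np.sum(x), number=1, repeat=3))\n",
    "\n",
    "print(f\"built-in sum(): {t_python:.4f} s\")\n",
    "print(f\"np.sum():       {t_numpy:.4f} s\")"
   ]
  },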
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The ndarray <a name=\"ndarray\"></a>\n",
    "The ndarray is a backbone on Numpy. It's a fast and flexible container for N-dimensional array objects, usually used for large datasets in Python. Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the equivalent operations between scalar elements.\n",
    "\n",
    "Here is a quick example of its capabilities:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.66256056,  0.42579375,  0.59860182],\n",
       "       [-0.89591925, -0.4932093 , -0.3728094 ]])"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "# create a 2x3 array of random values\n",
    "data = np.random.randn(2,3)\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 6.62560562,  4.25793747,  5.98601823],\n",
       "       [-8.95919252, -4.93209302, -3.728094  ]])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data * 10 #multiply all numbers by 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 1.32512112,  0.85158749,  1.19720365],\n",
       "       [-1.7918385 , -0.9864186 , -0.7456188 ]])"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data + data #element-wise addition"
   ]
  },
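  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The arithmetic above reads like ordinary scalar arithmetic, but the same operators behave quite differently on plain Python lists. Here is a small sketch of that difference, using a made-up list `values` (not part of the course data) and `np.array` to turn it into an array (see [Creating arrays](#creating)):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "values = [1, 2, 3]      # hypothetical example data\n",
    "arr = np.array(values)  # the same numbers as an ndarray\n",
    "\n",
    "print(values * 2)       # list: repeats the list, giving six elements\n",
    "print(arr * 2)          # array: multiplies every element by 2\n",
    "\n",
    "print(values + values)  # list: concatenates the two lists\n",
    "print(arr + arr)        # array: adds the elements pairwise"
   ]
  },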
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Every array has a shape, a tuple indicating the size of each dimnesion and a dtype. You can obtain these via the respective methods:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# number of dimensions of the array\n",
    "data.ndim"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2, 3)"
      ]
     },
     "execution_count": 18,
     "metadata": {},