Skip to content
Snippets Groups Projects
python-data-3-numpy.ipynb 41.8 KiB
Newer Older
ignat's avatar
ignat committed
   "metadata": {},
   "source": [
    "All is white right? let's add some black to it!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "canvas[50, 25] = 1\n",
    "plt.imshow(canvas, interpolation=\"none\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 6\n",
    "Use the canvas template above and create an image where the top right and bottom left pixels are set to black.\n",
    "\n",
    "*Note: Remember to first reset your canvas to only 0s*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
ignat's avatar
ignat committed
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 7\n",
    "Draw a horizontal and verticle line across the canvas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
ignat's avatar
ignat committed
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 8\n",
    "Make the top left corner of the image black.\n",
    "\n",
    "*Extra challenge: do this wihtout using numbers for indexing*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
ignat's avatar
ignat committed
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Universal Functions <a name=\"universal\"></a>\n",
    "or *ufunc* are functions that perform element-wise operations on data in ndarrays. You can think of them as fast vectorised wrappers for simply functions. Here is an example of `sqrt` and `exp`:"
ignat's avatar
ignat committed
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A = np.arange(10)\n",
    "A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.sqrt(A)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.exp(A)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Other universal functions take 2 arrays as input. These are called *binary* functions.\n",
    "\n",
    "For example `maximum()` selects the biggest values from two input arrays"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = np.random.randn(10)\n",
    "y = np.random.randn(10)\n",
    "np.maximum(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Other functions can return multiple arrays:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A = np.random.randn(10)\n",
    "A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "remainder, whole = np.modf(A)\n",
    "print(remainder)\n",
    "print(whole)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is a list of most *ufuncs* in NumyPy:\n",
    "*again, you don't need to memorise them. This is just a reference*\n",
    "### Unary functions (accept one argument)\n",
    "\n",
    "| Function | Description |\n",
    "|----|----|\n",
    "| abs, fabs | Compute the absolute value element-wise for integer, floating point, or complex values.<br>Use fabs as a faster alternative for non-complex-valued data |\n",
    "| sqrt | Compute the square root of each element. Equivalent to arr ** 0.5 |\n",
    "| square | Compute the square of each element. Equivalent to arr ** 2 |\n",
    "| exp | Compute the exponent ex of each element |\n",
    "| log, log10, log2, log1p | Natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively |\n",
    "| sign | Compute the sign of each element: 1 (positive), 0 (zero), or -1 (negative) |\n",
    "| ceil | Compute the ceiling of each element, i.e. the smallest integer greater than or equal to each element |\n",
    "| floor | Compute the floor of each element, i.e. the largest integer less than or equal to each element |\n",
    "| rint | Round elements to the nearest integer, preserving the dtype |\n",
    "| modf | Return fractional and integral parts of array as separate array |\n",
    "| isnan | Return boolean array indicating whether each value is NaN (Not a Number) |\n",
    "| isfinite, isinf | Return boolean array indicating whether each element is finite (non-inf, non-NaN) or infinite, respectively |\n",
    "| cos, cosh, sin, sinh, tan, tanh | Regular and hyperbolic trigonometric functions |\n",
    "|  arccos, arccosh, arcsin,<br>arcsinh, arctan, arctanh | Inverse trigonometric functions |\n",
    "| logical_not | Compute truth value of not x element-wise. Equivalent to -arr. |\n",
    "\n",
    "### Binary functions (accept 2 arguments)\n",
    "| Functions | Description |\n",
    "| ---- | ---- |\n",
    "| add | Add corresponding elements in arrays |\n",
    "| subtract | Subtract elements in second array from first array |\n",
    "| multiply | Multiply array elements |\n",
    "| divide, floor_divide | Divide or floor divide (truncating the remainder) |\n",
    "| power | Raise elements in first array to powers indicated in second array |\n",
    "| maximum, fmax | Element-wise maximum. fmax ignores NaN |\n",
    "| minimum, fmin | Element-wise minimum. fmin ignores NaN |\n",
    "| mod | Element-wise modulus (remainder of division) |\n",
    "| copysign | Copy sign of values in second argument to values in first argument |\n",
    "| greater, greater_equal, less,<br>less_equal, equal, not_equal |\tPerform element-wise comparison, yielding boolean array. <br>Equivalent to infix operators >, >=, <, <=, ==, != |\n",
    "| logical_and, logical_or, logical_xor | Compute element-wise truth value of logical operation. Equivalent to infix operators & &#124;, ^ |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Other useful operations <a name=\"other\"></a>\n",
    "NumPy offers a set of mathematical functions that compute statistics about an entire array:"
ignat's avatar
ignat committed
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "B = np.random.randn(5, 4)\n",
    "B"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "B.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.mean(B)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "B.sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "B.mean(axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here `mean(axis=1)` means compute the mean across the columns (axis 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is a set of other similar functions:\n",
    "\n",
    "| Function | Description|\n",
    "| --- | --- |\n",
    "|sum | Sum of all the elements in the array or along an axis. Zero-length arrays have sum 0. |\n",
    "| mean | Arithmetic mean. Zero-length arrays have NaN mean. |\n",
    "| std, var | Standard deviation and variance, respectively, with optional<br>degrees of freedom adjustment (default denominator n). |\n",
    "|min, max | Minimum and maximum. |\n",
    "| argmin, argmax | Indices of minimum and maximum elements, respectively. |\n",
    "| cumsum | Cumulative sum of elements starting from 0 |\n",
    "| cumprod | Cumulative product of elements starting from 1 |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There also are some boolean operations. `any` test where one or more values in an array is `True` and `all` test where all values are `True`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A = np.random.randn(100)\n",
    "A_bool = A > 0\n",
    "A_bool"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A_bool.any()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A_bool.all()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 9\n",
    "Generate and normalise a random 5x5x5 matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 10\n",
    "Create a random vector of size 30 and find its mean value"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 11\n",
    "\n",
    "Subrtract the mean of each row of a randomly generated matrix:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sorting <a name=\"sorting\"></a>\n",
    "Similar to Python's built-in list type, NumyPy arrays can be sorted in place:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A = np.random.randn(10)\n",
    "A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A.sort()\n",
    "A"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another option is `unique()` which returns the sorted unique values in an array."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Linear Algebra <a name=\"linear\"></a>\n",
    "Similar to other languages like MATLAB, NumyPy offers a set of standard linear algebra operations, like matrix multiplication, decompositions, determinants and etc.. Unlike some other languages though, the default operations like `*` peform element-wise operations. To perform matrix-wise operartions we need to use special functions:"
ignat's avatar
ignat committed
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "temp = np.arange(16)\n",
    "A = temp[:8]\n",
    "B = temp[8:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A.dot(B)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also extend this with the `numpy.linalg` package:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from numpy.linalg import inv, qr\n",
    "A = np.random.randn(5, 5)\n",
    "mat = A.T.dot(A)\n",
    "mat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "inv(mat)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mat.dot(inv(mat))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is a set of commonly used numpy.linalg functions\n",
    "\n",
    "| Function | Description |\n",
    "| --- | --- |\n",
    "| diag | Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array,<br>or convert a 1D array into a square matrix with zeros on the off-diagonal |\n",
    "| dot | Matrix multiplication |\n",
    "| trace | Compute the sum of the diagonal elements |\n",
    "| det | Compute the matrix determinant |\n",
    "| eig | Compute the eigenvalues and eigenvectors of a square matrix |\n",
    "| inv | Compute the inverse of a square matrix |\n",
    "| pinv | Compute the Moore-Penrose pseudo-inverse inverse of a square matrix |\n",
    "| qr | Compute the QR decomposition |\n",
    "| svd | Compute the singular value decomposition (SVD) |\n",
    "| solve | Solve the linear system Ax = b for x, where A is a square matrix |\n",
    "| lstsq | Compute the least-squares solution to y = Xb |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## File IO <a name=\"file\"></a>\n",
    "NumPy offers its own set of File IO functions.\n",
    "\n",
    "The most common one is `genfromtxt()` which can load the common `.csv` and `.tsv` files.\n",
    "\n",
    "Now let us analyse temperature data from Stockholm over the years.\n",
    "\n",
    "First we have to load the file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = np.genfromtxt(\"./data/stockholm_td_adj.dat\")\n",
ignat's avatar
ignat committed
    "data.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first column of this array gives years, and the 6th gives temperature readings. We can extract these."
   ]
  },
ignat's avatar
ignat committed
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "yrs = data[:, 0]\n",
    "temps = data[:, 5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are several ways to plot data using [matplotlib](https://matplotlib.org/index.html). Below, we have used `plt.subplots()` which returns a blank figure object and an axes object `ax`. \n",
ignat's avatar
ignat committed
    "\n",
    "We can plot two datasets against each other by assigning them to the axes unsing `ax.plot()`. Then we can specify settings with `ax.axis()`.\n",
    "\n",
    "Finally labels are set with `ax.set_`...\n",
    "\n",
    "We will cover plotting in more depth in notebook 6, so there's no need to get too caught up in the details right now. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=(16,6)) #Create figure and axes\n",
    "ax.plot(yrs, temps) #Asign yrs, temps to axes\n",
    "ax.axis(\"normal\") ##Axis settings\n",
    "\n",
    "#Set some labels\n",
ignat's avatar
ignat committed
    "ax.set_title(\"Temperatures in Stockholm\")\n",
    "ax.set_xlabel(\"year\")\n",
    "ax.set_ylabel(\"Temperature (C)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 12\n",
    "Obtain the digonal of a dot product of 2 random matrices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 13\n",
    "Obtain the determinant of a [Cauchy matrix](https://en.wikipedia.org/wiki/Cauchy_matrix) from two arrays.\n",
    "\n",
    "Remember the cauchy formula:\n",
ignat's avatar
ignat committed
    "$$ C_{{ij}}={\\frac {1}{x_{i}-y_{j}}} $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Extra Exercise 1\n",
    "Remember the canvas exercises? Can you use slicing with a skip of 5 to get the image below?\n",
    "\n",
    "Remember how you can add a step to slicing in the Python built-in data types? Well, you can do that in NumyPy as well!\n",
    "\n",
    "\n",
    "![white rectangle with dots](http://softdev.ppls.ed.ac.uk/static/images/dots.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Extra Exercise 2\n",
    "Consider a random vector with shape (100, 2) representing coordinates, find the maximum euclidian distance between 2 points.\n",
    "\n",
    "Recall the Euclidean distance formula: the distance from $(x_1, y_1)$ to $(x_2, y_2)$ is\n",
    "\n",
    "$$ \\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$."
ignat's avatar
ignat committed
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Extra execise 3\n",
    "Generate a generic 2D Gaussian-like array, centered around 0,0\n",
    "\n",
    "You might want to recall the gaussian formula:\n",
    "\n",
    "$$ P(x) = \\frac{1}{\\sigma \\sqrt{2\\pi}} e^{\\frac{-(x - \\mu)^2}{2\\sigma^2}} $$\n",
    "\n",
    "You can use $$ \\mu = 0  \\quad  \\sigma = 1 $$\n",
    "\n",
    "*Hint: You can find the [np.meshgrid](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.meshgrid.html) function helpful here.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
ignat's avatar
ignat committed
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}