python-data-1-warmup.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Notebook 1 - Warm-up Exercises\n",
    "\n",
    "In this notebook we will warm up with a textual analysis exercise, using some of the assumed basic python knowledge for the course. We will see how to use some basic string methods as well as how to open and close files in python (later, some of these methods will be superceded by inbuild methods of data science packages we will use).\n",
    "\n",
    "For this we will be using the text [Humanistic Nursing by Josephine G. Paterson and Loretta T. Zderad](http://www.gutenberg.org/ebooks/25020). You already have this downloaded in your workspace.\n",
    "\n",
    "To open up the file, Python gives us a very handy function, we just have to give it the path to the file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "file = open(\"data/humanistic_nursing.txt\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An easy way to deal with text files is reading it line by line within a for loop:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "for line in file:\n",
    "    print(line)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Well, that was a lot of text. Can we turn it into something useful?\n",
    "\n",
    "For example, we can split up each line into the words making it and then count the occurances of the word \"and\". Here's code that does that. Try it out!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "counter = 0\n",
    "file = open(\"data/humanistic_nursing.txt\")\n",
    "\n",
    "for line in file:\n",
    "    for word in line.split():\n",
    "        if word == \"and\":\n",
    "            counter += 1\n",
    "\n",
    "# display results\n",
    "print(\"The text contains {} 'and' words\".format(counter))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 1: Count any word\n",
    "Based on the code above, now write your own code which counts the occurances of any word. Do this by using a variable `word`.\n",
    "\n",
    "You should find the word `patient` 125 times.\n",
    "\n",
    "Try looking for some other words as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "target = \"patient\"\n",
    "\n",
    "# fill in your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 2: Find the longest word\n",
    "Find the word with the most charecters in the whole text. Print it out and its length."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# fill in your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Does that seem like a correct word?\n",
    "\n",
    "If not, you can try storing a dictionary of the largest words you find to identify which is truly the biggest word."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}