python-data-1-warmup.ipynb 529 KiB
      "     you in writing (or by e-mail) within 30 days of receipt that s/he\n",
      "\n",
      "     does not agree to the terms of the full Project Gutenberg-tm\n",
      "\n",
      "     License.  You must require such a user to return or\n",
      "\n",
      "     destroy all copies of the works possessed in a physical medium\n",
      "\n",
      "     and discontinue all use of and all access to other copies of\n",
      "\n",
      "     Project Gutenberg-tm works.\n",
      "\n",
      "\n",
      "\n",
      "- You provide, in accordance with paragraph 1.F.3, a full refund of any\n",
      "\n",
      "     money paid for a work or a replacement copy, if a defect in the\n",
      "\n",
      "     electronic work is discovered and reported to you within 90 days\n",
      "\n",
      "     of receipt of the work.\n",
      "\n",
      "\n",
      "\n",
      "- You comply with all other terms of this agreement for free\n",
      "\n",
      "     distribution of Project Gutenberg-tm works.\n",
      "\n",
      "\n",
      "\n",
      "1.E.9.  If you wish to charge a fee or distribute a Project Gutenberg-tm\n",
      "\n",
      "electronic work or group of works on different terms than are set\n",
      "\n",
      "forth in this agreement, you must obtain permission in writing from\n",
      "\n",
      "both the Project Gutenberg Literary Archive Foundation and Michael\n",
      "\n",
      "Hart, the owner of the Project Gutenberg-tm trademark.  Contact the\n",
      "\n",
      "Foundation as set forth in Section 3 below.\n",
      "\n",
      "\n",
      "\n",
      "1.F.\n",
      "\n",
      "\n",
      "\n",
      "1.F.1.  Project Gutenberg volunteers and employees expend considerable\n",
      "\n",
      "effort to identify, do copyright research on, transcribe and proofread\n",
      "\n",
      "public domain works in creating the Project Gutenberg-tm\n",
      "\n",
      "collection.  Despite these efforts, Project Gutenberg-tm electronic\n",
      "\n",
      "works, and the medium on which they may be stored, may contain\n",
      "\n",
      "\"Defects,\" such as, but not limited to, incomplete, inaccurate or\n",
      "\n",
      "corrupt data, transcription errors, a copyright or other intellectual\n",
      "\n",
      "property infringement, a defective or damaged disk or other medium, a\n",
      "\n",
      "computer virus, or computer codes that damage or cannot be read by\n",
      "\n",
      "your equipment.\n",
      "\n",
      "\n",
      "\n",
      "1.F.2.  LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except for the \"Right\n",
      "\n",
      "of Replacement or Refund\" described in paragraph 1.F.3, the Project\n",
      "\n",
      "Gutenberg Literary Archive Foundation, the owner of the Project\n",
      "\n",
      "Gutenberg-tm trademark, and any other party distributing a Project\n",
      "\n",
      "Gutenberg-tm electronic work under this agreement, disclaim all\n",
      "\n",
      "liability to you for damages, costs and expenses, including legal\n",
      "\n",
      "fees.  YOU AGREE THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE, STRICT\n",
      "\n",
      "LIABILITY, BREACH OF WARRANTY OR BREACH OF CONTRACT EXCEPT THOSE\n",
      "\n",
      "PROVIDED IN PARAGRAPH F3.  YOU AGREE THAT THE FOUNDATION, THE\n",
      "\n",
      "TRADEMARK OWNER, AND ANY DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE\n",
      "\n",
      "LIABLE TO YOU FOR ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE OR\n",
      "\n",
      "INCIDENTAL DAMAGES EVEN IF YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH\n",
      "\n",
      "DAMAGE.\n",
      "\n",
      "\n",
      "\n",
      "1.F.3.  LIMITED RIGHT OF REPLACEMENT OR REFUND - If you discover a\n",
      "\n",
      "defect in this electronic work within 90 days of receiving it, you can\n",
      "\n",
      "receive a refund of the money (if any) you paid for it by sending a\n",
      "\n",
      "written explanation to the person you received the work from.  If you\n",
      "\n",
      "received the work on a physical medium, you must return the medium with\n",
      "\n",
      "your written explanation.  The person or entity that provided you with\n",
      "\n",
      "the defective work may elect to provide a replacement copy in lieu of a\n",
      "\n",
      "refund.  If you received the work electronically, the person or entity\n",
      "\n",
      "providing it to you may choose to give you a second opportunity to\n",
      "\n",
      "receive the work electronically in lieu of a refund.  If the second copy\n",
      "\n",
      "is also defective, you may demand a refund in writing without further\n",
      "\n",
      "opportunities to fix the problem.\n",
      "\n",
      "\n",
      "\n",
      "1.F.4.  Except for the limited right of replacement or refund set forth\n",
      "\n",
      "in paragraph 1.F.3, this work is provided to you 'AS-IS,' WITH NO OTHER\n",
      "\n",
      "WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO\n",
      "\n",
      "WARRANTIES OF MERCHANTIBILITY OR FITNESS FOR ANY PURPOSE.\n",
      "\n",
      "\n",
      "\n",
      "1.F.5.  Some states do not allow disclaimers of certain implied\n",
      "\n",
      "warranties or the exclusion or limitation of certain types of damages.\n",
      "\n",
      "If any disclaimer or limitation set forth in this agreement violates the\n",
      "\n",
      "law of the state applicable to this agreement, the agreement shall be\n",
      "\n",
      "interpreted to make the maximum disclaimer or limitation permitted by\n",
      "\n",
      "the applicable state law.  The invalidity or unenforceability of any\n",
      "\n",
      "provision of this agreement shall not void the remaining provisions.\n",
      "\n",
      "\n",
      "\n",
      "1.F.6.  INDEMNITY - You agree to indemnify and hold the Foundation, the\n",
      "\n",
      "trademark owner, any agent or employee of the Foundation, anyone\n",
      "\n",
      "providing copies of Project Gutenberg-tm electronic works in accordance\n",
      "\n",
      "with this agreement, and any volunteers associated with the production,\n",
      "\n",
      "promotion and distribution of Project Gutenberg-tm electronic works,\n",
      "\n",
      "harmless from all liability, costs and expenses, including legal fees,\n",
      "\n",
      "that arise directly or indirectly from any of the following which you do\n",
      "\n",
      "or cause to occur: (a) distribution of this or any Project Gutenberg-tm\n",
      "\n",
      "work, (b) alteration, modification, or additions or deletions to any\n",
      "\n",
      "Project Gutenberg-tm work, and (c) any Defect you cause.\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "Section  2.  Information about the Mission of Project Gutenberg-tm\n",
      "\n",
      "\n",
      "\n",
      "Project Gutenberg-tm is synonymous with the free distribution of\n",
      "\n",
      "electronic works in formats readable by the widest variety of computers\n",
      "\n",
      "including obsolete, old, middle-aged and new computers.  It exists\n",
      "\n",
      "because of the efforts of hundreds of volunteers and donations from\n",
      "\n",
      "people in all walks of life.\n",
      "\n",
      "\n",
      "\n",
      "Volunteers and financial support to provide volunteers with the\n",
      "\n",
      "assistance they need, is critical to reaching Project Gutenberg-tm's\n",
      "\n",
      "goals and ensuring that the Project Gutenberg-tm collection will\n",
      "\n",
      "remain freely available for generations to come.  In 2001, the Project\n",
      "\n",
      "Gutenberg Literary Archive Foundation was created to provide a secure\n",
      "\n",
      "and permanent future for Project Gutenberg-tm and future generations.\n",
      "\n",
      "To learn more about the Project Gutenberg Literary Archive Foundation\n",
      "\n",
      "and how your efforts and donations can help, see Sections 3 and 4\n",
      "\n",
      "and the Foundation web page at http://www.pglaf.org.\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "Section 3.  Information about the Project Gutenberg Literary Archive\n",
      "\n",
      "Foundation\n",
      "\n",
      "\n",
      "\n",
      "The Project Gutenberg Literary Archive Foundation is a non profit\n",
      "\n",
      "501(c)(3) educational corporation organized under the laws of the\n",
      "\n",
      "state of Mississippi and granted tax exempt status by the Internal\n",
      "\n",
      "Revenue Service.  The Foundation's EIN or federal tax identification\n",
      "\n",
      "number is 64-6221541.  Its 501(c)(3) letter is posted at\n",
      "\n",
      "http://pglaf.org/fundraising.  Contributions to the Project Gutenberg\n",
      "\n",
      "Literary Archive Foundation are tax deductible to the full extent\n",
      "\n",
      "permitted by U.S. federal laws and your state's laws.\n",
      "\n",
      "\n",
      "\n",
      "The Foundation's principal office is located at 4557 Melan Dr. S.\n",
      "\n",
      "Fairbanks, AK, 99712., but its volunteers and employees are scattered\n",
      "\n",
      "throughout numerous locations.  Its business office is located at\n",
      "\n",
      "809 North 1500 West, Salt Lake City, UT 84116, (801) 596-1887, email\n",
      "\n",
      "business@pglaf.org.  Email contact links and up to date contact\n",
      "\n",
      "information can be found at the Foundation's web site and official\n",
      "\n",
      "page at http://pglaf.org\n",
      "\n",
      "\n",
      "\n",
      "For additional contact information:\n",
      "\n",
      "     Dr. Gregory B. Newby\n",
      "\n",
      "     Chief Executive and Director\n",
      "\n",
      "     gbnewby@pglaf.org\n",
      "\n",
      "\n",
      "\n",
      "Section 4.  Information about Donations to the Project Gutenberg\n",
      "\n",
      "Literary Archive Foundation\n",
      "\n",
      "\n",
      "\n",
      "Project Gutenberg-tm depends upon and cannot survive without wide\n",
      "\n",
      "spread public support and donations to carry out its mission of\n",
      "\n",
      "increasing the number of public domain and licensed works that can be\n",
      "\n",
      "freely distributed in machine readable form accessible by the widest\n",
      "\n",
      "array of equipment including outdated equipment.  Many small donations\n",
      "\n",
      "($1 to $5,000) are particularly important to maintaining tax exempt\n",
      "\n",
      "status with the IRS.\n",
      "\n",
      "\n",
      "\n",
      "The Foundation is committed to complying with the laws regulating\n",
      "\n",
      "charities and charitable donations in all 50 states of the United\n",
      "\n",
      "States.  Compliance requirements are not uniform and it takes a\n",
      "\n",
      "considerable effort, much paperwork and many fees to meet and keep up\n",
      "\n",
      "with these requirements.  We do not solicit donations in locations\n",
      "\n",
      "where we have not received written confirmation of compliance.  To\n",
      "\n",
      "SEND DONATIONS or determine the status of compliance for any\n",
      "\n",
      "particular state visit http://pglaf.org\n",
      "\n",
      "\n",
      "\n",
      "While we cannot and do not solicit contributions from states where we\n",
      "\n",
      "have not met the solicitation requirements, we know of no prohibition\n",
      "\n",
      "against accepting unsolicited donations from donors in such states who\n",
      "\n",
      "approach us with offers to donate.\n",
      "\n",
      "\n",
      "\n",
      "International donations are gratefully accepted, but we cannot make\n",
      "\n",
      "any statements concerning tax treatment of donations received from\n",
      "\n",
      "outside the United States.  U.S. laws alone swamp our small staff.\n",
      "\n",
      "\n",
      "\n",
      "Please check the Project Gutenberg Web pages for current donation\n",
      "\n",
      "methods and addresses.  Donations are accepted in a number of other\n",
      "\n",
      "ways including checks, online payments and credit card donations.\n",
      "\n",
      "To donate, please visit: http://pglaf.org/donate\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "Section 5.  General Information About Project Gutenberg-tm electronic\n",
      "\n",
      "works.\n",
      "\n",
      "\n",
      "\n",
      "Professor Michael S. Hart is the originator of the Project Gutenberg-tm\n",
      "\n",
      "concept of a library of electronic works that could be freely shared\n",
      "\n",
      "with anyone.  For thirty years, he produced and distributed Project\n",
      "\n",
      "Gutenberg-tm eBooks with only a loose network of volunteer support.\n",
      "\n",
      "\n",
      "\n",
      "Project Gutenberg-tm eBooks are often created from several printed\n",
      "\n",
      "editions, all of which are confirmed as Public Domain in the U.S.\n",
      "\n",
      "unless a copyright notice is included.  Thus, we do not necessarily\n",
      "\n",
      "keep eBooks in compliance with any particular paper edition.\n",
      "\n",
      "\n",
      "\n",
      "Each eBook is in a subdirectory of the same number as the eBook's\n",
      "\n",
      "eBook number, often in several formats including plain vanilla ASCII,\n",
      "\n",
      "compressed (zipped), HTML and others.\n",
      "\n",
      "\n",
      "\n",
      "Corrected EDITIONS of our eBooks replace the old file and take over\n",
      "\n",
      "the old filename and etext number.  The replaced older file is renamed.\n",
      "\n",
      "VERSIONS based on separate sources are treated as new eBooks receiving\n",
      "\n",
      "new filenames and etext numbers.\n",
      "\n",
      "\n",
      "\n",
      "Most people start at our Web site which has the main PG search facility:\n",
      "\n",
      "\n",
      "\n",
      "http://www.gutenberg.org\n",
      "\n",
      "\n",
      "\n",
      "This Web site includes information about Project Gutenberg-tm,\n",
      "\n",
      "including how to make donations to the Project Gutenberg Literary\n",
      "\n",
      "Archive Foundation, how to help produce our new eBooks, and how to\n",
      "\n",
      "subscribe to our email newsletter to hear about new eBooks.\n",
      "\n",
      "\n",
      "\n",
      "EBooks posted prior to November 2003, with eBook numbers BELOW #10000,\n",
      "\n",
      "are filed in directories based on their release date.  If you want to\n",
      "\n",
      "download any of these eBooks directly, rather than using the regular\n",
      "\n",
      "search system you may utilize the following addresses and just\n",
      "\n",
      "download by the etext year.\n",
      "\n",
      "\n",
      "\n",
      "http://www.ibiblio.org/gutenberg/etext06\n",
      "\n",
      "\n",
      "\n",
      "    (Or /etext 05, 04, 03, 02, 01, 00, 99,\n",
      "\n",
      "     98, 97, 96, 95, 94, 93, 92, 92, 91 or 90)\n",
      "\n",
      "\n",
      "\n",
      "EBooks posted since November 2003, with etext numbers OVER #10000, are\n",
      "\n",
      "filed in a different way.  The year of a release date is no longer part\n",
      "\n",
      "of the directory path.  The path is based on the etext number (which is\n",
      "\n",
      "identical to the filename).  The path to the file is made up of single\n",
      "\n",
      "digits corresponding to all but the last digit in the filename.  For\n",
      "\n",
      "example an eBook of filename 10234 would be found at:\n",
      "\n",
      "\n",
      "\n",
      "http://www.gutenberg.org/1/0/2/3/10234\n",
      "\n",
      "\n",
      "\n",
      "or filename 24689 would be found at:\n",
      "\n",
      "http://www.gutenberg.org/2/4/6/8/24689\n",
      "\n",
      "\n",
      "\n",
      "An alternative method of locating eBooks:\n",
      "\n",
      "http://www.gutenberg.org/GUTINDEX.ALL\n",
      "\n",
      "\n",
      "\n",
      "*** END: FULL LICENSE ***\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for line in file:\n",
    "    print(line)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Well, that was a lot of text. Can we turn it into something useful?\n",
    "\n",
    "For example, we can split each line into its words and then count the occurrences of the word \"and\". Here's code that does that. Try it out!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The text contains 1922 'and' words\n"
     ]
    }
   ],
   "source": [
    "counter = 0\n",
    "file = open(\"data/humanistic_nursing.txt\")\n",
    "\n",
    "for line in file:\n",
    "    for word in line.split():\n",
    "        if word == \"and\":\n",
    "            counter += 1\n",
    "\n",
    "# display results\n",
    "print(\"The text contains {} 'and' words\".format(counter))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 1: Count any word\n",
    "\n",
    "Based on the code above, write your own code that counts the occurrences of any word. Do this by using a variable `target` to represent the word we're counting.\n",
    "\n",
    "You should find the word `patient` 125 times.\n",
    "\n",
    "Try looking for some other words as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 2: Find the longest word\n",
    "\n",
    "Find the word with the most characters in the whole text. Print the word and its length."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Does that seem like a correct word?\n",
    "\n",
    "If not, you can try storing a dictionary of the longest words you find to identify which is truly the longest word."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
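
The hint in the final notebook cell, keeping a dictionary of the longest words, can be sketched roughly as follows. This is one possible approach, not the notebook's own solution. The `sample_lines` list is a hypothetical stand-in for the lines of `data/humanistic_nursing.txt`, and the function name `longest_words` and its `keep` parameter are illustrative assumptions. The sketch strips punctuation and skips tokens that are not purely alphabetic, since the raw longest "word" found with `line.split()` alone is likely to be a URL or a token with punctuation attached.

```python
import string

# Hypothetical sample standing in for the lines of data/humanistic_nursing.txt.
sample_lines = [
    "The nurse and the patient meet, and both are changed.",
    "See http://www.gutenberg.org/GUTINDEX.ALL for an index.",
    "Phenomenological description takes patience and practice.",
]


def longest_words(lines, keep=3):
    """Return a dict mapping the `keep` longest cleaned words to their lengths."""
    lengths = {}
    for line in lines:
        for token in line.split():
            # Strip surrounding punctuation and lowercase the token.
            word = token.strip(string.punctuation).lower()
            # Skip tokens that are not purely alphabetic (URLs, numbers, ...).
            if word.isalpha():
                lengths[word] = len(word)
    # Sort by length (longest first), then alphabetically to break ties.
    ranked = sorted(lengths.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:keep])


top = longest_words(sample_lines)
for word, length in top.items():
    print(word, length)
```

To try this on the real text, pass the open file object instead of `sample_lines`. Whether hyphenated words or words with apostrophes should count is a design choice; this sketch simply skips them.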