{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"In this notebook we will warm up with a textual analysis exercise, using some of the assumed basic python knowledge for the course. We will see how to use some basic string methods as well as how to open and close files in python (later, some of these methods will be superceded by inbuild methods of data science packages we will use).\n",
"\n",
"For this we will be using the text [Humanistic Nursing by Josephine G. Paterson and Loretta T. Zderad](http://www.gutenberg.org/ebooks/25020). You already have this downloaded in your workspace.\n",
"\n",
"To open up the file, Python gives us a very handy function, we just have to give it the path to the file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"file = open(\"data/humanistic_nursing.txt\")"
]
},
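{
"cell_type": "markdown",
"metadata": {},
"source": [
"The file opened above stays open until it is closed explicitly with `file.close()`. A common alternative is the `with` statement, which closes the file automatically when its block ends. The following is a minimal sketch that does not touch the course data: it first writes a small temporary file (the name `demo_path` and the file contents are made up for illustration).\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"# Hypothetical demo file, created only for this illustration\n",
"demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')\n",
"with open(demo_path, 'w') as demo_file:\n",
"    print('first line', file=demo_file)\n",
"    print('second line', file=demo_file)\n",
"\n",
"# Reading: the file is closed automatically when the with block ends\n",
"with open(demo_path) as demo_file:\n",
"    lines = [line.rstrip() for line in demo_file]\n",
"\n",
"print(lines)\n",
"print(demo_file.closed)\n",
"```\n",
"\n",
"Running this should print the two lines as a list, followed by `True`, which confirms that the file was closed on leaving the `with` block."
]
},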
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An easy way to deal with text files is reading it line by line within a for loop:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"for line in file:\n",
" print(line)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Well, that was a lot of text. Can we turn it into something useful?\n",
"\n",
"For example, we can split up each line into the words making it and then count the occurances of the word \"and\". Here's a function that does that. Try it out!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# define functino\n",
"def countAnd(file_path):\n",
" counter = 0\n",
" file = open(file_path)\n",
" \n",
" for line in file:\n",
" for word in line.split():\n",
" if word == \"and\":\n",
" counter += 1\n",
" \n",
" return counter"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# try function\n",
"countAnd(\"./data/humanistic_nursing.txt\")"
]
},
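{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `countAnd()` compares whole whitespace-separated tokens, so a capitalised `And` or an `and,` followed by punctuation is not counted. The following small sketch on a made-up sentence (the variable `sample` is purely illustrative and not taken from the book) shows what `split()` produces:\n",
"\n",
"```python\n",
"# Made-up sample sentence, used only for illustration\n",
"sample = 'And so nurses and patients meet, and, in time, understand.'\n",
"\n",
"words = sample.split()\n",
"print(words)\n",
"\n",
"# Only tokens that are exactly equal to 'and' are counted\n",
"exact_matches = sum(1 for word in words if word == 'and')\n",
"print(exact_matches)\n",
"```\n",
"\n",
"Only one of the three occurrences matches exactly here. Lower-casing the words or stripping punctuation before comparing would generally give different counts, so keep the exact comparison in the exercises below unless you want to explore that variation."
]
},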
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 1: Count any word\n",
"Based on the function above, now write your own function which counts the occurances of any word. For example:\n",
"```python\n",
"countAny(filename, \"medicine\")\n",
"```\n",
"will return the occurences of the word \"medicine\" in the file filename."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def countAny(file_path, des_word):\n",
" # [ WRITE YOUR CODE HERE ]\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Verify your function\n",
"countAny(\"./data/humanistic_nursing.txt\", \"patient\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 2: Count multiple words\n",
"Before this exercise you should be familiar with Python dictionaries. If you're not, please see [here](https://docs.python.org/3/tutorial/datastructures.html#dictionaries).\n",
"\n",
"Write a function which takes a file path and a list of words, and returns a dictionary mapping each word to its frequency in the given file.\n",
"\n",
"Intuitively, we can first fill in the dictionary keys with the words in our list. Afterwards we can count the occurrences of each word and and fill in the appropriate dictionary value.\n",
"\n",
"*Hint: Can we use `countAny()` for this?*"
]
},
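{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reminder of the dictionary operations involved, here is a minimal sketch of the two steps on an in-memory list of tokens rather than a file. The names `tokens`, `targets` and `counts` are made up for illustration, and this is only one possible approach, not the intended solution to the exercise.\n",
"\n",
"```python\n",
"# Made-up data, used only for illustration\n",
"tokens = ['care', 'and', 'presence', 'and', 'dialogue']\n",
"targets = ['and', 'care', 'nurse']\n",
"\n",
"# Step 1: fill in the keys, starting every count at zero\n",
"counts = {word: 0 for word in targets}\n",
"\n",
"# Step 2: update the value of each key we are interested in\n",
"for token in tokens:\n",
"    if token in counts:\n",
"        counts[token] += 1\n",
"\n",
"print(counts)\n",
"```\n",
"\n",
"A word that never appears, such as `nurse` in this made-up list, keeps its initial count of zero instead of being missing from the result."
]
},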
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def countAll(file_path, words):\n",
" # [ WRITE YOUR CODE HERE ]\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Verify your function\n",
"countAll(\"./data/humanistic_nursing.txt\", [\"patient\", \"and\", \"the\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should expect `{'patient': 125, 'and': 1922, 'the': 2604}`"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}