Commit b774313d authored by pfoo

Proofread and edit exercise notebook solution.

parent 99ce9af8
......@@ -16,11 +16,11 @@ To execute a line of code, click on it and press *Ctrl + Enter*.
To execute a chunk of code, click the green run button at the top right corner of the code chunk or highlight the entire code chunk and press *Ctrl + Enter*.
# 3.0 Introduction to built-in datasets
# 3.1 Introduction to built-in datasets
For this exercise, we will be using several different dataset. These dataset are built into R hence they can be used by simply calling the name of the dataset, without the need to load any packages. Run the following code chunk to get a quick overview of the dataset.
For this exercise, we will be using two different datasets. These datasets are built into R, so they can be used by simply calling the name of the dataset, without the need to load any packages. Run the following code chunks to get a quick overview of each dataset.
# 3.0.1 Iris dataset
# 3.1.1 Iris dataset
```{r}
# The dimension of the dataset
......@@ -40,7 +40,7 @@ summary(iris)
?iris
```
# 3.0.2 Pressure dataset
# 3.1.2 Pressure dataset
```{r}
# The dimension of the dataset
......@@ -60,17 +60,31 @@ summary(pressure)
?pressure
```
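Two other base R helpers, `str()` and `head()`, also give a compact overview of a dataset. The sketch below is an illustrative addition rather than part of the original exercise, and it assumes only the built-in `iris` and `pressure` datasets.

```r
# Dimensions as (rows, columns)
dim(iris)
dim(pressure)

# Compact structure: column names, types and the first few values
str(iris)

# First six rows of the dataset
head(pressure)
```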
# 3.1 Segment canvas
# 3.2 Segment canvas
In R, it can be helpful to create multiple plots on a single canvas. This can be done by splitting the canvas into a specific grid size using the function `par()`, with the argument `mfrow` that takes a vector of 2 numeric values, i.e., `c(num1, num2)`. The first value in the vector represents the number of rows whereas the second value represents the number of columns.
\* `rnorm(1000, 0, 1)` is a function to generate 1000 random samples from the N(0, 1) distribution, with mean = 0 and standard deviation = 1. It is used for demonstration purposes in this section only.
```{r}
# One plot in the canvas
par(mfrow = c(1, 1))
hist(rnorm(1000, 0, 1))
# Segment the canvas to fit four plots
par(mfrow = c(2, 2))
hist(rnorm(1000, 0, 1))
hist(rnorm(1000, 0, 1))
hist(rnorm(1000, 0, 1))
hist(rnorm(1000, 0, 1))
# Reset the canvas
par(mfrow = c(1, 1))
```
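As an alternative to resetting the canvas by hand, `par()` returns the previous values of the settings it changes, so they can be saved and restored. The sketch below is illustrative only; the names `old_par`, "Sample A" and "Sample B" are made up for this example.

```r
# par() returns the old values of the settings it modifies
old_par <- par(mfrow = c(1, 2))   # one row, two columns

set.seed(1)                       # make the random samples reproducible
hist(rnorm(1000, 0, 1), main = "Sample A")
hist(rnorm(1000, 0, 1), main = "Sample B")

# Restore whatever layout was active before
par(old_par)
```

Restoring from the saved value avoids assuming that the previous layout was `c(1, 1)`.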
# 3.2 Basic plots
# 3.3 Basic plots
## 3.2.1 Scatter plot
## 3.3.1 Scatter plot
The easiest plot to create in R is the scatter plot, using the `plot()` function, e.g., `plot(x_axis_var, y_axis_var)`.
......@@ -83,9 +97,9 @@ Try plotting a scatter plot for the `iris` dataset, using `Petal.Width` as the x
plot(iris$Petal.Width, iris$Petal.Length)
```
### 3.2.1.1 Plot title and labels
### 3.3.1.1 Plot title and labels
Notice that the scatter plot above lacks a descriptive plot title, and the axis labels are not easily understood by someone unfamiliar with the data.To enhance the plot's clarity, you can add the following arguments in the `plot()` function:
Notice that the scatter plot above lacks a descriptive plot title, and the axis labels are not easily understandable by someone unfamiliar with the dataset. To enhance the plot's clarity, you can add the following arguments in the `plot()` function:
- `main`: Adds a title to the plot.
- `xlab`: Adds an x-axis label to the plot.
......@@ -93,7 +107,7 @@ Notice that the scatter plot above lacks a descriptive plot title, and the axis
Complete the code by adding the title, x-axis label and y-axis label.
\* To specify an argument when calling the function, use `function(argument = argumentValue)`
\* To specify an argument when calling the function, use `function(..., argument = argumentValue)`
```{r}
# Complete the following code
......@@ -103,7 +117,7 @@ plot(iris$Petal.Width, iris$Petal.Length,
ylab = "Petal Length (cm)")
```
### 3.2.1.2 Best fit line
### 3.3.1.2 Best fit line
In a scatter plot, a best-fit line is often used to provide an estimate. This line can be computed using the `lm()` function, which takes a formula as input. To compute the best-fit line, the code will look like `lm(y_axis_var ~ x_axis_var)`, where the dependent variable is on the left and the independent variable is on the right. The tilde `~` character signifies the relationship between the variables.
......@@ -117,11 +131,11 @@ best_fit = lm(iris$Petal.Length ~ iris$Petal.Width)
abline(best_fit)
```
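The same fit can also be written with the `data` argument of `lm()`, which keeps the formula shorter, and the fitted intercept and slope can be inspected with `coef()`. This is a minimal sketch of one possible variant, not the notebook's own solution.

```r
# Fit Petal.Length as a linear function of Petal.Width
best_fit <- lm(Petal.Length ~ Petal.Width, data = iris)

# Intercept and slope of the best-fit line
coef(best_fit)

# Draw the scatter plot, then overlay the fitted line
plot(iris$Petal.Width, iris$Petal.Length,
     xlab = "Petal Width (cm)",
     ylab = "Petal Length (cm)")
abline(best_fit, col = "red")
```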
## 3.2.2 Line graph
## 3.3.2 Line graph
A line graph is a variant of the plot produced by the `plot()` function introduced previously. To plot a line graph in R, simply add the `type = "l"` argument when calling the `plot()` function, i.e., `plot(x_axis_var, y_axis_var, type = "l")`.
Try plotting a line graph for the `pressure` dataset, providing descriptive plot tile, x-axis label, and y-axis label.
Try plotting a line graph for the `pressure` dataset, providing descriptive plot title, x-axis label, and y-axis label.
\* Tip: The `plot()` function can generate a basic scatter/line plot when provided with a dataset containing exactly two columns. For example, `plot(dataset)` uses the first column as the x-axis and the second column as the y-axis.
......@@ -133,7 +147,7 @@ plot(pressure, type = "l",
ylab = "Pressure (mm)")
```
### 3.2.2.1 Multiple lines on a single plot
### 3.3.2.1 Multiple lines on a single plot
Sometimes it can be useful to plot multiple lines on the same plot for comparison purposes. Additional lines can be added to the plot using the `lines()` function. Note that this function only works after the `plot()` function has been called.
......@@ -148,9 +162,9 @@ pressure_new = 0.9 * pressure
lines(pressure_new)
```
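Note that multiplying a whole data frame by a number scales every column, so `0.9 * pressure` changes the temperature values as well as the pressure values. If only the y-axis values should change, one possible variant is sketched below; the name `pressure_scaled` is hypothetical and used for illustration only.

```r
# Copy the dataset and scale only the pressure column
pressure_scaled <- pressure
pressure_scaled$pressure <- 0.9 * pressure$pressure

# Original line first, then the scaled line on the same axes
plot(pressure, type = "l")
lines(pressure_scaled, lty = 2)
```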
### 3.2.2.2 Plot with colour and legend
### 3.3.2.2 Plot with colour and legend
After adding a line to the previous line graph, notice that both lines are plotted with the same colour and it is difficult to differentiate between them. Hence, it is a good practice to use colours in plots. This can be done through the `col` argument which is supported by most basic plot functions.
After adding a line to the previous line graph, notice that both lines are plotted with the same colour and it is difficult to differentiate the two lines. Hence, it is a good practice to use colours in plots. This can be done through the `col` argument which is supported by most basic plot functions.
Plot a line graph for the `pressure` dataset with the colour `blue`, and add a line for the `pressure_new` dataset in `red`.
......@@ -165,7 +179,7 @@ plot(pressure, type = "l", col = "blue",
lines(pressure_new, col = "red")
```
With different coloured line, it is obvious that they represents different data. However, it is not clear from the plot which line represents which data. To make the plot easier to understand, a legend can be added to the plot by calling the `lengend()` function. The required arguments are:
With differently coloured lines, it is obvious that they represent different datasets. However, it is not clear from the plot which line represents which dataset. To make the plot easier to understand, a legend can be added to the plot by calling the `legend()` function. The required arguments are:
- `x`: Specifies the location of the legend. For simplicity, it is common to use the predefined location such as `topleft`, `bottomleft`, etc. (Run `?legend` in the console to find out more)
......@@ -184,9 +198,9 @@ legend("topleft",
fill = c("blue", "red"))
```
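For line graphs, a legend can also show line samples instead of filled boxes by passing `col` and `lty` in place of `fill`. The sketch below is one possible variant; the helper vectors `line_labels` and `line_colours` are names introduced here for illustration.

```r
# Second dataset, defined as in the earlier section
pressure_new <- 0.9 * pressure

# Keep labels and colours together so they stay in the same order
line_labels  <- c("pressure", "pressure_new")
line_colours <- c("blue", "red")

plot(pressure, type = "l", col = line_colours[1])
lines(pressure_new, col = line_colours[2])

# lty = 1 draws a solid line sample next to each label
legend("topleft",
       legend = line_labels,
       col = line_colours,
       lty = 1)
```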
## 3.2.3 Bar chart
## 3.3.3 Bar chart
Bar chart is often used to visualise a frequency table, plotted in R with the `barplot()` function. Since both the `iris` and `pressure` dataset are not a frequency table, this section uses the `table()` function to create a frequency table from a column of the `iris` dataset for demonstration purposes.
Bar charts are often used to visualise a frequency table, plotted in R using the `barplot()` function. Since neither the `iris` nor the `pressure` dataset is a frequency table, this section uses the `table()` function to create a frequency table from a column of the `iris` dataset for demonstration purposes.
\* Frequency table: A table with the count of each unique value in the dataset.
......@@ -207,9 +221,9 @@ barplot(table(iris$Petal.Width),
ylab = "Frequency")
```
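To see what `table()` returns before plotting it, the frequency table can be stored and printed first. The sketch below uses the `Species` column as an assumed example; the variable name `species_counts` is hypothetical.

```r
# Count how many rows belong to each species
species_counts <- table(iris$Species)
species_counts

# Plot the frequency table as a bar chart
barplot(species_counts,
        main = "Number of Flowers per Species",
        xlab = "Species",
        ylab = "Frequency")
```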
## 3.2.4 Histogram
## 3.3.4 Histogram
Histogram is often used to observe the trend in a dataset, plotted in R with the `hist()` function. Note that this function can only be applied to a column in a table. Create a histogram of the `Sepal.Width` column in the `iris` dataset with descriptive plot tile, x-axis label, and y-axis label.
A histogram is often used to observe the distribution of values in a dataset, plotted in R using the `hist()` function. Note that this function takes a single numeric vector, such as one column of a table. Create a histogram of the `Sepal.Width` column in the `iris` dataset with descriptive plot title, x-axis label, and y-axis label.
```{r}
# Write your code below
......@@ -219,7 +233,7 @@ hist(iris$Sepal.Width,
ylab = "Frequency")
```
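The number of bins can be adjusted with the `breaks` argument, which R treats as a suggestion rather than an exact count. `hist()` also returns the computed bins, which can be inspected. This is a minimal sketch; the value 20 and the name `sepal_hist` are arbitrary choices for illustration.

```r
# Ask for roughly 20 bins and keep the returned histogram object
sepal_hist <- hist(iris$Sepal.Width,
                   breaks = 20,
                   main = "Histogram of Iris' Sepal Width",
                   xlab = "Sepal Width (cm)",
                   ylab = "Frequency")

# Bin boundaries and the number of observations in each bin
sepal_hist$breaks
sepal_hist$counts
```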
## 3.2.5 Box plot
## 3.3.5 Box plot
Boxplot is often used to visualise the statistical information of a dataset, showing:
......@@ -235,7 +249,7 @@ Boxplot is often used to visualise the statistical information of a dataset, sho
- outliers
It is plotted in R with the `boxplot()` function and unlike `hist()`, `boxplot()` can be applied to a table with multiple columns. Produce a boxplot of the `iris` dataset with descriptive plot title.
It is plotted in R using the `boxplot()` function. Unlike `hist()`, `boxplot()` can be applied to a table with multiple columns. Produce a boxplot of the `iris` dataset with a descriptive plot title.
```{r}
# Write your code below
......@@ -249,15 +263,15 @@ boxplot(iris$Sepal.Width, horizontal = TRUE,
main = "Boxplot of Iris' Sepal Width")
```
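`boxplot()` also accepts a formula, which draws one box per group. The sketch below compares sepal width across species; it is an illustrative addition, and the name `box_stats` is hypothetical.

```r
# One box per species; the returned object holds the plotted statistics
box_stats <- boxplot(Sepal.Width ~ Species, data = iris,
                     main = "Boxplot of Iris' Sepal Width by Species",
                     xlab = "Species",
                     ylab = "Sepal Width (cm)")

# Each column holds the five statistics drawn for one species:
# lower whisker, lower hinge, median, upper hinge, upper whisker
box_stats$stats
```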
# 3.3 Customisation
# 3.4 Customisation
R provides various built-in arguments to customise a plot. This section will introduce some of the commonly used customisation. The full list of arguments for plot customisation can be found in the documentation (run the code chunk below):
R provides various built-in arguments to customise a plot. This section will introduce some of the commonly used arguments for this purpose. The full list of arguments for plot customisation can be found in the documentation (run the code chunk below):
```{r}
?par
```
## 3.3.1 Types of points
## 3.4.1 Types of points
The plot point's style and size can be customised with the `pch` and `cex` arguments, respectively. The `pch` argument has a list of pre-defined styles represented by integers. Run the code chunk below to check the pre-defined point styles in R.
......@@ -265,7 +279,7 @@ The plot point's style and size can be customised with the `pch` and `cex` argum
?pch
```
The `cex` argument control the size of the point with respect to 1. Hence, a value larger than 1 enlarge the plot point, while a value smaller than 1 minimise the plot point.
The `cex` argument controls the size of the point with respect to 1. Hence, a value larger than 1 enlarges the plot point, while a value smaller than 1 shrinks the plot point.
Try to apply different combinations of `pch` and `cex` to the plot below and see how the plot point changes.
......@@ -274,25 +288,26 @@ Try to apply different combinations of `pch` and `cex` to the plot below and see
plot(pressure, pch = 4, cex = 0.9)
```
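`pch` also accepts a vector, so each point can have its own style. The sketch below gives each iris species a different symbol; the symbol codes 1, 2 and 4 and the names `species_code` and `point_styles` are arbitrary choices for illustration.

```r
# Convert the species factor to integer codes 1, 2 and 3
species_code <- as.integer(iris$Species)

# Map each code to a point style
point_styles <- c(1, 2, 4)[species_code]

plot(iris$Petal.Width, iris$Petal.Length,
     pch = point_styles, cex = 1.2,
     xlab = "Petal Width (cm)",
     ylab = "Petal Length (cm)")

# Show which symbol belongs to which species
legend("topleft", legend = levels(iris$Species), pch = c(1, 2, 4))
```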
## 3.3.2 Types of lines
## 3.4.2 Types of lines
The plot line's style and width can be customised with the `lty` and `lwd` arguments, respectively. Similar to points, R has a list of pre-defined line's style represented by integers. A description of these styles can be found in the documentation by running `?par`. The `lwd` work as per `cex` to control the width of the line with respect to `. A value larger than 1 results in thicker line, while a value smaller than 1 results in thinner line.
The plot line's style and width can be customised with the `lty` and `lwd` arguments, respectively. Similar to points, R has a list of pre-defined line styles represented by integers. A description of these styles can be found in the documentation by running `?par`. The `lwd` argument works like `cex`, controlling the width of the line with respect to 1. A value larger than 1 results in a thicker line, while a value smaller than 1 results in a thinner line.
Try to apply different combinations of `lty` and `lwd` to the plot below and see how the line changes.
```{r}
# Complete the code below
plot(pressure, type = "l", lty = 4, lwd = 2)
```
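To compare the line styles side by side, the sketch below draws the `pressure` curve six times at different heights, once per integer `lty` value from 1 to 6. The scaling by `i / 6` is only a device to separate the lines and has no meaning for the data; `line_types` is a hypothetical name.

```r
line_types <- 1:6

# Set up the axes without drawing anything
plot(pressure, type = "n", main = "Line types 1 to 6")

# Draw one scaled copy of the curve per line type
for (i in line_types) {
  lines(pressure$temperature, pressure$pressure * i / 6,
        lty = i, lwd = 2)
}

legend("topleft",
       legend = paste("lty =", line_types),
       lty = line_types, lwd = 2)
```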
## 3.3.3 Axis limits
## 3.4.3 Axis limits
Sometimes, it can be helpful to shorten the axis for better visualisation. This can be achieved by specifying the axis' range using the `xlim` and `ylim` argument. The code below demonstrates how to specify the x-axis' range.
Sometimes, it can be helpful to shorten the axis' range for a focused view. This can be achieved by specifying the axis' range using the `xlim` and `ylim` arguments. The code below demonstrates how to specify the x-axis' range.
```{r}
plot(pressure, type = "l", xlim = c(250, 350))
```
Using a similar approach, use the `ylim` argument to set the y-axis range from 200 to 400.
Using a similar approach, set the y-axis range from 200 to 400 using the `ylim` argument.
```{r}
# Complete the code below by adding the ylim argument
......