From 630e4b017fdf768596e9469a30d8d228cdc143a0 Mon Sep 17 00:00:00 2001
From: pfoo <pfoo@ed.ac.uk>
Date: Wed, 19 Mar 2025 15:33:40 +0000
Subject: [PATCH] Upload New File: notebook solution

---
 RBeginnersExercise3_Sol.Rmd | 200 ++++++++++++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)
 create mode 100644 RBeginnersExercise3_Sol.Rmd

diff --git a/RBeginnersExercise3_Sol.Rmd b/RBeginnersExercise3_Sol.Rmd
new file mode 100644
index 0000000..530a9af
--- /dev/null
+++ b/RBeginnersExercise3_Sol.Rmd
@@ -0,0 +1,200 @@
+---
+title: "R Beginners Exercise 3: Data Visualisation"
+output:
+  word_document: default
+  html_document: 
+    toc: true
+editor_options:
+  chunk_output_type: console
+---
+
+# Introduction
+
+Welcome to R for Beginners Exercise 3! This notebook contains the exercises that we will work through during the exercise breaks throughout the course. It also serves as a workspace for you to use during the session.
+
+To execute a line of code, click on it and press *Ctrl + Enter*.
+
+To execute a chunk of code, click the green run button at the top right corner of the code chunk or highlight the entire code chunk and press *Ctrl + Enter*.
+
+# 3.0 Introduction to built-in datasets
+
+For this exercise, we will be using several different datasets. These datasets are built into R, so they can be used simply by calling the name of the dataset, without the need to load any packages. Run the following code chunks to get a quick overview of each dataset.
+
+## 3.0.1 Iris dataset
+
+```{r}
+# The dimensions of the dataset
+sprintf("The iris dataset contains %d rows and %d columns.",
+        nrow(iris), ncol(iris))
+
+# The name and data type of each column
+sapply(iris, class)
+
+# Check for incomplete cases (rows with missing values)
+iris[!complete.cases(iris), ]
+
+# A summary of each column
+summary(iris)
+
+# View more information about the iris dataset in the documentation
+?iris
+```
+
+## 3.0.2 Pressure dataset
+
+```{r}
+# The dimensions of the dataset
+sprintf("The pressure dataset contains %d rows and %d columns.",
+        nrow(pressure), ncol(pressure))
+
+# The name and data type of each column
+sapply(pressure, class)
+
+# Check for incomplete cases (rows with missing values)
+pressure[!complete.cases(pressure), ]
+
+# A summary of each column
+summary(pressure)
+
+# View more information about the pressure dataset in the documentation
+?pressure
+```
+
+# 3.1 Segment canvas
+
+In R, it can be helpful to create multiple plots on a single canvas. This can be done by splitting the canvas into a grid using the function `par()` with the argument `mfrow`, which takes a vector of two numeric values, i.e., `c(num1, num2)`. The first value is the number of rows and the second value is the number of columns, so `c(1, 1)` gives a single plot per canvas.
+
+```{r}
+par(mfrow = c(1, 1))
+```
+
+# 3.2 Basic plots
+
+## 3.2.1 The scatter plot
+
+The easiest plot to create in R is the scatter plot, using the `plot()` function, e.g., `plot(x_axis_var, y_axis_var)`.
+
+Try plotting a scatter plot for the `iris` dataset, using `Petal.Width` as the x-axis and `Petal.Length` as the y-axis.
+
+\* Reminder: The syntax to select a column of a dataset is `dataset$columnName`.
+
+```{r}
+# Write your code below
+plot(iris$Petal.Width, iris$Petal.Length)
+```
+
+### 3.2.1.1 Plot title and labels
+
+Notice that the scatter plot above lacks a descriptive plot title, and the axis labels are not easily understood by someone unfamiliar with the data. To enhance the plot's clarity, you can add the following arguments to the `plot()` function:
+
+-   `main`: Adds a title to the plot.
+-   `xlab`: Adds an x-axis label to the plot.
+-   `ylab`: Adds a y-axis label to the plot.
+
+Complete the code by adding the title, x-axis label and y-axis label.
+
+```{r}
+# Complete the following code
+plot(iris$Petal.Width, iris$Petal.Length, 
+     main = "Iris Petal Length against Width",
+     xlab = "Petal Width (cm)", 
+     ylab = "Petal Length (cm)")
+```
+
+### 3.2.1.2 Best fit line
+
+In a scatter plot, a best-fit line is often used to estimate the relationship between the two variables. This line can be computed using the `lm()` function, which takes a formula as input. To compute the best-fit line, the code looks like `lm(y_axis_var ~ x_axis_var)`, where the dependent variable is on the left and the independent variable is on the right. The tilde `~` character signifies the relationship between the variables.
+
+The best-fit line can be added directly to the same scatter plot by calling the `abline()` function, e.g., `abline(best_fit_line)`.
+
+Write the code to add a best fit line to the scatter plot plotted in the previous section.
+
+```{r}
+# Write your code below
+best_fit = lm(iris$Petal.Length ~ iris$Petal.Width)
+abline(best_fit)
+```
+
+## 3.2.2 Line graph
+
+A line graph is a variant of the plot produced by the `plot()` function introduced previously. To plot a line graph in R, simply add the `type = "l"` argument when calling `plot()`, i.e., `plot(x_axis_var, y_axis_var, type = "l")`.
+
+Try plotting a line graph for the `pressure` dataset, providing a descriptive plot title, x-axis label, and y-axis label.
+
+\* Tip: The `plot()` function can generate a basic scatter/line plot when provided with a dataset containing exactly two columns. For example, `plot(dataset)` uses the first column as the x-axis and the second column as the y-axis.
+
+```{r}
+# Write your code below
+plot(pressure, type = "l", main = "Vapour Pressure of Mercury", 
+     xlab = "Temperature (degree C)", ylab = "Pressure (mm)")
+```
+
+### 3.2.2.1 Multiple lines on a single plot
+
+Sometimes it can be useful to plot multiple lines on the same plot for comparison purposes. Additional lines can be added to the plot using the `lines()` function. Note that this function only works after the `plot()` function has been called.
+
+For demonstration purposes, the following code creates a dataset named `pressure_new` derived from the existing `pressure` dataset. Using the `lines()` function, plot a line for this derived dataset on the line graph plotted previously.
+
+```{r}
+# Create the derived dataset
+pressure_new = 0.9 * pressure
+
+# Plot the line on top of the line graph
+# Write your code below
+lines(pressure_new)
+```
+
+### 3.2.2.2 Plot with colour and legend
+
+After adding a line to the previous line graph, notice that both lines are plotted in the same colour, which makes it difficult to differentiate between them. Hence, it is good practice to use colours in plots. This can be done through the `col` argument, which is supported by most basic plot functions.
+
+Plot a line graph for the `pressure` dataset with the colour `blue`, and add a line for the `pressure_new` dataset in `red`.
+
+```{r}
+# Write your code to plot the pressure dataset below
+plot(pressure, type = "l", col = "blue",
+     main = "Vapour Pressure of Mercury", 
+     xlab = "Temperature (degree C)", ylab = "Pressure (mm)")
+
+# Write your code to add the line for pressure_new dataset below
+lines(pressure_new, col = "red")
+```
+
+With differently coloured lines, it is obvious that they represent different data. However, it is not clear from the plot which line represents which data. To make the plot easier to understand, a legend can be added to the plot by calling the `legend()` function. The required arguments are:
+
+-   `x`: Specifies the location of the legend. For simplicity, it is common to use a predefined location keyword such as `"topleft"` or `"bottomleft"`. (Run `?legend` in the console to find out more.)
+
+-   `legend`: A vector of labels to show in the legend.
+
+-   `fill`: A vector of corresponding colours, used to draw a filled box next to each label in the legend.
+
+Add a legend for the line graph with two lines, using the three arguments introduced.
+
+```{r}
+# Write your code to add the legend below
+legend("topleft", legend = c("Pressure", "Pressure_new"), fill = c("blue", "red"))
+```
+
+## 3.2.3 Bar chart
+
+A bar chart is often used for visualising a frequency table. Since neither the `iris` nor the `pressure` dataset is a frequency table, this section uses the `table()` function to create a frequency table from a column of the `iris` dataset for demonstration purposes.
+
+\* Frequency table: A table with the count of each unique value in the dataset.
+
+```{r}
+barplot(table(iris$Petal.Length))
+```
+
+## 3.2.4 Histogram
+
+```{r}
+hist(iris$Sepal.Width)
+```
+
+## 3.2.5 Box plot
+
+```{r}
+boxplot(iris$Sepal.Width, horizontal = TRUE)
+```
+
+
-- 
GitLab