diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000000000000000000000000000000000000..5b6a0652566d10360493952aec6d4a4febc77083
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,4 @@
+.Rproj.user
+.Rhistory
+.RData
+.Ruserdata
diff --git a/RBeginnersExercise3_Sol.Rmd b/RBeginnersExercise3_Sol.Rmd
index 530a9af5a93655544247382512ee5166822dc6bc..4ad8aa8b09b0f3bfc49519478b652ac36d7833f6 100644
--- a/RBeginnersExercise3_Sol.Rmd
+++ b/RBeginnersExercise3_Sol.Rmd
@@ -70,7 +70,7 @@ par(mfrow = c(1, 1))
 
 # 3.2 Basic plots
 
-## 3.2.1 The scatter plot
+## 3.2.1 Scatter plot
 
 The easiest plot to create in R is the scatter plot, using the `plot()` function, e.g., `plot(x_axis_var, y_axis_var)`.
 
@@ -93,6 +93,8 @@ Notice that the scatter plot above lacks a descriptive plot title, and the axis
 
 Complete the code by adding the title, x-axis label and y-axis label.
 
+\* To specify an argument when calling the function, use `function(argument = argumentValue)`
+
 ```{r}
 # Complete the following code
 plot(iris$Petal.Width, iris$Petal.Length,
@@ -125,8 +127,10 @@ Try plotting a line graph for the `pressure` dataset, providing descriptive plot
 
 ```{r}
 # Write your code below
-plot(pressure, type = "l", main = "Vapour Pressure of Mercury",
-     xlab = "Temperature (degree C)", ylab = "Pressure (mm)")
+plot(pressure, type = "l",
+     main = "Vapour Pressure of Mercury",
+     xlab = "Temperature (degree C)",
+     ylab = "Pressure (mm)")
 ```
 
 ### 3.2.2.1 Multiple lines on a single plot
@@ -154,7 +158,8 @@ Plot a line graph for the `pressure` dataset with the colour `blue`, and add a l
 # Write your code to plot the pressure dataset below
 plot(pressure, type = "l", col = "blue",
      main = "Vapour Pressure of Mercury",
-     xlab = "Temperature (degree C)", ylab = "Pressure (mm)")
+     xlab = "Temperature (degree C)",
+     ylab = "Pressure (mm)")
 
 # Write your code to add the line for pressure_new dataset below
 lines(pressure_new, col = "red")
@@ -162,39 +167,140 @@ lines(pressure_new, col = "red")
 
 With different
coloured lines, it is obvious that they represent different data. However, it is not clear from the plot which line represents which data. To make the plot easier to understand, a legend can be added to the plot by calling the `legend()` function. The required arguments are:
 
-- `x`: Specifies the location of the legend. For simplicity, it is common to use the predefined location such as "topleft", "bottomleft, etc. (Run `?legend` in the console to find out more)
+- `x`: Specifies the location of the legend. For simplicity, it is common to use a predefined location such as `"topleft"`, `"bottomleft"`, etc. (Run `?legend` in the console to find out more)
 
-- `legend`: A list of labels in the legend.
+- `legend`: A list (vector) of labels to be presented in the legend.
 
-- `fill`: A list of corresponding colour to create filled checkbox in the legend.
+- `fill`: A list (vector) of corresponding colours to create filled boxes in the legend.
 
-Add a legend for the line graph with two lines, using the three arguments introduced.
+Add a legend to the line graph with two lines, using the three arguments introduced.
+
+\* For an argument that takes a list (vector), use `function(argument = c(value1, value2, ...))`
 
 ```{r}
 # Write your code to add the legend below
-legend("topleft", legend = c("Pressure", "Pressure_new"), fill = c("blue", "red"))
+legend("topleft",
+       legend = c("Pressure", "Pressure_new"),
+       fill = c("blue", "red"))
 ```
 
 ## 3.2.3 Bar chart
 
-Bar chart is often used for visualising a frequency table. Since both the `iris` and `pressure` dataset are not a frequency table, this section uses the `table()` function to create a frequency table from a column of the `iris` dataset for demonstration purposes.
+A bar chart is often used to visualise a frequency table and is plotted in R with the `barplot()` function.
Since neither the `iris` nor the `pressure` dataset is a frequency table, this section uses the `table()` function to create a frequency table from a column of the `iris` dataset for demonstration purposes.
+
+\* Frequency table: A table with the count of each unique value in the dataset.
 
-\* Frequency table: A table with the count of each unique values in the dataset.
+```{r}
+barplot(table(iris$Petal.Length),
+        main = "Frequency of Iris' Petal Length",
+        xlab = "Petal Length (cm)",
+        ylab = "Frequency")
+```
+
+Try to plot a bar chart of the `Petal.Width` column in the `iris` dataset with a descriptive plot title, x-axis label, and y-axis label.
 
 ```{r}
-barplot(table(iris$Petal.Length))
+# Write your code below
+barplot(table(iris$Petal.Width),
+        main = "Frequency of Iris' Petal Width",
+        xlab = "Petal Width (cm)",
+        ylab = "Frequency")
 ```
 
 ## 3.2.4 Histogram
 
+A histogram is often used to observe the distribution of values in a dataset and is plotted in R with the `hist()` function. Note that this function takes a single numeric vector, such as one column of a table. Create a histogram of the `Sepal.Width` column in the `iris` dataset with a descriptive plot title, x-axis label, and y-axis label.
+
 ```{r}
-hist(iris$Sepal.Width)
+# Write your code below
+hist(iris$Sepal.Width,
+     main = "Histogram of Iris' Sepal Width",
+     xlab = "Sepal Width (cm)",
+     ylab = "Frequency")
 ```
 
 ## 3.2.5 Box plot
 
+A box plot is often used to visualise the summary statistics of a dataset, showing:
+
+- median
+
+- lower quartile (first quartile)
+
+- upper quartile (third quartile)
+
+- min
+
+- max
+
+- outliers
+
+It is plotted in R with the `boxplot()` function and, unlike `hist()`, `boxplot()` can be applied to a table with multiple columns. Produce a box plot of the `iris` dataset with a descriptive plot title.
+
+```{r}
+# Write your code below
+boxplot(iris, main = "Boxplots of Iris Dataset")
+```
+
+When `boxplot()` is used on a specific column in a table or on a vector, the `horizontal = TRUE` argument is often applied to rotate the box plot for better visualisation. Produce a box plot of the same column used to plot the histogram, rotated with the `horizontal = TRUE` argument and given a descriptive plot title.
+
 ```{r}
-boxplot(iris$Sepal.Width, horizontal = TRUE)
+boxplot(iris$Sepal.Width, horizontal = TRUE,
+        main = "Boxplot of Iris' Sepal Width")
 ```
 
+# 3.3 Customisation
+
+R provides various built-in arguments to customise a plot. This section introduces some of the commonly used customisations. The full list of arguments for plot customisation can be found in the documentation (run the code chunk below):
+
+```{r}
+?par
+```
+
+## 3.3.1 Types of points
+
+The plot point's style and size can be customised with the `pch` and `cex` arguments, respectively. The `pch` argument has a list of pre-defined styles represented by integers. Run the code chunk below to check the pre-defined point styles in R.
+
+```{r}
+?pch
+```
+
+The `cex` argument controls the size of the point relative to 1. Hence, a value larger than 1 enlarges the plot point, while a value smaller than 1 shrinks it.
+
+Try to apply different combinations of `pch` and `cex` to the plot below and see how the plot point changes.
+
+```{r}
+# Complete the code below
+plot(pressure, pch = 4, cex = 0.9)
+```
+
+## 3.3.2 Types of lines
+
+The plot line's style and width can be customised with the `lty` and `lwd` arguments, respectively. Similar to points, R has a list of pre-defined line styles represented by integers. A description of these styles can be found in the documentation by running `?par`. The `lwd` argument works like `cex`, controlling the width of the line relative to 1. A value larger than 1 results in a thicker line, while a value smaller than 1 results in a thinner line.
+
+Try to apply different combinations of `lty` and `lwd` to the plot below and see how the line changes.
+
+```{r}
+plot(pressure, type = "l", lty = 4, lwd = 2)
+```
+
+## 3.3.3 Axis limits
+
+Sometimes, it can be helpful to shorten the axis for better visualisation. This can be achieved by specifying the axis' range using the `xlim` and `ylim` arguments. The code below demonstrates how to specify the x-axis' range.
+
+```{r}
+plot(pressure, type = "l", xlim = c(250, 350))
+```
+
+Using a similar approach, use the `ylim` argument to set the y-axis range from 200 to 400.
+
+```{r}
+# Complete the code below by adding the ylim argument
+plot(pressure, type = "l", xlim = c(200, 350), ylim = c(200, 400))
+```
+
+
+
+
+
 
diff --git a/part-3-data-visualisation.Rproj b/part-3-data-visualisation.Rproj
new file mode 100644
index 0000000000000000000000000000000000000000..8e3c2ebc99e2e337f7d69948b93529a437590b27
--- /dev/null
+++ b/part-3-data-visualisation.Rproj
@@ -0,0 +1,13 @@
+Version: 1.0
+
+RestoreWorkspace: Default
+SaveWorkspace: Default
+AlwaysSaveHistory: Default
+
+EnableCodeIndexing: Yes
+UseSpacesForTab: Yes
+NumSpacesForTab: 2
+Encoding: UTF-8
+
+RnwWeave: Sweave
+LaTeX: pdfLaTeX
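
Note on the section 3.3 additions in `RBeginnersExercise3_Sol.Rmd`: the customisation arguments introduced there (`pch`, `cex`, `lty`, `lwd`, `xlim`, `ylim`) can be combined in a single plot. The following is a minimal sketch and is not part of the commit. `pressure_halved` is a hypothetical stand-in for the exercise's `pressure_new`, which is defined outside this diff, and keying the legend by `col` and `lty` instead of `fill` is only one possible alternative for line graphs.

```r
# Hypothetical second series, for illustration only; the exercise's own
# `pressure_new` dataset is defined elsewhere and is not reproduced here.
pressure_halved <- data.frame(
  temperature = pressure$temperature,
  pressure    = pressure$pressure * 0.5
)

# Draw the first line with an explicit line type and width,
# and restrict both axes with xlim and ylim.
plot(pressure, type = "l", col = "blue", lty = 1, lwd = 2,
     xlim = c(200, 350), ylim = c(0, 400),
     main = "Vapour Pressure of Mercury",
     xlab = "Temperature (degree C)",
     ylab = "Pressure (mm)")

# Add the second line with a dashed style.
lines(pressure_halved, col = "red", lty = 2, lwd = 2)

# Overlay the original observations as small crosses.
points(pressure, pch = 4, cex = 0.9, col = "blue")

# Legend keyed by colour and line type rather than filled boxes.
legend("topleft",
       legend = c("Pressure", "Pressure (halved)"),
       col = c("blue", "red"),
       lty = c(1, 2),
       lwd = 2)
```

With line graphs, matching the legend's `col` and `lty` to the plotted lines lets readers distinguish the series even when the figure is printed in greyscale.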