id | date_sample | system | location | users | ts |
---|---|---|---|---|---|
1 | 2023-11-01 | pit latrine | household | 5 | 136.24 |
2 | 2023-11-01 | pit latrine | household | 7 | 102.45 |
3 | 2023-11-01 | pit latrine | household | NA | 57.02 |
4 | 2023-11-01 | pit latrine | household | 6 | 27.03 |
5 | 2023-11-01 | pit latrine | household | 12 | 97.27 |
6 | 2023-11-02 | septic tank | household | 7 | 78.21 |
7 | 2023-11-02 | septic tank | household | 14 | 15.24 |
8 | 2023-11-02 | septic tank | household | 4 | 29.39 |
9 | 2023-11-02 | septic tank | household | 10 | 64.22 |
10 | 2023-11-02 | septic tank | household | 12 | 8.01 |
11 | 2023-11-03 | pit latrine | public toilet | 50 | 11.24 |
12 | 2023-11-03 | pit latrine | public toilet | 32 | 84.05 |
13 | 2023-11-03 | pit latrine | public toilet | 41 | 55.92 |
14 | 2023-11-03 | pit latrine | public toilet | 160 | 15.32 |
15 | 2023-11-03 | pit latrine | public toilet | 20 | 22.65 |
16 | 2023-11-04 | septic tank | public toilet | 26 | 8.72 |
17 | 2023-11-04 | septic tank | public toilet | 91 | 43.92 |
18 | 2023-11-04 | septic tank | public toilet | 68 | 10.37 |
19 | 2023-11-04 | septic tank | public toilet | 112 | 23.21 |
20 | 2023-11-04 | septic tank | public toilet | 59 | 15.64 |
Module 4 - Assignment 1
Data organization in spreadsheets
This course introduces learners to tools and workflows for data science with R. Learners are also introduced to collaborative writing and coding using Git and GitHub in the context of reproducible documents (i.e. Quarto). So far, we have used data that is well structured and ready to use. In practice, however, a lot of data entry and storage is still managed in spreadsheets. This is why we also touch on some (research) data management topics, such as data organization in spreadsheets.
The reading for this assignment provides guidance on data entry and storage. It offers practical recommendations for organizing spreadsheet data to reduce errors and ease later analyses.
Task 1: Read and prepare examples
For this assignment, we ask you to:
- Read Broman and Woo (2018): “Data organization in spreadsheets”.
- Choose two of the recommendations and come up with real-world examples or scenarios where the recommendations could be applied in your work.
- Be prepared to share these examples and explain how the recommendations would improve your workflows. This will take place in class as part of a small discussion group (max. 3 people).
Task 2: Apply the recommendations to your samples data from Module 3
A prerequisite for this homework is that you have worked through the “Spreadsheet” assignment of Module 3. If you have not done so, please do this first: https://rbtl-fs24.github.io/website/assignments/md-03/am-03-2-spreadsheet.html
- Open the rbtl-fs24 workspace on posit.cloud.
- Re-open your samples-USERNAME repository.
- Create a new .R file and save it as data_cleaning.R in the folder.
- Add library(tidyverse) to the top of the file.
- Add library(googlesheets4) to the top of the file.
- Use the read_sheet() function to read in your Google Sheets spreadsheet and store it in an object called samples.
- Use the glimpse() function to inspect the data.
- Try to use R functions to apply the recommendations from the reading to your data. Note down recommendations that you struggled to achieve.
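As a rough starting point, a data_cleaning.R script for the steps above might look like the sketch below. It is only an illustration: the sheet URL is a placeholder, and the stand-in data, its column names, and its messy values are invented so that the script runs without access to a Google Sheet. Your own sheet will have different columns and different problems. The read_sheet() call is commented out because it needs your own sheet and Google authentication.

```r
library(tidyverse)
library(googlesheets4)

# Read the Google Sheet (placeholder URL; replace it with your own sheet).
# samples <- read_sheet("https://docs.google.com/spreadsheets/d/YOUR-SHEET-ID")

# Stand-in data with invented column names and values, used here only so
# that the sketch runs without access to a sheet.
samples <- tibble(
  ID = c(1, 2, 3),
  `Date Sample` = c("01/11/2023", "01/11/2023", "02/11/2023"),
  System = c("Pit Latrine", "pit latrine", "Septic tank"),
  `No. of users` = c("5", "-", "7"),
  TS = c(136.24, 102.45, 78.21)
)

# Inspect the raw data
glimpse(samples)

samples_clean <- samples |>
  # Consistent, lower-case names without spaces or special characters
  rename(
    id = ID,
    date_sample = `Date Sample`,
    system = System,
    users = `No. of users`,
    ts = TS
  ) |>
  mutate(
    # Write dates as YYYY-MM-DD (input assumed to be day/month/year)
    date_sample = lubridate::dmy(date_sample),
    # Use one consistent spelling for each category
    system = str_to_lower(system),
    # Mark missing values explicitly as NA, then store counts as integers
    users = na_if(users, "-"),
    users = as.integer(users)
  )

# Inspect the cleaned data
glimpse(samples_clean)
```

Each step in the pipeline maps to one recommendation from the reading (good names, ISO 8601 dates, consistency, no ambiguous empty cells), which makes it easier to note down which recommendations you could not achieve with code.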
An .R script is a file containing R code that can be executed in the R environment, primarily used for defining functions, data manipulation, and running analyses. A .qmd file, associated with Quarto, is a more advanced document format that integrates R code with narrative text, allowing for the creation of dynamic, formatted reports or presentations that can include both the code and its output.
Imagine an .R script as one single code chunk without narrative text. Comments can be added using the # symbol.
Example
The table at the top of this page is an example of a dataset that follows the recommendations from the reading.
Task 3: Create new folders
- Navigate to the Files tab in the bottom right window of RStudio.
- Click on the “Folder” button.
- Enter the name “data” in the field and click OK.
- Click on the new data folder in the bottom right window.
- Click on the “Folder” button.
- Enter the name “processed” in the field and click OK.
- Click on the new processed folder in the bottom right window.
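The same folder structure can also be created from the R console instead of the Files tab. The following is a minimal sketch using base R's dir.create(); it assumes that the working directory is the root of your samples-USERNAME project.

```r
# Create data/ and data/processed/ in one step.
# recursive = TRUE also creates the parent folder "data" if it is missing;
# showWarnings = FALSE keeps the call quiet if the folders already exist.
dir.create("data/processed", recursive = TRUE, showWarnings = FALSE)

# Check that the folder now exists
dir.exists("data/processed")
```

Doing this in code rather than by clicking means the step can be repeated by anyone who runs your script.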
Task 4: Write processed data
Once you have completed the data cleaning tasks from Task 2, write your processed data into the new data/processed folder:
- In your data_cleaning.R file, write R code to apply the recommendations from the reading to your data.
- Use the assignment operator <- to store the processed data in a new object named data_out.
- Use the write_csv() function to write the data_out object to the data/processed folder.
write_csv(data_out, "data/processed/waste-characterisation-processed.csv")
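To check that the file was written as intended, it can help to read it back in. The sketch below uses the first three rows of the example dataset on this page as a stand-in for your own data_out object, and it creates the target folder if it does not exist yet. The name data_check is made up for this illustration; only the file path is taken from the assignment.

```r
library(tidyverse)

# Stand-in for your processed data: first three rows of the example dataset
data_out <- tibble(
  id = c(1, 2, 3),
  date_sample = as.Date(c("2023-11-01", "2023-11-01", "2023-11-01")),
  system = "pit latrine",
  location = "household",
  users = c(5, 7, NA),
  ts = c(136.24, 102.45, 57.02)
)

# Make sure the target folder exists before writing
dir.create("data/processed", recursive = TRUE, showWarnings = FALSE)

# Write the processed data to a CSV file
write_csv(data_out, "data/processed/waste-characterisation-processed.csv")

# Read the file back in and inspect it (data_check is a hypothetical name)
data_check <- read_csv(
  "data/processed/waste-characterisation-processed.csv",
  show_col_types = FALSE
)
glimpse(data_check)
```

Missing values are written as NA by write_csv() and read back as NA by read_csv(), so the empty users entry survives the round trip.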
Task 5: Submit homework assignment
- Add all files to the commit, commit the changes with a meaningful commit message, and push the changes to GitHub.
- Open an issue on GitHub on your samples-USERNAME repo and tag the course instructor @larnsce.