Course Overview

Thank you for your interest in this course. Your course instructors, Prof Elizabeth Tilley and Lars Schöbitz, are looking forward to meeting you.

We will meet on Thursdays in CAB G 59 from 12:15 to 15:00. There is no Moodle page for this course. Everything you need will be published through this website.

Course Information

This course provides learners with skills in using the collection of R tidyverse packages as a tool for data analysis, reproducible research, and communication. Lectures are delivered through participatory live coding, in which students learn to write code in code-along exercises. We will use publicly available data related to waste management, air quality, and sanitation. Students will also learn how to find help independently, so that they can build on these skills and apply them to their own data analysis projects.

Topics include:

  • The data science life-cycle
  • Data organization in spreadsheets
  • Exploratory data analysis using visualization
  • Concept of tidy data and data tidying
  • Data transformation and descriptive statistics
  • Data communication using the Quarto open-source scientific and technical publishing system
  • Theory and foundations of field-based research
  • Research design and implications for analysis

Learning Goals

  1. Be able to use a common set of data science tools (R, RStudio IDE, Git, GitHub, tidyverse, Quarto) to illustrate and communicate the results of data analysis projects.

  2. Learn to use the Quarto file format and the RStudio IDE visual editing mode to produce scholarly documents with citations, footnotes, cross-references, figures, and tables.

  3. Be able to design a questionnaire to collect information that can be analysed to answer a waste-related research question that is relevant for Zurich.

  4. Understand the main challenges associated with managing different types of waste, and how they differ between Europe and Africa.

Textbooks and Materials

We will rely entirely on open source and open access material for this course. We will use “R for Data Science” by Hadley Wickham as complementary reading and learning material. Additional readings will consist of blog posts, journal articles, and reports. All required readings and class material will be provided through this website.

Course Calendar

date module topic
22 February 2024 1 Welcome & get ready for the course
29 February 2024 2 Data science lifecycle & Exploratory data analysis using visualization
07 March 2024 3 Data transformation with dplyr
14 March 2024 4 Data import & Data organization in spreadsheets
21 March 2024 5 Conditions & Dates & Tables
28 March 2024 6 Data types & Vectors & Pivoting
04 April 2024 Easter Break
11 April 2024 7 Joining tables & Creating and publishing scholarly articles with Quarto and GitHub pages
18 April 2024 8 Waste Research
25 April 2024 9 Research Design
02 May 2024 10 Questionnaires
09 May 2024 Auffahrt Break
16 May 2024 11 Pre-test and logistics
23 May 2024 Data collection
30 May 2024 12 Data analysis & report writing
06 June 2024 Project Submission Deadline
13 June 2024 Exam

Weekly Structure

Assignment submission: Wednesdays, latest by 23:59.

Monday
Tuesday Student hours from 14:00 to 16:00 (CET)
Wednesday Assignment submission, latest by 23:59 (CET)
Thursday Lecture from 12:15 to 15:00 (CET)
Friday

Performance assessment

The performance assessment and resulting grading scheme are shown below.

  • End-of-semester exam: 50 points
  • Compulsory continuous performance assessment: 50 points, of which
    • Homework assignments: 20 points (n = 10)
    • Capstone project: 30 points, of which
      • Technical parts of submitted report: 20 points (we will communicate what we expect)
      • Intellectual framing of results: 10 points (we will communicate what we expect)

Table 1 shows the conversion from points to grades. Grades follow ETHZ’s Grading System. Points are rounded to the nearest entry in the table, for example:

  • 97 points = 5.75
  • 93 points = 5.75
  • 92 points = 5.50
  • 45 points = 4.00
  • 44 points = 3.50

Table 1: Conversion from points to grades.
grade points
6.00 100
5.75 95
5.50 90
5.25 85
5.00 80
4.75 75
4.50 70
4.25 60
4.00 50
3.50 40
3.00 30
2.50 20
2.00 10
1.00 0
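
The rounding rule can be illustrated with a short sketch. It is written in Python purely for illustration and is not an official grading tool. The function name `points_to_grade` is hypothetical, and the tie-breaking rule (a point total exactly halfway between two table entries goes to the higher grade) is an assumption inferred from the example "45 points = 4.00" above.

```python
# Conversion table from Table 1: (points, grade)
GRADE_TABLE = [
    (100, 6.00), (95, 5.75), (90, 5.50), (85, 5.25), (80, 5.00),
    (75, 4.75), (70, 4.50), (60, 4.25), (50, 4.00), (40, 3.50),
    (30, 3.00), (20, 2.50), (10, 2.00), (0, 1.00),
]


def points_to_grade(points):
    """Return the grade of the table entry closest to `points`.

    Assumption: on a tie between two entries, the higher one wins,
    which matches the example 45 points = 4.00.
    """
    if not 0 <= points <= 100:
        raise ValueError("points must be between 0 and 100")
    # Sort key: smallest distance first, then the larger points value.
    _, grade = min(
        GRADE_TABLE,
        key=lambda row: (abs(row[0] - points), -row[0]),
    )
    return grade


# Reproduce the examples listed above Table 1.
for pts in (97, 93, 92, 45, 44):
    print(pts, "points =", format(points_to_grade(pts), ".2f"))
```

The five point totals in the loop are the ones listed as examples above Table 1, so the printed grades can be compared directly with that list.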

End-of-semester exam

There is a 2-hour final written exam, which assesses the technical skills taught during the course. It contains programming exercises using the R programming language. Success in the exam depends on the effort put into the compulsory continuous performance assessment. The exam is worth 50 points.

Compulsory continuous performance assessment

Homework assignments: Each week will have at least one homework assignment. Homework assignments are delivered as Quarto documents with instructions and sample code. Students are required to submit their work through GitHub. Ten assignments are graded pass/fail, with 2 points for each assignment and 20 points in total.

Capstone project: A final capstone project provides students with an opportunity to apply their skills and techniques to real-world data sets. Each student will collect their own data for this project, using either a survey-based tool (Google Forms) or an observational study (Google Sheets).

Detailed instructions for the completion of the capstone project will be provided. The project report will be delivered as a Quarto document, and students are asked to submit their work through GitHub. The capstone project is worth 30 points.

Readings: Every week, additional readings will be provided that support students in learning the underlying concepts that are taught during class. Readings are not graded.

Policies

Class attendance

We hope that you can attend class in person. If you cannot attend a class, we expect you to contact us and inform us about it. The class will be live-streamed and recorded so that you can watch from home; however, we will not accommodate two-way communication.

If you miss a class, we expect you to work through the material of the class using the recording of the live stream.

AI Policy

We expect you to use AI tools in this class (e.g. perplexity.ai, ChatGPT, etc.). Some assignments may require it. Learning to use AI is an emerging skill that we want you to embrace.

Be aware of the limits of these tools:

  • Minimum-effort prompts will yield low-quality results. Refine your prompts to get good outcomes. This will take work.

  • Don’t trust anything it says. Unless you know the answer or know how to check it, assume it is wrong. You will be responsible for any errors or omissions provided by the tool. It works best for topics you understand.

  • AI is a tool that you need to acknowledge using. Include links to your prompts and explain how you used AI to complete an assignment. Failure to do so is a violation of academic integrity policies.

  • Be thoughtful about when this tool is useful. Don’t use it if it isn’t appropriate for the case or circumstance.

This AI Policy was adapted from Ethan Mollick.

Code of Conduct

This course follows the ETH Respect Code of Conduct. If you have not yet read this Code of Conduct, please familiarize yourself with it. If you experience inappropriate behaviour from us or any of your classmates, you will find contact and advice services here: https://respekt.ethz.ch/en/contact-and-advice-services.html