Add 'Reproducible Reports with Quarto' Episode #136
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,286 @@ | ||
| --- | ||
| title: Reproducible Reports with Quarto | ||
| teaching: 45 | ||
| exercises: 15 | ||
| source: Rmd | ||
| --- | ||
|
|
||
| ::::::::::::::::::::::::::::::::::::::: objectives | ||
|
|
||
| - Describe the value of reproducible reporting. | ||
| - Create a new Quarto document (`.qmd`) in RStudio. | ||
| - Use Markdown syntax to format text. | ||
| - Create and run code chunks within a Quarto document. | ||
| - Render a Quarto document to an HTML report. | ||
|
|
||
| :::::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
|
||
| :::::::::::::::::::::::::::::::::::::::: questions | ||
|
|
||
| - How can I combine my code, results, and narrative into a single document? | ||
| - How can I automatically update my reports when my data changes? | ||
| - What is Quarto and how does it differ from a standard R script? | ||
|
|
||
| :::::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
|
||
| ```{r, include=FALSE} | ||
| source("files/download_data.R") | ||
| library(tidyverse) | ||
| # Read raw data and apply cleaning steps from previous episodes | ||
| books2 <- read_csv("data/books.csv") %>% | ||
| rename( | ||
| title = X245.ab, | ||
| author = X245.c, | ||
| callnumber = CALL...BIBLIO., | ||
| isbn = ISN, | ||
| pubyear = X008.Date.One, | ||
| subCollection = BCODE1, | ||
| format = BCODE2, | ||
| location = LOCATION, | ||
| tot_chkout = TOT.CHKOUT, | ||
| loutdate = LOUTDATE, | ||
| subject = SUBJECT, | ||
| callnumber2 = CALL...ITEM. | ||
| ) %>% | ||
| mutate( | ||
| pubyear = as.integer(pubyear), | ||
| subCollection = recode(subCollection, | ||
| "-" = "general collection", | ||
| u = "government documents", | ||
| r = "reference", | ||
| b = "k-12 materials", | ||
| j = "juvenile", | ||
| s = "special collections", | ||
| c = "computer files", | ||
| t = "theses", | ||
| a = "archives", | ||
| z = "reserves" | ||
| ), | ||
| format = recode(format, | ||
| a = "book", | ||
| e = "serial", | ||
| w = "microform", | ||
| s = "e-gov doc", | ||
| o = "map", | ||
| n = "database", | ||
| k = "cd-rom", | ||
| m = "image", | ||
| "5" = "kit/object", | ||
| "4" = "online video" | ||
| ) | ||
| ) | ||
| ``` | ||
|
|
||
| ## Introduction to Reproducible Reporting | ||
|
|
||
| So far, we have been writing code in `.R` scripts. This is excellent for data analysis, but what happens when you need to share your findings with a colleague or a library director? You might copy a plot into a Word document or an email, then type out your interpretation. | ||
|
|
||
| But what if the data changes next month? You would have to re-run your script, re-save the plot, copy it back into Word, and update your text. This manual process is tedious and error-prone. | ||
|
|
||
| **Quarto** allows you to combine your code, its output (plots, tables), and your narrative text into a single document. When you "render" the document, Quarto runs the R code and produces a polished report (HTML, PDF, or Word) automatically. | ||
|
|
||
| ## Creating a Quarto Document | ||
|
|
||
| To create a new Quarto document in RStudio: | ||
|
|
||
| 1. Click the **File** menu. | ||
| 2. Select **New File** > **Quarto Document...** | ||
| 3. In the dialog box, give your document a **Title** (e.g., "Library Usage Report") and enter your name as **Author**. | ||
| 4. Ensure **HTML** is selected as the output format. | ||
| 5. Click **Create**. | ||
|
|
||
| RStudio will open a new file with some example content. Notice the file extension is `.qmd`. | ||
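A `.qmd` file is plain text, so the menu steps above are a convenience rather than a requirement. As a rough sketch (the file name `library_usage_report.qmd`, the temporary folder, and the placeholder body text are assumptions for illustration, not part of the lesson), a similar starting file could be written from the R console:

```r
# Sketch only: build a minimal Quarto document as plain text.
# The file name and temporary folder are illustrative assumptions.
qmd_path <- file.path(tempdir(), "library_usage_report.qmd")

qmd_lines <- c(
  "---",                              # start of the metadata header
  'title: "Library Usage Report"',
  'author: "Your Name"',
  "format: html",
  "---",                              # end of the metadata header
  "",
  "## High Usage Items",
  "",
  "Narrative text goes here."
)

writeLines(qmd_lines, qmd_path)

# Confirm the file exists and inspect its first line
file.exists(qmd_path)
readLines(qmd_path, n = 1)
```

Opening the resulting file in RStudio should give the same editor and **Render** button as a document created through the menu.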
|
|
||
| ::::::::::::::::::::::::::::::::::::::::: callout | ||
|
|
||
| ## Quarto vs. R Markdown | ||
|
|
||
| If you have used R before, you might be familiar with R Markdown (`.Rmd`). Quarto (`.qmd`) is the next-generation version of R Markdown. It works very similarly but supports more languages (like Python and Julia) and has more features for scientific publishing. | ||
|
|
||
| ::::::::::::::::::::::::::::::::::::::::: | ||
|
|
||
| ## Anatomy of a Quarto Document | ||
|
|
||
| A Quarto document has three main parts: | ||
|
|
||
| ### 1. The YAML Header | ||
|
|
||
| At the very top, enclosed between two lines of `---`, is the **YAML Header**. This contains metadata about the document. | ||
|
|
||
| ```yaml | ||
| --- | ||
| title: "Library Usage Report" | ||
| author: "Your Name" | ||
| format: html | ||
| --- | ||
| ``` | ||
|
|
||
| ### 2. Markdown Text | ||
|
|
||
| Below the YAML header, the body of the document is where you write your narrative. You use **Markdown** syntax to format text. | ||
|
|
||
| - `**Bold**` for **bold text** | ||
| - `*Italics*` for *italics* | ||
| - `# Heading 1` for a main title | ||
| - `## Heading 2` for a section title | ||
| - `- List item` for bullet points | ||
|
|
||
| ### 3. Code Chunks | ||
|
|
||
| Code chunks are where your R code lives. They start with ` ```{r} ` and end with ` ``` `. | ||
|
|
||
| ```` | ||
| ```{r} | ||
| # This is a code chunk | ||
| summary(cars) | ||
| ``` | ||
| ```` | ||
|
|
||
| You can insert a new chunk by clicking the **+C** button in the editor toolbar, or by pressing <kbd>Ctrl</kbd>+<kbd>Alt</kbd>+<kbd>I</kbd> (Windows/Linux) or <kbd>Cmd</kbd>+<kbd>Option</kbd>+<kbd>I</kbd> (Mac). | ||
|
|
||
| ## Your First Report | ||
|
|
||
| Let's clean up the example file and create a report using our `books` data. | ||
|
|
||
| 1. Delete everything in the file *below* the YAML header. | ||
| 2. Add a new **setup** code chunk to load our libraries and prepare the data. | ||
|
|
||
| ```` | ||
| ```{r} | ||
| #| label: setup | ||
| #| include: false | ||
|
|
||
| library(tidyverse) | ||
|
|
||
| # Load data and rename columns for clarity | ||
| books2 <- read_csv("data/books.csv") %>% | ||
| rename( | ||
| subCollection = BCODE1, | ||
| tot_chkout = TOT.CHKOUT, | ||
| format = BCODE2 | ||
| ) %>% | ||
| mutate( | ||
| subCollection = recode(subCollection, | ||
| "-" = "general collection", | ||
| j = "juvenile", | ||
| b = "k-12 materials" | ||
| ) | ||
| ) | ||
| ``` | ||
| ```` | ||
|
|
||
| ::::::::::::::::::::::::::::::::::::::::: callout | ||
|
|
||
| ## Chunk Options | ||
|
|
||
| Notice the lines starting with `#|`. These are **chunk options**. | ||
|
|
||
| - `#| label: setup` gives the chunk a name. | ||
| - `#| include: false` runs the code but hides the code and output from the final report. This is great for loading data silently. | ||
|
|
||
| ::::::::::::::::::::::::::::::::::::::::: | ||
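Other options follow the same `#|` pattern. The sketch below is illustrative only: the label, caption, and checkout counts are made up, while `echo`, `warning`, `message`, and `fig-cap` are standard Quarto chunk options. Inside a `.qmd` file these lines would sit between ` ```{r} ` and ` ``` `; run as plain R, the `#|` lines are treated as ordinary comments.

```r
#| label: checkout-histogram
#| echo: false
#| warning: false
#| message: false
#| fig-cap: "Checkout counts for a made-up sample of items"

# Hypothetical checkout counts, used only to make this sketch runnable
checkouts <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)

# Draw a simple histogram with base R graphics
hist(checkouts,
     main = "Sample Checkouts",
     xlab = "Total checkouts")
```

With these settings the rendered report would show the figure and its caption, but not the code or any warnings and messages it produced.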
|
|
||
| ### Adding Analysis | ||
|
|
||
| Now, let's add a section header and some text. | ||
|
|
||
| ```markdown | ||
| ## High Usage Items | ||
|
|
||
| We are analyzing items with more than 10 checkouts to understand circulation patterns across sub-collections. | ||
| ``` | ||
|
|
||
| Next, insert a new code chunk and paste the plotting code we developed in the previous episode (ggplot2). | ||
|
|
||
| ```` | ||
| ```{r} | ||
| #| label: plot-high-usage | ||
| #| echo: false | ||
|
|
||
| # Filter for high usage | ||
| booksHighUsage <- books2 %>% | ||
| filter(!is.na(tot_chkout), | ||
| tot_chkout > 10) | ||
|
|
||
| # Create the plot | ||
| ggplot(data = booksHighUsage, | ||
| aes(x = subCollection, y = tot_chkout)) + | ||
| geom_boxplot(alpha = 0) + | ||
| geom_jitter(alpha = 0.5, color = "tomato") + | ||
| scale_y_log10() + | ||
| labs(title = "Distribution of Checkouts by Sub-Collection", | ||
| x = "Sub-Collection", | ||
| y = "Total Checkouts (Log Scale)") + | ||
| theme_bw() + | ||
| theme(axis.text.x = element_text(angle = 45, hjust = 1)) | ||
| ``` | ||
| ```` | ||
|
|
||
| Setting `#| echo: false` will display the *plot* in the report, but hide the R *code* that generated it. This is often preferred for reports intended for non-coders. | ||
|
|
||
| ## Rendering the Document | ||
|
|
||
| Now comes the magic. Click the **Render** button (blue arrow icon) at the top of the editor pane. | ||
|
|
||
| RStudio will: | ||
|
|
||
| 1. Run all your code chunks from scratch. | ||
| 2. Generate the plots and results. | ||
| 3. Combine them with your text. | ||
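| 4. Create an HTML file with the same base name as your `.qmd` file (for example, `library_usage_report.html` if you saved the document as `library_usage_report.qmd`), in the same folder as the `.qmd` file. | ||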
| 5. Open a preview of the report. | ||
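Rendering does not have to go through the button. As a hedged sketch, the helper below renders a document from the R console; it assumes the `quarto` R package and the Quarto command-line tool are installed, and both the helper name `render_report()` and the file name `library_usage_report.qmd` are hypothetical.

```r
# Sketch: render a Quarto document from the R console instead of the button.
# render_report() is a hypothetical helper, not part of any package.
render_report <- function(path) {
  # Stop early if the document has not been saved at this location
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(invisible(FALSE))
  }
  # The 'quarto' R package is a thin wrapper around the Quarto command-line tool
  if (!requireNamespace("quarto", quietly = TRUE)) {
    message("Install the 'quarto' package first: install.packages(\"quarto\")")
    return(invisible(FALSE))
  }
  # By default the output file is written next to the .qmd file
  quarto::quarto_render(path)
  invisible(TRUE)
}

render_report("library_usage_report.qmd")
```

The checks at the top simply print a message instead of failing when the file or the package is missing.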
|
|
||
| ::::::::::::::::::::::::::::::::::::::: challenge | ||
|
|
||
| ## Challenge: Add a Summary Table | ||
|
|
||
| 1. Add a new header `## Summary Statistics` to your Quarto document. | ||
| 2. Insert a new code chunk. | ||
| 3. Write code to calculate the mean checkouts per format (Hint: use `group_by(format)` and `summarize()`). | ||
| 4. Render the document again to see your new table included in the report. | ||
|
|
||
| ::::::::::::::: solution | ||
|
|
||
| ## Solution | ||
|
|
||
| Add this to your document: | ||
|
|
||
| ```markdown | ||
| ## Summary Statistics | ||
|
|
||
| The table below shows the average checkouts for each item format. | ||
| ``` | ||
|
|
||
| ```` | ||
| ```{r} | ||
| #| label: summary-table | ||
|
|
||
| books2 %>% | ||
| group_by(format) %>% | ||
| summarize(mean_checkouts = mean(tot_chkout, na.rm = TRUE)) %>% | ||
| arrange(desc(mean_checkouts)) | ||
| ``` | ||
| ```` | ||
|
|
||
| Render the document to see the updated report. | ||
|
|
||
| ::::::::::::::::::::::::: | ||
|
|
||
| :::::::::::::::::::::::::::::::::::::::::::::::::: | ||
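By default the summary prints as a plain data frame. One optional refinement, sketched here on a small made-up data frame (`toy_books` and its values are assumptions for illustration; in the report you would use `books2`), is to pass the result to `knitr::kable()` so the rendered report shows a formatted table:

```r
library(dplyr)

# Made-up data for illustration only; in the report you would use books2
toy_books <- tibble(
  format     = c("book", "book", "book", "serial", "serial"),
  tot_chkout = c(12, 3, NA, 20, 4)
)

# Same summary as in the challenge solution
format_summary <- toy_books %>%
  group_by(format) %>%
  summarize(mean_checkouts = mean(tot_chkout, na.rm = TRUE)) %>%
  arrange(desc(mean_checkouts))

# kable() converts the data frame into a formatted table for the report
knitr::kable(format_summary,
             digits = 1,
             col.names = c("Format", "Mean checkouts"))
```

The `digits` and `col.names` arguments only change how the table is displayed; the underlying `format_summary` data frame is unchanged.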
|
|
||
| ## Why This Matters | ||
|
|
||
| By using Quarto, your report is now **reproducible**. | ||
|
|
||
| If you download a new version of `books.csv` next month: | ||
|
|
||
| 1. Save it to your `data/` folder. | ||
| 2. Open your Quarto document. | ||
| 3. Click **Render**. | ||
|
|
||
| Your report will automatically update with the new data, creating a fresh plot and table without you having to copy-paste a single thing. | ||
|
|
||
| :::::::::::::::::::::::::::::::::::::::: keypoints | ||
|
|
||
| - **Quarto** allows you to mix code and text to create reproducible reports. | ||
| - Use the **YAML header** to configure document metadata like title and output format. | ||
| - **Code chunks** run R code and can display or hide input/output using options like `#| echo: false`. | ||
| - **Rendering** the document executes the code and produces the final output (HTML, PDF, etc.). | ||
| - This workflow saves time and reduces errors when reporting on data that changes over time. | ||
|
|
||
| :::::::::::::::::::::::::::::::::::::::::::::::::: | ||
A small addition could be a reference to a Markdown guide (e.g. https://www.markdownguide.org/) for further information, as there doesn't appear to be anything Carpentries-related we could link to.