Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto, see https://quarto.org.
Running Code
When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:
Attaching package: 'kableExtra'
The following object is masked from 'package:dplyr':
group_rows
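The rendered notebook does not show the chunk that loads its packages, but the messages above show that dplyr was attached before kableExtra. Below is a minimal sketch of such a setup chunk. It assumes the tidyverse meta-package, which supplies read_csv(), %>%, and the dplyr verbs used below, plus knitr and kableExtra for the table styling. The notebook's actual setup may differ.

```r
# Assumed setup chunk (not shown in the rendered notebook).
library(tidyverse)   # read_csv(), %>%, select(), filter(), summarize(), count(), ...
library(knitr)       # kable()
library(kableExtra)  # kable_styling(); attaching it masks dplyr::group_rows()
```

Because of the masking, a bare group_rows() call now refers to kableExtra's function. Writing dplyr::group_rows() still reaches the dplyr version.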
# Read the penguins_samp1 data file from GitHub
penguins <- read_csv("https://raw.githubusercontent.com/mcduryea/Intro-to-Bioinformatics/main/data/penguins_samp1.csv")
Rows: 44 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): species, island, sex
dbl (5): bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, year
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
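The message suggests two ways to silence it: pass `show_col_types = FALSE`, or state the column types up front. Below is a sketch of the second option. To keep it runnable offline, it writes the two rows from the table below to a temporary file instead of downloading the real CSV. Reading `year` as an integer is our choice; readr's default guess reads it as `dbl`. The name `penguins_small` is illustrative.

```r
library(readr)

# Two rows copied from this document's output, written to a temporary CSV
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year",
  "Gentoo,Biscoe,59.6,17,230,6050,male,2007",
  "Gentoo,Biscoe,48.6,16,230,5800,male,2008"
), tmp)

# Declaring the types explicitly means read_csv() prints no column message
penguins_small <- read_csv(
  tmp,
  col_types = cols(
    species = col_character(),
    island  = col_character(),
    sex     = col_character(),
    year    = col_integer(),
    .default = col_double()  # every remaining column is numeric
  )
)
```

Explicit types also make the script fail loudly if a future version of the file changes a column's format, rather than silently guessing differently.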
# See the first two rows of the data we've read into our notebook
penguins %>% head(2) %>% kable() %>% kable_styling(c("striped", "hover"))
| species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
|---|---|---|---|---|---|---|---|
| Gentoo | Biscoe | 59.6 | 17 | 230 | 6050 | male | 2007 |
| Gentoo | Biscoe | 48.6 | 16 | 230 | 5800 | male | 2008 |
You can add options to executable code like this:
[1] 4
The echo: false option disables the printing of code (only output is displayed).
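Because `echo: false` hides the code, the chunk behind `[1] 4` is not visible here. Assuming this notebook kept the default Quarto template, that chunk looks roughly like the following. In the .qmd source the fence is written as ```` ```{r} ````. The `#|` line sets the chunk option, and R itself treats it as an ordinary comment.

```r
#| echo: false
2 * 2
```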
About our Data
The data we are working with is a data set on penguins, which includes 8 features measured on 44 penguins. The features include physical measurements (bill length, bill depth, flipper length, and body mass) as well as the year the penguin was observed, the island it was observed on, its sex, and its species.
Interesting Questions to Ask
What is the average flipper length? What about for each species?
Are there more male or female penguins? What about per island or species?
What is the average body mass? What about by island? By species? By sex?
What is the ratio of bill length to bill depth for a penguin? What is the overall average of this metric? Does it change by species, sex, or island?
Does average body mass change by year?
Data Manipulation Tools & Strategies
We can look at individual columns or subsets of columns in a data set. For example, if we are only interested in flipper length and species, we can select() just those columns.
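The notebook does not show the select() call itself. Here is a minimal sketch. So that it runs without downloading anything, it uses a few rows copied from output printed in this document rather than the full data, and the name `penguins_sample` is ours. With the real data you would start from `penguins` instead.

```r
library(dplyr)

# A few rows copied from this document's output (not the full 44-penguin data)
penguins_sample <- tibble(
  species = c("Gentoo", "Gentoo", "Adelie", "Adelie"),
  island = c("Biscoe", "Biscoe", "Torgersen", "Torgersen"),
  flipper_length_mm = c(230, 221, 199, 191),
  body_mass_g = c(6050, 6300, 4000, 3275)
)

# Keep only the two columns we care about, in the order listed
flipper_species <- penguins_sample %>% select(species, flipper_length_mm)
flipper_species
```

A minus sign drops columns instead. For example, `select(-island)` keeps everything except `island`.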
If we want to filter() and only show certain rows, we can do that too.
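# We can filter by a categorical variable such as species.
# Matching is case-sensitive: the data store this species as "Chinstrap",
# so the lowercase "chinstrap" below matches no rows.
penguins %>% filter(species == "chinstrap")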
# A tibble: 0 × 8
# … with 8 variables: species <chr>, island <chr>, bill_length_mm <dbl>,
# bill_depth_mm <dbl>, flipper_length_mm <dbl>, body_mass_g <dbl>, sex <chr>,
# year <dbl>
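Filtering for `"chinstrap"` returned an empty tibble because `==` compares strings exactly, and the species is stored as `"Chinstrap"` (see the `count(species)` output later in this document). The sketch below shows the exact match failing and a case-insensitive alternative. The hand-copied sample contains no Chinstrap rows, so it uses Gentoo instead. The data frame name is illustrative.

```r
library(dplyr)

# Hand-copied rows from this document's output (a subset, not the full data)
penguins_sample <- tibble(
  species = c("Gentoo", "Gentoo", "Adelie"),
  island = c("Biscoe", "Biscoe", "Torgersen"),
  body_mass_g = c(6050, 5800, 4000)
)

# Exact, case-sensitive match: "gentoo" is not equal to "Gentoo"
exact <- penguins_sample %>% filter(species == "gentoo")

# Lower-case the column before comparing, so capitalization no longer matters
any_case <- penguins_sample %>% filter(tolower(species) == "gentoo")

nrow(exact)     # 0 rows
nrow(any_case)  # 2 rows
```

With the real data, `filter(species == "Chinstrap")` is the direct fix. `%in%` matches several categories at once, for example `filter(species %in% c("Adelie", "Chinstrap"))`.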
# We can filter by numerical variables
penguins %>% filter(body_mass_g >= 6000)
# A tibble: 2 × 8
species island bill_length_mm bill_depth_mm flipper_leng…¹ body_…² sex year
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 Gentoo Biscoe 59.6 17 230 6050 male 2007
2 Gentoo Biscoe 49.2 15.2 221 6300 male 2007
# … with abbreviated variable names ¹flipper_length_mm, ²body_mass_g
# We can also combine conditions; | keeps rows where either condition is TRUE
penguins %>% filter((body_mass_g >= 6000) | (island == "Torgersen"))
# A tibble: 7 × 8
species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 Gentoo Biscoe 59.6 17 230 6050 male 2007
2 Gentoo Biscoe 49.2 15.2 221 6300 male 2007
3 Adelie Torgersen 40.6 19 199 4000 male 2009
4 Adelie Torgersen 38.8 17.6 191 3275 fema… 2009
5 Adelie Torgersen 41.1 18.6 189 3325 male 2009
6 Adelie Torgersen 38.6 17 188 2900 fema… 2009
7 Adelie Torgersen 36.2 17.2 187 3150 fema… 2009
# … with abbreviated variable names ¹flipper_length_mm, ²body_mass_g
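The `|` above means OR, which is why lighter Torgersen penguins appear alongside the two heavy Gentoos. The sketch below contrasts it with `&` (AND) and with comma-separated conditions, which filter() also combines with AND. It uses a few rows copied from this document's output, and the names are illustrative.

```r
library(dplyr)

# Hand-copied rows from this document's output (a subset, not the full data)
penguins_sample <- tibble(
  species = c("Gentoo", "Gentoo", "Adelie", "Adelie"),
  island = c("Biscoe", "Biscoe", "Torgersen", "Torgersen"),
  body_mass_g = c(6050, 5800, 4000, 3275)
)

# OR: heavy penguins, or any penguin from Torgersen (3 of 4 rows)
either <- penguins_sample %>% filter(body_mass_g >= 6000 | island == "Torgersen")

# AND: heavy penguins that are also from Biscoe (1 row)
both <- penguins_sample %>% filter(body_mass_g >= 6000 & island == "Biscoe")

# Commas between conditions also mean AND, giving the same result as `both`
both_comma <- penguins_sample %>% filter(body_mass_g >= 6000, island == "Biscoe")
```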
Answering our Questions
Most of our questions involve summarizing data, and perhaps summarizing over groups. We can summarize data using the summarize() function, and group data using group_by().
Let’s find the average flipper length.
# Overall average flipper length
penguins %>% summarize(avg_flipper_length = mean(flipper_length_mm))
# A tibble: 1 × 1
avg_flipper_length
<dbl>
1 212.
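summarize() can compute several statistics in one call. The sketch below adds a row count and a standard deviation next to the mean. `na.rm = TRUE` is a precaution: mean() and sd() return `NA` if any value is missing. Whether the full data has missing flipper lengths is not shown here. The four flipper lengths come from rows printed in this document, and the names are illustrative.

```r
library(dplyr)

# Flipper lengths from rows printed in this document (a subset)
penguins_sample <- tibble(
  species = c("Gentoo", "Gentoo", "Adelie", "Adelie"),
  flipper_length_mm = c(230, 221, 199, 191)
)

flipper_summary <- penguins_sample %>%
  summarize(
    n = n(),                                             # number of rows
    avg_flipper_length = mean(flipper_length_mm, na.rm = TRUE),
    sd_flipper_length = sd(flipper_length_mm, na.rm = TRUE)
  )
flipper_summary  # avg_flipper_length is 210.25 for these four rows
```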
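# Single-species average (Gentoo only)
penguins %>% filter(species == "Gentoo") %>% summarize(avg_flipper_length = mean(flipper_length_mm))
# Average for each species
penguins %>% group_by(species) %>% summarize(avg_flipper_length = mean(flipper_length_mm))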
# A tibble: 3 × 2
species avg_flipper_length
<chr> <dbl>
1 Adelie 189.
2 Chinstrap 200
3 Gentoo 218.
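group_by() accepts more than one column, which answers questions such as "average body mass by species and sex". Below is a minimal sketch on rows copied from this document's output. Setting `.groups = "drop"` returns an ungrouped result, so later steps are not affected by leftover grouping. The names are illustrative.

```r
library(dplyr)

# Hand-copied rows from this document's output (a subset, not the full data)
penguins_sample <- tibble(
  species = c("Gentoo", "Gentoo", "Adelie", "Adelie", "Adelie", "Adelie"),
  sex = c("male", "male", "male", "female", "male", "female"),
  body_mass_g = c(6050, 6300, 4000, 3275, 3325, 2900)
)

# One row per species/sex combination present in the data
mass_by_species_sex <- penguins_sample %>%
  group_by(species, sex) %>%
  summarize(avg_body_mass_g = mean(body_mass_g), .groups = "drop")
mass_by_species_sex
```

Swapping `species` for `island` (or grouping by `year`) addresses the other body-mass questions listed earlier.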
How many of each species do we have?
penguins %>% count(species)
# A tibble: 3 × 2
species n
<chr> <int>
1 Adelie 9
2 Chinstrap 2
3 Gentoo 33
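count() is shorthand for grouping by a column and counting the rows in each group. The sketch below checks that equivalence on a tiny illustrative tibble and shows `sort = TRUE`, which lists the most common value first.

```r
library(dplyr)

# Illustrative species column (3 Gentoo, 1 Adelie)
penguins_sample <- tibble(species = c("Gentoo", "Gentoo", "Gentoo", "Adelie"))

# The shorthand...
via_count <- penguins_sample %>% count(species)

# ...and the longer form it stands for
via_summarize <- penguins_sample %>%
  group_by(species) %>%
  summarize(n = n())

# Most common species first
sorted <- penguins_sample %>% count(species, sort = TRUE)
```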
How many of each sex are there? What about by species?
penguins %>% count(sex)
# A tibble: 2 × 2
sex n
<chr> <int>
1 female 20
2 male 24
penguins %>% group_by(species) %>% count(sex)
# A tibble: 6 × 3
# Groups: species [3]
species sex n
<chr> <chr> <int>
1 Adelie female 6
2 Adelie male 3
3 Chinstrap female 1
4 Chinstrap male 1
5 Gentoo female 13
6 Gentoo male 20
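The output above is still grouped by species (`# Groups: species [3]`), because count() keeps the grouping of its input. Passing both columns to count() gives the same counts as an ungrouped tibble. Swapping in `island` answers the "per island" question. The sketch uses rows copied from this document's output, and the names are illustrative.

```r
library(dplyr)

# Hand-copied rows from this document's output (a subset, not the full data)
penguins_sample <- tibble(
  species = c("Gentoo", "Gentoo", "Adelie", "Adelie", "Adelie"),
  island = c("Biscoe", "Biscoe", "Torgersen", "Torgersen", "Torgersen"),
  sex = c("male", "male", "male", "female", "female")
)

# Counting by two columns directly: one row per combination, no grouping left over
by_species_sex <- penguins_sample %>% count(species, sex)

# The same pattern, per island
by_island_sex <- penguins_sample %>% count(island, sex)
```

If a grouped result is already in hand, ungroup() removes the grouping before further steps.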
We can use mutate() to add new columns to our data set.
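For example, the bill length to bill depth ratio from our list of questions can be added as a new column and then summarized like any other. Below is a minimal sketch on three rows copied from this document's output. The column name `bill_ratio` and the data frame names are our own.

```r
library(dplyr)

# Hand-copied rows from this document's output (a subset, not the full data)
penguins_sample <- tibble(
  species = c("Gentoo", "Gentoo", "Adelie"),
  bill_length_mm = c(59.6, 48.6, 40.6),
  bill_depth_mm = c(17, 16, 19)
)

# mutate() computes the ratio row by row and appends it as a new column
penguins_ratio <- penguins_sample %>%
  mutate(bill_ratio = bill_length_mm / bill_depth_mm)

# The new column can then be averaged per species with group_by() + summarize()
ratio_by_species <- penguins_ratio %>%
  group_by(species) %>%
  summarize(avg_bill_ratio = mean(bill_ratio))
```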