Piecewise Workflow

When to Use This Approach

The “piecewise” name refers to splitting data retrieval into separate steps (accessParquetData → custom dplyr operations → loadParquetData) rather than using the single-step returnSamples() wrapper.

Use a piecewise workflow when you need:

Direct SQL control: Write custom dplyr queries on the DuckDB connection before loading data
Multiple queries from one connection: Query different data types efficiently and without reconnecting, for example to test for an association between a species and a pathway
Complex filtering: Apply advanced dplyr operations beyond simple column matching
Memory-efficient workflows: Inspect the query with dplyr::show_query() before loading, and aggregate in DuckDB so that only the rows you need reach R
Debugging queries: Inspect the SQL generated at each step

If you just want to retrieve data quickly, use the First 15 Minutes or Full Workflow vignette instead. Those vignettes use the wrapper function returnSamples(), which handles the connection and loading steps for you.

How It Works

The piecewise workflow splits data retrieval into two steps:

accessParquetData() - Creates a DuckDB connection and sets up VIEW objects for parquet files (remote or local)
loadParquetData() - Queries those views with your filters and loads data as a TreeSummarizedExperiment

Between these steps, you can use dplyr to inspect, modify, and optimize queries before executing them.
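
The laziness that makes this possible can be sketched on a toy in-memory DuckDB table. The table name, column names, and values below are made up for illustration and are not part of the package; the point is only that dplyr verbs build SQL and nothing is executed until collect() is called.

```r
library(dplyr)

# Toy in-memory DuckDB database; names and values are illustrative only
con_demo <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
DBI::dbWriteTable(con_demo, "toy_abundance", data.frame(
    uuid = c("a", "a", "b", "b"),
    species = c("s__X", "s__Y", "s__X", "s__Y"),
    relative_abundance = c(0.5, 0.0, 1.5, 2.0)
))

# tbl() returns a lazy reference; filter() only builds SQL, nothing runs yet
lazy_query <- tbl(con_demo, "toy_abundance") %>%
    filter(species == "s__X")

# Inspect the generated SQL without executing it
show_query(lazy_query)

# collect() executes the query in DuckDB and returns a local tibble
result <- collect(lazy_query)

DBI::dbDisconnect(con_demo)
```

Only the two rows that satisfy the filter are transferred into R; the rest of the table never leaves DuckDB.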

Discovering Available Data

Before selecting files, explore what data types and reference tables are available:

available_data <- get_hf_parquet_urls(repo_name = "waldronlab/metagenomics_mac")

ref_info <- get_ref_info()

File Selection

Now that we know what’s available, we can select our files of interest. The data_type value corresponds to the parquet files we want to access.

We then run accessParquetData(), which creates the DuckDB connection, sets up a DuckDB “VIEW” object for each available file (whether remotely hosted or locally stored), and returns the connection object. If you are using a remote repo, the data_types argument can be left blank to access all available files, or restricted to the data types you need. DBI functions can then be used to see which views are available.

con <- accessParquetData(dbdir = ":memory:",
                        repo = "waldronlab/metagenomics_mac",
                        local_files = NULL,
                        data_types = "relative_abundance")
DBI::dbListTables(con)
#> [1] "relative_abundance_clade_name_species"
#> [2] "relative_abundance_uuid"

Supplying local files works the same way, except that the “data_types” argument need not be supplied:

local_files <- c("/path/to/relative_abundance_clade_name_species.parquet",
                "/path/to/relative_abundance_uuid.parquet")

con <- accessParquetData(dbdir = ":memory:",
                        repo = NULL,
                        local_files = local_files,
                        data_types = NULL)
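
To make the idea of a view over a parquet file concrete, here is a minimal sketch in plain DBI and DuckDB SQL. It is not necessarily how accessParquetData() works internally; the file, table, and view names are hypothetical, and the parquet file is a throwaway written to a temporary path.

```r
# Write a small parquet file with DuckDB itself, then expose it as a VIEW
con_demo <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
parquet_path <- gsub("\\\\", "/", tempfile(fileext = ".parquet"))

DBI::dbWriteTable(con_demo, "toy_source", data.frame(
    uuid = c("a", "b"),
    relative_abundance = c(0.1, 0.9)
))
DBI::dbExecute(con_demo, sprintf(
    "COPY toy_source TO '%s' (FORMAT PARQUET)", parquet_path))

# A view stores only the query; the file is read when the view is queried
DBI::dbExecute(con_demo, sprintf(
    "CREATE VIEW toy_view AS SELECT * FROM read_parquet('%s')", parquet_path))

tables <- DBI::dbListTables(con_demo)
view_cols <- colnames(DBI::dbGetQuery(con_demo, "SELECT * FROM toy_view LIMIT 0"))
view_rows <- DBI::dbGetQuery(con_demo, "SELECT count(*) AS n FROM toy_view")$n

DBI::dbDisconnect(con_demo)
unlink(parquet_path)
```

Because a view holds no data of its own, creating one is cheap even for large files; the cost is paid only when a query against it is executed.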

Selecting Samples

The sample selection process is the same as in the standard workflow, except that we pass only the UUID column to the loading function. Here we create the same table as in the standard workflow, then pull just the UUIDs. This step is optional, as you may want to get data across all samples for meta-analyses.

data("sampleMetadata", package = "parkinsonsMetagenomicData")
sample_table <- sampleMetadata |>
    filter(study_name == "ZhangM_2023") |>
    select(where(~ !any(is.na(.x))))

selected_uuids <- sample_table$uuid
selected_uuids
#>  [1] "0807eb2a-a15e-4647-8e19-2600d8fda378"
#>  [2] "e0fbb54f-0249-4917-a4d7-bd68acb89c62"
#>  [3] "25172837-2849-4db3-be91-d54d6a815d00"
#>  [4] "39ddb5e7-97f6-4d3c-812b-9653b03f99b3"
#>  [5] "7b152a7d-e244-4e2b-b924-7195c7ecfb10"
#>  [6] "dd30f93b-7999-47a4-93fb-21971b899939"
#>  [7] "1406666f-04a8-43c9-983b-4ed62fd6da4a"
#>  [8] "fe3de3ca-3a14-4bd8-ae1c-0dad69edc9cd"
#>  [9] "8707e374-5ddb-4220-8cbf-364b8b0e7be1"
#> [10] "22848a9c-66a6-4993-9058-cb6464edb42f"
#> [11] "08e2b754-78e2-4cb4-8ff2-95fd7b0ff44a"
#> [12] "9baef0b2-93d2-4a40-8082-d357c7f8156a"
#> [13] "09a9303d-d87d-4556-9672-04cbbcaf3d37"
#> [14] "ac9f3532-90d8-412c-9c80-491037f0bcc2"
#> [15] "eda61949-02dc-40ae-8dbe-bea2add85a52"
#> [16] "1f007260-be6c-4a21-800a-ad9c36129a0d"
#> [17] "e47a59bb-443a-405f-9c5d-02659d80e9e5"
#> [18] "b3eaf3ab-43ef-4830-ab6d-12bafed3c61e"
#> [19] "28f7352f-fe23-4003-93e1-41f4fedc6232"
#> [20] "b07e2362-5851-4181-ba9a-15d9109ee4dd"
#> [21] "677be4e3-722b-4e43-bd5a-36d8fbed6f86"
#> [22] "0c817272-f873-475f-a401-dfe46a679a9f"
#> [23] "7a3945d9-21bb-434a-9a4e-bfcdeb6194de"
#> [24] "56aa2ad5-007d-407c-a644-48aac1e9a8f0"

Selecting Features

Our feature table is treated similarly, in that we no longer provide the whole table. Instead, we construct our filtering arguments as a named list. Each element name is a column to filter by, and the element values are matched exactly. Suppose this is our feature table from the standard workflow:

clade_name_ref <- load_ref("clade_name_ref")
feature_table <- clade_name_ref %>%
    filter(grepl("Faecalibacterium", clade_name_genus)) %>%
    filter(!is.na(clade_name_species)) %>%
    filter(!is.na(clade_name_terminal))

We would set up the filters like so. These two columns are the minimum required to produce the same result as our prior example.

filter_values <- list(clade_name_species = unique(feature_table$clade_name_species),
                    clade_name_terminal = unique(feature_table$clade_name_terminal))

Here is also where we supply our UUIDs to specify samples.

filter_values[["uuid"]] <- selected_uuids
filter_values
#> $clade_name_species
#> [1] "s__Faecalibacterium_SGB15346"       "s__Faecalibacterium_prausnitzii"   
#> [3] "s__Faecalibacterium_sp_An122"       "s__Faecalibacterium_sp_CLA_AA_H233"
#> [5] "s__Faecalibacterium_sp_HTFF"       
#> 
#> $clade_name_terminal
#>  [1] "t__SGB15346" "t__SGB15316" "t__SGB15317" "t__SGB15318" "t__SGB15322"
#>  [6] "t__SGB15323" "t__SGB15332" "t__SGB15339" "t__SGB15342" "t__SGB15312"
#> [11] "t__SGB15315" "t__SGB15340"
#> 
#> $uuid
#>  [1] "0807eb2a-a15e-4647-8e19-2600d8fda378"
#>  [2] "e0fbb54f-0249-4917-a4d7-bd68acb89c62"
#>  [3] "25172837-2849-4db3-be91-d54d6a815d00"
#>  [4] "39ddb5e7-97f6-4d3c-812b-9653b03f99b3"
#>  [5] "7b152a7d-e244-4e2b-b924-7195c7ecfb10"
#>  [6] "dd30f93b-7999-47a4-93fb-21971b899939"
#>  [7] "1406666f-04a8-43c9-983b-4ed62fd6da4a"
#>  [8] "fe3de3ca-3a14-4bd8-ae1c-0dad69edc9cd"
#>  [9] "8707e374-5ddb-4220-8cbf-364b8b0e7be1"
#> [10] "22848a9c-66a6-4993-9058-cb6464edb42f"
#> [11] "08e2b754-78e2-4cb4-8ff2-95fd7b0ff44a"
#> [12] "9baef0b2-93d2-4a40-8082-d357c7f8156a"
#> [13] "09a9303d-d87d-4556-9672-04cbbcaf3d37"
#> [14] "ac9f3532-90d8-412c-9c80-491037f0bcc2"
#> [15] "eda61949-02dc-40ae-8dbe-bea2add85a52"
#> [16] "1f007260-be6c-4a21-800a-ad9c36129a0d"
#> [17] "e47a59bb-443a-405f-9c5d-02659d80e9e5"
#> [18] "b3eaf3ab-43ef-4830-ab6d-12bafed3c61e"
#> [19] "28f7352f-fe23-4003-93e1-41f4fedc6232"
#> [20] "b07e2362-5851-4181-ba9a-15d9109ee4dd"
#> [21] "677be4e3-722b-4e43-bd5a-36d8fbed6f86"
#> [22] "0c817272-f873-475f-a401-dfe46a679a9f"
#> [23] "7a3945d9-21bb-434a-9a4e-bfcdeb6194de"
#> [24] "56aa2ad5-007d-407c-a644-48aac1e9a8f0"
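
The meaning of such a named list can be sketched with a small helper applied to a toy DuckDB table. The helper apply_exact_filters() is hypothetical and is not the package's implementation (the SQL the package generates may be structured differently); it only illustrates the semantics, where each element becomes an exact-match condition on one column and all conditions must hold together.

```r
library(dplyr)

con_demo <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
DBI::dbWriteTable(con_demo, "toy_abundance", data.frame(
    uuid = c("a", "a", "b", "c"),
    species = c("s__X", "s__Y", "s__X", "s__X"),
    relative_abundance = c(0.5, 0.2, 1.5, 2.0)
))

# Element names are columns, element values are the allowed exact matches
toy_filters <- list(species = "s__X", uuid = c("a", "b"))

# Hypothetical helper: add one IN (...) condition per list element
apply_exact_filters <- function(lazy_tbl, filters) {
    for (col in names(filters)) {
        vals <- filters[[col]]
        lazy_tbl <- filter(lazy_tbl, .data[[col]] %in% !!vals)
    }
    lazy_tbl
}

filtered <- tbl(con_demo, "toy_abundance") %>%
    apply_exact_filters(toy_filters) %>%
    arrange(uuid) %>%
    collect()

DBI::dbDisconnect(con_demo)
```

Only rows that match every element of the list survive, here species s__X in samples a and b.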

Loading into R

We then pass our list of filtering arguments to loadParquetData(), along with the database connection object and the data type we are accessing, and receive a TreeSummarizedExperiment object. Loading the data may take some time, depending on the file type and queries.

basic_experiment <- loadParquetData(con = con,
                                    data_type = "relative_abundance",
                                    filter_values = filter_values,
                                    custom_view = NULL,
                                    include_empty_samples = TRUE,
                                    dry_run = FALSE)
basic_experiment
#> class: TreeSummarizedExperiment 
#> dim: 9 24 
#> metadata(0):
#> assays(1): relative_abundance
#> rownames(9):
#>   k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Faecalibacterium|s__Faecalibacterium_SGB15346|t__SGB15346
#>   k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Faecalibacterium|s__Faecalibacterium_prausnitzii|t__SGB15318
#>   ...
#>   k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Faecalibacterium|s__Faecalibacterium_prausnitzii|t__SGB15323
#>   k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Faecalibacterium|s__Faecalibacterium_sp_CLA_AA_H233|t__SGB15315
#> rowData names(19): clade_name clade_name_kingdom ...
#>   NCBI_tax_id_terminal additional_species
#> colnames(24): fe3de3ca-3a14-4bd8-ae1c-0dad69edc9cd
#>   39ddb5e7-97f6-4d3c-812b-9653b03f99b3 ...
#>   1406666f-04a8-43c9-983b-4ed62fd6da4a
#>   677be4e3-722b-4e43-bd5a-36d8fbed6f86
#> colData names(56): uuid db_version ...
#>   ZhangM_2023_uncurated_Sample.Name ZhangM_2023_uncurated_SRA.Study
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: NULL
#> rowTree: NULL
#> colLinks: NULL
#> colTree: NULL
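
Since TreeSummarizedExperiment extends SummarizedExperiment, the standard accessors should apply to the returned object. The sketch below uses a plain SummarizedExperiment with made-up values as a stand-in, so the feature names, sample names, and the condition column are all illustrative assumptions.

```r
library(SummarizedExperiment)

# Toy assay: 2 features (rows) by 3 samples (columns), values are made up
toy_assay <- matrix(
    c(0.5, 0, 1.5, 2, 0, 0.25),
    nrow = 2,
    dimnames = list(c("feature_1", "feature_2"),
                    c("sample_a", "sample_b", "sample_c"))
)
toy_se <- SummarizedExperiment(
    assays = list(relative_abundance = toy_assay),
    colData = S4Vectors::DataFrame(
        condition = c("case", "control", "case"),
        row.names = colnames(toy_assay)
    )
)

# Features-by-samples matrix for a named assay
abundance_matrix <- assay(toy_se, "relative_abundance")

# One row of metadata per sample
sample_info <- colData(toy_se)

# Subset columns (samples) using a colData column
case_only <- toy_se[, toy_se$condition == "case"]
```

The same assay(), colData(), and bracket-subsetting calls would be the natural starting point for exploring basic_experiment.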

Performing a dry run returns the query without loading any data, so the generated SQL can be examined:

query_only <- loadParquetData(con = con,
                                data_type = "relative_abundance",
                                filter_values = filter_values,
                                custom_view = NULL,
                                include_empty_samples = TRUE,
                                dry_run = TRUE)
dplyr::show_query(query_only)
#> <SQL>
#> SELECT q01.*
#> FROM (
#>   SELECT relative_abundance_clade_name_species.*
#>   FROM relative_abundance_clade_name_species
#>   WHERE (clade_name_species = 's__Faecalibacterium_SGB15346')
#> 
#>   UNION ALL
#> 
#>   SELECT relative_abundance_clade_name_species.*
#>   FROM relative_abundance_clade_name_species
#>   WHERE (clade_name_species = 's__Faecalibacterium_prausnitzii')
#> 
#>   UNION ALL
#> 
#>   SELECT relative_abundance_clade_name_species.*
#>   FROM relative_abundance_clade_name_species
#>   WHERE (clade_name_species = 's__Faecalibacterium_sp_An122')
#> 
#>   UNION ALL
#> 
#>   SELECT relative_abundance_clade_name_species.*
#>   FROM relative_abundance_clade_name_species
#>   WHERE (clade_name_species = 's__Faecalibacterium_sp_CLA_AA_H233')
#> 
#>   UNION ALL
#> 
#>   SELECT relative_abundance_clade_name_species.*
#>   FROM relative_abundance_clade_name_species
#>   WHERE (clade_name_species = 's__Faecalibacterium_sp_HTFF')
#> ) q01
#> WHERE
#>   (clade_name_terminal IN ('t__SGB15346', 't__SGB15316', 't__SGB15317', 't__SGB15318', 't__SGB15322', 't__SGB15323', 't__SGB15332', 't__SGB15339', 't__SGB15342', 't__SGB15312', 't__SGB15315', 't__SGB15340')) AND
#>   (uuid IN ('0807eb2a-a15e-4647-8e19-2600d8fda378', 'e0fbb54f-0249-4917-a4d7-bd68acb89c62', '25172837-2849-4db3-be91-d54d6a815d00', '39ddb5e7-97f6-4d3c-812b-9653b03f99b3', '7b152a7d-e244-4e2b-b924-7195c7ecfb10', 'dd30f93b-7999-47a4-93fb-21971b899939', '1406666f-04a8-43c9-983b-4ed62fd6da4a', 'fe3de3ca-3a14-4bd8-ae1c-0dad69edc9cd', '8707e374-5ddb-4220-8cbf-364b8b0e7be1', '22848a9c-66a6-4993-9058-cb6464edb42f', '08e2b754-78e2-4cb4-8ff2-95fd7b0ff44a', '9baef0b2-93d2-4a40-8082-d357c7f8156a', '09a9303d-d87d-4556-9672-04cbbcaf3d37', 'ac9f3532-90d8-412c-9c80-491037f0bcc2', 'eda61949-02dc-40ae-8dbe-bea2add85a52', '1f007260-be6c-4a21-800a-ad9c36129a0d', 'e47a59bb-443a-405f-9c5d-02659d80e9e5', 'b3eaf3ab-43ef-4830-ab6d-12bafed3c61e', '28f7352f-fe23-4003-93e1-41f4fedc6232', 'b07e2362-5851-4181-ba9a-15d9109ee4dd', '677be4e3-722b-4e43-bd5a-36d8fbed6f86', '0c817272-f873-475f-a401-dfe46a679a9f', '7a3945d9-21bb-434a-9a4e-bfcdeb6194de', '56aa2ad5-007d-407c-a644-48aac1e9a8f0'))
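
Beyond reading the SQL, a lazy query can also be sized before anything is loaded. The sketch below runs on a toy DuckDB table with made-up names and values; whether the same pattern applies directly to the object returned by a dry run is an assumption that depends on that object being an ordinary lazy dplyr table.

```r
library(dplyr)

con_demo <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
DBI::dbWriteTable(con_demo, "toy_abundance", data.frame(
    uuid = rep(c("a", "b", "c"), each = 2),
    species = rep(c("s__X", "s__Y"), times = 3),
    relative_abundance = c(0.5, 0, 1.5, 2, 0, 0.25)
))

lazy_query <- tbl(con_demo, "toy_abundance") %>%
    filter(uuid %in% c("a", "b"))

# count() is translated to a COUNT(*) query, so only one number reaches R
n_rows <- lazy_query %>%
    count() %>%
    pull(n) %>%
    as.numeric()

DBI::dbDisconnect(con_demo)
```

If the count is larger than expected, the filters can be tightened before paying the cost of a full load.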

Example 2: Pathway Abundance

The piecewise workflow works identically for functional data types. Here’s an example with pathway abundance:

# Connect to pathway data
con_pathways <- accessParquetData(dbdir = ":memory:",
                                   repo = "waldronlab/metagenomics_mac",
                                   data_types = "pathabundance_unstratified")

DBI::dbListTables(con_pathways)
#> [1] "pathabundance_unstratified_pathway" "pathabundance_unstratified_uuid"

# Load pathway reference
pathway_ref <- load_ref("pathway_ref")

# Filter for butyrate-related pathways (names containing "butanoate" or "butyrat")
butyrate_pathways <- pathway_ref |>
    filter(grepl("butanoate|butyrat", pathway, ignore.case = TRUE)) |>
    filter(!grepl("\\|", pathway)) |>  # Exclude stratified versions
    select(pathway)

# Set up filter values (same samples as before)
pathway_filter_values <- list(
    uuid = selected_uuids,
    pathway = butyrate_pathways$pathway
)

# Load pathway data
pathway_experiment <- loadParquetData(
    con = con_pathways,
    data_type = "pathabundance_unstratified",
    filter_values = pathway_filter_values,
    include_empty_samples = TRUE,
    dry_run = FALSE
)

pathway_experiment
#> class: TreeSummarizedExperiment 
#> dim: 8 24 
#> metadata(0):
#> assays(1): abundance
#> rownames(8): ARGDEG-PWY: superpathway of L-arginine, putrescine, and
#>   4-aminobutanoate degradation CENTFERM-PWY: pyruvate fermentation to
#>   butanoate ... PWY-5677: succinate fermentation to butanoate PWY-7218:
#>   photosynthetic 3-hydroxybutanoate biosynthesis (engineered)
#> rowData names(1): pathway
#> colnames(24): 56aa2ad5-007d-407c-a644-48aac1e9a8f0
#>   b3eaf3ab-43ef-4830-ab6d-12bafed3c61e ...
#>   0c817272-f873-475f-a401-dfe46a679a9f
#>   fe3de3ca-3a14-4bd8-ae1c-0dad69edc9cd
#> colData names(52): uuid humann_header ...
#>   ZhangM_2023_uncurated_Sample.Name ZhangM_2023_uncurated_SRA.Study
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: NULL
#> rowTree: NULL
#> colLinks: NULL
#> colTree: NULL

This demonstrates the flexibility of the piecewise approach: you can create multiple connections to different data types and query them with the same workflow.

Custom View: Advanced dplyr Operations

The real power of the piecewise workflow is using dplyr to manipulate DuckDB views before loading data. You can access any view with tbl() and apply dplyr operations.

Example 1: Filter on Additional Columns

# Access the view and add custom filtering
custom_filter <- tbl(con, "relative_abundance_uuid") %>%
    filter(!is.na(additional_species))

# Preview the data (lazy query, not executed yet)
custom_filter
#> # Source:   SQL [?? x 26]
#> # Database: DuckDB 1.5.2 [unknown@Linux 6.17.0-1010-azure:R 4.7.0/:memory:]
#>    clade_name_kingdom clade_name_phylum clade_name_class    clade_name_order    
#>    <chr>              <chr>             <chr>               <chr>               
#>  1 k__Bacteria        p__Firmicutes     c__Clostridia       o__Eubacteriales    
#>  2 k__Bacteria        p__Firmicutes     c__Erysipelotrichia o__Erysipelotrichal…
#>  3 k__Bacteria        p__Firmicutes     c__Clostridia       o__Eubacteriales    
#>  4 k__Bacteria        p__Bacteroidota   c__Bacteroidia      o__Bacteroidales    
#>  5 k__Bacteria        p__Firmicutes     c__Bacilli          o__Lactobacillales  
#>  6 k__Bacteria        p__Bacteroidota   c__Bacteroidia      o__Bacteroidales    
#>  7 k__Bacteria        p__Bacteroidota   c__Bacteroidia      o__Bacteroidales    
#>  8 k__Bacteria        p__Firmicutes     c__Clostridia       o__Eubacteriales    
#>  9 k__Bacteria        p__Firmicutes     c__Clostridia       o__Eubacteriales    
#> 10 k__Bacteria        p__Firmicutes     c__Clostridia       o__Eubacteriales    
#> # ℹ more rows
#> # ℹ 22 more variables: clade_name_family <chr>, clade_name_genus <chr>,
#> #   clade_name_species <chr>, clade_name_terminal <chr>,
#> #   NCBI_tax_id_kingdom <chr>, NCBI_tax_id_phylum <chr>,
#> #   NCBI_tax_id_class <chr>, NCBI_tax_id_order <chr>, NCBI_tax_id_family <chr>,
#> #   NCBI_tax_id_genus <chr>, NCBI_tax_id_species <chr>,
#> #   NCBI_tax_id_terminal <chr>, clade_name <chr>, NCBI_tax_id <chr>, …

View the generated SQL:

dplyr::show_query(custom_filter)
#> <SQL>
#> SELECT relative_abundance_uuid.*
#> FROM relative_abundance_uuid
#> WHERE (NOT((additional_species IS NULL)))

Load the data with the custom view:

custom_experiment <- loadParquetData(con = con,
                                    data_type = "relative_abundance",
                                    filter_values = filter_values,
                                    custom_view = custom_filter,
                                    include_empty_samples = TRUE,
                                    dry_run = FALSE)
custom_experiment
#> class: TreeSummarizedExperiment 
#> dim: 7 24 
#> metadata(0):
#> assays(1): relative_abundance
#> rownames(7):
#>   k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Faecalibacterium|s__Faecalibacterium_prausnitzii|t__SGB15317
#>   k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Faecalibacterium|s__Faecalibacterium_prausnitzii|t__SGB15318
#>   ...
#>   k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Faecalibacterium|s__Faecalibacterium_prausnitzii|t__SGB15342
#>   k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Faecalibacterium|s__Faecalibacterium_sp_CLA_AA_H233|t__SGB15315
#> rowData names(19): clade_name clade_name_kingdom ...
#>   NCBI_tax_id_terminal additional_species
#> colnames(24): 0c817272-f873-475f-a401-dfe46a679a9f
#>   677be4e3-722b-4e43-bd5a-36d8fbed6f86 ...
#>   b3eaf3ab-43ef-4830-ab6d-12bafed3c61e
#>   fe3de3ca-3a14-4bd8-ae1c-0dad69edc9cd
#> colData names(56): uuid db_version ...
#>   ZhangM_2023_uncurated_Sample.Name ZhangM_2023_uncurated_SRA.Study
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: NULL
#> rowTree: NULL
#> colLinks: NULL
#> colTree: NULL

Example 2: Aggregate Before Loading

You can perform aggregations in DuckDB before bringing data into R. This is memory-efficient for large datasets:

# Calculate summary statistics for Akkermansia muciniphila across all samples
# First check what Akkermansia species are available
akk_species <- load_ref("clade_name_ref") %>%
    filter(grepl("Akkermansia", clade_name_genus)) %>%
    pull(clade_name_species) %>%
    unique()

akk_species  # Should include "s__Akkermansia_muciniphila" if available

# Aggregate abundance data in DuckDB before loading into R
# Species values carry the "s__" prefix, so the filter must include it
aggregated_view <- tbl(con, "relative_abundance_uuid") %>%
    filter(clade_name_species == "s__Akkermansia_muciniphila") %>%
    group_by(clade_name_species) %>%
    summarize(
        mean_abundance = mean(relative_abundance, na.rm = TRUE),
        median_abundance = median(relative_abundance, na.rm = TRUE),
        max_abundance = max(relative_abundance, na.rm = TRUE),
        n_samples_detected = sum(relative_abundance > 0, na.rm = TRUE),
        .groups = "drop"
    )

# Show the SQL that will be executed
show_query(aggregated_view)

# Execute and collect the aggregated result (small, just summary stats)
aggregated_data <- collect(aggregated_view)
aggregated_data

This aggregation happens in DuckDB before loading into R, making it much more memory-efficient than loading all individual sample abundances first and then aggregating in R.

Example 3: Combine Data from Multiple Connections

You can query different data types separately and combine them in R. This example pairs taxonomic and pathway data:

# Get species present in our samples (aggregate to species level)
# Aggregating here avoids duplicate rows from subspecies/strains
taxa_data <- tbl(con, "relative_abundance_uuid") %>%
    filter(uuid %in% selected_uuids) %>%
    filter(clade_name_species %in% !!unique(feature_table$clade_name_species)) %>%
    group_by(uuid, clade_name_species) %>%
    summarize(relative_abundance = sum(relative_abundance, na.rm = TRUE), .groups = "drop") %>%
    collect()

# Get butyrate pathway data for same samples
pathway_data <- tbl(con_pathways, "pathabundance_unstratified_uuid") %>%
    filter(uuid %in% selected_uuids) %>%
    filter(grepl("butanoate|butyrat", pathway, ignore.case = TRUE)) %>%
    filter(!grepl("\\|", pathway)) %>%  # Exclude stratified
    group_by(uuid) %>%
    summarize(total_butyrate_pathways = sum(abundance, na.rm = TRUE), .groups = "drop") %>%
    collect()

# Join in R on uuid (sample ID)
joined_data <- taxa_data %>%
    inner_join(pathway_data, by = "uuid") %>%
    arrange(desc(relative_abundance))

This example queries two data types separately, aggregating in DuckDB before loading, and then combines the results in R. The joined data pairs each sample’s abundance of each Faecalibacterium species with that sample’s total butyrate pathway abundance, enabling correlation analysis to test hypotheses like: “Do samples with higher Faecalibacterium also have higher butyrate biosynthesis capacity?”
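
As a rough sketch of such a correlation analysis, the snippet below computes Spearman's rank correlation on a toy data frame. The values are made up and stand in for the rows of a single species; with the real joined table you would first restrict to one species, or sum abundance per sample, so that each sample contributes one point.

```r
# Made-up values standing in for one species' rows of a joined table
toy_joined <- data.frame(
    uuid = c("a", "b", "c", "d", "e"),
    relative_abundance = c(0.1, 0.4, 0.9, 1.6, 2.5),
    total_butyrate_pathways = c(10, 12, 30, 28, 55)
)

# Rank-based correlation, which does not assume a linear relationship
rho <- cor(toy_joined$relative_abundance,
           toy_joined$total_butyrate_pathways,
           method = "spearman")
rho
#> [1] 0.9
```

For a p-value alongside the estimate, cor.test() with method = "spearman" accepts the same two vectors.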

sessionInfo()
#> R Under development (unstable) (2026-04-12 r89873)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] DT_0.34.0                        DBI_1.3.0                       
#> [3] dplyr_1.2.1                      parkinsonsMetagenomicData_0.99.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] SummarizedExperiment_1.41.1     httr2_1.2.2                    
#>  [3] xfun_0.57                       bslib_0.10.0                   
#>  [5] htmlwidgets_1.6.4               Biobase_2.71.0                 
#>  [7] lattice_0.22-9                  tzdb_0.5.0                     
#>  [9] crosstalk_1.2.2                 yulab.utils_0.2.4              
#> [11] vctrs_0.7.3                     tools_4.7.0                    
#> [13] generics_0.1.4                  curl_7.0.0                     
#> [15] stats4_4.7.0                    parallel_4.7.0                 
#> [17] tibble_3.3.1                    blob_1.3.0                     
#> [19] pkgconfig_2.0.3                 Matrix_1.7-5                   
#> [21] dbplyr_2.5.2                    desc_1.4.3                     
#> [23] S4Vectors_0.49.1-1              assertthat_0.2.1               
#> [25] lifecycle_1.0.5                 stringr_1.6.0                  
#> [27] compiler_4.7.0                  treeio_1.35.0                  
#> [29] textshaping_1.0.5               Biostrings_2.79.5              
#> [31] Seqinfo_1.1.0                   codetools_0.2-20               
#> [33] htmltools_0.5.9                 sass_0.4.10                    
#> [35] yaml_2.3.12                     lazyeval_0.2.3                 
#> [37] pkgdown_2.2.0                   pillar_1.11.1                  
#> [39] crayon_1.5.3                    jquerylib_0.1.4                
#> [41] tidyr_1.3.2                     BiocParallel_1.45.0            
#> [43] SingleCellExperiment_1.33.2     DelayedArray_0.37.1            
#> [45] cachem_1.1.0                    abind_1.4-8                    
#> [47] nlme_3.1-169                    tidyselect_1.2.1               
#> [49] digest_0.6.39                   stringi_1.8.7                  
#> [51] duckdb_1.5.2                    purrr_1.2.2                    
#> [53] arrow_23.0.1.2                  TreeSummarizedExperiment_2.19.0
#> [55] fastmap_1.2.0                   grid_4.7.0                     
#> [57] cli_3.6.6                       SparseArray_1.11.13            
#> [59] magrittr_2.0.5                  S4Arrays_1.11.1                
#> [61] utf8_1.2.6                      ape_5.8-1                      
#> [63] withr_3.0.2                     readr_2.2.0                    
#> [65] rappdirs_0.3.4                  bit64_4.6.0-1                  
#> [67] rmarkdown_2.31                  XVector_0.51.0                 
#> [69] matrixStats_1.5.0               bit_4.6.0                      
#> [71] otel_0.2.0                      hms_1.1.4                      
#> [73] ragg_1.5.2                      evaluate_1.0.5                 
#> [75] knitr_1.51                      GenomicRanges_1.63.2           
#> [77] IRanges_2.45.0                  rlang_1.2.0                    
#> [79] Rcpp_1.1.1-1                    glue_1.8.0                     
#> [81] tidytree_0.4.7                  BiocGenerics_0.57.0            
#> [83] vroom_1.7.1                     jsonlite_2.0.0                 
#> [85] R6_2.6.1                        MatrixGenerics_1.23.0          
#> [87] systemfonts_1.3.2               fs_2.0.1