'parquet_colinfo' returns the column information for the parquet file generated from a given output file type.

Usage

parquet_colinfo(data_type)

Arguments

data_type

Single string: a value found in the 'data_type' column of output_file_types(), which also appears as part of the name of a file in the repo https://huggingface.co/datasets/waldronlab/metagenomics_mac or https://huggingface.co/datasets/waldronlab/metagenomics_mac_examples.

Value

Data frame with columns 'general_data_type', 'col_name', 'col_class', 'description', 'se_role', and 'ref_file'

Examples

parquet_colinfo("viral_clusters")
#>    general_data_type                 col_name col_class
#> 1     viral_clusters          m_group_cluster character
#> 2     viral_clusters              genome_name character
#> 3     viral_clusters                   length       int
#> 4     viral_clusters      breadth_of_coverage     float
#> 5     viral_clusters   depth_of_coverage_mean     float
#> 6     viral_clusters depth_of_coverage_median     float
#> 7     viral_clusters         m_group_type_k_u character
#> 8     viral_clusters  first_genome_in_cluster character
#> 9     viral_clusters            other_genomes character
#> 10    viral_clusters                     uuid character
#> 11    viral_clusters               db_version character
#> 12    viral_clusters                  command character
#> 13    viral_clusters         metaphlan_header character
#> 14    viral_clusters         original_columns character
#>                                                                     description
#> 1                                     The marker gene group or cluster detected
#> 2                                 The database identifier of the matched genome
#> 3                          Length of the genome (in base pairs) in the database
#> 4                                 The proportion of the genome covered by reads
#> 5                                The average sequencing depth across the genome
#> 6                                 The median sequencing depth across the genome
#> 7        Whether the matched genome has known (kVSG) or unknown (uVSG) taxonomy
#> 8  The first genome in the matched cluster, if that cluster's taxonomy is known
#> 9              Other genomes in the same cluster that share the same marker set
#> 10                                                                  Sample UUID
#> 11                                     MetaPhlAn database version(s) referenced
#> 12                                                      MetaPhlAn command given
#> 13                                                MetaPhlAn's custom header row
#> 14                                              Original MetaPhlAn column names
#>    se_role        ref_file
#> 1    rdata            <NA>
#> 2    rname genome_name_ref
#> 3    rdata            <NA>
#> 4    assay            <NA>
#> 5    assay            <NA>
#> 6    assay            <NA>
#> 7    rdata            <NA>
#> 8    rdata            <NA>
#> 9    rdata            <NA>
#> 10   cname            <NA>
#> 11   cdata            <NA>
#> 12   cdata            <NA>
#> 13   cdata            <NA>
#> 14   cdata            <NA>
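
Because the result is an ordinary data frame, it can be queried with base R. The sketch below groups column names by their 'se_role' value. To stay runnable without the package installed, it uses a small hand-built stand-in (the hypothetical object `colinfo`) that copies a few rows from the output above; in practice `colinfo` would presumably be the return value of parquet_colinfo("viral_clusters").

```r
# Hypothetical stand-in for part of parquet_colinfo("viral_clusters"),
# copied by hand from the example output so the sketch is self-contained.
colinfo <- data.frame(
  col_name = c("genome_name", "length", "breadth_of_coverage",
               "depth_of_coverage_mean", "uuid", "db_version"),
  col_class = c("character", "int", "float", "float",
                "character", "character"),
  se_role = c("rname", "rdata", "assay", "assay", "cname", "cdata"),
  stringsAsFactors = FALSE
)

# Group the column names by the value in 'se_role'.
cols_by_role <- split(colinfo$col_name, colinfo$se_role)

# Pull out the columns whose 'se_role' is "assay".
assay_cols <- cols_by_role[["assay"]]
print(assay_cols)
```

The same pattern applies to any other 'se_role' value shown in the output, for example `cols_by_role[["cdata"]]`.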