Resources for Understanding and Utilizing High Throughput Genomic Data

The major advances in sequencing technologies made over the past two decades have produced enormous amounts of sequencing and other data from a variety of biological sources. These new datasets make it possible to address a wide range of research questions that previously could not be answered. However, many of these new research directions require a deep understanding of the new technologies and their applications, novel adaptations of existing tools, or the development of entirely new tools. As we explore this new research space and develop tools and resources for our own program of research, we share them with the broader research community so that others may benefit from using them in their own projects.

Torch-eCpG

Torch-eCpG is a fast, reliable, and scalable computational tool for expression quantitative trait methylation (eQTM) mapping, which identifies expression-associated CpG loci (eCpGs). A manuscript describing the tool, "Torch-eCpG: A fast and scalable eQTM mapper for thousands of molecular phenotypes with graphical processing units", was published in BMC Bioinformatics. The software and usage instructions are available online at: https://github.com/kordk/torch-ecpg
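To illustrate what eQTM mapping computes, here is a minimal pure-Python sketch that correlates every CpG's methylation levels with every gene's expression levels across the same samples and keeps strongly associated pairs. It is not Torch-eCpG's implementation, which is built for GPUs and has its own models, filters, and significance thresholds. The function names, the |r| cutoff standing in for a proper p-value threshold, and the toy data are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length sample vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant vector: correlation undefined, treated as no association
    r = sxy / math.sqrt(sxx * syy)
    return max(-1.0, min(1.0, r))  # clamp floating-point overshoot past +/-1


def eqtm_scan(
    methylation: Dict[str, List[float]],
    expression: Dict[str, List[float]],
    min_abs_r: float = 0.5,  # hypothetical cutoff, stands in for a significance test
) -> List[Tuple[str, str, float, float]]:
    """Test every CpG against every gene; keep pairs with |r| >= min_abs_r.

    Returns (cpg_id, gene_id, r, t) tuples, where t is the t-statistic for
    H0: r = 0 with n - 2 degrees of freedom.
    """
    hits = []
    for cpg, m in methylation.items():
        for gene, e in expression.items():
            n = len(m)
            r = pearson_r(m, e)
            if abs(r) < min_abs_r:
                continue
            if abs(r) == 1.0:
                t = math.copysign(math.inf, r)  # perfect correlation
            else:
                t = r * math.sqrt((n - 2) / (1.0 - r * r))
            hits.append((cpg, gene, r, t))
    return hits


# Hypothetical toy data: 5 samples, 2 CpGs (methylation beta values), 1 gene.
methylation = {"cg_A": [0.1, 0.2, 0.3, 0.4, 0.5], "cg_B": [0.5, 0.1, 0.4, 0.2, 0.3]}
expression = {"GENE_1": [10.0, 8.0, 6.0, 4.0, 2.0]}
hits = eqtm_scan(methylation, expression)
for cpg, gene, r, t in hits:
    print(cpg, gene, round(r, 3))  # only cg_A passes the |r| >= 0.5 filter
```

The number of tests grows as (number of CpGs) x (number of genes), which is why fast, vectorized implementations matter once this all-pairs scan is run at genome-wide scale.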

shinyGAStool

ShinyGAStool is an open-source software program, developed with Thomas Hoffman, that enables users to perform candidate gene association analyses on large datasets through an easy-to-use interface. Its four-step workflow lets the user access genome-wide datasets, incorporate metadata (e.g., phenotypic data), select the genes and SNPs to evaluate and identify covariates, and perform the regression analysis. ShinyGAStool is implemented as a Shiny application in the R programming language. The tool is described in our manuscript, "ShinyGAStool: A user-friendly tool for candidate gene association studies." The software and usage instructions are publicly available online at: https://github.com/kordk/shinyGAStool
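As a rough illustration of the kind of regression a candidate gene association analysis runs for each SNP, the sketch below fits an ordinary least squares model of a continuous phenotype on additive genotype coding (0, 1, or 2 minor alleles) plus covariates, by solving the normal equations. It is written in Python for illustration only and is not shinyGAStool's R code; the model form, function names, and toy data are assumptions, and a real analysis would also report standard errors and p-values (and might use logistic regression for binary phenotypes).

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0:
            raise ValueError("singular design matrix (collinear predictors?)")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_additive_snp_model(
    phenotype: Sequence[float],
    genotype: Sequence[int],
    covariates: Sequence[Sequence[float]],
) -> List[float]:
    """Hypothetical OLS fit of phenotype ~ 1 + genotype + covariates.

    Returns [intercept, snp_effect, covariate_1_effect, ...] from X'X b = X'y.
    """
    rows = [[1.0, float(g)] + [float(v) for v in cov] for g, cov in zip(genotype, covariates)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(k)]
    return solve_linear(xtx, xty)


# Hypothetical toy data: one SNP, age as the only covariate, noise-free phenotype.
genotype = [0, 1, 2, 0, 1, 2]
age = [30.0, 45.0, 50.0, 40.0, 35.0, 60.0]
phenotype = [1.0 + 2.0 * g + 0.5 * a for g, a in zip(genotype, age)]
coef = fit_additive_snp_model(phenotype, genotype, [[a] for a in age])
print([round(c, 6) for c in coef])
```

Because the toy phenotype was generated without noise, the fitted coefficients should recover the generating values (intercept 1, SNP effect 2, age effect 0.5) up to floating-point error.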

stoch_epi_lib

Understanding the role of epigenetic changes in gene expression is a fundamental question in molecular biology. Predicting gene expression levels from epigenetic data has tremendous research and clinical potential. Mentored by Eric Mjolsness and Timothy Downing at UC Irvine, Jim Brunner (Mayo Clinic), Jacob Kim (Columbia University), and I developed stoch_epi_lib, a novel stochastic dynamical systems model that predicts gene expression levels from the methylation data of genes in a given gene regulatory network. A preprint of the article is available on bioRxiv (Systems Biology) and arXiv (Molecular Networks). The software and usage instructions are publicly available online at: https://github.com/kordk/stoch_epi_lib

SARS-CoV-2 Genome Browser

In support of the research community’s response to the COVID-19 pandemic, from March through June 2020 Kiley Charbonneau and Maureen Lewis from the lab performed literature reviews and contributed annotations of genome-mappable human and viral data to the Crowd-Sourced Data track of the SARS-CoV-2 Genome Browser. Our efforts were acknowledged in the UCSC Genome Browser News item announcing the May 4, 2020 data release for the SARS-CoV-2 Genome Browser and in the associated manuscript. It was a small but potentially impactful contribution by non-clinicians in the UCSF School of Nursing's research community to support the herculean efforts of our clinician colleagues during the height of the pandemic.

Multiple Alignment of Reference and Short readS (MARSS)

Multiple Alignment of Reference and Short readS (MARSS) is a software tool, developed with Samantha Danison and Grant Pogson, that generates Multiple Species Consensus Alignments and quality control statistics from aligned reads for comparative genomics analyses of regions of a reference genome. It also implements a test to identify and score potential paralogs in whole-genome or transcriptome sequencing comparative genomics studies. MARSS was designed for high throughput and can be run in high-performance computing environments for comparative genome analyses using next-generation short-read technology. The software is available online at: https://github.com/kordk/marss

Methodological Approaches

C Harris, C Miaskowski, A Dhruva, J Cataldo, K Kober. Multi-Staged Data-Integrated Multi-Omics Analysis for Symptom Science Research. Biological Research for Nursing. 2021. PMID: 33827270.

KP Singh, C Miaskowski, A Dhruva, E Flowers, KM Kober. Mechanisms and Measurement of Changes in Gene Expression. Biological Research for Nursing. 2018. PMCID: PMC6346310.