Presentation Title

The Effect of Biological Properties and Sequencer Performance on De novo Assembly of Microbial Genomes

Format of Presentation

Poster to be presented Friday, March 31, 2017

Abstract

Current genome sequencing technologies rely on first physically or chemically fragmenting an organism’s genome into small, machine-readable fragments prior to shotgun, or bulk, sequencing. The sequenced fragments must then be assembled into the correct order, a computationally expensive task that requires bioinformatics assembly software driven by mathematical algorithms. Different algorithms have been developed for this task, but in general, reads are joined to one another by matching the sequences where they overlap. Groups of joined reads are called contigs; if insufficient data are available, contigs cannot be joined to reveal the true structure of each chromosome and plasmid in the original genome. When resequencing a genome, reference data can be provided to order sequencing reads, but for de novo sequencing of novel organisms, one relies heavily on the genome model built by the assembly software. The overarching goal of this project is to determine how the biological properties of genomes and sequencing performance affect genome assembly. Since the summer of 2016, I have sequenced a total of 15 bacterial genomes and one yeast genome. All draft genomes were assembled using SPAdes, a graph-based assembler, and statistical analyses were carried out to determine the effect of genome GC content, the presence of plasmids, predicted genome size, mean sequencing read length, and the amount of sequencing data available on both the number of contigs in draft assemblies and the size of the contig at which half of the genome was captured (N50). By understanding which factors affect the accuracy of genome assembly, we can use draft genome sequences with greater confidence.
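The two assembly outcomes named above (the number of contigs and the contig length at which half the assembly is captured, N50), together with GC content, can all be computed from a draft assembly's contigs file. The following is a minimal Python sketch, assuming the contigs are stored in FASTA format (as in the contigs file SPAdes writes); the function names `read_fasta`, `n50`, `gc_content` and `assembly_summary` are hypothetical and are not the analysis pipeline used in this project.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them and normalize case.
            # Lines appearing before the first header are ignored.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def n50(lengths: List[int]) -> int:
    """Walk contigs from longest to shortest; return the length of the contig
    at which the running total first reaches half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def gc_content(seqs: Iterable[str]) -> float:
    """Fraction of G + C among unambiguous A/C/G/T bases (N and other codes skipped)."""
    counts = {base: 0 for base in "ACGT"}
    for seq in seqs:
        for base in seq:
            if base in counts:
                counts[base] += 1
    acgt = sum(counts.values())
    return (counts["G"] + counts["C"]) / acgt if acgt else 0.0


def assembly_summary(fasta_text: str) -> Dict[str, float]:
    """Contig count, total length, N50 and GC fraction for one draft assembly."""
    contigs = read_fasta(fasta_text)
    lengths = [len(seq) for _, seq in contigs]
    return {
        "contigs": len(contigs),
        "total_length": sum(lengths),
        "n50": n50(lengths),
        "gc": gc_content(seq for _, seq in contigs),
    }


if __name__ == "__main__":
    # Toy assembly: four contigs of 8, 6, 4 and 2 bp (20 bp total).
    toy = ">c1\nGGGGCCCC\n>c2\nAAAATT\n>c3\nACGT\n>c4\nGA\n"
    print(assembly_summary(toy))
    # {'contigs': 4, 'total_length': 20, 'n50': 6, 'gc': 0.55}
```

In the toy example, the two longest contigs (8 + 6 = 14 bp) are the first to reach half of the 20 bp total, so N50 is 6 bp. Because this N50 is measured against total assembly length, it can differ from a value computed against the predicted genome size (often reported as NG50); which denominator suits the analysis is a design choice.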

Department

Biological Sciences

Faculty Advisors

Nancy Flood and Jonathan Van Hamme
