Buyers Speak

Technology trends in DNA sequencing

August 2020

Next-generation sequencing (NGS) has emerged as a powerful technology for unlocking the genomic underpinnings of biology. Researchers are harnessing high-throughput DNA sequencing to address an increasingly diverse range of biological and anthropological problems, spanning genome variation, expression dynamics, the epigenetic landscape, evolutionary trends, and interactions between proteins and nucleic acids. In applied fields, NGS can accelerate the early detection of pathogens and genetic disorders, as well as the identification of pharmacogenetic markers for tailoring treatments.

Recognizing the value of DNA sequencing ever since first-generation Sanger sequencing, technologists went on to develop second- and third-generation sequencing technologies. These advances have not only reduced the cost of sequencing but have also made it possible to generate massive amounts of data in a short time. From the mid-2000s onward, a number of second-generation platforms, such as Roche 454 (pyrosequencing), Illumina (HiSeq), and Thermo Fisher Scientific (Ion Torrent), were developed and refined. Currently, the HiSeq X Ten, a set of ten HiSeq X instruments each churning out 1.6–1.8 Tb per run, has become the undisputed powerhouse (flagship) of the Illumina platform. In principle, the Illumina platform combines bridge amplification with sequencing by synthesis (SBS). SBS is a widely adopted NGS technology, responsible for generating more than 90 percent of the world's sequencing data.
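To give the quoted per-run output a practical scale, the arithmetic below converts it into an approximate number of human genomes per run. This is a minimal sketch, not a vendor figure: it assumes a haploid human genome of about 3.1 Gb and a target depth of 30x (both assumptions chosen for illustration), and it treats 1.6–1.8 Tb as the usable output of a single HiSeq X run. The result is an idealized upper bound that ignores reads lost to quality filtering, duplication, and mapping.

```python
# Idealized throughput arithmetic: how many human genomes at a target depth
# could one run's output cover? The two constants below are assumptions.
HUMAN_GENOME_BASES = 3_100_000_000   # ~3.1 Gb haploid human genome (approximate)
TARGET_COVERAGE = 30                 # assumed whole-genome sequencing depth


def genomes_per_run(run_output_bases: int,
                    genome_bases: int = HUMAN_GENOME_BASES,
                    coverage: int = TARGET_COVERAGE) -> int:
    """Upper bound on genomes per run: output // (genome size x coverage).

    Ignores reads lost to quality filtering, duplicates, and mapping.
    """
    return run_output_bases // (genome_bases * coverage)


# Per-run output range quoted in the text, written as integer base counts.
for run_output in (1_600_000_000_000, 1_800_000_000_000):
    print(f"{run_output / 1e12:.1f} Tb per run -> at most "
          f"{genomes_per_run(run_output)} genomes at {TARGET_COVERAGE}x")
```

Integer division is used deliberately: a partially covered genome does not count as a usable 30x genome.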

Challenges in NGS technologies
Long repetitive regions in genomes make complete sequencing difficult. Short reads produced by different NGS methods, even when combined with computational approaches designed to overcome this challenge, often yield inaccurate assemblies, and repetitive sequences likewise complicate de novo genome assembly. Signal quality deteriorates as read length grows, so long DNA molecules must be broken into small fragments to preserve read quality. Second-generation sequencers therefore impose a trade-off between quality and quantity (read length).

To overcome these inherent challenges in genomics, which short-read platforms such as Illumina could not address, third-generation long-read sequencing from Pacific Biosciences (PacBio) was introduced. PacBio uses single-molecule real-time (SMRT) sequencing, which harnesses the natural process of DNA replication and enables real-time observation of DNA synthesis. SMRT sequencing is built on two key innovations: zero-mode waveguides (ZMWs) and phospholinked nucleotides. It suits a variety of research applications and offers many benefits, including the longest average read lengths, the highest consensus accuracy, and uniform coverage. More recently, addressing the problems of large instrument size and high sequencing cost, Oxford Nanopore Technologies has introduced a cost-effective, small instrument about the size of a flash drive. However, the sensitivity of the pore-forming proteins in these systems to local environmental stress strongly affects the longevity of the units.
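Why repeats defeat short reads can be made concrete with a toy model. The sketch below assumes idealized, error-free reads taken at every position of the genome (its k-mer spectrum) and builds two hypothetical genomes that differ only in the order of two unique segments, each flanked by copies of a repeat `R`. All sequences are made up for illustration. Reads no longer than |R| + 1 cannot tell the two genomes apart, because no read spans a whole repeat copy plus a unique base on each side; reads of length |R| + 2 can.

```python
from collections import Counter


def kmer_spectrum(seq: str, k: int) -> Counter:
    """Multiset of all length-k substrings: an idealized, error-free read set."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical toy genomes: three copies of repeat R separate unique segments.
R = "ACGTACG"
a, b, c, d = "TTT", "CCC", "GGG", "AAA"
genome_1 = a + R + b + R + c + R + d   # unique segments in order b, c
genome_2 = a + R + c + R + b + R + d   # same pieces, order c, b

for k in (5, len(R) + 1, len(R) + 2):
    same = kmer_spectrum(genome_1, k) == kmer_spectrum(genome_2, k)
    print(f"read length {k:2d}: genomes indistinguishable = {same}")
```

An assembler given only the shorter reads has no evidence to prefer one arrangement over the other, which is the core reason long reads help resolve repetitive regions.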

In summary, Illumina HiSeq and PacBio are currently the most preferred platforms for DNA sequencing, but each has pros and cons that future upgrades need to address. Overall, to minimize the trade-off between quality and quantity, the Illumina HiSeq platform is preferred, because PacBio systems have a higher raw error rate. PacBio platforms are also much more expensive to purchase and operate than the low-cost, highly accurate Illumina platform. It is therefore advisable to purchase the Illumina HiSeq X Ten for large-scale commercial use, or the HiSeq 2500/2000 for research organizations. Given the option of buying two machines without budget constraints, the choice should be the Illumina HiSeq 2500 (short reads, up to 150 bp) and the PacBio Sequel II (long reads, up to 10 kb).
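A higher raw error rate and a high consensus accuracy are not contradictory. In SMRT sequencing, the polymerase can pass over the same circularized molecule several times, and a consensus of those passes averages out errors that are largely random. A minimal sketch, assuming independent errors at each pass, a purely illustrative raw error rate of 12 percent, and simple majority voting at a single base (real consensus algorithms are considerably more sophisticated):

```python
from math import comb


def majority_vote_error(p_error: float, passes: int) -> float:
    """Probability that the correct base is NOT a strict majority among
    `passes` independent reads of the same position.

    Pessimistic simplification: any non-majority counts as a consensus error.
    """
    p_correct = 1.0 - p_error
    need = passes // 2 + 1  # smallest strict majority
    p_majority_correct = sum(
        comb(passes, j) * p_correct**j * p_error**(passes - j)
        for j in range(need, passes + 1)
    )
    return 1.0 - p_majority_correct


RAW_ERROR = 0.12  # assumed, purely illustrative per-pass error rate
for n in (1, 3, 5, 9, 15):  # odd pass counts avoid ties
    print(f"{n:2d} passes -> consensus error ~ {majority_vote_error(RAW_ERROR, n):.2e}")
```

The independence assumption is what makes consensus effective here; systematic (non-random) errors would not shrink this way with more passes.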

Recent developments in NGS methods and utility
Although genomics is a relatively young field, researchers across the globe are steadily exploring its diverse and insightful applications. Advances in sequencing technologies have brought a surge in -omics terminology and application-based studies, including transcriptomics, haplomics, epigenomics, metagenomics, and nutrigenomics. Over the past decade, bulk sequencing of particular cells or tissues has revealed a great deal about their underlying morphology and physiology. However, increasing evidence suggests that genome integrity and gene expression can be heterogeneous even among cells of the same type, and that this stochastic behavior reflects cell-type composition and can also trigger cell-fate decisions.

To resolve this cell-to-cell variability, single-cell sequencing has emerged. Although the method was first demonstrated in 2009, it has gained real momentum only in the last five years, with studies spanning organisms from humans to plants. This explosive growth is mainly attributed to simultaneous improvements in low-cost sequencing and cell-isolation methods, equally supported by evolving computational algorithms that extract meaningful patterns and insights. As sequencing-based research expands, several derivative approaches have appeared, such as single-cell expression profiling (gene regulation at cell-level resolution), single-cell ATAC-seq (assessing chromatin-state dynamics), and single-cell CNV analysis (detecting copy number variants). Moreover, the single-cell multi-omics technique scCOOL-seq allows simultaneous analysis of chromatin state/nucleosome positioning, copy number variation, ploidy, and DNA methylation in the same cell, offering an integrated view of the cell state. Medical research is clearly and rapidly embracing single-cell approaches, whereas plant biology lags somewhat owing to technical challenges in isolating specific cells. Microfluidic technology has helped here: microfluidic single-cell isolation has gained popularity because it consumes little sample, keeps analysis costs low, and enables precise fluid control for better resolution.

Overall, we believe that the ultimate sequencing platform would work on single DNA or RNA molecules without any (pre-)amplification and without optical detection steps. It would provide reads of Mb to Gb in length with high accuracy and no GC bias. It should be flexible enough to generate as many reads as the research question at hand requires. In addition, it should be cheap to acquire and run, easy to operate, and fast, and it should involve simple or no library preparation.

Key market trends impacting future growth
It would not be an exaggeration to say that future global supremacy will belong to the countries with big data assets. According to one recent report, the NGS market is forecast to grow at a compound annual growth rate of 19.6 percent from 2019 to 2026. Continuous technological advancement in NGS platforms is the primary driver of this growth, and falling costs are another contributing factor. The Asia-Pacific region is the fastest-growing market for NGS services, whereas North America leads in accumulating NGS data, followed by European nations. Several large genome projects that will generate data of immense potential have commenced worldwide. The most ambitious is the internationally collaborative Earth BioGenome Project (EBP), a moonshot for biology that aims to sequence, catalog, and characterize the genomes of all of Earth's eukaryotic biodiversity over a period of 10 years. Scientists noted that of the ~2.3 million known species, fewer than 15,000, mostly microbes, have been completely or partially sequenced. Understanding the relationships among all life on Earth would therefore not be feasible without including all of them.
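To put the forecast growth rate in perspective, compounding it over the forecast window shows the implied overall expansion. A minimal sketch, assuming the 19.6 percent figure is a compound annual rate applied over seven yearly steps (2019 to 2026), so that size in 2026 = size in 2019 × (1 + r)^7:

```python
def growth_multiplier(annual_rate: float, years: int) -> float:
    """Compound growth factor after `years` yearly steps: (1 + r) ** n."""
    return (1.0 + annual_rate) ** years


RATE = 0.196          # 19.6 percent, assumed to be a compound annual rate
YEARS = 2026 - 2019   # seven yearly compounding steps

m = growth_multiplier(RATE, YEARS)
print(f"2026 market size ~ {m:.2f} x the 2019 size")  # roughly 3.5-fold
```

Under this assumption the market would be roughly three and a half times its 2019 size by 2026; a simple (non-compounded) reading of the same rate would give a noticeably smaller figure.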

The idea emerged in 2015, but the strategic blueprint for sequencing at the species level rather than the genus level was formulated at a global meeting in 2017. Unsurprisingly, Asia's leading scientific nations, such as India and China, are part of this global initiative. These two nations have also launched several large indigenous NGS projects to explore local biomes. One such project in China aims to collect DNA samples from 10 percent (70 million) of its male population, to catalog polymorphisms in short tandem repeats on the Y chromosome (Y-STRs) for use in forensic studies. Although huge volumes of sequencing data (~200 petabytes in the case of the EBP) are being acquired at an unprecedented rate, storage capacity is unlikely to be a serious problem, as commercial partners such as Amazon Web Services, Google, and others are backing these efforts. However, a shortage of skilled personnel to analyze the data, along with legal and ethical issues around data ownership, may hinder its value-based utilization.

In disease-associated genetic screening, NGS is highly useful for identifying monogenic diseases with locus heterogeneity, such as hereditary cancers and mitochondrial diseases. As the recent global spread of the SARS-CoV-2 virus, which led to the COVID-19 pandemic, illustrates, the segments of pathogen diagnostics, understanding susceptibility and resistance to infection, and genetic screening are also expected to grow very fast in the coming years. With such advances in disease genomics, the promising concept of genome-based precision medicine, which aims to provide customized treatment, is likely to move closer to realization.
