Whole-Genome CNV Analysis: A Brief Guide

2022-11-10

Copy number variations (CNVs) are genomic alterations that result in an abnormal number of copies of one or more genes. Structural genomic events such as duplications, deletions, translocations, and inversions can cause CNVs.

Like single-nucleotide polymorphisms (SNPs), particular CNVs have been associated with susceptibility to diseases such as cancer, inherited genetic disorders, autoimmune diseases, and others.

At Bionano Genomics, we equip clinical research labs with N_xClinical, which we believe may be the most comprehensive and up-to-date cytogenetics and molecular genetics solution. It’s one system for analyzing and interpreting all genomic variants from microarray and next-generation sequencing (NGS) data.

This guide briefly introduces whole-genome CNV analysis, how it works, and how labs are taking advantage of it today.

A Brief Introduction to NGS-Based Copy Number Analysis

The development of NGS technology has dramatically improved our ability to detect all types of genomic variation, from single-nucleotide variants (SNVs) to CNVs and other structural variations. Using NGS data for CNV analysis has gained considerable attention in recent years thanks to new technologies and better algorithms that enable the simultaneous detection of CNVs and SNVs.

Because NGS is now the most common and widely accepted method for high-throughput assessment of sequence variants, the ability to also obtain the CNV and loss-of-heterozygosity (LOH) status of a sample from the same NGS data is very appealing: it means a single workflow and reduced cost.

NGS-based CNV analysis techniques also enable labs to map the precise location of a variant (depending on the detection approach).

How NGS-Based CNV Calling and Analysis Works

There are four main methods of detecting CNVs with NGS data:

Read-Pair (RP)
Split-Read (SR)
Read-Depth (RD)
Assembly (AS)

Each of these four methods specializes in detecting a specific form or size range of CNV, and they differ in how accurately they locate breakpoints. None of these methodologies is perfect; each brings advantages and disadvantages. To address this, many labs combine different methods, such as read-depth with read-pair, or read-depth with split-read, to achieve a more holistic analysis.

As Dr. Fen Guo, Clinical Laboratory Director at PerkinElmer Genomics, notes, the utility of these methods often hinges on the quality of the NGS data available.

“There’s a general sense that some methods are better than others—for example, that the split-read method is superior for accurate breakpoint identification because of the nature of this methodology, while the read-depths can detect the dosages of CNVs and works better on a wide range of CNV sizes from small to large CNVs in the genome. But in addition to recognizing the inherent differences between these methods and what they’re capable of, so much depends on the quality of the data—the read depths, the coverage, and the data uniformity.”

Dr. Fen Guo, Ph.D., FACMG, FCCMG
Clinical Laboratory Director
PerkinElmer Genomics

To give a little more background and tease out some of these important nuances, we briefly summarize each NGS CNV calling method below.

Read-Pair

The read-pair methodology was the first to demonstrate the usefulness of NGS data for CNV detection.

It works by comparing the observed insert size of each sequenced read pair with the expected insert size. Labs using this method identify CNVs from discordant pairs, that is, mapped read pairs whose distance on the reference genome differs significantly from the predetermined average insert size.
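As a rough illustration of the read-pair idea, the Python sketch below flags pairs whose mapped distance lies more than a chosen number of standard deviations from the expected insert size. The function name, the threshold of three standard deviations, and the example values are assumptions made for illustration; real callers also use read orientation and require several supporting pairs before reporting an event.

```python
def flag_discordant_pairs(insert_sizes, expected_mean, expected_sd, n_sd=3.0):
    """Return (index, insert_size, label) for pairs far from the expected insert size.

    All names and thresholds here are illustrative, not taken from any specific tool.
    """
    flagged = []
    for i, size in enumerate(insert_sizes):
        z = (size - expected_mean) / expected_sd
        if z > n_sd:
            # Reads map farther apart than expected: sequence may be missing in the sample.
            flagged.append((i, size, "possible deletion"))
        elif z < -n_sd:
            # Reads map closer together than expected: extra sequence may be present.
            flagged.append((i, size, "possible insertion"))
    return flagged


# Hypothetical mapped distances (bp) for seven read pairs from a ~300 bp library.
sizes = [310, 295, 302, 1250, 298, 120, 305]
calls = flag_discordant_pairs(sizes, expected_mean=300, expected_sd=20)
for index, size, label in calls:
    print(index, size, label)
```

In practice, a single discordant pair is weak evidence; clustering many discordant pairs at the same locus is what makes a call credible.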

Split-Read

The split-read methodology uses reads from paired-end sequencing where only one read of the pair maps reliably, and the other either entirely or partially fails to map to the genome.

The unmapped or partially mapped reads are a potential source of breakpoints at single-base-pair resolution. However, this method has limited ability to identify large-scale variants (1 Mb or longer).
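To show where the base-pair resolution comes from, the sketch below reads a SAM-style alignment position and CIGAR string and reports the reference coordinate at which a partially mapped (soft-clipped) read stops aligning. It is a minimal sketch under stated assumptions: the function name is hypothetical, positions are 1-based as in SAM, and hard clips are simply ignored.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def candidate_breakpoints(pos, cigar):
    """Return reference positions where a soft-clipped read stops aligning.

    pos is the 1-based leftmost aligned reference position, as in the SAM format.
    """
    # Drop hard clips so that soft clips sit at the ends of the operation list.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    breakpoints = []
    if ops and ops[0][1] == "S":
        # Leading soft clip: the candidate breakpoint is the first aligned base.
        breakpoints.append(pos)
    if len(ops) > 1 and ops[-1][1] == "S":
        # Trailing soft clip: the candidate breakpoint is the last aligned base.
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        breakpoints.append(pos + ref_len - 1)
    return breakpoints


print(candidate_breakpoints(1000, "60M40S"))  # read stops aligning after 60 bases
print(candidate_breakpoints(5000, "35S65M"))  # read starts aligning after a 35-base clip
```

Several reads clipped at the same coordinate would support a breakpoint there; a lone clipped read is more likely a sequencing or alignment artifact.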

Read-Depth

The read-depth method is based on the hypothesis of a correlation between the depth of coverage of a genomic region and the copy number of the region.

This method can detect CNVs of various sizes (from whole chromosomes down to hundreds of bases). The resolution of this approach depends primarily on depth of coverage: smaller events can be detected at higher depth.
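As a rough illustration of the read-depth hypothesis, the Python sketch below compares per-bin read counts in a sample against a reference, normalizes for total sequencing yield, and converts each ratio into a log2 ratio and an estimated copy number. The function name, the example counts, and the diploid baseline are assumptions; production callers also correct for GC content, mappability, and noise, which this sketch ignores.

```python
import math


def copy_number_from_depth(sample_counts, reference_counts, ploidy=2):
    """Return a (log2_ratio, estimated_copy_number) tuple for each bin.

    Illustrative only: assumes equal-sized bins and a reference with normal copy number.
    """
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    if sample_total == 0 or reference_total == 0:
        raise ValueError("both samples need at least one read")
    results = []
    for s, r in zip(sample_counts, reference_counts):
        if r == 0:
            # No reference reads in this bin, so no ratio can be formed.
            results.append((None, None))
            continue
        # Depth ratio after scaling each sample by its total read count.
        ratio = (s * reference_total) / (r * sample_total)
        log2_ratio = math.log2(ratio) if ratio > 0 else float("-inf")
        results.append((log2_ratio, round(ploidy * ratio)))
    return results


# Hypothetical read counts in five equal-sized bins.
sample = [100, 100, 50, 150, 100]
reference = [100, 100, 100, 100, 100]
for log2_ratio, copy_number in copy_number_from_depth(sample, reference):
    print(log2_ratio, copy_number)
```

Here the third bin has half the expected depth (consistent with a one-copy loss) and the fourth has one and a half times the expected depth (consistent with a one-copy gain).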

Assembly

In theory, all forms of genetic variation—including CNVs—can be detected by the assembly of short reads if the reads are sufficiently long and accurate.

This method was designed to better identify structural variation. However, it’s used less in CNV detection due to the overwhelming demand it can put on computational resources.

Watch our free webinar—Copy Number Variant Detection by NGS: Coverage, Uniformity & Resolution—to see Dr. Guo introduce the main methods utilized for calling CNVs using NGS data and share clinical cases that illustrate how the coverage and uniformity of NGS data contribute to the resolution of CNV calling.

Calling CNVs from Whole-Genome Sequencing Data

Whole-genome data has broad utility as it can detect SNVs, insertions/deletions, copy number changes, and both large and small structural variants. Thanks to recent technological innovations, the latest genome sequencers can perform whole-genome sequencing more efficiently than ever.

Unlike narrower approaches to detecting and characterizing CNVs from NGS data such as whole-exome sequencing or gene panels, which analyze a limited portion of the genome, whole-genome data delivers a comprehensive view of the entire genome and has a higher resolution compared to capture-based methods.

This makes it ideal for discovery applications, such as identifying causative variants and novel genome assembly.
It’s also useful for informing difficult diagnoses in particular clinical contexts as its uniform coverage enables labs to identify much smaller CNVs.

“Take the DMD gene, for example: the nature of the gene is small exons interspersed by large introns. Using traditional capture-based methodology to enrich the coding region only, you’ll likely lose the resolution you need to call tiny events, such as a single exon deletion or duplication, which is an important portion of the variant spectrum. Using genome sequencing or a specifically designed genome-level DMD assay, you can achieve uniform coverage across the gene. The uniform coverage not only facilitates the identification of smaller deletions and/or duplications but also helps to precisely identify the breakpoint, which is critical for accurate copy number variant assessment.”

— Dr. Fen Guo, Ph.D., FACMG, FCCMG, Clinical Laboratory Director at PerkinElmer Genomics

Compared to exome data, which only captures one to two percent of the genome and relies on capture-based or PCR-based enrichment, genome data comprises the entire genome—sequencing the coding regions and the non-coding regions. Recent research has suggested that many disease-causing variants may be found in the non-coding regions and are therefore missed by analyzing exomes alone.

Whole-genome libraries can be prepared without PCR amplification, which avoids amplification bias. As a result, PCR-free sequencing methodologies used to call CNVs from whole-genome data provide more uniform coverage across both coding and non-coding regions of DNA. This uniform coverage can increase the likelihood of finding a disease-causing mutation.

Also, because of the uniform coverage, whole-genome data requires relatively lower coverage depths across the genome. Making the same CNV calls from exome data may, for example, require 100x coverage, while the same results could be achieved with only 40x coverage from genome data.

Whole-genome sequencing is also widely regarded as the superior data modality for accurate breakpoint detection.

“In many cases, whole-genome data enables you to identify breakpoints even at the single nucleotide level because of the uniform coverage across the genome. In addition, whole-genome data also provides insight into some challenging regions such as those involved with trinucleotide repeat disorders.”

— Dr. Fen Guo, Ph.D., FACMG, FCCMG, Clinical Laboratory Director at PerkinElmer Genomics

Watch our free webinar—Genome sequencing reveals cause of multi-generational split hand/split foot with long bone deficiency—to see how Dr. Raymond C. Caylor, Assistant Director, Molecular Diagnostic Laboratory at Greenwood Genetic Center, utilized genome sequencing and Bionano’s N_xClinical software, to provide a diagnosis for a multi-generational family with split hand/split foot with long bone deficiency.

Software for Detecting and Analyzing CNVs from WGS

High-quality detection of CNVs from NGS data has been a long-standing challenge for clinical research labs. Most “out-of-the-box” NGS analysis software tools can’t easily detect or visualize CNVs. Their capabilities are typically limited to certain variant types and sizes or focused on detecting SNVs.

Without robust and convenient CNV calling capabilities, labs are left with an incomplete picture of genomic aberrations and, therefore, can’t thoroughly investigate their patient samples and provide complete results.

Today’s software tools for detecting, analyzing, and interpreting CNVs from NGS data can be broadly divided into two categories: homegrown tools and commercial software.

Homegrown tools are typically bespoke systems developed from scratch and integrated with free online CNV tools.
Commercial software tools are purpose-built systems with CNV-calling capabilities that labs purchase and integrate into their workflow.

Homegrown CNV tools, while sometimes advantageous from a cost perspective if the lab has very specific and unchanging CNV calling needs, bring several disadvantages that can exact high practical and efficiency costs on a lab.

For example:

Homegrown systems and CNV freeware typically apply very narrowly to a specific NGS data type and only that data type. Working with multiple NGS data types—panels, whole-exome, and whole-genome data, for example—means working across various tools that likely don’t integrate elegantly or at all. Adding more tools means compounding workflow inefficiencies that cost labs—and by extension patients—valuable time.
Building a homegrown CNV analysis tool almost certainly requires bioinformatics expertise. Teams can’t build a robust CNV calling tool without a team of bioinformatics specialists to establish, optimize, scrape, and train a database. The development effort here can be enormous before such a system is refined to the point where it’s ready for clinical use. Most labs simply don’t have an in-house bioinformatics team to build and continuously maintain and refine such a tool.
Homegrown CNV tools often depreciate quickly. NGS is not a static field. New capabilities give labs regular opportunities to advance the speed and quality of their genomic analysis. But without a development team that updates their tools, labs often invest significant time and resources in building bespoke tools that quickly fall behind the curve.

Commercial CNV software, on the other hand, enables teams to invest in efficiencies and capabilities that don’t always require in-house bioinformatics or development expertise. These tools tend to be far more user-friendly and keep pace with new developments in NGS capabilities. However, not all CNV software is equal in performance, capability, and ease of use.

As Dr. Guo explains, many of the commercial tools in use today treat CNV analysis as an add-on capability:

“From my experience using several software platforms, many commercial platforms that tout CNV analysis were built for SNV calling and interpretation. CNV calling was added on, but the primary interface is still designed for SNV analysis. Many labs needing to call CNVs need to interface with this data at the genomic level and get the whole picture—especially labs coming from the microarray world that want to use a familiar platform.”

— Dr. Fen Guo, Ph.D., FACMG, FCCMG, Clinical Laboratory Director at PerkinElmer Genomics

Dr. Guo urges teams to be thoughtful when evaluating commercial tools against their particular needs—both today and tomorrow:

“You have to be very careful when thinking about the best commercial tool for the type of CNV calling you need to do. Think about the primary purpose you’ll be using it for. Are you only going to be using panels? Exome data only? Or do you think you’ll want software that analyzes all types of NGS data? Here at PerkinElmer Genomics, we use panels, exome, and genome data, which is why we use software [Bionano’s N_xClinical] that covers everything.

Secondly, most CNV software will give you deletions, duplications, and copy numbers. But not all of them call AOH, which is important for imprinting disorders and cancer.

Thirdly, you have to consider the differences in analytical performance between software. You don’t want a high false-positive rate or false-negative rate.

And lastly—and most importantly for me—if you or anyone on your team is a naturally visual person, you need to look at the data visualization and user interface. It needs to be user-friendly and not get in its own way. The copy number events across the genome should be easy to visualize and identify.”

— Dr. Fen Guo, Ph.D., FACMG, FCCMG, Clinical Laboratory Director at PerkinElmer Genomics

So, to quickly recap the key considerations when evaluating commercial CNV calling software:

Commercial software is typically more robust, capable, and user-friendly than freeware and homegrown tools.
But in comparing one commercial tool to another, it’s critical to evaluate your needs against its capabilities.
Not all commercial software can analyze and interpret data across multiple NGS data types.
Not all commercial software that calls CNVs also calls absence of heterozygosity (AOH), which is critical in certain contexts.
False-positive and false-negative rates can vary between tools.
User interfaces also vary between tools.

N_xClinical for CNV Detection and Analysis from NGS Data

Here at Bionano Genomics, we equip labs with the single-source software solution they need to overcome these challenges.

We believe N_xClinical may be the most comprehensive and up-to-date solution for cytogenetics and molecular genetics in one system for analyzing and interpreting all genomic variants, including CNVs, from microarray and NGS data.

N_xClinical software is platform-independent. It accepts various data types, enabling clinical research laboratories to process CNVs, SNVs, and AOH/LOH (and soon structural variants, or SVs, from optical genome mapping, OGM), all in a single place.
These aberrations visualized in one software provide a complete picture of a sample’s genome, enabling labs to work significantly more efficiently and confidently.
In short, N_xClinical brings genuine CNV clarity and resolution to an otherwise difficult data type.

We’ve developed and refined two algorithms for detecting CNVs and AOH from almost all NGS assays.

Both are available with N_xClinical, the genomics software solution that enables labs to detect CNVs and AOH regions, and visualize SNVs in context, across all microarray and NGS platforms simultaneously—all from a single screen.

One algorithm, the “Self-reference” algorithm, can be used for all WGS data regardless of sequencing depth.
The second, the “Multi-Scale Reference” (MSR) algorithm, is applicable to all NGS data. It creates “virtual” bins with sizes proportional to the expected number of reads, offering high-resolution detection of events in areas of interest (e.g., exons) while also providing a helpful genome-wide backbone.
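The general idea of read-count-driven binning can be sketched as follows. This is not the MSR algorithm itself, whose details are not described here; it is one plausible reading in which adjacent fixed-size windows are merged until each "virtual" bin holds a target number of expected reads. The function name, the target value, and the example read counts are all hypothetical.

```python
def build_virtual_bins(expected_reads_per_window, target_reads):
    """Merge adjacent windows into bins with at least target_reads expected reads.

    Returns (start_index, end_index_exclusive, expected_reads) tuples.
    Illustrative only; not the vendor's implementation.
    """
    bins = []
    start = 0
    running = 0
    for i, expected in enumerate(expected_reads_per_window):
        running += expected
        if running >= target_reads:
            # Enough expected reads accumulated: close the current bin.
            bins.append((start, i + 1, running))
            start = i + 1
            running = 0
    if start < len(expected_reads_per_window):
        # Leftover windows form a final, possibly under-filled bin.
        bins.append((start, len(expected_reads_per_window), running))
    return bins


# Hypothetical expected reads per window: sparse backbone, then two well-covered
# windows (such as captured exons), then sparse backbone again.
expected = [5, 5, 10, 200, 180, 5, 5, 5]
for virtual_bin in build_virtual_bins(expected, target_reads=20):
    print(virtual_bin)
```

With these numbers, each well-covered window becomes its own narrow bin, while the sparse windows on either side are pooled into wider bins, which mirrors the trade-off between high resolution in areas of interest and a coarser genome-wide backbone.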

Calling CNVs from Whole-Genome Sequencing Data with N_xClinical

With higher-depth NGS, smaller CNVs can be detected and integrated with sequence variants to provide a holistic view of the sample.

In “Figure 1” below, the ideogram shows regions of copy number gain (blue bars), loss (red bars), AOH (yellow shading), and allelic imbalance (purple shading), as well as various types of sequence variants (e.g., SNVs, indels) as colored “lollipops”.

[Figures 1 and 2: WGS]

As described in Chaubey et al. (Journal of Molecular Diagnostics, vol. 22, no. 6, June 2020), researchers used 10x WGS to validate that the N_xClinical algorithm detected all CNVs and AOH regions found by high-resolution SNP arrays.

“Figure 2” above shows a small exonic deletion detected using 10x WGS with the MSR algorithm.

Free tutorial for N_xClinical users

Copy number analysis by NGS: Urban legend or true reality?

Are you an active N_xClinical user considering an update? In this 25-minute webinar, Soheil Shams, Founder & CEO of BioDiscovery, a Bionano Genomics company, uses multiple example oncology cases to demonstrate the most effective workflow and case review benefits of the Knowledgebase in N_xClinical 6.0.

Want to learn more about N_xClinical?

Book a free personalized demo to assess fit and see N_xClinical in action. Let us know you’re interested and we’ll connect on an initial consultation to answer questions and dive a little deeper before demonstrating N_xClinical—either with example data or your own.

*N_xClinical software is for research use only. It is designed to assist clinicians and it is not intended as a primary diagnostic tool. It is each lab’s responsibility to use the software in accordance with internal policies as well as in compliance with applicable regulations.
