Stop by and view this poster at the meeting:
Maximizing the value of detected somatic changes across 12,000 TCGA tumor samples
The Cancer Genome Atlas (TCGA) is a remarkable resource, with almost 12,000 tumors across more than thirty cancer types. The TCGA project has already proven useful in large-scale studies [1-4] in which the primary goal of copy number analysis was to identify statistically significant recurrent copy number alteration events. In these studies the GISTIC algorithm [5] was used to generate the results. GISTIC is appropriate for population studies, but it is not sensitive to the accuracy of the calls made on each individual sample. As a result, the publicly available data, when viewed on a per-sample basis, are highly over-segmented and have not been baseline corrected for tumor ploidy. We believe this reduces the utility of the resource, and we have undertaken an effort that combines analytical tools with human curation to generate a high-quality database of TCGA copy number data that can be used for various types of downstream analyses.
Using the raw (level 1) data, together with the matched “normal” samples to minimize the number of copy number polymorphisms (CNPs), we applied the SNP-FASST2 calling algorithm (a multi-state HMM algorithm designed to handle mosaic events), with systematic correction for GC and fragment-length biases. The settings were optimized per cancer type to generate accurate calls for events in samples with as little as 20% tumor burden. We applied automated ploidy and %tumor calling methods, such as ASCAT [6], but found that in many cases the algorithm either did not generate a solution or reported an inaccurate one. Therefore, we opted to perform manual baseline adjustment for majority-aneuploid samples and found that the resultant data were more consistent with the control sets in terms of median number of copy number events, median CNV length, and sample ploidy. The median number and variance of CNVs were substantially reduced, bringing them in line with the control data sets. The proportion of samples with incorrect initial ploidy assignments ranged from 15% to more than 50%, depending on the tumor type. The resultant CNV database, along with associated clinical annotations and SNVs from exome sequencing, is now available to the general scientific community through the Nexus DB repository and should provide greater utility for further research.
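To illustrate why baseline adjustment changes the per-sample event counts described above, here is a minimal Python sketch. It is not the SNP-FASST2 or Nexus DB implementation: the `Segment` type, the `recenter`, `call_events`, and `summarize` helpers, the ±0.2 log2 thresholds, and the toy segment values are all assumptions made for illustration. The sketch shifts a sample's log2 ratios by a curator-chosen baseline and then recomputes the two summary metrics named in the abstract, events per sample and median event length.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Segment:
    """One copy number segment (hypothetical structure, for illustration only)."""
    sample: str
    chrom: str
    start: int
    end: int
    log2_ratio: float

    @property
    def length(self) -> int:
        return self.end - self.start


def recenter(segments, baseline_log2):
    """Shift every log2 ratio so the chosen baseline state sits at 0."""
    return [
        Segment(s.sample, s.chrom, s.start, s.end, s.log2_ratio - baseline_log2)
        for s in segments
    ]


def call_events(segments, gain=0.2, loss=-0.2):
    """Keep segments beyond the (illustrative) gain/loss thresholds."""
    return [s for s in segments if s.log2_ratio >= gain or s.log2_ratio <= loss]


def summarize(events, samples):
    """Return (median events per sample, median event length in bp)."""
    counts = {name: 0 for name in samples}
    for event in events:
        counts[event.sample] += 1
    lengths = [event.length for event in events]
    return median(counts.values()), (median(lengths) if lengths else 0)


# Toy sample whose baseline was placed on the wrong copy number state:
# most of the genome sits at -0.40 and therefore looks like a loss.
raw = [
    Segment("S1", "chr1", 0, 100_000_000, -0.40),
    Segment("S1", "chr2", 0, 90_000_000, -0.40),
    Segment("S1", "chr3", 0, 50_000_000, 0.15),
]

events_before = call_events(raw)
# A curator judges -0.40 to be the true baseline and re-centers on it.
events_after = call_events(recenter(raw, baseline_log2=-0.40))

print(summarize(events_before, ["S1"]))
print(summarize(events_after, ["S1"]))
```

In this toy case the two large apparent losses disappear after re-centering and a single gain on chr3 remains, which is the kind of reduction in event count that a baseline correction can produce. Real data would also need allele-specific information to choose the baseline, which this sketch leaves to the curator.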
1. Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008)
2. Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609–615 (2011)
3. Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012)
4. Cancer Genome Atlas Research Network. Comprehensive molecular portraits of human breast tumors. Nature 490, 61–70 (2012)
5. Beroukhim, R. et al. Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc. Natl Acad. Sci. USA 104, 20007–20012 (2007)
6. Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010)