An index of next-generation datasets is available, and the viewer is integrated into certain projects in the Cancer Genome Workbench.
If you see a message like "could not create the Java virtual machine", try running a lower-memory version of the program. If while running the program you receive an "out of memory" error message, try running the program with additional memory, or viewing narrower regions of the genome.
Maximum memory allocation and launch link:
- 1 GB (recommended): launch program
- 512 MB: launch program
- 768 MB: launch program
- 1.5 GB: launch program
Note that at the moment the program uses a self-signed certificate, so you will have to click through a security warning to run it. This is necessary to allow the program to open .bam files on your computer.
Below is a screenshot of the launch screen:
The launcher contains three tabs:
Data files
This tab lets you specify the .bam files you want to view and how to handle the reference sequence.
- .bam files: you may either type in the full filenames of your files, or use the "Browse..." button to select them. Note that each .bam file must have a corresponding ".bai" index file present (these are generated with the samtools program's "index" command, e.g. "samtools index whatever.bam").
- allow multiple files: if checked, the .bam file Browse... button will append each file to the list rather than replacing it.
- reference sequence: specifies the reference genome files you want to use. Currently supported formats are FASTA (samtools .fai indexed, or a directory containing one sequence per file), UCSC .2bit and UCSC .nib. The .2bit file for hg18 is available here; please note that this file is very large (800 MB), but you only have to download it once. The "Browse..." button lets you select a local file on your system. If you select a .nib file, the program expects all .nib files for your reference genome to be present in the same directory.
- use reference sequence: if checked, displays the reads in the assembly against the reference sequence you specify. If you uncheck this option you can view your data without a reference sequence: the program will attempt to generate a simple consensus sequence from your data (any regions without sequence coverage will contain Ns). Note that without a reference sequence the program's ability to detect homozygous non-reference SNPs will be severely hampered.
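Because every .bam file needs a matching index, it can be convenient to check for missing indexes before launching the viewer. The following is a minimal sketch, not part of the program itself. The helper names `find_bam_index` and `missing_indexes` are hypothetical, and it assumes the index sits next to the .bam file as either "whatever.bam.bai" (the name "samtools index whatever.bam" writes) or "whatever.bai".

```python
from pathlib import Path


def find_bam_index(bam_path):
    """Return the path of the .bai index for bam_path, or None if absent.

    Assumption: the index lives in the same directory, named either
    "whatever.bam.bai" or "whatever.bai".
    """
    bam = Path(bam_path)
    candidates = [
        bam.with_name(bam.name + ".bai"),  # whatever.bam.bai
        bam.with_suffix(".bai"),           # whatever.bai
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def missing_indexes(bam_paths):
    """Return the .bam files (as strings) that have no index next to them."""
    return [str(p) for p in bam_paths if find_bam_index(p) is None]
```

Any file reported by `missing_indexes` would need "samtools index" run on it before it can be opened.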
Annotation database
This tab lets you configure the UCSC genome annotation database to use, if annotations are desired. By default it is configured to use the public UCSC annotation server for human build hg18, but it may be configured to use a different database or a locally-installed mirror.
- enable database queries: if checked, enables UCSC annotation queries (default)
- clear database cache: query results are cached locally to improve performance and reduce the load on the database. Check this option to empty the local cache.
- database name, server, username, password: connection details and login credentials for the MySQL annotation database.
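The caching behaviour described above can be pictured with a small sketch. This is an illustration only and does not reflect how the program actually stores its cache: the `QueryCache` class, its file layout (one JSON file per query) and the `fetch` callback are all assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path


class QueryCache:
    """Tiny on-disk cache: one JSON file per distinct query string.

    Hypothetical design, not the viewer's real cache format.
    """

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, query):
        # Hash the query text so it can be used as a safe file name.
        digest = hashlib.sha1(query.encode("utf-8")).hexdigest()
        return self.cache_dir / (digest + ".json")

    def get_or_fetch(self, query, fetch):
        """Return cached rows for query, calling fetch(query) only on a miss."""
        path = self._path_for(query)
        if path.is_file():
            return json.loads(path.read_text())
        rows = fetch(query)
        path.write_text(json.dumps(rows))
        return rows

    def clear(self):
        """Delete every cached result, like the "clear database cache" option."""
        for path in self.cache_dir.glob("*.json"):
            path.unlink()
```

Repeated lookups of the same query reach the database only once; after `clear()` the next lookup goes back to the server.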
Options
This tab specifies miscellaneous startup options:
- load optical/PCR duplicates: whether to load reads marked as "optical/PCR" duplicates (see SAM specification). Even if you choose to load them, these reads are hidden by default because they may represent monoclonal artifacts. Leaving this option unchecked skips loading them altogether, which saves memory and improves the application's performance.
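In the SAM format, duplicate status is recorded as bit 0x400 of each read's FLAG field. The sketch below shows what skipping such reads at load time amounts to. The function names and the `(read_name, flag)` tuple shape are hypothetical, chosen only for illustration.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: read is a PCR or optical duplicate


def is_duplicate(flag):
    """True if the SAM FLAG value has the duplicate bit set."""
    return bool(flag & DUPLICATE_FLAG)


def reads_to_load(reads, load_duplicates=False):
    """Filter an iterable of (read_name, flag) tuples (hypothetical shape).

    With load_duplicates=False, reads flagged as duplicates are dropped,
    mirroring the unchecked "load optical/PCR duplicates" option.
    """
    return [r for r in reads if load_duplicates or not is_duplicate(r[1])]
```

For example, a read with FLAG 1123 (99 + 1024) has the duplicate bit set and would be skipped, while a read with FLAG 99 would be kept.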
The following settings control the variant detection process:
Clicking the "Find SNPs/indels" button then proceeds with the analysis. The results may then be browsed using the spinner control directly to the right of the "SNPs" button in the toolbar. Iterating through the selections centers the display on each selected variant, highlighting it with a blue vertical line in the display. Other SNP sites also visible onscreen will be highlighted with a green line.
Variant detection settings (each entry gives the label, the command-line switch, a description and the default value):
- Minimum nucleotide quality (-min-quality): minimum quality score for including a particular nucleotide in any SNP calculation. Applies to both reference and alternative alleles. Default: 10.
- Minimum mapping quality (-min-mapq): minimum read mapping quality required to use a read in variant detection (see MAPQ field in SAM specification, section 2.2.1). Default: 1.
- Minimum coverage (-min-coverage): the minimum number of reads passing all quality filters required at a variant site. Only reads associated with the variant are counted; for a putative G/T SNP, reads showing other calls are not included. Default: 4.
- Minimum frequency of alternative allele (-min-minor-frequency): minimum frequency (non-reference read count divided by total read count) to consider any putative variant. A value of 0 disables this check. Note that the frequency is computed on the pooled set of data. Take for example a run on 2 BAM files, one normal and one tumor, containing perfectly-distributed genomic data (unlikely!). A variant which is homozygous for the reference allele in normal and heterozygous for the alternative allele in tumor would have a minor allele frequency of 0.25 (25%). Default: 5%.
- Minimum observations of alternative allele (-min-alt-allele-count): required minimum number of reads supporting the non-reference allele. Default: 3.
- Minimum unique read names supporting alternative allele (-min-unique-alt-reads): required minimum number of unique read names supporting the variant allele. Set this to a number greater than one to ensure the variant is observed in more than a single mate pair. This can be a concern in low-coverage regions where both reads in a mate pair cover the same region. Default: 1.
- Minimum unique read mapping start positions supporting alternative allele (-min-unique-alt-read-start): requires the set of reads supporting the variant allele to show a minimum number of unique alignment start positions. Setting this value higher helps avoid monoclonal effects by increasing the mapping diversity of the supporting reads, but requires higher read coverage. Default: 2.
- Minimum observations of alternative allele to enable uniqueness filters (-unique-filter-coverage): sets a minimum number of observations of the alternative allele before enforcing two filters: (1) minimum number of reads with flanking sequence and (2) minimum unique start positions for the alternative allele. This prevents these filters from discarding variants in low-coverage areas. Default: 6.
- Minimum alternative reads with flanking sequence (-min-alt-flanking-reads [count]): requires a minimum number of reads showing the non-reference allele to have flanking sequence of a particular size. Default: at least 1 sequence with 10 or more nt of flanking sequence.
- Minimum quality of flanking sequence (-min-flanking-quality [quality]): requires all reads to have flanking sequence of a certain minimum quality. Not enforced if the site is near the end of a read. If you are trying to detect low-frequency variants, or variants in regions with low coverage, a lower quality value may be appropriate. Default: quality 15+ for 5 flanking nt.
- Maximum allowable high-quality mismatches to reference sequence (-mmf-max-hq-mismatches [count]): a sequence is entirely disqualified for SNP-calling purposes if it has more than the specified number of mismatches of the specified or better sequence quality. Mismatches corresponding to known dbSNP SNPs are not included in this count. Default: 3 mismatches of quality 15+.
- Maximum allowable low-quality mismatches to reference sequence (-mmf-max-lq-mismatches [count]): a sequence is entirely disqualified for SNP-calling purposes if it has more than the specified number of low-quality mismatches. Mismatches corresponding to known dbSNP SNPs are not included in this count. Default: 6 mismatches of quality 3+.
- Mismap filter: max ratio of suspicious mismatches to usable ones (-mismap-frequency [value]): when the mismatch filter rejects a read for having too many disagreements with the consensus, the positions of its high-quality mismatches are recorded. When a putative SNP is evaluated, the program computes the ratio of the count of these suspicious base calls to the count of usable reads showing the SNP allele. If the ratio is above the specified level, the SNP is rejected as likely being in a mismapped region. For example, take a candidate SNP where the reference is a G and the alternative allele is a T. Evaluation of the reads found 6 cases where a T at this position occurred in a read with an unacceptably high level of mismatches. 10 other Ts were found in acceptable reads (which possibly didn't fully overlap the problematic region). The ratio would be 6/10 or 0.6; under the default settings the SNP would be rejected as a possible artifact of mismapped reads. Default: 0.5.
- Ignore reads with non-primary alignments (-skip-non-primary): whether to skip reads that have the "alignment is not primary" flag set (see SAM specification, section 2.2.2, flag 0x0100). Default: don't use non-primary alignments.
- Read-end mismatch filter (command-line switch TBD): ignores high-quality mismatches clustered near the starts or ends of reads, which tend to cause false positive SNP calls. By default this ignores 2 or more mismatches within the first/last 6 nucleotides, and 3 or more mismatches within the first/last 10 nucleotides. Default: enabled.
- Mismapped deletion filter (command-line switch TBD): when a read contains a deletion near the start or end of the sequence, alignment software can struggle to map it properly because there aren't enough bases available on the other side of the deletion. This leads to short stretches of disagreement with the reference sequence, which can cause false positive SNP calls. This filter examines putative SNPs falling within called deletions, discarding reads that disagree with the reference sequence where the disagreement occurs within 10 nt of the end of a read. Default: enabled.
- Mate pair disagreement filter (command-line switch TBD): if multiple reads for a mate pair are present at a SNP site, checks whether all their allele calls agree. If they don't, the reads are excluded from consideration in the SNP call calculation. Default: enabled.
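The two worked examples in the settings above (the pooled normal/tumor frequency and the mismap ratio) can be reproduced with a short sketch. This is an illustration under stated assumptions, not the program's actual code: `SiteCounts` and the function names are hypothetical, only four of the site-level filters are modelled, and the frequency is taken as alternative reads over all counted reads, which is the reading that reproduces the 0.25 figure in the example.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    ref_reads: int           # usable reads showing the reference allele
    alt_reads: int           # usable reads showing the alternative allele
    suspicious_alt: int = 0  # alt calls seen in reads rejected by the mismatch filter


def alt_frequency(site):
    """Alternative-allele frequency over all counted (ref + alt) reads."""
    total = site.ref_reads + site.alt_reads
    return site.alt_reads / total if total else 0.0


def mismap_ratio(site):
    """Suspicious alt calls divided by usable reads showing the alt allele."""
    return site.suspicious_alt / site.alt_reads if site.alt_reads else 0.0


def failed_filters(site, min_coverage=4, min_frequency=0.05,
                   min_alt_count=3, max_mismap_ratio=0.5):
    """Return the names of the filters the site fails (empty list = passes)."""
    failed = []
    if site.ref_reads + site.alt_reads < min_coverage:
        failed.append("min-coverage")
    # A minimum frequency of 0 disables the frequency check.
    if min_frequency > 0 and alt_frequency(site) < min_frequency:
        failed.append("min-minor-frequency")
    if site.alt_reads < min_alt_count:
        failed.append("min-alt-allele-count")
    if mismap_ratio(site) > max_mismap_ratio:
        failed.append("mismap-frequency")
    return failed
```

With 20 reference reads from the normal sample plus 10 reference and 10 alternative reads from the tumor sample, `alt_frequency(SiteCounts(30, 10))` gives 0.25. Adding 6 suspicious calls, `failed_filters(SiteCounts(30, 10, 6))` reports only the mismap filter, since 6/10 exceeds the 0.5 default.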
Note that the interactive variant detector will only work within the region loaded in the viewer. To detect variants in an entire dataset (or target region), invoke the variant detector from the command line.
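When scripting a command-line run, the switches listed in the settings above can be assembled from a dictionary. The sketch below is hypothetical: the helper `build_switches` is not part of the program, and it assumes that valued switches are written as "-name value" and that boolean switches are passed as a bare "-name". The source lists only the switch names, so check the exact syntax, and the command used to launch the detector itself, against the program's own command-line help.

```python
def build_switches(settings):
    """Turn a settings dict into a flat list of command-line arguments.

    Assumptions (not confirmed by the documentation): valued switches are
    written as "-name value"; switches set to True are emitted as a bare
    "-name"; switches set to False or None are omitted.
    """
    args = []
    for name, value in settings.items():
        if value is True:
            args.append("-" + name)
        elif value is False or value is None:
            continue
        else:
            args.extend(["-" + name, str(value)])
    return args
```

For instance, `build_switches({"min-quality": 15, "min-mapq": 20})` yields `["-min-quality", "15", "-min-mapq", "20"]`, which can be appended to whatever command starts the variant detector.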