SNP Discovery in C. briggsae (build 4)
A major aim of the C. briggsae genetic map consortium is to develop a reliable set of single nucleotide polymorphisms (SNPs) for the organism. The current release contains
23,829 SNPs from the "HK104" mapping strain,
9,111 SNPs from the "VT847" strain,
5,164 SNPs from the "HK105" strain, and
322 SNPs from the "PB800" strain.
Of the HK104 SNPs, 23,130 are now positioned on the "cb3" sequence assembly.
Current Release (build 4)
SNP discovery was performed on shotgun sequence traces of mapping strain HK104, as well as strains VT847, HK105, and PB800.
As in the previous release, build 4 used only the ssahaSNP program (SSAHA2) to call SNPs, because of its robust and efficient performance.
Polyphred (v5.04) and Polybayes (v3.0) could not run efficiently when given the entire read set and the reference genome sequences as input.
The reference genome used for SNP discovery was obtained from Wormbase (cb25/agp8), which is organized by ultra (fingerprint) contig.
The flanking sequences for build 4 SNPs are repeat-masked to lower case by RepeatMasker with a customized C. briggsae repeat library.
Note that nearby SNPs have NOT yet been marked in the flanking sequences for this build.
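Because repeats are shown as lower-case bases in the flanking sequences, a quick check for masked bases near a SNP can be sketched as follows. This is only an illustration: the function names, the `window` parameter, and the idea of passing the left and right flanks as two separate strings are assumptions, not part of the released files.

```python
def masked_fraction(flank: str) -> float:
    """Fraction of bases in a flanking sequence that are lower case (repeat-masked)."""
    bases = [b for b in flank if b.isalpha()]
    if not bases:
        return 0.0
    return sum(b.islower() for b in bases) / len(bases)


def has_masked_base_within(left_flank: str, right_flank: str, window: int) -> bool:
    """True if any base within `window` bp of the SNP position is lower case.

    `left_flank` ends immediately before the SNP and `right_flank` starts
    immediately after it (an assumed layout, for illustration only).
    """
    if window <= 0:
        raise ValueError("window must be a positive number of bases")
    near = left_flank[-window:] + right_flank[:window]
    return any(b.islower() for b in near)
```

For example, `has_masked_base_within("acgtACGT", "ACGTacgt", 4)` is false because the four bases on each side of the SNP are upper case, while widening the window to 5 reaches the masked bases.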
SNP Discovery Results By Strain (build 4)
| | HK104 | VT847 | HK105 | PB800 |
|---|---|---|---|---|
| Sequence traces examined: | 13,632 | 14,976 | 2,112 | 384 |
| Traces aligned by SSAHA: | 7,530 | 9,213 | 1,680 | 123 |
| Total aligned base pairs: | 4,562,172 | 5,761,972 | 1,038,254 | 75,508 |
| Apparent SNP density: | 1/163 bp | 1/475 bp | 1/168 bp | 1/197 bp |
| Unique SNP loci detected: | 23,829 | 9,111 | 5,164 | 322 |
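As a rough guide to reading the counts above, the fraction of traces that aligned and the mean aligned length per trace can be derived per strain. The sketch below simply copies the table's numbers; the dictionary and function names are illustrative and do not come from the release.

```python
# Counts copied from the build 4 results table.
TRACES_EXAMINED = {"HK104": 13_632, "VT847": 14_976, "HK105": 2_112, "PB800": 384}
TRACES_ALIGNED = {"HK104": 7_530, "VT847": 9_213, "HK105": 1_680, "PB800": 123}
ALIGNED_BP = {"HK104": 4_562_172, "VT847": 5_761_972, "HK105": 1_038_254, "PB800": 75_508}


def aligned_fraction(strain: str) -> float:
    """Fraction of examined traces that were aligned."""
    return TRACES_ALIGNED[strain] / TRACES_EXAMINED[strain]


def mean_aligned_length(strain: str) -> float:
    """Mean number of aligned base pairs per aligned trace."""
    return ALIGNED_BP[strain] / TRACES_ALIGNED[strain]


if __name__ == "__main__":
    for strain in TRACES_EXAMINED:
        print(f"{strain}: {aligned_fraction(strain):.1%} of traces aligned, "
              f"{mean_aligned_length(strain):.0f} bp aligned per trace")
```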
C. briggsae VT847 Strain SNPs
SNPs discovered from the VT847 shotgun traces are available for download with this release. To differentiate them from HK104 SNPs, this strain's SNP IDs are in the format
cbvXXXXX (note the v). For HK104, flanking sequences from build 2
and ultracontig positions from the 42,730 build 3 SNPs were used to retain as many previously issued SNP IDs as possible.
New! C. briggsae HK105 and PB800 Strain SNPs
SNPs discovered from the HK105 and PB800 shotgun traces are available for download with this release. Their SNP IDs are in the format
cbhXXXXX (note the h) and cbpXXXXX (note the p), respectively.
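The strain letter in these IDs makes it easy to sort a mixed list of SNPs by strain. The following is a minimal sketch that assumes the `XXXXX` part is made up of digits only; the function name is hypothetical. IDs that do not match one of the three lettered formats (including HK104 IDs, whose exact format is not described here) return `None`.

```python
import re

# Strain letters as described in the text: v = VT847, h = HK105, p = PB800.
STRAIN_BY_LETTER = {"v": "VT847", "h": "HK105", "p": "PB800"}

# Assumption: the XXXXX portion of an ID consists of digits.
_ID_PATTERN = re.compile(r"^cb([vhp])(\d+)$")


def strain_from_snp_id(snp_id: str):
    """Return the strain implied by a cbv/cbh/cbp SNP ID, or None otherwise."""
    match = _ID_PATTERN.match(snp_id)
    if match is None:
        return None
    return STRAIN_BY_LETTER[match.group(1)]
```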
Confidence Rank Field
Both sets of build 4 SNPs now include a column containing the confidence_rank value for each SNP. The best rank is 1, which indicates
a polymorphism with multiple observations and high-quality flanking sequence. A rank of 2 indicates good flanks but only one observation. SNPs that map to repetitive
regions and/or homopolymer repeats will have confidence ranks of 3 or higher.
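One way to use the rank is to keep only SNPs at or below a chosen rank before designing assays. The sketch below assumes each SNP record has been read into a dictionary with a `confidence_rank` entry; the function names and the default cutoff of 2 are illustrative choices, not recommendations from the consortium.

```python
def describe_confidence_rank(rank: int) -> str:
    """Summarize what a confidence rank means, following the text above."""
    if rank < 1:
        raise ValueError("confidence ranks start at 1")
    if rank == 1:
        return "multiple observations, high-quality flanking sequence"
    if rank == 2:
        return "single observation, good flanking sequence"
    return "rank given to SNPs in repetitive regions and/or homopolymer repeats"


def filter_by_rank(snps, max_rank: int = 2):
    """Keep SNP records whose confidence_rank is at or below max_rank."""
    return [snp for snp in snps if int(snp["confidence_rank"]) <= max_rank]
```

For instance, `filter_by_rank(records, max_rank=1)` would keep only polymorphisms seen more than once with high-quality flanks.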
Fixes from Previous Release (build 3)
The number of SNPs for each strain in build 4 is somewhat lower than in build 3, for two reasons.
First, while preparing build 3 SNPs for genotyping assay design, we realized that there were several instances of "duplicate" SNPs
with the same ID number. These arose because we used 25-bp flanking sequence "keys" (which should be unique within the genome) to cross-reference the discovered SNPs with previous builds.
These duplicates have been removed from build 4 until we find a solution to the problem.
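To illustrate how such duplicates can be found, the sketch below groups SNPs by a flanking-sequence key and reports keys shared by more than one locus. It is not the consortium's actual procedure: building the key from the 25 bases on each side of the SNP, and all names used, are assumptions made for the example.

```python
from collections import defaultdict

KEY_FLANK_LENGTH = 25  # 25-bp keys, as described in the text


def flank_key(left_flank: str, right_flank: str, n: int = KEY_FLANK_LENGTH):
    """Build a case-insensitive key from the n bases on each side of a SNP."""
    return (left_flank[-n:].upper(), right_flank[:n].upper())


def find_key_collisions(snps, n: int = KEY_FLANK_LENGTH):
    """Return {key: [locus, ...]} for keys shared by more than one locus.

    `snps` is an iterable of (locus, left_flank, right_flank) tuples.
    """
    loci_by_key = defaultdict(list)
    for locus, left_flank, right_flank in snps:
        loci_by_key[flank_key(left_flank, right_flank, n)].append(locus)
    return {key: loci for key, loci in loci_by_key.items() if len(loci) > 1}
```

Any key returned by `find_key_collisions` points at loci that would be given the same ID if the key alone were used for cross-referencing.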
The second reason for fewer SNPs is that we figured out (with some help from Jim Mullikin's group at Sanger) how to incorporate read quality scores into SSAHA-SNP's polymorphism calling.
Now, all SNPs produced by SSAHA-SNP have phred scores above the minimum quality threshold.
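For reference, a phred score Q corresponds to an estimated base-calling error probability P through Q = -10 log10(P), so Q = 20 means roughly a 1-in-100 chance of error. The sketch below shows that conversion and a simple threshold test. The threshold used for build 4 is not given here, so `min_quality` is left as a caller-supplied value, and treating the comparison as inclusive is an assumption.

```python
import math


def phred_to_error_probability(quality: float) -> float:
    """Convert a phred quality score to an estimated error probability."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability: float) -> float:
    """Convert an error probability (0 < p <= 1) to a phred quality score."""
    return -10 * math.log10(probability)


def passes_quality(base_quality: float, min_quality: float) -> bool:
    """True if the base quality meets the threshold (inclusive by assumption)."""
    return base_quality >= min_quality
```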
Integration with Genetic Map
The HK104 SNPs have been cross-referenced with the C. briggsae Genetic Map (v3.1).
The chromosome and genetic distance(s) for the ultracontig are provided with each SNP. SNPs on ultracontigs that have not been genetically mapped
are labeled with chromosome "CbUn" and a zero value for genetic distance. In April 2007 we generated RFLP assays for
the HK104 snip-SNPs.
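The labeling rule for unmapped ultracontigs can be sketched as a lookup with a default. The contig names, the map contents, and the function name below are all hypothetical; only the "CbUn" label and the zero distance come from the text.

```python
UNMAPPED_CHROMOSOME = "CbUn"


def map_position_for(ultracontig: str, genetic_map: dict):
    """Return (chromosome, genetic_distance) for the ultracontig carrying a SNP.

    `genetic_map` maps ultracontig name -> (chromosome, genetic distance).
    Ultracontigs missing from the map get ("CbUn", 0.0).
    """
    return genetic_map.get(ultracontig, (UNMAPPED_CHROMOSOME, 0.0))


if __name__ == "__main__":
    # Hypothetical map entries, for illustration only.
    example_map = {"ultracontig_A": ("I", 12.5)}
    print(map_position_for("ultracontig_A", example_map))
    print(map_position_for("ultracontig_B", example_map))
```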
Integration with "cb3" Sequence Assembly
The HK104 SNPs have now been positioned on the "cb3" sequence assembly from Wormbase, which is organized by chromosome. Most SNPs (17,740) were positioned using
information in the assembly AGP files, but some (5,390) were mapped by BLAST alignment of their flanking sequence. There were
699 SNPs that we could not position on the cb3 assembly by either method.
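These counts can be checked against the HK104 total reported for this release. The short sketch below uses only numbers quoted on this page; the variable names are illustrative.

```python
POSITIONED_BY_AGP = 17_740    # placed using the assembly AGP files
POSITIONED_BY_BLAST = 5_390   # placed by BLAST alignment of flanking sequence
NOT_POSITIONED = 699          # could not be placed by either method
HK104_TOTAL = 23_829          # unique HK104 SNP loci in build 4

positioned = POSITIONED_BY_AGP + POSITIONED_BY_BLAST
print(positioned)                   # 23130
print(positioned + NOT_POSITIONED)  # 23829
```

The positioned total matches the 23,130 HK104 SNPs reported as placed on cb3, and adding the 699 unplaced SNPs gives the full 23,829.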
C. briggsae SNP Downloads:
HK104 SNPs
VT847 SNPs
HK105 SNPs
PB800 SNPs