In the KEGG analysis, 181 sequences were assigned to the immune system category and were involved in 14 immune-response pathways. Overall, functional analysis of our 454 database identified candidate genes potentially involved in growth, reproduction, stress and immunity. Further experiments are needed to validate the functions and expression patterns of these candidate genes, and to investigate their potential roles in gonad development and reproduction.

SSR and SNP discovery

P. yessoensis is an important aquaculture shellfish in China, and the application of marker-assisted selection (MAS) or genome-wide marker-assisted selection (G-MAS) in its breeding program is expected to be a fertile research area. However, few genetic markers are currently available for this species [4], [5].
The transcriptome data obtained by 454 sequencing provided an excellent source for mining and developing gene-associated markers [13], [32], [38], [39]. In total, 2,748 SSRs were identified from the assembled sequences (Table 3). Of the 2,494 SSR-containing sequences, 420 (16.8%) had been annotated and can be considered priority candidates for marker development. The most frequent repeat motifs were trinucleotides, which accounted for 39.4% of all SSRs, followed by dinucleotides (21.1%), tetranucleotides (15.5%), pentanucleotides (14.6%), and hexanucleotides (9.4%). Among dinucleotide repeats, AT was the most abundant motif, accounting for approximately 55.5% of the total. Among trinucleotide repeats, ATC (33.8%) was the most common motif, followed by AAC (17.9%), AGG (14.0%) and AAT (11.4%). The most abundant tetranucleotide motif was AAAC (22.8%), while AAAAT (15.0%) and AGCAGG (14.8%) were the most abundant repeat motifs for pentanucleotides and hexanucleotides, respectively.

Table 3. Summary of simple sequence repeat (SSR) types in the P. yessoensis transcriptome.

Potential SNPs were detected using the QualitySNP program. We identified 34,841 high-quality SNPs and 14,358 indels from 10,107 contigs (Fig. 3). The predicted SNPs included 20,958 transitions and 12,804 transversions. The overall frequency of all types of SNPs in the transcriptome, including indels, was one per 156 bp. Of the predicted SNPs, 40,063 (81.9%) were identified from contigs covered by ten or more reads, suggesting that the majority of SNPs identified in this study were covered at sufficient sequencing depth and are more likely to represent "true" SNPs. Among the SNPs, 31,696 (64.4%) were identified from contigs with annotation information. These SNPs would also be priority candidates for marker development and should be very useful for further genetic or genomic studies on this species.

Figure 3. Classification of single nucleotide polymorphisms (SNPs) identified in the P. yessoensis transcriptome.

In conclusion, we first performed de novo transcriptome sequencing for the Yesso scallop P.