We refer to these sequences as probable unique sequences because almost no identical sequences are found in other organisms (Figure 1).

Figure 1. Pictorial representation of the bioinformatics strategy employed to extract the unique genic regions from the Las genome. The input and output of each step are shown in oval or square boxes. Actions taken are noted to the left of each arrow, while the information used is indicated to the right.

We first performed the sequence similarity searches against the nt database using a stringent E-value threshold of ≤ 1 × 10⁻³ (Figure 1). This search resulted in ~200 sequences that are unique to Las. This number of sequences is too large to validate experimentally;
therefore, to further reduce the number of unique sequences, we performed a second sequence similarity search with a relaxed E-value threshold of ≤ 1. This search resulted in 38 unique sequences. An E-value threshold of ≤ 1 excludes sequences with even slight similarity to sequences from other organisms. Therefore, the resulting 38 unique sequences are
considered unique to Las and constitute promising candidates for qRT-PCR based detection (Figure 1). We further searched the 38 unique sequences of Las against the phylogenetically closely related Lso, Lam, and Lcr. Because these organisms are closely related, we used the stringent E-value threshold of ≤ 1 × 10⁻³ for this similarity search. To meet this E-value threshold, the sequences need to be highly similar between Las,
Lso, Lam, and Lcr. Therefore, this close species filter potentially eliminates all the Las sequence targets that could lead to false positive results in qRT-PCR based molecular diagnostic assays. Consequently, we further eliminated four conserved sequences from the list of 38 unique sequences, resulting in a total of 34 potential sequence signatures. We could not apply this close species filter against Laf because its genome has yet to be sequenced. Five (~15%) of the 34 unique gene sequences, namely CLIBASIA_05545, CLIBASIA_05555, CLIBASIA_05560, CLIBASIA_05575 and CLIBASIA_05605, are in the prophage region of the Las genome. All five of these unique sequences are located upstream of the genomic locus CLIBASIA_05610, which encodes a phage terminase. There are possibly 30 genes that represent the complete prophage genome within the Las genome [25, 44], of which 16 open reading frames (ORFs) are upstream of the phage terminase, while the remaining 13 ORFs are downstream. The prophage genes CLIBASIA_05610 (primer pair 766F and 766R) and CLIBASIA_05538 (primer pair LJ900F and LJ900R) have been targeted in previous studies by both conventional and qRT-PCR based assays [25, 44]. We further analyzed the genomic orientation of the 34 unique genes. This analysis revealed that 15 (~44%) of them are oriented on the sense strand, while the remaining 19 (~56%) are on the anti-sense strand (Additional file 3: Figure S1).
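
The successive E-value filters described above (stringent search against nt, relaxed search against nt, then the close species filter) can be illustrated with a minimal Python sketch. It is not the code used in this study. It assumes that the similarity search results are available as BLAST tabular (outfmt 6) rows, in which column 1 is the query ID and column 11 is the E-value, and that hits to Las itself have already been removed. The function names, gene IDs, and hit rows are hypothetical and serve only to show the logic.

```python
from typing import Iterable, Set


def queries_with_hits(blast_rows: Iterable[str], max_evalue: float) -> Set[str]:
    """Return query IDs that have at least one hit with E-value <= max_evalue.

    Assumption: rows are tab-separated BLAST tabular (outfmt 6) lines,
    with the query ID in column 1 and the E-value in column 11.
    """
    hits = set()
    for row in blast_rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def unique_candidates(all_ids: Iterable[str], blast_rows: Iterable[str],
                      max_evalue: float) -> Set[str]:
    """Keep only the IDs with no hit at or below the E-value threshold."""
    return set(all_ids) - queries_with_hits(blast_rows, max_evalue)


# Toy data: hypothetical gene IDs and hits, not taken from the study.
genes = ["geneA", "geneB", "geneC", "geneD"]
nt_rows = [
    "geneA\tsubj1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-50\t550",  # strong hit
    "geneB\tsubj2\t40.0\t60\t36\t0\t1\t60\t1\t60\t0.5\t30",       # weak hit
]
close_species_rows = [
    "geneC\tLso_x\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-20\t250",
]

# Step 1: stringent threshold, only strongly similar genes are excluded.
stage1 = unique_candidates(genes, nt_rows, 1e-3)
# Step 2: relaxed threshold, genes with even weak hits are also excluded.
stage2 = unique_candidates(stage1, nt_rows, 1.0)
# Step 3: close species filter with the stringent threshold.
signatures = unique_candidates(stage2, close_species_rows, 1e-3)
```

In this toy example, raising the threshold from 1 × 10⁻³ to 1 counts the weak hit of geneB as a match and therefore removes geneB from the candidate list, which is why the relaxed threshold yields the smaller, more conservative set.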