Paired-end and mate-pair sequencing libraries were prepared using sample preparation kits from Illumina (San Diego, CA). DNA was sheared to 200 base pairs (bp) for the paired-end libraries and to 3 kilobases (kb) for the mate-pair libraries using a Covaris S-series sample preparation system. Each library was run on a single lane of an Illumina GA IIx sequencer for 38 cycles per end, except for the Pav Ve013 and Pav Ve037 paired-end libraries, which were run for 82 cycles per end. Paired-end reads were assembled
using the CLC Genomics Workbench (Århus, Denmark), with the short-read de novo assembler for Pav BP631 and the long-read assembler for the other strains. The resultant contigs were scaffolded with the mate-pair data using SSPACE [37]. Scaffolds were ordered and oriented relative to the most closely related fully sequenced genome sequence (Pto DC3000 for Pav BP631; Psy B728a for the other strains) using the contig mover tool in Mauve [20]. Automated gene prediction and annotation were carried out using the RAST annotation server [38]. These Whole Genome Shotgun projects
have been deposited at DDBJ/EMBL/GenBank under the accession numbers AKBS00000000 (Pav BP631), AKCJ00000000 (Pav Ve013) and AKCK00000000 (Pav Ve037). The versions described in this paper are the first versions, AKBS01000000, AKCJ01000000 and AKCK01000000. Our methods have been shown to correctly assemble >95% of the coding sequences, including >98% of single-copy genes, for the fully sequenced strain P. syringae pv. phaseolicola (Pph) 1448A [36]. The amino acid translations of the predicted ORFs from each strain were compared to each other and to those from 26 other publicly available P. syringae genome sequences using BLAST [39] and were grouped into orthologous gene families using orthoMCL [40]. Pav ORFs that were less than 300 bp in length and that did not have orthologs in
any other strain were excluded from further analyses. The DNA sequences of the remaining Pav-specific ORFs were compared to all other strains using BLASTn, and those that matched over at least 50% of their length with an E-value < 10^-20 were also excluded. The amino acid translations of the remaining Pav-specific genes were searched against GenBank using BLASTp to determine putative functions and the taxonomic identities of donor strains. Genomic scaffolds containing blocks of Pav-specific genes were compared to the genome sequences of the most closely related Pav reference strain and to the database strain with the most hits to ORFs in the cluster using BLASTn, and similarities were visualized using the Artemis Comparison Tool [41].
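
The two exclusion steps applied to candidate Pav-specific ORFs can be sketched in Python as follows. This is an illustrative sketch only, not the pipeline used in this study: the names BlastnHit, pav_specific_orfs, orf_lengths and has_ortholog are hypothetical, the ortholog flags are assumed to come from the orthoMCL families, and match coverage is assumed to be the aligned length of a single BLASTn hit divided by the ORF length.

```python
from dataclasses import dataclass

# Thresholds taken from the text above.
MIN_ORF_LENGTH_BP = 300     # shorter ORFs without orthologs are excluded
MIN_MATCH_FRACTION = 0.5    # "at least 50% of their length"
MAX_EVALUE = 1e-20          # "E-value < 10^-20"


@dataclass
class BlastnHit:
    """One BLASTn hit of a Pav ORF against another strain (hypothetical record)."""
    orf_id: str
    aligned_bp: int   # assumption: aligned bases on the query ORF for this hit
    evalue: float


def pav_specific_orfs(orf_lengths, has_ortholog, blastn_hits):
    """Return the sorted ids of ORFs that survive both exclusion steps.

    orf_lengths  -- dict mapping ORF id to length in bp
    has_ortholog -- dict mapping ORF id to True if it has an ortholog in
                    any other strain (missing ids are treated as False)
    blastn_hits  -- iterable of BlastnHit records
    """
    # Step 1: candidates have no ortholog elsewhere and are at least 300 bp.
    candidates = {
        orf_id
        for orf_id, length in orf_lengths.items()
        if not has_ortholog.get(orf_id, False) and length >= MIN_ORF_LENGTH_BP
    }
    # Step 2: drop candidates with a strong nucleotide match in another strain.
    matched = {
        hit.orf_id
        for hit in blastn_hits
        if hit.orf_id in candidates
        and hit.evalue < MAX_EVALUE
        and hit.aligned_bp / orf_lengths[hit.orf_id] >= MIN_MATCH_FRACTION
    }
    return sorted(candidates - matched)
```

In this sketch, an ORF is dropped at the second step only when a single hit satisfies both the coverage and the E-value condition; whether coverage should instead be pooled across several hits is not specified in the text and is left as an assumption.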