Input And Assemblies
There are two steps required when adding data to the software.
(1) Project Setup
Segminator II requires a project to be set up prior to data input. This involves entering a project name (using the "Add Project" menu option) and providing a template in FASTA format. Further help is available on the "Parameters and Help" page.
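For readers unfamiliar with the FASTA format, the following is a minimal sketch of how a template file of this kind could be read. It is not Segminator II's own loader; the function name `read_fasta` and the example record `env_partial` are hypothetical and used only for illustration.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence} (hypothetical helper)."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A '>' line starts a new record; the rest is its name.
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence may span several lines; join them and upper-case.
            records[header] += line.upper()
    return records


# Illustrative input: one record whose sequence is split over two lines.
example = [">env_partial", "ATGAGAGTGA", "aggggatcag"]
template = read_fasta(example)
```

In practice the lines would come from an open file handle, for example `read_fasta(open(path))`, where `path` points at the template file.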
(2) Adding Data
After setting up a project, datasets in FASTQ format can be added using the "Add Dataset" menu. (Note: When adding the dataset using the subsequent popup window, please make sure that the correct project is selected in the dropdown list - otherwise the data will be mapped to the wrong project template). When a dataset is added, the software generates an assembly by first mapping and then pairwise aligning each read (Figure 3). All additional datasets added to the project are mapped and pairwise aligned to the same template using the parameters specified under the "Parameters" menu.
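As an illustration of the FASTQ input expected at this step, here is a small sketch of a reader for four-line FASTQ records. It assumes Phred+33 quality encoding and unwrapped sequence lines. Neither assumption is stated above, and the names `Read` and `read_fastq` are hypothetical rather than part of Segminator II.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Read(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # Phred scores, one per base


def read_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[Read]:
    """Yield reads from four-line FASTQ records (assumed Phred+33)."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated FASTQ record")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch: " + header)
        yield Read(header[1:], seq, [ord(c) - offset for c in qual])


# Illustrative record: 'I' encodes Phred 40 and '!' encodes Phred 0.
reads = list(read_fastq(["@read1", "ACGT", "+", "II!I"]))
```

A check of this kind before adding a dataset can catch truncated or malformed files early.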
Sample data, consisting of a partial env template and some HIV-1 reads obtained using the 454 Life Sciences platform, is available here.
Further details of assembly generation, and of the data structures used to store individual reads, are discussed in our submitted paper, which will shortly be added here as a link (if accepted!).
A variation on the standard method of mapping occurs if the user sets the "Replace template with con. during mapping" option under the "Parameters -> Miscellaneous" menu to true. In this case, for each dataset added, a consensus is taken across each site of the template once the initial mapping has been performed. This consensus replaces the original user-defined, project-generic template for that dataset, and the dataset is remapped. The benefits of this approach are discussed in [PLoS Comput Biol. 2010 Dec 16;6(12):e1001022].
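The per-site consensus step can be sketched as follows. This is a simplified illustration, not the software's implementation: it assumes reads have already been placed on the template without insertions or deletions, takes the most frequent base at each site, and keeps the original template base where no read covers a site. The function name `consensus_template` and the example data are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def consensus_template(template: str, placed_reads: List[Tuple[int, str]]) -> str:
    """Build a dataset-specific template from (start, read) placements.

    Assumes ungapped placements; uncovered sites keep the template base.
    """
    counts = [Counter() for _ in template]
    for start, read in placed_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(template):
                counts[pos][base] += 1
    # Most frequent base per site, falling back to the original template.
    return "".join(
        site.most_common(1)[0][0] if site else original
        for site, original in zip(counts, template)
    )


# Illustrative data: the reads disagree with the template at sites 2 and 7.
generic = "ACGTACGT"
placements = [(0, "ACCT"), (1, "CCTA"), (4, "ACGA")]
specific = consensus_template(generic, placements)
```

The dataset would then be remapped against `specific` rather than `generic`. How ties and low-coverage sites are actually resolved is not described here, so the choices above are assumptions.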
Figure 3 - The assembly generation pipeline: The preprocessing of the template sequence prior to read mapping is outlined (center). The fragments titled "k-mers" are all the unique words within the template sequence. These are stored along with their corresponding locations. On the left-hand side, all k-mers of equal length extracted from the read are shown. The plot indicates the frequency of k-mer matches across the template sequence for a single read. Grey boxes indicate processing events that take place within the framework. The right-hand side indicates the beginning of the second iteration of this pipeline after a data-specific template has been generated. The condition for this second iteration to begin is illustrated in the white circle.
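The k-mer lookup described in the caption can be illustrated with a short sketch. It indexes every k-mer of the template with its locations, then counts how many k-mers from a read agree on each implied start position. Voting on start positions is an assumption made here for illustration; the caption only states that match frequencies across the template are computed. The function names and the value k = 3 are likewise hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def index_template(template: str, k: int) -> Dict[str, List[int]]:
    """Map each k-mer in the template to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(template) - k + 1):
        index[template[i:i + k]].append(i)
    return dict(index)


def match_profile(read: str, index: Dict[str, List[int]], k: int) -> Counter:
    """Count k-mer matches per implied start position of the read."""
    votes: Counter = Counter()
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], ()):
            # A match at template position pos for read offset j
            # implies that the read starts at pos - j.
            votes[pos - j] += 1
    return votes


# Illustrative data: the read is an exact substring starting at position 4.
k = 3
kmer_index = index_template("ACGTTGCAAGGCT", k)
profile = match_profile("TGCAAG", kmer_index, k)
best_start, support = profile.most_common(1)[0]
```

The peak of such a profile suggests where the read belongs on the template, after which a pairwise alignment would refine the placement.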



