################################
Quick Start Unistrap and phylo3d
################################

************
Introduction
************

This document introduces phylo3d and multistrap. Phylo3d makes it possible to estimate a protein phylogenetic tree using the variation of intramolecular distances as an evolutionary character. The multistrap strategy makes it possible to combine several branch bootstrap support indexes into a single support value. It has been designed to allow the combination of Felsenstein-type supports drawn on Minimum Evolution trees (ME), Maximum Likelihood trees (ML) and phylo3d protein structure based trees (IMD), but it can also combine any replicate support generated by the user.

Tutorial data is available from `Github `_. Note that this directory also contains the data required for the initial validation of the procedure `Github `_

The principle is very straightforward: a reference tree is computed, and N collections of replicates are generated and used to produce N collections of bootstrap support values for the reference tree. When this is done, each branch of the reference tree has N bootstrap support values. These values are eventually combined into a single value, using either the average (default), the geometric mean, the minimum or the maximum value.

.. note:: ML trees are generated using iqtree, which must be installed, and ME trees are generated using FastME (see below for the installation procedure)

**************************
Installation from binaries
**************************

::

  ##: Install iqtree from: http://www.iqtree.org/
  ##: Install fastME from: http://www.atgc-montpellier.fr/fastme/binaries.php
  ##: Get the latest stable version from http://www.tcoffee.org/Packages/Stable/Latest/
  ##: Or the latest Beta Version from http://www.tcoffee.org/Packages/Beta/Latest/
  ##: download the *.tar.gz file
  ##: tar -xvf T-COFFEE_distribution_Version_XXXXX.tar.gz
  ##: cd T-COFFEE_distribution_Version_XXXXX
  ##: ./install all
  ##: add the instructions given at the bottom of the install output to your .profile or your .bashrc

********
Examples
********

Produce a phylo3D IMD tree with 100 bootstrap replicates
============================================================

In the following example, a phylo3d tree based on intramolecular distances (IMD) is estimated, and 100 bootstrap replicates are used to compute its branch support values. By default the number of replicates is set to 100.

.. note:: Data available in: `Github `_

.. note:: Since we want to use IMD, we need structural information. The structures of each sequence are in the directory and are declared using the template_list file, in which each sequence is associated with a PDB structure. PDB structures are expected to have been extracted and processed so as to match the corresponding sequence.

::

  $$: t_coffee -other_pg seq_reformat -in PF03143_tmalign.fa -in2 PF03143_ref.template_list -treemode fastme -action +replicates 100 +phylo3d +treelist2bs -output newick

This example can be adapted to output all the replicates (the original tree comes first, followed by the 100 replicates)

::

  $$: t_coffee -other_pg seq_reformat -in PF03143_tmalign.fa -in2 PF03143_ref.template_list -treemode fastme -action +replicates 100 +phylo3d +print_replicates -output newick

Or to output all the distance matrices

::

  $$: t_coffee -other_pg seq_reformat -in PF03143_tmalign.fa -in2 PF03143_ref.template_list -treemode fastme -action +replicates 100 +phylo3d +print_replicates -output dm

Combine branch bootstrap support values using multistrap
============================================================

In the following example, an ML tree will be generated, then 5 ML, 5 ME and 5 IMD bootstrap replicates will be generated. By default the number of replicates is set to 100. In the list [ml me ml imd], the first argument indicates the reference tree (ml in this case) and the following arguments indicate the replicates.

.. note:: Data available in: `Github `_

.. note:: The number of replicate methods is not limited

.. note:: Since we want to use IMD, we need structural information. The structures of each sequence are in the directory and are declared using the template_list file, in which each sequence is associated with a PDB structure. PDB structures are expected to have been extracted and processed so as to match the corresponding sequence.

::

  $$: t_coffee -other_pg seq_reformat -in PF03143_tmalign.fa -in2 PF03143_ref.template_list -action +replicates 5 +multistrap_mode avg +multistrap ml me ml imd

In the following example we will use reftree.nwk to produce a combined bootstrap using three sets of pre-computed replicates.
In practice, this means that the original bootstrap supports of reftree will be removed (if present) and replaced by the combined bootstrap supports gathered from the replicate trees provided in bootstrap1, bootstrap2 and bootstrap3

.. note:: It is your responsibility to ensure that the number of replicates is similar across bootstrap1, bootstrap2 and bootstrap3

.. note:: The number of replicates and the number of replicate methods are limited by your resources only

::

  $$: t_coffee -other_pg seq_reformat -action +multistrap_mode avg +multistrap reftree.nwk bootstrap1 bootstrap2 bootstrap3

Reproduce the multistrap validation
===================================

The following command line applies the +phylo3d_bm action to the entries of list.txt, collects the results in auc.txt and post-processes them into auc.analyze.txt and auc.analyze.tsv.

.. note:: Data available in: `Github `_

::

  $$: ./process_list.pl 't_coffee -other_pg seq_reformat -action +phylo3d_bm $' list.txt > auc.txt;./auc2analyze.pl auc.txt >auc.analyze.txt;cat auc.txt | grep TABLE >auc.analyze.tsv
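
Illustration of the support combination
=======================================

The combination step described in the introduction (N bootstrap support values per branch reduced to a single value) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the T-Coffee implementation: the function name ``combine_supports`` and the mode labels ``geom``, ``min`` and ``max`` are hypothetical (only ``avg`` appears in the command lines of this document), and the support values are made up.

.. code-block:: python

  import math
  from statistics import mean


  def combine_supports(values, mode="avg"):
      """Combine the N bootstrap supports of one branch into a single value.

      Hypothetical helper: 'avg' mirrors the documented default, while the
      labels 'geom', 'min' and 'max' are names chosen for this sketch.
      """
      if not values:
          raise ValueError("at least one support value is required")
      if any(v < 0 for v in values):
          raise ValueError("support values must be non-negative")
      if mode == "avg":
          return mean(values)
      if mode == "geom":
          # A single support of 0 forces the geometric mean to 0.
          if any(v == 0 for v in values):
              return 0.0
          return math.exp(sum(math.log(v) for v in values) / len(values))
      if mode == "min":
          return min(values)
      if mode == "max":
          return max(values)
      raise ValueError(f"unknown mode: {mode}")


  # Made-up supports of one branch, e.g. from ML, ME and IMD replicates.
  branch_supports = [90, 60, 30]
  for mode in ("avg", "geom", "min", "max"):
      print(mode, combine_supports(branch_supports, mode))

The minimum is the most conservative choice, since a branch only keeps a high combined value when every replicate collection supports it, whereas the maximum keeps a branch as soon as one collection supports it.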