T-Coffee Web Server

Warning

This chapter is currently under maintenance. Its aim is to describe what the T-Coffee web server is doing programmatically when you are using it; it provides technical details about the restrictions and commands in use.

In this chapter we describe briefly the available T-Coffee web servers and their usage (command lines and examples files); if you need more advanced options, you should use the T-Coffee package via command lines. The web server is a reduced version of the T-Coffee package, containing all T-Coffee modes for aligning protein, RNA or DNA sequences; it also contains the evaluation and downstream analysis tools (TCS, iRMSD/APDB, STRIKE, T-RMSD). Currently, all reformatting utilities are not available on the web server, however, you can choose some reformatting options related to the ouput format. Here we present briefly the webserver and the T-Coffee commands it contains.

General

Most of the following command lines used by the web server contains lots of different options called with flags; here is a summary of the options common to all comand lines:

  • -in: your input file
  • -output: specifies the output files you require
  • -maxnseq: maximum number of sequences you can run (variable depending on the mode)
  • -maxlength: maximum length of your sequences (variable depending on the mode)
  • -case=upper: all residues/nucleotids will be upper case
  • -seqnos=off: the size of the sequences is not indicated on the MSA
  • -outorder=input: orders the sequences in the final MSA as in the input dataset
  • -run_name: name of the job on the cluster
  • -multi_core=4: uses only 4 cores on your server when running the job
  • -quiet=stdout: standard verbose output

T-Coffee Simple MSA

The central part of the web server is the T-Coffee aligner, you can use it to align any kind of sequences (Protein, RNA or DNA alike). The command it runs is the following:

$#: t_coffee -in=data_93c5fbb0.in -mode=regular -output=score_html clustalw_aln fasta_aln \
    score_ascii phylip -maxnseq=150 -maxlen=10000 -case=upper -seqnos=off -outorder=input \
    -run_name=result -multi_core=4 -quiet=stdout

Protein Sequences

Expresso (using 3D structures)

$#: t_coffee -in=data_93c5fbb0.in -mode=expresso -blast=LOCAL -pdb_db=/db/pdb/derived_data_\
    format/blast/2016-01-01/pdb_seqres.fa -evaluate_mode=t_coffee_slow -output=score_html \
    clustalw_aln fasta_aln score_ascii phylip -maxnseq=150 -maxlen=2500 -case=upper -seqnos= \
    off -outorder=input -run_name=result -multi_core=4 -quiet=stdout

M-Coffee (combining multiple methods)

$#: t_coffee -in=data_93c5fbb0.in  Mpcma_msa Mmafft_msa Mclustalw_msa Mdialigntx_msa Mpoa_msa \
    Mmuscle_msa Mprobcons_msa Mt_coffee_msa -output=score_html clustalw_aln fasta_aln \
    score_ascii phylip -tree -maxnseq=150 -maxlen=2500 -case=upper -seqnos=off -outorder=input \
    -run_name=result -multi_core=4 -quiet=stdout

PSI/TM-Coffee (transmembrane proteins)

Two options are available (in addition to the choice of the database): without transmembrane prediction or with prediction (displayed color coded on the html file).

$#: tmcoffee.sh -in data_9df741d4.in -mode psicoffee -blast_server LOCAL --search-db 'UniRef50 \
    -- Very Fast/Rough' --search-type '' -prot_min_sim 50 -prot_max_sim 90 -prot_min_cov 70 \
    --search-out 'clustalw_aln fasta_aln score_ascii phylip score_html' -maxnseq 1000 -maxlen \
    =5000 -case upper -seqnos=off -outorder input -run_name result -multi_core 4 -quiet=stdout

$#: tmcoffee.sh -in data_9df741d4.in -mode psicoffee -blast_server LOCAL --search-db 'UniRef50 \
    -- Very Fast/Rough' --search-type 'transmembrane' -prot_min_sim 50 -prot_max_sim 90 \
    -prot_min_cov 70 --search-out 'clustalw_aln fasta_aln score_ascii phylip score_html'
    -maxnseq 1000 -maxlen 5000 -case upper -seqnos off -outorder input -run_name result º
    -multi_core 4 -quiet=stdout

PSI-Coffee (homology extension)

$#: t_coffee -in=data_93c5fbb0.in -mode=psicoffee -blast=LOCAL -protein_db=/db/ncbi/201511/ \
    blast/db/nr.fa -output=score_html clustalw_aln fasta_aln score_ascii phylip -maxnseq=150 \
    -maxlen=2500 -case=upper -seqnos=off -outorder=input -run_name=result -multi_core=4 \
    -quiet=stdout

RNA Sequences

R-Coffee (using 2D prediction)

$#: t_coffee -in=data_29091222.in -method=mafft_msa muscle_msa probconsRNA_msa -output= \
    score_html clustalw_aln fasta_aln score_ascii phylip -maxnseq=150 -maxlen=2500 -case=upper \
    -seqnos=off -outorder=input -run_name=result -tree -special_mode=rcoffee -method_limits= \
    consan_pair 5 150 -multi_core=4 -quiet=stdout

SARA-Coffee (using 3D structures)

SARA-Coffee is a bit complicated to run, it uses several third party packages and is run through a script and environment variables we set up; here is what it looks like:

##: export X3DNA=/data/www-cn/sara_coffee_package/X3DNA;
##: export PDB_DIR=/data/www-cn/sara_coffee_package/PDBdir/;
##: export NO_REMOTE_PDB_DIR=1;
##: unset MAFFT_BINARIES;
##: cd $CACHE_4_TCOFFEE
##: ln -s /data/www-cn/sara_coffee_package/pdb_entry_type.txt);
$#: t_coffee -in data_3e6e7aec.in -method sara_pair -template_file \
    /data/www-cn/sara_coffee_package/TEMPLATEFILE,RNA -extend_mode rna2 -relax_lib 0 -transform \
    dna2rna -run_name=result -output score_html clustalw_aln -case=upper -seqnos=off -outorder= \
    input -multi_core=4 -pdb_min_sim 0 -quiet stdout

RM-Coffee (combining multiple methods)

Not yet available…

DNA Sequences

M-Coffee (combining multiple methods)

For now, M-Coffee by default is the same for DNA, RNA and protein sequences alike. There is no specific M-Coffee for DNA sequences.

Pro-Coffee (homologous promoter regions)

$#: t_coffee -in=data_476efe5f.in -mode=procoffee -output=score_html clustalw_aln fasta_aln \
    score_ascii phylip -maxnseq=150 -maxlen=10000 -case=upper -seqnos=off -outorder=input \
    -run_name=result -multi_core=4 -quiet=stdout

Evaluation Tools

TCS (Transitive Consistency Score)

##: tcs.sh -infile data_a98d61a6.in -in Mproba_pair -score 1 -output clustalw_aln fasta_aln \
    phylip score_ascii tcs_weighted tcs_replicate score_html -maxnseq 1000 -maxlen 8000 \
    -seqnos=off -run_name result -multi_core 4 --filter-type column --filter-min 4 --filter-max \
    9 --filter-gap yes -quiet=stdout

iRMSD/APDB (MSA structural evaluation) (under maintenance…)

$#: t_coffee -other_pg apdb -aln data_c7151320.in -apdb_outfile default -outfile default \
    -io_format hsg3 -output score_html -maximum_distance 10 -md_threshold 2.0 -similarity_ \
    threshold 70 -template_file EXPRESSO -run_name result -quiet stdout

T-RMSD (structural clustering)

$#: t_coffee -in=data_b89d3438.in -mode=expresso -cache=$PWD -blast=LOCAL -pdb_db=/db/pdb/ \
    derived_data_format/blast/2016-01-01/pdb_seqres.fa -evaluate_mode=t_coffee_slow -output= \
    aln score_html -maxnseq=150 -maxlen=2500 -case=upper -outorder=input -run_name=result \
    -multi_core=4 -quiet=stdout; t_coffee -other_pg trmsd result.aln -template_file \
    result_pdb1.template_list -output color_html 2>&1; [ -e result.struc_tree.consensus ]

STRIKE (MSA evaluation with single structure)

::

$#: wget ftp://ftp.rcsb.org/pub/pdb/derived_data/pdb_seqres.txt $#: install ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST

$#: t_coffee -other_pg strike <sequence file> -template_file PDB -pdb_db <pdb_seqres> -blast_server LOCAL