T-Coffee (Tree-based Consistency Objective Function for Alignment Evaluation) is multiple sequence alignment software that uses a progressive approach.[1] It generates a library of pairwise alignments to guide the multiple sequence alignment. It can also combine previously computed multiple sequence alignments, and recent versions can use structural information from Protein Data Bank (PDB) files (3D-Coffee). It has advanced features for evaluating alignment quality and some capacity for identifying occurrences of motifs (Mocca). By default it produces alignments in the aln format (Clustal), but it can also produce PIR, MSF, and FASTA output. The most common input formats are supported (FASTA, Protein Information Resource (PIR)).

T-Coffee
Developer(s): Cédric Notredame, Centro de Regulacio Genomica (CRG) - Barcelona
Stable release: 13.45.0.4846264 / 15 October 2020
Preview release: 13.45.33.7d7e789 / 23 December 2020
Operating system: Unix, Linux, Windows, macOS
Type: Bioinformatics tool
Licence: GPL
Website: www.tcoffee.org

Algorithm

The T-Coffee algorithm has two main features. The first is its use of heterogeneous data sources, which provides a simple and flexible means of generating multiple alignments: T-Coffee can compute a multiple alignment from a library generated using a mixture of local and global pairwise alignments.[1]

The second is the optimization method, which finds the multiple alignment that best fits the pairwise alignments in the input library, using a progressive strategy comparable to the one used in ClustalW. The optimization method has the advantage of being fast and robust. The information in the library guides the progressive alignment, so that the alignments between all pairs of sequences are taken into account at every step of the progressive multiple alignment.[1]

Generating a primary library of alignments

The library contains a set of pairwise alignments between all of the sequences to be aligned; these alignments are not required to be consistent with one another. The library holds information on each of the N(N-1)/2 sequence pairs, where N is the number of sequences. Two alignment sources are used for each pair of sequences, one local and one global.[1]
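As a quick check of the pair count, the following minimal Python sketch enumerates every unordered pair of sequences and confirms that there are N(N-1)/2 of them. The function name and the sequence names are hypothetical and serve only as illustration.

```python
from itertools import combinations


def sequence_pairs(names):
    """Return every unordered pair of sequence names (one library entry each)."""
    return list(combinations(names, 2))


names = ["seqA", "seqB", "seqC", "seqD"]  # hypothetical sequence names
pairs = sequence_pairs(names)
n = len(names)

# The number of pairs matches the closed-form count N(N-1)/2.
assert len(pairs) == n * (n - 1) // 2
print(len(pairs))  # 6
```

With four sequences the library therefore covers six pairs, and each pair contributes both a global and a local alignment source.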

Global alignments are constructed by running ClustalW on the sequences two at a time, which gives one full-length alignment for each pair of sequences. The local alignments are the ten top-scoring non-intersecting local alignments found by the Lalign program of the FASTA package.[1]

Each alignment is represented in the library as a list of pairwise residue matches, and each matched pair is a constraint. Some constraints are more important than others, because some are more likely to be correct. When the multiple alignment is computed, a weighting scheme gives priority to the most reliable residue pairs.[1]
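The representation described above can be sketched in Python as follows. This is an illustration, not T-Coffee's own code: the function name is hypothetical, and the weighting rule used here (every residue pair inherits the percent identity of the pairwise alignment it came from) is an assumption, since the text above does not spell out the scheme.

```python
def alignment_to_constraints(name_a, row_a, name_b, row_b, gap="-"):
    """Turn one gapped pairwise alignment into weighted residue-pair constraints.

    Returns a dict mapping ((name_a, i), (name_b, j)) -> weight, where i and j
    are 1-based residue positions in the ungapped sequences.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")

    matches = []
    identical = 0
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap:
            i += 1
        if b != gap:
            j += 1
        if a != gap and b != gap:
            # Two residues in the same column form one constraint.
            matches.append(((name_a, i), (name_b, j)))
            if a == b:
                identical += 1

    # Assumed weighting: percent identity over the aligned (non-gap) columns.
    weight = round(100 * identical / len(matches)) if matches else 0
    return {pair: weight for pair in matches}


library = alignment_to_constraints("A", "ACGT", "B", "A-GA")
for pair, weight in library.items():
    print(pair, weight)
```

In this toy alignment two of the three aligned columns are identical, so each of the three constraints receives a weight of 67.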

Combination of the libraries

Efficient combination of local and global alignment information is an important feature of T-Coffee. The ClustalW and Lalign primary libraries are combined by simple addition: a residue pair that occurs in both libraries is merged into a single entry whose weight is the sum of the two weights; otherwise, a new entry is created for the pair. Pairs with a weight of zero are not represented.[1] Each pair of aligned residues in the library can then be assigned a weight that reflects the degree to which those residues align consistently with the rest of the library. This is called library extension.
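The two steps can be illustrated with a small Python sketch. The merge follows the addition rule stated above. The extension rule is an assumption made for illustration: each pair gains, for every third residue aligned to both of its members, the smaller of the two connecting weights. The function names and toy data are hypothetical, and this simplified version only re-weights pairs already present in the library.

```python
from collections import defaultdict


def merge_libraries(*libraries):
    """Sum the weights of identical residue pairs; drop zero-weight pairs."""
    merged = defaultdict(int)
    for library in libraries:
        for pair, weight in library.items():
            # frozenset makes (x, y) and (y, x) the same entry.
            merged[frozenset(pair)] += weight
    return {pair: w for pair, w in merged.items() if w != 0}


def extend_library(library):
    """Re-weight each pair by its consistency with the rest of the library."""
    neighbours = defaultdict(dict)
    for pair, weight in library.items():
        x, y = tuple(pair)
        neighbours[x][y] = weight
        neighbours[y][x] = weight

    extended = {}
    for pair, weight in library.items():
        x, y = tuple(pair)
        total = weight
        for z, w_xz in neighbours[x].items():
            # z supports the pair only if it is aligned to both x and y.
            if z != y and z in neighbours[y]:
                total += min(w_xz, neighbours[y][z])
        extended[pair] = total
    return extended


# Toy libraries keyed by ((sequence, position), (sequence, position)).
global_lib = {
    (("A", 1), ("B", 1)): 88,
    (("A", 1), ("C", 1)): 77,
    (("B", 1), ("C", 1)): 100,
}
local_lib = {(("B", 1), ("A", 1)): 10}

merged = merge_libraries(global_lib, local_lib)
extended = extend_library(merged)
print(merged[frozenset({("A", 1), ("B", 1)})])    # 98
print(extended[frozenset({("A", 1), ("B", 1)})])  # 175
```

Here the A1-B1 pair appears in both libraries, so its merged weight is 88 + 10 = 98. During extension it gains min(77, 100) = 77 through residue C1, giving 175.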

Comparisons with other alignment software

While the default output is a Clustal-like format, it differs enough from the output of ClustalW/X that many programs supporting the Clustal format cannot read it. ClustalX can import T-Coffee output, so the simplest fix is usually to import T-Coffee's output into ClustalX and re-export it. Another possibility is to request the strict ClustalW output format with the option "-output=clustalw_aln".
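A minimal Python sketch of the second approach is shown below. It assumes that the executable is installed as t_coffee and that the input file is passed as the first argument; the helper names and the file name sequences.fasta are hypothetical. Only the "-output=clustalw_aln" option is taken from the text above.

```python
import shutil
import subprocess


def build_command(fasta_path, strict_clustalw=True):
    """Build the argument list for a T-Coffee run on one input file."""
    cmd = ["t_coffee", fasta_path]  # assumed executable name and argument order
    if strict_clustalw:
        # Request the strict ClustalW output format.
        cmd.append("-output=clustalw_aln")
    return cmd


def run_alignment(fasta_path):
    """Run T-Coffee if it is available on PATH; raise otherwise."""
    if shutil.which("t_coffee") is None:
        raise FileNotFoundError("t_coffee executable not found on PATH")
    return subprocess.run(build_command(fasta_path), check=True)


print(build_command("sequences.fasta"))
```

Building the argument list separately from running it makes the command easy to inspect or log before any alignment is started.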

A distinctive feature of T-Coffee is its ability to combine different methods and different data types. In its latest version, T-Coffee can combine protein sequences and structures, as well as RNA sequences and structures. It can also run and combine the output of the most common sequence and structure alignment packages.

T-Coffee comes with a sophisticated sequence reformatting utility named seq_reformat. Extensive documentation is available online.

Variations

  • M-Coffee: a special mode of T-Coffee that makes it possible to combine the output of the most common multiple sequence alignment packages (Muscle, ClustalW, Mafft, ProbCons, etc.). The resulting alignments are slightly better than the individual ones, but most importantly the program indicates the alignment regions on which the various packages agree. Regions of high agreement are usually well aligned.[2]
  • Expresso and 3D-Coffee: special modes of T-Coffee that make it possible to combine sequences and structures in an alignment. The structure-based alignments can be carried out using the most common structural aligners, such as TMalign, Mustang, and sap.[3][4][5][6]
  • R-Coffee: a special mode of T-Coffee making it possible to align RNA sequences while using secondary structure information.[7][8]
  • PSI-Coffee: aligns distantly related proteins using homology extension (slow and accurate)[9][10]
  • TM-Coffee: aligns transmembrane proteins using homology extension[11]
  • Pro-Coffee: aligns homologous promoter regions[12]
  • Accurate: automatically combines the most accurate modes for DNA, RNA and proteins (experimental).[13]
  • Combine: combines two (or more) multiple sequence alignments into one.[1][9]

Evaluation

Transitive Consistency Score (TCS) is an extended version of the T-Coffee scoring scheme.[14] It uses T-Coffee libraries of pairwise alignments to evaluate any third-party multiple sequence alignment (MSA). Pairwise projections can be produced using fast or slow methods, allowing a trade-off between speed and accuracy. TCS has been shown to give significantly better estimates of structural accuracy and more accurate phylogenetic trees than Heads-or-Tails, GUIDANCE, Gblocks, and trimAl.[15]

References

  1. ^ a b c d e f g h Notredame C, Higgins DG, Heringa J (2000-09-08). "T-Coffee: A novel method for fast and accurate multiple sequence alignment". J Mol Biol. 302 (1): 205–217. doi:10.1006/jmbi.2000.4042. PMID 10964570. S2CID 10189971.
  2. ^ Wallace, Iain M.; O'Sullivan, Orla; Higgins, Desmond G.; Notredame, Cedric (2006). "M-Coffee: combining multiple sequence alignment methods with T-Coffee". Nucleic Acids Research. 34 (6): 1692–1699. doi:10.1093/nar/gkl091. ISSN 1362-4962. PMC 1410914. PMID 16556910.
  3. ^ Armougom, Fabrice; Moretti, Sébastien; Poirot, Olivier; Audic, Stéphane; Dumas, Pierre; Schaeli, Basile; Keduas, Vladimir; Notredame, Cedric (2006-07-01). "Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee". Nucleic Acids Research. 34 (Web Server issue): W604–608. doi:10.1093/nar/gkl092. ISSN 1362-4962. PMC 1538866. PMID 16845081.
  4. ^ Zhang, Yang; Skolnick, Jeffrey (2005). "TM-align: a protein structure alignment algorithm based on the TM-score". Nucleic Acids Research. 33 (7): 2302–2309. doi:10.1093/nar/gki524. ISSN 1362-4962. PMC 1084323. PMID 15849316.
  5. ^ Konagurthu, Arun S.; Whisstock, James C.; Stuckey, Peter J.; Lesk, Arthur M. (2006-08-15). "MUSTANG: a multiple structural alignment algorithm". Proteins. 64 (3): 559–574. doi:10.1002/prot.20921. ISSN 1097-0134. PMID 16736488. S2CID 14074658.
  6. ^ Sun, Zheng; Tian, Weidong (2012). "SAP--a sequence mapping and analyzing program for long sequence reads alignment and accurate variants discovery". PLOS ONE. 7 (8): e42887. Bibcode:2012PLoSO...742887S. doi:10.1371/journal.pone.0042887. ISSN 1932-6203. PMC 3413671. PMID 22880129.
  7. ^ Wilm, Andreas; Higgins, Desmond G.; Notredame, Cédric (May 2008). "R-Coffee: a method for multiple alignment of non-coding RNA". Nucleic Acids Research. 36 (9): e52. doi:10.1093/nar/gkn174. ISSN 1362-4962. PMC 2396437. PMID 18420654.
  8. ^ Moretti, Sébastien; Wilm, Andreas; Higgins, Desmond G.; Xenarios, Ioannis; Notredame, Cédric (2008-07-01). "R-Coffee: a web server for accurately aligning noncoding RNA sequences". Nucleic Acids Research. 36 (Web Server issue): W10–13. doi:10.1093/nar/gkn278. ISSN 1362-4962. PMC 2447777. PMID 18483080.
  9. ^ a b Di Tommaso P, Moretti S, Xenarios I, Orobitg M, Montanyola A, Chang JM, Taly JF, Notredame C (Jul 2011). "T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension". Nucleic Acids Res. 39 (Web Server issue): W13–7. doi:10.1093/nar/gkr245. PMC 3125728. PMID 21558174.
  10. ^ Kemena C, Notredame C (2009-10-01). "Upcoming challenges for multiple sequence alignment methods in the high-throughput era". Bioinformatics. 25 (19): 2455–65. doi:10.1093/bioinformatics/btp452. PMC 2752613. PMID 19648142.
  11. ^ Chang JM, Di Tommaso P, Taly JF, Notredame C (2012-03-28). "Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee". BMC Bioinformatics. 13: S1. doi:10.1186/1471-2105-13-S4-S1. PMC 3303701. PMID 22536955.
  12. ^ Erb I, González-Vallinas JR, Bussotti G, Blanco E, Eyras E, Notredame C (Apr 2012). "Use of ChIP-Seq data for the design of a multiple promoter-alignment method". Nucleic Acids Res. 40 (7): e52. doi:10.1093/nar/gkr1292. PMC 3326335. PMID 22230796.
  13. ^ "T-Coffee Server". tcoffee.crg.eu. Retrieved 2023-12-26.
  14. ^ Chang, JM; Di Tommaso, P; Lefort, V; Gascuel, O; Notredame, C (1 July 2015). "TCS: a web server for multiple sequence alignment evaluation and phylogenetic reconstruction". Nucleic Acids Research. 43 (W1): W3-6. doi:10.1093/nar/gkv310. PMC 4489230. PMID 25855806.
  15. ^ Chang, J.M.; Di Tommaso, P.; Notredame, C. (Jun 2014). "TCS: A New Multiple Sequence Alignment Reliability Measure to Estimate Alignment Accuracy and Improve Phylogenetic Tree Reconstruction". Molecular Biology and Evolution. 31 (6): 1625–37. doi:10.1093/molbev/msu117. PMID 24694831.