We recommend using the latest version of TREE-QMC so that you can advantage of the quartet weighting options proposed by Zhang & Mirarab, Mol Biol Evol, 2022. The original algorithm for unweighted quartets can also be called from the latest TREE-QMC method using the following command:
./treeqmc --fast -i <input gene trees> -o <output species tree>
TREE-QMC is a quartet-based method for estimating species trees from gene trees, like the popular method ASTRAL. To learn more about TREE-QMC, check out Han & Molloy, Genome Res, 2023.
TREE-QMC is based on the Quartet Max Cut (QMC) framework introduced by Sagi Snir and Satish Rao; see Snir & Rao, IEEE/ACM TCBB, 2010 and Avni, Cohen, & Snir, Syst Biol, 2015.
TREE-QMC uses MQLib for its max cut heuristic; see Dunning, Gupta, & Silberholz, INFORMS Journal on Computing, 2018.
We recommend working through this tutorial.
To build TREE-QMC, use commands:
git clone https://github.com/molloy-lab/TREE-QMC.git
cd TREE-QMC
cd MQLib
make
cd ..
g++ -std=c++11 -O2 -I MQLib/include -o TREE-QMC src/*.cpp MQLib/bin/MQLib.a
To run TREE-QMC, use command:
./TREE-QMC -i <input file> -o <output file name>
To see the TREE-QMC usage options, use command:
./TREE-QMC -h
The output should be
TREE-QMC version 2.0.0
COMMAND: ./TREE-QMC -h
=================================== TREE-QMC ===================================
This is version 2.0.0 of TREe Embedded Quartet Max Cut (TREE-QMC).
USAGE:
./TREE-QMC (-i|--input) <input file> [(-o|--output) <output file>]
[(--polyseed) <integer>] [(--maxcutseed) <integer>]
[(-n|--normalize) <normalization scheme>]
[(-x|--execution) <execution mode>]
[(-v|--verbose) <verbose mode>] [-h|--help]
OPTIONS:
[-h|--help]
Prints this help message.
(-i|--input) <input file>
Name of file containing gene trees in newick format (required)
IMPORTANT: current implementation of TREE-QMC requires that the input
gene trees are unrooted and binary. Thus, TREE-QMC suppresses roots
and randomly refines polytomies during a preprocessing phase; the
resulting trees are written to "<input file>.refined".
[(-o|--output) <output file>]
Name of file for writing output species tree (default: stdout)
[(--polyseed) <integer>]
Seeds random number generator with <integer> prior to arbitrarily
resolving polytomies. If <integer> is set to -1, system time is used;
otherwise, <integer> should be positive (default: 12345).
[(--maxcutseed) <integer>]
Seeds random number generator with <integer> prior to calling the max
cut heuristic but after the preprocessing phase. If <integer> is set to
-1, system time is used; otherwise, <integer> should be positive
(default: 1).
[(-n|--normalize) <normalization scheme>]
Initially, each quartet is weighted by the number of input gene
trees that induce it. At each step in the divide phase of wQMC and
TREE-QMC, the input quartets are modified with artificial taxa. We
introduce two normalization schemes for artificial taxa and find
that they improve empirical performance of TREE-QMC in a simulation
study. The best scheme is run by default. See paper for details.
-n 0: none
-n 1: uniform
-n 2: non-uniform (default)
[(-x|--execution) <execution mode>]
TREE-QMC uses an efficient algorithm that operates directly on the
input gene trees by default. The naive algorithm, which operates on a
set of quartets weighted based on the input gene trees, is also
implemented for testing purposes.
-x 0: run efficient algorithm (default)
-x 1: run naive algorithm
[(-v|--verbose) <verbose mode>]
-v 0: write no subproblem information (default)
-v 1: write CSV with subproblem information (subproblem ID, parent
problem ID, depth of recursion, total # of taxa, # of artifical
taxa, species names)
-v 2: write CSV with subproblem information (info from v1 plus # of
of elements in f, # of pruned elements in f, # of zeroes in f)
[--shared <use shared taxon data structure to normalize quartet weights>]
Do NOT use unless there are no missing data!!!
Contact: Post issue to Github (https://github.com/molloy-lab/TREE-QMC/)
or email Yunheng Han (yhhan@umd.edu) & Erin Molloy (ekmolloy@umd.edu)
If you use TREE-QMC in your work, please cite:
Han and Molloy, 2023, "Improving quartet graph construction for scalable
and accurate species tree estimation from gene trees," Genome Research,
http:doi.org/10.1101/gr.277629.122.
================================================================================