A blog on some snippets of my work in bioinformatics. Hopefully you find something useful here and avoid stupid mistakes I made.
Following up on building a multiple sequence alignment, a phylogenetic tree can be inferred and visualised. This short tutorial will work on a simple toy dataset using 4 betacoronavrius genomes (Accession numbers: AY597011, JX869059, MN908947, AY278491) retrieved from NCBI. See this tutorial on how to retrieve them. This will cover inferrence of tree.
conda install -c bioconda mafft iqtree
Build the MSA
Assuming your virus genomes are in a single fasta file called virus.fna
mafft --auto --thread 20 ./virus.fna > ./virus.msa.fna
This will create a alignment file called virus.msa.fna in the same directory. Adjust the paths if necessary.
Note: In this example MSA of a whole virus genome is OK because they are relatively tiny. Of course it wouldn’t be feasible to do such operation on genomes of larger size
Note: Adjust the --thread parameters based on your setup.
Infer the phylogenetic tree
Optional: create a new directory to store your treefiles
mkdir -pv ./treedir
Note: IQtree outputs a bunch of files at the end of the tree inference. It would be best to put them into a directory. Adjust the name treedir accordingly. It is always a good practice to include -pv in your mkdir command to prevent overriding existing directories.
iqtree -v -s ./virus.msa.fna --prefix treedir/virus_tree -B 1000 --boot-trees -T 20
Note: The -B <INT> adjusts the number of bootstraps with the minimum of 1000, along with --boot-trees flag, creates bootstrap of your phylogenetic tree. It is always ideal to do this step.
Note: Adjust the -T parameter for number of threads.
The directory of treedir will contain a multitude of files starting with the name as specified in the --prefix flag. Among those files, you will need the prefix.contree file, represeting a phylogenetic tree in newick format.