The previous article, "Build phylogenetic tree", gave a rough introduction to using MEGA, but some parts were not clear. Here I explain them further using the resources available. Please point out any mistakes.
Collect homologous sequences
To collect homologous sequences with BLAST, refer to "here". However, sequences collected this way come from many species. If you need to compare homologs from specific species, such as the commonly used model plants Arabidopsis and rice, you will run into some problems during the search. Here are my solutions for these two species.
1. Arabidopsis
Two websites are mainly used to obtain Arabidopsis gene sequences:
TAIR (The Arabidopsis Information Resource) provides a large amount of Arabidopsis data, including the complete genome sequence, gene structure, gene product information, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, publications, and information about the Arabidopsis research community.
The Plant Transcription Factor Database (PlnTFDB for short) currently contains 2657 protein models; its Arabidopsis thaliana protein sequences are taken from TAIR.
Commonly studied sequences can be obtained from PlnTFDB. As shown in the figure below, click "Eudicot" and then "Arabidopsis thaliana" to enter the Arabidopsis database.
Click a transcription factor family listed in the table, such as "zf-HD".
Click "Check all" to select all the sequences, then click "Retrieve" to download them directly in FASTA format.
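If you want to check what the "Retrieve" download actually contains before going further, a small FASTA reader in Python is enough. This is a minimal sketch of my own. It assumes a standard FASTA file, where header lines start with ">" and sequence lines follow them. The file name zf-HD.fasta is hypothetical.

```python
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]  # drop the leading ">"
            chunks = []
        else:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    path = Path("zf-HD.fasta")  # hypothetical name of the downloaded file
    if path.exists():
        for header, seq in read_fasta(path.read_text()):
            print(header, len(seq))
```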
If the gene family you want is not in the table, you can run a BLAST search in TAIR instead.
Go to the TAIR homepage, enter the gene family name in the search box, select the protein database, and click Search.
Taking my HSP60 as an example, the search returns the following results. Select the gene closest to the one you want, such as the last one.
Click "Send to BLAST", then click "Run BLAST" on the next page. Because I don't know what these parameters do, I simply ran the BLAST search with the default parameters.
This gives a list of gene sequences with TAIR accession numbers.
Exclude the hits with an E-value greater than 0.01 and save the rest. Because these are gene accession numbers, you still need to look up the corresponding proteins.
Collect the accession numbers above and use TAIR to download the FASTA files in batch.
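You can also apply the E-value filter with a short script if you copy the hit list out of the results page. This is a sketch that assumes a hypothetical tab-separated layout, with the accession in the first column and the E-value in the last. The actual TAIR results page may be laid out differently, so adjust the column indices to match what you paste.

```python
def filter_hits(lines, max_evalue=0.01):
    """Keep (accession, evalue) pairs whose E-value is <= max_evalue.

    Assumes each line is tab-separated, with the accession in the first
    column and the E-value in the last column (a hypothetical layout).
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        try:
            evalue = float(fields[-1])  # handles "0.0", "1e-50", "0.05"
        except ValueError:
            continue  # skip a header row such as "Accession\tE-value"
        if evalue <= max_evalue:
            kept.append((fields[0], evalue))
    return kept
```

Hits exactly at 0.01 are kept, because the text only excludes values greater than 0.01.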
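Before pasting the accession numbers into TAIR, it helps to pull them out of the copied text and remove duplicates. This sketch assumes Arabidopsis locus identifiers of the form AT1G01010, optionally followed by a gene-model suffix such as .1. The helper name unique_accessions is hypothetical. I am also assuming that the batch download box takes one identifier per line.

```python
import re

# Arabidopsis locus identifiers: AT + chromosome (1-5, C, M) + G + 5 digits,
# optionally followed by a gene-model suffix such as ".1"
LOCUS_RE = re.compile(r"AT[1-5CM]G\d{5}(?:\.\d+)?", re.IGNORECASE)


def unique_accessions(text):
    """Return the locus identifiers found in text, upper-cased, first occurrence only."""
    seen = set()
    result = []
    for match in LOCUS_RE.findall(text):
        acc = match.upper()
        if acc not in seen:
            seen.add(acc)
            result.append(acc)
    return result
```

For example, print("\n".join(unique_accessions(copied_text))) gives one identifier per line, ready to paste.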
2. Rice
Because the Pfam code was already obtained with HMMER earlier, searching for rice sequences is much easier.
On the front page of the Rice Genome Annotation Project, find "Protein Domain Search", enter the Pfam code in the "Pfam profile search" box, and click Search.
I won't include screenshots here. Collect the accession numbers in the "Model" column of the search results.
This gives you the search results; copy and paste them for later use.
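If you paste the rice search results into a text editor as tab-separated text, you can extract the "Model" column with Python's csv module. This sketch assumes the pasted text has a header row containing a column named "Model" and that cells are separated by tabs. The model IDs in the test data (LOC_Os01g01010.1 and so on) are placeholders.

```python
import csv
import io


def model_column(tsv_text, column="Model"):
    """Return the unique, non-empty values of one column from pasted tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    models = []
    for row in reader:
        value = (row.get(column) or "").strip()  # missing column -> empty string
        if value and value not in models:
            models.append(value)
    return models
```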
Organize homologous sequences
We need to rename these sequences so that the final evolutionary tree looks as clean as possible. Sort the sequences by protein length from shortest to longest, remove the annotation after each accession number, and rename the accession numbers. Be sure to keep the original file just in case.
The batch-downloaded Arabidopsis sequences already include the length in the header, in the format LENGTH=1234. After sorting, they can be renamed in the format ATFBA1. Rice is more troublesome. The only method I know is to search by accession number at the National Rice Data Center and click the gene ID to get detailed gene information, including the protein length. After sorting, the rice sequences can be renamed in the format OsFBA1.
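You can also script the sorting and renaming. This is a minimal sketch that assumes each record is a (header, sequence) pair. It reads the length from the LENGTH=1234 field when that field is present, as in the TAIR downloads. Otherwise it counts residues. That fallback is my own choice for rice, not the National Rice Data Center lookup described above. The function also returns a mapping from the new names back to the original accession numbers, so the original information is kept.

```python
import re

LENGTH_RE = re.compile(r"LENGTH=(\d+)")


def protein_length(header, sequence):
    """Use LENGTH=... from the header if present, else count residues."""
    match = LENGTH_RE.search(header)
    if match:
        return int(match.group(1))
    return len(sequence.rstrip("*"))  # ignore a trailing stop "*"


def sort_and_rename(records, prefix):
    """Sort (header, sequence) records by length and rename them prefix1, prefix2, ...

    Returns (renamed_records, mapping) where mapping maps new name -> original accession.
    """
    # sorted() is stable, so records of equal length keep their input order
    ordered = sorted(records, key=lambda r: protein_length(r[0], r[1]))
    renamed = []
    mapping = {}
    for i, (header, seq) in enumerate(ordered, start=1):
        new_name = f"{prefix}{i}"
        parts = header.split()
        mapping[new_name] = parts[0] if parts else header  # accession = first word
        renamed.append((new_name, seq))
    return renamed, mapping
```

For example, sort_and_rename(records, "ATFBA") produces ATFBA1, ATFBA2, and so on, and sort_and_rename(rice_records, "OsFBA") produces OsFBA1, OsFBA2, and so on.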
After organizing everything, follow the example below: leave one blank line between sequences and put the sequences from all species in the same file. Then change the .txt suffix to .fasta.
>ATFBA1
sequence

>JcFBA2
sequence

>OsFBA3
sequence
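Writing the combined file with a script avoids manual formatting. This sketch formats (name, sequence) pairs with a blank line between records, as in the example above. It wraps sequences at 60 characters, which is an arbitrary choice of mine; unwrapped sequences are also valid FASTA. Writing straight to a .fasta path also skips the suffix change. The helper names to_fasta and write_combined are hypothetical.

```python
from pathlib import Path


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA with one blank line between records."""
    blocks = []
    for name, seq in records:
        lines = [f">{name}"]
        # split the sequence into lines of at most `width` characters
        lines += [seq[i:i + width] for i in range(0, len(seq), width)]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


def write_combined(path, *species_records):
    """Concatenate the record lists of several species into one FASTA file."""
    combined = [rec for records in species_records for rec in records]
    Path(path).write_text(to_fasta(combined))
```

For example, write_combined("all.fasta", arabidopsis_renamed, rice_renamed) puts both species into one file.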
Click "View" at the top of File Explorer and check "File name extensions"; you can then change the file suffix. Be sure to keep the original file just in case.
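If you would rather not change the Explorer settings, you can also change the suffix in Python. This sketch copies the file instead of renaming it, so the original .txt file stays around just in case. The helper name copy_as_fasta is hypothetical.

```python
import shutil
from pathlib import Path


def copy_as_fasta(txt_path):
    """Copy a .txt file to a .fasta file with the same name, keeping the original."""
    src = Path(txt_path)
    dst = src.with_suffix(".fasta")  # seqs.txt -> seqs.fasta
    shutil.copyfile(src, dst)
    return dst
```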
Build evolutionary tree
1. Sequence alignment
Download the version of the program that matches your system from the MEGA home page. Newer versions are generally preferable, so I recommend the latest version, MEGA X (64-bit). A backup download is available here.
If MEGA X is installed with the default settings, .fasta files open in MEGA X by default. So double-click the organized .fasta sequence file to open it, and the following window will pop up.
If the FASTA file does not open in MEGA X by default, you can also click "File" > "Open a File" to open it.
Then click the "W" button at the top and click "Align Protein" to use the built-in ClustalW for sequence alignment.
Click "OK" in the pop-up window and select all sequences. Then click "OK" in "ClustalW Options" to run the alignment with the default settings.
Be careful not to close the window; wait for the alignment to finish.
Save the alignment results: click "Data" and save in .meg format as shown in the figure.
2. Build an evolutionary tree
Select "Phylogeny", choose the first item, "Construct/Test Maximum ……", and import the .meg file.
After that, keep all the defaults and wait for the program to finish the analysis. The time this takes depends on the number of sequences. The result is the evolutionary tree.
3. Beautify the evolution tree
Not written yet
4. Export evolution tree
Click "Image" to export the tree as a picture in various formats. I recommend BMP here. If you can't open the file, try viewing it with Honeyview.
CLUSTAL and MUSCLE, the two programs built into MEGA, both perform MSA (multiple sequence alignment). MUSCLE has a significant speed advantage, while CLUSTAL is in theory more accurate. However, if you are after accuracy, it is better to use MAFFT. Its algorithm is better, and it can also handle more complex alignments. MEGA's built-in tools are better suited to short sequences with small differences, and even then choosing deliberately beats using MUSCLE blindly.
Hahaha, I ended up having to switch research topics halfway. So far I have changed supervisors three times, which means three topic changes. I'm exhausted.