Using HMMER to analyze gene sequences

HMMER is a bioinformatics package for sequence analysis: it uses profile hidden Markov models to search sequence databases for homologous sequences quickly. The latest version is 3.1. However, the official HMMER website currently provides only Linux builds, and Windows builds have been discontinued. Perhaps the software simply runs more smoothly on Linux? After a glance at the official documentation, though, I found this: "We have been developing HMMER4 since 2011, but it has been in a slow state of development", which may be the real reason Windows support was dropped.

Install HMMER

1. Download

Setting up a virtual machine just to use the Linux version of HMMER seems excessive (though not impossible), so I looked for an older release instead. Working from the known download link for the Linux version, I found that HMMER keeps all of its historical releases in one archive directory. The last version of HMMER released for Windows is 3.0. Links: official archive / backup

After downloading and extracting the archive, you will see many unfamiliar files, and there is no program you can simply double-click to open: the .exe files are command-line programs that must be started by typing a command. I looked up how to install them[1]; the steps are demonstrated in detail below.

2. Installation

Installing HMMER on Windows is relatively simple: you only need to add the HMMER folder to the Path environment variable.

Windows 10 users: open Control Panel, search for "environment variable", click "Edit the system environment variables", then click "Environment Variables".

Windows 11 users can open the Start menu and type "environment variables" directly into the search bar at the top.

Click the first match.

Under "System variables", find Path and click "Edit".

Click the "New" button on the right and enter the HMMER path.

If you don't know the path, open the folder where you put HMMER, copy the path from the File Explorer address bar, paste it into the new entry, then click OK in each dialog to confirm.

3. Test

Next, test whether it works. You need a command-line shell; the common choices are CMD and Windows PowerShell. Either one will do.

To open CMD: press Windows+R, type cmd, and click OK to open the Command Prompt.

To open Windows PowerShell: right-click the Start menu button and select Windows PowerShell.

Then enter

 hmmscan -h

Copying and pasting the command avoids typing errors. If the command prints HMMER's usage information and a list of options, the installation worked.
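To check the Path setting from a script, here is a minimal Python sketch (it assumes Python 3 is installed, which HMMER itself does not need; the function name is my own). It asks shutil.which to look up each HMMER program the same way the shell does, so a None result corresponds to the "is not recognized as an internal or external command" error.

```python
import shutil

# The three HMMER programs used in this tutorial.
HMMER_PROGRAMS = ("hmmscan", "hmmbuild", "hmmsearch")


def find_hmmer_programs(programs=HMMER_PROGRAMS):
    """Map each program name to its full path, or None if it is not on PATH."""
    # shutil.which searches the directories listed in PATH; on Windows it also
    # tries the extensions in PATHEXT, so "hmmscan" can match "hmmscan.exe".
    return {name: shutil.which(name) for name in programs}


if __name__ == "__main__":
    for name, path in find_hmmer_programs().items():
        print(f"{name}: {path or 'NOT found on PATH'}")
```

Environment variables are read when a program starts, so a Command Prompt or PowerShell window that was already open before you edited Path will still not find HMMER; open a new window and try again.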

Use HMMER

Following an online tutorial[2], I demonstrate the two programs I currently use, hmmbuild and hmmsearch:
hmmbuild: builds a profile HMM from a multiple sequence alignment
hmmsearch: searches a sequence database with a profile HMM

1. Get the pfam ID

Searching for the gene requires a hidden Markov model of its family. To get that model, we first need the Pfam ID of the gene's conserved protein domain. You can take the Pfam ID from published work or find it yourself by searching NCBI; the latter is described below.

Go to the NCBI Protein database and enter keywords. The species must be given by its Latin (scientific) name or its official English name. Taking the MADS-box gene of Jatropha curcas as an example, search for "MADS box Jatropha curcas".

Open any entry in the search results, then click "Identify Conserved Domains" on the right to run a conserved-domain search on that protein.

The results show that the target protein hits the K-box family, and the Pfam ID is given in the results list.

2. Download the conserved-domain alignment

Click the Pfam ID in the results to open the detail page for that conserved protein domain family, then click the "Source: pfam" link.

On the Pfam details page, click Alignments, select Stockholm format, and click Generate to download the multiple sequence alignment. The downloaded file has a .txt extension.
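Before feeding the download to hmmbuild, it can help to confirm that it really is a Stockholm alignment. The sketch below is a rough check, not a full parser: it only assumes the basic Stockholm layout (a "# STOCKHOLM 1.0" header, "#=" annotation lines, "name sequence" lines, and a closing "//"), and the function names are my own.

```python
def summarize_stockholm(lines):
    """Return (number_of_sequences, has_terminator) for Stockholm-format lines.

    Raises ValueError if the first line is not a Stockholm header.
    """
    lines = [line.rstrip("\r\n") for line in lines]
    if not lines or not lines[0].startswith("# STOCKHOLM"):
        raise ValueError("first line is not '# STOCKHOLM 1.0'; not a Stockholm file")
    names = set()
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "//":
            return len(names), True  # end-of-alignment marker reached
        if not stripped or stripped.startswith("#"):
            continue  # blank line or #=GF/#=GS/#=GR/#=GC annotation
        # Sequence lines are "name sequence"; interleaved blocks repeat names.
        names.add(stripped.split()[0])
    return len(names), False


def check_stockholm_file(path):
    """Open a downloaded alignment (e.g. PF01486_seed.txt) and summarize it."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return summarize_stockholm(handle)
```

For example, check_stockholm_file("PF01486_seed.txt") should return the number of sequences and True when the file ends with "//"; a ValueError suggests the saved file is not a Stockholm alignment (for instance, an HTML page saved by mistake).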

3. Download the species' protein sequences

Download the genome protein data of Jatropha curcas. Go to the NCBI FTP site, open the genomes directory, and press Ctrl+F to search the page for the species' Latin name. The scientific name is Jatropha curcas, so searching for "Jatropha" should find it.

In the next directory level, select protein, download protein.fa.gz, and decompress it to get protein.fa. I have also made a backup of this file.
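If you have no archive tool that handles .gz files, Python's standard gzip module can do the decompression. This is a minimal sketch with hypothetical function names; counting lines that start with ">" is only a quick sanity check that the result looks like a FASTA file.

```python
import gzip
import os
import shutil


def gunzip_file(src, dst):
    """Decompress src (e.g. protein.fa.gz) into dst, streaming in chunks."""
    if not os.path.isfile(src):
        raise FileNotFoundError(f"{src} not found in {os.getcwd()}")
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

Run gunzip_file("protein.fa.gz", "protein.fa") from the download folder, then count_fasta_records("protein.fa") to see how many protein sequences the file contains.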

4. Run the search

Copy the two downloaded files into the HMMER folder (another folder also works), then open the Command Prompt.

By default, the Command Prompt starts in C:\Users\<your user name>, but we need to switch to the directory where HMMER is located.

If HMMER is not on drive C, switch drives first. For drive D, enter

 D:

and press Enter; other drives work the same way. Then change to the HMMER directory:

 cd <HMMER installation path>

For example:

 cd D:\hmmer

If an error is reported, the path may contain spaces, Chinese, or other non-English characters. Enclose the path in straight (half-width) double quotes, which work in both CMD and PowerShell, for example:

 cd "D:\folder with a Chinese name\hmmer"

Next, use hmmbuild to build an HMM from the downloaded conserved-domain alignment. The file I downloaded is PF01486_seed.txt. The syntax is

 hmmbuild <output .hmm file> <alignment file>

for example:

 hmmbuild PF01486.hmm PF01486_seed.txt
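Errors such as "doesn't exist or is not readable" often come down to the working directory or the exact file name. The sketch below (hypothetical helper names, Python 3 assumed) checks both before calling hmmbuild through subprocess. It also lists near-matching names, because Windows Explorer hides known file extensions by default, so a file shown as PF01486_seed.txt may actually be named PF01486_seed.txt.txt.

```python
import os
import shutil
import subprocess


def similar_names(name, directory="."):
    """List other files whose names start with `name`, e.g. 'PF01486_seed.txt.txt'."""
    if not os.path.isdir(directory):
        return []
    return sorted(f for f in os.listdir(directory) if f.startswith(name) and f != name)


def missing_inputs(paths):
    """Return the input paths that are not existing regular files."""
    return [p for p in paths if not os.path.isfile(p)]


def run_hmmbuild(hmm_out, alignment):
    """Run 'hmmbuild <hmm_out> <alignment>' after checking common causes of failure."""
    if missing_inputs([alignment]):
        # Usually a typo in the name or the wrong working directory (see the cd step).
        hints = similar_names(os.path.basename(alignment), os.path.dirname(alignment) or ".")
        raise FileNotFoundError(
            f"{alignment!r} not found (current directory: {os.getcwd()}); "
            f"similar names: {hints}"
        )
    if shutil.which("hmmbuild") is None:
        raise RuntimeError("hmmbuild is not on PATH; check the Path environment variable")
    # A list argument bypasses the shell, so spaces in paths need no quoting.
    subprocess.run(["hmmbuild", hmm_out, alignment], check=True)
```

For example, run_hmmbuild("PF01486.hmm", "PF01486_seed.txt") run from the HMMER folder. With check=True, a non-zero exit status from hmmbuild raises subprocess.CalledProcessError instead of failing silently.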

Then search the Jatropha curcas protein sequences with the new HMM using hmmsearch, redirecting the report to a file with >:

 hmmsearch PF01486.hmm protein.fa > PF01486.out

When the command finishes, it creates PF01486.out. Right-click the file and open it with Notepad to view the search results.
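hmmsearch can also save a per-sequence hit table with the --tblout option and limit what it reports with -E, for example: hmmsearch -E 1e-5 --tblout PF01486.tbl PF01486.hmm protein.fa > PF01486.out. The sketch below filters such a table by full-sequence E-value. It assumes the HMMER 3 --tblout column order (target name first, full-sequence E-value fifth), so check the header lines of your own file; the function names are my own.

```python
def parse_tblout(lines):
    """Yield (target_name, full_sequence_evalue) from hmmsearch --tblout lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header/comment lines start with '#'
        fields = line.split()
        # Column 1 = target name, column 5 = E-value of the full sequence
        # (columns are whitespace-separated; the free-text description comes last).
        yield fields[0], float(fields[4])


def filter_hits(lines, max_evalue=1e-5):
    """Keep targets whose full-sequence E-value is <= max_evalue."""
    return [(name, evalue) for name, evalue in parse_tblout(lines) if evalue <= max_evalue]
```

For example, with open("PF01486.tbl") as handle: hits = filter_hits(handle, 1e-5) gives the candidate family members at E-value 1e-5 or lower. The table also reports a best-domain E-value in column 8, which you could filter on instead.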

[1] Jianshu: Install hmmer software under Windows to scan domains
[2] OmicsClass: Using NCBI and Pfam databases to find information about conserved domains of gene families

Author: mikusa
Link to this article: https://www.himiku.com/archives/hmmer.html
Copyright notice: Unless otherwise stated, all articles are my own creation. Please contact the author for reprinting and quotation, and indicate the source (author, original link, etc.).

32 Comments

LYT

2022-04-06 17:53

When I enter the HMMER location, it says it "is not recognized as an internal or external command, operable program or batch file". What does that mean? All the previous steps went fine. QAQ

mikusa reply @LYT

2022-04-06 17:57

The environment variable has not been added to the system

LYT reply @mikusa

2022-04-11 15:19

Add HMMER to Path? I followed the steps and created a new entry under System variables > Path (ノ°ο°)ノ

2022-04-11 16:56

That shouldn't happen. Are you sure you entered the HMMER path exactly?

lyt reply @mikusa

2022-04-16 19:12

Solved, thank you! How do I set a threshold for the output? For example, how do I filter out family members whose E-value is higher than 1e-05? (sob) Thanks for replying!

ganyi

2022-01-29 17:06

Boss, there's no tutorial for hmmsearch and I can't figure it out on my own...

ganyi reply @ganyi

2022-01-29 17:09

I get "Failed to open sequence file protein.fa for reading".

2022-01-29 17:54

It's solved now. Thanks, boss.

mikusa reply @ganyi

2022-01-29 17:57

It means protein.fa could not be opened. Could you check whether the download went wrong?

2022-01-29 17:58

This tutorial was only written for short-term use, so it isn't very professional. Sorry.

ganyi reply @mikusa

2022-01-29 18:00

My mistake. The problem is solved now; thank you for the tutorial (๑•̀⌄•́๑)૭

Linlu reply @ganyi

2022-09-23 21:45

How did you solve it?

2021-09-27 16:14

Hello, I'd like to ask something. When I use hmmbuild to convert the conserved-domain alignment into an HMM, I get the error "Error: Alignment file PF03106_seed.txt doesn't exist or is not readable", but PF03106_seed.txt is in the HMMER folder. Why is this?

mikusa reply @Lu

2021-09-27 18:31

If it says the file doesn't exist, could you have mistyped the file name in the command?
If it says the file can't be opened, close any program that has the file open and try again.

Lu reply @mikusa

2021-09-28 19:19

Thank you, brother. It has been solved

zzt reply @mikusa

2023-07-06 22:38

Hello, I'm also getting "Error: Alignment file PF03106_seed.txt doesn't exist or is not readable", but the file isn't open in any other program and the file name is typed correctly. What could be the reason? Thanks!

mikusa reply @zzt

2023-07-06 22:58

Could you have switched to the wrong directory before running the command? Then the file would appear not to exist.

Wang Ping

2020-06-13 15:51

E:\hmmer>hmmsearch.exe PF01535.hmm potato_pep.fasta > PF01535.out

Error: Failed to open hmm file PF01535.hmm for reading.

Oh my goodness, please save me!!! (I get reply notifications at my QQ mailbox)

mikusa reply @Wang Ping

2020-06-13 16:26

It says the file couldn't be opened. Close any program that has the file open and try again.

Wang Ping reply @mikusa

2020-06-13 20:51

It really worked, thank you!

February bud

2020-05-21 09:05

Thank you so much~~~~

2020-05-20 14:34

Maybe I'm just being dumb, but I couldn't find the Windows version on the official website, and building it from source following other people's methods didn't work either.

mikusa reply @February bud

2020-05-20 15:45

If you can't find it, use my backup

Six dollars five

2020-05-10 23:09

Boss, does your HMMER still work normally? Recently I keep getting [Alignment file PFxxxxx_seed.txt doesn't exist or is not readable] and suddenly can't use it. Sigh.