HAMAP-Scan User Manual
The HAMAP-Scan tool allows you to classify and annotate protein sequences using the collection of HAMAP family profiles and annotation rules. As a result, you will get a list of matching profiles and the match scores for your sequences. Sequences that are matched by HAMAP profiles can also be processed by our automatic annotation system to be partly or completely annotated in the UniProtKB format by HAMAP annotation rules.
How to use the tool - Submission
Step 1 - Enter your PROTEIN sequence(s)
You can either:
- Enter PROTEIN sequences: Enter or paste your protein sequence(s) in the text box. The maximum number of sequences that can be submitted this way is 1’000. For more sequences, use the 'Upload a file' option. The following input formats are supported:
  - UniProtKB accession(s) or identifier(s) (e.g. Q8ZHG0 or AGUA_YERPE), each on a separate line
  - Protein sequence(s) in FASTA format
- Upload a file: Upload a file containing your protein sequence(s) in FASTA format (up to 100’000 sequences).
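The input limits above can be checked locally before submitting. The following Python sketch is a hypothetical pre-check, not part of HAMAP-Scan: it counts FASTA records (or identifier lines) and compares them with the 1’000 (text box) and 100’000 (upload) limits stated in this manual. The identifier pattern is a rough assumption, not the official UniProtKB accession format, and expects upper-case input.

```python
import re

# Limits stated in this manual; the function below is a hypothetical pre-check.
TEXT_BOX_LIMIT = 1_000
UPLOAD_LIMIT = 100_000

# Assumed, simplified pattern: an accession (e.g. Q8ZHG0) or an entry name
# of the form X_Y (e.g. AGUA_YERPE). Not the official UniProt regex.
ID_PATTERN = re.compile(r"^([A-Z0-9]{6,10}|[A-Z0-9]{1,10}_[A-Z0-9]{1,5})$")


def choose_submission_route(text: str) -> str:
    """Return 'text box' or 'upload'; raise ValueError for unsupported input."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no input")
    if lines[0].startswith(">"):
        # FASTA: one record per header line
        count = sum(1 for line in lines if line.startswith(">"))
        if count <= TEXT_BOX_LIMIT:
            return "text box"
        if count <= UPLOAD_LIMIT:
            return "upload"
        raise ValueError(f"{count} sequences exceed the upload limit")
    # Otherwise expect one accession or identifier per line
    bad = [line for line in lines if not ID_PATTERN.match(line)]
    if bad:
        raise ValueError(f"unrecognized identifiers: {bad[:3]}")
    if len(lines) > TEXT_BOX_LIMIT:
        # The upload option accepts FASTA files only
        raise ValueError("too many identifiers for the text box; use a FASTA file")
    return "text box"
```

For example, `choose_submission_route("Q8ZHG0\nAGUA_YERPE")` returns `"text box"`, while a FASTA text with 1’001 records returns `"upload"`.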
Step 2 - Choose 'Scan' or 'Scan & Annotate'
In this step, you can choose whether to scan your sequences against the HAMAP family profile collection to find all matches, or to scan and annotate your sequences using the HAMAP annotation rules.
Select the first option ('Scan') to scan your sequence(s) for matches against the HAMAP family profile collection. You will get a list of all (trusted and weak) matches of your sequences along with their match scores.
Select the second option ('Scan & Annotate') to scan your sequence(s) against the HAMAP family profile collection and process trusted matches by our automatic annotation pipeline. Your sequence(s) will be partly or completely annotated in the UniProtKB format by HAMAP profile-associated annotation rules (for more information about our automatic annotation system, please consult “What is HAMAP?”).
If you select this option, all your sequences must originate from the same organism and you must enter the taxonomic identifier (TaxID) that represents this organism. You can obtain the TaxID from the UniProt or NCBI taxonomy databases. If an organism is not listed in the taxonomy database, please enter the TaxID of a very similar species, or the TaxID of a more general taxonomic node (e.g. for Actinobacteria). HAMAP family profiles and HAMAP annotation rules currently cover mainly bacteria and archaea, but you may try to scan sequences from any organism.
Step 3 - Submission
If you chose to paste sequences in the text box:
- After clicking “submit”, the results will be displayed directly in the browser window when they become available.
- You can optionally enter a valid email address to be notified by email once the job is finished. You can enter a
job title, which will be included in the subject of the results email. You will receive an email once your HAMAP-Scan results are available, even if no matches have been found. Your results will be available on our server for one
If you chose to upload a file:
- Enter a valid email address to be notified by email once the job is finished. You can enter a job
title, which will be included in the subject of the results email. You will receive an email once your HAMAP-Scan results are available, even if no matches have been found. Your results will be available on our server for one
For 'Scan & Annotate':
- Please provide a valid email address to be notified by email once the job is finished.
- Optionally, you can:
- Enter a job title. If you enter a job title, it will be included in the subject of the results email.
- Enter a password. If you enter a password, then the same password will be requested before you can download your results.
After clicking “submit”, you will immediately receive a submission confirmation email that summarizes your request (including job title and password, if available) and contains a three-letter request code that will be required to retrieve your results. You will receive an email once your HAMAP-Scan results are available, even if no matches have been found. Your results will be available on our server for at least one month.
How to use the tool - Retrieve results from 'Scan & Annotate'
Via this form you will be able to retrieve the results of your previous submissions to HAMAP 'Scan & Annotate'. Each submission has been
registered under a three-letter code that is required to retrieve the results.
Download results of a previous scan
Enter the three-letter request code that was displayed when you submitted your data to 'Scan & Annotate' and, if necessary, the password. If you don't remember the code or the password, please refer to your submission confirmation email.
Click on “download” and the HAMAP-Scan results will be displayed directly in your browser window.
Get a list of your latest jobs
Enter your email address and click on “download”. You will receive an email with a link to the list of the latest jobs you have submitted with this email address. For each job, you can click on the message in the "Status" column. If you did not submit a password, you will be directed to the results page, or to the submission summary if the scan has not completed yet; otherwise you will be directed to the HAMAP-Scan results form, where your password will be required to view the results of the job in question.
The results of your scan are presented on a webpage, which opens immediately when your scan has finished or, if you provided an email address during the submission process, can be accessed via the link in the email that was sent to you.
The result of a HAMAP-Scan (without the annotation option) is displayed as a list of your input sequence(s) and their matches to HAMAP family profiles. Each line corresponds to a match and contains the following data:
- Identifier of the sequence. Contains the UniProtKB AC/ID (if provided in the input) or the FASTA header, and the length of the sequence.
- Accession number of the profile that produces a match to the sequence. The accession number is clickable and opens the corresponding profile page.
- Name of the profile.
- Trusted cutoff for the match score of the profile. This value is a curated threshold score for each profile. Sequences with match scores above this cutoff are considered trusted members of the protein family.
- Calculated score of your sequence against the profile.
- Region (start and end positions) within your sequence that matched the profile and that was used to calculate the match score.
- Indicates whether your sequence is a "trusted" match (i.e. the match score of the sequence is above the trusted cutoff of the HAMAP profile) or a "weak" match (i.e. the sequence has a match score below the trusted cutoff). Also indicates if "no match" has been found for a sequence.
A sequence may produce matches to more than one profile, in which case every match is represented on a separate line; a sequence may therefore appear multiple times in the table. Sequences that have no match to any HAMAP profile are also listed in the results table, in order to provide a complete list of the sequences that were submitted to the scan.
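The trusted/weak/no-match logic and the one-row-per-match layout described above can be sketched as follows. This is a hypothetical illustration, not HAMAP-Scan code: `Match`, `match_status` and `build_rows` are invented names, the scores are made-up values, and counting a score exactly equal to the cutoff as trusted is an assumption (the manual only defines scores above and below the cutoff).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Match:
    sequence_id: str
    profile_ac: str
    score: float
    trusted_cutoff: float


# (sequence, profile accession, score, status)
Row = Tuple[str, Optional[str], Optional[float], str]


def match_status(score: float, cutoff: float) -> str:
    # Assumption: a score equal to the cutoff is counted as trusted.
    return "trusted" if score >= cutoff else "weak"


def build_rows(sequence_ids: List[str], matches: List[Match]) -> List[Row]:
    rows: List[Row] = []
    for seq_id in sequence_ids:  # keep the submission order
        hits = [m for m in matches if m.sequence_id == seq_id]
        if not hits:
            # Sequences without any match are still listed
            rows.append((seq_id, None, None, "no match"))
        for m in hits:  # one row per match, so a sequence may repeat
            status = match_status(m.score, m.trusted_cutoff)
            rows.append((seq_id, m.profile_ac, m.score, status))
    return rows


# Made-up example values, for illustration only
example = build_rows(
    ["seq1", "seq2"],
    [Match("seq1", "MF_03117", 65.2, 56.5), Match("seq1", "MF_01681", 28.7, 34.3)],
)
```

Here `seq1` appears twice (one trusted and one weak row) and `seq2` appears once with status `"no match"`.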
In some instances, PROSITE profiles are also used to classify protein sequences for annotation with HAMAP annotation rules.
If your sequence matches one of these PROSITE profiles, it will be listed in the results.
Filter: You can choose to show only a subset of the results table.
You can filter your results to show only trusted matches, weak matches, or sequences without a match, or any combination thereof.
Download: You can download the results table as a tab-delimited text file or as an Excel file.
The content of the downloaded results table corresponds to the filtered results as displayed on the webpage (see above).
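If you download the unfiltered table and prefer to filter it locally, a minimal sketch using Python's standard `csv` module could look like this. It assumes the tab-delimited file has a header row and that the match status is in a column called "Status"; both are assumptions, so check the header of your actual download and adjust `status_column` accordingly.

```python
import csv
import io
from typing import Dict, Iterable, List


def filter_results(
    tsv_text: str,
    keep: Iterable[str] = ("trusted",),
    status_column: str = "Status",  # hypothetical column name
) -> List[Dict[str, str]]:
    """Return the rows whose status is one of `keep` (case-insensitive)."""
    wanted = {k.lower() for k in keep}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row for row in reader
        if (row.get(status_column) or "").strip().lower() in wanted
    ]
```

To read a downloaded file instead of a string, pass the text of `open(path, newline="").read()`.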
Scan & Annotate results
The results of a job submitted with the 'Scan & Annotate' option are provided in the form of a number of downloadable text files.
The number and contents of these files are determined by the nature of the results obtained.
Each file is preceded by a short explanatory text and the number of sequences it contains, and is available for download via a link.
All original sequences submitted by the user are annotated with sufficient information to reconstitute a minimal entry in UniProtKB format.
This includes a (temporary) accession number, the date of annotation, a description of the protein, and taxonomic information,
which is derived by querying the UniProt taxonomy database with the taxonomic identifier provided by the user.
The FASTA header of each submitted sequence is stored in an additional section – the 'internal' section.
However, these minimal entries do not yet contain any annotations from matching HAMAP rules.
The possible files for download are as follows:
- (always present) This file is the original, untouched file submitted to HAMAP-Scan.
- This file provides a mapping of the original input sequences (via the FASTA header) to the accession numbers of the minimal entries in UniProtKB format.
- This file contains all protein sequences submitted to HAMAP-Scan in the original order. Sequences are annotated with sufficient information to reconstitute a minimal entry in UniProtKB format. Protein sequences that have a trusted match to a HAMAP family profile are annotated by the associated HAMAP rule. Sequences with more than one trusted match are currently not annotated by any of the associated rules. The results (annotated and not-annotated sequences) are also available as separate files (see below).
- This file contains all protein sequences that have a trusted match to a HAMAP family profile and that have been annotated by the HAMAP rule associated with it.
- This file contains all protein sequences that have a trusted match to more than one HAMAP profile, and whose match regions overlap. Currently these sequences are not annotated by any of the associated rules. Matches (trusted and weak) to HAMAP family profiles are listed in the internal section of each entry.
- This file contains all protein sequences with multiple, non-overlapping matches to HAMAP family profiles. Currently these sequences are not annotated by any of the associated rules. Matches (trusted and weak) to HAMAP family profiles are listed in the internal section of each entry.
- This file contains all sequences with no trusted match to any HAMAP family profile. Weak matches to HAMAP family profiles (if available) are listed in the internal section of each entry.
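The rules that decide which file a sequence ends up in can be summarized in a short sketch. It assumes that the file for multiple, non-overlapping matches, like the overlap file, concerns trusted matches, and that match regions are inclusive start/end positions. The category labels other than 'trusted' and 'no_match' are hypothetical names, not the actual file names.

```python
from itertools import combinations
from typing import List, Tuple

# (profile accession, start, end, status) for one sequence
Hit = Tuple[str, int, int, str]


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    # Inclusive regions overlap if each one starts before the other ends
    return a[0] <= b[1] and b[0] <= a[1]


def output_category(hits: List[Hit]) -> str:
    trusted = [h for h in hits if h[3] == "trusted"]
    if not trusted:
        return "no_match"  # weak matches only go to the internal section
    if len(trusted) == 1:
        return "trusted"  # annotated by the associated HAMAP rule
    regions = [(h[1], h[2]) for h in trusted]
    if any(overlaps(a, b) for a, b in combinations(regions, 2)):
        return "overlapping"  # hypothetical label; not annotated
    return "multiple_non_overlapping"  # hypothetical label; not annotated
```

A sequence with one trusted and one weak match is still classified as `"trusted"` here, because only trusted matches count towards the multiple-match cases.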
Note that the sequences annotated by HAMAP are returned in UniProtKB format.
In addition to the annotations, our system generates a series of comments and warnings, which are contained in an additional section (the 'internal' section) of each annotated entry.
These lines are listed below:
**HA Submitted Name: tr|G2WCJ5|G2WCJ5_YEASK Enolase-phosphatase E1 OS=Saccharomyce...
The FASTA header of the originally submitted sequence.
(present in all entries in all files)
**HA FAM; Method MF_03117; ENOPH; Trusted match; 65.167 (+8.7).
**HA FAM; Method MF_01681; MTNC; Weak match; 28.654 (-5.6).
All matches to HAMAP family profiles are stored, one line per match.
Each line specifies the accession number of the matching profile, the identifier of the profile, the match quality (trusted or weak), and the match score (with the score difference to the trusted cutoff score of the profile in parentheses).
(present in all entries matching a HAMAP family. May also be present in the '_no_match' file in entries having weak matches to (a) profile(s).)
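Because each **HA FAM line has a fixed layout, it can be parsed mechanically. The sketch below is based only on the two example lines above and is not an official parser; `FamMatch` and its field names are hypothetical. It also derives an approximate trusted cutoff as the score minus the difference in parentheses; the result is approximate because the difference is shown with only one decimal.

```python
import re
from typing import NamedTuple, Optional

FAM_LINE = re.compile(
    r"^\*\*HA FAM; Method (?P<ac>\S+); (?P<name>\S+); "
    r"(?P<quality>Trusted|Weak) match; (?P<score>-?\d+(?:\.\d+)?) "
    r"\((?P<diff>[+-]\d+(?:\.\d+)?)\)\.$"
)


class FamMatch(NamedTuple):
    accession: str
    name: str
    trusted: bool
    score: float
    approx_cutoff: float  # score minus the (rounded) difference


def parse_fam_line(line: str) -> Optional[FamMatch]:
    m = FAM_LINE.match(line.strip())
    if m is None:
        return None  # not a **HA FAM line
    score = float(m["score"])
    diff = float(m["diff"])  # e.g. "+8.7" -> 8.7, "-5.6" -> -5.6
    return FamMatch(m["ac"], m["name"], m["quality"] == "Trusted", score, score - diff)
```

Applied to the first example line, this yields accession `MF_03117`, name `ENOPH`, a trusted match, and an approximate cutoff of about 56.47.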
**HA SAM; Annotated by HAMAP 1.9.9; MF_03117.3; MF_03117; 19-MAR-2014 11:08:36.
This specifies which version of the HAMAP pipeline was used as well as the number and version of the rule,
the accession number of the matching profile, and the time and date of the annotation.
(This line and all the following lines are currently only found in the '_trusted' file with entries annotated by a HAMAP annotation rule.)
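Similarly, the **HA SAM line can be split into its parts. This sketch assumes the layout of the single example above (rule accession and version joined by a dot, timestamp written as DD-MON-YYYY HH:MM:SS); the dictionary keys are illustrative names, not part of the HAMAP output.

```python
import re
from datetime import datetime
from typing import Dict, Optional, Union

SAM_LINE = re.compile(
    r"^\*\*HA SAM; Annotated by HAMAP (?P<pipeline>\S+); "
    r"(?P<rule>[^.;\s]+)\.(?P<rule_version>\d+); (?P<profile>\S+); "
    r"(?P<stamp>\d{2}-[A-Z]{3}-\d{4} \d{2}:\d{2}:\d{2})\.$"
)


def parse_sam_line(line: str) -> Optional[Dict[str, Union[str, int, datetime]]]:
    m = SAM_LINE.match(line.strip())
    if m is None:
        return None  # not a **HA SAM line
    return {
        "pipeline_version": m["pipeline"],
        "rule": m["rule"],
        "rule_version": int(m["rule_version"]),
        "profile": m["profile"],
        # strptime matches %b case-insensitively, so "MAR" parses
        # (assumes an English/C locale for month abbreviations)
        "annotated_at": datetime.strptime(m["stamp"], "%d-%b-%Y %H:%M:%S"),
    }
```

For the example line, this returns pipeline version `1.9.9`, rule `MF_03117` in version 3, profile `MF_03117`, and the timestamp 2014-03-19 11:08:36.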