HAMAP-Scan User Manual

The HAMAP-Scan tool allows you to classify and annotate protein sequences by using the collection of HAMAP family profiles and annotation rules. As a result, you will get a list of matching profiles and the match scores for your sequences. Sequences that are matched by HAMAP profiles can also be processed by our automatic annotation system to be partly or completely annotated in the UniProtKB format by HAMAP annotation rules.

Submission
Retrieval
- Download results of a previous scan
- Get a list of your latest jobs
Results
- Scan results
- Scan & Annotate results

How to use the tool - Submission

Step 1 - Enter your PROTEIN sequence(s)

You can either:

Enter PROTEIN sequences: Enter or paste your protein sequence(s) in the text box. The maximum number of sequences that can be submitted this way is 1’000. For more sequences, use the 'Upload a file' option.
The following input formats are supported:

UniProtKB accession(s) or identifier(s) (e.g. Q8ZHG0 or AGUA_YERPE) – each on a separate line*
Protein sequence(s) in FASTA format

*All UniProtKB/Swiss-Prot accessions/identifiers and all UniProtKB/TrEMBL accessions/identifiers of entries belonging to reference proteomes are accepted.

Upload a file: Upload a file containing your protein sequence(s) in FASTA format (up to 100’000 sequences).
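
As an illustration of the limits above, the following minimal Python sketch counts the records in a FASTA text and reports which input option would fit. It is not part of HAMAP-Scan: the function names are hypothetical, and the sketch only handles FASTA input (not lists of UniProtKB accessions).

```python
# Limits quoted in this manual; all names here are illustrative only.
MAX_TEXT_BOX = 1_000
MAX_UPLOAD = 100_000


def count_fasta_records(text: str) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def choose_input_method(text: str) -> str:
    """Return 'text box' or 'upload', or raise if the input does not fit."""
    n = count_fasta_records(text)
    if n == 0:
        raise ValueError("no FASTA header lines found")
    if n <= MAX_TEXT_BOX:
        return "text box"
    if n <= MAX_UPLOAD:
        return "upload"
    raise ValueError(f"{n} sequences exceed the upload limit of {MAX_UPLOAD}")


example = ">seq1\nMKT\n>seq2\nMAL\n"
print(count_fasta_records(example), choose_input_method(example))
```

Inputs above the text box limit are routed to 'upload' here because the manual recommends the 'Upload a file' option for larger sets.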

Step 2 - Choose 'Scan' or 'Scan & Annotate'

In this step, you can choose whether you want to scan your sequences against the HAMAP family profile collection to find all matches, or if you want to scan and annotate your sequences by using the HAMAP annotation rules.

Select the first option ('Scan') to scan your sequence(s) for matches against the HAMAP family profile collection. You will get a list of all (trusted and weak) matches of your sequences along with their match score.

Select the second option ('Scan & Annotate') to scan your sequence(s) against the HAMAP family profile collection and process trusted matches with our automatic annotation pipeline. Your sequence(s) will be partly or completely annotated in the UniProtKB format by HAMAP profile-associated annotation rules (for more information about our automatic annotation system, please consult the “What is HAMAP?” document).
If you select this option, all your sequences must originate from the same organism and you must enter the taxonomic identifier (TaxID) that represents this organism. You can obtain the TaxID from the UniProt or NCBI taxonomy databases. If an organism is not listed in the taxonomy database, please enter the TaxID of a very similar species, or the TaxID of a more general taxonomic node (e.g. for Actinobacteria). HAMAP family profiles and HAMAP annotation rules currently cover mainly bacteria and archaea, but you may try to scan sequences from any species.

Step 3 - Submission

For 'Scan':

If you chose to paste sequences in the text box:

After clicking “submit”, the results will be displayed directly in the browser window when they become available.
You can optionally enter a valid email address to be notified by email once the job is finished. You can enter a job title, which will be included in the subject of the results email. You will receive an email once your HAMAP-Scan results are available, even if no matches have been found. Your results will be available on our server for one week.

If you chose to upload a file:

Enter a valid email address to be notified by email once the job is finished. You can enter a job title, which will be included in the subject of the results email. You will receive an email once your HAMAP-Scan results are available, even if no matches have been found. Your results will be available on our server for one week.

For 'Scan & Annotate':

Please provide a valid email address to be notified by email once the job is finished.
Optionally, you can:
Enter a job title. If you enter a job title, it will be included in the subject of the results email.
Enter a password. If you enter a password, then the same password will be requested before you can download your results.

After clicking “submit”, you will immediately receive a submission confirmation email that summarizes your request (including the job title and password, if provided) and contains a three-letter request code that will be required to retrieve your results.

You will receive an email once your HAMAP-Scan results are available, even if no matches have been found. Your results will be available on our server for at least one month.

How to use the tool - Retrieve results from 'Scan & Annotate'

Via this form you will be able to retrieve the results of your previous submissions to HAMAP 'Scan & Annotate'. Each submission has been registered under a three-letter code that is required to retrieve the results.

Download results of a previous scan

Enter the three-letter request code that was displayed when you submitted your data to 'Scan & Annotate' and, if necessary, the password. If you don't remember the code or the password, please refer to your submission confirmation email.
Click on “download” and the HAMAP-Scan results will be displayed directly in your browser window.

Get a list of your latest jobs

Enter your email address and click on “download”. You will receive an email with a link to the list of the latest jobs you have submitted with this email address. For each job, you can click on the message in the "Status" column. If you did not set a password, you will be directed to the results page, or to the submission summary if the scan has not yet completed. Otherwise, you will be directed to the HAMAP-Scan results form, where your password will be required to view the results of the job in question.

HAMAP-Scan results

The results of your scan are presented on a webpage, which opens immediately when your scan has finished, or - if you provided an email address during the submission process - which can be accessed via the link in the email that was sent to you.

Scan results

The result of a HAMAP-Scan (without the annotation option) is displayed as a list of your input sequence(s) and their matches to HAMAP family profiles. Each line corresponds to a match and contains the following data:

Your sequence(s)¹,²:	Identifier of the sequence. Contains the UniProtKB AC/ID (if provided in the input) or the FASTA header, and the length of the sequence.
Profile AC³:	Accession number of the profile that produces a match to the sequence. The accession number is clickable and opens the corresponding profile page.
Profile name:	Name of the profile.
Trusted cutoff:	Trusted cutoff for the match score of the profile. This value is a curated threshold score for each profile. Sequences with match scores above this cutoff are considered trusted members of the protein family.
Match score:	Calculated score of your sequence against the profile.
Match region:	Region (start and end positions) within your sequence that matched the profile and that was used to calculate the match score.
Match quality:	Indicates if your sequence is a "trusted" match (i.e. the match score of the sequence is above the trusted cutoff of the HAMAP profile) or a "weak" match (i.e. the sequence has a match score below the trusted cutoff). Indicates also if "no match" has been found for a sequence.

¹A sequence may produce matches on more than one profile, in which case every match will be represented on a separate line and therefore, a sequence may be present multiple times in the table.

²Sequences that have no match to any HAMAP profile are also listed in the results table, in order to provide a complete list of the sequences that were submitted to the scan.

³In some instances, PROSITE profiles are also used to classify protein sequences for annotation with HAMAP annotation rules. If your sequence matches one of these PROSITE profiles, it will be listed in the results.
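
The relationship between match score, trusted cutoff and match quality described above can be sketched in a few lines of Python. This is only an illustration, not the HAMAP implementation: the function name is hypothetical, and because the manual does not say how a score exactly equal to the cutoff is treated, the sketch arbitrarily counts it as weak. The cutoff values in the example are derived from the scores and score differences shown in the 'internal' section example of this manual.

```python
from typing import Optional


def match_quality(score: Optional[float], trusted_cutoff: float) -> str:
    """Classify a match as described in the 'Match quality' field.

    score is None when no match was reported for the sequence.
    Assumption: a score equal to the cutoff is treated as weak.
    """
    if score is None:
        return "no match"
    return "trusted" if score > trusted_cutoff else "weak"


# Score 47.061 is 12.1 above its cutoff, so the cutoff is about 34.961.
print(match_quality(47.061, 34.961))
# Score 9.086 is 32.1 below its cutoff, so the cutoff is about 41.186.
print(match_quality(9.086, 41.186))
print(match_quality(None, 41.186))
```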

Options:

Filter: You can choose to show only a subset of the results table. You can filter your results to show only trusted matches, weak matches, or sequences without a match, or any combination thereof.
Download: You can download the results table as a tab-delimited text file or as an Excel file. The content of the downloaded results table corresponds to the filtered results as displayed on the webpage (see above).
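
If you prefer to filter a downloaded tab-delimited file yourself, a minimal Python sketch could look like the one below. The column headers used here ("Sequence", "Profile AC", "Profile name", "Match score", "Match quality") are assumptions made for the example and may differ from the headers in the actual file; adjust them to match your download.

```python
import csv
import io


def filter_rows(tsv_text, qualities, quality_column="Match quality"):
    """Return the rows whose match quality is one of the requested values.

    The column name is an assumption; check it against your own file.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    wanted = {q.lower() for q in qualities}
    return [row for row in reader if row[quality_column].strip().lower() in wanted]


# Small in-memory stand-in for a downloaded file (hypothetical layout).
sample = (
    "Sequence\tProfile AC\tProfile name\tMatch score\tMatch quality\n"
    "sp|B1AIC1|ATPA_UREP2\tMF_01346\tATPA\t47.061\ttrusted\n"
    "sp|B1AIC1|ATPA_UREP2\tMF_00309\tVATA\t9.086\tweak\n"
)

for row in filter_rows(sample, ["trusted"]):
    print(row["Sequence"], row["Profile AC"], row["Match score"])
```

To process a real file, read its contents with open(path, newline="") and pass the text to filter_rows.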

Scan & Annotate results

The results of a job submitted with the 'Scan & Annotate' option are provided in the form of a number of downloadable text files. The number and contents of these files are determined by the nature of the results obtained. The files are preceded by a piece of explanatory text and the number of sequences they contain, and are available via a link for download.

All original sequences submitted by the user are annotated with sufficient information to reconstitute a minimal entry in UniProtKB format. This includes a (temporary) accession number, the date of annotation, a description of the protein, and taxonomic information, which is derived by querying the UniProt taxonomy database with the taxonomic identifier provided by the user. The FASTA header of each submitted sequence is stored in an additional section – the 'internal' section. However, these minimal entries do not yet contain any annotations from matching HAMAP rules.

The possible files for download are as follows:

Input files - (always present)

<request_code>_in: This file is the original, untouched file submitted to HAMAP-Scan.
<request_code>_map: This file provides a mapping of the original input sequences (via the FASTA header) to the accession numbers of the minimal entries in UniProtKB format.

Results files

<request_code>_all: This file contains all protein sequences submitted to HAMAP-Scan in the original order. Sequences are annotated with sufficient information to reconstitute a minimal entry in UniProtKB format. Protein sequences that have a trusted match to a HAMAP family profile are annotated by the associated HAMAP rule. Sequences with more than one trusted match are currently not annotated by any of the associated rules. The results (annotated and not-annotated sequences) are also available as separate files (see below).
<request_code>_trusted: This file contains all protein sequences that have a trusted match to a HAMAP family profile and that have been annotated by the HAMAP rule associated with it.
<request_code>_overlap: This file contains all protein sequences that have a trusted match to more than one HAMAP profile, and whose match regions overlap. Currently these sequences are not annotated by any of the associated rules. Matches (trusted and weak) to HAMAP family profiles are listed in the internal section of each entry.
<request_code>_fusion: This file contains all protein sequences with multiple, non-overlapping matches to HAMAP family profiles. Currently these sequences are not annotated by any of the associated rules. Matches (trusted and weak) to HAMAP family profiles are listed in the internal section of each entry.
<request_code>_no_match: This file contains all sequences with no trusted match to any HAMAP family profile. Weak matches to HAMAP family profiles (if available) are listed in the internal section of each entry.
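
The way sequences are distributed over the separate results files can be summarized with the following Python sketch. It is an interpretation of the descriptions above, not the actual pipeline code. Assumptions: the '_fusion' file is taken to mean several trusted matches whose regions do not overlap, a single overlapping pair is enough to place a sequence in '_overlap', match positions are 1-based and inclusive, and the request code "ABC" and the profile names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    profile_ac: str
    start: int      # first matched position (assumed 1-based, inclusive)
    end: int        # last matched position (inclusive)
    trusted: bool   # True for a trusted match, False for a weak match


def regions_overlap(a: Match, b: Match) -> bool:
    """Two inclusive regions overlap if each starts before the other ends."""
    return a.start <= b.end and b.start <= a.end


def result_file(request_code: str, matches: List[Match]) -> str:
    """Name the separate results file a sequence would be expected in."""
    trusted = [m for m in matches if m.trusted]
    if not trusted:
        suffix = "no_match"   # weak matches only, or no match at all
    elif len(trusted) == 1:
        suffix = "trusted"    # annotated by the associated HAMAP rule
    elif any(regions_overlap(a, b)
             for i, a in enumerate(trusted) for b in trusted[i + 1:]):
        suffix = "overlap"    # several trusted matches, overlapping regions
    else:
        suffix = "fusion"     # several trusted matches, no overlap
    return f"{request_code}_{suffix}"


print(result_file("ABC", [Match("PROFILE_A", 1, 100, True),
                          Match("PROFILE_B", 101, 200, True)]))
```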

Note that the sequences annotated by HAMAP are returned in UniProtKB format. In addition to the annotations, our system generates a series of comments and warnings which are contained in an additional section - the 'internal' section - of each annotated entry. These lines are listed below:


            **HA Submitted Name: sp|B1AIC1|ATPA_UREP2

Taken from the FASTA header of the originally submitted sequence.
(present in all entries in all files)


            **HA FAM; Method MF_01346; ATPA; Trusted match; 47.061 (+12.1).
            **HA FAM; Method MF_00309; VATA; Weak match; 9.086 (-32.1).
            **HA FAM; Method MF_00310; VATB; Weak match; 17.265 (-27.2).
            **HA FAM; Method MF_01347; ATPB; Weak match; 14.251 (-29.7).

All matches to HAMAP family profiles are stored with one separate line for each match. Each line specifies the accession number of the matching profile, the identifier of the profile, the match quality (trusted or weak), and the match score (with the score difference to the trusted cutoff score of the profile in parentheses).
(present in all entries matching a HAMAP family; may also be present in the '_no_match' file in entries that have weak matches to one or more profiles)
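
For users who want to extract these match lines programmatically, the following Python sketch parses a '**HA FAM' line into its fields. The regular expression is modelled only on the example lines shown in this manual, so it is an assumption about the general layout rather than a format specification; the function and field names are hypothetical.

```python
import re

# Pattern derived from the example lines in this manual (an assumption).
FAM_LINE = re.compile(
    r"^\*\*HA FAM; Method (?P<profile_ac>[^;]+); (?P<profile_id>[^;]+); "
    r"(?P<quality>Trusted|Weak) match; "
    r"(?P<score>-?\d+(?:\.\d+)?) \((?P<delta>[+-]\d+(?:\.\d+)?)\)\.$"
)


def parse_fam_line(line: str):
    """Return a dict of fields for a '**HA FAM' line, or None if it differs."""
    m = FAM_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "profile_ac": m.group("profile_ac"),
        "profile_id": m.group("profile_id"),
        "quality": m.group("quality").lower(),
        "score": float(m.group("score")),
        # Difference between the match score and the trusted cutoff.
        "delta": float(m.group("delta")),
    }


line = "            **HA FAM; Method MF_01346; ATPA; Trusted match; 47.061 (+12.1)."
print(parse_fam_line(line))
```

Subtracting the score difference from the score gives the trusted cutoff of the profile, up to the rounding used in the line.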


            **HA SAM; Annotated by HAMAP 3.71.2; MF_01346.48; MF_01346; 08-OCT-2021 11:40:58.

This specifies which version of the HAMAP pipeline was used as well as the number and version of the rule, the accession number of the matching profile, and the time and date of the annotation.
(This line and all the following lines are currently only found in the '_trusted' file with entries annotated by a HAMAP annotation rule.)
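
As a further illustration, a '**HA SAM' line can be split into its components with a short Python sketch. As with the other examples, the pattern is inferred from the single example line above and is an assumption, not a specification; the function and field names are hypothetical. Month abbreviations are mapped by hand so that parsing does not depend on the locale.

```python
import re
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

# Pattern derived from the example line in this manual (an assumption).
SAM_LINE = re.compile(
    r"^\*\*HA SAM; Annotated by HAMAP (?P<pipeline>[^;]+); "
    r"(?P<rule>[^.;]+)\.(?P<rule_version>\d+); (?P<profile_ac>[^;]+); "
    r"(?P<day>\d{2})-(?P<month>[A-Z]{3})-(?P<year>\d{4}) "
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})\.$"
)


def parse_sam_line(line: str):
    """Return a dict of fields for a '**HA SAM' line, or None if it differs."""
    m = SAM_LINE.match(line.strip())
    if m is None or m.group("month") not in MONTHS:
        return None
    return {
        "pipeline_version": m.group("pipeline"),
        "rule": m.group("rule"),
        "rule_version": int(m.group("rule_version")),
        "profile_ac": m.group("profile_ac"),
        "annotated_at": datetime(
            int(m.group("year")), MONTHS[m.group("month")], int(m.group("day")),
            int(m.group("h")), int(m.group("m")), int(m.group("s"))),
    }


line = "**HA SAM; Annotated by HAMAP 3.71.2; MF_01346.48; MF_01346; 08-OCT-2021 11:40:58."
print(parse_sam_line(line))
```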