Input formats

Supported input file format

InterProScan 5 supports the FASTA file format.

An example of a simple FASTA format file containing unaligned sequences:

> seq1 Description of seq1.
AGTACGTAGTAGCTGCTGCTACGTGCGCTAGCTAGTACGTCA
TAGTA
> seq2 Description of seq2.
CGATCGATCGTACGTCGACTGATCGTAGCTACGTCGTACGTAG
CATCGTCAGTTACTGC
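
As an illustration of how such a file is structured, the following is a minimal Python sketch that reads FASTA text into (header, sequence) pairs, joining the wrapped sequence lines of each record. The function name `parse_fasta` is hypothetical and is not part of InterProScan; it only shows how header lines and sequence lines relate to each other.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text.

    Hypothetical helper for illustration only; not part of InterProScan.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; collect them for joining.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Applied to the example file above, this sketch would yield two records, with the sequence lines of each record concatenated into one string.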

Supported sequence format

InterProScan 5 supports unaligned sequences only. Sequences should contain only valid IUPAC amino acid or nucleic acid characters. In addition, gap (‘-’), period (‘.’), asterisk (‘*’) and underscore (‘_’) symbols are not allowed: a sequence containing any of them should produce a warning, and InterProScan will exit immediately.

Example of a supported protein sequence:

MPIGSKERPTFFEIFKTRCNKADLGPISLNWFEELSSEAPPYNSEPAEESEHKNNNYEPN

Example of a supported nucleic acid sequence:

atgaaatataaacgcattgtgtttaaagtgggcaccagcagcctgaccaacg

Unsupported sequences:

-RFLLLSLARFSNNRFGVQLLQIANVNLKVRRYG (illegal gap character at the start)

RFLLLSL--ARFSNNRFGVQLLQIANVNLKVRRYG (illegal gap character in the middle)

RFLLLSLARFSNNRFGVQLLQIANVNLKVRRYG* (illegal asterisk character at the end)

RFLLLSL_ARFSNNRFGVQLLQIANVNLKVRRYG (illegal underscore character)

RFLLLSL.ARFSNNRFGVQLLQIANVNLKVRRYG (illegal period character)
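
Because these symbols cause InterProScan to exit, it can be useful to screen sequences before submitting them. The following is a minimal Python sketch of such a pre-check, assuming only the four disallowed symbols described in this section (gap, period, asterisk, underscore). The names `ILLEGAL_CHARACTERS` and `find_illegal_characters` are hypothetical; this is not InterProScan's own validation code, and it does not check that the remaining characters are valid IUPAC codes.

```python
# Symbols this section lists as not allowed in input sequences.
# Hypothetical names, for illustration only.
ILLEGAL_CHARACTERS = {
    "-": "gap",
    ".": "period",
    "*": "asterisk",
    "_": "underscore",
}


def find_illegal_characters(sequence):
    """Return (position, character, name) for each disallowed symbol.

    Positions are 1-based. An empty list means none of the four
    disallowed symbols were found; it does not guarantee that the
    sequence is otherwise valid IUPAC.
    """
    return [
        (position, char, ILLEGAL_CHARACTERS[char])
        for position, char in enumerate(sequence, start=1)
        if char in ILLEGAL_CHARACTERS
    ]
```

For example, `find_illegal_characters("-RFLLLSLARFSNNRFGVQLLQIANVNLKVRRYG")` reports a gap at position 1, while the supported protein sequence shown earlier yields an empty list.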