Running InterProScan 5 in CONVERT mode

InterProScan 5’s CONVERT mode reformats an existing InterProScan XML result file into any of the other supported output formats (TSV, GFF3, JSON). For backward compatibility, XML results can also be converted into the InterProScan 4.8 raw format (RAW), which gives users time to migrate their pipelines to InterProScan 5.

Please note that it is NOT possible to convert from any non-XML format. XML is the richest output format and is therefore the only one from which all other formats can be produced.
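
Because only XML input can be converted, it can be useful to reject obviously wrong input files before starting InterProScan. The following is a minimal sketch, not part of InterProScan: the function name `looks_like_xml` is hypothetical, and the check is only a cheap heuristic (first non-blank line starts with `<`). It does not validate the file against the InterProScan XML schema.

```shell
# Hypothetical pre-flight check (not an InterProScan feature).
# Succeeds if the first non-blank line of the file "$1" starts with '<'.
# This is a heuristic only; it does not validate the XML.
looks_like_xml() {
  # Print the first non-blank line with leading spaces/tabs removed.
  first=$(awk 'NF { sub(/^[ \t]+/, ""); print; exit }' "$1" 2>/dev/null)
  case "$first" in
    "<"*) return 0 ;;
    *)    return 1 ;;
  esac
}
```

It could be used as a guard in a wrapper script, for example `looks_like_xml results.xml || { echo "not an XML file" >&2; exit 1; }`, where `results.xml` is a placeholder file name.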

For more information on the available InterProScan formats, see `output formats <OutputFormats.html>`__.

To run InterProScan 5 in CONVERT mode, set the -mode option to ‘convert’.

Usage instructions

./interproscan.sh -mode convert

You will see the following usage instructions:

Welcome to InterProScan 5RC7
usage: java -XX:+UseParallelGC -XX:+AggressiveOpts
            -XX:+UseFastAccessorMethods -Xms512M -Xmx2048M -jar
            interproscan-5.jar

Please give us your feedback by sending an email to
interhelp@ebi.ac.uk
 -b,--output-file-base <OUTPUT-FILE-BASE>   Optional, base output filename
                                            (relative or absolute path).
                                            Note that this option and the
                                            --outfile (-o) option are
                                            mutually exclusive.  The
                                            appropriate file extension for
                                            the output format(s) will be
                                            appended automatically. By
                                            default the input file
                                            path/name will be used.
 -d,--output-dir <OUTPUT-DIR>               Optional, output directory.
                                            Note that this option and the
                                            --outfile (-o) option or the
                                            --output-file-base (-b) option
                                            are mutually exclusive. The
                                            appropriate file extension for
                                            the output format(s) will be
                                            appended automatically. By
                                            default the input file
                                            path/name will be used.
 -f,--formats <OUTPUT-FORMATS>              Optional, case-insensitive,
                                            comma separated list of output
                                            formats. Supported formats are
                                            TSV, XML, JSON, and GFF3.
                                            Default for protein sequences
                                            are TSV, XML and GFF3, or
                                            for nucleotide sequences
                                            GFF3 and XML.
 -i,--input <INPUT-FILE-PATH>               Optional, path to fasta file
                                            that should be loaded on
                                            Master startup. Alternatively,
                                            in CONVERT mode, the
                                            InterProScan 5 XML file to
                                            convert.
 -o,--outfile <EXPLICIT_OUTPUT_FILENAME>    Optional explicit output file
                                            name (relative or absolute
                                            path).  Note that this option
                                            and the --output-file-base
                                            (-b) option are mutually
                                            exclusive. If this option is
                                            given, you MUST specify a
                                            single output format using the
                                            -f option.  The output file
                                            name will not be modified.
                                            Note that specifying an output
                                            file name using this option
                                            OVERWRITES ANY EXISTING FILE.
 -T,--tempdir <TEMP-DIR>                    Optional, specify temporary
                                            file directory (relative or
                                            absolute path). The default
                                            location is temp/.
Copyright (c) EMBL European Bioinformatics Institute, Hinxton, Cambridge,
UK. (http://www.ebi.ac.uk) The InterProScan software itself is provided
under the Apache License, Version 2.0
(http://www.apache.org/licenses/LICENSE-2.0.html). Third party components
(e.g. member database binaries and models) are subject to separate
licensing - please see the individual member database websites for
details.

Example Usage

# Convert from XML format to TSV, GFF3 and RAW formats
./interproscan.sh -mode convert -f tsv,gff3,raw -i /path/to/existing_output_file.xml -b /path/to/output_file_basename

# Convert from XML format to TSV format (which automatically includes all available InterPro entry/GO term/pathways information)
./interproscan.sh -i /path/to/existing_output_file.xml -mode convert -f tsv -o /path/to/new_output_file.tsv
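
When several XML result files need converting, the first example above can be wrapped in a small helper. The sketch below is illustrative only: `print_convert_cmd` is a hypothetical name, it only prints the command rather than running it, and it assumes input paths end in `.xml` and contain no whitespace. It uses the `-b` option so that InterProScan appends the file extensions itself.

```shell
# Hypothetical helper (not part of InterProScan): print, without running it,
# the CONVERT-mode command for one XML result file.
#   $1 = path to an existing InterProScan 5 XML file
#   $2 = comma-separated output formats, e.g. tsv,gff3
print_convert_cmd() {
  xml=$1
  formats=$2
  base=${xml%.xml}   # strip a trailing .xml to form the output base name
  printf './interproscan.sh -mode convert -f %s -i %s -b %s\n' \
    "$formats" "$xml" "$base"
}
```

For example, `for f in results/*.xml; do print_convert_cmd "$f" tsv,gff3; done` (with `results/` as a placeholder directory) lists one command per file, which can be reviewed before being run.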