Precalculated match lookup
==========================

InterProScan uses a lookup service to check whether a protein submitted to it has been encountered before and, therefore, whether matches already exist (see "How to Run" in the User documentation).

This generic mechanism is based upon a REST web service that retrieves data from a BerkeleyDB database. The client to this service is built into the InterProScan software, to allow lookup from the web service. The web service can be installed and run "out of the box", using Jetty.

The service supports two simple queries:

-  "Do these sequences need to be analysed?" This query returns protein sequences that have **not** been analysed previously. Proteins are considered to have been analysed previously even if they have no matches.

   -  **Input**: Set of protein sequence MD5 checksums
   -  **Output**: MD5 checksums of proteins that have **not** been analysed previously

-  "What are the matches for these sequences?"

   -  **Input**: Set of protein sequence MD5 checksums
   -  **Output**: Simple "BerkeleyMatchXML" document containing all matches.

InterProScan uses both of these queries. The first ensures that protein sequences with no matches are not re-analysed needlessly.

Incorporation into InterProScan
-------------------------------

The hook into this service is the ``ProteinLoader`` class. A ``BerkeleyPrecalculatedProteinLookup``, which is an implementation of the ``PrecalculatedProteinLookup`` interface, is injected into it.

A ``MatchHttpClient`` instance is in turn injected into the ``BerkeleyPrecalculatedProteinLookup`` and is used to query the web service. The URL of the web service is set in the client from properties, so that users who wish to install the web service locally can point the client at their own instance.

The ``BerkeleyPrecalculatedProteinLookup`` then uses the client to query for pre-calculated matches and for proteins that have been previously analysed.
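
The two queries described above can be sketched in Python as follows. This is only an illustration of the contract, not InterProScan's Java implementation: ``InMemoryLookupService``, its method names, the sequence normalisation in ``sequence_md5`` and the upper-case hex format are all assumptions, and a plain dictionary stands in for the BerkeleyDB database behind the REST service.

```python
import hashlib


def sequence_md5(sequence: str) -> str:
    """MD5 checksum of a protein sequence.

    Assumption: whitespace is removed and the sequence is upper-cased
    before hashing, and the digest is reported as upper-case hex.
    """
    normalised = "".join(sequence.split()).upper()
    return hashlib.md5(normalised.encode("ascii")).hexdigest().upper()


class InMemoryLookupService:
    """Hypothetical stand-in for the web service, backed by a dict."""

    def __init__(self, matches_by_md5):
        # Every key counts as "previously analysed", even when its
        # match list is empty.
        self._matches_by_md5 = dict(matches_by_md5)

    def not_yet_analysed(self, md5s):
        """Query 1: checksums that have NOT been analysed previously."""
        return [m for m in md5s if m not in self._matches_by_md5]

    def matches_for(self, md5s):
        """Query 2: stored matches for the checksums that are known."""
        return {m: self._matches_by_md5[m]
                for m in md5s if m in self._matches_by_md5}


# Illustrative sequences and a placeholder match label (not real data).
seen_with_match = sequence_md5("MKTAYIAKQR")
seen_without_match = sequence_md5("MSDNE")
unseen = sequence_md5("MALWMRLLPL")

service = InMemoryLookupService({
    seen_with_match: ["example-signature"],
    seen_without_match: [],
})
query = [seen_with_match, seen_without_match, unseen]

print(service.not_yet_analysed(query))  # only the unseen checksum
print(service.matches_for(query))       # both known checksums
```

Note that the protein with an empty match list is still reported as analysed, which is what prevents it from being re-analysed needlessly.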
Complete InterProScan Protein objects, each with its set of Matches, are returned from the ``BerkeleyPrecalculatedProteinLookup`` to the ``ProteinLoader`` instance. The ``ProteinLoader`` then persists these matches and ensures that the returned Protein objects are **not** scheduled for reanalysis.
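
The loader's decision can be pictured with a short Python sketch. It is a simplified model under stated assumptions rather than the real ``ProteinLoader``: the ``Protein`` dataclass, the ``load_proteins`` function and the shape of the ``precalculated`` mapping are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    """Hypothetical, minimal stand-in for an InterProScan Protein."""
    md5: str
    matches: list = field(default_factory=list)


def load_proteins(proteins, precalculated):
    """Split proteins into (persist_only, needs_analysis).

    `precalculated` maps MD5 -> match list for every protein the lookup
    already knows. An empty list still means "already analysed".
    """
    persist_only, needs_analysis = [], []
    for protein in proteins:
        if protein.md5 in precalculated:
            # Known protein: attach the stored matches, skip analysis.
            protein.matches = list(precalculated[protein.md5])
            persist_only.append(protein)
        else:
            # Unknown protein: it must be scheduled for analysis.
            needs_analysis.append(protein)
    return persist_only, needs_analysis


# Placeholder checksums and match label, for illustration only.
proteins = [Protein("AAA"), Protein("BBB"), Protein("CCC")]
precalculated = {"AAA": ["example-signature"], "BBB": []}

persist_only, needs_analysis = load_proteins(proteins, precalculated)
print([p.md5 for p in persist_only])
print([p.md5 for p in needs_analysis])
```

Here the first list holds the proteins whose matches are persisted directly, and the second holds the only protein that would still be scheduled for analysis.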