Curation Protocol

As an application scenario, ComPath was used to generate the pathway mappings between three of the major public pathway databases (KEGG, Reactome, and WikiPathways). The protocol established for the mapping is described below.

Mapping Files

A total of three mapping files were curated, one for each pairwise comparison:

  • KEGG vs Reactome
  • KEGG vs WikiPathways
  • Reactome vs WikiPathways

Description

For each pairwise pathway database comparison (resource 1 vs resource 2), we generated all possible mappings between the pathways of the two databases (KEGG-Reactome, KEGG-WikiPathways, and Reactome-WikiPathways) and prioritized them based on the following two independent metrics, which have been proposed for calculating pathway similarity (Belinky et al., 2015):

  1. Lexical similarity between each pair of pathways' names was calculated using the Levenshtein distance (Levenshtein, 1966).
  2. Content similarity between each pair of pathways' genes was calculated using the previously described Szymkiewicz-Simpson coefficient.
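The two metrics can be illustrated with a minimal Python sketch. It is not the ComPath implementation: the function names, the normalization of the Levenshtein distance to a 0-1 score, the lowercasing of names, and the sort order used for prioritization are all assumptions made for illustration, and the gene sets are toy data.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def name_similarity(name_1: str, name_2: str) -> float:
    """Assumed normalization: 1.0 for identical names (ignoring case), 0.0 for no overlap."""
    a, b = name_1.lower(), name_2.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def overlap_coefficient(genes_1: set, genes_2: set) -> float:
    """Szymkiewicz-Simpson coefficient: |A & B| / min(|A|, |B|)."""
    smaller = min(len(genes_1), len(genes_2))
    return 0.0 if smaller == 0 else len(genes_1 & genes_2) / smaller


def prioritize(resource_1: dict, resource_2: dict) -> list:
    """Score every cross-resource pathway pair and list the best candidates first."""
    scored = [
        (name_1, name_2, name_similarity(name_1, name_2), overlap_coefficient(genes_1, genes_2))
        for (name_1, genes_1), (name_2, genes_2) in product(resource_1.items(), resource_2.items())
    ]
    # Assumed ordering: content similarity first, name similarity as tie-breaker.
    return sorted(scored, key=lambda row: (row[3], row[2]), reverse=True)


# Toy data, not taken from KEGG or Reactome.
resource_a = {"Apoptosis": {"CASP3", "CASP8", "BAX"}, "Cell cycle": {"CDK1", "CCNB1"}}
resource_b = {"Apoptosis": {"CASP3", "BAX", "BCL2", "CASP9"}, "Cell Cycle": {"CDK1", "CCNB1", "CDK2"}}
ranked = prioritize(resource_a, resource_b)
```

A ranked list of this kind only orders the candidates for review; deciding whether a candidate is a true mapping remains a manual curation step.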

Guided by this prioritization, three curators with different areas of expertise (neuroscience, medicine, and biology) independently assigned mapping types and marked false positives after evaluating whether the two pathway descriptions had an equivalent scope. Links to the pathway descriptions were attached so that each curator could confirm that the focus and context of the pathways were the same, even when the two pathways had highly similar names or gene content. Furthermore, we investigated possible intra-database mappings within KEGG and WikiPathways since these resources do not yet contain hierarchical relationships.

Inter-curator agreement

The final step of the curation exercise was to agree on the proposed mappings. All three curators jointly examined, one by one, the combined set of independently generated mappings to reach a consensus on a final set of mappings, which is available at ComPath Resources under the MIT License.

Types of Mappings

The ComPath curation interface allows for two types of mappings: equivalentTo and isPartOf.

  • equivalentTo. An undirected relationship denoting that both pathways refer to the same biological process. The requirements for this relationship are:
    • Scope: both pathways represent the same biological pathway information.
    • Similarity: both pathways must share at least one gene.
    • Context: both pathways should take place in the same context (e.g., cell line, physiology).
  • isPartOf. A directed relationship denoting the hierarchical relationship between pathway 1 (child) and pathway 2 (parent). The requirements are:
    • Subset: The subject (pathway 1) is a subset of pathway 2 (e.g., Reactome pathway hierarchy).
    • Similarity: same as above.
    • Context: same as above.
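As a rough sketch of how the two mapping types differ in direction, the Python below models a mapping as a small record. The class and function names and the pathway identifiers are hypothetical and are not part of ComPath. Only the similarity requirement is checked in code; scope, context, and the subset relation are treated here as curator judgments.

```python
from dataclasses import dataclass
from enum import Enum


class MappingType(Enum):
    EQUIVALENT_TO = "equivalentTo"
    IS_PART_OF = "isPartOf"


@dataclass(frozen=True)
class Mapping:
    subject: str              # pathway 1 (the child for isPartOf)
    mapping_type: MappingType
    target: str               # pathway 2 (the parent for isPartOf)

    def normalized(self) -> "Mapping":
        """Order the endpoints of an undirected equivalentTo; keep isPartOf as given."""
        if self.mapping_type is MappingType.EQUIVALENT_TO and self.target < self.subject:
            return Mapping(self.target, self.mapping_type, self.subject)
        return self


def meets_similarity_requirement(genes_1: set, genes_2: set) -> bool:
    """Both mapping types require the pathways to share at least one gene."""
    return bool(genes_1 & genes_2)


# Hypothetical identifiers, used only to show the symmetry of equivalentTo.
forward = Mapping("resource1:pathway_a", MappingType.EQUIVALENT_TO, "resource2:pathway_b")
backward = Mapping("resource2:pathway_b", MappingType.EQUIVALENT_TO, "resource1:pathway_a")
same_mapping = forward.normalized() == backward.normalized()

child_of = Mapping("resource2:pathway_z", MappingType.IS_PART_OF, "resource1:pathway_a")
```

Normalizing before comparison lets equivalentTo mappings entered in either order by different curators be recognized as the same mapping, while an isPartOf mapping keeps its child-to-parent direction.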

References

  • Belinky, F., et al. PathCards: multi-source consolidation of human biological pathways. Database, 2015 (2015).
  • Levenshtein, V. I. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady. 10 (8): 707–710 (1966).

About

ComPath is developed and maintained in an academic capacity by Daniel Domingo-Fernández and Charles Tapley Hoyt at the Fraunhofer SCAI Department of Bioinformatics. This web application relies on data loaded from the KEGG, Reactome, and WikiPathways RESTful APIs, as well as from MSigDB.