Information Extraction Results

Large-scale extraction of regulatory gene/protein networks from Medline

Organism-specific partitions of Medline annotated with gene expression regulation
We provide annotation of sentences from Medline in the following bracketed structure:

Named entities are bracketed as [_nx... ... ], where nx abbreviates noun chunk. The letters that follow identify the semantic type of the noun chunk; for example, nxprot denotes a protein noun chunk.
Relations are bracketed as [_ev..., where ev refers to an event. The letters following ev indicate the template; for gene expression this is expr. The specific type of protein-gene interaction is indicated as well: act for activation, rep for repression, and reg for neutral regulation.
The last group of letters indicates whether it is a verbal relation (v) or a nominal relation, whether it is active or passive (a vs. p), and whether it is negated (n).

Organism-specific partitions of Medline annotated with protein phosphorylation and dephosphorylation
We provide annotation of sentences from Medline in the following bracketed structure:

Named entities are bracketed as [_nx... ... ], where nx abbreviates noun chunk. The letters that follow identify the semantic type of the noun chunk; for example, nxprot denotes a protein noun chunk.
Relations are bracketed as [_ev..., where ev refers to an event. The letters following ev indicate the template: phosphorylation (phos), dephosphorylation (dphos), or autophosphorylation (autophos).
The last group of letters indicates whether it is a verbal relation (v) or a nominal relation, whether it is active or passive (a vs. p), and whether it is negated (n).
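
The letter codes described above can be turned into readable descriptions with a small lookup. The following Python sketch is only an illustration and makes several assumptions that are not stated on this page: the event label is taken to be already split into its letter groups (how the groups are delimited in the files is not specified here), a relation without the letter v is treated as nominal, and the function names decode_flags and describe_event are hypothetical.

```python
# Letter codes named in the text above.
TEMPLATES = {
    "expr": "gene expression",
    "phos": "phosphorylation",
    "dphos": "dephosphorylation",
    "autophos": "autophosphorylation",
}
INTERACTION_TYPES = {
    "act": "activation",
    "rep": "repression",
    "reg": "neutral regulation",
}


def decode_flags(flags):
    """Decode the last letter group of an event label (hypothetical helper)."""
    # Assumption: a relation whose flags lack "v" is a nominal relation.
    if "p" in flags:
        voice = "passive"
    elif "a" in flags:
        voice = "active"
    else:
        voice = None
    return {
        "form": "verbal" if "v" in flags else "nominal",
        "voice": voice,
        "negated": "n" in flags,
    }


def describe_event(template, flags, interaction=None):
    """Combine the template, optional interaction type, and flags."""
    info = {"template": TEMPLATES[template]}
    if interaction is not None:
        # Only the gene expression template carries act/rep/reg.
        info["interaction"] = INTERACTION_TYPES[interaction]
    info.update(decode_flags(flags))
    return info


print(describe_event("expr", "vpn", interaction="rep"))
print(describe_event("phos", "va"))
```

Passing the groups separately keeps the sketch independent of the exact label syntax, which should be checked against the annotated files before reuse.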

Re-tokenising the corpus. Example: B/CD28-responsive was formally tokenised as 3 tokens, i.e. B, /, CD28, and -responsive.
Disambiguating the PoS-annotation. Example: the ambiguous tag IN|CC for of/or has been changed so that of, /, and or each occur as a separate token, each with its own PoS-tag.
Correcting the PoS-annotation. A number of incorrect PoS-annotations have been corrected. Example: the PoS-tags -, XT, CT, and N occur in the annotation but are not part of the UPenn tagset; we have replaced them with the correct PoS-tags.
Adapting the tagset. We have adapted the tags such that auxiliary verbs that derive from be are annotated with VB..., verbs that derive from have are annotated with VH..., and all other verbs are annotated with VV....
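
The tagset adaptation in the last item can be sketched as a simple remapping. This Python example is an illustration, not the procedure actually used: it assumes that each token comes with its lemma and a UPenn verb tag, and that the form suffix of the tag (for example Z in VBZ) is carried over unchanged. The function name adapt_verb_tag is hypothetical.

```python
# UPenn (Penn Treebank) verb tags: "VB" plus an optional form suffix.
PENN_VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def adapt_verb_tag(lemma, penn_tag):
    """Map a UPenn verb tag to the adapted VB.../VH.../VV... scheme."""
    if penn_tag not in PENN_VERB_TAGS:
        # Non-verb tags are left untouched.
        return penn_tag
    suffix = penn_tag[2:]  # "", "D", "G", "N", "P" or "Z"
    lemma = lemma.lower()
    if lemma == "be":
        return "VB" + suffix
    if lemma == "have":
        return "VH" + suffix
    return "VV" + suffix


# (token, lemma, UPenn tag) triples for a short example.
tagged = [
    ("is", "be", "VBZ"),
    ("had", "have", "VBD"),
    ("phosphorylated", "phosphorylate", "VBN"),
    ("protein", "protein", "NN"),
]
for token, lemma, tag in tagged:
    print(token, adapt_verb_tag(lemma, tag))
```

Keying the decision on the lemma rather than the surface form means that inflected forms such as is, was, and been are all handled by the single be case.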

Part-of-speech tagging was performed using TreeTagger with a custom parameter file:

Jasmin Saric, Lars J. Jensen, and Isabel Rojas
"Large-scale Extraction of Gene Regulation for Model Organisms in an ontological context"
In Silico Biology, 5, 0004, 2004
(Available online)
Jasmin Saric, Lars J. Jensen, Rossitza Ouzounova, Isabel Rojas, and Peer Bork
"Extraction of regulatory gene expression networks from PubMed"
Proceedings of the ACL 2004 Conference, Barcelona, Spain, 2004
(PDF).