Objective
A key requirement in proteomic research is the staff's ability to re-orient quickly towards the scientific background of the ongoing project. To follow and mine the overwhelming amount of published data intelligently, we are developing SOTA.

Knowledge Accumulation
In the course of a proteomic project, papers relevant to the topic are extracted from PubMed / PubMed Central. The retrieved texts are represented as numerical vectors of word occurrences. The texts are then organized using a SOM (self-organizing map) and split into clusters by the k-means algorithm. In parallel, the words themselves are arranged on a second SOM and clustered; the resulting word clusters are used to annotate the clusters of texts. The user interacts with the SOM by eliminating irrelevant texts (or even whole clusters) and by appending new articles.

Generating the Hypotheses
Once a list of identified proteins has been obtained, the proteins are converted to descriptions. A description can be given (1) as a set of articles citing the protein/gene name and (2) as gene ontologies. The descriptions are fitted to the existing self-organizing map. Navigating between the descriptions shown on the map of texts yields the corresponding route on the SOM of terms (words), which provides an experiment-dependent list of terms.

Unraveling the Protein Networks
Cluster analysis of the hyperplane produced by COVAG yields lists of proteins (spots) that might be involved in the same metabolic processes within the cell.

Web
The SOTA: CANCER MARKERS application provides the output of the SOM applied to 475 PubMed Central abstracts related to cancer markers.
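The Knowledge Accumulation steps (word-occurrence vectors, SOM, k-means) can be illustrated with a minimal sketch in plain Python. It is not the SOTA implementation: the function names, the 3x3 grid, the learning-rate and radius schedules, and the four toy texts are all assumptions made for illustration. It also assumes that k-means is run on the weight vectors of the SOM nodes, so that each text inherits the cluster of its best-matching node; the description above does not specify this detail.

```python
import math
import random
from collections import Counter

def vectorize(texts):
    """Represent each text as a vector of word-occurrence counts."""
    counts = [Counter(t.lower().split()) for t in texts]
    vocab = sorted(set().union(*counts))
    return vocab, [[float(c[w]) for w in vocab] for c in counts]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_unit(som, v):
    """Grid position of the SOM node whose weights are closest to v."""
    return min(som, key=lambda n: sq_dist(som[n], v))

def train_som(vectors, rows=3, cols=3, epochs=50, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weights}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    som = {(r, c): [rng.random() for _ in range(dim)]
           for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = 0.5 * frac                            # assumed linear decay
        radius = 0.5 + max(rows, cols) / 2 * frac  # shrinking neighborhood
        for v in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(som, v)
            for node, w in som.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return som

def kmeans(points, k=2, iters=20, seed=0):
    """Plain k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy stand-ins for retrieved abstracts (made up, not real PubMed data).
texts = [
    "protein marker cancer serum protein",
    "cancer marker tumor serum",
    "membrane enzyme kinetics substrate",
    "enzyme substrate binding kinetics enzyme",
]
vocab, vectors = vectorize(texts)
som = train_som(vectors)
nodes = sorted(som)
node_labels = dict(zip(nodes, kmeans([som[n] for n in nodes], k=2)))
text_clusters = [node_labels[best_matching_unit(som, v)] for v in vectors]
print(text_clusters)
```

The SOM of words mentioned above could be sketched the same way by transposing the matrix, so that each word is described by its counts across the texts. Removing a text or appending a new article then amounts to editing `texts` and retraining.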
Contact: andrey.lisitsa@ibmc.msk.ru