The InterPro database collects domain annotations from diverse sources and supports functional analysis of proteins by classifying them into families. Because protein domains are regarded as the basic building blocks of proteins, functional similarity between proteins can be inferred from the domains they share. How can functional similarity between genes and domains be compared automatically and quantitatively? The Ara2Rice tool applies term frequency-inverse document frequency (TF-IDF) weighting and latent semantic analysis (LSA) to the gene-domain count matrix. Genes are then represented by 100-dimensional vectors, which can be used for gene/domain similarity calculation and functional network construction.
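The idea can be illustrated with a minimal Python sketch. It is not the tool's own code: the toy count matrix, the function names, the particular TF-IDF variant (smoothed IDF), and the use of cosine similarity are all assumptions made for illustration, and only 2 dimensions are kept here instead of the 100 used by the tool.

```python
import numpy as np

def tfidf(counts):
    """Weight a gene-by-domain count matrix with TF-IDF (one common variant)."""
    counts = np.asarray(counts, dtype=float)
    n_genes = counts.shape[0]
    # Term frequency: domain counts normalised within each gene (row).
    row_sums = counts.sum(axis=1, keepdims=True)
    tf = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # Inverse document frequency: domains found in few genes get more weight.
    df = (counts > 0).sum(axis=0)
    idf = np.log((1 + n_genes) / (1 + df)) + 1.0
    return tf * idf

def lsa_embed(weighted, n_components):
    """LSA step: project rows onto the top singular directions (truncated SVD)."""
    u, s, _ = np.linalg.svd(weighted, full_matrices=False)
    k = min(n_components, len(s))
    return u[:, :k] * s[:k]

def cosine_similarity(vectors):
    """Pairwise cosine similarity between row vectors."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = np.divide(vectors, norms, out=np.zeros_like(vectors), where=norms > 0)
    return unit @ unit.T

# Toy data (hypothetical): 4 genes (rows) x 5 domains (columns).
counts = np.array([
    [2, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 2, 0],
    [0, 0, 0, 1, 1],
])
gene_vectors = lsa_embed(tfidf(counts), n_components=2)
similarity = cosine_similarity(gene_vectors)
```

In this toy example, genes 0 and 1 share domains and end up with a higher similarity to each other than to genes 2 and 3. Domain vectors could be obtained in the same way from the transposed matrix.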
How to use the tool
The tool consists of four main tabs: the first for gene similarity, the second for domain similarity, and the third and fourth for the gene and domain functional networks. In the left input panel, users can enter a single gene/domain or a list of genes/domains by selecting or typing. The similarity data can be sorted in ascending or descending order by clicking the small triangle in the column header (this also applies to Tables 2, 3 and 4). Alternatively, users can select a module number to browse the gene/domain functional network and its annotation. A download button is provided for downloading the gene/domain vector data.
In the gene/domain network tab, when a module is entered, four subtabs are returned in the right main panel.
Tab 1 is the gene/domain list of the module. It provides the following information about the query module: Affymetrix ID, Entrez gene ID, Gene Symbol, kTotal (total connectivity), kWithin (within-module connectivity), kOut (out-of-module connectivity), kDiff (the difference between within-module and out-of-module connectivity), and Module (the module number).
Tab 2 is the functional annotation for the module genes/domains. It provides information about the selected module, including GO and pathway enrichment.
Tab 3 is the visualization of the module nodes. To visualize a network module, users select a module in the left panel. To control the network size, users can slide the percentage bar to show only the top connections. Elements of the network can be dragged to rearrange them.
Tab 4 shows module gene expression in Arabidopsis and rice, with high expression status. Module-level expression is summarized by the first principal component from PCA, which represents the expression of the module genes.
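The connectivity columns in Tab 1 and the PCA summary in Tab 4 can be sketched as follows. This is an illustrative sketch, not the tool's implementation: it assumes connectivity is the sum of edge weights in a symmetric adjacency matrix, and the function names, toy adjacency matrix, and toy expression values are hypothetical.

```python
import numpy as np

def module_connectivity(adjacency, modules):
    """Compute kTotal, kWithin, kOut and kDiff for every node."""
    adjacency = np.asarray(adjacency, dtype=float).copy()
    modules = np.asarray(modules)
    np.fill_diagonal(adjacency, 0.0)  # ignore self-connections
    # True where the two nodes belong to the same module.
    same_module = modules[:, None] == modules[None, :]
    k_total = adjacency.sum(axis=1)
    k_within = (adjacency * same_module).sum(axis=1)
    k_out = k_total - k_within
    k_diff = k_within - k_out
    return {"kTotal": k_total, "kWithin": k_within, "kOut": k_out, "kDiff": k_diff}

def module_summary(expression):
    """First principal component scores of a samples-by-genes matrix."""
    x = np.asarray(expression, dtype=float)
    x = x - x.mean(axis=0)  # centre each gene before PCA
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, 0] * s[0]  # sign of a principal component is arbitrary

# Toy network (hypothetical): nodes 0-1 in module 1, nodes 2-3 in module 2.
adjacency = np.array([
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
])
connectivity = module_connectivity(adjacency, modules=[1, 1, 2, 2])

# Toy expression (hypothetical): 3 samples x 2 module genes.
expression = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9]])
summary = module_summary(expression)
```

A node with a large positive kDiff is connected mainly to members of its own module, which is why this column is useful for spotting the core genes/domains of a module.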
Please note that the similarity calculation takes time.
Contact information
If you have any questions or ideas about the tool, please email weilau@fafu.edu.cn