The aim of my research is to elucidate the relationship between phenotype and genome. In particular, I will develop methods to predict genes related to phenotypic changes from genome-wide data such as genome sequences, gene expression data, metabolic profiles and epigenetic profiles. To examine whether the inferred genes are in fact associated with the expected functions, I will validate the function of each gene experimentally. My research focuses on methylation, substitutions (point mutations) and insertions (duplications) in genes. However, these analyses assume that all genes in the available genomes are annotated, which is not the case. Therefore, I will also identify novel genes in intergenic regions using a molecular evolutionary algorithm. Focusing on land plants, I propose two projects. First, using wild strains of Arabidopsis thaliana, I will identify key genes for plant adaptation to natural environments by focusing on secondary metabolites (Project 1). Second, I will identify genes related to agriculturally important traits by experimentally analyzing novel coding genes found through evolutionary analyses (Project 2).

1. Variation of phenotypic effects in Arabidopsis accessions

Being sessile organisms, plants produce a huge diversity of defense chemicals, known as secondary metabolites, which have been selected over the course of evolution in different plant lineages as adaptations to a variety of biotic and abiotic environments. These metabolites are useful sources of pharmaceuticals such as anticancer and antimalarial drugs. The genomes of many wild isolates of Arabidopsis thaliana have been sequenced. I have grown 76 strains from seed available from the Arabidopsis Biological Resource Center (ABRC) and examined the variation in expression of all genes, as well as the concentrations of approximately 500 secondary metabolites. Conventional association studies identify loci associated with a phenotypic trait using only SNP data. I am currently developing a new method that infers genes associated with a phenotypic trait from polymorphism in both SNPs and gene expression. Once the genes related to each metabolite are identified, I will experimentally test whether knock-out mutants of the targeted genes show altered profiles of the corresponding metabolites. So far, many of the candidate genes associated with metabolite production do not encode enzymes. I will further examine the mechanism by which these non-enzyme-encoding genes affect metabolite production.
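
The idea of using both SNP and expression polymorphism can be illustrated with a minimal sketch. It is not the method under development; it only assumes, for illustration, that each gene has one biallelic SNP coded 0/1 per strain and one expression value per strain, and that a gene is interesting when both correlate with the metabolite concentration. The names `pearson`, `combined_score` and `rank_genes`, and the toy data, are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def combined_score(genotype, expression, metabolite):
    """Assumed scoring rule: a gene scores high only if BOTH its SNP genotype
    and its expression level correlate with the metabolite across strains."""
    r_snp = pearson(genotype, metabolite)
    r_expr = pearson(expression, metabolite)
    return min(abs(r_snp), abs(r_expr))

def rank_genes(genes, metabolite):
    """genes maps a gene name to (genotype list, expression list); best score first."""
    scores = {name: combined_score(g, e, metabolite) for name, (g, e) in genes.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data for six strains (invented for illustration only).
metabolite = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
genes = {
    "geneA": ([0, 0, 0, 1, 1, 1], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]),
    "geneB": ([0, 1, 0, 1, 0, 1], [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]),
}
ranking = rank_genes(genes, metabolite)
```

Taking the minimum of the two absolute correlations is only one possible way to require agreement between the two data types; a real analysis across 76 strains and about 500 metabolites would also need population-structure correction and multiple-testing control.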

2. Identifying novel small coding genes across the transcriptome, proteome and phenome

To study non-annotated genes, I am focusing on small coding genes in intergenic regions because the products of many small genes, such as small peptides, have hormone-like functions and can play an important role in cell-cell interactions. However, small coding genes tend not to be annotated as genes. I have identified approximately 7000 novel small coding genes in the A. thaliana genome. I obtained expression profiles of these newly identified genes under about 50 conditions using a custom microarray that I designed. Currently, I am focusing on small coding genes that are highly expressed and have close homologs in other plant genomes, and on the amino acid composition of hormone-like peptides in A. thaliana. By analyzing over-expression of several targeted genes, I have identified many hormone-like peptide candidates. Identifying hormone-like peptides is particularly important for crop plants because treatment with synthesized peptides alone can induce morphological effects or tolerance to abiotic or biotic stress, without the need to generate transgenic plants to confer such traits. A large amount of genomic data is also available for other plant species. Using as much biological information as possible, I would like to infer the key genes for phenotype evolution. Subsequently, I will validate whether the predicted genes have the predicted functions by either over-expression or knock-out analysis. I therefore intend to pursue comparative genomics both between species and within species, combining computational and experimental analyses.
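
As a rough illustration of how candidate small coding genes might first be enumerated in an intergenic sequence, the following sketch scans all six reading frames for short open reading frames. It is not the evolutionary analysis described above, which would be applied afterwards to judge whether a candidate is likely to be coding. The function names and the length limits of 10 to 100 codons are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def small_orfs(seq, min_codons=10, max_codons=100):
    """Return (strand, start, end, orf_sequence) for each ATG-to-stop ORF whose
    length in codons (stop codon excluded) lies within the assumed limits.
    Coordinates are 0-based, end-exclusive, and for the '-' strand they refer
    to the reverse-complemented sequence."""
    seq = seq.upper()
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if min_codons <= n_codons <= max_codons:
                        found.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # close the ORF and look for the next start
    return found

# Toy intergenic sequence (invented): one ORF of 11 codons plus a stop codon.
toy_sequence = "CC" + "ATG" + "GCT" * 10 + "TAA" + "GG"
candidates = small_orfs(toy_sequence)
```

Candidates produced in this way would still have to be filtered, for example by conservation in other plant genomes and by expression evidence, before being treated as genes.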