David Jones

Biomedical Data Science Laboratory

My lab is on secondment from the UCL Department of Computer Science and focuses mainly on applying machine learning, or "artificial intelligence", to biological problems, particularly in the areas of protein structure and gene function.

Over the past 5 years, we have collected a very large amount of both experimental and predicted functional data (calculated using the UCL Legion Supercomputer) for every human gene, such as sequence similarity, gene co-expression and predicted gene fusions. So far we have used these data to predict the biological functions of uncharacterised genes with considerable success, for example topping the rankings in the international Critical Assessment of protein Function Annotation (CAFA) algorithms challenge in 2011. This work has led to new developments, including a project funded by Elsevier via the UCL Big Data Institute. More recently we have also applied our techniques to predicting Drosophila gene function as part of a large BBSRC-funded project, exploiting proteomics data for the first time in this area.

Currently, most of the group are working on a new ERC Advanced Grant project that applies machine learning and evolutionary covariation to problems such as protein structure and function prediction, modelling and prediction of protein-protein interactions, RNA structure, intrinsically disordered proteins and transmembrane domains. We are also exploring novel aspects of synthetic biology and protein design.

A longer-term goal for us is to use our unique collection of bioinformatics data, along with new AI techniques, to discover novel gene-disease associations.

More generally, we are always keen to apply our expertise in state-of-the-art machine learning to other difficult and interesting biological problems posed by the experimentalists working at the Crick. If you would like to discuss a collaboration with us, or just ideas on how AI might be applied to your project, then please do get in touch.

Selected Publications

DWA Buchan, F Minneci, TCO Nugent, K Bryson, DT Jones. Scalable web services for the PSIPRED Protein Analysis Workbench. Nucleic Acids Research (2013) 41: W349-W357

DT Jones, D Cozzetto. DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics (2015) 31: 857-863

T Nugent, DT Jones. Accurate de novo structure prediction of large transmembrane protein domains using fragment-assembly and correlated mutation analysis. Proceedings of the National Academy of Sciences (2012) 109: E1540-E1547

DT Jones, DWA Buchan, D Cozzetto, M Pontil. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics (2012) 28: 184-190

YJ Edwards, AE Lobley, MM Pentony, DT Jones. Insights into the regulation of intrinsically disordered proteins in the human proteome by analyzing sequence and gene expression data. Genome Biology (2009) 10: R50

David Jones

+44 (0)20 379 63300