I am currently a senior research scientist at DeepMind, where I work on deciphering the human genome with machine learning and lead the AlphaMissense project. My previous work involved modeling RNA splicing and degradation, as well as predicting the effects of coding and non-coding variants. I received my PhD in computational biology from the Technical University of Munich (TUM), in Julien Gagneur’s lab. Please refer to my Google Scholar for a complete list of my publications.
Email: s6juncheng [at] gmail [dot] com
Publications
Co-first and co-corresponding authors are indicated with + and * respectively.
Genetic variant interpretation
Cheng, J.*, Novati, G., Pan, J., Bycroft, C., Žemgulytė, A., Applebaum, T., Pritzel, A., Wong, L.H., Zielinski, M., Sargeant, T., Schneider, R.G., Andrew, S., Jumper, J., Hassabis, D., Kohli, P., Avsec, Ž.*, 2023. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science, 381(6664).
Cheng, J.*, Celik, M.H., Kundaje, A. and Gagneur, J., MTSplice predicts effects of genetic variants on tissue-specific splicing. Genome Biology, 2021.
Cheng J, Nguyen TY, Cygan KJ, Çelik MH, Fairbrother WG, Gagneur J. MMSplice: modular modeling improves the predictions of genetic variant effects on splicing. Genome Biology. 2019 Dec;20(1):1-5.
Mount SM, Avsec Ž, Carmel L, Casadio R, Çelik MH, Chen K, Cheng J, Cohen NE, Fairbrother WG, Fenesh T, Gagneur J. Assessing predictions of the impact of variants on splicing in CAGI5. Human Mutation. 2019 Sep;40(9):1215-24.
Cheng J, Çelik MH, Nguyen TY, Avsec Ž, Gagneur J. CAGI 5 splicing challenge: Improved exon skipping and intron retention predictions with MMSplice. Human Mutation. 2019 Sep;40(9):1243-51.
Biological Discoveries
Cheng J, Maier KC, Avsec Ž, Rus P, Gagneur J. Cis-regulatory elements explain most of the mRNA stability variation across genes in yeast. RNA. 2017 Nov 1;23(11):1648-59.
Cheng J, Metge F, Dieterich C. Specific identification and quantification of circular RNAs from sequencing data. Bioinformatics. 2016 Apr 1;32(7):1094-6.
Weigelt CM, Sehgal R, Tain LS, Cheng J, Eßer J, Pahl A, Dieterich C, Grönke S, Partridge L. An Insulin-Sensitive Circular RNA that Regulates Lifespan in Drosophila. Molecular Cell, 2020 Jul 16;79(2):268-79.
Computational Immunology
Jun Cheng*, Kaïdre Bendjama, Karola Rittner, Brandon Malone*. BERTMHC: Improves MHC-peptide class II interaction prediction with transformer and multiple instance learning. Bioinformatics, 2021.
Malone B, Simovski B, Moline C, Cheng J, Gheorghe M, Fontenelle H, Vardaxis I, Tennoe S, Malmberg JA, Stratford R, Clancy T.* Artificial intelligence predicts the immunogenic landscape of SARS-CoV-2: toward universal blueprints for vaccine designs. Scientific Reports. 2020
Bioinformatics & Machine learning
Avsec Ž, Kreuzhuber R, Israeli J, Xu N, Cheng J, Shrikumar A, Banerjee A, Kim DS, Beier T, Urban L, Kundaje A.* The Kipoi repository accelerates community exchange and reuse of predictive models for genomics. Nature Biotechnology. 2019 Jun;37(6):592-600.
Avsec Ž, Barekatain M, Cheng J, Gagneur J. Modeling positional effects of regulatory sequences with spline transformations increases prediction accuracy of deep neural networks. Bioinformatics. 2018 Apr 15;34(8):1261-9.
Software
Here is a list of open-source software that I developed or contributed to substantially. These tools are typically implementations of machine learning models that originated in research projects.
AlphaMissense
Implementation of the AlphaMissense model.
MMSplice & MTSplice
Predict variant effects on splicing. MMSplice is the winning model of the CAGI5 splicing challenge and is also integrated into the popular general-purpose variant effect predictor CADD. MTSplice enhances MMSplice by predicting tissue-specific variant effects. Currently, Muhammed Hasan Çelik and I are maintaining the tool.
ggpval
An R package to add statistical test and P-value annotations to ggplot2 plots. Currently, the user community and I are maintaining the tool.
BERTMHC
A Python package to retrain and predict with the BERTMHC model, a transformer model that predicts binding and presentation of peptides by MHC class II.
DCC
A Python package to detect circRNAs from next-generation sequencing data. Currently, the Dieterich lab is maintaining the tool.
I contributed to the following projects:
kipoi
Kipoi (pronounced: kípi; from the Greek κήποι: gardens) is an API and a repository of ready-to-use trained models for genomics. It currently contains 2133 different models, covering canonical predictive tasks in transcriptional and post-transcriptional gene regulation.
Invited talks
- Keynote speaker at CHIL 2023: Biological Sequence Modeling in Research and Applications
- Guest Lecture at Imperial College, December 2023
- Guest Lecture at Cambridge University Genomic Medicine Society, March 2023
- Guest Lecture at Vrije Universiteit Amsterdam, November 2023
- Models, Inference & Algorithms (MIA) seminar at Broad Institute, March 2024
- Oxford ML School, July 2024
- Keynote at ISMB VarICOSI 2024
- CZI workshop: Applications of AI to Rare Disease Diagnosis, October 2024