I am currently a research scientist at Google DeepMind. I work on deciphering the human genome with advanced machine learning techniques. I led the AlphaMissense project at Google DeepMind, an effort dedicated to the functional interpretation of disease-causing genetic variants. My broader research interests lie in training genomics foundation models capable of addressing diverse and critical challenges in the field. My prior work has included building sequence-based models to elucidate core biological mechanisms, and predicting variant effects for coding and non-coding variants. I received my PhD in computational biology from the Technical University of Munich (TUM), in Julien Gagneur’s lab. Please refer to my Google Scholar for a complete list of my publications.

Email: s6juncheng [at] gmail [dot] com


Publications

Co-first and co-corresponding authors are indicated with + and * respectively.

Genetic variant interpretation

Biological Discoveries

Computational Immunology

Bioinformatics & Machine learning


Software

Here is a list of open source software that I developed or made major contributions to. These tools are typically implementations of machine learning models that originated from research projects.


AlphaMissense

Implementation of the AlphaMissense model.

MMSplice & MTSplice

Predicts variant effects on splicing. MMSplice is the winning model of the CAGI5 splicing challenge and is also integrated into the popular general-purpose variant effect predictor CADD. MTSplice enhances MMSplice by predicting tissue-specific variant effects. Muhammed Hasan Çelik and I currently maintain the tool.

ggpval

An R package to add statistical test and P-value annotations to ggplot2 plots. The user community and I currently maintain the tool.

BERTMHC

A Python package to retrain and predict with the BERTMHC model, a transformer model that predicts binding and presentation of peptides by MHC class II.

DCC

A Python package to detect circRNAs from next-generation sequencing data. The Dieterich lab currently maintains the tool.


I contributed to the following projects:

kipoi

Kipoi (pronounce: kípi; from the Greek κήποι: gardens) is an API and a repository of ready-to-use trained models for genomics. It currently contains 2133 different models, covering canonical predictive tasks in transcriptional and post-transcriptional gene regulation.

Invited talks

  • Keynote speaker at CHIL 2023: Biological Sequence Modeling in Research and Applications
  • Guest Lecture at Imperial College, December 2023
  • Guest Lecture at Cambridge University Genomic Medicine Society, March 2023
  • Guest Lecture at Vrije Universiteit Amsterdam, Nov 2023
  • Models, Inference & Algorithms (MIA) seminar at Broad Institute, March 2024
  • Oxford ML School, July 2024
  • Keynote at ISMB VarICOSI 2024
  • CZI workshop: Applications of AI to Rare Disease Diagnosis, October 2024