- Prescient Design • Genentech
- Manhattan, NY
- https://ncfrey.github.io/
- @nc_frey
Starred repositories
A programming framework for agentic AI 🤖
High accuracy RAG for answering questions from scientific documents with citations
Chai-1, SOTA model for biomolecular structure prediction
Joint embedding of protein sequence and structure with discrete and continuous compressions of protein folding model latent spaces. https://www.biorxiv.org/content/10.1101/2024.08.06.606920v1
Benchmark for Biophysical Sequence Optimization Algorithms
Lbster: Language models for Biological Sequence Transformation and Evolutionary Representation
Transform datasets at scale. Optimize datasets for fast AI model training.
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
A Modular Architecture for Deep Learning Systems
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery (EMNLP'24)
"Probabilistic Machine Learning" - a book series by Kevin Murphy
Saprot: Protein Language Model with Structural Alphabet (AA+3Di)
Benchmarking framework for protein representation learning. Includes a large number of pre-training and downstream task datasets, models and training/task utilities. (ICLR 2024)
Official repository for discrete Walk-Jump Sampling (dWJS)
A concise but complete full-attention transformer with a set of promising experimental features from various papers
A collection of QM data for training potential functions
Official code repository for EquiFold: Protein Structure Prediction with a Novel Coarse-Grained Structure Representation
Code for the ProteinMPNN paper
Interaction Fingerprints for protein-ligand complexes and more
Working with molecular structures in pandas DataFrames
Explainer for black box models that predict molecule properties