[Frontier Information] The first AI gene editor has successfully edited the human genome
On April 23, 2024, a team led by Ali Madani, co-founder and CEO of the AI protein design startup Profluent Bio, posted a preprint on bioRxiv entitled "Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences". The study demonstrates that language models can generate diverse CRISPR-Cas proteins and shows that these proteins work as functional gene editors in human cells. This is the first time that the human genome has been successfully edited by proteins designed entirely by machine learning. The team also released OpenCRISPR-1, described as the world's first open-source AI-designed gene editor. It works similarly to CRISPR-Cas9 but carries more than 400 mutations relative to it and shows 95% less off-target editing, which increases its accuracy.
In the paper, the researchers used artificial intelligence (AI) techniques, specifically large language models (LLMs), to design a new gene editing tool, OpenCRISPR-1. Here are the main steps they took to design OpenCRISPR-1:
1. Dataset construction: The researchers built a dataset of more than one million CRISPR operons, called the CRISPR-Cas Atlas, by systematically mining 26 terabases of assembled genome and metagenome data.
2. AI model training: The researchers used this dataset to train AI models to generate new CRISPR-Cas protein sequences.
Figure 1. Overview of the language modeling approach to design CRISPR-Cas systems.
3. Sequence generation: Using the fine-tuned AI model, the researchers generated four million sequences. Half were generated directly from the model, and the other half were generated using the N-terminal or C-terminal sequences of natural proteins as prompts to steer generation toward specific families.
4. Sequence screening: All generated sequences were screened against a series of BLAST- and HMM-based alignment criteria to remove degenerate sequences.
Figure 2. Frequency of generated protein families across different types of CRISPR-Cas systems.
5. Structure prediction: The researchers used AlphaFold2 to predict the structures of the generated sequences and check that they would adopt folds similar to those of natural proteins.
Figure 3. Predicted structures of Cas9-like proteins.
6. Functional assessment: The researchers functionally assessed the generated Cas9-like proteins in human cells to determine if they could be used as gene editing tools.
Figure 4. Functional evaluation of the generated Cas9-like proteins: comparison of editing efficiency and off-target editing (specificity).
7. Focus on SpCas9-compatible sequences: The researchers paid particular attention to generated sequences compatible with SpCas9, a widely used CRISPR-Cas effector protein, so that protein activity could be compared directly on the same genomic targets.
8. Selection of OpenCRISPR-1: After a series of experiments and evaluations, the researchers identified a protein called PF-CAS-182 that is comparable to SpCas9 in activity and specificity but differs substantially from it in sequence. This protein was named OpenCRISPR-1 and publicly released to promote its widespread use in research and commercial applications.
9. Further experimental validation: The researchers also performed additional experimental validation of OpenCRISPR-1, including testing its activity on different PAM sequences and its ability to be converted into a base-editing system.
Figure 5. Comparison of editing efficiency, relative activity of OpenCRISPR-1 proteins, and editing efficiency at different target sites.
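To make steps 3, 4, and 8 more concrete, here is a minimal Python sketch of three operations they imply: cutting an N-terminal or C-terminal prompt from a natural protein, applying a basic sanity filter to generated sequences, and counting substitutions between two aligned sequences. This is only an illustration under stated assumptions. The function names, length bounds, and the 0.5 residue-fraction threshold are hypothetical, the sequences are toy strings, and the simple filter is a stand-in for, not a reproduction of, the BLAST and HMM criteria used in the paper.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def make_prompt(natural_seq: str, n_residues: int, from_c_terminus: bool = False) -> str:
    """Return the first (or last) n_residues of a natural protein as a prompt."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return natural_seq[-n_residues:] if from_c_terminus else natural_seq[:n_residues]


def max_residue_fraction(seq: str) -> float:
    """Fraction of the sequence made up of its single most common residue."""
    return Counter(seq).most_common(1)[0][1] / len(seq)


def passes_basic_filters(seq: str, min_len: int, max_len: int,
                         max_fraction: float = 0.5) -> bool:
    """Hypothetical sanity filter (NOT the paper's BLAST/HMM screen).

    Rejects empty sequences, non-standard residues, sequences outside the
    length window, and sequences dominated by one residue.
    """
    if not seq or not set(seq) <= AMINO_ACIDS:
        return False
    if not (min_len <= len(seq) <= max_len):
        return False
    return max_residue_fraction(seq) <= max_fraction


def count_substitutions(aligned_a: str, aligned_b: str) -> int:
    """Count mismatched positions between two equal-length, pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(aligned_a, aligned_b))


if __name__ == "__main__":
    natural = "MKTAYIAKQR"  # toy sequence, not a real Cas9
    generated = ["MKTGYIAKER", "AAAAAAAAAA", "MKTAYIAKQX"]

    print("N-terminal prompt:", make_prompt(natural, 4))
    print("C-terminal prompt:", make_prompt(natural, 3, from_c_terminus=True))

    kept = [s for s in generated if passes_basic_filters(s, min_len=5, max_len=20)]
    for s in kept:
        print(s, "substitutions vs. natural:", count_substitutions(natural, s))
```

In a real pipeline the prompt would be passed to the generative model, and the substitution count would be computed on a proper pairwise alignment rather than on equal-length strings; both are simplified away here.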
CRISPR gene editing already has the potential to revolutionize the medical field, and adding AI to the mix takes it to a whole new level. The research by Ali Madani et al. is a breakthrough in the application of AI to biotechnology and offers new possibilities for developing gene editing technology. OpenCRISPR-1 is just the tip of the iceberg, and AI and CRISPR are likely to become even more closely integrated in the future.
Reference link: https://www.profluent.bio
EDITGENE has specialized in CRISPR for years. Through thousands of CRO gene editing projects, EDITGENE continues to optimize and upgrade its Cas9 protein. EDITGENE provides high-quality gene editing CRO services, such as CRISPR library screening, gene point mutation, gene knockout, and gene knock-in, for universities and research institutes worldwide.