CRISPR’s Hidden Universe: FLSHclust Algorithm Unlocks Secret Gene Modules


Using the novel algorithm FLSHclust, researchers have identified 188 rare and previously unknown CRISPR-linked gene modules, including a groundbreaking type VII CRISPR-Cas system. This discovery, made by analyzing an 8.8-terabase-pair database containing 8 billion proteins, highlights the untapped diversity of CRISPR systems.

Researchers using the new FLSHclust algorithm discovered 188 unique CRISPR-linked gene modules, including a novel type VII CRISPR-Cas system, in a massive protein database. This breakthrough enhances our understanding of CRISPR systems and their potential in biotechnological innovations.

Researchers have developed a new algorithm, FLSHclust ("flash clust"), and used it to discover 188 rare and previously unknown CRISPR-linked gene modules. These include a novel type VII CRISPR-Cas system found among billions of protein sequences. The findings open new possibilities for exploiting CRISPR systems and for exploring the vast diversity of microbial proteins.

CRISPR’s Growing Impact in Biotechnology

CRISPR systems are instrumental in developing a range of innovative biomolecular methods, particularly in CRISPR/Cas-mediated genome editing. The identification of new CRISPR systems can significantly advance these biotechnologies, potentially resulting in safer and more efficient genomic therapies. Traditionally, the CRISPR toolbox has been expanded through computational searches in protein sequence databases.

FLSHclust: A Solution to Protein Data Analysis

However, existing algorithms struggle to keep up with rapidly growing datasets that now contain billions of proteins. To overcome this challenge, Han Altae-Tran and his team created FLSHclust (fast locality-sensitive hashing-based clustering). The algorithm groups proteins by sequence similarity and can process protein sequence databases of this size quickly, something current methods cannot do efficiently.
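The article does not describe FLSHclust's internals, so the following is only a generic sketch of how locality-sensitive hashing can cluster proteins by sequence similarity without comparing every pair. It assumes MinHash signatures over amino-acid k-mer sets, LSH banding to find candidate pairs, and union-find to merge them into clusters; the function names and parameters (`k`, `bands`, `rows`, `threshold`) are hypothetical and are not taken from FLSHclust.

```python
"""Illustrative MinHash + LSH clustering of protein sequences.

A hypothetical sketch of the general technique, NOT the FLSHclust implementation.
"""
import hashlib
from collections import defaultdict
from itertools import combinations


def kmers(seq, k=3):
    """Set of overlapping k-mers; sequences shorter than k fall back to the whole string."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)} or {seq}


def _h(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}|{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(shingles, n_hashes):
    """MinHash signature: for each seed, the smallest hash value over the set."""
    return [min(_h(seed, s) for s in shingles) for seed in range(n_hashes)]


def lsh_cluster(seqs, k=3, bands=16, rows=4, threshold=0.5):
    """Group sequences whose estimated k-mer Jaccard similarity is >= threshold.

    Only pairs that land in the same bucket for at least one band are compared,
    which avoids an all-versus-all comparison of the input.
    """
    n_hashes = bands * rows
    sigs = [minhash(kmers(s, k), n_hashes) for s in seqs]
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for b in range(bands):
        buckets = defaultdict(list)
        for idx, sig in enumerate(sigs):
            # Sequences that agree on every row of this band share a bucket.
            buckets[tuple(sig[b * rows:(b + 1) * rows])].append(idx)
        for members in buckets.values():
            # Assumes buckets stay small, so checking all pairs inside one is cheap.
            for i, j in combinations(members, 2):
                # Fraction of matching MinHash values estimates Jaccard similarity.
                est = sum(x == y for x, y in zip(sigs[i], sigs[j])) / n_hashes
                if est >= threshold:
                    ri, rj = find(i), find(j)
                    if ri != rj:
                        parent[rj] = ri

    clusters = defaultdict(list)
    for i in range(len(seqs)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    proteins = [
        "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
        "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVA",  # one substitution at the end
        "GWDNPCHYTMWGRPNDCFHWYGTMPCWNDRGHY",  # unrelated toy sequence
    ]
    print(lsh_cluster(proteins))
```

With these toy sequences, the first two proteins share almost all of their 3-mers and should fall into one cluster, while the unrelated third sequence stays alone. The banding step is the key design choice: each extra band raises the chance that similar sequences collide, and each extra row per band makes collisions between dissimilar sequences rarer. Production tools add many refinements that this sketch leaves out.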

Innovative Research and Results

To test FLSHclust, Altae-Tran et al. applied it to search for rare CRISPR systems in an 8.8-terabase-pair metagenomic database that included 8 billion proteins and 10.2 million CRISPR arrays. Their analysis identified 188 previously unknown CRISPR-linked gene modules. Notably, they also discovered and characterized a new type of CRISPR system, type VII, which contains Cas14 and targets RNA.

Rare CRISPR Systems and Future Potential

According to the findings, the newly identified systems are rare: many are represented by only a single cluster among the nearly 130,000 CRISPR-linked clusters revealed by FLSHclust.

"The discovery of previously unknown cas genes and CRISPR systems substantially expands the known CRISPR diversity, emphasizing the functional versatility of CRISPR whereby previously undiscovered proteins and domains are often recruited, either replacing preexisting components or conferring newly identified functions to the preexisting scaffold of Cas proteins," write the authors.

“Taken together, the results of the work reveal unprecedented organizational and functional flexibility and modularity of CRISPR systems but also demonstrates that most variants are rare and only found in relatively unusual bacteria and archaea.”


Reference: “Uncovering the functional diversity of rare CRISPR-Cas systems with deep terascale clustering” by Han Altae-Tran, Soumya Kannan, Anthony J. Suberski, Kepler S. Mears, F. Esra Demircioglu, Lukas Moeller, Selin Kocalar, Rachel Oshiro, Kira S. Makarova, Rhiannon K. Macrae, Eugene V. Koonin and Feng Zhang, 23 November 2023, Science.
DOI: 10.1126/science.adi1910
