A new open-source software package called uQlust, which supports protein and RNA structure prediction, molecular simulations, and the retrieval and analysis of structural data, is now available to investigators. It offers a versatile, efficient, and easy-to-use toolkit for macromolecular structure exploration and analysis, supporting ultrafast clustering and model quality assessment.

Details on the toolkit were recently published in a paper by Jarek Meller, PhD, a professor of environmental health and biomedical informatics at the University of Cincinnati and Cincinnati Children’s, and his colleague Rafal Adamczak of Nicolaus Copernicus University in Poland.

“By combining several advanced algorithms and data science methods, uQlust enables ultrafast and low memory footprint clustering and ranking of very large sets of atomistic or coarse-grained macromolecular structures,” says Meller. “At the same time, uQlust yields results on par with methods that require much higher computational cost in both model quality assessment and clustering analysis.”

The uQlust package combines 1D structural profiles, profile hashing, and linear-time ranking to enable ultrafast clustering of very large sets of protein or RNA structures.

One key insight involves projecting macromolecular 3D coordinates into a suitable 1D profile and then pre-processing the profiles to compute the state frequency vector at each profile position. With these frequencies in hand, one can implicitly compare all pairs of models and compute their overall geometric consensus ranking with a linear-time algorithm, while yielding results on par with quadratic-complexity methods.
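The idea of replacing explicit pairwise comparison with per-position state frequencies can be illustrated with a minimal sketch. It assumes each model has already been reduced to a 1D profile of equal length, written as a string of state letters (for example H, E, C for secondary structure). The function names and the scoring formula are illustrative assumptions and are not taken from the uQlust code.

```python
from collections import Counter


def consensus_scores(profiles):
    """Score each profile by its agreement with the whole set.

    Hypothetical sketch: runs in time proportional to
    (number of models) x (profile length).
    """
    if not profiles:
        return []
    length = len(profiles[0])
    if length == 0 or any(len(p) != length for p in profiles):
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(profiles)
    # Pre-processing: count how often each state occurs at each position.
    counts = [Counter(column) for column in zip(*profiles)]
    # A model's score is the mean fraction of models sharing its state.
    return [
        sum(counts[i][state] for i, state in enumerate(p)) / (n * length)
        for p in profiles
    ]


def pairwise_scores(profiles):
    """Reference version that compares all pairs explicitly (quadratic)."""
    n, length = len(profiles), len(profiles[0])
    return [
        sum(sum(a == b for a, b in zip(p, q)) for q in profiles) / (n * length)
        for p in profiles
    ]


if __name__ == "__main__":
    models = ["HHHEEC", "HHHEEC", "HHCEEC", "CCCCCC"]
    for profile, score in zip(models, consensus_scores(models)):
        print(profile, round(score, 3))
```

In this toy setting the frequency-based score equals the average positional identity of a model against every model in the set, so it reproduces the quadratic all-pairs result without ever forming the pairs.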

Another key insight is the use of profile hashing to define initial micro-clusters, which are then agglomerated in conjunction with 1D-Jury, the package's linear-time ranking method, to achieve ultrafast clustering heuristics with a low memory footprint.
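The hashing step can be sketched in a few lines, again assuming string-valued 1D profiles. Here the hash key is simply the profile itself, or its states at a chosen subset of positions. The `positions` parameter and the function name are hypothetical, and the subsequent agglomeration of micro-clusters is not shown.

```python
from collections import defaultdict


def micro_clusters(profiles, positions=None):
    """Group model indices whose profile keys are identical.

    Hypothetical sketch: a single pass over the models, with memory
    proportional to the number of distinct keys.
    """
    buckets = defaultdict(list)
    for index, profile in enumerate(profiles):
        if positions is None:
            key = profile  # use the full profile as the hash key
        else:
            # Coarser key: keep only the selected profile positions.
            key = "".join(profile[i] for i in positions)
        buckets[key].append(index)
    # Return the micro-clusters, largest first.
    return sorted(buckets.values(), key=len, reverse=True)


if __name__ == "__main__":
    models = ["HHEC", "HHEC", "HHCC", "CCCC"]
    print(micro_clusters(models))
    print(micro_clusters(models, positions=[0, 1]))
```

Using fewer positions in the key gives coarser and therefore fewer micro-clusters, which is one simple way to trade resolution for speed in a sketch like this.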

A number of widely used methods and utilities for macromolecular structure analysis, including DSSP, RNAview, and FragBag, are implemented in uQlust and integrated into workflows for ranking and clustering without the need to use external programs.

The code is freely available to the community at https://github.com/uQlust under the GNU General Public License.

New Software Package uQlust Allows Efficient Clustering and Analysis of Big Macromolecular Data