The structures of many large protein complexes and assemblies are difficult to obtain using any single experimental or computational method. Integrative structure determination fills this gap: various types of experimental data are combined with principles from physics, statistical inference, and prior models to obtain the structure. The different sources of input information may span multiple scales; for example, X-ray crystallography data are at the atomic scale, while FRET distances are at the domain scale. These sources are also often complementary; for example, EM maps may provide the shape of a complex while chemical crosslinks may provide the orientation of binding interfaces. We have used structural, biochemical, biophysical, cell biological, genetic, and bioinformatics information to deduce the structures of assemblies.
This approach has several advantages. First, the models are more complete than those generated by other methods, since proteins can be modelled at full length, including regions of unknown structure. Second, different types of information at different scales can be combined objectively via the Bayesian inference framework, which accounts for the uncertainty of each experiment without using arbitrary weights. Third, the approach produces all models consistent with the input information, allowing us to quantify the error bars (precision) of the structure. Finally, models generated by the integrative approach are validated by several methods, including experiments designed on the basis of the models, which provides a high level of confidence in the determined structure.
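To illustrate how a Bayesian score can combine different data types without hand-tuned weights, here is a minimal sketch in Python. It is an illustration only, not our production scoring function: the Gaussian error model, the Jeffreys prior on each uncertainty, and the names `gaussian_nll`, `jeffreys_neg_log_prior`, and `neg_log_posterior` are all assumptions made for this example.

```python
import math


def gaussian_nll(observed, predicted, sigma):
    """Negative log-likelihood of the data, assuming independent Gaussian
    errors with standard deviation sigma (an assumed error model)."""
    n = len(observed)
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(sigma * math.sqrt(2.0 * math.pi)) + sq_err / (2.0 * sigma ** 2)


def jeffreys_neg_log_prior(sigma):
    """Negative log of an uninformative Jeffreys prior, p(sigma) ~ 1/sigma."""
    return math.log(sigma)


def neg_log_posterior(predicted, data, sigmas):
    """Sum the per-data-type terms. Each data type is scaled by its own
    uncertainty sigma, so no manual relative weights are required."""
    total = 0.0
    for name, observed in data.items():
        total += gaussian_nll(observed, predicted[name], sigmas[name])
        total += jeffreys_neg_log_prior(sigmas[name])
    return total


# Hypothetical toy data: two crosslink distances and one FRET distance.
data = {"crosslink": [20.0, 25.0], "fret": [50.0]}
sigmas = {"crosslink": 5.0, "fret": 10.0}
good_model = {"crosslink": [20.0, 25.0], "fret": [50.0]}
poor_model = {"crosslink": [40.0, 45.0], "fret": [50.0]}

print(neg_log_posterior(good_model, data, sigmas))
print(neg_log_posterior(poor_model, data, sigmas))
```

A lower value means a better fit. In a full treatment the uncertainties would themselves be sampled as nuisance parameters alongside the structure rather than fixed as they are here.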
In contrast to other approaches to structure determination, our specific focus is on determining the domain-level organisation of large assemblies from medium-resolution (~5-40 Å) experimental data. This has two consequences. First, our models are usually coarse-grained, i.e., represented at lower than atomic resolution, so the emphasis is on identifying the overall domain-level organisation rather than the finer atomic details. Second, the input information can be noisy, ambiguous, sparse, and incoherent (i.e., based on a heterogeneous sample). Therefore, more than one model can fit the data, and the integrative modeling approach produces an ensemble of models consistent with the data.
Opportunities exist in our group to work on several such assemblies. Three broad areas are (a) assemblies involved in regulating gene expression, including chromatin remodelers, (b) assemblies at cell-cell junctions, and (c) cellular machineries involved in cytoskeletal organization and cell division, such as centriolar and centrosomal protein complexes. We collaborate closely with cell and structural biologists to generate data for the modeling and to validate predictions from our models.
Three broad areas we work on
We recently determined the structures of sub-complexes of the Nucleosome Remodeling and Deacetylase (NuRD) complex, a chromatin-modifying assembly that regulates gene expression and is conserved across plant and animal species. Using Bayesian integrative structure determination, we combined information from SEC-MALLS, DIA-MS, XL-MS, negative-stain EM, X-ray crystallography, and NMR spectroscopy with secondary structure and homology predictions. The integrative structures were further validated by independent cryo-EM maps, biochemical assays, and known cancer-associated mutations. Based on the structures, we proposed a model showing how the two enzymatic modules in the assembly may be connected by MBD.
Two-state model of MBD3 binding in NuRD
Of late, AI-based methods have enabled amazing advances in structural biology, and it is an exciting, fast-paced field! For us, the simulation and analysis of large macromolecular assemblies open up interesting opportunities for developing new modeling methods. Accordingly, our other focus is on developing rigorous methods and software for computational modeling of protein organization. These methods make structure determination more accurate and efficient by improving upon approaches that are ad hoc, semi-automated, based on trial and error, and/or require manual expert intervention. We use algorithms from computational statistics, statistical physics, machine learning and statistical inference, optimization, computer vision, and graph theory.
We developed a method to optimize the sampling-related parameters for modeling assemblies in the Integrative Modeling Platform (IMP). Stochastic Optimization of Parameters (StOP) automates the tuning of MCMC parameters such as rigid body and bead move sizes, restraint weights, and replica exchange temperatures.
Schematic of StOP
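To give a feel for what tuning an MCMC move size involves, the sketch below adjusts the step size of a toy Metropolis sampler until its acceptance rate falls in a target range. This is not StOP's algorithm and does not use IMP: the one-dimensional Gaussian target, the simple bisection search, the 0.3-0.5 target range, and the function names `acceptance_rate` and `tune_step` are all assumptions chosen for illustration.

```python
import math
import random


def acceptance_rate(step, n_steps=5000, seed=0):
    """Run Metropolis sampling on a 1D standard normal target and return
    the fraction of proposed moves that were accepted."""
    rng = random.Random(seed)
    x, accepted = 0.0, 0
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step, step)
        # Log of the target density ratio pi(proposal) / pi(x).
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x, accepted = proposal, accepted + 1
    return accepted / n_steps


def tune_step(target=(0.3, 0.5), lo=0.01, hi=100.0, max_iter=40):
    """Bisect on the move size (in log space) until the acceptance rate
    lands inside the target range. Returns (step, rate)."""
    for _ in range(max_iter):
        mid = math.sqrt(lo * hi)
        rate = acceptance_rate(mid)
        if target[0] <= rate <= target[1]:
            return mid, rate
        if rate > target[1]:
            lo = mid  # accepted too often: moves are too small, so enlarge
        else:
            hi = mid  # accepted too rarely: moves are too large, so shrink
    raise RuntimeError("no move size in [lo, hi] reached the target range")


step, rate = tune_step()
print(f"step={step:.3f} acceptance={rate:.3f}")
```

Real assemblies have many interacting parameters (several move sizes, restraint weights, replica exchange temperatures), which is why an automated search is preferable to adjusting each one by hand.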
PrISM is our recently developed method to identify high- and low-precision regions in an ensemble of integrative models of large macromolecular assemblies. PrISM is now used in the pipeline for validating integrative models deposited in the wwPDB (worldwide Protein Data Bank), making us a part of the PDBDev Model Validation Group!
Precision for integrative structural models
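The idea of region-wise precision can be illustrated with a small Python sketch that measures how much each bead's position varies across an ensemble. This is a simplified stand-in, not the PrISM algorithm: it assumes the models are already superposed and share the same bead order, and the names `per_bead_spread` and `classify`, as well as the cutoff value, are hypothetical.

```python
import math


def per_bead_spread(ensemble):
    """ensemble: list of models, each a list of (x, y, z) bead coordinates.
    Returns the RMS deviation of each bead from its mean position."""
    n_models = len(ensemble)
    n_beads = len(ensemble[0])
    spreads = []
    for b in range(n_beads):
        coords = [model[b] for model in ensemble]
        mean = [sum(c[k] for c in coords) / n_models for k in range(3)]
        msd = sum(
            sum((c[k] - mean[k]) ** 2 for k in range(3)) for c in coords
        ) / n_models
        spreads.append(math.sqrt(msd))
    return spreads


def classify(spreads, cutoff):
    """Label each bead 'high' precision if its spread is within the cutoff
    (a hypothetical threshold, same length units as the coordinates)."""
    return ["high" if s <= cutoff else "low" for s in spreads]


# Toy ensemble of three models with two beads each: the first bead is
# fixed, the second varies along x.
ensemble = [
    [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (14.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
]
spreads = per_bead_spread(ensemble)
print(classify(spreads, cutoff=1.0))
```

Beads with small spread are positioned consistently across the ensemble, while beads with large spread mark regions that the input information does not pin down.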