Research Papers

Below is a list of research papers for which we have processed data. The list is updated whenever we are notified of new publications by the end projects.

Asteroids@home

  • [2020] Asteroid models reconstructed from ATLAS photometry (J. Durech, et al)

    The Asteroid Terrestrial-impact Last Alert System (ATLAS) is an all-sky survey primarily aimed at detecting potentially hazardous near-Earth asteroids. Apart from the astrometry of asteroids, it also produces their photometric measurements that contain information about asteroid rotation and their shape. To increase the current number of asteroids with a known shape and spin state, we reconstructed asteroid models from ATLAS photometry that was available for approximately 180,000 asteroids observed between 2015 and 2018. We made use of the light-curve inversion method implemented in the Asteroids@home project to process ATLAS photometry for roughly 100,000 asteroids with more than a hundred individual brightness measurements. By scanning the period and pole parameter space, we selected those best-fit models that were, according to our setup, a unique solution for the inverse problem. We derived ~2750 unique models; 950 of them were already reconstructed from other data and published. The remaining 1800 models are new. About half of them are only partial models, with an unconstrained pole ecliptic longitude. Together with the shape and spin, we also determined for each modeled asteroid its color index from the cyan and orange filter used by the ATLAS survey. We also show the correlations between the color index, albedo, and slope of the phase-angle function. The current analysis is the first inversion of ATLAS asteroid photometry, and it is the first step in exploiting the huge scientific potential that ATLAS photometry has. ATLAS continues to observe, and in the future, this data, together with other independent photometric measurements, can be inverted to produce more refined asteroid models.

    https://arxiv.org/abs/2010.01820

  • [2019] Inversion of asteroid photometry from Gaia DR2 and the Lowell Observatory photometric database (Josef Durech, et al)

    Rotation properties (spin-axis direction and rotation period) and coarse shape models of asteroids can be reconstructed from their disk-integrated brightness when measured from various viewing geometries. These physical properties are essential for creating a global picture of structure and dynamical evolution of the main belt. The number of shape and spin models can be increased not only when new data are available, but also by combining independent data sets and inverting them together. Our aim was to derive new asteroid models by processing readily available photometry. We used asteroid photometry compiled in the Lowell Observatory photometry database with photometry from the Gaia Data Release 2. Both data sources are available for about 5400 asteroids. In the framework of the Asteroids@home distributed computing project, we applied the light curve inversion method to each asteroid to find its convex shape model and spin state that fits the observed photometry. Due to the limited number of Gaia DR2 data points and poor photometric accuracy of Lowell data, we were able to derive unique models for only ~1100 asteroids. Nevertheless, 762 of these are new models that significantly enlarge the current database of about 1600 asteroid models. Our results demonstrate the importance of a combined approach to inversion of asteroid photometry. While our models in general agree with those obtained by separate inversion of Lowell and Gaia data, the combined inversion is more robust, model parameters are more constrained, and unique models can be reconstructed in many cases when individual data sets alone are not sufficient.

    https://arxiv.org/abs/1909.09395

  • [2018] Asteroid models reconstructed from the Lowell Photometric Database and WISE data (Josef Durech, et al)

    Information about the spin state of asteroids is important for our understanding of the dynamical processes affecting them. However, spin properties of asteroids are known for only a small fraction of the whole population. To enlarge the sample of asteroids with a known rotation state and basic shape properties, we combined sparse-in-time photometry from the Lowell Observatory Database with flux measurements from NASA's WISE satellite. We applied the light curve inversion method to the combined data. The thermal infrared data from WISE were treated as reflected light because the shapes of thermal and visual light curves are similar enough for our purposes. While sparse data cover a wide range of geometries over many years, WISE data typically cover an interval of tens of hours, which is comparable to the typical rotation period of asteroids. The search for best-fitting models was done in the framework of the Asteroids@home distributed computing project. By processing the data for almost 75,000 asteroids, we derived unique shape models for about 900 of them. Some of them were already available in the DAMIT database and served us as a consistency check of our approach. In total, we derived new models for 662 asteroids, which significantly increased the total number of asteroids for which their rotation state and shape are known. For another 789 asteroids, we were able to determine their sidereal rotation period and estimate the ecliptic latitude of the spin axis direction. We studied the distribution of spins in the asteroid population. We revealed a significant discrepancy between the number of prograde and retrograde rotators for asteroids smaller than about 10 km. Combining optical photometry with thermal infrared light curves is an efficient approach to obtaining new physical models of asteroids.

    https://arxiv.org/abs/1807.02083

  • [2016] Asteroid models from the Lowell Photometric Database (Josef Durech, et al)

    We use the lightcurve inversion method to derive new shape models and spin states of asteroids from the sparse-in-time photometry compiled in the Lowell Photometric Database. To speed up the time-consuming process of scanning the period parameter space through the use of convex shape models, we use the distributed computing project Asteroids@home, running on the Berkeley Open Infrastructure for Network Computing (BOINC) platform. This way, the period-search interval is divided into hundreds of smaller intervals. These intervals are scanned separately by different volunteers and then joined together. We also use an alternative, faster, approach when searching the best-fit period by using a model of triaxial ellipsoid. By this, we can independently confirm periods found with convex models and also find rotation periods for some of those asteroids for which the convex-model approach gives too many solutions. From the analysis of Lowell photometric data of the first 100,000 numbered asteroids, we derived 328 new models. This almost doubles the number of available models. We tested the reliability of our results by comparing models that were derived from purely Lowell data with those based on dense lightcurves, and we found that the rate of false-positive solutions is very low. We also present updated plots of the distribution of spin obliquities and pole ecliptic longitudes that confirm previous findings about a non-uniform distribution of spin axes. However, the models reconstructed from noisy sparse data are heavily biased towards more elongated bodies with high lightcurve amplitudes.

    https://arxiv.org/abs/1601.02909
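
    The abstract above describes dividing the period-search interval into many smaller intervals that are scanned separately and then joined. A minimal Python sketch of that idea follows. It is not the project's code: the function names, the uniform-in-period grid, the 2 to 100 hour range, and the toy misfit function (a stand-in for the real light-curve fit) are all assumptions made for illustration.

```python
def split_period_interval(p_min, p_max, n_chunks):
    """Split [p_min, p_max] into n_chunks contiguous sub-intervals (work units)."""
    width = (p_max - p_min) / n_chunks
    return [(p_min + i * width, p_min + (i + 1) * width) for i in range(n_chunks)]


def scan_chunk(chunk, misfit, n_trials=50):
    """Scan one sub-interval on a uniform grid; return (best_misfit, best_period)."""
    lo, hi = chunk
    step = (hi - lo) / n_trials
    trials = [lo + k * step for k in range(n_trials)]
    return min((misfit(p), p) for p in trials)


def join_results(chunk_results):
    """Merge the per-chunk results into the overall best-fit period."""
    return min(chunk_results)


def toy_misfit(period_hours):
    # Hypothetical stand-in for the light-curve misfit; minimum placed at 7.3 h.
    return (period_hours - 7.3) ** 2


# Each chunk could be scanned by a different volunteer; here they run in a loop.
chunks = split_period_interval(2.0, 100.0, 200)
results = [scan_chunk(c, toy_misfit) for c in chunks]
best_misfit, best_period = join_results(results)
```

    Because every chunk returns only its own best candidate, the join step is a simple minimum over the returned pairs, which is what makes the search easy to distribute.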

Climate Prediction

  • [2019] Increasing mitigation ambition to meet the Paris Agreement’s temperature goal avoids substantial heat-related mortality in U.S. cities (Y. T. Eunice Lo, et al)

    Current greenhouse gas mitigation ambition is consistent with ~3°C global mean warming above preindustrial levels. There is a clear need to strengthen mitigation ambition to stabilize the climate at the Paris Agreement goal of warming of less than 2°C. We specify the differences in city-level heat-related mortality between the 3°C trajectory and warming of 2° and 1.5°C. Focusing on 15 U.S. cities where reliable climate and health data are available, we show that ratcheting up mitigation ambition to achieve the 2°C threshold could avoid between 70 and 1980 annual heat-related deaths per city during extreme events (30-year return period). Achieving the 1.5°C threshold could avoid between 110 and 2720 annual heat-related deaths. Population changes and adaptation investments would alter these numbers. Our results provide compelling evidence for the heat-related health benefits of limiting global warming to 1.5°C in the United States.

    https://advances.sciencemag.org/content/5/6/eaau4373

  • [2019] Attributing the 2017 Bangladesh floods from meteorological and hydrological perspectives (Sjoukje Philip, et al)

    In August 2017 Bangladesh faced one of its worst river flooding events in recent history. This paper presents, for the first time, an attribution of this precipitation-induced flooding to anthropogenic climate change from a combined meteorological and hydrological perspective.

    https://hess.copernicus.org/articles/23/1409/2019/hess-23-1409-2019-discussion.html

  • [2019] Reducing climate model biases by exploring parameter space with large ensembles of climate model simulations and statistical emulation (Sihan Li, et al)

    Understanding the unfolding challenges of climate change relies on climate models, many of which have large summer warm and dry biases over Northern Hemisphere continental mid-latitudes. This work, using the example of the model used in the updated version of the weather@home distributed climate model framework, shows the potential for improving climate model simulations through a multi-phased parameter refinement approach, particularly over northwestern United States (NWUS).

    https://gmd.copernicus.org/articles/12/3017/2019/gmd-12-3017-2019-discussion.html

  • [2018] Impacts of Anthropogenic Forcings and El Niño on Chinese Extreme Temperatures (N. Freychet, et al)

    This study investigates the potential influences of anthropogenic forcings and natural variability on the risk of summer extreme temperatures over China. We use three multi-thousand-member ensemble simulations with different forcings (with or without anthropogenic greenhouse gases and aerosol emissions) to evaluate the human impact, and with sea surface temperature patterns from three different years around the El Niño–Southern Oscillation (ENSO) 2015/16 event (years 2014, 2015 and 2016) to evaluate the impact of natural variability. A generalized extreme value (GEV) distribution is used to fit the ensemble results. Based on these model results, we find that, during the peak of ENSO (2015), daytime extreme temperatures are smaller over the central China region compared to a normal year (2014). During 2016, the risk of nighttime extreme temperatures is largely increased over the eastern coastal region. Both anomalies are of the same magnitude as the anthropogenic influence. Thus, ENSO can amplify or counterbalance (at a regional and annual scale) anthropogenic effects on extreme summer temperatures over China. Changes are mainly due to changes in the GEV location parameter. Thus, anomalies are due to a shift in the distributions and not to a change in temperature variability.

    https://link.springer.com/article/10.1007%2Fs00376-018-7258-8

  • [2018] Anthropogenic contribution to the 2017 earliest summer onset in South Korea (Seung-Ki Min, et al)

    This study examines human contribution to the 2017 extreme May heat and the earliest summer onset in South Korea. To consider small spatial scales, we use high-resolution large-ensemble regional climate model (RCM) and global climate model (GCM) simulations available for the year 2017, each performed with and without anthropogenic forcings.

    https://journals.ametsoc.org/view/journals/bams/100/1/bams-d-18-0096.1.xml

  • [2018] Attributing human influence on the July 2017 Chinese heatwave: the influence of sea-surface temperatures (Sarah Sparrow, et al)

    On 21–25 July 2017 a record-breaking heatwave occurred in Central Eastern China, affecting nearly half of the national population and causing severe impacts on public health, agriculture and infrastructure. Here, we compare attribution results from two UK Met Office Hadley Centre models, HadGEM3-GA6 and weather@home (HadAM3P driving 50 km HadRM3P). Within HadGEM3-GA6 July 2017-like heatwaves were unequaled in the ensemble representing the world without human influences. Such heatwaves became approximately a 1 in 50 year event and increased by a factor of 4.8 (5%–95% range of 3.1 to 8.0) in weather@home as a result of human activity. Considering the risk ratio (RR) for the full range of return periods shows a discrepancy at all return times between the two model results. Within weather@home a range of different counterfactual sea surface temperature (SST) patterns were used, whereas HadGEM3-GA6 used a single estimate. The global mean difference in SST (between factual and counterfactual simulations) is shown to be related to the generalised extreme value (GEV) location parameter and consequently the RR, especially for return periods of less than 50 years. It is suggested that a suitable range of SST patterns are used for future attribution studies to ensure that this source of uncertainty is represented within the simulations and subsequent attribution results. It is shown that the risk change between factual and counterfactual simulations is not purely a simple shift in the distribution (i.e. change in GEV location parameter). For return periods greater than 50 years, the GEV shape parameter is found to strongly influence the RR determined with the GEV scale parameter affecting only the most severe events.

    https://iopscience.iop.org/article/10.1088/1748-9326/aae356
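
    The risk ratio (RR) and return periods mentioned in the abstract above can be illustrated with a short Python sketch. It estimates exceedance probabilities by counting ensemble members rather than by fitting a GEV distribution, which is a simplification. The function names and the synthetic ensembles (integer temperatures in hundredths of a degree) are invented for the example and are not data from the paper.

```python
def exceedance_probability(samples, threshold):
    """Fraction of ensemble members at or above the threshold."""
    return sum(1 for x in samples if x >= threshold) / len(samples)


def return_period(probability):
    """Return period in years for an annual exceedance probability."""
    return float("inf") if probability == 0 else 1.0 / probability


def risk_ratio(factual, counterfactual, threshold):
    """Ratio of exceedance probabilities with and without human influence."""
    p_factual = exceedance_probability(factual, threshold)
    p_counterfactual = exceedance_probability(counterfactual, threshold)
    if p_counterfactual == 0:
        return float("inf")  # event never occurs in the counterfactual ensemble
    return p_factual / p_counterfactual


# Synthetic 1000-member ensembles (made-up values, hundredths of a degree C).
factual = [3000 + i for i in range(1000)]
counterfactual = [2984 + i for i in range(1000)]
threshold = 3980

p_factual = exceedance_probability(factual, threshold)
rr = risk_ratio(factual, counterfactual, threshold)
years = return_period(p_factual)
```

    An infinite RR in this sketch corresponds to the case described in the abstract where the event does not occur at all in the ensemble without human influence.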

Einstein@home

  • [2021] Einstein@Home all-sky search for continuous gravitational waves in LIGO O2 public data (B. Steltner, et al)

    We conduct an all-sky search for continuous gravitational waves in the LIGO O2 data from the Hanford and Livingston detectors. We search for nearly-monochromatic signals with frequency between 20.0 Hz and 585.15 Hz and spin-down between -2.6e-9 Hz/s and 2.6e-10 Hz/s. We deploy the search on the Einstein@Home volunteer-computing project and follow up the waveforms associated with the most significant results with eight further search-stages, reaching the best sensitivity ever achieved by an all-sky survey up to 500 Hz. Six of the inspected waveforms pass all the stages but they are all associated with hardware-injections, which are fake signals simulated at the LIGO detector for validation purposes. We recover all these fake signals with consistent parameters. No other waveform survives, so we find no evidence of a continuous gravitational wave signal at the detectability level of our search. We constrain the h0 amplitude of continuous gravitational waves at the detector as a function of the signal frequency, in half-Hz bins. The most constraining upper limit at 163.0 Hz is h0 = 1.3e-25, at the 90% confidence level. Our results exclude neutron stars rotating faster than 5 ms with equatorial ellipticities larger than 1e-7 closer than 100 pc. These are deformations that neutron star crusts could easily support, according to some models.

    https://arxiv.org/abs/2009.12260

  • [2020] Exploiting Orbital Constraints from Optical Data to Detect Binary Gamma-ray Pulsars (L. Nieder, et al)

    It is difficult to discover pulsars via their gamma-ray emission because current instruments typically detect fewer than one photon per million rotations. This creates a significant computing challenge for isolated pulsars, where the typical parameter search space spans wide ranges in four dimensions. It is even more demanding when the pulsar is in a binary system, where the orbital motion introduces several additional unknown parameters. Building on earlier work by Pletsch & Clark (arXiv:1408.6962), we present optimal methods for such searches. These can also incorporate external constraints on the parameter space to be searched, for example, from optical observations of a presumed binary companion. The solution has two parts. The first is the construction of optimal search grids in parameter space via a parameter-space metric, for initial semicoherent searches and subsequent fully coherent follow-ups. The second is a method to demodulate and detect the periodic pulsations. These methods have different sensitivity properties than traditional radio searches for binary pulsars and might unveil new populations of pulsars.

    https://arxiv.org/abs/2004.11740

  • [2020] Search for Continuous Gravitational Waves from the Central Compact Objects in Supernova Remnants Cassiopeia A, Vela Jr. and G347.3-0.5 (M. Alessandra Papa, et al)

    We perform a sub-threshold follow-up search for continuous nearly-monochromatic gravitational waves from the central compact objects associated with the supernova remnants Vela Jr., Cassiopeia A, and SNR G347.3-0.5. Across the three targets, we investigate the most promising ~10,000 combinations of gravitational wave frequency and frequency derivative values, based on the results from an Einstein@Home search of the LIGO O1 observing run data, dedicated to these objects. The selection threshold is set so that a signal could be confirmed using the newly released O2 run LIGO data. In order to achieve best sensitivity we perform two separate follow-up searches, on two distinct stretches of the O2 data. Only one candidate survives the first O2 follow-up investigation, associated with the central compact object in SNR G347.3-0.5, but it is not conclusively confirmed. In order to assess a possible astrophysical origin we use archival X-ray observations and search for amplitude modulations of a pulsed signal at the putative rotation frequency of the neutron star and its harmonics. This is the first extensive electromagnetic follow-up of a continuous gravitational wave candidate performed to date. No significant associated signal is identified. New X-ray observations contemporaneous with the LIGO O3 run will enable a more sensitive search for an electromagnetic counterpart. A focused gravitational wave search in O3 data based on the parameters provided here should be easily able to shed light on the nature of this outlier. Noise investigations on the LIGO instruments could also reveal the presence of a coherent contamination.

    https://arxiv.org/abs/2005.06544

  • [2020] Discovery of a Gamma-ray Black Widow Pulsar by GPU-accelerated Einstein@Home (L. Nieder, et al)

    We report the discovery of 1.97 ms period gamma-ray pulsations from the 75 minute orbital-period binary pulsar now named PSR J1653-0158. The associated Fermi Large Area Telescope gamma-ray source 4FGL J1653.6-0158 has long been expected to harbor a binary millisecond pulsar. Despite the pulsar-like gamma-ray spectrum and candidate optical/X-ray associations -- whose periodic brightness modulations suggested an orbit -- no radio pulsations had been found in many searches. The pulsar was discovered by directly searching the gamma-ray data using the GPU-accelerated Einstein@Home distributed volunteer computing system. The multi-dimensional parameter space was bounded by positional and orbital constraints obtained from the optical counterpart. More sensitive analyses of archival and new radio data using knowledge of the pulsar timing solution yield very stringent upper limits on radio emission. Any radio emission is thus either exceptionally weak, or eclipsed for a large fraction of the time. The pulsar has one of the three lowest inferred surface magnetic-field strengths of any known pulsar with B_surf ≈ 4×10⁷ G. The resulting mass function, combined with models of the companion star's optical light curve and spectra, suggests a pulsar mass ≳ 2 M⊙. The companion is light-weight with mass ∼0.01 M⊙, and the orbital period is the shortest known for any rotation-powered binary pulsar. This discovery demonstrates the Fermi Large Area Telescope's potential to discover extreme pulsars that would otherwise remain undetected.

    https://arxiv.org/abs/2009.01513

  • [2020] Einstein@Home Discovery of the Gamma-ray Millisecond Pulsar PSR J2039-5617 Confirms Its Predicted Redback Nature (C. J. Clark, et al)

    The Fermi Large Area Telescope gamma-ray source 3FGL J2039.6−5618 contains a periodic optical and X-ray source that was predicted to be a "redback" millisecond pulsar (MSP) binary system. However, the conclusive identification required the detection of pulsations from the putative MSP. To better constrain the orbital parameters for a directed search for gamma-ray pulsations, we obtained new optical light curves in 2017 and 2018, which revealed long-term variability from the companion star. The resulting orbital parameter constraints were used to perform a targeted gamma-ray pulsation search using the Einstein@Home distributed volunteer computing system. This search discovered pulsations with a period of 2.65 ms, confirming the source as a binary MSP now known as PSR J2039−5617. Optical light curve modelling is complicated, and likely biased, by asymmetric heating on the companion star and long-term variability, but we find an inclination i > 60°, for a low pulsar mass between 1.1 M⊙ < M_psr < 1.6 M⊙ and a companion mass of 0.15–0.22 M⊙, confirming the redback classification. Timing the gamma-ray pulsations also revealed significant variability in the orbital period, which we find to be consistent with quadrupole moment variations in the companion star, suggestive of convective activity. We also find that the pulsed flux is modulated at the orbital period, potentially due to inverse Compton scattering between high-energy leptons in the pulsar wind and the companion star's optical photon field.

    https://arxiv.org/abs/2007.14849

  • [2019] Results from an Einstein@Home search for continuous gravitational waves from Cassiopeia A, Vela Jr. and G347.3 (Jing Ming, et al)

    We report results of the most sensitive search to date for periodic gravitational waves from Cassiopeia A, Vela Jr. and G347.3 with frequency between 20 and 1500 Hz. The search was made possible by the computing power provided by the volunteers of the Einstein@Home project and improves on previous results by a factor of 2 across the entire frequency range for all targets. We find no significant signal candidate and set the most stringent upper limits to date on the amplitude of gravitational wave signals from the target population, corresponding to sensitivity depths between 54 [1/√Hz] and 83 [1/√Hz], depending on the target and the frequency range. At the frequency of best strain sensitivity, near 172 Hz, we set 90% confidence upper limits on the gravitational wave intrinsic amplitude of h0^90% ≈ 10⁻²⁵, probing ellipticity values for Vela Jr. as low as 3×10⁻⁸, assuming a distance of 200 pc.

    https://arxiv.org/abs/1903.09119

  • [2019] Optimising the choice of analysis method for all-sky searches for continuous gravitational waves with Einstein@Home (Sinead Walsh, et al)

    Rapidly rotating neutron stars are promising sources of continuous gravitational waves for the LIGO and Virgo observatories. The majority of neutron stars in our galaxy have not been identified with electromagnetic observations. Blind all-sky searches offer the potential to detect gravitational waves from these unidentified sources. The parameter space of these searches presents a significant computational challenge. Various methods have been designed to perform these searches with available computing resources. Recently, a method called Weave has been proposed to achieve template placement with a minimal number of templates. We employ a mock data challenge to assess the ability of this method to recover signals, and compare its sensitivity with that of the global correlation transform method (GCT), which has been used for searches with the Einstein@Home volunteer computing project for a number of years. We find that the Weave method is 14% more sensitive for an all-sky search on Einstein@Home, with a sensitivity depth of 57.9±0.6 [1/√Hz] at 90% detection efficiency, compared to 50.8 (+0.7/−1.1) [1/√Hz] for GCT. This corresponds to a 50% increase in the volume of sky where we are sensitive with the Weave search. We also find that the Weave search recovers candidates closer to the true signal position. In the search studied here the improvement in candidate localisation would lead to a factor of 70 reduction in the computing cost required to follow up the same number of candidates. We assess the feasibility of deploying the search on Einstein@Home, and find that Weave requires more memory than is typically available on a volunteer computer. We conclude that, while GCT remains the best choice for deployment on Einstein@Home due to its lower memory requirements, Weave presents significant advantages for the subsequent hierarchical follow-up of interesting candidates.

    https://arxiv.org/abs/1901.08998
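
    The link between the quoted 14% sensitivity gain and the roughly 50% gain in searchable volume can be checked with a few lines of Python. The sketch assumes that the distance out to which a source is detectable scales linearly with sensitivity depth, and that sources are spread uniformly in space, so that volume scales with the cube of the depth ratio. These assumptions are ours, not statements from the paper, and only the central depth values from the abstract are used.

```python
def volume_ratio(depth_new, depth_old):
    """Ratio of searchable volumes, assuming reach scales linearly with depth."""
    return (depth_new / depth_old) ** 3


weave_depth = 57.9  # sensitivity depth quoted for Weave
gct_depth = 50.8    # sensitivity depth quoted for GCT

# Relative gains expressed in percent.
sensitivity_gain_percent = (weave_depth / gct_depth - 1.0) * 100.0
volume_gain_percent = (volume_ratio(weave_depth, gct_depth) - 1.0) * 100.0
```

    Under these assumptions the volume gain comes out a little below 50%, consistent with the rounded figure in the abstract.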

GPUGRID

  • [2020] Small Molecule Modulation of Intrinsically Disordered Proteins Using Molecular Dynamics Simulations (Pablo Herrera-Nieto, et al)

    The extreme dynamic behavior of intrinsically disordered proteins hinders the development of drug-like compounds capable of modulating them. There are several examples of small molecules that specifically interact with disordered peptides. However, their mechanisms of action are still not well understood. Here, we use extensive molecular dynamics simulations combined with adaptive sampling algorithms to perform free ligand binding studies in the context of intrinsically disordered proteins. We tested this approach in the system composed of the D2 sub-domain of the disordered protein p27 and the small molecule SJ403. The results show several protein–ligand bound states characterized by the establishment of a loosely oriented interaction mediated by a limited number of contacts between the ligand and critical residues of p27. Finally, protein conformations in the bound state are likely to be explored by the isolated protein too, therefore supporting a model where the addition of the small molecule restricts the available conformational space.

    https://pubs.acs.org/doi/10.1021/acs.jcim.0c00381

  • [2020] GPCRmd uncovers the dynamics of the 3D-GPCRome (Ismael Rodríguez-Espigares, et al)

    G-protein-coupled receptors (GPCRs) are involved in numerous physiological processes and are the most frequent targets of approved drugs. The explosion in the number of new three-dimensional (3D) molecular structures of GPCRs (3D-GPCRome) over the last decade has greatly advanced the mechanistic understanding and drug design opportunities for this protein family. Molecular dynamics (MD) simulations have become a widely established technique for exploring the conformational landscape of proteins at an atomic level. However, the analysis and visualization of MD simulations require efficient storage resources and specialized software. Here we present GPCRmd (http://gpcrmd.org/), an online platform that incorporates web-based visualization capabilities as well as a comprehensive and user-friendly analysis toolbox that allows scientists from different disciplines to visualize, analyze and share GPCR MD data. GPCRmd originates from a community-driven effort to create an open, interactive and standardized database of GPCR MD simulations.

    https://www.nature.com/articles/s41592-020-0884-y

  • [2017] Dynamic and Kinetic Elements of µ-Opioid Receptor Functional Selectivity (Abhijeet Kapoor, et al)

    While the therapeutic effect of opioid analgesics is mainly attributed to µ-opioid receptor (MOR) activation leading to G protein signaling, their side effects have mostly been linked to β-arrestin signaling. To shed light on the dynamic and kinetic elements underlying MOR functional selectivity, we carried out close to half a millisecond of high-throughput molecular dynamics simulations of MOR bound to a classical opioid drug (morphine) or a potent G protein-biased agonist (TRV-130). Statistical analyses of Markov state models built using this large simulation dataset combined with information theory enabled, for the first time: a) identification of four distinct metastable regions along the activation pathway; b) kinetic evidence of a different dynamic behavior of the receptor bound to a classical or G protein-biased opioid agonist; c) identification of kinetically distinct conformational states to be used for the rational design of functionally selective ligands that may eventually be developed into improved drugs; d) characterization of multiple activation/deactivation pathways of MOR; and e) a suggestion from calculated transition timescales that MOR conformational changes are not the rate-limiting step in receptor activation.

    https://www.nature.com/articles/s41598-017-11483-8

  • [2017] Complete protein–protein association kinetics in atomic detail revealed by molecular dynamics simulations and Markov modelling (Nuria Plattner, et al)

    Protein–protein association is fundamental to many life processes. However, a microscopic model describing the structures and kinetics during association and dissociation is lacking on account of the long lifetimes of associated states, which have prevented efficient sampling by direct molecular dynamics (MD) simulations. Here we demonstrate protein–protein association and dissociation in atomistic resolution for the ribonuclease barnase and its inhibitor barstar by combining adaptive high-throughput MD simulations and hidden Markov modelling. The model reveals experimentally consistent intermediate structures, energetics and kinetics on timescales from microseconds to hours. A variety of flexibly attached intermediates and misbound states funnel down to a transition state and a native basin consisting of the loosely bound near-native state and the tightly bound crystallographic state. These results offer a deeper level of insight into macromolecular recognition and our approach opens the door for understanding and manipulating a wide range of macromolecular association processes.

    https://www.nature.com/articles/nchem.2785

Milkyway@home

  • [2021] An Algorithm for Reconstructing the Orphan Stream Progenitor with MilkyWay@home Volunteer Computing (Siddhartha Shelton, et al)

    We have developed a method for estimating the properties of the progenitor dwarf galaxy from the tidal stream of stars that were ripped from it as it fell into the Milky Way. In particular, we show that the mass and radial profile of a progenitor dwarf galaxy evolved along the orbit of the Orphan Stream, including the stellar and dark matter components, can be reconstructed from the distribution of stars in the tidal stream it produced. We use MilkyWay@home, a PetaFLOPS-scale distributed supercomputer, to optimize our dwarf galaxy parameters until we arrive at best-fit parameters. The algorithm fits the dark matter mass, dark matter radius, stellar mass, radial profile of stars, and orbital time. The parameters are recovered even though the dark matter component extends well past the half light radius of the dwarf galaxy progenitor, proving that we are able to extract information about the dark matter halos of dwarf galaxies from the tidal debris. Our simulations assumed that the Milky Way potential, dwarf galaxy orbit, and the form of the density model for the dwarf galaxy were known exactly; more work is required to evaluate the sources of systematic error in fitting real data. This method can be used to estimate the dark matter content in dwarf galaxies without the assumption of virial equilibrium that is required to estimate the mass using line-of-sight velocities. This demonstration is a first step towards building an infrastructure that will fit the Milky Way potential using multiple tidal streams.

    https://arxiv.org/abs/2102.07257

MLC@home

  • [2021] MLDS: A Dataset for Weight-Space Analysis of Neural Networks (John Clemens)

    Neural networks are powerful models that solve a variety of complex real-world problems. However, the stochastic nature of training and large number of parameters in a typical neural model makes them difficult to evaluate via inspection. Research shows this opacity can hide latent undesirable behavior, be it from poorly representative training data or via malicious intent to subvert the behavior of the network, and that this behavior is difficult to detect via traditional indirect evaluation criteria such as loss. Therefore, it is time to explore direct ways to evaluate a trained neural model via its structure and weights. In this paper we present MLDS, a new dataset consisting of thousands of trained neural networks with carefully controlled parameters and generated via a global volunteer-based distributed computing platform. This dataset enables new insights into both model-to-model and model-to-training-data relationships. We use this dataset to show clustering of models in weight-space with identical training data and meaningful divergence in weight-space with even a small change to the training data, suggesting that weight-space analysis is a viable and effective alternative to loss for evaluating neural networks.

    https://arxiv.org/abs/2104.10555

QuChemPedIA@home

  • [2020] EvoMol: a flexible and interpretable evolutionary algorithm for unbiased de novo molecular generation (Jules Leguy, et al)

    The objective of this work is to design a molecular generator capable of exploring known as well as unfamiliar areas of the chemical space. Our method must be flexible to adapt to very different problems. Therefore, it has to be able to work with or without the influence of prior data and knowledge. Moreover, regardless of the success, it should be as interpretable as possible to allow for diagnosis and improvement. We propose here a new open source generation method using an evolutionary algorithm to sequentially build molecular graphs. It is independent of starting data and can generate totally unseen compounds. To be able to search a large part of the chemical space, we define an original set of 7 generic mutations close to the atomic level. Our method achieves excellent performances and even records on the QED, penalised logP, SAscore, CLscore as well as the set of goal-directed functions defined in GuacaMol. To demonstrate its flexibility, we tackle a very different objective issued from the organic molecular materials domain. We show that EvoMol can generate sets of optimised molecules having high energy HOMO or low energy LUMO, starting only from methane. We can also set constraints on a synthesizability score and structural features. Finally, the interpretability of EvoMol allows for the visualisation of its exploration process as a chemically relevant tree.

    https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00458-z

Rosetta@home

  • [2020] Computational design of closely related proteins that adopt two well-defined but structurally divergent folds (Kathy Y. Wei, et al)

    The plasticity of naturally occurring protein structures, which can change shape considerably in response to changes in environmental conditions, is critical to biological function. While computational methods have been used for de novo design of proteins that fold to a single state with a deep free-energy minimum [P.-S. Huang, S. E. Boyken, D. Baker, Nature 537, 320–327 (2016)], and to reengineer natural proteins to alter their dynamics [J. A. Davey, A. M. Damry, N. K. Goto, R. A. Chica, Nat. Chem. Biol. 13, 1280–1285 (2017)] or fold [P. A. Alexander, Y. He, Y. Chen, J. Orban, P. N. Bryan, Proc. Natl. Acad. Sci. U.S.A. 106, 21149–21154 (2009)], the de novo design of closely related sequences which adopt well-defined but structurally divergent structures remains an outstanding challenge. We designed closely related sequences (over 94% identity) that can adopt two very different homotrimeric helical bundle conformations—one short (~66 Å height) and the other long (~100 Å height)—reminiscent of the conformational transition of viral fusion proteins. Crystallographic and NMR spectroscopic characterization shows that both the short- and long-state sequences fold as designed. We sought to design bistable sequences for which both states are accessible, and obtained a single designed protein sequence that populates either the short state or the long state depending on the measurement conditions. The design of sequences which are poised to adopt two very different conformations sets the stage for creating large-scale conformational switches between structurally divergent forms.

    https://www.pnas.org/content/117/13/7208

  • [2019] De novo design of potent and selective mimics of IL-2 and IL-15 (Daniel-Adriano Silva, et al)

    We describe a de novo computational approach for designing proteins that recapitulate the binding sites of natural cytokines, but are otherwise unrelated in topology or amino acid sequence. We use this strategy to design mimics of the central immune cytokine interleukin-2 (IL-2) that bind to the IL-2 receptor βγc heterodimer (IL-2Rβγc) but have no binding site for IL-2Rα (also called CD25) or IL-15Rα (also known as CD215). The designs are hyper-stable, bind human and mouse IL-2Rβγc with higher affinity than the natural cytokines, and elicit downstream cell signalling independently of IL-2Rα and IL-15Rα. Crystal structures of the optimized design neoleukin-2/15 (Neo-2/15), both alone and in complex with IL-2Rβγc, are very similar to the designed model. Neo-2/15 has superior therapeutic activity to IL-2 in mouse models of melanoma and colon cancer, with reduced toxicity and undetectable immunogenicity. Our strategy for building hyper-stable de novo mimetics could be applied generally to signalling proteins, enabling the creation of superior therapeutic candidates.

    https://www.nature.com/articles/s41586-018-0830-7

  • [2019] Induction of Potent Neutralizing Antibody Responses by a Designed Protein Nanoparticle Vaccine for Respiratory Syncytial Virus (Jessica Marcandalli, et al)

    Respiratory syncytial virus (RSV) is a worldwide public health concern for which no vaccine is available. Elucidation of the prefusion structure of the RSV F glycoprotein and its identification as the main target of neutralizing antibodies have provided new opportunities for development of an effective vaccine. Here, we describe the structure-based design of a self-assembling protein nanoparticle presenting a prefusion-stabilized variant of the F glycoprotein trimer (DS-Cav1) in a repetitive array on the nanoparticle exterior. The two-component nature of the nanoparticle scaffold enabled the production of highly ordered, monodisperse immunogens that display DS-Cav1 at controllable density. In mice and nonhuman primates, the full-valency nanoparticle immunogen displaying 20 DS-Cav1 trimers induced neutralizing antibody responses ~10-fold higher than trimeric DS-Cav1. These results motivate continued development of this promising nanoparticle RSV vaccine candidate and establish computationally designed two-component nanoparticles as a robust and customizable platform for structure-based vaccine design.

    https://www.cell.com/cell/fulltext/S0092-8674(19)30109-6

  • [2019] Controlling protein assembly on inorganic crystals through designed protein interfaces (Harley Pyles, et al)

    The ability of proteins and other macromolecules to interact with inorganic surfaces is essential to biological function. The proteins involved in these interactions are highly charged and often rich in carboxylic acid side chains but the structures of most protein–inorganic interfaces are unknown. We explored the possibility of systematically designing structured protein–mineral interfaces, guided by the example of ice-binding proteins, which present arrays of threonine residues (matched to the ice lattice) that order clathrate waters into an ice-like structure. Here we design proteins displaying arrays of up to 54 carboxylate residues geometrically matched to the potassium ion (K+) sublattice on muscovite mica (001). At low K+ concentration, individual molecules bind independently to mica in the designed orientations, whereas at high K+ concentration, the designs form two-dimensional liquid-crystal phases, which accentuate the inherent structural bias in the muscovite lattice to produce protein arrays ordered over tens of millimetres. Incorporation of designed protein–protein interactions preserving the match between the proteins and the K+ lattice led to extended self-assembled structures on mica: designed end-to-end interactions produced micrometre-long single-protein-diameter wires and a designed trimeric interface yielded extensive honeycomb arrays. The nearest-neighbour distances in these hexagonal arrays could be set digitally between 7.5 and 15.9 nanometres with 2.1-nanometre selectivity by changing the number of repeat units in the monomer. These results demonstrate that protein–inorganic lattice interactions can be systematically programmed and set the stage for designing protein–inorganic hybrid materials.

    https://www.nature.com/articles/s41586-019-1361-6

  • [2018] De novo design of self-assembling helical protein filaments (Hao Shen, et al)

    We describe a general computational approach to designing self-assembling helical filaments from monomeric proteins and use this approach to design proteins that assemble into micrometer-scale filaments with a wide range of geometries in vivo and in vitro. Cryo–electron microscopy structures of six designs are close to the computational design models. The filament building blocks are idealized repeat proteins, and thus the diameter of the filaments can be systematically tuned by varying the number of repeat units. The assembly and disassembly of the filaments can be controlled by engineered anchor and capping units built from monomers lacking one of the interaction surfaces. The ability to generate dynamic, highly ordered structures that span micrometers from protein monomers opens up possibilities for the fabrication of new multiscale metamaterials.

    https://science.sciencemag.org/content/362/6415/705

  • [2018] De novo design of a non-local β-sheet protein with high stability and accuracy (Enrique Marcos, et al)

    β-sheet proteins carry out critical functions in biology, and hence are attractive scaffolds for computational protein design. Despite this potential, de novo design of all-β-sheet proteins from first principles lags far behind the design of all-α or mixed-αβ domains owing to their non-local nature and the tendency of exposed β-strand edges to aggregate. Through study of loops connecting unpaired β-strands (β-arches), we have identified a series of structural relationships between loop geometry, side chain directionality and β-strand length that arise from hydrogen bonding and packing constraints on regular β-sheet structures. We use these rules to de novo design jellyroll structures with double-stranded β-helices formed by eight antiparallel β-strands. The nuclear magnetic resonance structure of a hyperthermostable design closely matched the computational model, demonstrating accurate control over the β-sheet structure and loop geometry. Our results open the door to the design of a broad range of non-local β-sheet protein structures.

    https://www.nature.com/articles/s41594-018-0141-6

  • [2018] De novo design of a fluorescence-activating β-barrel (Jiayi Dou, et al)

    The regular arrangements of β-strands around a central axis in β-barrels and of α-helices in coiled coils contrast with the irregular tertiary structures of most globular proteins, and have fascinated structural biologists since they were first discovered. Simple parametric models have been used to design a wide range of α-helical coiled-coil structures, but to date there has been no success with β-barrels. Here we show that accurate de novo design of β-barrels requires considerable symmetry-breaking to achieve continuous hydrogen-bond connectivity and eliminate backbone strain. We then build ensembles of β-barrel backbone models with cavity shapes that match the fluorogenic compound DFHBI, and use a hierarchical grid-based search method to simultaneously optimize the rigid-body placement of DFHBI in these cavities and the identities of the surrounding amino acids to achieve high shape and chemical complementarity. The designs have high structural accuracy and bind and fluorescently activate DFHBI in vitro and in Escherichia coli, yeast and mammalian cells. This de novo design of small-molecule binding activity, using backbones custom-built to bind the ligand, should enable the design of increasingly sophisticated ligand-binding proteins, sensors and catalysts that are not limited by the backbone geometries available in known protein structures.

    https://www.nature.com/articles/s41586-018-0509-0

  • [2018] An analysis and evaluation of the WeFold collaborative for protein structure prediction and its pipelines in CASP11 and CASP12 (Chen Keasar, et al)

    Every two years groups worldwide participate in the Critical Assessment of Protein Structure Prediction (CASP) experiment to blindly test the strengths and weaknesses of their computational methods. CASP has significantly advanced the field but many hurdles still remain, which may require new ideas and collaborations. In 2012 a web-based effort called WeFold was initiated to promote collaboration within the CASP community and attract researchers from other fields to contribute new ideas to CASP. Members of the WeFold coopetition (cooperation and competition) participated in CASP as individual teams, but also shared components of their methods to create hybrid pipelines and actively contributed to this effort. We assert that the scale and diversity of integrative prediction pipelines could not have been achieved by any individual lab or even by any collaboration among a few partners. The models contributed by the participating groups and generated by the pipelines are publicly available at the WeFold website providing a wealth of data that remains to be tapped. Here, we analyze the results of the 2014 and 2016 pipelines showing improvements according to the CASP assessment as well as areas that require further adjustments and research.

    https://www.nature.com/articles/s41598-018-26812-8

  • [2017] Comprehensive computational design of ordered peptide macrocycles (Parisa Hosseinzadeh, et al)

    Mixed-chirality peptide macrocycles such as cyclosporine are among the most potent therapeutics identified to date, but there is currently no way to systematically search the structural space spanned by such compounds. Natural proteins do not provide a useful guide: Peptide macrocycles lack regular secondary structures and hydrophobic cores, and can contain local structures not accessible with L-amino acids. Here, we enumerate the stable structures that can be adopted by macrocyclic peptides composed of L- and D-amino acids by near-exhaustive backbone sampling followed by sequence design and energy landscape calculations. We identify more than 200 designs predicted to fold into single stable structures, many times more than the number of currently available unbound peptide macrocycle structures. Nuclear magnetic resonance structures of 9 of 12 designed 7- to 10-residue macrocycles, and three 11- to 14-residue bicyclic designs, are close to the computational models. Our results provide a nearly complete coverage of the rich space of structures possible for short peptide macrocycles and vastly increase the available starting scaffolds for both rational drug design and library selection methods.

    https://science.sciencemag.org/content/358/6369/1461/

  • [2017] Evolution of a designed protein assembly encapsulating its own RNA genome (Gabriel L. Butterfield, et al)

    The challenges of evolution in a complex biochemical environment, coupling genotype to phenotype and protecting the genetic material, are solved elegantly in biological systems by the encapsulation of nucleic acids. In the simplest examples, viruses use capsids to surround their genomes. Although these naturally occurring systems have been modified to change their tropism and to display proteins or peptides, billions of years of evolution have favoured efficiency at the expense of modularity, making viral capsids difficult to engineer. Synthetic systems composed of non-viral proteins could provide a ‘blank slate’ to evolve desired properties for drug delivery and other biomedical applications, while avoiding the safety risks and engineering challenges associated with viruses. Here we create synthetic nucleocapsids, which are computationally designed icosahedral protein assemblies with positively charged inner surfaces that can package their own full-length mRNA genomes. We explore the ability of these nucleocapsids to evolve virus-like properties by generating diversified populations using Escherichia coli as an expression host. Several generations of evolution resulted in markedly improved genome packaging (more than 133-fold), stability in blood (from less than 3.7% to 71% of packaged RNA protected after 6 hours of treatment), and in vivo circulation time (from less than 5 minutes to approximately 4.5 hours). The resulting synthetic nucleocapsids package one full-length RNA genome for every 11 icosahedral assemblies, similar to the best recombinant adeno-associated virus vectors. Our results show that there are simple evolutionary paths through which protein assemblies can acquire virus-like genome packaging and protection. Considerable effort has been directed at ‘top-down’ modification of viruses to be safe and effective for drug delivery and vaccine applications; the ability to design synthetic nanomaterials computationally and to optimize them through evolution now enables a complementary ‘bottom-up’ approach with considerable advantages in programmability and control.

    https://www.nature.com/articles/nature25157/

  • [2017] Massively parallel de novo protein design for targeted therapeutics (Aaron Chevalier, et al)

    De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37–43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing.

    https://www.nature.com/articles/nature23912/

  • [2017] Computational design of environmental sensors for the potent opioid fentanyl (Matthew J Bick, et al)

    We describe the computational design of proteins that bind the potent analgesic fentanyl. Our approach employs a fast docking algorithm to find shape complementary ligand placement in protein scaffolds, followed by design of the surrounding residues to optimize binding affinity. Co-crystal structures of the highest affinity binder reveal a highly preorganized binding site, and an overall architecture and ligand placement in close agreement with the design model. We use the designs to generate plant sensors for fentanyl by coupling ligand binding to design stability. The method should be generally useful for detecting toxic hydrophobic compounds in the environment.

    https://elifesciences.org/articles/28909

  • [2017] Global analysis of protein folding using massively parallel design, synthesis, and testing (Gabriel J. Rocklin, et al)

    Proteins fold into unique native structures stabilized by thousands of weak interactions that collectively overcome the entropic cost of folding. Although these forces are “encoded” in the thousands of known protein structures, “decoding” them is challenging because of the complexity of natural proteins that have evolved for function, not stability. We combined computational protein design, next-generation gene synthesis, and a high-throughput protease susceptibility assay to measure folding and stability for more than 15,000 de novo designed miniproteins, 1000 natural proteins, 10,000 point mutants, and 30,000 negative control sequences. This analysis identified more than 2500 stable designed proteins in four basic folds—a number sufficient to enable us to systematically examine how sequence determines folding and stability in uncharted protein space. Iteration between design and experiment increased the design success rate from 6% to 47%, produced stable proteins unlike those found in nature for topologies where design was initially unsuccessful, and revealed subtle contributions to stability as designs became increasingly optimized. Our approach achieves the long-standing goal of a tight feedback cycle between computation and experiment and has the potential to transform computational protein design into a data-driven science.

    https://science.sciencemag.org/content/357/6347/168/

  • [2017] Protein structure determination using metagenome sequence data (Sergey Ovchinnikov, et al)

    Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact-based structure matching, and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the Protein Data Bank. This approach provides the representative models for large protein families originally envisioned as the goal of the Protein Structure Initiative at a fraction of the cost.

    https://science.sciencemag.org/content/355/6322/294/

  • [2016] The coming of age of de novo protein design (Po-Ssu Huang, et al)

    There are 20^200 possible amino-acid sequences for a 200-residue protein, of which the natural evolutionary process has sampled only an infinitesimal subset. De novo protein design explores the full sequence space, guided by the physical principles that underlie protein folding. Computational methodology has advanced to the point that a wide range of structures can be designed from scratch with atomic-level accuracy. Almost all protein engineering so far has involved the modification of naturally occurring proteins; it should now be possible to design new functional proteins from the ground up to tackle current challenges in biomedicine and nanotechnology.

    https://www.nature.com/articles/nature19946

  • [2016] Accurate design of megadalton-scale two-component icosahedral protein complexes (Jacob B. Bale, et al)

    Nature provides many examples of self- and co-assembling protein-based molecular machines, including icosahedral protein cages that serve as scaffolds, enzymes, and compartments for essential biochemical reactions and icosahedral virus capsids, which encapsidate and protect viral genomes and mediate entry into host cells. Inspired by these natural materials, we report the computational design and experimental characterization of co-assembling, two-component, 120-subunit icosahedral protein nanostructures with molecular weights (1.8 to 2.8 megadaltons) and dimensions (24 to 40 nanometers in diameter) comparable to those of small viral capsids. Electron microscopy, small-angle x-ray scattering, and x-ray crystallography show that 10 designs spanning three distinct icosahedral architectures form materials closely matching the design models. In vitro assembly of icosahedral complexes from independently purified components occurs rapidly, at rates comparable to those of viral capsids, and enables controlled packaging of molecular cargo through charge complementarity. The ability to design megadalton-scale materials with atomic-level accuracy and controllable assembly opens the door to a new generation of genetically programmable protein-based molecular machines.

    https://science.sciencemag.org/content/353/6297/389/

  • [2016] Design of a hyperstable 60-subunit protein icosahedron (Yang Hsia, et al)

    The icosahedron is the largest of the Platonic solids, and icosahedral protein structures are widely used in biological systems for packaging and transport. There has been considerable interest in repurposing such structures for applications ranging from targeted delivery to multivalent immunogen presentation. The ability to design proteins that self-assemble into precisely specified, highly ordered icosahedral structures would open the door to a new generation of protein containers with properties custom-tailored to specific applications. Here we describe the computational design of a 25-nanometre icosahedral nanocage that self-assembles from trimeric protein building blocks. The designed protein was produced in Escherichia coli, and found by electron microscopy to assemble into a homogenous population of icosahedral particles nearly identical to the design model. The particles are stable in 6.7 molar guanidine hydrochloride at up to 80 degrees Celsius, and undergo extremely abrupt, but reversible, disassembly between 2 molar and 2.25 molar guanidinium thiocyanate. The icosahedron is robust to genetic fusions: one or two copies of green fluorescent protein (GFP) can be fused to each of the 60 subunits to create highly fluorescent 'standard candles' for use in light microscopy, and a designed protein pentamer can be placed in the centre of each of the 20 pentameric faces to modulate the size of the entrance/exit channels of the cage. Such robust and customizable nanocages should have considerable utility in targeted drug delivery, vaccine design and synthetic biology.

    https://www.nature.com/articles/nature18010

  • [2016] De novo design of protein homo-oligomers with modular hydrogen-bond network–mediated specificity (Scott E. Boyken, et al)

    In nature, structural specificity in DNA and proteins is encoded differently: In DNA, specificity arises from modular hydrogen bonds in the core of the double helix, whereas in proteins, specificity arises largely from buried hydrophobic packing complemented by irregular peripheral polar interactions. Here, we describe a general approach for designing a wide range of protein homo-oligomers with specificity determined by modular arrays of central hydrogen-bond networks. We use the approach to design dimers, trimers, and tetramers consisting of two concentric rings of helices, including previously not seen triangular, square, and supercoiled topologies. X-ray crystallography confirms that the structures overall, and the hydrogen-bond networks in particular, are nearly identical to the design models, and the networks confer interaction specificity in vivo. The ability to design extensive hydrogen-bond networks with atomic accuracy enables the programming of protein interaction specificity for a broad range of synthetic biology applications; more generally, our results demonstrate that, even with the tremendous diversity observed in nature, there are fundamentally new modes of interaction to be discovered in proteins.

    https://science.sciencemag.org/content/352/6286/680/

TN-Grid

  • A Computing System for Discovering Causal Relationships among Human Genes to Improve Drug Repositioning

    The automatic discovery of causal relationships among human genes can shed light on gene regulatory processes and guide drug repositioning. To this end, a computationally-heavy method for causal discovery is distributed on a volunteer computing grid and, taking advantage of variable subsetting and stratification, proves to be useful for expanding local gene regulatory networks. The input data are purely observational measures of transcripts expression in human tissues and cell lines collected within the FANTOM project. The system relies on the BOINC platform and on optimized client code. The functional relevance of results, measured by analyzing the annotations of the identified interactions, increases significantly over the simple Pearson correlation between the transcripts. Additionally, in 82% of cases networks significantly overlap with known protein-protein interactions annotated in biological databases. In the two case studies presented, this approach has been used to expand the networks of genes associated with two severe human pathologies: prostate cancer and coronary artery disease. The method identified respectively 22 and 36 genes to be evaluated as novel targets for already approved drugs, demonstrating the effective applicability of the approach in pipelines aimed at drug repositioning.

    https://ieeexplore.ieee.org/document/9224179

  • Vitis OneGenE: A Causality-Based Approach to Generate Gene Networks in Vitis vinifera Sheds Light on the Laccase and Dirigent Gene Families

    The abundance of transcriptomic data and the development of causal inference methods have paved the way for gene network analyses in grapevine. Vitis OneGenE is a transcriptomic data mining tool that finds direct correlations between genes, thus producing association networks. As a proof of concept, the stilbene synthase gene regulatory network obtained with OneGenE has been compared with published co-expression analysis and experimental data, including cistrome data for MYB stilbenoid regulators. As a case study, the two secondary metabolism pathways of stilbenoids and lignin synthesis were explored. Several isoforms of laccase, peroxidase, and dirigent protein genes, putatively involved in the final oxidative oligomerization steps, were identified as specifically belonging to either one of these pathways. Manual curation of the predicted sequences exploiting the last available genome assembly, and the integration of phylogenetic and OneGenE analyses, identified a group of laccases exclusively present in grapevine and related to stilbenoids. Here we show how network analysis by OneGenE can accelerate knowledge discovery by suggesting new candidates for functional characterization and application in breeding programs.

    https://www.mdpi.com/2218-273X/11/12/1744

Universe@home

  • [2019] Populations of stellar mass Black holes from binary systems (Grzegorz Wiktorowicz, et al)

    In large and complicated stellar systems like galaxies it is difficult to predict the number and characteristics of a black hole population. Such populations may be modelled as an aggregation of homogeneous (i.e. having uniform star formation history and the same initial chemical composition) stellar populations. Using realistic evolutionary models we predict the abundances and properties of black holes formed from binaries in these environments. We show that the black hole population will be dominated by single black holes originating from binary disruptions and stellar mergers. Furthermore, we discuss how black hole populations are influenced by such factors as initial parameters, metallicity, initial mass function, and natal kick models. As an example application of our results, we estimate that about 26 microlensing events happen every year in the direction of the Galactic Bulge due to black holes in a survey like OGLE-IV. Our results may be used to perform in-depth studies related to realistic black hole populations, e.g. observational predictions for space survey missions like Gaia, or Einstein Probe. We prepared a publicly available database with the raw data from our simulations to be used for more in-depth studies.

    https://arxiv.org/abs/1907.11431

  • [2019] Merger of compact stars in the two-families scenario (Roberto De Pietri, et al)

    We analyse the phenomenological implications of the two-families scenario on the merger of compact stars. That scenario is based on the coexistence of both hadronic stars and strange quark stars. After discussing the classification of the possible mergers, we turn to detailed numerical simulations of the merger of two hadronic stars, i.e., "first family" stars in which delta resonances and hyperons are present, and we show results for the threshold mass of such binaries, for the mass dynamically ejected and the mass of the disk surrounding the post-merger object. We compare these results with those obtained within the one-family scenario and we conclude that relevant signatures of the two-families scenario can be suggested, in particular: the possibility of a rapid collapse to a black hole for masses even smaller than the ones associated to GW170817; during the first milliseconds, oscillations of the postmerger remnant at frequencies higher than the ones obtained in the one-family scenario; a large value of the mass dynamically ejected and a small mass of the disk, for binaries of low total mass. Finally, based on a population synthesis analysis, we present estimates of the number of mergers for: two hadronic stars; hadronic star - strange quark star; two strange quark stars. We show that for unequal mass systems and intermediate values of the total mass, the merger of a hadronic star and a strange quark star is very likely (GW170817 has a possible interpretation into this category of mergers). On the other hand, mergers of two strange quark stars are strongly suppressed.

    https://arxiv.org/abs/1904.01545

  • [2019] The observed vs total population of ULXs (Grzegorz Wiktorowicz, et al)

    We have analyzed how anisotropic emission of radiation affects the observed sample of ultraluminous X-ray sources (ULXs) by performing simulations of the evolution of stellar populations, employing recent developments in stellar and binary physics, and by utilizing a geometrical beaming model motivated by theory and observation. Whilst ULXs harboring black hole accretors (BH ULXs) are typically emitting isotropically, the majority of ULXs with neutron star accretors (NS ULXs) are found to be beamed. These findings confirm previous assertions that a significant fraction of ULXs are hidden from view due to a substantial misalignment of the emission beam and the line-of-sight. We find the total number of NS ULXs in regions with constant star formation, solar metallicity, and ages above ~1 Gyr to be higher than the BH ULXs, although observationally both populations are comparable. For lower metallicities, BH ULXs dominate both the total and observed ULX populations. As far as burst star-formation is concerned, young ULX populations are dominated by BH ULXs, but this changes as the population ages and, post star-formation, NS ULXs dominate both the observed and total population of ULXs. We also compare our simulation output to a previous analytical prediction for the relative ratio of BH to NS ULXs in idealized flux-limited observations and find broad agreement for all but the lowest metallicities. In so doing we find that in such surveys the observed ULX population should be heavily dominated by black-hole systems rather than by systems containing neutron stars.

    https://arxiv.org/abs/1811.08998

World Community Grid

  • [2019] Revealing solvent-dependent folding behavior of mycolic acids from Mycobacterium tuberculosis by advanced simulation analysis (Wilma Groenewald, et al)

    Mycobacterium tuberculosis remains a persistent pathogen, partly due to its lipid rich cell wall, of which mycolic acids (MAs) are a major component. The fluidity and conformational flexibilities of different MAs in the bacterial cell wall significantly influence its properties, function, and observed pathogenicity; thus, a proper conformational description of different MAs in different environments (e.g., in vacuum, in solution, in monolayers) can inform about their potential role in the complex setup of the bacterial cell wall. Previously, we have shown that molecular dynamics (MD) simulations of MA folding in vacuo can be used to characterize MA conformers in seven groupings relating to bending at the functional groups (W, U and Z-conformations). Providing a new OPLS-based forcefield parameterization for the critical cyclopropyl group of MAs and extensive simulations in explicit solvents (TIP4P water, hexane), we now present a more complete picture of MA folding properties together with improved simulation analysis techniques. We show that the ‘WUZ’ distance-based analysis can be used to pinpoint conformers with hairpin bends at the functional groups, with these conformers constituting only a fraction of accessible conformations. Applying principal component analysis (PCA) and refinement using free energy landscapes (FELs), we are able to discriminate a complete and unique set of conformational preferences for representative alpha-, methoxy- and keto-MAs, with overall preference for folded conformations. A control backbone-MA without any mero-chain functional groups showed significantly less folding in the mero-chain, confirming the role of functionalization in directing folding. Keto-MA showed the highest percentage of WUZ-type conformations and, in particular, a tendency to fold at its alpha-methyl trans-cyclopropane group, in agreement with results from Villeneuve et al. MAs demonstrate similar folding in vacuum and water, with a majority of folded conformations around the W-conformation, although the molecules are more flexible in vacuum than in water. Exchange between conformations, with a disperse distribution that includes unfolded conformers, is common in hexane for all MAs, although with more organization for Keto-MA. Globular, folded conformations are newly defined and may be specifically relevant in biofilms.

    https://link.springer.com/article/10.1007/s00894-019-3943-5

  • [2019] Massive-Scale Binding Free Energy Simulations of HIV Integrase Complexes Using Asynchronous Replica Exchange Framework Implemented on the IBM WCG Distributed Network (Junchao Xia, et al)

    To perform massive-scale replica exchange molecular dynamics (REMD) simulations for calculating binding free energies of protein–ligand complexes, we implemented the asynchronous replica exchange (AsyncRE) framework of the binding energy distribution analysis method (BEDAM) in implicit solvent on the IBM World Community Grid (WCG) and optimized the simulation parameters to reduce the overhead and improve the prediction power of the WCG AsyncRE simulations. We also performed the first massive-scale binding free energy calculations using the WCG distributed computing grid and 301 ligands from the SAMPL4 challenge for large-scale binding free energy predictions of HIV-1 integrase complexes. In total there are ~10,000 simulated complexes, ~1 million replicas, and ~2000 μs of aggregated MD simulations. Running AsyncRE MD simulations on the WCG requires accepting a trade-off between the number of replicas that can be run (breadth) and the number of full RE cycles that can be completed per replica (depth). As compared with synchronous Replica Exchange (SyncRE) running on tightly coupled clusters like XSEDE, on the WCG many more replicas can be launched simultaneously on heterogeneous distributed hardware, but each full RE cycle requires more overhead. We compared the WCG results with those from AutoDock and more advanced RE simulations including the use of flattening potentials to accelerate sampling of selected degrees of freedom of ligands and/or receptors related to slow dynamics due to high energy barriers. We propose a suitable strategy of RE simulations to refine high throughput docking results which can be matched to corresponding computing resources: from HPC clusters, to small or medium-size distributed campus grids, and finally to massive-scale computing networks including millions of CPUs like the resources available on the WCG.

    https://pubs.acs.org/doi/full/10.1021/acs.jcim.8b00817

  • [2019] A diarylamine derived from anthranilic acid inhibits ZIKV replication (Suely Silva, et al)

    Zika virus (ZIKV) is a mosquito-transmitted Flavivirus, originally identified in Uganda in 1947 and recently associated with a large outbreak in South America. Despite extensive efforts there are currently no approved antiviral compounds for treatment of ZIKV infection. Here we describe the antiviral activity of diarylamines derived from anthranilic acid (FAMs) against ZIKV. A synthetic FAM (E3) demonstrated anti-ZIKV potential by reducing viral replication up to 86%. We analyzed the possible mechanisms of action of FAM E3 by evaluating the intercalation of this compound into the viral dsRNA and its interaction with the RNA polymerase of bacteriophage SP6. However, FAM E3 did not act by these mechanisms. In silico results predicted that FAM E3 might bind to the ZIKV NS3 helicase suggesting that this protein could be one possible target of this compound. To test this, the thermal stability and the ATPase activity of the ZIKV NS3 helicase domain (NS3Hel) were investigated in vitro and we demonstrated that FAM E3 could indeed bind to and stabilize NS3Hel.

    https://www.nature.com/articles/s41598-019-54169-z

  • [2019] Structure-Based Function Prediction using Graph Convolutional Networks (Vladimir Gligorijevic, et al)

    Recent massive increases in the number of sequences available in public databases challenge current experimental approaches to determining protein function. These methods are limited by both the large scale of these sequence databases and the diversity of protein functions. We present a deep learning Graph Convolutional Network (GCN) trained on sequence and structural data and evaluate it on ~40k proteins with known structures and functions from the Protein Data Bank (PDB). Our GCN predicts functions more accurately than Convolutional Neural Networks trained on sequence data alone and competing methods. Feature extraction via a language model removes the need for constructing multiple sequence alignments or feature engineering. Our model learns general structure-function relationships by robustly predicting functions of proteins with ≤ 30% sequence identity to the training set. Using class activation mapping, we can automatically identify structural regions at the residue-level that lead to each function prediction for every protein confidently predicted, advancing site-specific function prediction. De-noising inherent in the trained model allows only a minor drop in performance when structure predictions are used, including multiple de novo protocols. We use our method to annotate all proteins in the PDB, making several new confident function predictions spanning both fold and function trees.

    https://www.biorxiv.org/content/10.1101/786236v1

  • [2018] Hidden partners: Using cross-docking calculations to predict binding sites for proteins with multiple interactions (Nathalie Lagarde, et al)

    Protein-protein interactions control a large range of biological processes and their identification is essential to understand the underlying biological mechanisms. To complement experimental approaches, in silico methods are available to investigate protein-protein interactions. Cross-docking methods, in particular, can be used to predict protein binding sites. However, proteins can interact with numerous partners and can present multiple binding sites on their surface, which may alter the binding site prediction quality. We evaluate the binding site predictions obtained using complete cross-docking simulations of 358 proteins with 2 different scoring schemes accounting for multiple binding sites. Despite overall good binding site prediction performances, 68 cases were still associated with very low prediction quality, presenting individual area under the specificity-sensitivity ROC curve (AUC) values below the random AUC threshold of 0.5, since cross-docking calculations can lead to the identification of alternate protein binding sites (that are different from the reference experimental sites). For the large majority of these proteins, we show that the predicted alternate binding sites correspond to interaction sites with hidden partners, that is, partners not included in the original cross-docking dataset. Among those new partners, we find proteins, but also nucleic acid molecules. Finally, for proteins with multiple binding sites on their surface, we investigated the structural determinants associated with the binding sites the most targeted by the docking partners.

    https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.25506

  • [2018] The A–Z of Zika drug discovery (Melina Mottin, et al)

    Docking is commonly applied to drug design efforts, especially high-throughput virtual screenings of small molecules, to identify new compounds that bind to a given target. Despite great advances and successful applications in recent years, a number of issues remain unsolved. Most of the challenges and problems faced when running docking experiments are independent of the specific software used, and can be ascribed to either improper input preparation or to the simplified approaches applied to achieve high-throughput speed. Being aware of approximations and limitations of such methods is essential to prevent errors, deal with misleading results, and increase the success rate of virtual screening campaigns. In this review, best practices and most common issues of docking and virtual screening will be discussed, covering the journey from the design of the virtual experiment to the hit identification.

    https://www.sciencedirect.com/science/article/abs/pii/S1359644618300412?via%3Dihub

  • [2018] Computational drug discovery for the Zika virus (Melina Mottin, et al)

    Few Zika virus (ZIKV) outbreaks had been reported since its first detection in 1947, until the recent epidemics occurred in South America (2014/2015) and expeditiously became a global public health emergency. This arbovirus reached 0.5-1.3 million cases of ZIKV infection in Brazil in 2015 and rapidly spread to new geographic areas such as the Americas. Despite the mild symptoms of the Zika fever, the major concern is the associated severe neurological disorders, especially microcephaly in newborns. Advances in ZIKV drug discovery have been made recently and constitute promising approaches to ZIKV treatment. In this review, we summarize current computational drug discovery efforts and their applicability to discovery of anti-ZIKV drugs. Lastly, we present successful examples of the use of computational approaches to ZIKV drug discovery.

    https://doi.org/10.1590/s2175-97902018000001002

  • [2018] Novel Intersubunit Interaction Critical for HIV-1 Core Assembly Defines a Potentially Targetable Inhibitor Binding Pocket (Pierrick Craveur, et al)

    HIV-1 capsid protein (CA) plays critical roles in both early and late stages of the viral replication cycle. Mutagenesis and structural experiments have revealed that capsid core stability significantly affects uncoating and initiation of reverse transcription in host cells. This has led to efforts in developing antivirals targeting CA and its assembly, although none of the currently identified compounds are used in the clinic for treatment of HIV infection. A specific interaction that is primarily present in pentameric interfaces in the HIV-1 capsid core was identified and is reported to be important for CA assembly. This is shown by multidisciplinary characterization of CA site-directed mutants using biochemical analysis of virus-like particle formation, transmission electron microscopy of in vitro assembly, crystallographic studies, and molecular dynamic simulations. The data are consistent with a model where a hydrogen bond between CA residues E28 and K30' from neighboring N-terminal domains (CANTDs) is important for CA pentamer interactions during core assembly. This pentamer-preferred interaction forms part of an N-terminal domain interface (NDI) pocket that is amenable to antiviral targeting.

    https://mbio.asm.org/content/10/2/e02858-18

  • [2017] Molecular dynamics simulations of Zika virus NS3 helicase: Insights into RNA binding site activity (Melina Mottin, et al)

    America is still suffering with the outbreak of Zika virus (ZIKV) infection. Congenital ZIKV syndrome has already caused a public health emergency of international concern. However, there are still no vaccines to prevent or drugs to treat the infection caused by ZIKV. The ZIKV NS3 helicase (NS3h) protein is a promising target for drug discovery due to its essential role in viral genome replication. NS3h unwinds the viral RNA to enable the replication of the viral genome by the NS5 protein. NS3h contains two important binding sites: the NTPase binding site and the RNA binding site. Here, we used molecular dynamics (MD) simulations to study the molecular behavior of ZIKV NS3h in the presence and absence of ssRNA and the potential implications for NS3h activity and inhibition. Although there is conformational variability and poor electron densities of the RNA binding loop in various apo flaviviruses NS3h crystallographic structures, the MD trajectories of NS3h-ssRNA demonstrated that the RNA binding loop becomes more stable when NS3h is occupied by RNA. Our results suggest that the presence of RNA generates important interactions with the RNA binding loop, and these interactions stabilize the loop sufficiently that it remains in a closed conformation. This closed conformation likely keeps the ssRNA bound to the protein for a sufficient duration to enable the unwinding/replication activities of NS3h to occur. In addition, conformational changes of this RNA binding loop can change the nature and location of the optimal ligand binding site, according to ligand binding site prediction results. These are important findings to help guide the design and discovery of new inhibitors of NS3h as promising compounds to treat the ZIKV infection.

    https://www.sciencedirect.com/science/article/abs/pii/S0006291X1730534X?via%3Dihub

  • [2016] Effects of novel small compounds targeting TrkB on neuronal cell survival and depression-like behavior (Mayu Fukuda, et al)

    Brain-derived neurotrophic factor (BDNF) and its high affinity receptor tyrosine kinase receptor B (TrkB) are involved in neuronal survival, maintenance, differentiation and synaptic plasticity. Deficiency of BDNF was reported to be associated with psychological disorders such as depression. Hence, we examined the proliferative effect of 11 candidate TrkB agonistic compounds in TrkB-expressing SH-SY5Y cells, on the hypothesis that some candidate compounds identified in our previous in silico screening for a small molecule targeting the BDNF binding domain of TrkB should activate TrkB signaling. In the present study, two promising compounds, 48 and 56, were identified and subsequently assessed for their ability to induce TrkB phosphorylation in vitro and in vivo. As with BDNF, the compound-mediated TrkB phosphorylation was blocked by the Trk inhibitor K252a. Since BDNF-TrkB signaling deficiency is associated with the pathogenesis of depression and reactivation of this signaling by antidepressants is a cause of recovery from the pathogenic state, the compounds were assessed in the forced swim test, a mouse model of depression. We found that compound 48 significantly reduced mouse immobility time compared with the control vehicle injection, supporting the hypothesized antidepressant-like efficacy of compound 48 in vivo. Thus, our present study demonstrated that compound 48, selected through in silico screening, is a novel activator of TrkB signaling and a potential antidepressant molecule.

    https://www.sciencedirect.com/science/article/abs/pii/S0197018616300869

  • [2016] Illustrating and homology modeling the proteins of the Zika virus (Sean Ekins, et al)

    The Zika virus (ZIKV) is a flavivirus of the family Flaviviridae, which is similar to dengue virus, yellow fever and West Nile virus. Recent outbreaks in South America, Latin America, the Caribbean and in particular Brazil have led to concern for the spread of the disease and potential to cause Guillain-Barré syndrome and microcephaly. Although ZIKV has been known of for over 60 years there is very little in the way of knowledge of the virus with few publications and no crystal structures. No antivirals have been tested against it either in vitro or in vivo. ZIKV therefore epitomizes a neglected disease. Several steps have been proposed which could be taken to initiate ZIKV antiviral drug discovery using both high throughput screens as well as structure-based design based on homology models for the key proteins. We now describe preliminary homology models created for NS5, FtsJ, NS4B, NS4A, HELICc, DEXDc, peptidase S7, NS2B, NS2A, NS1, E stem, glycoprotein M, propeptide, capsid and glycoprotein E using SWISS-MODEL. Eleven out of 15 models pass our model quality criteria for their further use. While a ZIKV glycoprotein E homology model was initially described in the immature conformation as a trimer, we now describe the mature dimer conformer which allowed the construction of an illustration of the complete virion. By comparing illustrations of ZIKV based on this new homology model and the dengue virus crystal structure we propose potential differences that could be exploited for antiviral and vaccine design. The prediction of sites for glycosylation on this protein may also be useful in this regard. While we await a cryo-EM structure of ZIKV and eventual crystal structures of the individual proteins, these homology models provide the community with a starting point for structure-based design of drugs and vaccines as well as for computational virtual screening.

    https://f1000research.com/articles/5-275/v2