Introduction:

A growing federation of scientists shares the goal of generating chemical modulators for all druggable human proteins by the year 2035. The rationale for this initiative, known as Target 2035, is that selective chemical probes are precious tools for studying functional genomics and revealing novel targets for unmet medical needs. Artificial intelligence (AI) may be the accelerator needed to reach this ambitious goal, but it can only fulfill this promise if trained on datasets that are large, reliable, and machine interpretable. The Structural Genomics Consortium (SGC) is a multinational, open-science public-private partnership invested in the goal of Target 2035.

Data Management Best Practices:

Planning early for data science applications is essential, starting with adherence to the FAIR principles (Findability, Accessibility, Interoperability, and Reusability). Data should be generated, processed, and curated explicitly with computational knowledge discovery in mind, and ontologies and vocabularies should be defined precisely so that the data are machine interpretable.
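As a minimal sketch of what a precisely defined vocabulary can look like in practice, the example below validates an assay record against a small controlled vocabulary before converting it to a common unit. The field names, the vocabulary terms, and the `AssayRecord` class are all hypothetical and chosen for illustration only; a real project would draw its terms from a shared ontology.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical controlled vocabularies, hard-coded here for illustration.
ASSAY_TYPES = {"binding", "functional", "thermal_shift"}
UNIT_TO_NM = {"nM": 1.0, "uM": 1e3, "mM": 1e6}


@dataclass(frozen=True)
class AssayRecord:
    compound_id: str
    target_id: str
    assay_type: str
    value: float
    unit: str


def validate(record: AssayRecord) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    if record.assay_type not in ASSAY_TYPES:
        problems.append(f"unknown assay_type: {record.assay_type!r}")
    if record.unit not in UNIT_TO_NM:
        problems.append(f"unknown unit: {record.unit!r}")
    if record.value <= 0:
        problems.append("value must be positive")
    return problems


def to_nanomolar(record: AssayRecord) -> float:
    """Convert a validated concentration to nanomolar, or raise ValueError."""
    problems = validate(record)
    if problems:
        raise ValueError("; ".join(problems))
    return record.value * UNIT_TO_NM[record.unit]
```

Rejecting free-text variants such as "Binding" or "micromolar" at the point of entry keeps every downstream consumer from having to guess what a field means.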

A centralized database architecture is recommended because it facilitates the generation of global datasets with greater impact. Shared data management tools, together with integrated electronic laboratory notebook (ELN) and laboratory information management system (LIMS) solutions, can improve data management workflows and support collaboration within project teams.
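One way to picture a centralized store is a single schema that all projects write to, so that a global dataset becomes one query rather than a merge of spreadsheets. The sketch below uses Python's built-in `sqlite3` module; the table names, column names, and the `create_store` and `global_dataset` helpers are assumptions made for illustration and do not describe any particular ELN or LIMS product.

```python
import sqlite3


def create_store(path=":memory:"):
    """Create a toy central store shared by all projects (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE compound (
            compound_id TEXT PRIMARY KEY,
            smiles      TEXT NOT NULL
        );
        CREATE TABLE measurement (
            measurement_id INTEGER PRIMARY KEY,
            compound_id    TEXT NOT NULL REFERENCES compound(compound_id),
            project        TEXT NOT NULL,
            assay          TEXT NOT NULL,
            value_nm       REAL NOT NULL
        );
        """
    )
    return conn


def global_dataset(conn, assay):
    """Collect every measurement for one assay type, across all projects."""
    cursor = conn.execute(
        "SELECT c.smiles, m.project, m.value_nm "
        "FROM measurement m JOIN compound c USING (compound_id) "
        "WHERE m.assay = ? ORDER BY m.measurement_id",
        (assay,),
    )
    return cursor.fetchall()
```

Because every project records against the same compound table, the cross-project query needs no per-project reconciliation step.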

Lab automation, and in particular the integration of AI-operated laboratories, can push the boundaries of data recording and generate large, rich, and reliable training sets for building AI models. Opening ELNs to data mining can further streamline data management processes.

Data Science Applications in Early-Stage Drug Discovery:

Data science can make the drug discovery process more efficient by analyzing large experimental datasets to identify potential bioactive compounds in screening data. Machine learning models trained on DNA-encoded library (DEL) or affinity selection mass spectrometry (ASMS) data can be used to expand hits in virtual screens, predict the pharmacokinetic and pharmacodynamic properties of drug candidates, and optimize compound potency and solubility.

Combining correlated data types and choosing the most relevant data representation are crucial steps in improving the predictive power of AI models. Defining appropriate training and test sets and estimating prediction uncertainty can further improve model performance and reliability.
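The two ideas in the preceding paragraph, careful train/test splitting and uncertainty estimation, can be sketched with the standard library alone. The example assumes each record carries a label for its chemical series (the `group_key` field is hypothetical) and keeps whole series on one side of the split, so the test set measures performance on unseen chemistry. It then uses the spread of an ensemble of models as a simple stand-in for prediction uncertainty; this is one common heuristic, not a method prescribed by the source.

```python
import random
import statistics
from collections import defaultdict


def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)
    names = sorted(groups)
    random.Random(seed).shuffle(names)

    target = test_fraction * len(records)
    train, test = [], []
    for name in names:
        # Fill the test set with whole groups until it reaches the target size.
        bucket = test if len(test) < target else train
        bucket.extend(groups[name])
    return train, test


def ensemble_prediction(models, x):
    """Return (mean, standard deviation) of the predictions of several models."""
    preds = [model(x) for model in models]
    return statistics.mean(preds), statistics.pstdev(preds)
```

A large standard deviation across ensemble members signals that the models disagree on a compound, which is a reason to treat that prediction with caution.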

Designing virtuous design-make-test-analyze (DMTA) cycles and ensuring that AI predictions scale to virtual screening are essential for optimizing the drug discovery process. By establishing dynamic feedback loops around predictive models, researchers can prioritize the most informative data points for experimental testing and refine the models iteratively.
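The feedback loop described above can be sketched as a simple uncertainty-driven selection loop. Everything here is an illustrative assumption: candidates are plain numbers rather than molecules, `run_assay` stands in for the experimental step, and `fit_nearest` is a toy model whose uncertainty is just the distance to the nearest measured point. Uncertainty sampling is only one of several possible strategies for choosing informative data points.

```python
def fit_nearest(labelled):
    """Toy model: predict the value of the nearest measured point.

    The distance to that point serves as a crude uncertainty estimate.
    """
    known = dict(labelled)

    def predict(x):
        nearest = min(known, key=lambda k: abs(k - x))
        return known[nearest], abs(nearest - x)

    return predict


def select_batch(pool, predict, batch_size):
    """Pick the candidates whose predictions are most uncertain."""
    ranked = sorted(pool, key=lambda c: predict(c)[1], reverse=True)
    return ranked[:batch_size]


def dmta_cycle(candidates, labelled, fit, run_assay, batch_size, rounds):
    """Iterate: refit the model, choose a batch, measure it, add the results."""
    pool = list(candidates)
    labelled = dict(labelled)  # do not mutate the caller's data
    for _ in range(rounds):
        if not pool:
            break
        predict = fit(labelled)                          # analyze and design
        for c in select_batch(pool, predict, batch_size):
            labelled[c] = run_assay(c)                   # make and test
            pool.remove(c)
    return labelled
```

In this toy setting, each round measures the candidate farthest from anything already measured, so the labelled set spreads across the candidate space instead of clustering around known points.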

Conclusion:

Data science plays a pivotal role in early-stage drug discovery, offering insights and opportunities for optimizing research operations. By adopting best practices in data management and leveraging AI technologies, research institutions can improve their ability to identify promising drug candidates and accelerate the drug discovery process. Embracing a data-driven mindset and fostering collaboration between domain experts and data scientists can lead to significant advances in open-science drug discovery initiatives such as Target 2035.