Small molecules, big data
Critical drug discovery research demands comprehensive data. GOSTAR™ Small Molecules delivers it, with structure-activity data on over 10.6 million drug compounds. Our carefully curated data is manually standardized and quality-checked in a three-tiered process conducted by medicinal chemists.
See why almost 15 of the top 20 pharmaceutical companies consider GOSTAR™ an essential tool for their drug discovery research.

The largest small molecule SAR database
With over 10.6 million chemical structures, 35 million bioactivities, and 79,000 targets, GOSTAR™ Small Molecules provides you with the coverage of the small molecule space necessary to understand what compounds came before you and plan which will come next.
The harmonized data sets on GOSTAR™ are curated to contain the data most relevant to pharmaceutical drug design and discovery. They provide comprehensive information on small molecules, including their biological activities and physicochemical properties. You’ll be confident in your findings thanks to GOSTAR™’s extensive coverage and our three-tiered, QMS-ISO-9001:2015 certified quality control process.
Streamline drug discovery with GOSTAR™ Small Molecules’ comprehensive SAR and pharmacological data
Discover how GOSTAR™ Small Molecules is useful for
- AI, informaticians and data scientists
- Medicinal chemists
- Drug development executives and strategic professionals
- All drug development and discovery professionals
AI, informaticians and data scientists
Your AI/ML models, algorithms, and AI training sets can only be as good as their inputs. GOSTAR™ Small Molecules provides you with the data you need to generate highly optimized compounds.
The scale you need
We can provide you with our entire database – all 35 million rows – as a single file. The GOSTAR™ Small Molecules database is available in a number of convenient formats, including flat files, hierarchical files, semantic format, Oracle databases, MySQL™ databases, and more. You can also pull from our database using an API.
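As a rough illustration of working with a flat-file delivery, the sketch below streams a delimited file row by row and keeps only the records of interest, so the full file never has to sit in memory. The column names (compound_id, target, activity_type, value_nm), the sample rows, and the filter_activities helper are all hypothetical placeholders, not the actual GOSTAR™ schema or API.

```python
import csv
import io
from typing import Iterable, Iterator

# Hypothetical sample rows and column names, for illustration only.
# A real export defines its own schema.
SAMPLE_EXPORT = """compound_id,target,activity_type,value_nm
CPD-1,TARGET_A,IC50,12.5
CPD-2,TARGET_A,IC50,850
CPD-3,TARGET_B,IC50,40
CPD-4,TARGET_A,Ki,3.2
"""


def filter_activities(
    lines: Iterable[str],
    target: str,
    activity_type: str,
    max_value_nm: float,
) -> Iterator[dict]:
    """Stream rows from a delimited flat file, yielding only the matches."""
    for row in csv.DictReader(lines):
        if (
            row["target"] == target
            and row["activity_type"] == activity_type
            and float(row["value_nm"]) <= max_value_nm
        ):
            yield row


# For a file on disk, pass an open file handle instead of io.StringIO.
matches = list(
    filter_activities(io.StringIO(SAMPLE_EXPORT), "TARGET_A", "IC50", 100.0)
)
print([row["compound_id"] for row in matches])
```

Because the function is a generator over any iterable of lines, the same code works on an in-memory sample or on an open file handle for a very large export.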
Minimal processing
We’ve done the hard work of cleaning and standardizing the data so you don’t have to. GOSTAR™ gets you as close as possible to data that is usable and directly importable out of the box.
Extreme accuracy
Our three-tiered, QMS-ISO-9001:2015 certified quality control process results in industry-best accuracy. Want to check for yourself? Go for it. Our data is fully referenced.
Medicinal chemists
We’re medicinal chemists, too. GOSTAR™ Small Molecules is the SAR database that understands your needs and gives you the data and tools to succeed in drug discovery.
A clear and comprehensive view
Distill from over 10.6 million compounds to the specific subset most relevant to you in seconds. GOSTAR™’s next-generation feature set enables you to interrogate the desired chemical space much more quickly and easily with our user-friendly interface.
The most relevant data
GOSTAR™ is the only SAR database that is designed by medicinal chemists, for medicinal chemists. Our harmonized data sets are curated to contain the most relevant data for pharmaceutical design and discovery.
Quality you can rely on
Be confident in your results. GOSTAR™ is the only fully manually curated small molecule SAR database, and it is subject to a three-tiered, QMS-ISO-9001:2015 certified quality control process.
Drug development executives and strategic professionals
Know how your pharmaceutical areas of interest are evolving. Get the most recent data so you can make the most timely decisions. Drug discovery is a race. Win it with GOSTAR™ Small Molecules.
The clearest view of the patent landscape
Patent search? We’ve searched 4 million of them. GOSTAR™ Small Molecules gives you the most comprehensive view of the available chemical space. You can dive into the source with our fully referenced database.
Updates as fast as you need them
For critical competitive spaces where weeks or even days matter, GOSTAR™ Small Molecules is here. With the most frequently updated database and the option of special curation in under 48 hours from when a source is published, you can be sure your information is the most up to date available.
User-friendly graphical interface
With powerful, intuitive search, helpful tools, flexible exports, and more, GOSTAR™ makes it easy to find the data you need quickly and explore chemical spaces of interest meaningfully.
All drug development and discovery professionals
GOSTAR™ Small Molecules has the breadth and quality of activity, affinity, ADME, toxicology, physicochemical, and other data relevant to pharmaceutical design and discovery needed to empower your drug development programs.
Scale + accuracy
When you choose GOSTAR™ Small Molecules, you are getting more than just data on 10.6 million biologically active compounds. You are getting confidence that the data you are working with faithfully represents the pharmaceutical body of knowledge. Our three-tiered, QMS-ISO-9001:2015 certified quality control process provides industry-leading accuracy.
The data you need, how you need it
Need data in bulk to fuel AI/ML drug discovery algorithms? Want to easily explore chemical spaces, and export your findings as desired? Need a rapid understanding of whatever chemotypes you desire, with whatever data you need, delivered to you on demand? GOSTAR™ can accommodate, with a range of options for accessing our data and the option of custom curation with a rapid turnaround.
Fully Traceable
We understand that critical decisions require absolute certainty in your data. That’s why every data point in GOSTAR™ Small Molecules is fully referenced. Need to check the source? We make it easy.
The data you need, whenever you need it, however you need it
Flat file or web API | Get all 35 million rows or any subset | Periodic or on-demand updates

Intuitive search | Segmented data analysis | User-friendly SAR tool | Report generation | Flexible export options

Any data publicly available | Uses GOSTAR™’s proven data model | Even more accurate with four-tiered QC | Stay up-to-date: < 48 hour turnaround time on new data

Experience the future of small molecule discovery – Explore GOSTAR™ Small Molecules today!
What our customers say
Drug discovery is a complex, time-consuming process that entails multi-parameter optimization of molecular properties. GOSTAR™’s curated drug discovery data is a key building block for our proprietary, generative AI technology, Enki. Enki performs multi-parameter optimization to accelerate the design of novel and selective compounds for important therapeutic targets.
Peter Guzzo
Vice President, Head of Drug Discovery, Variational AI
Drug discovery is a challenge of optimizing dozens of molecular attributes simultaneously. Incorporating high-quality, annotated, and comprehensive datasets from GOSTAR™ is crucial for training our data-efficient physics-based AI models and supplementing them with program-specific data from our platform helps us achieve the desired attributes of a compound. Leveraging such data makes the process more time and resource-efficient, aids across a very wide range of relevant endpoints for early discovery and enables the design of novel therapeutics.
Fred Manby
Co-founder and CTO, Iambic Therapeutics
GOSTAR™ gives us access to an area of biological and chemical space that wouldn’t be possible otherwise.
Jonny Wray
CTO, E-therapeutics
GOSTAR™ provides a range of different types of information – a range of different types of data – and that allows you to create a range of different types of predictions, and in drug discovery having many different predictive options is certainly helpful.
Stephen MacKinnon
VP of Research and Development, Cyclica
Excelra responds quickly to provide the most recent patent and journal data to the GOSTAR™ database.
Hanjo Kim
Deputy Head of Research, Standigm
GOSTAR™ helps us analyze the competitive landscape, understand what’s out there, and in some instances, helps us to build models based on known information as well as train systems to assess how well we are doing at the curation process.
Bryce Allen
CEO/Co-founder, Differentiated Therapeutics

Case study
Structured and analysis-ready data for AI/ML-based drug discovery
Excelra’s Global Online Structure Activity Relationship Database GOSTAR® provides a 360-degree view of millions of compounds, linking their chemical structure to biological, pharmacological and therapeutic information. The heterogeneous and unstructured data captured from various data sources is transformed into a structured relational database format in GOSTAR™. All the content in GOSTAR™ is captured manually and passes through a 3-step quality control process. These normalized and structured datasets, covering structure-activity relationship (SAR), physicochemical properties, and ADMET parameters, were integrated into the client’s internal platform to train the AI/ML algorithms for model building and activity/property prediction to support hit identification and lead optimization.
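To give a flavor of what preparing activity data for model training can involve, here is a minimal sketch of one common preprocessing step: converting IC50 values in nanomolar to the logarithmic pIC50 scale. This step is an assumption for illustration and is not described in the case study itself; the function name and the example values are hypothetical.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 (-log10 of the molar value)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


# Hypothetical example values, not taken from any dataset.
examples_nm = [10.0, 1000.0]
pic50_values = [ic50_nm_to_pic50(v) for v in examples_nm]
print(pic50_values)
```

Working on a logarithmic scale keeps potent and weak compounds within a comparable numeric range, which is often more convenient as a regression target than raw concentrations.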
Frequently asked questions
How frequently is GOSTAR™ Small Molecules updated?
What types of drug properties are curated?
- Biochemical assay endpoints
- Bioactivity
- Binding affinity
- Cell-based assay endpoints
- Chemical structure
- Molecular descriptors
- Permeability
- Physicochemical properties
- Solubility
- Toxicity
Do you allow exports of the data from the user interface? What formats are available?
- Flat files
- Hierarchical files
- Databases (Oracle, MySQL™, etc.)
- Semantic format
GOSTAR™: Powering the future of drug discovery
Explore how GOSTAR™ curated intelligence can accelerate your discovery!