BiomarkerEngine

We recently announced BiomarkerEngine™, a new product developed here at Amplion that gathers biomarker information from large datasets such as clinical trial databases. BiomarkerEngine is built on our proprietary platform, the Amplion Data Acquisition Platform, or ADAP™.

In this post we will provide some detail about ADAP and how it can be used. First, though, a little more about BiomarkerEngine, which we are now starting to license to other companies for their own biomarker information-gathering needs.

BiomarkerEngine: Rapid and Accurate Biomarker Collection and Curation

BiomarkerEngine is able to download large public datasets published in a variety of formats and curate them for biomarker information, preparing the information for use in whatever final product one wishes. We are currently using it to process clinical trial records for biomarker mentions, and then to publish that information into BiomarkerBase™, our knowledge base of biomarkers in clinical use and development. BiomarkerBase is used by drug companies to conduct competitive analysis of biomarkers in clinical trials, and by diagnostics companies to identify targets for new diagnostic products or services.

We recently ran our first test of BiomarkerEngine’s ability to process every clinical trial record at ClinicalTrials.gov, and were pleased to find that it rapidly completed an initial scan of all 200,000 records without error. Next week we expect to publish over 1,000 new biomarkers in BiomarkerBase that were gathered and curated exclusively by BiomarkerEngine, and by the end of this month we will have extracted biomarkers from over 10,000 trial records. Before the end of the year we will have extracted biomarker information from every record at every publicly available repository of clinical trial records worldwide, and we will be able to update this information daily.

And BiomarkerEngine’s accuracy is greater than 93%.

Gathering this information by manual curation would cost over $4 million by our estimate, and staying current on newly added trial records would cost at least $50,000 per month. Replicating the ability to conduct a daily update of a combined dataset this large is essentially impossible by manual review.

One of the coolest things about BiomarkerEngine is its ability to learn. We have PhD reviewers using a special interface to correct the assumptions that BiomarkerEngine makes when it curates the information it has gathered, and those corrections and any additional annotations are then fed back to improve future data gathering and curation.

BiomarkerEngine can of course gather biomarker information from data sources other than clinical trial records, and based on requests from our current customers we are likely to focus next on biomarkers in patent applications.

ADAP: A Flexible Platform That Learns

But gathering biomarker information is only one potential application of ADAP. The platform can gather and curate almost any kind of data, and it can be coupled to almost any type of final repository. ADAP is limited only by the accessibility of the data, the keywords used to search those data, and the user’s ability to provide input to train the artificial intelligence model.

Architecturally, ADAP is a modular platform that includes the following components:

  1. The Poller scans large datasets for text-level changes to documents that you care about, and returns those changes to ADAP;
  2. The Curator applies rules to the structuring and tagging of the datasets, and prepares the data for inclusion in a final repository;
  3. The Annotator is a multi-user tool for making text-level annotations to any documents that have been processed by The Curator. The primary function of The Annotator is to correct mistakes made by The Curator, and to make additional annotations that inform training sets;
  4. The Assimilator is a natural language processing model that applies named-entity recognition to the training sets produced by The Annotator, in order to improve the rules being applied by The Curator.
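
To make the flow between these four components more concrete, here is a minimal Python sketch. It is not ADAP's actual code: every class, method, and document ID in it is hypothetical, the sample sentence is invented for illustration, and the Assimilator stand-in simply folds reviewer corrections into a keyword set rather than training a named-entity recognition model.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Poller:
    """Hypothetical Poller: reports only documents whose text is new or changed."""
    seen: dict = field(default_factory=dict)

    def poll(self, documents: dict) -> dict:
        changed = {doc_id: text for doc_id, text in documents.items()
                   if self.seen.get(doc_id) != text}
        self.seen.update(changed)
        return changed


@dataclass
class Curator:
    """Hypothetical Curator: tags text with terms from a simple keyword rule set."""
    terms: set = field(default_factory=set)

    def curate(self, text: str) -> set:
        return {term for term in self.terms
                if re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE)}


@dataclass
class Annotator:
    """Hypothetical Annotator: records reviewer corrections to the Curator's output."""
    added: set = field(default_factory=set)
    rejected: set = field(default_factory=set)

    def correct(self, add=(), reject=()) -> None:
        self.added.update(add)
        self.rejected.update(reject)


def assimilate(curator: Curator, annotator: Annotator) -> None:
    """Stand-in for the Assimilator: feeds corrections back into the Curator's rules.

    Assumption: a real system would retrain an NLP model here; this sketch
    only updates the keyword set.
    """
    curator.terms = (curator.terms | annotator.added) - annotator.rejected


# Illustrative run with made-up data.
poller = Poller()
curator = Curator(terms={"HER2", "trial"})
annotator = Annotator()

docs = {"doc-1": "Patients were screened for HER2 and PSA before the trial."}
changed = poller.poll(docs)
first_pass = {doc_id: curator.curate(text) for doc_id, text in changed.items()}

# A reviewer rejects a false positive and adds a term the rules missed.
annotator.correct(add={"PSA"}, reject={"trial"})
assimilate(curator, annotator)
second_pass = {doc_id: curator.curate(text) for doc_id, text in docs.items()}
```

In this sketch the first pass tags "HER2" and, wrongly, "trial"; after the reviewer's corrections are fed back, the second pass tags "HER2" and "PSA". Polling the same unchanged documents again returns nothing, which is what would let a daily update touch only new or edited records.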

ADAP is a flexible platform that we think can be used for a wide variety of data acquisition problems, and not just in healthcare-related fields.