Study data arrive in disparate forms, with inconsistent data models, terminologies, and unstructured descriptions, and must be ingested into a searchable data store. As a key part of curation, smart transformation replaces fixed adaptors and mappers, making study data searchable across studies so that insights can be drawn from them.
Smart transformation uses machine learning to automate the transformation of clinical, nonclinical, and biomarker data from data lakes into a target model. Multiple deep neural network models, trained on supervised, expertly curated datasets, transform the disparate source data. Recommendation engines then harmonize the transformed data using the ontologies and vocabularies referenced in the target data model definition. The smart transformers continually learn and adapt as data managers intervene to confirm or correct transformation errors, and as users accept or reject metadata, content, and terminology recommendations. This AI-augmented automation promotes data normalization and harmonization, both for search and analytics and for regulatory packaging of eData.
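To make the recommend-then-learn loop concrete, here is a minimal sketch of terminology harmonization with curator feedback. It is an illustration under stated assumptions, not the method behind smart transformation: the class `TermRecommender`, its methods, and the example vocabulary are hypothetical; plain string similarity from Python's standard `difflib` stands in for the deep neural network models; and curator decisions are stored as a lookup table rather than used to retrain a model.

```python
from difflib import SequenceMatcher


class TermRecommender:
    """Hypothetical terminology recommender with a curator-feedback memory.

    Assumption: string similarity stands in for the trained models, and
    feedback is remembered verbatim instead of retraining anything.
    """

    def __init__(self, vocabulary):
        # Controlled terms assumed to come from the target model definition.
        self.vocabulary = list(vocabulary)
        # Normalized source value -> target term, learned from data managers.
        self.confirmed = {}

    def recommend(self, source_value, k=3):
        """Return up to k (term, score) candidates, best first."""
        key = source_value.strip().lower()
        if key in self.confirmed:
            # A data manager already decided this mapping; reuse it directly.
            return [(self.confirmed[key], 1.0)]
        scored = [
            (term, SequenceMatcher(None, key, term.lower()).ratio())
            for term in self.vocabulary
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

    def record_decision(self, source_value, accepted_term):
        """Store a data manager's accepted or corrected term for reuse."""
        if accepted_term not in self.vocabulary:
            raise ValueError(f"{accepted_term!r} is not in the target vocabulary")
        self.confirmed[source_value.strip().lower()] = accepted_term


if __name__ == "__main__":
    rec = TermRecommender(["MALE", "FEMALE", "UNKNOWN"])
    print(rec.recommend("Male", k=1))  # [('MALE', 1.0)]
    rec.record_decision("M", "MALE")   # curator corrects an ambiguous code
    print(rec.recommend("m"))          # [('MALE', 1.0)]
```

Keeping confirmed decisions ahead of similarity scores means a correction made once is applied everywhere the same source value recurs, which is the simplest form of the "learn as data managers intervene" behavior described above.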
Most BioPharma companies accumulate data collected in nonclinical and clinical studies, along with molecular biomarker data from their biosamples, and hold it in native formats (SAS, Excel, flat files, etc.). These are the “data lakes” from which precisely the data that serves a given business purpose should be read and transformed for that purpose. Curating this data for scientific uses such as cross-study cohort identification or analysis is well known to be time- and labor-intensive. There are many opportunities for automation that reduce time and effort while improving quality: identifying the needed data, semantically mapping it and transforming it to the required format with recommendation engines based on deep neural networks, and self-organizing the data with supervised machine learning. Together, these improve the speed and quality of curation through “Smart Transformation”.
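The first steps named above (identifying the needed data, semantically mapping it, and transforming it to the required format) can be sketched at the column level. This is a minimal sketch assuming a hypothetical three-variable target model and alias table (`TARGET_MODEL`, `map_columns`, and `transform` are illustrative names, not part of any product); fuzzy matching with the standard library's `difflib.get_close_matches` substitutes for the deep neural network recommendation engines the text describes.

```python
from difflib import get_close_matches

# Hypothetical target model: target variable -> known source aliases.
TARGET_MODEL = {
    "USUBJID": ["subject", "subjid", "patient_id"],
    "AGE": ["age", "age_years"],
    "SEX": ["sex", "gender"],
}


def map_columns(source_columns, cutoff=0.6):
    """Propose a target variable for each source column, or None if no match."""
    # Flatten aliases (plus each lowercased target name) into one lookup.
    alias_to_target = {
        alias: target
        for target, aliases in TARGET_MODEL.items()
        for alias in aliases + [target.lower()]
    }
    candidates = list(alias_to_target)
    mapping = {}
    for col in source_columns:
        match = get_close_matches(col.lower(), candidates, n=1, cutoff=cutoff)
        mapping[col] = alias_to_target[match[0]] if match else None
    return mapping


def transform(rows, mapping):
    """Keep only columns mapped to the target model and rename them."""
    # Unmapped columns (None) are dropped: only the needed data is carried over.
    return [
        {mapping[col]: value for col, value in row.items() if mapping.get(col)}
        for row in rows
    ]


if __name__ == "__main__":
    source_rows = [{"Patient_ID": "P01", "Gender": "F", "Visit": 2}]
    mapping = map_columns(source_rows[0].keys())
    print(mapping)  # {'Patient_ID': 'USUBJID', 'Gender': 'SEX', 'Visit': None}
    print(transform(source_rows, mapping))  # [{'USUBJID': 'P01', 'SEX': 'F'}]
```

In a curated workflow, the proposed `mapping` would be shown to a data manager for confirmation before `transform` runs, rather than applied blindly.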
Download Whitepaper: Smart Transformation of Clinical & Nonclinical Data for Insights