To gain insights from various studies, it is important to ingest disparate data that have inconsistent data models, terminologies, and unstructured descriptions into a searchable data store. Smart transformation, a crucial part of data curation, replaces fixed adaptors and mappers. This technique uses machine learning to automatically transform clinical, nonclinical, and biomarker data from data lakes into a target model.
Expertly curated datasets are used to train multiple deep neural network models, which transform the disparate source data. Recommendation engines utilizing ontologies and vocabularies referenced in the target data model definition help harmonize the transformed data. The smart transformers continue to learn and improve, evolving adaptively as data managers intervene to confirm the transformation or correct errors in it, and as users make decisions on metadata, content, and terminology recommendations.
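The feedback loop described above can be illustrated with a minimal sketch. It is not the paper's implementation: plain string similarity from Python's standard library stands in for the trained neural models, and the `TermRecommender` class, its methods, and the three-term vocabulary are hypothetical names chosen for illustration. The point it shows is that a curator's decision is stored and takes precedence over the scored recommendations the next time the same source term appears.

```python
from difflib import SequenceMatcher

# Hypothetical controlled vocabulary for one target field (illustrative only).
VOCABULARY = ["MALE", "FEMALE", "UNKNOWN"]


class TermRecommender:
    """Recommends target vocabulary terms and remembers curator decisions."""

    def __init__(self, vocabulary):
        self.vocabulary = list(vocabulary)
        # Curator-approved mappings: normalized source term -> target term.
        self.confirmed = {}

    @staticmethod
    def _normalize(term):
        return term.strip().upper()

    def recommend(self, source_term, top_n=3):
        """Return up to top_n (target_term, score) pairs, best first."""
        key = self._normalize(source_term)
        # A stored curator decision overrides any similarity score.
        if key in self.confirmed:
            return [(self.confirmed[key], 1.0)]
        # Assumption: string similarity is a stand-in for a learned model.
        scored = [
            (term, SequenceMatcher(None, key, term).ratio())
            for term in self.vocabulary
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_n]

    def record_decision(self, source_term, target_term):
        """Store a curator's confirmation or correction for later reuse."""
        if target_term not in self.vocabulary:
            raise ValueError(f"{target_term!r} is not in the vocabulary")
        self.confirmed[self._normalize(source_term)] = target_term


recommender = TermRecommender(VOCABULARY)
suggestions = recommender.recommend("male")   # scored suggestions
recommender.record_decision("M", "MALE")      # curator maps the code "M"
after_feedback = recommender.recommend(" m ") # now resolved from memory
```

In a real system the stored decisions would more likely be fed back as training data than kept in a lookup table; the dictionary here only makes the precedence of human decisions easy to see.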
This AI-augmented automation promotes data normalization and harmonization for search and analytics, as well as for regulatory packaging of eData.
BioPharma companies often collect data from nonclinical and clinical studies, as well as molecular biomarker data from their bio-samples. This data is typically stored in various native formats, such as SAS, Excel, and flat files, in what is referred to as “data lakes”.
To utilize this data for business purposes, it needs to be read and transformed appropriately. However, curating this data for scientific uses such as cross-study cohort identification or analysis is known to be a time- and labor-intensive process.
Automation offers numerous opportunities to address this challenge, reducing time and effort while improving the quality of the data. These include identifying the necessary data, semantically mapping and transforming it to the required format using deep neural network-based recommendation engines, and self-organizing the data using supervised machine learning. This approach, known as “Smart Transformation”, can significantly improve the speed and quality of data curation.
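As a rough sketch of the mapping and transformation step, the example below applies a set of recommended column mappings to one source record. Everything in it is an assumption made for illustration: the `ColumnMapping` type, the `transform_record` function, the confidence scores, the 0.8 threshold, and the column names are hypothetical and are not taken from the paper. Mappings above the threshold are applied automatically, and the remaining columns are set aside for a data manager to review.

```python
from dataclasses import dataclass


@dataclass
class ColumnMapping:
    """A recommended mapping from a source column to a target model field."""
    source: str
    target: str
    confidence: float  # assumed to come from a recommendation engine


def transform_record(record, mappings, threshold=0.8):
    """Apply confident mappings; return (transformed, columns_needing_review)."""
    by_source = {m.source: m for m in mappings}
    transformed = {}
    needs_review = []
    for column, value in record.items():
        mapping = by_source.get(column)
        if mapping is not None and mapping.confidence >= threshold:
            transformed[mapping.target] = value
        else:
            # Unmapped or low-confidence columns go to a human curator.
            needs_review.append(column)
    return transformed, needs_review


# Hypothetical source record and recommendations.
record = {"SUBJ": "001", "GENDER": "F", "WT_KG": 70}
mappings = [
    ColumnMapping("SUBJ", "subject_id", 0.95),
    ColumnMapping("GENDER", "sex", 0.90),
    ColumnMapping("WT_KG", "body_weight", 0.40),
]

transformed, needs_review = transform_record(record, mappings)
print(transformed)   # {'subject_id': '001', 'sex': 'F'}
print(needs_review)  # ['WT_KG']
```

The threshold controls the balance between automation and manual effort: lowering it transforms more columns without review, at the cost of more mapping errors reaching the target model.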
Download Paper: Smart Transformation of Clinical & Nonclinical Data for Insights