The BioPharma industry continues to misapply exchange standards for clinical and nonclinical data as if they were models for analysis and research. HL7 FHIR and the CDISC SDTM tabulation model are well suited to accurate data “exchange” between any two computer systems that collect, analyze, or review such data. SDTM, however, makes it hard to readily exploit the data within for scientific review, analysis, cross-study comparison, meta-analysis, or retrospective research.
Think of an exchange standard as a meticulous organizer that sorts diverse items into numerous boxes, like the ones used to deliver online shopping. This approach ensures orderly categorization, but the data cannot be analyzed until it is unpacked from these “boxes” and prepared for analysis. Putting the boxes in a storage room doesn’t help. Neither does building a database directly on the exchange standard. The data must be unpacked and reorganized before it can serve its intended purpose, and today that means intensive programming to create ADaM datasets.
Consider simplifying the process. Take the data out of its “boxes” and reorganize it into a repository with a unified data model. Make it easy to find subjects in a clinical trial who meet a common set of criteria to form analyzable cohorts. Make it easy to compare cohorts within a study or across studies, and to do meta-analysis. Then add layers of annotated interpretations to form a knowledge stack of data, metadata, and interpretations. Use this stack as the basis for machine intelligence and automation.
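To make the cohort idea concrete, here is a minimal sketch of what cohort selection could look like once data from several studies sits in one unified shape. Everything in it is an assumption made for illustration: the `Observation` record, the `find_cohort` helper, the concept names, and the sample values are hypothetical and are not part of any CDISC or HL7 standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One fact about one subject in a hypothetical unified model."""
    study_id: str
    subject_id: str
    concept: str   # illustrative concept names such as "AGE" or "SEX"
    value: object


def find_cohort(observations, criteria):
    """Return (study_id, subject_id) pairs that satisfy every criterion.

    `criteria` maps a concept name to a predicate on the value. A subject
    qualifies for a concept if at least one of its observations passes.
    """
    passed = {}
    for obs in observations:
        test = criteria.get(obs.concept)
        if test is not None and test(obs.value):
            passed.setdefault((obs.study_id, obs.subject_id), set()).add(obs.concept)
    required = set(criteria)
    return {key for key, concepts in passed.items() if concepts == required}


# Made-up sample data spanning two studies.
OBSERVATIONS = [
    Observation("STUDY-A", "001", "AGE", 67),
    Observation("STUDY-A", "001", "SEX", "F"),
    Observation("STUDY-A", "002", "AGE", 45),
    Observation("STUDY-A", "002", "SEX", "F"),
    Observation("STUDY-B", "001", "AGE", 71),
    Observation("STUDY-B", "001", "SEX", "F"),
    Observation("STUDY-B", "002", "AGE", 70),
    Observation("STUDY-B", "002", "SEX", "M"),
]

# Cross-study cohort: female subjects aged 65 or older.
cohort = find_cohort(
    OBSERVATIONS,
    {"AGE": lambda v: v >= 65, "SEX": lambda v: v == "F"},
)
print(sorted(cohort))
```

The same call works within one study or across many, because the study identifier is just another field on each record rather than a boundary between separate datasets.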
Other industries, such as internet communications, manufacturing, and the networks governing online commerce, adeptly navigate vast volumes of rapidly evolving content. Their challenges eclipse those faced by the BioPharma industry. They continually collect data from various nodes (the equivalent of Sites, CROs, and Labs in BioPharma) and relay it to their parent organizations and then on to other recipient entities such as regulatory agencies. They use:
- An Exchange Standard for “inter-node” data transmissions in their data-supply-chain network: e.g., SDTM, SEND, or HL7 FHIR in BioPharma; EDI in supply chain management; SMTP, FTP, TCP/IP, UDP/IP, SIP, HTTPS, etc. in internet communications.
- A Repository Model for “intra-node” use, buffering, holding, or storing data to suit the purposes of each node: e.g., an RDBMS warehouse or a big-data store like Hadoop, with a Unified Data Model that is searchable and analyzable.
- A “codec”, or Encoder-Decoder, capable of very fast automated transformation between any two standard models: e.g., hardware or software codecs in telecommunications and the internet, or a semantically aware Smart Transformation mapper of standards.
- A Master Data Registry or repository (MDR) of all the standards, covering models, terminologies, and mappings.
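As a rough illustration of how the last two components could fit together, the sketch below treats the registry as a lookup of field mappings between models and the codec as a function that applies a mapping in either direction. It is only a sketch under stated assumptions: `MAPPING_REGISTRY`, `transcode`, the model labels, and the unified field names are hypothetical. The SDTM Demographics variable names (STUDYID, USUBJID, SEX, AGE) are real, but a real transformation would also need terminology mapping and structural changes, not just field renames.

```python
# Hypothetical registry: (source model, target model) -> field-name mapping.
MAPPING_REGISTRY = {
    ("SDTM.DM", "UNIFIED"): {
        "STUDYID": "study_id",
        "USUBJID": "subject_id",
        "SEX": "sex",
        "AGE": "age",
    },
}


def transcode(record, source, target, registry=MAPPING_REGISTRY):
    """Rename the fields of `record` from the source model to the target model.

    If only the reverse mapping is registered, it is inverted on the fly,
    so one registry entry serves both encoding and decoding.
    Fields with no registered mapping are skipped in this sketch.
    """
    mapping = registry.get((source, target))
    if mapping is None:
        reverse = registry.get((target, source))
        if reverse is None:
            raise KeyError(f"no mapping registered between {source} and {target}")
        mapping = {new: old for old, new in reverse.items()}
    return {mapping[name]: value for name, value in record.items() if name in mapping}


# Made-up demographics record in SDTM shape.
dm_record = {"STUDYID": "STUDY-A", "USUBJID": "A-001", "SEX": "F", "AGE": 67}

unified = transcode(dm_record, "SDTM.DM", "UNIFIED")
round_trip = transcode(unified, "UNIFIED", "SDTM.DM")
print(unified)
print(round_trip == dm_record)
```

Keeping the mappings as data in a registry, rather than as logic inside per-study scripts, is what would let one generic codec serve every pair of models the registry knows about.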
Misapplying Exchange Standards such as SDTM or SEND to create a data repository levies heavy penalties on performance and ease of analysis. The penalty comes from the overhead of running scripts to unpack the data from the exchange packets of SDTM domains and convert it into analysis-ready datasets before additional scripts can be run to perform the analysis and generate TFLs (tables, figures, and listings).
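The unpacking step described above can be sketched in a few lines. The example assumes laboratory results arrive in the long, one-row-per-test layout of the SDTM LB domain (STUDYID, USUBJID, VISIT, LBTESTCD, and LBSTRESN are real SDTM variable names) and pivots them to one row per subject visit. The `pivot_findings` helper and the sample values are made up for illustration, and the output is not an ADaM-conformant dataset.

```python
from collections import defaultdict

# Made-up lab rows in SDTM-style long format: one row per test result.
LB_ROWS = [
    {"STUDYID": "STUDY-A", "USUBJID": "A-001", "VISIT": "BASELINE", "LBTESTCD": "ALT", "LBSTRESN": 30.0},
    {"STUDYID": "STUDY-A", "USUBJID": "A-001", "VISIT": "BASELINE", "LBTESTCD": "AST", "LBSTRESN": 25.0},
    {"STUDYID": "STUDY-A", "USUBJID": "A-001", "VISIT": "WEEK 4", "LBTESTCD": "ALT", "LBSTRESN": 42.0},
    {"STUDYID": "STUDY-A", "USUBJID": "A-002", "VISIT": "BASELINE", "LBTESTCD": "ALT", "LBSTRESN": 28.0},
]


def pivot_findings(rows, test_col="LBTESTCD", value_col="LBSTRESN"):
    """Pivot long findings rows into one dict per (study, subject, visit).

    Each test code becomes its own column. If a test repeats within a
    visit, the last value wins in this simplified sketch.
    """
    wide = defaultdict(dict)
    for row in rows:
        key = (row["STUDYID"], row["USUBJID"], row["VISIT"])
        wide[key][row[test_col]] = row[value_col]
    return [
        {"STUDYID": study, "USUBJID": subject, "VISIT": visit, **tests}
        for (study, subject, visit), tests in sorted(wide.items())
    ]


wide_rows = pivot_findings(LB_ROWS)
for wide_row in wide_rows:
    print(wide_row)
```

Even this toy version has to be written, validated, and rerun per domain and per study, which is the overhead that accumulates when the exchange format doubles as the repository.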
On the other hand, a unified, stacked model of “single point of truth” data, metadata, and interpretations in a repository forms an indexed, searchable knowledge stack that is a powerful resource for exploratory, discovery, and developmental research.