Harmonizing Raw Data with CDISC Standards to Streamline SDTMs

View Webinar Recording: Harmonizing Raw Data with CDISC Standards to Streamline SDTMs


In this webinar, we introduce a new, automated approach to mapping raw clinical EDC and biomarker data to SDTM through the Universal Data Model (UDM), which standardizes all raw and CDISC metadata (files, variables, data, etc.).

SDTM programming and mapping for statistical analysis output has traditionally been labor and code intensive. Xbiom leverages EDC raw data and metadata attributes to compare and map them to SDTM metadata and controlled terminology, flagging variables and deviations. Users then visually review and confirm Xbiom’s recommendation for each SDTM variable mapping and its data.

Fast Turnaround: Rather than the typical 6-week turnaround cycle following a database lock, SDTM datasets can now be generated within 24 hours of the automated and user-confirmed mapping process. Non-programmers can use a simple, low-code interface to control and visualize the mapping of EDC raw data to SDTM.

Minimal Repetition: As users review and confirm each recommended SDTM variable mapping, Xbiom remembers and learns the clinical study’s profile, which greatly helps when EDC raw data is refreshed. We’d love to address any questions or concerns that come to mind on this topic.

Download: CDISC-360 Mission: SDTM Design and Automation 

Download: End-to-End Clinical Study MetaData-Driven Process

If you have anything in particular that you’d like discussed, let us know at: ask@pointcross.com

Webinar Questions

No, Xbiom does NOT use or rely on any SAS server or software. Xbiom is built on a big data stack and, under the hood, uses Hadoop-ecosystem components such as Spark, with controls for advanced users to write in PySpark, Python, and R.

In the event you do need SAS, data can be exported in XPT format and used to exchange data with your external SAS server if you have one.

However, nothing requires users to leave Xbiom, as all functions are self-contained within it.

ML is applied differently in separate contexts within Xbiom. In the Smart Transformation module, data from disparate sources, arriving in native formats or de-facto standards, is transformed to an invariant model and terminology used by the UDM (the unified, or universal, data model). This is supported by a recognition process, a decision-support process that appears as a “recommendation from the engine,” and an operator override that affirms the user’s intention. The recognition process is backed by a machine that memorizes sets of column names (source terminology), their provenance, and their type. The machine matches incoming raw data elements against these stored patterns, which form the training sets. Once an element is recognized, and depending on the context of the data (e.g., an EDC, a LIMS source, or a biomarker assay), Smart Transformation proposes a mapping to the invariant term and model that the UDM expects.
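To make the recognition step concrete, here is a minimal sketch in Python of matching an incoming column name against a memorized pattern set. The pattern store, names, and threshold are illustrative assumptions, not Xbiom’s internals:

```python
from difflib import SequenceMatcher

# Hypothetical pattern store: previously memorized source column names,
# their provenance, and the UDM term each was mapped to.
PATTERN_STORE = [
    {"source_name": "SUBJID", "provenance": "EDC", "udm_term": "SubjectIdentifier"},
    {"source_name": "VISITDY", "provenance": "EDC", "udm_term": "VisitStudyDay"},
    {"source_name": "ANALYTE", "provenance": "LIMS", "udm_term": "AssayAnalyte"},
]

def recommend_mapping(raw_column, context, threshold=0.85):
    """Match an incoming raw column name against stored patterns and
    return the best UDM term recommendation, or None if unrecognized."""
    best, best_score = None, 0.0
    for pattern in PATTERN_STORE:
        if pattern["provenance"] != context:
            continue  # restrict matching to the data's context (EDC, LIMS, ...)
        score = SequenceMatcher(None, raw_column.upper(),
                                pattern["source_name"]).ratio()
        if score > best_score:
            best, best_score = pattern, score
    return best["udm_term"] if best and best_score >= threshold else None

# Example: a slightly varied EDC column name still resolves.
print(recommend_mapping("SUBJ_ID", "EDC"))  # -> SubjectIdentifier
```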

Roughly 80% or more of incoming data tends to be identified this way, and these matches are presented as recommendations. Operator overrides add to the standard recommendations stored by the machine. When new data is ingested, any conflicts are presented, or the conflicting recommendations are suppressed. As with terminologies, the values in the data variable fields are also assessed to confirm that the kind of data associated with an input variable is of the expected type (e.g., date-like fields as opposed to integer flags). This check is used to discriminate among conflicting recommendations.
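That value-type check can be pictured as follows; this is an illustrative sketch, not Xbiom’s actual logic, and the date formats and flag values are assumptions:

```python
from datetime import datetime

def infer_value_kind(values):
    """Classify a sample of raw field values as date-like, integer flag,
    or free text, so conflicting mapping recommendations can be ruled out."""
    def is_date(v):
        for fmt in ("%Y-%m-%d", "%d%b%Y", "%m/%d/%Y"):  # assumed common formats
            try:
                datetime.strptime(v, fmt)
                return True
            except ValueError:
                pass
        return False

    if all(is_date(v) for v in values):
        return "date"
    if all(v in ("0", "1", "Y", "N") for v in values):
        return "flag"
    return "text"

# A recommendation whose expected kind disagrees with the observed
# values can be suppressed in favor of one that agrees.
print(infer_value_kind(["2023-01-05", "05JAN2023"]))  # -> date
```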

We use TensorFlow™ within Xbiom for certain aspects of machine learning where an exceptionally large training set is available.

Yes, Xbiom includes complete MDR capabilities and several controls for users to perform impact analysis and identify changes between multiple published models, global models (e.g., templated per therapeutic area), and different versions of an evolving study model. Change reports down to the attribute level can be viewed and exported.
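As an illustration of attribute-level change reporting between two model versions, here is a hedged sketch; the model structures and variable names are hypothetical, not Xbiom’s schema:

```python
def diff_models(old, new):
    """Compare two versions of a study model, where each model maps
    variable names to attribute dicts, and report attribute-level changes."""
    report = []
    for var in sorted(set(old) | set(new)):
        if var not in old:
            report.append(f"ADDED   {var}")
        elif var not in new:
            report.append(f"REMOVED {var}")
        else:
            for attr in sorted(set(old[var]) | set(new[var])):
                if old[var].get(attr) != new[var].get(attr):
                    report.append(
                        f"CHANGED {var}.{attr}: "
                        f"{old[var].get(attr)!r} -> {new[var].get(attr)!r}")
    return report

# Example: one relabeled variable and one newly added variable.
v1 = {"AESEV": {"label": "Severity", "codelist": "AESEV v1"}}
v2 = {"AESEV": {"label": "Severity/Intensity", "codelist": "AESEV v1"},
      "AESER": {"label": "Serious Event", "codelist": "NY"}}
print("\n".join(diff_models(v1, v2)))
```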

Xbiom is a “system of record” that has been used by various clients over the past 18 years, in two industries, for enterprise data, metadata, unstructured content, and highly compartmentalized, secure content. Records management and governance of data in any solution, including Xbiom solutions, are configured to be 21 CFR Part 11 compliant for the BioPharma business, whether or not the client wishes to have it IQ/OQ/PQ qualified. The same is true for all document management and metadata management.

As an enterprise-level MDR, Xbiom meets the governance and integrity needs called for by master data management of data and metadata.

Yes, mapping specifications are supported for both import and export. They can also be transferred from one Xbiom instance to another, for example, from a test server to a production server.

Xbiom ingests data from disparate sources, including most file types of data listings (csv, xlsx, xpt, xml, etc.). The data model within is checked against known de-facto models and standards, and the data is automatically ingested and transformed into the UDM’s meta-model. If the incoming data, such as an Excel file, is not recognized (a custom model), the automated, ML-supported Smart Transformation takes over and provides a recommendation on how to map the incoming data, with options to override it. The transformation models are bilateral, allowing data within to be exported to de-facto, de-jure, or even custom data models, while providing the same service in reverse as data is ingested from external sources.
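A rough sketch of that ingest decision flow, with illustrative function names and a simplified SDTM signature check (not Xbiom’s API):

```python
# Key variables assumed here as a stand-in signature for recognizing SDTM.
SDTM_KEY_VARS = {"STUDYID", "DOMAIN", "USUBJID"}

def detect_model(columns):
    """Check incoming column headers against a known standard's signature."""
    if SDTM_KEY_VARS <= columns:
        return "SDTM"
    return "CUSTOM"  # unrecognized: fall through to Smart Transformation

def ingest(columns):
    model = detect_model(columns)
    if model != "CUSTOM":
        return f"rule-based transform from {model} into the UDM meta-model"
    return "ML-supported Smart Transformation recommendation (user may override)"

print(ingest({"STUDYID", "DOMAIN", "USUBJID", "AGE"}))  # recognized standard
print(ingest({"Subject", "Visit", "Result"}))           # custom Excel listing
```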

Yes, there are a number of issues that can occur within raw input data. Here is a brief description of some of the issue types and how they are handled by the Xbiom input portal and Smart Transformation engines. A few terms need a description first, so that the discussion of raw data issues has context:

VDR: This is the Virtual Data Room where incoming data from external sources, data crawled from external stores (such as AWS S3), or data periodically polled from client data sources is brought in. It is also the virtual portal through which Xbiom communicates and shares exported data with the client or partners. All security expectations are managed through roles and responsibilities via RBAC (Role-Based Access Control).

eDV: eDataValidator is a standards validation engine that checks datasets against conformance rules associated with de-jure exchange standards (SDTM, ADaM, SEND, etc.) as well as de-facto standards, such as those established between a sponsor and their specialty assay service provider or CRO, a LIMS, or an EDC’s ODM. eDV provides validation reports and drives dashboards so that users can visualize and manage data issues in incoming raw data. eDV is used in the management of raw data issues, as explained further below.

Smart Transformation: This is a bilateral data model and terminology transformation module that handles de-jure and de-facto exchange standards. It is supported by algorithmic, ontology-based, and machine learning tools such as TensorFlow, with custom scripts for filling gaps in the recommendation engine or for custom many-to-many mappings.

Standards Management and Terminology Management: These are managed by engines that provide mapping and transformation services supporting de-jure, de-facto, and customized standards.

Raw data being ingested can have many types of issues. Some of the common issues that must be handled in the course of clinical or nonclinical study data work are listed and discussed below.

Data Format Issues: Xbiom recognizes data in a wide variety of formats, including structured (database extracts, machine-readable data listings in columnar format) and unstructured (e.g., documents, images). Any header or associated metadata is automatically extracted and held to support curation or for record keeping. Since the data is read from the incoming files loaded to the VDR, data corruption, although rare, is picked up at this stage of format and readability checks.

Data Model Issues: Raw data from LIMS or EDC systems can have data model issues where naming conventions or column headers are missing or entered incorrectly. These are picked up by the Xbiom Smart Transformation module, and the collection of data input files and their issues is catalogued and displayed in an interactive dashboard, from which a curator can parse them and either fix them or share them back with the source if the fix must be done at the source. If the incoming raw data is supposed to meet a specific exchange standard, e.g., SDTM Vx.y with a particular version of controlled terminology, then the eDataValidator may be run automatically to apply the many hundreds of conformance rules to the data domains. The output and summaries are available on the dashboards. The same is true for recognized de-facto exchange standards.
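For a flavor of what a conformance rule check looks like, here is a minimal sketch that flags required variables missing from a domain; the rule table is illustrative and far simpler than eDV’s actual rule sets:

```python
def check_required_vars(domain, columns, rules):
    """Flag required variables missing from a dataset, one finding per rule hit."""
    findings = []
    for var in rules.get(domain, []):
        if var not in columns:
            findings.append(f"{domain}: required variable {var} is missing")
    return findings

# Assumed rule table: required variables for the SDTM DM domain.
REQUIRED = {"DM": ["STUDYID", "DOMAIN", "USUBJID", "SUBJID"]}
print(check_required_vars("DM", {"STUDYID", "DOMAIN", "USUBJID"}, REQUIRED))
# -> ['DM: required variable SUBJID is missing']
```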

Missing Data or Incorrectly Coded Issues: This is quite common with EDC data, and occasionally with LIMS data, when the original CRF data is entered manually. Visit days and cycles can often be improperly coded. These issues, in the context of clinical trial data, are captured during longitudinal integration, when missing or incorrect entries can be picked up through mismatched patterns and time-shifted data on specific subjects. Here, a combination of support services can be used to surface these issues and their remedies, which are recorded. All corrections are made in the forward direction: a log of “errata” is maintained while the curated data is allowed to proceed. In the case of clinical trial data from an EDC database lock, this errata log is maintained after each lock and shared through the VDR with Clin-Ops or other biometrics teams for their CFR Part 11 processes related to any corrections to the EDC. If an error seen in a previous lock is found to have been fixed in a subsequent lock, that is also registered, and the subsequent errata files will show the change with a timestamp. If a subsequent lock snapshot shows a correction done differently than the original correction made in Xbiom, the authoritative correction is applied and the change in Xbiom is recorded with a reason for change. A sketch of such an erratum record follows below.
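The erratum record described above might be pictured like this; the fields are assumptions drawn from the description, not Xbiom’s schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ErratumRecord:
    """One forward-direction correction, kept per EDC database lock snapshot."""
    subject_id: str
    variable: str
    original_value: str
    corrected_value: str
    reason: str
    snapshot: str  # which EDC lock the error was seen in (hypothetical label)
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

# Example: a miscoded visit day corrected and logged with a reason.
erratum = ErratumRecord("SUBJ-0042", "VISITDY", "-3", "3",
                        "visit day miscoded as negative", "LOCK-2")
```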

If missing data about a subject must be imputed, such changes are made only by authorized users with the right roles; the change is recorded and logged, and any script used for imputation of findings is also recorded and logged. The same process is used for assay data, where notations are standardized.

Xbiom does NOT have any dependencies on external data visualization tools such as Spotfire, Tableau, GraphPad, Qlik, or other BI tools.

Xbiom has its own embedded visualization tool, the IGO (Immersive Graphics Object) module. The IGO uses the Data Services Layer within the UDM to access as-collected, curated data for analysis and plotting, along with all metadata, including native and derived metadata, extended metadata, and groupings for stratified cohorts. The IGO provides a rich set of tools to further select the displayable data along with computed statistics and statistical tests for plotting. It has controls for selecting from a palette of plot types, allowing various chart types, tabulations, and plotting techniques (individual subject data with scatter plots; group data with line plots, box plots, Z-transforms, etc.). A separate set of controls allows the user to dynamically change and select the subset of data to be plotted.

Xbiom’s IGO supports snapshotting a data view to create and annotate a TFL (Table-Figure-Listing) object. Each TFL object is a self-contained data object stored along with searchable metadata, including automatically associated hashtag metadata and user-added annotations and hashtags for collaboration, sharing, and searching. These could include hashtags for therapeutic areas, biomarkers, or biomarker panels associated with a specific biologic system. TFLs are readily publishable as raster images and RTF for generating PowerPoint, Word, Excel, and XML objects. Collected TFLs across studies can be searched and compiled into publishable sets for posters and papers.
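One way to picture a self-contained TFL object and its searchable metadata is sketched below; the field names are assumptions for illustration, not Xbiom’s storage format:

```python
from dataclasses import dataclass, field

@dataclass
class TFLObject:
    """A snapshotted Table/Figure/Listing with its searchable metadata."""
    title: str
    tfl_type: str                  # "table" | "figure" | "listing"
    data_snapshot: list            # frozen rows backing the view
    auto_hashtags: set = field(default_factory=set)   # machine-associated tags
    user_hashtags: set = field(default_factory=set)   # added for collaboration
    annotations: list = field(default_factory=list)

# Hypothetical example of a snapshotted biomarker figure.
tfl = TFLObject(
    title="IL-6 by Dose Group, Week 4",
    tfl_type="figure",
    data_snapshot=[{"group": "10 mg", "mean": 4.2}],
    auto_hashtags={"#oncology", "#IL6"},
    user_hashtags={"#biomarker-panel"},
)
```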

Xbiom has a powerful, published, OData-compliant API that allows technical users to have their own data visualization or analytics tools control and query Xbiom for data or TFLs for use in their own toolsets, or to trigger the embedded tools and scripts in Xbiom to generate results remotely. Users may also interact with Xbiom using a Jupyter notebook with Python or R. An illustrative query is sketched below.
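As an illustration, a technical user might issue a standard OData query like this; the base URL, entity set, and field names are hypothetical, not Xbiom’s published endpoints:

```python
import requests

BASE = "https://xbiom.example.com/odata"  # hypothetical instance URL

# Standard OData query options ($filter, $select, $top) against a
# hypothetical TFL entity set; exact entity names depend on the instance.
resp = requests.get(
    f"{BASE}/TFLs",
    params={"$filter": "contains(hashtags, '#IL6')",
            "$select": "title,tfl_type",
            "$top": "10"},
    headers={"Authorization": "Bearer <token>"},
)
resp.raise_for_status()
for tfl in resp.json().get("value", []):
    print(tfl["title"])
```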