Dataset-JSON

Pilot
Specification

CDISC and PHUSE are delighted to announce a new pilot project aimed at supporting the adoption of Dataset-JSON as an alternative transport format for regulatory submissions. This pilot builds upon the considerable amount of work done over the years to replace XPT as the default file format for clinical and device data submissions to regulatory authorities.

The pilot is split into two parts: a short-term goal of having Dataset-JSON accepted as a transport format option (in addition to the existing XPT format), and the development of the future strategy for adopting advanced Dataset-JSON. The pilot report will be completed in Q2 2024.

Milestone 1: Short-Term

Pilot submissions using the JSON format with existing XPT ingress/egress to carry the same data
Same content, different suitcase, no disruption to business process on either side
In parallel, evaluate with the FDA how their toolset can support JSON format and identify a tool upgrade roadmap

Milestone 2: Development of Future Strategy

Evaluate how current and future industry standards can benefit without XPT limitations (e.g., variable names longer than 8 characters, labels longer than 40 characters, data values longer than 200 characters)
Evaluate combining metadata with data (e.g., Define-XML / Define-JSON based)
Enhanced conformance rules
Collaborate with the FDA to develop plans to retool their environment to natively consume JSON

Dataset-JSON was released as part of ODM v2.0 in 2023. Dataset-JSON version 1.1 is currently under development and will be published as an independent standard.

Dataset-JSON was adapted from the Dataset-XML Version 1.0 specification but uses JSON format. Like Dataset-XML, each Dataset-JSON file is connected to a Define-XML file that contains detailed information about the metadata. One aim of Dataset-JSON is to address as many of the relevant requirements from the PHUSE 2017 Transport for the Next Generation paper as possible, including the efficient use of storage space.

Dataset-JSON uses lowerCamelCase notation for attribute names, compared to Dataset-XML PascalCase (e.g., clinicalData vs ClinicalData).

The JSON format does not define or guarantee the order of attributes. However, since most JSON engines allow the order of attributes to be controlled, it is strongly recommended to follow the attribute order specified in this document. Because Dataset-JSON files can be large, following the specified order enables software that reads the file with a streaming approach to work quickly and efficiently.
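As an illustration of the ordering recommendation, the following Python sketch puts the top-level attributes into the recommended order before serialization. It assumes Python 3.7 or later, where dictionaries keep insertion order and json.dumps writes keys in that order. The helper name order_top_level is hypothetical and not part of the standard.

```python
import json

# Recommended order, taken from the "Top Level Attributes" table of this specification.
TOP_LEVEL_ORDER = [
    "creationDateTime", "datasetJSONVersion", "fileOID", "asOfDateTime",
    "originator", "sourceSystem", "sourceSystemVersion",
    "clinicalData", "referenceData",
]


def order_top_level(doc):
    """Return a copy of doc with known top-level attributes in the recommended order."""
    ordered = {key: doc[key] for key in TOP_LEVEL_ORDER if key in doc}
    # Attributes not listed in the table are kept at the end rather than dropped.
    ordered.update({k: v for k, v in doc.items() if k not in ordered})
    return ordered


# A document whose attributes were assembled in an arbitrary order.
unordered = {
    "clinicalData": {},
    "datasetJSONVersion": "1.0.0",
    "creationDateTime": "2023-03-22T11:53:27",
}

# json.dumps keeps dictionary insertion order as long as sort_keys is left at False.
serialized = json.dumps(order_top_level(unordered))
```

The same approach could be applied to the nested objects, each with its own order list.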

Dataset-JSON must contain only one dataset per file.

Top Level Attributes

At the top level of the Dataset-JSON object, there are technical attributes and two main optional attributes, clinicalData and referenceData, which correspond to the Dataset-XML elements ClinicalData and ReferenceData. At least one of the main attributes must be provided. Subject data is stored in clinicalData and non-subject data is stored in referenceData.

Attribute	Requirement	Description	Attribute order
creationDateTime	Required	Time of creation of the file containing the document.	1
datasetJSONVersion	Required	Version of Dataset-JSON standard	2
fileOID	Optional	A unique identifier for this file.	3
asOfDateTime	Optional	The date/time at which the source database was queried in order to create this document.	4
originator	Optional	The organization that generated the Dataset-JSON file.	5
sourceSystem	Optional	The computer system or database management system that is the source of the information in this file.	6
sourceSystemVersion	Optional	The version of the sourceSystem above.	7
clinicalData	Optional	Contains datasets for clinical data across multiple subjects.	8
referenceData	Optional	Contains datasets for non-subject data domains.	9

{
    "creationDateTime": "2023-03-22T11:53:27",
    "datasetJSONVersion": "1.0.0",
    "fileOID": "www.sponsor.xyz.org.project123.final",    
    "asOfDateTime": "2023-02-15T10:23:15",
    "originator": "Sponsor XYZ",
    "sourceSystem": "Software ABC",
    "sourceSystemVersion": "1.0.0",
    "clinicalData": { ... },
    "referenceData": { ... }
}
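The requirements in the table above can be checked mechanically. The following Python sketch is one possible way to do so. It only tests for the presence of attributes, and the function name check_top_level is an assumption made for illustration.

```python
REQUIRED_TOP_LEVEL = ("creationDateTime", "datasetJSONVersion")
MAIN_ATTRIBUTES = ("clinicalData", "referenceData")


def check_top_level(doc):
    """Return a list of problems found in the top-level attributes (empty if none)."""
    problems = []
    for name in REQUIRED_TOP_LEVEL:
        if name not in doc:
            problems.append(f"missing required attribute: {name}")
    # At least one of the two main attributes must be provided.
    if not any(name in doc for name in MAIN_ATTRIBUTES):
        problems.append("neither clinicalData nor referenceData is present")
    return problems


valid_doc = {
    "creationDateTime": "2023-03-22T11:53:27",
    "datasetJSONVersion": "1.0.0",
    "referenceData": {},
}
incomplete_doc = {"creationDateTime": "2023-03-22T11:53:27"}

valid_problems = check_top_level(valid_doc)            # no problems expected
incomplete_problems = check_top_level(incomplete_doc)  # two problems expected
```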

ClinicalData and ReferenceData Attributes

Both clinicalData and referenceData have the same structure. Each of these attributes contains study and metadata OIDs, an optional reference to the metadata file, and an object describing an item group (dataset). The following attributes are defined at this level:

Attribute	Requirement	Description	Attribute order
studyOID	Optional	See ODM definition for study OID (ODM/Study/@OID).	1
metaDataVersionOID	Optional	See ODM definition for metadata version OID (ODM/Study/MetaDataVersion/@OID).	2
metaDataRef	Optional	URL of a metadata file describing the data.	3
itemGroupData	Required	Object containing dataset information	4

Values of the studyOID and metaDataVersionOID must match corresponding values in the Define-XML file.

{
    "clinicalData": {
        "studyOID": "xxx",
        "metaDataVersionOID": "xxx",
        "metaDataRef": "https://metadata.location.org/api.link",
        "itemGroupData": { ... }
    }
}
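The OID matching rule can be sketched in Python using the standard library XML parser. The Define-XML fragment below is a hypothetical, heavily reduced example and the OID values are invented. The namespace wildcard "{*}" requires Python 3.8 or later. Treating a missing studyOID or metaDataVersionOID as a mismatch is an assumption of this sketch, not a statement of the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal Define-XML fragment; a real file contains far more metadata.
DEFINE_XML = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3" FileOID="define.example">
  <Study OID="STUDY.EXAMPLE">
    <MetaDataVersion OID="MDV.EXAMPLE" Name="Example"/>
  </Study>
</ODM>"""


def define_oids(define_xml_text):
    """Return (ODM/Study/@OID, ODM/Study/MetaDataVersion/@OID) from Define-XML text."""
    root = ET.fromstring(define_xml_text)
    study = root.find("{*}Study")  # "{*}" matches any namespace
    if study is None:
        raise ValueError("no Study element found")
    mdv = study.find("{*}MetaDataVersion")
    if mdv is None:
        raise ValueError("no MetaDataVersion element found")
    return study.get("OID"), mdv.get("OID")


def oids_match(container, define_xml_text):
    """Check a clinicalData or referenceData object against the Define-XML OIDs."""
    study_oid, mdv_oid = define_oids(define_xml_text)
    # Assumption: an absent attribute counts as a mismatch.
    return (container.get("studyOID") == study_oid
            and container.get("metaDataVersionOID") == mdv_oid)


clinical_data = {"studyOID": "STUDY.EXAMPLE", "metaDataVersionOID": "MDV.EXAMPLE"}
matches = oids_match(clinical_data, DEFINE_XML)
```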

ItemGroupData Attribute

itemGroupData is an object with a single attribute corresponding to an individual dataset. There must be only one dataset per Dataset-JSON file. The attribute name is the OID of the described dataset, which must be the same as the OID of the corresponding ItemGroupDef in the Define-XML file.

"itemGroupData": { 
    "IG.DM": { ... }
}
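Because the dataset OID is used as the attribute name, a reader does not know the key in advance and has to discover it. A minimal Python sketch, with the hypothetical helper name get_single_dataset, could look like this:

```python
def get_single_dataset(container):
    """Return (oid, dataset) from a clinicalData or referenceData object.

    Raises ValueError unless itemGroupData holds exactly one dataset.
    """
    item_group_data = container["itemGroupData"]
    if len(item_group_data) != 1:
        raise ValueError(
            f"expected exactly one dataset, found {len(item_group_data)}"
        )
    # The only key is the dataset OID, the only value is the dataset description.
    oid, dataset = next(iter(item_group_data.items()))
    return oid, dataset


clinical_data = {"itemGroupData": {"IG.DM": {"name": "DM"}}}
oid, dataset = get_single_dataset(clinical_data)
```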

The dataset description contains basic information about the dataset itself and its items.

Attribute	Requirement	Description	Attribute order
records	Required	The total number of records in a dataset	1
name	Required	Dataset name	2
label	Required	Dataset description	3
items	Required	Basic information about variables	4
itemData	Required	Dataset data	5

"IG.DM": {
    "records": 100,
    "name": "DM",
    "label": "Demographics",
    "items": [ ... ],
    "itemData": [ ... ]
}
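Since records holds the total number of records, a consumer may want to compare it with the number of rows actually present in itemData. The Python sketch below assumes that the two are expected to be equal for a complete file. The function name check_record_count is hypothetical.

```python
def check_record_count(dataset):
    """Compare the declared 'records' value with the number of rows in 'itemData'.

    Returns (is_consistent, declared, actual).
    """
    declared = dataset["records"]
    actual = len(dataset["itemData"])
    # Assumption: a complete file carries exactly 'records' rows.
    return declared == actual, declared, actual


dm = {
    "records": 2,
    "name": "DM",
    "label": "Demographics",
    "items": [],  # variable descriptions omitted for brevity
    "itemData": [
        [1, "MyStudy", "001", "DM", 56],
        [2, "MyStudy", "002", "DM", 26],
    ],
}
ok, declared, actual = check_record_count(dm)
```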

items is an array of basic information about dataset variables. The order of the elements in the array must be the same as the order of variables in the described dataset. The first element always describes the Record Identifier (ITEMGROUPDATASEQ).

Attribute	Requirement	Description	Attribute order
OID	Required	OID of a variable (must correspond to the variable OID in the Define-XML file)	1
name	Required	Variable name	2
label	Required	Variable description	3
type	Required	Type of the variable. Allowed values: "string", "integer", "decimal", "float", "double", "boolean". See ODM types for details.	4
length	Optional	Variable length	5
displayFormat	Optional	Display format supports data visualization of numeric float and date values.	6
keySequence	Optional	Indicates that this item is a key variable in the dataset structure. It also provides an ordering for the keys.	7

"items": [    
    {
        "OID": "ITEMGROUPDATASEQ",
        "name": "ITEMGROUPDATASEQ",
        "label": "Record identifier",
        "type": "integer"
    },
    {
        "OID": "IT.DM.STUDYID",
        "name": "STUDYID",
        "label": "Study Identifier",
        "type": "string",
        "length": 12,
        "keySequence": 1
    },
    ...
]
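The rules for items lend themselves to a simple structural check. The following Python sketch verifies the required attributes, the allowed type values, and the position of ITEMGROUPDATASEQ, and derives the key variables in keySequence order. The function names are hypothetical, and checking the first item by its name attribute is an assumption of this sketch.

```python
ALLOWED_TYPES = {"string", "integer", "decimal", "float", "double", "boolean"}
REQUIRED_ITEM_ATTRIBUTES = ("OID", "name", "label", "type")


def check_items(items):
    """Return a list of problems found in an 'items' array (empty if none)."""
    problems = []
    if not items or items[0].get("name") != "ITEMGROUPDATASEQ":
        problems.append("first item must describe ITEMGROUPDATASEQ")
    for position, item in enumerate(items):
        for attribute in REQUIRED_ITEM_ATTRIBUTES:
            if attribute not in item:
                problems.append(f"item {position}: missing {attribute}")
        if "type" in item and item["type"] not in ALLOWED_TYPES:
            problems.append(f"item {position}: unsupported type {item['type']!r}")
    return problems


def key_variables(items):
    """Return the names of key variables, ordered by keySequence."""
    keyed = [item for item in items if "keySequence" in item]
    return [item["name"] for item in sorted(keyed, key=lambda i: i["keySequence"])]


items = [
    {"OID": "ITEMGROUPDATASEQ", "name": "ITEMGROUPDATASEQ",
     "label": "Record identifier", "type": "integer"},
    {"OID": "IT.DM.USUBJID", "name": "USUBJID",
     "label": "Unique Subject Identifier", "type": "string", "keySequence": 2},
    {"OID": "IT.DM.STUDYID", "name": "STUDYID",
     "label": "Study Identifier", "type": "string", "length": 12, "keySequence": 1},
]
problems = check_items(items)
keys = key_variables(items)
```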

itemData is an array of records with variable values. Each record itself is also represented as an array of variable values. The first value is a unique sequence number for each record in the dataset.

"itemData": [
    [1, "MyStudy", "001", "DM", 56],
    [2, "MyStudy", "002", "DM", 26]
]

Missing values are represented by null in the case of numeric variables, and by an empty string in the case of character variables: [1, "MyStudy", "", "DM", null]
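Because records are positional arrays, a consumer typically pairs each value with the variable at the same position in items. The Python sketch below shows one way to do this and to detect missing values. The helper names rows_to_records and is_missing are hypothetical. Note that json.loads maps JSON null to Python None.

```python
import json


def rows_to_records(items, item_data):
    """Turn positional rows into dictionaries keyed by variable name."""
    names = [item["name"] for item in items]
    records = []
    for row in item_data:
        if len(row) != len(names):
            raise ValueError(f"row has {len(row)} values, expected {len(names)}")
        records.append(dict(zip(names, row)))
    return records


def is_missing(value):
    """Missing values are null (numeric) or an empty string (character)."""
    return value is None or value == ""


items = [{"name": n} for n in
         ("ITEMGROUPDATASEQ", "STUDYID", "USUBJID", "DOMAIN", "AGE")]
item_data = json.loads('[[1, "MyStudy", "", "DM", null]]')

records = rows_to_records(items, item_data)
missing_names = [name for name, value in records[0].items() if is_missing(value)]
```

With the example row from the text, missing_names contains USUBJID and AGE.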

The following is a full example of a Dataset-JSON file:

{
    "creationDateTime": "2023-03-22T11:53:27",
    "datasetJSONVersion": "1.0.0",  
    "fileOID": "www.sponsor.org.project123.final",
    "asOfDateTime": "2023-02-15T10:23:15",
    "originator": "Sponsor XYZ",
    "sourceSystem": "Software ABC",
    "sourceSystemVersion": "1.2.3",
    "clinicalData": {
        "studyOID": "xxx",
        "metaDataVersionOID": "xxx",
        "metaDataRef": "https://metadata.location.org/api.link",
        "itemGroupData": {
            "IG.DM": {
                "records": 600,
                "name": "DM",
                "label": "Demographics",
                "items": [                      
                    {"OID": "ITEMGROUPDATASEQ", "name": "ITEMGROUPDATASEQ", "label": "Record identifier", "type": "integer"},
                    {"OID": "IT.STUDYID", "name": "STUDYID", "label": "Study identifier", "type": "string", "length": 7, "keySequence": 1}, 
                    {"OID": "IT.USUBJID", "name": "USUBJID", "label": "Unique Subject Identifier", "type": "string", "length": 3, "keySequence": 2}, 
                    {"OID": "IT.DOMAIN", "name": "DOMAIN", "label": "Domain Identifier", "type": "string", "length": 2},
                    {"OID": "IT.AGE", "name": "AGE", "label": "Subject Age", "type": "integer", "length": 2}
                ],
                "itemData": [
                    [1, "MyStudy", "001", "DM", 56],
                    [2, "MyStudy", "002", "DM", 26],
                    ...
                ]
            }
        }
    }
}
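To show how the pieces fit together, the following Python sketch assembles a document with the same shape as the full example from plain Python data. It is an illustration under stated assumptions rather than a reference implementation: the function name build_dataset_json and its parameters are hypothetical, only clinicalData is produced, and optional top-level attributes are left out.

```python
import json
from datetime import datetime


def build_dataset_json(oid, name, label, items, rows, study_oid, mdv_oid):
    """Assemble a Dataset-JSON style document holding a single dataset."""
    # The first item always describes the record identifier.
    seq_item = {"OID": "ITEMGROUPDATASEQ", "name": "ITEMGROUPDATASEQ",
                "label": "Record identifier", "type": "integer"}
    # Prepend a sequence number, starting at 1, to every row.
    item_data = [[seq, *row] for seq, row in enumerate(rows, start=1)]
    return {
        "creationDateTime": datetime.now().isoformat(timespec="seconds"),
        "datasetJSONVersion": "1.0.0",
        "clinicalData": {
            "studyOID": study_oid,
            "metaDataVersionOID": mdv_oid,
            "itemGroupData": {
                oid: {
                    "records": len(item_data),
                    "name": name,
                    "label": label,
                    "items": [seq_item, *items],
                    "itemData": item_data,
                }
            },
        },
    }


dm_items = [
    {"OID": "IT.STUDYID", "name": "STUDYID", "label": "Study identifier",
     "type": "string", "length": 7, "keySequence": 1},
    {"OID": "IT.AGE", "name": "AGE", "label": "Subject Age",
     "type": "integer", "length": 2},
]
doc = build_dataset_json("IG.DM", "DM", "Demographics", dm_items,
                         [["MyStudy", 56], ["MyStudy", None]], "xxx", "xxx")
text = json.dumps(doc, indent=4)  # None is written as JSON null
```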