Data Validation
Data Validation Process
The general flow of data is from the jurisdictions to Logicly (AMHOCN Component 1) via the Online Validator. Logicly validates the data and incorporates it into a master database that will grow over time as it accumulates successive years of data. Each time the master database reaches a point of stability after a round of new data has been incorporated, it is encrypted and passed on to the University of Queensland (AMHOCN Component 2), where it will be analysed to produce reports and recommendations. These recommendations will form the basis for revisions to practices in Mental Health and complementary changes in training courses. The changes will be promulgated by the NSWIP in the form of seminars and training courses, along with promotional and training materials.
The jurisdictions will be able to download, refine and propose replacement data files via the Online Validator. Having this data validated via the MDS Online Validator also allows a jurisdiction to share access to a file with the Analysis & Reporting component in order to gain insight into errors, inconsistencies, missing or miscoded data, or to seek clarification on definitions. This real-time interaction helps to improve the quality of the data collected.
Source Data Formats
There are two main streams of data into the AMHOCN project - the Mental Health portions of the National Minimum Data Set (NMDS), and the National Outcomes and Casemix Collection (NOCC). The NMDS data of interest is already collected under the auspices of the AIHW and is divided into three subsets: Community Mental Health Care (CMHC), Residential Mental Health Care (RMHC) and Mental Health Establishments (MHE) which between them give a picture of mental health care of both inpatients and ambulatory patients along with the impact they have on the mental health infrastructure.
The NOCC data is a data stream specific to mental health designed to fill in the gaps in the NMDS data (which is, by definition, a minimum dataset) thereby permitting a wider range of analyses than is possible with the NMDS.
It is hoped that future developments will also incorporate the Admitted Patient Mental Health Care (APMHC) dataset to broaden reporting and analysis, although importing this data would be extremely complex because it is submitted directly by the Health Systems.
NMDS Data Format
The CMHC, RMHC and MHE submission process allows file Submitters to upload and review their potential submissions in a private workspace (MDS Online Validator) before choosing to formally submit the file for review.
Once a file has been submitted for review it is no longer in the private workspace; it is best to consider it as having been sent to the Reviewers at AIHW.
While the file is still pending acceptance, the Reviewer(s) can simply reject it. If it has already been accepted, a new version can be uploaded as a “proposed replacement” for the submission, and the process repeats until a submitted file is agreed to be final by both Submitter and Reviewer(s).
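The submission life cycle described above can be pictured as a small state machine. The following is only a sketch under stated assumptions: the state names, action names and the `next_state` helper are hypothetical and chosen for illustration, and they are not taken from the MDS Online Validator itself.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names for illustration only.
    WORKSPACE = "in the private workspace"
    PENDING = "submitted and pending acceptance"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


# Allowed transitions: (current state, action) -> next state.
TRANSITIONS = {
    (State.WORKSPACE, "submit"): State.PENDING,
    (State.PENDING, "reject"): State.REJECTED,
    (State.PENDING, "accept"): State.ACCEPTED,
    # Assumption: a new version of an accepted file returns to review
    # as a proposed replacement.
    (State.ACCEPTED, "propose_replacement"): State.PENDING,
}


def next_state(state, action):
    """Return the state after an action, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a file that is {state.value}") from None
```

In this sketch an accepted file cannot be rejected directly; it can only be superseded by a proposed replacement, which mirrors the distinction drawn above between pending and accepted files.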
The structure for .DAT file submissions should meet the most current specification as listed on the following website: https://webval.validator.com.au/spec
NOCC Data Format
Data compliant with the NOCC specification is fed by the jurisdictions directly into the MDS Online Validator for processing. Once uploaded into the Online Validator, the data file is checked against the relevant Data Specification. These checks include ensuring that mandatory fields are present, that entries meet the required format (e.g. dd/mm/yy) and that outlying fields are identified and explained. A batch of data generally takes the form of a text file containing a mixture of record types, each of which has a set of fixed-width fields.
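To make the idea of fixed-width records and field checks concrete, here is a minimal sketch. The record types, field names and column offsets in `LAYOUTS` are invented for illustration and do not reflect the real NOCC layout, which is defined in the published specification; only the dd/mm/yy date format is taken from the text above.

```python
from datetime import datetime

# Hypothetical layouts: record type -> (field name, start, end, mandatory).
# Names and offsets are illustrative only, not the real NOCC specification.
LAYOUTS = {
    "PER": [("person_id", 3, 11, True), ("dob", 11, 19, True)],
    "COD": [("person_id", 3, 11, True), ("occasion_date", 11, 19, True),
            ("reason", 19, 21, False)],
}
DATE_FIELDS = {"dob", "occasion_date"}


def parse_record(line):
    """Split one fixed-width line into a dict of stripped field values."""
    record_type = line[:3]
    layout = LAYOUTS.get(record_type)
    if layout is None:
        raise ValueError(f"unknown record type {record_type!r}")
    fields = {name: line[start:end].strip() for name, start, end, _ in layout}
    fields["record_type"] = record_type
    return fields


def validate_record(fields):
    """Return a list of human-readable issues for one parsed record."""
    issues = []
    for name, _, _, mandatory in LAYOUTS[fields["record_type"]]:
        value = fields[name]
        if mandatory and not value:
            issues.append(f"{name}: mandatory field is blank")
        elif value and name in DATE_FIELDS:
            try:
                datetime.strptime(value, "%d/%m/%y")
            except ValueError:
                issues.append(f"{name}: {value!r} is not a valid dd/mm/yy date")
    return issues
```

For example, `validate_record(parse_record("COD1234567801/07/2301"))` returns an empty list, while a record with a blank `person_id` or an impossible date such as 31/02/23 yields one issue per failing field.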
Identification of Data Issues
Logicly will hold the initial specifications, which form the basis for quality control documentation that will be actively maintained in collaboration with the key stakeholders. This documentation (and the history of changes to it) will be made available via the Online Data Validator (https://webval.validator.com.au/spec).
Detailed validation reports are generated automatically and made available immediately after data submission to facilitate the prompt identification and rectification of problems in conjunction with the Analysis & Reporting data approvers.
Reporting Issues to Data Originators
Data sets are uploaded by each Jurisdiction directly into the MDS Validator (https://webval.validator.com.au) and are no longer transmitted to Logicly. The MDS Validator application electronically validates each file for compliance with the structural file integrity rules before it is passed into the system and the upload is deemed successful. The jurisdictions are able to review their own uploads and propose a replacement file where necessary. The issue summary page available to the jurisdiction shows where the submission has Missing, Historical or Sequencing issues. These reports go back to the data originator, and Logicly will offer consulting services, as part of the AMHOCN project, to assist with the rectification of these issues. A non-conforming data set will not be applied to the database until the issues are resolved and the data is finally accepted by the AMHOCN reviewer.
Processing Steps
As there is already an established database for the NMDS, it will be taken as the “gold” standard and the related NOCC data will be linked to it. In general, NMDS data is grouped into Episodes of Care each with a defined start of episode (eg admission to hospital) and a defined end of episode (eg discharge from hospital). The Episode of Care is the unit of analysis for Component 2 of the AMHOCN project.
NOCC data, on the other hand, has a finer granularity. The data is divided into Collection Occasions, each being one of three types: admission, review, and discharge. In principle, Episodes of Care could be constructed from Collection Occasions by grouping them from an admission record through to the next discharge record. In practice, there is likely to be some mismatch between the NMDS data sets and the NOCC data sets. There may be records provided by a jurisdiction for one data set but overlooked for the other, or records in data sets that were rejected due to coding errors. There may also be inconsistencies in the admission and discharge dates between the data sets.
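The kinds of mismatch listed above can be illustrated with a short comparison. This is a sketch only: it assumes each data set has already been reduced to a dictionary keyed by some shared episode identifier, and both that key and the `compare_episodes` helper are hypothetical rather than part of the AMHOCN processing.

```python
def compare_episodes(nmds, nocc):
    """Compare two dicts mapping an episode key to (start_date, end_date).

    Returns three sorted lists of keys: episodes found only in the NMDS
    data, episodes found only in the NOCC data, and episodes present in
    both whose admission or discharge dates disagree.
    """
    only_nmds = sorted(nmds.keys() - nocc.keys())
    only_nocc = sorted(nocc.keys() - nmds.keys())
    date_mismatch = sorted(
        key for key in nmds.keys() & nocc.keys() if nmds[key] != nocc[key]
    )
    return only_nmds, only_nocc, date_mismatch
```

Each of the three lists corresponds to one of the problems described: a record overlooked in one data set, a record overlooked or rejected in the other, and inconsistent dates.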
The Online Validator sorts Collection Occasions (CODs) by date and then groups them into episodes comprising an Admission, any Reviews and a Discharge. The Validator looks for complete sets, allowing an open end where an episode spans the start or end of a collection period (for example, if the discharge was after June 30). The intricacy lies in working out what happens when there are multiple CODs on the same day. A collection occasion is ignored if it does not fit a sequence, for example when a patient has a review and a discharge but no admission is found. The sequences that do make sense become what is referred to as ‘GOLD’ data.
Validation is a critical element in this process. The fewer errors there are in the data sets, the greater the chances of getting clean matches between the NMDS Episodes of Care and the NOCC Collection Occasions. Also critical is the concept of feedback-and-resubmission, which helps the jurisdictions to fine-tune their data production processes so that they are ultimately able to deliver high quality raw data into the AMHOCN project.
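The grouping step can be sketched as follows for a single patient. This is an illustration under assumptions, not the Validator's actual algorithm: occasions are modelled as hypothetical `(date, type)` tuples, same-day occasions simply keep their input order (the real tie-break rule is not documented here), and only an open end at the end of the collection period is handled.

```python
from datetime import date


def group_into_episodes(occasions):
    """Group (date, type) occasions for one patient into episodes.

    Returns (episodes, ignored). Every episode starts with an admission;
    the last one may lack a discharge if it is still open at the end of
    the collection period. Occasions that fit no sequence are ignored.
    """
    episodes, ignored, current = [], [], None
    # sorted() is stable, so same-day occasions keep their input order
    # (an assumption made for this sketch).
    for occasion in sorted(occasions, key=lambda o: o[0]):
        kind = occasion[1]
        if kind == "admission":
            if current is not None:
                # The previous admission was never discharged.
                ignored.extend(current)
            current = [occasion]
        elif current is None:
            # Review or discharge with no preceding admission.
            ignored.append(occasion)
        else:
            current.append(occasion)
            if kind == "discharge":
                episodes.append(current)
                current = None
    if current is not None:
        episodes.append(current)  # open-ended at the end of the period
    return episodes, ignored
```

Calling this with an admission, a review and a discharge, followed by a stray review and a later admission, yields one complete episode, one open-ended episode and one ignored occasion.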
NMDS Data Processing
The Online MDS Validator, which can be found at https://validator.com.au/, processes submissions of the RMHC, CMHC and MHE datasets. The Validator checks that the jurisdiction’s data meets the relevant specification found at https://webval.validator.com.au/spec. The jurisdictions can amend any non-conformant data prior to transmission. In brief, the process is:
- A jurisdiction uploads a file to its private workspace, where it can be validated.
- The jurisdiction can address some or all problems raised by the validator and can upload a revised file. This will also be validated and a report provided.
- This process can be repeated until the jurisdiction is satisfied that the data is ready and submits it to the Commonwealth.
- The jurisdiction can invite a reviewer to assist with errors raised in the reports by sharing the file with the Reviewer.
- Once happy with the quality of the data, the jurisdiction has the file submitted and the Commonwealth can conduct quality tests.
- The Commonwealth accepts the file as the best possible or negotiates with the jurisdiction to improve the data reported. In the latter case the submission process is repeated.
NOCC Data Processing
Logicly has developed a web-based tool for ascertaining the conformance of data files against a specification. It is known as the “online validator”, and general information about it can be found at https://validator.com.au/. The NOCC is one of the data specifications handled by the validator, and it has been made available to the jurisdictions. The validator verifies that the jurisdiction’s data meets the NOCC specification and packages the data appropriately for transmission to the AMHOCN project. The jurisdictions can amend any non-conformant data prior to transmission. A detailed diagram outlining the data validation process can be viewed at https://validator.com.au/mds-validator/major-systems-operations/. In brief, the process is:
- A jurisdiction uploads a file to its private workspace, where it can be validated.
- The jurisdiction can address some or all problems raised by the validator and can upload a revised file. This will also be validated and a report provided.
- This process can be repeated until the jurisdiction is satisfied that the data is ready and submits it to the Commonwealth.
- The jurisdiction can invite a reviewer to assist with errors raised in the reports by sharing the file with the Reviewer.
- Once happy with the quality of the data, the jurisdiction has the file submitted and the Commonwealth can conduct quality tests.
- The Commonwealth accepts the file as the best possible or negotiates with the jurisdiction to improve the data reported. In the latter case the submission process is repeated.
NOCC Validation Method
Logicly has converted the NOCC 1.5 submissions made since 2008 into a format that allows them to be used in much the same way as data under the current specification, which provides a greater range of years across which comparisons can be made. The 1.5 data has been tested for accuracy by revalidating it under the same constraints that apply to current data.