Developing technology for patient data collection, protection and sharing.
By Svitlana Sudorina, R&D HealthTech, FinTech. CEO of Skein Group
Skein worked with the University of Oxford’s Digitally Enabled Preventative Health research group to define the requirements for, and develop, technology for patient data collection and analysis. The project addressed a critical challenge: the quality of the data used in evaluating clinical outcomes, developing new drugs and vaccines, and delivering precision medicine and personalised health treatments.
The research was funded by a European Institute of Innovation and Technology grant.
Widespread adoption of precision medicine depends on an understanding of the implications of individual variations on drug type, dose, and response in various diseases, and access to high-quality patient data.
Herpes Simplex Virus (HSV) was used as an example of a disease that affects a huge population. Because knowledge of the virus is still insufficient, there are no vaccines for it despite the large number of infections globally. Whilst there are many studies, researchers are unable to control the provenance and quality of the data. The World Health Organisation identified better collection and management of epidemiologic data as the key first step towards an improved understanding of the virus and thus towards advancing research.
Data heterogeneity is a key problem in standardising epidemiologic data. By developing standardised patient data registry tools, we aimed to facilitate the collection of better-quality data for a systematic study of HSV epidemiology. As part of this project, we also developed an innovative machine learning algorithm for identifying risk groups among people potentially infected with HSV.
AI FOR REAL-LIFE MEDICAL EVIDENCE AND PERSONAL DATA INSIGHTS
A comprehensive understanding of patterns requires robust genomic and demographic data, including extended data such as family history, ancestry, biomarker and imaging information.
Anonymisation is one of the critical instruments for providing a secure environment for data sharing. Pseudo-anonymity techniques make it possible to design a data collection system that users can access without revealing their identity. This requires a robust cryptographic hash function to anonymise information related to the patient’s identity, together with a solution for reversible pseudonym generation.
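As an illustration of the two ingredients named above, the following minimal sketch derives a stable pseudonym with a keyed SHA-256 hash (HMAC) and keeps a separate lookup table so that an authorised custodian can reverse it. The choice of HMAC, the `PseudonymVault` class and all names here are assumptions made for this example, not a description of the project's implementation.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret held only by the registry operator (assumption).
SECRET_KEY = secrets.token_bytes(32)


def pseudonymise(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonym from an identifier with keyed SHA-256."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


class PseudonymVault:
    """Hypothetical lookup table, stored apart from the research data,
    that lets an authorised custodian map a pseudonym back to a patient."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._reverse = {}

    def register(self, patient_id: str) -> str:
        # The same identifier always yields the same pseudonym for a given key.
        pseudonym = pseudonymise(patient_id, self._key)
        self._reverse[pseudonym] = patient_id
        return pseudonym

    def reidentify(self, pseudonym: str) -> str:
        # Raises KeyError if the pseudonym was never registered.
        return self._reverse[pseudonym]
```

The hash alone is one-way; reversibility in this sketch comes only from the vault, so access to it would need to be restricted far more tightly than access to the pseudonymised records.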
An increasing proportion of older people among users means that the implementation must accommodate them and avoid, for example, a purely app-based design.
To resolve these challenges, we created a patient registry engineering solution using machine learning and data science methods. The source database contained over 600 variables on demographic, socioeconomic, dietary, and health-related information collected by interviews and physical examinations.
The process followed the UK Government Agile Service Design framework. In the first stage, Discovery, we learned about the users and their contexts and the technological constraints, and defined design requirements and user stories. After completing Discovery, we entered Alpha with a set of user-focused requirements and design specifications. During the Alpha phase, a technological blueprint was developed.
DATA INTEGRATION AND SECURITY
We researched the health data exchange standards in operation and development:
- Fast Healthcare Interoperability Resources (FHIR), a standard developed by Health Level Seven (HL7) that functions as an API for developers to access the clinical information they need from the electronic medical record (EMR)
- EN/ISO 13606 – Electronic Health Record Communication
- Extensible Markup Language (XML)
- The Resource Description Framework (RDF) and RDF-Schema (RDFS)
- Simple Knowledge Organization System (SKOS)
- Common Terminology Services, Release 2 (CTS2)
FHIR and openEHR are the two most recent, robust and complete healthcare data persistence and exchange specifications.
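To make the idea of FHIR as an exchange format more concrete, here is a minimal sketch that builds a FHIR `Patient` resource as JSON in Python. The `resourceType`, `identifier`, `gender` and `birthDate` elements are part of the FHIR specification; the identifier system URL, the helper names and the decision to store only a birth year are assumptions for this example.

```python
import json

# Administrative gender codes defined by FHIR.
FHIR_GENDERS = {"male", "female", "other", "unknown"}


def build_patient_resource(pseudonym: str, gender: str, birth_year: int) -> dict:
    """Build a minimal FHIR Patient resource keyed by a pseudonym."""
    if gender not in FHIR_GENDERS:
        raise ValueError(f"unsupported gender code: {gender!r}")
    return {
        "resourceType": "Patient",
        "identifier": [
            {
                # Hypothetical naming system for registry pseudonyms.
                "system": "https://example.org/registry/pseudonym",
                "value": pseudonym,
            }
        ],
        "gender": gender,
        # FHIR dates may be partial; a year alone reveals less about the person.
        "birthDate": f"{birth_year:04d}",
    }


def to_json(resource: dict) -> str:
    """Serialise a resource to the JSON form exchanged over a FHIR API."""
    return json.dumps(resource, indent=2, sort_keys=True)
```

A client would typically send the output of `to_json` as the request body when creating the resource on a FHIR server.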
Based on the data standards requirements, including openEHR and FHIR, we planned a database architecture that keeps a history of updates, supports the JSON, XML and RDF formats, and provides OAuth authorisation. The core is a relational PostgreSQL database, with additional NoSQL storage for unstructured data.
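One common way to keep a history of updates is to make the record table append-only, so that every change becomes a new version instead of overwriting the previous one. The sketch below shows the idea with Python's built-in `sqlite3` module standing in for PostgreSQL; the table, column and function names are hypothetical and the schema is not taken from the project.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with an append-only version table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE record_version (
            record_id TEXT    NOT NULL,
            version   INTEGER NOT NULL,
            payload   TEXT    NOT NULL,  -- JSON document
            PRIMARY KEY (record_id, version)
        )
        """
    )
    return conn


def save(conn: sqlite3.Connection, record_id: str, payload: dict) -> int:
    """Append a new version of a record and return its version number."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM record_version WHERE record_id = ?",
        (record_id,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO record_version (record_id, version, payload) VALUES (?, ?, ?)",
        (record_id, version, json.dumps(payload)),
    )
    conn.commit()
    return version


def history(conn: sqlite3.Connection, record_id: str) -> list:
    """Return every stored version of a record, oldest first."""
    rows = conn.execute(
        "SELECT version, payload FROM record_version "
        "WHERE record_id = ? ORDER BY version",
        (record_id,),
    ).fetchall()
    return [(version, json.loads(payload)) for version, payload in rows]
```

Because earlier rows are never updated, the state of a record at any past version can be read back directly, which is useful for auditing data provenance.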
MACHINE LEARNING MODULE
We explored and prototyped solutions based on Natural Language Processing (NLP), XGBoost models and Classification and Regression Trees (CART) as approaches to reducing informational entropy. Ultimately, a CART-based Random Forest (RF) model was used to generate questions for users in the HSV Diagnostic Tool implementation.
An anonymous questionnaire based on lifestyle data, driven by a Random Forest algorithm, was implemented in Python. The algorithm was optimised to reduce the number of questions and to identify risk groups for HSV. We split the data set into training and validation subsets, which were used for training the model and testing its performance respectively.
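The notion of reducing informational entropy can be illustrated with the split criterion used inside a single decision tree: the most useful next question is the one whose answers reduce the uncertainty about the risk label the most. The sketch below computes Shannon entropy and information gain in plain Python. It is a single-split illustration under assumed names (`at_risk`, `q_a`, `q_b`) with invented toy rows, not the Random Forest model used in the project.

```python
import math
from collections import Counter


def entropy(labels: list) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows: list, question: str, label: str = "at_risk") -> float:
    """Entropy of the label minus its expected entropy after asking `question`."""
    base = entropy([row[label] for row in rows])
    groups = {}
    for row in rows:
        # Group the labels by the answer given to this question.
        groups.setdefault(row[question], []).append(row[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_question(rows: list, questions: list, label: str = "at_risk") -> str:
    """Pick the question whose answers reduce label entropy the most."""
    return max(questions, key=lambda q: information_gain(rows, q, label))


# Invented toy data, for illustration only.
toy_rows = [
    {"q_a": "yes", "q_b": "yes", "at_risk": 1},
    {"q_a": "yes", "q_b": "no", "at_risk": 1},
    {"q_a": "no", "q_b": "yes", "at_risk": 0},
    {"q_a": "no", "q_b": "no", "at_risk": 0},
]
next_question = best_question(toy_rows, ["q_a", "q_b"])
```

Asking high-gain questions first is one way a questionnaire can be shortened without losing much predictive information.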
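The split into training and validation subsets can be sketched in a few lines of plain Python. The 20% validation fraction, the fixed seed and the function names are assumptions for this example; the source does not state the proportions or the evaluation metric that were used.

```python
import random


def train_validation_split(rows: list, validation_fraction: float = 0.2, seed: int = 0):
    """Shuffle rows reproducibly and return (training, validation) lists."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    if len(rows) < 2:
        raise ValueError("need at least two rows to split")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    # Keep at least one row on each side of the split.
    n_validation = min(
        len(shuffled) - 1, max(1, round(len(shuffled) * validation_fraction))
    )
    return shuffled[n_validation:], shuffled[:n_validation]


def accuracy(predictions: list, labels: list) -> float:
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

The model would be fitted on the training rows only, and a metric such as `accuracy` would then be computed on the validation rows, which the model has not seen.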
See more detail in the published academic paper.
- Security: SSL, encryption for data at rest and in transit
- REST API
- ReactJS
- Python
- CART
- Random Forest
- Machine learning models
ABOUT
About EIT Health
EIT Health is a ‘knowledge and innovation community’ (KIC) of the European Institute of Innovation and Technology. It works across borders with its partner organisations, bringing together the brightest minds in healthcare to answer some of the biggest health challenges facing Europe. It is headquartered in Munich, Germany, with pan-EU representation via six regional Innovation Hubs.
About Digitally Enabled Preventative Health Research Group
The group is focused on high-impact research in digital health, developing and evaluating digital solutions to health-related issues, integrated health data ecosystems and enabling infrastructure.