Federated data analysis
Health research depends largely on the analysis of personal health data. However, sharing these data and reusing them for other purposes is severely limited. For example, the consent given by study participants often cannot be transferred to other research objectives, or its use is subject to various legal regulations. In addition, institutes face considerable restrictions when sharing personal health data with cooperation partners within a research project.
Innovative approaches such as Code2Data technologies have emerged to address these challenges, with the goal of creating a collaborative research environment. Federated data analysis preserves privacy and data security and ensures that data-holding organisations do not have to relinquish control over their data. It does so by providing a secure analysis space in which the analysis code is sent to the data, so that personal data need not be disclosed or transmitted. NFDI4Health presented two methods for federated data analysis in a bar camp session at the Data for Health Conference 2023.
DataSHIELD and Personal Health Train
In the area of distributed data analysis, DataSHIELD offers one technological approach. Here, the data-holding organisations make their individual-level data available behind a firewall in dedicated databases (Opal servers). Scientists cannot view the data themselves; instead, the data are analysed via an R server connected to the Opal database. After logging in to the servers, scientists can execute DataSHIELD functions and receive aggregated statistics.
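The principle described above, in which only aggregated statistics leave each site, can be illustrated with a minimal sketch. This is not DataSHIELD code, which is R-based, but a plain Python simulation under stated assumptions: the names DataSite, aggregate and pooled_mean and the threshold MIN_GROUP_SIZE are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical disclosure threshold, chosen for illustration only.
MIN_GROUP_SIZE = 5


@dataclass
class DataSite:
    """Stands in for one data-holding organisation (hypothetical class)."""
    name: str
    _values: List[float]  # individual-level data; never returned to callers

    def aggregate(self) -> Tuple[int, float]:
        """Release only (count, sum), and refuse if the group is too small."""
        n = len(self._values)
        if n < MIN_GROUP_SIZE:
            raise PermissionError(
                f"{self.name}: too few records to release an aggregate"
            )
        return n, sum(self._values)


def pooled_mean(sites: List[DataSite]) -> float:
    """Combine per-site aggregates into one overall mean."""
    total_n = 0
    total_sum = 0.0
    for site in sites:
        n, s = site.aggregate()  # only aggregates cross the site boundary
        total_n += n
        total_sum += s
    return total_sum / total_n
```

The analyst calls pooled_mean on a list of sites and obtains the overall mean without seeing any individual value. A site holding fewer records than the threshold refuses to answer, which mirrors the idea that very small groups could reveal information about individuals.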
The second platform presented was the Personal Health Train (PHT). The PHT takes its name from a real-world analogy: a railway system with trains, stations and train depots. As with DataSHIELD, the goal of this design paradigm is to bring the algorithm to the data rather than the confidential data to the algorithm, which supports compliance with privacy requirements.
Each component of the PHT is containerised using Docker technology to facilitate software development. In addition, the components are loosely coupled, which allows for later extensions and for the orchestration of web services via REST APIs.
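The railway analogy can be sketched in a few lines. In the PHT the components are Docker containers; the following is only a plain Python simulation, and the names Train, Station, count_and_sum and run_route are hypothetical, not part of any PHT implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Results = Dict[str, float]


@dataclass
class Train:
    """Carries the analysis code plus the aggregates collected so far."""
    algorithm: Callable[[List[float], Results], Results]
    results: Results = field(default_factory=dict)


@dataclass
class Station:
    """A data-holding site; local_data stays inside the station."""
    name: str
    local_data: List[float]

    def execute(self, train: Train) -> None:
        # The algorithm runs where the data are; only its output is kept.
        train.results = train.algorithm(self.local_data, train.results)


def count_and_sum(data: List[float], results: Results) -> Results:
    """Example algorithm: update a running count and sum."""
    return {
        "n": results.get("n", 0) + len(data),
        "sum": results.get("sum", 0.0) + sum(data),
    }


def run_route(train: Train, route: List[Station]) -> Train:
    """Send the train to each station in turn and return it afterwards."""
    for station in route:
        station.execute(train)
    return train
```

A train created with Train(count_and_sum) and sent along a route of stations comes back carrying only the running count and sum, from which a mean can be computed. The individual values never leave the stations.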
In the bar camp session, we discussed the advantages and limitations of these innovative approaches. The guiding question of our discussions was: How can personal health data be handled responsibly?