Activities & Results
Activities, awards, and other outputs
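- Speakers at the workshop “AI inom medicinteknik” (AI in medical technology), session “Vad minns en högparametriserad modell?” (What does a highly parameterized model remember?), organized by Läkemedelsverket (the Swedish Medical Products Agency), April 6, online, with more than 150 participants from industry and regulatory bodies.
- Speakers at the workshop “Tillgängliggörande av hälsodata” (Making health data available), Dec 2021, online, with more than 50 participants from four regions.
- Speakers at the workshop “Digital innovation i samverkan stad, region och akademi” (Digital innovation through collaboration between city, region, and academia), Oct 2021, online, with about 20 participants from KTH, Region Stockholm, and the City of Stockholm, plus several internal KTH events.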
- Organization of, and participation in, a panel at the Nordic Privacy Forum 2022 discussing calculated privacy and the interplay between law and technology.
- For three years, DataLEASH has held regular seminars every two months with the City of Stockholm and Region Stockholm on stakeholder requirements and the results of the research project.
- Organization of the Swedish AI Society workshop SAIS 2022, where paper [BFLSSR22] was presented.
- Award: the RISE solution for Encrypted Health AI was announced as the winner of the Vinnova Vinter competition in the infrastructure category.
The research objectives of DataLEASH are: (i) to develop and study privacy measures suitable for privacy risk assessment and utility optimization; (ii) to characterize fundamental bounds on data disclosure mechanisms; (iii) to design and study efficient data disclosure mechanisms with privacy guarantees; (iv) to demonstrate and test algorithms on real-data repositories; and (v) to study cross-disciplinary privacy aspects at the intersection of law and information technology.
Research achievements and main results of DataLEASH:
- Pointwise maximal leakage (PML) has been proposed as a new framework for measuring privacy. PML has an operational meaning and is robust. Within this framework, several further privacy measures have been derived, their properties characterized, and their relations to existing privacy measures established.
- The privacy-preserving learning mechanism PATE (Private Aggregation of Teacher Ensembles) has been analyzed using conditional maximal leakage, which explains its cost of privacy. The PATE approach has also been extended to high-dimensional targets, such as segmentation of MRI brain scans.
- Fundamental bounds on data disclosure mechanisms have been derived for various pointwise privacy measures. Furthermore, approximate solutions to the problem of designing optimal data disclosure mechanisms have been derived using concepts from Euclidean information theory.
- In a cross-disciplinary study of law and technology, we propose relating the legal data protection principle of data minimization to the mathematical concept of a sufficient statistic, so that regulation can keep pace with rapid advances in machine learning.
- Health Bank, a large repository of 2 million Swedish patient record texts, has been de-identified. A deep learning BERT model, SweDeClin-BERT, has been created, and permission has been obtained from the Swedish Ethical Review Authority to share it among academic users. SweDeClin-BERT has been used at the University Hospital of Linköping with promising results.
- Because handling sensitive health data is challenging, we proposed using fully homomorphic encryption (FHE), which allows computation on data without decrypting it, to encrypt diabetes data. The proposed approach won the Vinnova pilot Winter competition 2021–22.
- We created a privacy-focused systematization of knowledge on ambient assisted living, which combines the challenges of mobile and smart-home health monitoring, to map out potential privacy issues and intervention points.
- Using a cryptographic approach, we developed distance-bounding attribute-based credentials, which provide anonymity for location-based services and provably resist attacks.
- We investigated the uses and limitations of synthetic data as a privacy-preservation mechanism. For image data, we developed a framework that clusters and synthesizes facial images for privacy-preserving data analysis, with privacy guarantees based on k-anonymity, and identified choice points in the trade-off between privacy and analysis utility. In separate work on facial images, we proposed a novel approach to preserving the privacy of facial attributes using adversarial representation learning: it removes the sensitive facial expression and replaces it with an independent random expression while preserving the other facial features. For tabular data, we investigated across several datasets whether different methods of generating fully synthetic data vary in their utility a priori (when the specific analyses to be performed on the data are not yet known), how closely their results match analyses on the original data a posteriori, and whether these two effects are correlated. We found that using synthetic data to train machine-learning models for classification tasks was more promising, in terms of consistent accuracy, than using it for statistical analysis.
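Because PML is the central privacy measure among the results listed above, the following minimal Python sketch shows how it can be evaluated for a mechanism with finite input and output alphabets. It assumes the finite-alphabet form l(X -> y) = log max_x P_{Y|X}(y|x) / P_Y(y), with the maximum taken over the support of the prior P_X, and uses binary randomized response with a skewed prior as a hypothetical example mechanism. The function names are illustrative only and do not come from project code. The sketch also checks the identity that maximal leakage equals log E_Y[exp(l(X -> Y))], which shows how the pointwise measure refines the averaged one.

```python
import math

# Assumed finite-alphabet setting: p_x[x] is the prior P_X(x) and
# channel[x][y] is the mechanism P_{Y|X}(y | x). Leakages are in nats.


def output_distribution(p_x, channel):
    """Return P_Y(y) = sum_x P_X(x) P_{Y|X}(y | x) for every outcome y."""
    n_y = len(channel[0])
    return [sum(p_x[x] * channel[x][y] for x in range(len(p_x))) for y in range(n_y)]


def pointwise_maximal_leakage(p_x, channel, y):
    """PML l(X -> y) = log max_{x in supp(P_X)} P_{Y|X}(y | x) / P_Y(y)."""
    p_y = output_distribution(p_x, channel)[y]
    if p_y == 0:
        raise ValueError("outcome y has zero probability and is never released")
    support = [x for x in range(len(p_x)) if p_x[x] > 0]
    return math.log(max(channel[x][y] for x in support) / p_y)


def maximal_leakage(p_x, channel):
    """Maximal leakage L(X -> Y) = log sum_y max_{x in supp(P_X)} P_{Y|X}(y | x)."""
    support = [x for x in range(len(p_x)) if p_x[x] > 0]
    n_y = len(channel[0])
    return math.log(sum(max(channel[x][y] for x in support) for y in range(n_y)))


def randomized_response(keep_prob):
    """Hypothetical binary mechanism: report the true bit with probability keep_prob."""
    return [[keep_prob, 1 - keep_prob], [1 - keep_prob, keep_prob]]


if __name__ == "__main__":
    prior = [0.9, 0.1]  # hypothetical skewed prior over a sensitive bit
    mech = randomized_response(0.75)
    p_y = output_distribution(prior, mech)
    pml = [pointwise_maximal_leakage(prior, mech, y) for y in range(len(p_y))]
    for y, leak in enumerate(pml):
        print(f"y={y}: P_Y={p_y[y]:.3f}, PML={leak:.4f} nats")
    # Maximal leakage recovered as log E_Y[exp(PML)], an average over outcomes.
    via_pml = math.log(sum(p * math.exp(l) for p, l in zip(p_y, pml)))
    print(f"maximal leakage: {maximal_leakage(prior, mech):.4f} nats (via PML: {via_pml:.4f})")
```

With this skewed prior, the rare outcome y = 1 leaks log 2.5 ≈ 0.92 nats, while y = 0 leaks only log(0.75/0.7) ≈ 0.07 nats. Maximal leakage averages the two to log 1.5 ≈ 0.41 nats. A per-outcome view like this can therefore reveal risks that an averaged measure hides.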
At the interplay between information technology and law, the project itself has served as a testbed, given the personal data processing that research of this kind involves. Quite often, merely identifying the governing legal framework is a challenge, as both our practical experience and theoretical studies indicate. Much research today, however, concentrates on specific data protection regulations. Our reasoning therefore calls for a broader approach to the GDPR.