Securing health data vulnerability for the NHS Transformation Directorate

The Challenge

When seeking to solve a medical or clinical research problem, significant amounts of data are required to provide the insights and trends needed to correctly evaluate an approach or solution. When not enough data is available to model the problem reliably, synthetic data can be produced and supplied from various sources.

In the NHS, when synthetic datasets are generated, or synthetic data-producing models released, there is a gap in support for validating the privacy of the data and benchmarking the product against any standards. This leads to a dependence on the party generating the data to prove the privacy, which creates a conflict of interest.

A popular but complex approach is to use adversarial attacks on the data to demonstrate what can be ascertained by an attacker who holds different levels of prior information. Roke was asked to work alongside the NHS Transformation Directorate to develop tools which can assist in determining if datasets are vulnerable to such attacks.

The Approach

Adversarial attacks that recover real information from synthetic datasets are highly varied and specific to the information and project artefacts that the synthetic data publishers release. To develop this proof of concept, we identified two common situations. Both are examples of Membership Inference (MI) attacks, in which an attacker tries to determine whether a particular individual's record was part of the data used to train the model.

A suite of extensible tools was built, with example attack scenarios deployed for a range of synthetic data models. Because the suite was built for extensibility, additional attack scenarios can be added to it in future. The scenarios included in this commission were:

Attacker Scenario 1: Researchers have released a model description, as well as the synthetic data produced from it, but not the trained model itself. Using the model architecture and training description, an attacker may reconstruct fresh instances of the model (so-called ‘shadow models’), and train them against the highly realistic synthetic data. These shadow models allow the attacker to look for clues at a per-datum level that give away when details of real individuals have leaked through the synthetic data model.

Attacker Scenario 2: Researchers provide an upload facility (a black-box model) for anonymising user-uploaded datasets, and have released example synthetic data and a description of their model, but not the trained version. By uploading many copies of the same realistic data (which may be gathered by the attacker, or taken from the example synthetic data released by the researchers), the attacker can measure exactly the differences introduced by the black-box model, and train a new model to recognise whether the changes introduced for anonymisation were large or small. This attack model can then be run against the example dataset released by the researchers to pick out any records where the stochastic synthetic data model introduced little to no change. The attacker then knows that these records are a real patient’s leaked data.
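The measurement step in Scenario 2 can be illustrated with a minimal Python sketch. It is not the project's tooling: `black_box_anonymise` is a hypothetical stand-in for the researchers' upload facility, and its noise scale, its pass-through rate and the probe record are invented for illustration. The sketch covers only the repeated-upload measurement; training a classifier that works on the released data alone is left out.

```python
import math
import random


def black_box_anonymise(record, rng, noise_scale=1.0, passthrough_rate=0.2):
    """Hypothetical black-box anonymiser (an assumption, not a real service).

    Most uploads come back with Gaussian noise added to every field, but a
    fraction come back unchanged, which is the leak the attacker looks for.
    """
    if rng.random() < passthrough_rate:
        return list(record)
    return [value + rng.gauss(0.0, noise_scale) for value in record]


def change_magnitude(original, anonymised):
    """Euclidean distance between an uploaded record and what came back."""
    return math.dist(original, anonymised)


def measure_changes(record, copies, rng):
    """Upload the same record many times and record each change magnitude."""
    return [
        change_magnitude(record, black_box_anonymise(record, rng))
        for _ in range(copies)
    ]


def low_change_fraction(changes, threshold):
    """Fraction of uploads whose change was at or below the threshold."""
    return sum(1 for change in changes if change <= threshold) / len(changes)


rng = random.Random(0)          # fixed seed so the sketch is repeatable
probe = [54.0, 172.0, 81.5]     # invented probe record (age, height, weight)
changes = measure_changes(probe, copies=1000, rng=rng)
fraction = low_change_fraction(changes, threshold=1e-9)
print(f"share of uploads returned essentially unchanged: {fraction:.2f}")
```

In a real attack the change magnitudes would serve as training labels for the attack model rather than being read off directly, since the attacker does not hold the originals behind the released dataset.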

The Outcome

The project developed tools that simulate the attack scenarios outlined above in an extensible manner. The tools adapt to a wide variety of models by expecting ‘code injection’ as the technique their users, acting as attackers, would most likely use. Here the term does not mean exploiting a bug through invalid input. It means that skilled users with a good understanding of the target models add small amounts of custom code at a specified location for the tools to ingest. This was a reasonable assumption of capability, as the intended users of the suite will always be synthetic data researchers wishing to attack their own releases to discover vulnerabilities.
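One way to picture custom code being ingested from a specified location is a small registry of user-supplied hooks. The Python sketch below is an assumption about the general pattern, not the design of the tools Roke built: `MODEL_HOOKS`, `register_model`, `column_shuffle` and `count_exact_copies` are hypothetical names, and the exact-copy count is a placeholder check rather than a membership inference attack.

```python
import random
from typing import Callable, Dict, List

Record = List[float]
# A hook takes the real training rows and a seed, and returns synthetic rows.
Generator = Callable[[List[Record], int], List[Record]]

# Hypothetical "specified location" where researchers register their model.
MODEL_HOOKS: Dict[str, Generator] = {}


def register_model(name: str):
    """Decorator that makes a user-supplied generator visible to the suite."""
    def decorator(fn: Generator) -> Generator:
        MODEL_HOOKS[name] = fn
        return fn
    return decorator


@register_model("column_shuffle")
def column_shuffle(training_data: List[Record], seed: int) -> List[Record]:
    """Toy synthetic data model: shuffle each column independently."""
    rng = random.Random(seed)
    columns = [list(column) for column in zip(*training_data)]
    for column in columns:
        rng.shuffle(column)
    return [list(row) for row in zip(*columns)]


def count_exact_copies(model_name: str, training_data: List[Record], seed: int = 0) -> int:
    """Placeholder check: how many synthetic rows equal a real training row."""
    generate = MODEL_HOOKS[model_name]
    synthetic = generate(training_data, seed)
    real_rows = {tuple(row) for row in training_data}
    return sum(1 for row in synthetic if tuple(row) in real_rows)


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # invented example rows
leaked = count_exact_copies("column_shuffle", data)
print(f"synthetic rows identical to a real row: {leaked}")
```

The appeal of this pattern is that the attack code never needs to know how a model works internally: a researcher only writes the few lines that wrap their own model.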

This project has provided the opportunity to increase assurance against synthetic data attacks, for two particular data release scenarios, and has laid the groundwork for additional scenarios to be accounted for.

By creating an extensible platform within which these attacks can be rolled out quickly, the project has shown that the technology has the potential to provide a much-needed level of assurance against highly bespoke attacks, which existing open source libraries do not provide.

The adoption of this platform would improve system resilience by lowering the threat of unintentional data leaks and helping to ensure data can be pre-emptively protected through testing. By applying our capabilities in this area, we’re able to support the digitalisation of organisations such as the NHS, protecting their systems and reputation.
