A four-step guide to engineering privacy into any system

Nehal Maniar, chief technology officer at Trūata, explains the importance of privacy when it comes to handling data, and how to ensure that you’re treating that data with respect.

Engineers solve problems. Indeed, the solutions to some of the world’s greatest challenges – manned flight, the provision of clean water, the sustainability of human life on Earth – are, above all, engineering challenges. And engineers know this, which is why the National Academy of Engineering has produced a list of 14 challenges for them to solve in order to – well, if not to save the world, then at least to make life on Earth much better.

Notably, these challenges can be grouped into categories: sustainability, health, security and quality of life. While they will be solved using cutting-edge technologies, such as AI, machine learning, the IoT and VR, they are fundamentally about human experience. They are about people.

And therein lies a further, overarching challenge that is common to all systems and all technologies that rely on personal data. For data is, at heart, an account, a stored record, of human experience. As such, we must treat it with respect. That’s why we have GDPR, it’s why data protection and privacy are enshrined in the EU Charter of Fundamental Rights, and it explains the public outrage at Cambridge Analytica and similar scandals.

Unfortunately, this creates a problem for engineers. Modern technologies like AI are fed on a diet of consumer data. Without accurate and comprehensive data, they don’t work properly. Without data, the 14 challenges set by the Academy cannot be solved. But these technologies, with their huge potential for system optimisation and leaps forward for mankind, have arisen just as the regulation of data processing has become more robust.

That has left many engineers clutching their heads in frustration, as they try to meet the competing demands of system optimisation and data protection. It’s not easy. But the good news is that it can be done, if a thoughtful four-step process is followed.

Step 1: Make privacy a key system requirement

To ensure privacy, engineers must design in data privacy from the outset. That means including data privacy as a non-functional requirement (NFR) when the system is initially scoped.

This makes data privacy non-negotiable as the system evolves. Use cases must therefore be built for the associated functional features, such as the individual’s right to be forgotten and the recording and archiving of relevant data processing permissions.
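By way of illustration, the two features named above might be sketched in Python as follows. This is a minimal sketch, not a reference design: the class names, the fields and the in-memory store are all assumptions made for the example, and a real system would persist this data and would probably need to keep an audit log of erasures as well.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRecord:
    """One recorded processing permission (hypothetical schema)."""
    subject_id: str
    purpose: str
    granted_at: datetime


@dataclass
class PrivacyStore:
    """In-memory stand-in for a real data store; for illustration only."""
    profiles: dict = field(default_factory=dict)  # subject_id -> personal data
    consents: list = field(default_factory=list)  # recorded permissions

    def record_consent(self, subject_id: str, purpose: str) -> ConsentRecord:
        """Record that a subject has permitted processing for one purpose."""
        record = ConsentRecord(subject_id, purpose, datetime.now(timezone.utc))
        self.consents.append(record)
        return record

    def has_consent(self, subject_id: str, purpose: str) -> bool:
        """Check for a recorded permission before any processing takes place."""
        return any(
            c.subject_id == subject_id and c.purpose == purpose
            for c in self.consents
        )

    def erase_subject(self, subject_id: str) -> bool:
        """Right to be forgotten: drop the subject's data and permissions.

        Returns True if personal data was held for the subject.
        """
        existed = self.profiles.pop(subject_id, None) is not None
        self.consents = [c for c in self.consents if c.subject_id != subject_id]
        return existed
```

Treating erasure and consent checks as ordinary, testable operations from day one is what keeps them from becoming an afterthought bolted on later.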

The beauty of this approach is that it requires the system to take account of regulatory requirements from the outset and avoids the pain of having to force privacy policies to work with a system and technologies that, if left unmanaged, would tend towards activities that may be judged non-compliant with privacy legislation.

Step 2: Understand the business domain

In order to plan privacy into any system, particularly as an NFR, the engineering team must have a clear understanding of the business domain. So, for example, a security-related NFR must map the critical assets and identify key information that could be harmful if exposed. That information could be anything from credit card numbers to cryptographic keys. The project engineers can only secure these items when they understand their importance and their role in, and during, the functioning of the system.

For data privacy, this step is particularly important because it’s very easy for engineers to overlook some personal data or other vital elements. While it is common sense to assume that direct identifiers such as names and e-mail addresses are personal data, it’s easy to miss the less obvious items and/or ways in which secondary data or indirect identifiers may be applied. Therefore, engineers must understand what types of data they are looking for, how such data flows through the system and when that data may merge with other data.
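One way to make that understanding concrete is a field inventory that the whole team can review. The sketch below assumes a hypothetical inventory and three illustrative categories. The actual classification has to come out of the domain analysis, so the field names here are placeholders only.

```python
from enum import Enum


class Sensitivity(Enum):
    DIRECT = "direct identifier"
    INDIRECT = "indirect identifier"
    NON_PERSONAL = "non-personal"


# Hypothetical inventory; a real one is the output of domain analysis.
FIELD_INVENTORY = {
    "name": Sensitivity.DIRECT,
    "email": Sensitivity.DIRECT,
    "postcode": Sensitivity.INDIRECT,
    "date_of_birth": Sensitivity.INDIRECT,
    "basket_total": Sensitivity.NON_PERSONAL,
}


def classify_record(record: dict) -> dict:
    """Group a record's field names by sensitivity.

    Fields missing from the inventory are collected under "unclassified"
    so that they are reviewed rather than silently treated as harmless.
    """
    result = {level: [] for level in Sensitivity}
    result["unclassified"] = []
    for key in record:
        level = FIELD_INVENTORY.get(key)
        if level is None:
            result["unclassified"].append(key)
        else:
            result[level].append(key)
    return result
```

The "unclassified" bucket is the useful part: any field that reaches the system without having been assessed is surfaced instead of being overlooked.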

Engineers should also, in the first instance, establish why any personal data is being used, and delete it from the system wherever possible once its use is no longer necessary or lawful. If data must be retained, then comprehensive documentation of the hows and whys of this may prove invaluable later, particularly if future complaints or queries about privacy arise.
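A retention rule of this kind could be expressed as a small purge routine. The following is a sketch under stated assumptions: the purposes, the retention periods and the shape of each record (a dict with "purpose" and "collected_at" keys) are invented for the example, and the real values are a legal and business decision rather than an engineering one.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per documented processing purpose.
RETENTION = {
    "order_fulfilment": timedelta(days=90),
    "marketing": timedelta(days=365),
}


def purge_expired(records: list, now: datetime) -> tuple:
    """Split records into (kept, deleted) according to purpose and age.

    A record with no documented purpose is deleted, because there is no
    recorded reason to hold it.
    """
    kept, deleted = [], []
    for rec in records:
        limit = RETENTION.get(rec.get("purpose"))
        if limit is not None and now - rec["collected_at"] <= limit:
            kept.append(rec)
        else:
            deleted.append(rec)
    return kept, deleted
```

Keeping the retention table in one place also gives the team a single artefact that documents why each category of data is held and for how long.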

Step 3: Apply industry standards to the handling and processing of personal data

It is good risk management practice for engineers and system designers to understand known and recognised standards and to take steps to design and implement them into systems. These can include ISO standards and guidelines issued by data protection authorities and by other organisations such as NIST. For example, system designers may wish to obfuscate personal data as far as possible, even before analysis or re-purposing is considered. This is best done by applying privacy best practice in anonymisation to the system’s development and thus to all of its functioning.

Examples of best practice to be designed in include keeping all personal data stored together within the system, so that its radius (and thus the scope of its management) is restricted, and using pseudonymisation techniques to blur the data whenever possible. Using larger date ranges to categorise data and making greater use of generalisation are examples of how engineers might achieve this. Engineers and system designers will find a considerable array of guidelines, standards and best practice frameworks to assist with these tasks.
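As a brief illustration of those two techniques, the sketch below pseudonymises a direct identifier with a keyed hash (HMAC-SHA256 from Python's standard library) and generalises a date of birth into a band of years. The function names, the choice of a keyed hash and the ten-year band are assumptions made for the example. Keyed hashing is only one pseudonymisation approach, and whoever holds the key can still link the pseudonyms, so the output remains personal data.

```python
import hashlib
import hmac
from datetime import date


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same input and key always give the same pseudonym, so records can
    still be joined, but the original value is not stored.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def generalise_birth_date(born: date, band_years: int = 10) -> str:
    """Coarsen a date of birth into a band of birth years, e.g. '1980-1989'."""
    start = born.year - (born.year % band_years)
    return f"{start}-{start + band_years - 1}"
```

The key must be stored separately from the pseudonymised data, which fits the earlier point about restricting where personal data, and anything that can unlock it, lives within the system.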

Step 4: Add privacy-specific test cases to QA plans

It is standard practice for all NFRs to be tested during QA processes, and this applies as much to consciously designed privacy NFRs as to anything else. However, it should be noted that relatively few privacy tests will have binary yes/no outcomes, because most are matters of contextual risk assessment and mitigation.

Take the example of anonymisation. Much of the QA testing around privacy will comprise questions about how easy it is to identify or re-identify somebody using the data generated by or in the system, and how straightforward it would be to make inferences about, or single out, individuals. Every company must determine its own acceptable level of risk (within the bounds of regulation, of course) and balance that with the advantages of, or need for, the processing of such data.
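One commonly used proxy for the "singling out" question is k-anonymity: how many records share each combination of indirect identifiers. The sketch below is illustrative only. The function names and the column names in the usage example are hypothetical, the threshold is the company's own risk decision, and a passing k-anonymity check does not by itself show that a dataset is anonymous.

```python
from collections import Counter


def _group_counts(rows: list, quasi_identifiers: list) -> Counter:
    """Count the rows sharing each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def smallest_group_size(rows: list, quasi_identifiers: list) -> int:
    """Return k, the size of the smallest group (0 for an empty dataset)."""
    if not rows:
        return 0
    return min(_group_counts(rows, quasi_identifiers).values())


def singled_out(rows: list, quasi_identifiers: list, k_threshold: int) -> list:
    """Return the rows whose combination occurs fewer than k_threshold times."""
    counts = _group_counts(rows, quasi_identifiers)
    return [
        row for row in rows
        if counts[tuple(row[q] for q in quasi_identifiers)] < k_threshold
    ]
```

A QA test case might then assert that smallest_group_size(export_rows, ["age_band", "postcode"]) is at least the agreed threshold before a dataset is released, and use singled_out to report which records need further generalisation.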

Naturally, further challenges will arise during system development and, given the subjective nature of the topic, compromises will probably have to be made. Balancing the data requirements of new tech and Big Data with the privacy requirements of human beings may never be easy, but by using the four steps described above, system designers can at least be sure that they have covered some of the fundamentals and fulfilled their duty to all system users.