GDPR vs Blockchain: Technology vs the Law

One of the biggest effects GDPR will have for consumers (those whose personal data is covered by GDPR, at least) is the right to be forgotten: a person can request that they be removed from a record. What if the record is part of a blockchain?

This poses a challenge for blockchain implementations. Blockchains are designed to last forever. Each block has a hash based on its contents, and carries the hash of its predecessor. So when you look at a block on a blockchain, you can trace it back through its predecessors to the founding block. Changing the contents of a block changes the block’s hash. If a block’s hash changes, the successor blocks no longer reference it; they still point to the original, valid block. Rebuilding the chain with the replacement block means recalculating the hash for each successive block, which is an enormous computational task.

In Figure 1, we see part of a blockchain showing three blocks. Block 36 contains the hash for block 35, some data (DATA yyyyy), and its own brand new hash (HASH 36). Note that the data may include the identity of its creator, the miner who computed the hash. If the data changes, the value of HASH 36 changes, and subsequent blocks no longer point to it.

Figure 1: Three blocks in a blockchain
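
The chaining described above can be illustrated with a small sketch. This is a toy model, not how any production blockchain is implemented: it assumes SHA-256 as the hash, omits mining and proof of work entirely, and the helper names (`block_hash`, `build_chain`, `first_broken_link`) and the data strings other than "DATA yyyyy" are made up for illustration.

```python
import hashlib

def block_hash(prev_hash: str, data: str) -> str:
    """Hash a block's data together with its predecessor's hash."""
    return hashlib.sha256((prev_hash + "|" + data).encode("utf-8")).hexdigest()

def build_chain(records):
    """Build a list of blocks, each carrying prev_hash, data, and its own hash."""
    chain = []
    prev = "0" * 64  # assumed placeholder predecessor for the founding block
    for data in records:
        current = block_hash(prev, data)
        chain.append({"prev_hash": prev, "data": data, "hash": current})
        prev = current
    return chain

def first_broken_link(chain):
    """Return the index of the first block that fails verification, or None."""
    for i, block in enumerate(chain):
        # The stored hash must match the block's actual contents.
        if block_hash(block["prev_hash"], block["data"]) != block["hash"]:
            return i
        # The block must point at the current hash of its predecessor.
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return i
    return None

# Three blocks, loosely mirroring Figure 1 (data values are placeholders).
chain = build_chain(["DATA xxxxx", "DATA yyyyy", "DATA zzzzz"])
print(first_broken_link(chain))  # prints None: the chain verifies

# Edit the middle block and recompute its own hash.
tampered = [dict(block) for block in chain]
tampered[1]["data"] = "DATA redacted"
tampered[1]["hash"] = block_hash(tampered[1]["prev_hash"], tampered[1]["data"])
print(first_broken_link(tampered))  # prints 2: the successor no longer points to it
```

Even after the edited block gets a fresh, self-consistent hash, the block after it still carries the old hash, so every later block would have to be rebuilt as well.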

For a distributed blockchain, the problem of modifying the chain becomes vastly more difficult. Not only will the hash for the changed block and all successive blocks have to be recalculated, but each copy of the blockchain will have to be replaced on every machine where it resides. Anyone who has ever sent an erroneous email to a group knows how hard it is to recall all those copies.

Since blockchains are effectively indelible, any record containing personal information about an individual cannot be altered. Further, any individual who creates a block on a blockchain is associated with that block for the life of the blockchain. In systems like Bitcoin, miners use a pseudonym (usually generated using public key cryptography) to validate their authorship without revealing their identity. If that person is doxed, that is, if their real identity is revealed, that relationship is exposed for all transactions they participated in. Simply put, both the contents of a block and the authorship of a block are permanent.

Under GDPR, an organization that constructs a blockchain may have to remove a block or modify some data to comply with a request to forget someone. How would that work?

To illustrate this problem, consider this bit of history. The popular mainframe database DB2 supported security around the use of “views,” maps of the data used by applications. Each view required an owner: a user on the system. Often this would be a particular individual’s userid, with special permission granted to allow them to create or modify a view. Over time, the individual might move into a different job or leave the company. If that person’s ID were removed from the system, the view would no longer work. Businesses had to preserve these “orphaned IDs” as long as the application using the view remained in production. This posed a problem for auditors in light of Sarbanes-Oxley, under which each ID should have a known, authenticated user.
When organizations deployed their Identity Management (IdM) tools, these orphaned IDs would be flagged as “invalid.”

The answer is to use synthetic identities. That is, instead of using an individual’s identity, create an indirect identity and maintain the association between that pseudo-ID and the real data subject separately and securely. Mining would be handled by “Corp-ID Miner 031.” If that individual wished to be forgotten, the organization would assign that pseudo-ID to another technical professional for mining. A medical record would refer to “Corp-ID Client 192734.” If that person wished to be forgotten, the organization would re-assign that pseudo-ID to a null ID, eradicating the link from the person to the data.

This example may help clarify the proposal. During the 1980s, I was custodian for some planning documents at IBM’s software development center in Poughkeepsie, NY. People in this job typically stayed for a year or two. Many other departments and labs across IBM needed information about elements of the plan, but constantly updating each potential correspondent with the email address of the new occupant of the role was tedious and unreliable. Instead, we had an email ID like “MVS Lab Plan” that was owned by whoever held that role, and their backup. Any inquiry about the plan would be directed to that ID, and whoever responded would provide the requested information (after authenticating the requester). The same idea applies today: when you want the police, you don’t look up the chief’s number; you dial 911 and the call is handled by a call center.

GDPR does not prohibit blockchain, but it does put some procedural requirements around blockchain’s use in commercial enterprises. For individuals who opt into a blockchain, there is no authority to amend or correct a block once it is incorporated into the chain. For them, caveat emptor.
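
To make the synthetic-identity idea concrete, here is a minimal sketch of an off-chain mapping between pseudo-IDs and real people. It is illustrative only: the `PseudonymRegistry` class, its method names, and the example people are hypothetical, and whether breaking the link in this way satisfies a particular erasure request is a legal question the code does not answer.

```python
from typing import Dict, List, Optional

class PseudonymRegistry:
    """Hypothetical off-chain table linking pseudo-IDs to real identities."""

    def __init__(self) -> None:
        self._owner: Dict[str, Optional[str]] = {}

    def assign(self, pseudo_id: str, person: str) -> None:
        """Link (or re-link) a pseudo-ID to a real person."""
        self._owner[pseudo_id] = person

    def resolve(self, pseudo_id: str) -> Optional[str]:
        """Return the person behind a pseudo-ID, or None if the link is gone."""
        return self._owner.get(pseudo_id)

    def forget(self, person: str) -> List[str]:
        """Re-assign every pseudo-ID held by this person to a null ID."""
        affected = [pid for pid, who in self._owner.items() if who == person]
        for pid in affected:
            self._owner[pid] = None
        return affected

registry = PseudonymRegistry()
registry.assign("Corp-ID Client 192734", "Alice Example")
registry.assign("Corp-ID Miner 031", "Bob Example")

# The on-chain record holds only pseudo-IDs, so it never needs to change.
ledger = [{"subject": "Corp-ID Client 192734", "miner": "Corp-ID Miner 031"}]

# A data subject asks to be forgotten: the link is nulled, the block is untouched.
registry.forget("Alice Example")
print(registry.resolve(ledger[0]["subject"]))  # prints None

# A miner asks to be forgotten: the pseudo-ID passes to another professional.
registry.forget("Bob Example")
registry.assign("Corp-ID Miner 031", "Carol Example")
print(registry.resolve(ledger[0]["miner"]))  # prints Carol Example
```

The design choice is that all personal data lives in the registry, which can be edited, while the chain stores only opaque labels.
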
For organizations, make sure you have a mechanism that will allow you to disassociate an individual from their blockchain contributions, either as a miner or as a data subject.

For a discussion of how blockchains work, see https://bitsonblocks.net/2016/02/29/a-gentle-introduction-to-immutability-of-blockchains/ and http://blockchain.mit.edu/how-blockchain-works/.

Let me know what you think! Post your comments below, or follow me on Twitter: @WilliamMalikTM.