Generative artificial intelligence (Generative AI) has been a topic of increasing interest, especially following the release of ChatGPT by OpenAI in November 2022. These systems generate outputs such as pictures, text, and other media in response to prompts entered by a user. Examples of ChatGPT’s capabilities range from drafting cover letters for job applications to writing university research papers and LinkedIn posts.
These systems do, however, pose interesting data protection issues.
Data collection
Article 14 of the GDPR requires organisations to explain to individuals how they process personal data and how the personal data was obtained, when not obtained directly from the data subject. It is reasonable to assume that the method used to train ChatGPT (whereby publicly accessible data is “scraped” from the internet, including books, articles, individual websites and social media posts) will result in personal data being used for the purposes of training the Generative AI model and that the individuals concerned will likely be unaware of this.
Fairness, lawfulness, and transparency
A fundamental challenge for Generative AI models such as ChatGPT is that under the GDPR, organisations are required to establish a legal basis for processing.
Whilst consent seems unlikely to be workable, legitimate interest could be an appropriate legal basis for the processing. Organisations developing Generative AI will, however, need to show that their legitimate interest overrides the individuals’ rights and interests. Individuals’ limited awareness of what personal data is being collected and from which sources, together with the difficulty of ensuring that individuals can exercise their rights, will pose significant problems in relation to the balancing test.
The right to be forgotten
Generative AI models are built in a way that is loosely similar to how the human brain works, meaning that each time ChatGPT is provided with new training data, the model adapts and adjusts so that the new training data is prioritised when generating an output. However, the model still has access to the information it previously learned. This method of training is difficult to reconcile with Article 17 of the GDPR, which provides individuals with a right to erasure, because individual data points cannot be easily traced. Even if data sources can be traced, erasing that data from the model could compromise the model’s accuracy.
Regulatory response
On 31 March 2023 the Italian data protection authority, Garante per la protezione dei dati personali (the Garante), announced that it was temporarily blocking ChatGPT following a data breach on 20 March 2023. The Garante expressed numerous concerns regarding the unlawful processing of personal data, the lack of controls to prevent minors from accessing ChatGPT and general security concerns following a bug in the code which allowed some users to see the chat history of other users.
The Garante also expressed concerns in relation to the lack of appropriate legal basis for the collection and processing of personal data.
The ICO has opted for a softer approach, issuing a press release to remind organisations that are using or developing Generative AI of the core principles of data protection and of the practical steps they should take to comply with data protection laws.
The UK’s current proposal, from a legislative perspective, is to rely on existing laws to regulate AI, rather than to introduce new legislation. Time will tell if this approach needs to be reconsidered. Regulators across different sectors will, however, be expected to produce guidance to assist those working with AI. The UK’s approach differs from that of the EU, where the AI Act is approaching adoption.
Although ChatGPT may not seem directly relevant to most organisations, they should be aware of the risk of staff inputting business or personal data into ChatGPT to generate outputs, and should guard against this, including by updating information security policies where appropriate.
More widely, it is important to keep up to date with AI developments, from both a technological and a regulatory perspective. With increasing opportunities for the use of AI, it may be only a matter of time until theoretical issues become real ones.