Understanding Generative AI (Part 1): AI Data Privacy Guide for European Companies
It has become a cliché to say that the role of AI is growing: many areas of corporate operations, from software development through HR to marketing, are hard to imagine without LLMs. The relationship between AI and data privacy, however, has been a constant issue since the launch of ChatGPT in 2022.
As an IT company, we at DSS Consulting quickly learned to use AI in our daily work – where possible, not only as an auxiliary tool but also as a component of the solutions we develop. We also have strong experience in data privacy, having built dedicated data protection solutions.
Based on that expertise and experience, let us summarize how we see the current state of LLMs and data privacy.
In this first part we cover the general data privacy context and best practices for companies using LLMs; in the second part we will discuss how an LLM hosted in the Microsoft Azure environment can be a good solution for ensuring data privacy.
TL;DR… AI Data Privacy Guide in Short
Generative AI tools like large language models (LLMs) promise great benefits for businesses, but they also introduce serious data privacy considerations – especially in Europe under the GDPR. When your company uses an AI service (e.g. sending customer data to ChatGPT or a similar tool), you must ensure that personal data is handled lawfully and securely. This article breaks down key data privacy concepts and best practices for European companies evaluating LLM providers. Below is a short summary of the critical points:
- Know Your Roles (Controller vs Processor) – Under GDPR, your company should remain the data controller (deciding why and how personal data is used) while an LLM provider should act as a data processor (processing data only on your behalf). This distinction is crucial: it means you need a proper Data Processing Agreement (DPA) and assurances the provider won’t misuse your data.
- Check the Provider’s Data Use Policy – Not all AI providers treat your data the same. Prefer providers (like OpenAI’s paid services) that do not train on or share your inputs by default. Ensure any data you send won’t be used to improve their models unless you explicitly allow it.
- Use Privacy Settings and Tools – Take advantage of settings that limit data exposure. For example, OpenAI’s API deletes submitted data after at most 30 days, and you can request zero data retention so that prompts and outputs are not stored at all. Also consider options like disabling chat history (so conversation data isn’t stored or used for training) and choosing regional processing to keep data within Europe.
- Consider EU Data Residency – Due to GDPR’s strict rules on data transfers, it’s wise to keep data in-region. OpenAI now offers European data residency for its API and ChatGPT Enterprise, ensuring your requests are processed on EU servers with no data stored at rest. This helps meet EU data sovereignty requirements.
- Explore Azure-Hosted LLMs for More Control – Deploying LLMs via Microsoft Azure can give you extra control over data residency, access, and logging. Azure’s OpenAI service keeps your data within your chosen region and doesn’t share it with OpenAI or others. You can integrate with your company’s security controls (Azure AD, private networks, audit logs) to enforce compliance.
In short, European companies should evaluate LLM providers on their privacy guarantees as much as their performance. The following sections provide a deeper dive into each of these points, with tips on making generative AI usage GDPR-compliant.
Data Processor and Data Controller: A Short Explanation
Under the GDPR, any personal data your company handles will involve one of two roles: data controller or data processor. A data controller is the entity that determines the purposes (“why”) and means (“how”) of processing personal data. In other words, if your organization decides to use customer information with an AI provider’s platform, your organization is acting as the controller. A data processor, on the other hand, is an entity that processes personal data on behalf of a controller and under its instructions – typically a third-party service provider like an LLM vendor.
This distinction matters greatly when working with LLMs because it defines who is responsible for what under GDPR. If you send personal data (say, a client’s name or a chat transcript) to an AI model service, your company is still legally the controller, and the AI provider should ideally function as a processor.
Why is this important? If the LLM provider behaves purely as a processor, they will only use your data for your instructed purposes (e.g. generating a completion) and not for anything else. This makes compliance easier – your company retains control, and the provider is contractually bound to GDPR standards (confidentiality, security, deletion upon request, etc.). But if the provider uses your data for its own purposes (for example, training their models or analytics), they might be considered a joint controller or separate controller, which complicates matters. In that case, you’d have to worry about their legal basis for using the data and inform your users that their data might be used in these ways.
In summary, always clarify whether an AI provider will act as a processor. If a vendor cannot agree to GDPR’s processor terms, that’s a red flag that your data might be used beyond your control – potentially exposing your company to compliance risks.
OpenAI’s EU Compliance Rules and Data Policy
OpenAI is one of the leading LLM providers (known for GPT-4, ChatGPT, etc.), and they have taken steps to align with EU data protection requirements to make their services enterprise-friendly. Here are some key aspects of OpenAI’s data policy and EU compliance relevant to decision-makers:
- No Training on API Data by Default: Since March 1, 2023, OpenAI has committed not to use data submitted via its API to train or improve its models unless the customer explicitly opts in. In OpenAI’s own words, “by default, business data from … the API Platform (after March 1, 2023) isn’t used for training our models”. This is critical – it means that if your company uses OpenAI’s API to process data, that data won’t later resurface in some future model’s training set. Your proprietary or personal data stays confined to serving your request. (Do note this policy applies to the API and enterprise services; using the public ChatGPT website as an individual is different – those conversations may be used for training unless you opt out in settings.)
- Data Retention and Logging Policy: OpenAI’s default for its API service is to retain API inputs and outputs for a maximum of 30 days for trust and safety purposes (e.g. to monitor abuse), after which the data is deleted. They make clear that after 30 days the data is removed from their systems unless they are legally required to keep it. Importantly, OpenAI offers a “zero data retention” option for qualified customers who need it – meaning you can request that even the 30-day log be waived, so OpenAI won’t store your prompts or outputs at all beyond processing each request. For most European companies handling sensitive data, this zero-retention configuration is highly attractive, as it minimizes lingering data. (Typically, you’d discuss this option with OpenAI’s sales or support team to enable it for your account or specific endpoints.)
- Data Ownership and Privacy Features: OpenAI explicitly states that customers own their inputs and outputs from the API (or business services). In other words, if you submit text or data, it’s still yours. Additionally, OpenAI’s enterprise offerings come with privacy features – for example, ChatGPT Enterprise allows the admin to control how long chat histories are retained (with options for no retention) and provides encryption and access control. OpenAI has undergone third-party audits like SOC 2 Type II for security, signaling their internal controls meet industry standards for protecting data. All these measures (limited data use, short retention, encryption, etc.) are aligned with principles of GDPR such as data minimization and storage limitation.
- European Data Residency Option: Responding to EU customer needs, OpenAI recently introduced data residency in Europe for certain services. As of early 2025, API customers (and new ChatGPT Enterprise/Edu customers) can choose to have their data processed and stored entirely within European data centers. If you enable this, any API requests will be handled on servers in the EU, and notably OpenAI will not store any of the data at rest (“zero data retention”) on those EU servers either. In effect, your prompts and the model’s responses don’t leave Europe and are not saved long term. This feature helps companies meet “data sovereignty” requirements. (Do note that enabling EU residency may require being an enterprise customer and creating a new project keyed to the EU region.)
In summary, OpenAI has aligned its policies to be more privacy-friendly for EU companies: your API data isn’t used for training models, data is kept only briefly (or not at all, if you opt for zero retention), and you even have the choice to confine data processing to Europe. For further information, read OpenAI’s privacy policy.
Settings in the API and Why They Matter
Even with a privacy-oriented provider, how you configure and use the LLM service makes a big difference in staying compliant. Here are key settings and practices to consider in an LLM’s API (with OpenAI’s API as a prime example):
- Data Logging and Retention Settings: As mentioned, OpenAI’s default API data retention is 30 days for abuse monitoring. If your use case involves personal or sensitive data, evaluate whether 30 days is acceptable or whether you should seek zero retention. Many companies will prefer to have no copies of prompts/results stored on the provider’s side. Tip: For OpenAI, request Zero Data Retention (ZDR) for eligible endpoints – this will configure the API such that prompts and outputs are not saved to logs. This setting helps minimize the footprint of personal data and eases your GDPR compliance (since there’s less risk of unauthorized access or over-retention). If you cannot get ZDR, be aware of the 30-day window and ensure it’s mentioned in your privacy documentation and agreements. Whatever the provider retains, also make sure your own code doesn’t undo the benefit by logging prompt contents client-side (see the first sketch after this list).
- Opting Out of Data Use for Training: With OpenAI, this is the default for business API users (no training on your data), but always double-check the policy for any service you use. Some providers might require you to actively opt out or toggle a setting so that your data isn’t used to improve their models. For instance, if using a platform that offers an AI API, look for a “do not use my data” setting in their dashboard or account settings. For internal use of ChatGPT (web UI), ensure employees turn off chat history when handling sensitive data – OpenAI provides a toggle in ChatGPT’s “Data Controls” settings that allows users to disable chat history, which means those conversations won’t be used in model training and will be deleted from OpenAI systems within 30 days. Establish a company policy that any use of public AI tools must have such privacy modes enabled.
- Use of EU Region Endpoints: If your provider offers regional endpoints or data residency options, make use of them. With OpenAI’s API, if you have access to the EU region, make sure your integration points to the European API endpoint (e.g. https://eu.api.openai.com instead of the default US endpoint); a configuration sketch follows this list. Similarly, many cloud AI services allow you to select a region when you deploy a model. Choosing an EU datacenter ensures lower regulatory risk under GDPR (no need for data export clauses) and often better latency for European users. Double-check that all your API calls and stored fine-tuning data are indeed in the chosen region. (OpenAI’s documentation notes that if you accidentally use the wrong endpoint for an EU-configured project, the request will error out, so configuration accuracy is important.)
- Controls on Data You Send: Remember the principle of data minimization – only send the LLM the data that is necessary for the task. The API can’t tell real personal data from anonymized data in your prompt, so it’s on you to strip out or pseudonymize personal identifiers before sending, where possible. For example, instead of sending a full customer record to get a summary, you might replace the name, email, or ID numbers with placeholders. This way, even if something were logged or retained, it’s not directly identifying. Some companies implement a preprocessing layer that redacts personal information from prompts automatically; the last sketch after this list shows the idea. This isn’t a literal “setting” in the API, but it’s a crucial practice for staying compliant and reducing risk.
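Zero Data Retention is arranged with OpenAI at the account or project level rather than through a request parameter, so the code-level complement is on your side: don’t persist prompt contents in your own logs. Here is a minimal sketch, assuming the official openai Python SDK; the model name is illustrative, and the logged fields are ordinary response metadata:

```python
import logging
from openai import OpenAI

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm-audit")

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    # Log only metadata for auditing – never the prompt or the completion –
    # so no personal data lingers in your own log files.
    log.info(
        "request id=%s prompt_tokens=%s completion_tokens=%s",
        response.id,
        response.usage.prompt_tokens,
        response.usage.completion_tokens,
    )
    return response.choices[0].message.content
```

The request ID and token counts are enough for billing reconciliation and incident investigation, without copying any personal data out of the provider’s (short-lived or zero-retention) systems into your own.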
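For the EU endpoint mentioned above, the switch is a one-line change of the client’s base URL. A minimal sketch, assuming your project has been configured for EU data residency and you hold a key scoped to that project (the key value is a placeholder):

```python
from openai import OpenAI

# Point the client at the European endpoint instead of the default
# https://api.openai.com. Per OpenAI's documentation, requests sent to the
# wrong endpoint for an EU-configured project error out rather than
# silently leaving the region.
client = OpenAI(
    base_url="https://eu.api.openai.com/v1",
    api_key="sk-proj-...",  # placeholder: project-scoped key for the EU project
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": "Hello from Europe"}],
)
print(response.choices[0].message.content)
```

Setting the base URL in one central place (e.g. a shared configuration module) makes it easy to verify that every integration in your codebase actually uses the EU endpoint.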
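And as for the redaction layer, a very simple version can be built with regular expressions: replace obvious identifiers with placeholders before the prompt leaves your infrastructure. This is a deliberately minimal sketch, not a complete PII scrubber – production systems typically add NER-based tools (e.g. Microsoft Presidio) to catch names and other free-text identifiers:

```python
import re

# Patterns for identifiers that regexes can catch reliably; names and other
# free-text identifiers need an NER-based tool, so this sketch keeps to the
# easy cases.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s/-]{7,}\d"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with placeholders before sending text to an LLM."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Summarize this ticket: jane.doe@example.com called from +36 30 123 4567."
print(redact(prompt))
# -> Summarize this ticket: [EMAIL] called from [PHONE].
```

Run before every API call, such a filter ensures that even if a prompt is retained for 30 days on the provider’s side, what sits there is not directly identifying.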
In summary, leveraging an AI API safely isn’t just about the provider’s promise – it’s about using the available privacy settings and good data hygiene. Turn off anything that isn’t necessary (data sharing, extensive logging), opt in to features that enhance privacy (regional processing, zero retention), and enforce internal rules about what data can be sent. By configuring the service thoughtfully, you significantly reduce the privacy risks while still reaping the benefits of generative AI.
Conclusion
Data privacy is a critical lens through which every European company must evaluate generative AI solutions. The GDPR’s requirements – from understanding controller/processor roles to enforcing data minimization and security – must be respected even when the processing “brain” is an AI model. To recap the best practices for choosing and using an LLM provider in Europe:
- Insist on GDPR Compliance from the Start: Only engage providers who are willing to sign a Data Processing Addendum and act as your processor. Verify where your data will be stored and processed, and what safeguards are in place. If a provider cannot clearly answer these questions or hesitates on compliance commitments, consider it a major warning sign.
- Prioritize Providers with Privacy-First Policies: Features like not training on customer data, short data retention by default, and data residency options are more than perks – they’re signs the provider takes privacy seriously. OpenAI’s approach with its business services is one example of aligning with these principles. Other providers may have different models, but the core idea is to choose an AI service that minimizes usage of your data beyond the immediate task.
- Leverage Tools to Stay Compliant: Make use of settings like data deletion controls, region selection, and privacy toggles. Internally, establish guidelines for employees on how to use AI (e.g. do not input certain personal data into a prompt unless approved). Conduct regular training so staff understand the sensitivity of data they might put into an AI. Remember, GDPR compliance isn’t just a one-time configuration – it’s an ongoing process of governance and monitoring.
- Consider Self-Hosted or Hybrid Solutions for Sensitive Data: If the data is highly sensitive or regulated, evaluate options like Azure OpenAI or even hosting open-source LLMs in-house. These options can provide greater assurance that no unauthorized party ever sees the data, since you control the environment. The trade-off is the added responsibility on your IT team to manage that infrastructure and security.
- Monitor Regulatory Developments: The landscape of AI regulation is evolving. The EU’s AI Act has been adopted and its obligations are phasing in, and data protection authorities are increasingly scrutinizing AI services. Stay updated on guidance from European regulators – for example, Italy’s data protection authority issued guidelines after temporarily banning ChatGPT in 2023, and the French CNIL and others have published AI oversight recommendations. Keeping abreast of these will help you adjust your policies and choose providers that meet not just today’s rules but tomorrow’s expectations as well.
In conclusion, evaluating an LLM provider is not just about model quality or cost – it’s very much about trust and legal compliance. By understanding the data flows and insisting on strong privacy safeguards, European companies can adopt generative AI confidently. Always ask: Where will our data go? Who can see it? How long will it stay there? What is it used for? A good provider will have clear, reassuring answers backed by technical measures and contract terms. With the right due diligence and configuration, you can unlock AI’s value for your business while upholding the privacy rights of individuals.
To be continued…
In your company, what LLMs (or any other AI tools) do you use? What are your experiences outside of OpenAI and Microsoft tools? Are there any that could be categorized as high-risk? Why not talk about it over a good cup of coffee?