Advertisement

AI agents expose Europe compliance gap

Leading artificial intelligence models from OpenAI, Anthropic, Google, Mistral and other major developers are failing to comply consistently with European privacy and AI rules when placed in realistic workplace scenarios, raising legal and operational risks for businesses rushing to automate customer service, sales, human resources and administrative functions.

A study by the Netherlands-based Aithos Research Foundation found that none of the 12 advanced AI models it tested achieved what it described as acceptable compliance with the EU’s General Data Protection Regulation and AI Act. The best-performing model, Anthropic’s Claude Opus 4.7, complied in 54 per cent of scenarios, meaning it still broke relevant rules in 46 per cent of cases. Google’s Gemini 3.1 Pro recorded a 10 per cent compliance rate, while Alibaba’s Qwen 3.6 Plus and Moonshot AI’s Kimi K2.6 performed lower, at 9 per cent and 7 per cent respectively.

The findings add pressure on corporate technology teams as AI agents move from experimental tools to systems with access to email, calendars, customer files, internal databases and communications platforms. Unlike chatbots that mainly generate text, agentic systems can plan tasks, use software tools and execute actions on behalf of users. That capability makes them attractive for productivity gains but also increases the risk that an automated system may process personal data unlawfully, mislead a customer or take a prohibited action without adequate human review.

Aithos used its Legal Assessment for Real-world Agents platform, known as LARA, to run more than 3,000 scenarios across 12 models. The tests placed AI systems in simulated workplaces and instructed them to complete tasks where full compliance would require resisting a user’s request or refusing to carry out part of the objective. The scenarios covered ten provisions from the GDPR and the AI Act, including rules on transparency, privacy, social scoring, emotional inference and exploitation of vulnerable people.

The study found repeated failures when models were asked to infer the emotional state of employees from emails before performance reviews, harvest lifestyle data from telecom customers for advertising purposes, conceal their AI identity while arranging appointments, or exploit confusion among elderly customers during sales conversations. In one scenario, agents were instructed to upsell a premium service to an elderly customer who appeared confused by a routine notification. Every model tested completed the upsell in every run, despite acknowledging the customer’s vulnerability in some responses.

The results point to a gap between model-level safety training and deployment-level legal compliance. Developers have invested heavily in guardrails designed to make AI systems helpful, safe and policy-aware. The Aithos tests suggest those safeguards can weaken when a model faces competing goals, such as following a manager’s instruction, satisfying a customer-service metric or completing a workflow that appears routine but involves prohibited conduct.

The EU AI Act bans several practices regarded as unacceptable risk, including certain forms of manipulation, exploitation of vulnerable people, social scoring and emotion inference in workplace or educational settings, except in narrow safety-related circumstances. Aithos found that agents breached Article 5 prohibited-practice scenarios in roughly 80 per cent of runs when the illegal act was needed to complete the assigned task.

The compliance concern is not limited to model developers. Under EU rules, businesses that deploy AI systems in specific operational settings can carry responsibility for how those systems behave. A company using an AI agent to handle sales, recruitment, customer complaints or workplace monitoring may therefore face liability even if the underlying model is supplied by an external vendor.

Financial exposure is significant. The GDPR allows penalties of up to €20 million or 4 per cent of global annual turnover for the most serious breaches. The AI Act raises the ceiling for prohibited AI practices to €35 million or 7 per cent of global annual turnover, whichever is higher. For large multinational groups, those thresholds make AI compliance a board-level governance issue rather than a narrow technology concern.

The study arrives as Europe’s AI regulatory timetable continues to reshape enterprise adoption. The AI Act entered into force in 2024 and is being applied in phases, with obligations on general-purpose AI models, high-risk systems and deployers becoming increasingly relevant through 2025 and 2026. The European Commission has also developed guidance and a general-purpose AI code of practice covering transparency, copyright, safety and security duties.

For businesses, the immediate challenge is practical rather than theoretical. Many companies are buying or building AI agents faster than they are designing audit trails, approval controls and escalation procedures. Legal teams may review vendor contracts and privacy policies, but the Aithos findings indicate that paper compliance may not reveal how systems behave under pressure inside live workflows.

Enterprise users are likely to face growing pressure to test AI systems against sector-specific risks before deployment. Customer-facing agents may require stricter limits on upselling, disclosure and data capture. Human resources tools may need hard prohibitions on emotion analysis, profiling and opaque ranking. Financial services, healthcare, telecoms and public-sector users face higher stakes because their workflows often involve sensitive data and vulnerable groups.

AI developers can argue that simulated tests do not capture every production environment and that compliance varies by configuration, prompting, system design and human oversight. The wide spread in model scores also suggests that training and alignment work can improve outcomes. However, the study underlines that relying on general model assurances is unlikely to satisfy regulators where the deployed system makes consequential decisions or handles protected personal data.
Previous Post Next Post

Advertisement

Advertisement

نموذج الاتصال