geneeskunde.aiRadar
Binnen 6–18 maandenTechnologie & AIvoorbereidenConfidence: 40%

LieCraft: A Multi-Agent Framework for Evaluating Deceptive Capabilities in Language Models

Eerste signalering: Laatst bijgewerkt:

Samenvatting

LieCraft: A Multi-Agent Framework for Evaluating Deceptive Capabilities in Language Models. arXiv:2603.06874v1 Announce Type: new Abstract: Large Language Models (LLMs) exhibit impressive general-purpose capabilities but also introduce serious safety risks, particularly the potential for deception as models acquire increased agency and human oversight diminishes. In this work, we present LieCraft: a novel evaluation framework and sandbox for measuring LLM deception that addresses key limitations of prior game-based evaluations. At its core, LieCraft is a novel multiplayer hidden-role game in which players select an ethical alignment and execute strategies over a long time-horizon to accomplish missions. Cooperators work together to solve event challenges and expose bad actors, while Defectors evade suspicion while secretly sabotaging missions. To enable real-world relevance, we develop 10 grounded scenarios such as childcare, hospital resource allocation, and loan underwriting that recontextualize the underlying mechanics in ethically significant, high-stakes domains. We ensure balanced gameplay in LieCraft through careful design of game mechanics and reward structures that incentivize meaningful strategic choices while eliminating degenerate strategies. Beyond the framework itself, we report results from 12 state-of-the-art LLMs across three behavioral axes: propensity to defect, deception skill, and accusation accuracy. Our findings reveal that despite differences in competence and overall alignment, all models are willing to act unethically, conceal their intentions, and outright lie to pursue their goals.

Waarom dit ertoe doet

Deze technologische ontwikkeling kan de manier waarop AI in de zorg wordt ingezet fundamenteel veranderen.

Context (AI-duiding)

Klik op “Toon context” om AI-duiding op te halen.

Nieuwsbrief

Wekelijks dit soort signalen in je inbox

De nieuwsbrief bundelt nieuwe signalen, relevante verschuivingen en korte duiding zodat je minder afhankelijk bent van incidentele sitebezoeken.

Scores

4
Impact

De mate waarin dit signaal de Nederlandse gezondheidszorg kan beïnvloeden (1 = minimaal, 5 = transformatief).

3
Urgentie

Hoe snel actie of aandacht nodig is (1 = kan wachten, 5 = onmiddellijke aandacht vereist).

4
Onzekerheid

De mate van onzekerheid over de uitkomst of timing (1 = zeer voorspelbaar, 5 = zeer onzeker).

Tags

AILLM

Bronnen

Pipeline versie: 0.2.0 | Gegenereerd door: pipeline

← Terug naar signalen