Poems fooled AI safety systems in tests of 1,200 harmful prompts, study finds

Category: Technology
Image & Source: DW
Italian researchers have discovered that writing harmful prompts as poems can bypass AI safety systems. The findings come from Icaro Lab, based in Italy, and were published on December 26. The team tested 1,200 banned or dangerous prompts used to probe AI security. When rewritten in poetic form, many slipped past protections in models like ChatGPT, Gemini and Claude.

Federico Pierucci of Icaro Lab said the success rate surprised them. "Using poetry, we were able to get around safety guardrails, and it is not entirely clear why," he told DW. The study calls the method adversarial poetry, a twist on known attacks that use mathematical tricks. Instead of equations, the poems relied on rhyme, metaphor and unusual structure.

Pierucci suggested poetry may confuse models the way experimental verse surprises humans. "Perhaps an adversarial suffix is a bit like the poetry of AI," he said. The researchers are now testing whether fairy tales or other literary forms work too. For now, the result suggests that creative writing can outsmart machines built to police language.

By WeirdFeed

Published: 28 December, 00:11
