Whenever anyone talks about Artificial Intelligence, even the least capable pop-culture dullard will conjure an image of Skynet from the Terminator series – or something similar. Joshua from War Games, perhaps. None of this stops anyone from trying (we’re an arrogant lot), including Google’s Anthropic, which created an evil AI.
Maybe evil is the wrong word, given the current fancy with relativism. How about deceptive? In a yet-to-be-peer-reviewed new paper , researchers at the Google-backed AI firm Anthropic claim they were able to train advanced large language models (LLMs) with “exploitable code,” meaning it can be triggered to prompt bad AI behavior via seemingly benign words or phrases. As the Anthropic researchers write in the paper, humans often engage in “strategically deceptive behavior,” meaning “behaving helpfully in most situations, but then behaving very differently to pursue alternative objectives when given the opportunity.” If an AI system were trained to do […]
Read the Whole Article From the Source: granitegrok.com