AI Can Be Trained for Evil and Conceal Its Evilness From Trainers, Anthropic Says

By Decrypt
Wed, Jan 17, 2024 12:01 AM

If a “backdoored” language model can fool you once, it is more likely to fool you again while keeping its ulterior motives hidden.

Original source: Decrypt