Teaching LLMs to Be Deceptive
February 7, 2024
Interesting research: “Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training”:
Abstract: Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very...