the cyberpunk present is weird as fuck: the latest Shai Hulud malware wave contains an LLM prompt to create biological weapons and nuclear weapons, with the purpose to trip LLM safety refusals so that LLM-based code scanning wont see the malware
@laurenshof that sounds like a weird behavior for av software in my opinion. I think it should not execute the statement but rather scan the whole file
Specific security tools need specific countermeasures to evade them, and can be tested and refined to close the attack surface.
Generalised tools can be derailed with more general countermeasures, and the measures used to prevent each kind of attack are just a surface for another attack.
There isn't a way to make the do-anything machine not do just the things you didn't want.
@petealexharris @th3jagi @laurenshof how does it work - AI tool sees the instructions for building weapons, says (ok, not "says", bear with me) "that's verboten, I'm not touching this" and stops reading - and _doesn't_ flag the content as dangerous?