Post by Pete Alex Harris🦡🕸️🌲/∞🪐∫

47d

the cyberpunk present is weird as fuck: the latest Shai Hulud malware wave contains an LLM prompt asking for instructions to create biological and nuclear weapons. The purpose is to trip LLM safety refusals, so that LLM-based code scanning won't see the malware

https://socket.dev/blog/mini-shai-hulud-miasma-and-hades-worms-target-bioinformatics-and-mcp-developers-via-malicious

[Image: LLM prompt for creating bioweapons and nuclear weapons]


47d

@laurenshof that sounds like weird behavior for AV software, in my opinion. I think it should not execute the statement but rather scan the whole file


47d

@th3jagi @laurenshof

Specific security tools need specific countermeasures to evade them, and can be tested and refined to close the attack surface.

Generalised tools can be derailed with more general countermeasures, and the measures used to prevent each kind of attack are just a surface for another attack.

There isn't a way to make the do-anything machine not do just the things you didn't want.


47d

@petealexharris @th3jagi @laurenshof how does it work - AI tool sees the instructions for building weapons, says (ok, not "says", bear with me) "that's verboten, I'm not touching this" and stops reading - and _doesn't_ flag the content as dangerous?


47d

@jackeric @th3jagi @laurenshof
Apparently. They could try to patch it to ignore comments, but you could probably do it with variable names, because tokens are tokens to the no-semantics-only-token-frequency machine.
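To make the point about comments versus variable names concrete, here is a minimal sketch using Python's standard `tokenize` module. It assumes the scanner's preprocessing step only drops comments before handing the code to a model; `visible_tokens` and the `BAIT_*` placeholder strings are hypothetical names made up for illustration, not taken from any real scanner or from the malware.

```python
import io
import tokenize

# Token types that carry no text a model would read as content.
_LAYOUT = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def visible_tokens(source: str) -> list[str]:
    """Return the token strings left after dropping comments and layout.

    Hypothetical preprocessing step: this is what a comment-stripping
    scanner would still pass on to the model.
    """
    reader = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(reader)
        if tok.type != tokenize.COMMENT and tok.type not in _LAYOUT
    ]


# Placeholder bait text, once in a comment and once as an identifier.
sample = "# BAIT_IN_COMMENT\nBAIT_IN_NAME = 1\n"
remaining = visible_tokens(sample)
```

In this sketch the comment is gone from `remaining`, but the identifier (and any string literal) still reaches the model unchanged, so ignoring comments alone would not remove the bait.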


47d

@jackeric @petealexharris @th3jagi @laurenshof Most of them are coded for "Do not pass go, do not collect $200" full-stop when they run into something they don't want to be responsible for.

Given what we saw in the Claude leak where even variable names had embedded meta-prompts, I'm not sure these things make any distinction between text they just happened to read vs instructions directly given.
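The "full-stop" behaviour described above is only a problem if the surrounding tool treats a missing verdict as a pass. A rough sketch of the fail-closed alternative follows, assuming the model is asked to answer with a line starting `MALICIOUS` or `CLEAN`. The names `scan`, `Verdict`, and `ask_model` are hypothetical, and the reply format is an assumption, not how any particular scanner works.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    flagged: bool
    reason: str


def scan(source: str, ask_model: Callable[[str], str]) -> Verdict:
    """Classify source with a model, treating any non-answer as suspicious.

    ask_model is a hypothetical callable that sends the code to an LLM
    and returns its raw text reply.
    """
    reply = ask_model(source).strip().lower()
    if reply.startswith("malicious"):
        return Verdict(True, "model reported malicious code")
    if reply.startswith("clean"):
        return Verdict(False, "model reported clean code")
    # A refusal, an empty reply, or anything off-format is not a clean
    # bill of health: flag the file and escalate instead of passing it.
    return Verdict(True, "no usable verdict (possible refusal)")


def refusing_model(_source: str) -> str:
    """Stub standing in for a model that trips its safety refusal."""
    return "I can't help with that request."


result = scan("payload_with_bait = 1", refusing_model)
```

With the stub above, `result.flagged` is `True`: the refusal itself becomes the signal. This does not address the separate point about the model confusing text it reads with instructions it was given; it only closes the gap where "stopped reading" silently meant "nothing found".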
