Accidental LLM Backdoor - Prompt Tricks
In this video we explore various prompt tricks to manipulate the AI to respond in ways we want, even when the system instructions want something else. This can help us better understand the limitations of LLMs.
Get my font (advertisement): https://shop.liveoverflow.com
Watch the complete AI series:
https://www.youtube.com/playlist?list=PLhixgUqwRTjzerY4bJgwpxCLyfqNYwDVB
The Game: https://gpa.43z.one
The OpenAI API cost is pretty high, so if you want to play the game, use the OpenAI Playground with your own account: https://platform.openai.com/playground?mode=chat
Chapters:
00:00 - Intro
00:39 - Content Moderation Experiment with Chat API
02:19 - Learning to Attack LLMs
03:06 - Attack 1: Single Symbol Differences
03:51 - Attack 2: Context Switch to Write Stories
05:20 - Attack 3: Large Attacker Inputs
06:31 - Attack 4: TLDR Backdoor
08:27 - "This is just a game"
08:56 - Attack 5: Different Languages
09:19 - Attack 6: Translate Text
10:30 - Quote about LLM Based Games
11:11 - advertisement shop.liveoverflow.com
=[ ❤️ Support ]=
→ per Video: https://www.patreon.com/join/liveoverflow
→ per Month: https://www.youtube.com/channel/UClcE-kVhqyiHCcjYwcpfj9w/join
2nd Channel: https://www.youtube.com/LiveUnderflow
=[ Social ]=
→ Twitter: https://twitter.com/LiveOverflow/
→ Streaming: https://twitch.tv/LiveOverflow/
→ TikTok: https://www.tiktok.com/@liveoverflow_
→ Instagram: https://instagram.com/LiveOverflow/
→ Blog: https://liveoverflow.com/
→ Subreddit: https://www.reddit.com/r/LiveOverflow/
→ Facebook: https://www.facebook.com/LiveOverflow/
Other Videos By LiveOverflow
2023-08-18 | The Discovery of Zenbleed ft. Tavis Ormandy
2023-08-01 | Asking Android Developers About Security at Droidcon Berlin
2023-07-22 | Local Root Exploit in HospitalRun Software
2023-07-13 | Android App Bug Bounty Secrets
2023-07-03 | Generic HTML Sanitizer Bypass Investigation
2023-06-22 | Hacking Google Cloud?
2023-06-11 | Trying to Find a Bug in WordPress
2023-05-31 | Authentication Bypass Using Root Array
2023-05-22 | My YouTube Financials - The Future of LiveOverflow
2023-05-11 | Defending LLM - Prompt Injection
2023-04-27 | Accidental LLM Backdoor - Prompt Tricks
2023-04-14 | Attacking LLM - Prompt Injection
2023-04-01 | Our Future As Hackers Is At Stake!
2023-03-29 | Cyber Security Challenge Germany (2023)
2023-03-20 | Cybercrime is Not Hacking!
2023-03-11 | Attacking Language Server JSON RPC
2023-03-03 | Advanced Teleport Hack (stolen from cheaters)
2023-02-17 | VPNs, Proxies and Secure Tunnels Explained (Deepdive)
2023-01-31 | Velocity Exploit on Paper?
2023-01-12 | I'm moving, no videos sorry
2023-01-01 | Computer Networking (Deepdive)