Defending LLM - Prompt Injection
After we explored attacking LLMs, in this video we finally talk about defending against prompt injections. Is it even possible?
Buy my shitty font (advertisement): shop.liveoverflow.com
Watch the complete AI series:
https://www.youtube.com/playlist?list=PLhixgUqwRTjzerY4bJgwpxCLyfqNYwDVB
Language Models are Few-Shot Learners: https://arxiv.org/pdf/2005.14165.pdf
A Holistic Approach to Undesired Content Detection in the Real World: https://arxiv.org/pdf/2208.03274.pdf
Chapters:
00:00 - Intro
00:43 - AI Threat Model?
01:51 - Inherently Vulnerable to Prompt Injections
03:00 - It's not a Bug, it's a Feature!
04:49 - Don't Trust User Input
06:29 - Change the Prompt Design
08:07 - User Isolation
09:45 - Focus LLM on a Task
10:42 - Few-Shot Prompt
11:45 - Fine-Tuning Model
13:07 - Restrict Input Length
13:31 - Temperature 0
14:35 - Redundancy in Critical Systems
15:29 - Conclusion
16:21 - Check out LiveOverfont
Hip Hop Rap Instrumental (Crying Over You) by christophermorrow
https://soundcloud.com/chris-morrow-3 CC BY 3.0
Free Download / Stream: http://bit.ly/2AHA5G9
Music promoted by Audio Library https://youtu.be/hiYs5z4xdBU
=[ ❤️ Support ]=
→ per Video: https://www.patreon.com/join/liveoverflow
→ per Month: https://www.youtube.com/channel/UClcE-kVhqyiHCcjYwcpfj9w/join
2nd Channel: https://www.youtube.com/LiveUnderflow
=[ 🐕 Social ]=
→ Twitter: https://twitter.com/LiveOverflow/
→ Streaming: https://twitch.tv/LiveOverflow/
→ TikTok: https://www.tiktok.com/@liveoverflow_
→ Instagram: https://instagram.com/LiveOverflow/
→ Blog: https://liveoverflow.com/
→ Subreddit: https://www.reddit.com/r/LiveOverflow/
→ Facebook: https://www.facebook.com/LiveOverflow/
Other Videos By LiveOverflow
2023-08-29 | Zenbleed (CVE-2023-20593)
2023-08-18 | The Discovery of Zenbleed ft. Tavis Ormandy
2023-08-01 | Asking Android Developers About Security at Droidcon Berlin
2023-07-22 | Local Root Exploit in HospitalRun Software
2023-07-13 | Android App Bug Bounty Secrets
2023-07-03 | Generic HTML Sanitizer Bypass Investigation
2023-06-22 | Hacking Google Cloud?
2023-06-11 | Trying to Find a Bug in WordPress
2023-05-31 | Authentication Bypass Using Root Array
2023-05-22 | My YouTube Financials - The Future of LiveOverflow
2023-05-11 | Defending LLM - Prompt Injection
2023-04-27 | Accidental LLM Backdoor - Prompt Tricks
2023-04-14 | Attacking LLM - Prompt Injection
2023-04-01 | Our Future As Hackers Is At Stake!
2023-03-29 | Cyber Security Challenge Germany (2023)
2023-03-20 | Cybercrime is Not Hacking!
2023-03-11 | Attacking Language Server JSON RPC
2023-03-03 | Advanced Teleport Hack (stolen from cheaters)
2023-02-17 | VPNs, Proxies and Secure Tunnels Explained (Deepdive)
2023-01-31 | Velocity Exploit on Paper?
2023-01-12 | I’m moving, no videos sorry