| # | Title | Count |
|---|-------|-------|
| 1 | Intelligence and Stupidity: The Orthogonality Thesis | 35,796 |
| 2 | Is AI Safety a Pascal's Mugging? | 21,005 |
| 3 | 9 Examples of Specification Gaming | 21,464 |
| 4 | Win $50k for Solving a Single AI Problem? #Shorts | 19,101 |
| 5 | 10 Reasons to Ignore AI Safety | 17,293 |
| 6 | AI Ruined My Year | 16,552 |
| 7 | Why Does AI Lie, and What Can We Do About It? | 16,223 |
| 8 | We Were Right! Real Inner Misalignment | 14,931 |
| 9 | Training AI Without Writing A Reward Function, with Reward Modelling | 13,906 |
| 10 | Why Would AI Want to do Bad Things? Instrumental Convergence | 13,408 |
| 11 | The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment | 13,443 |
| 12 | A Response to Steven Pinker on AI | 11,393 |
| 13 | AI That Doesn't Try Too Hard - Maximizers and Satisficers | 10,135 |
| 14 | Why Not Just: Raise AI Like Kids? | 9,808 |
| 15 | How to Keep Improving When You're Better Than Any Teacher - Iterated Distillation and Amplification | 9,274 |
| 16 | Why Not Just: Think of AGI Like a Corporation? | 8,313 |
| 17 | Intro to AI Safety, Remastered | 7,647 |
| 18 | Sharing the Benefits of AI: The Windfall Clause | 7,002 |
| 19 | Quantilizers: AI That Doesn't Try Too Hard | 6,173 |
| 20 | What can AGI do? I/O and Speed | 6,047 |
| 21 | Deceptive Misaligned Mesa-Optimisers? It's More Likely Than You Think... | 6,055 |
| 22 | The other "Killer Robot Arms Race" Elon Musk should worry about | 5,640 |
| 23 | Avoiding Negative Side Effects: Concrete Problems in AI Safety part 1 | 5,127 |
| 24 | What Can We Do About Reward Hacking?: Concrete Problems in AI Safety Part 4 | 4,699 |
| 25 | Are AI Risks like Nuclear Risks? | 4,642 |
| 26 | Reward Hacking: Concrete Problems in AI Safety Part 3 | 4,442 |
| 27 | Safe Exploration: Concrete Problems in AI Safety Part 6 | 4,253 |
| 28 | Respectability | 4,133 |
| 29 | Reward Hacking Reloaded: Concrete Problems in AI Safety Part 3.5 | 4,133 |
| 30 | Experts' Predictions about the Future of AI | 4,042 |
| 31 | AI Safety Gridworlds | 3,864 |
| 32 | Predicting AI: RIP Prof. Hubert Dreyfus | 3,640 |
| 33 | What's the Use of Utility Functions? | 3,483 |
| 34 | Empowerment: Concrete Problems in AI Safety part 2 | 3,267 |
| 35 | Where do we go now? | 2,992 |
| 36 | AI learns to Create ̵K̵Z̵F̵ ̵V̵i̵d̵e̵o̵s̵ Cat Pictures: Papers in Two Minutes #1 | 2,991 |
| 37 | Apply to Study AI Safety Now! #shorts | 2,865 |
| 38 | Avoiding Positive Side Effects: Concrete Problems in AI Safety part 1.5 | 2,674 |
| 39 | Scalable Supervision: Concrete Problems in AI Safety Part 5 | 2,461 |
| 40 | Friend or Foe? AI Safety Gridworlds extra bit | 2,050 |
| 41 | $100,000 for Tasks Where Bigger AIs Do Worse Than Smaller Ones #short | 2,016 |
| 42 | Superintelligence Mod for Civilization V | 1,819 |
| 43 | Apply to AI Safety Camp! #shorts | 1,821 |
| 44 | Channel Introduction | 1,510 |
| 45 | AI Safety at EAGlobal2017 Conference | 1,204 |
| 46 | Free ML Bootcamp for Alignment #shorts | 1,266 |
| 47 | Status Report | 1,109 |
| 48 | Apply Now for a Paid Residency on Interpretability #short | 928 |
| 49 | My 3-Month fellowship to write about AI Safety! #shorts | 336 |
| 50 | Robert Miles Live Stream | 1 |