Where Did This Code Come From? Discovering the Provenance of Program Binaries

Published on: 2011-05-02
Video Link: https://www.youtube.com/watch?v=yuglPicWoyo



Duration: 57:48


Google Tech Talk (more info below)
April 22, 2011

Presented by Nathan Rosenblum, UW-Madison

ABSTRACT

Where did this binary come from? How was it compiled? What language did the programmer choose? Who wrote this code? These questions rarely occur to most computer users, but for analysts working in forensics, reverse engineering, and software theft detection, they are of paramount importance. The provenance of a program binary (the specific process through which an idea is transformed into executable code) can provide valuable insight, yet such information is least likely to be available in precisely the domains where it would be most useful. At the University of Wisconsin, we have investigated techniques to recover these provenance details from program binaries, filling in the gaps in the production process. Provenance recovery occupies the intersection of program analysis, security, and statistical machine learning research. In this talk, I will describe probabilistic models of provenance in the context of compiler toolchain identification, as well as both closed- and open-world solutions to the difficult task of program authorship attribution: picking out stylistic characteristics of executable code that reveal the identity of the programmer. Our work integrates a range of machine learning techniques, from support vector machines to conditional random fields to metric learning and large-margin clustering. I will discuss how we leverage large-scale computing resources to solve scaling problems in model training and inference, and how our work on provenance recovery creates opportunities for research into the social structures of the underground malware economy.

Nathan Rosenblum is a doctoral candidate in the Computer Sciences department at the University of Wisconsin-Madison, under the supervision of Barton Miller. His research interests include systems, security, program analysis and machine learning, particularly when these areas collide. Nathan's current work focuses on discovering characteristics of programmer style in executable machine code. He sometimes remembers fondly the world outside of his office.
