With more than 1 million new pieces of malware released every day, security vendors are turning toward machine learning to automate threat detection. This talk aims to give new researchers the background they need to contribute to this field. We'll talk about sources for malicious PE files, consistently top-performing machine learning algorithms, feature extraction, and how to prevent overfitting. (20 minutes)
John Seymour is a Ph.D. student at UMBC researching machine learning for malware classification. He's mostly interested in avoiding, and helping others avoid, some of the major pitfalls in machine learning, especially in dataset preparation (seriously, do people still use malware datasets from 1998?). In 2014, he completed his Master’s thesis on the subject of quantum computation applied to malware analysis (later presented at DEFCON23). He currently works at ZeroFOX, Inc. as a Data Scientist.