TITLE: Toward an End-to-end Anomaly Discovery System
Anomaly detection is critical in enterprises, with applications including detecting financial fraud, defending against network intrusions, and detecting imminent device failures. Although previous research has proposed a variety of stand-alone methods for detecting particular types of anomalies, there is no end-to-end solution that lets data scientists effectively discover anomalies over large volumes of varied data. To build such a system, several critical challenges have to be solved: How to determine which of the many alternative anomaly detection algorithms is best for a given task, and how to find the proper parameter settings? How to leverage a small amount of end-user feedback to improve the anomaly extraction process? How to best present the anomaly detection results so that users do not have to evaluate the potentially large number of anomaly candidates one by one?
This talk will present our solution, called ADS, which addresses all of the above problems. ADS supports all stages of anomaly discovery by seamlessly integrating anomaly-related services within a single platform. It provides tuning-free anomaly detection, anomaly summarization and explanation services, and the ability to integrate user feedback into the discovery process.
Lei Cao has been a postdoctoral associate at the Computer Science and Artificial Intelligence Laboratory of MIT since November 2016, working with Professors Samuel Madden and Michael Stonebraker. Before that, he worked at the IBM T.J. Watson Research Center as a research staff member. He received his Ph.D. in computer science from Worcester Polytechnic Institute, supervised by Professor Elke Rundensteiner. He has conducted research in the broad areas of data science and systems, ranging from low-level core database performance optimization to the design of high-level, application-specific machine learning techniques. His recent research falls in the emerging area of systems for AI and AI for systems, focusing on designing scalable algorithms and systems that let data scientists effectively and efficiently explore and discover knowledge, especially anomalies, from heterogeneous data sources.