ENS Research Course | Foundations of Data Analytics | Academic Year 2023
École Normale Supérieure de Lyon
Course Description
The course "Foundations of Data Analytics" brings together selected topics from several areas to form a mathematical foundation for data analytics. These topics include measure theory, information theory, statistics, and game theory, and they are used to study classical problems in data analytics and statistical machine learning. The course starts by reviewing foundational aspects of measure theory and information theory in order to introduce the Radon-Nikodym derivative (RND). The RND is used to build the notion of relative entropy, from which all the other information measures are derived. This provides a unified view of information theory, statistics, and the other mathematical tools used in the course.
Equipped with these tools, the problem of statistical hypothesis testing is formulated without particular conditions on the random variables involved. This paves the way for studying the problem of empirical risk minimization and its variants obtained via information-theoretic regularizations. Within this framework, the generalization capabilities of machine learning algorithms are studied. In particular, explicit expressions are introduced for the variation of the expected empirical risk under deviations of the probability measure from which the data are sampled. A similar analysis is conducted for the generalization error, which leads to insightful connections to mutual information and lautum information.
The robustness of machine learning algorithms is then studied using the worst-case data-generating probability measure and elements of zero-sum games with noisy observations. Finally, data integrity is studied within this mathematical framework, with a focus on two topics: data injection attacks and covert information. Both problems are shown to exhibit interesting properties in terms of the trade-off between probability of detection and data distortion. These trade-offs are characterized in terms of information measures.
Evaluation
Part I: Theoretical Foundations
Lecture Notes
The lecture notes are available here.
Homework 1: Deadline by TBD
Homework 2: Deadline by TBD
Homework 3: Deadline by TBD
Homework 4: Deadline by TBD - Part 3/3*
Part II: Applications
Homework 5: Deadline by TBD - Part 3/3*
Homework 6: Deadline by TBD
Student Contest (Final Exams)