Roadmap

Roadmap

The development plan for Lean Machine Learning

Vision

Lean Machine Learning provides a comprehensive library of formalized definitions and theorems in machine learning theory. It is a trusted foundation for formalization of machine learning research. As algorithms are central to machine learning, it provides tools to define and analyze stochastic algorithms in Lean, and to connect formal specifications with executable implementations.

Key Features

High quality, human-checked, general definitions of machine learning concepts
Tools to prove theoretical properties of machine learning algorithms
Executable implementations of machine learning algorithms, connected to formal specifications
Extensive documentation and tutorials to guide users through the library and its applications

Impact and beneficiaries

Researchers can formalize their new results, since the required concepts are available in the library
AI agents can prove new results or check papers without the risk of using an incorrectly formalized definition
Students and researchers can access and search a large and trusted knowledge base about machine learning

Roadmap

We are currently in the early stages of development, with the core framework for defining and analyzing stochastic algorithms in place, as well as initial results about bandit algorithms. Since the project is open and collaborative, the direction we will take for the extension of the library of theoretical results will be guided by the needs of the contributors, but we aim to cover a broad range of topics in machine learning theory and algorithms.

Year 1

Library - theoretical foundations: concentration inequalities, decision theory, information theory related to testing
Library - online learning: bandits, prediction with expert advice, online convex optimization
Executable algorithms: define a probabilistic monad with do-notation, test it on the existing bandit algorithms, and connect it to the formal specifications of the algorithms
Documentation: comprehensive documentation of the existing algorithm framework and bandit results, and tutorials to get started with the library
AI readiness: demonstration of an AI formalization of a bandit or online learning paper using the library

Year 2

Library - statistical learning theory: PAC learning, Rademacher complexity, generalization bounds (PAC-Bayes)
Library - reinforcement learning: Markov decision processes, value iteration, policy gradient methods
Executable algorithms: have executable implementations of some online learning algorithms, on which we can run experiments and compare with the theoretical predictions
Documentation: comprehensive documentation of the new theoretical results and algorithms, with tutorials
Subject coverage and AI readiness: a folder with formalizations or reports on the feasibility of formalizing a wide array of papers from COLT/NeurIPS/ICML/ICLR using the library

Year 3

Library - deep learning: neural networks, backpropagation, generalization in deep learning
Library - advanced reinforcement learning: deep Q-learning, actor-critic methods, multi-agent RL
Executable algorithms: have executable implementations of deep learning and advanced RL algorithms, linked with theory
Subject coverage and AI readiness: several new papers formalized using the library, and a demonstration of an AI system generating a new theorem about a machine learning algorithm using the library