The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading the generalization ability.
Despite the benefits of over-parameterization, the huge number of parameters makes deep networks cumbersome in real-world applications.
On the other hand, training neural networks without over-parameterization faces many practical problems, e.g., getting trapped in poor local optima.
Though techniques such as pruning and distillation have been developed, as backward selection methods they require the expensive full training of a dense network;
and forward selection methods for learning structural sparsity in deep networks remain largely unexplored.
To fill this gap, this talk will introduce a series of novel approaches based on differential inclusions of inverse scale spaces.
Specifically, by coupling a pair of parameters, these methods generate a family of models along the dynamics, from simple to complex, such that over-parameterized deep models and their structural sparsity can be explored simultaneously.
Deep learning based models have excelled in many computer vision tasks and appear to surpass human performance. However, these models require an avalanche of expensive, human-labeled training data and many iterations to train their large number of parameters. This severely limits their scalability to real-world categories, which follow a long-tail distribution and come with noisy annotations: some categories have a large number of instances, but only a few of them are manually labeled. Learning from such extremely limited labeled examples is known as Few-Shot Learning (FSL). On the other hand, noisy training labels usually degrade the generalization and robustness of neural networks, so the task of Learning With Noisy Labels (LNL) has also attracted research attention from the community.
In this tutorial, we will introduce a theoretically guaranteed framework that explores and learns from few-shot or noisy training data. The introduced approaches are motivated by statistical outlier detection methods, and support both the use of unlabeled instances for few-shot visual recognition and the task of learning with noisy labels.
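To make the outlier-detection intuition concrete, here is a minimal sketch: samples whose training loss is a statistical outlier are flagged as likely mislabeled. The z-score rule, the threshold, and the function name are illustrative assumptions for this sketch, not the tutorial's actual method.

```python
import numpy as np

def flag_noisy_labels(losses, z_thresh=2.0):
    """Flag samples whose per-sample loss is an outlier.

    A sample with an unusually large loss (z-score above z_thresh)
    is treated as a likely noisy label. Both the z-score statistic
    and the default threshold are illustrative choices.
    """
    losses = np.asarray(losses, dtype=float)
    z = (losses - losses.mean()) / (losses.std() + 1e-12)  # standardize losses
    return z > z_thresh                                    # one-sided outlier test
```

For example, in a batch where 98 samples have loss around 1.0 and two have losses of 10.0 and 12.0, only the latter two are flagged.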
Boosting, as a gradient descent method, is known as one of the 'best off-the-shelf' methods in machine learning. The Inverse Scale Space method is a Boosting-type algorithm of restricted gradient descent for sparsity learning, whose underlying dynamics are governed by differential inclusions. It is also known as Mirror Descent in optimization and (Linearized) Bregman Iteration in applied mathematics. Such algorithms generate iterative regularization paths with structural sparsity, on which significant features or parameters emerge earlier than noise. Despite its simplicity, it may outperform the popular (generalized) Lasso in both theory and experiments. A statistical theory of path consistency is introduced here: under the Incoherence or Irrepresentable conditions, such regularization paths first evolve within the support set of the true sparse parameters or causal features before overfitting noise. Equipped with the variable-splitting technique, and with proper early stopping, they may achieve model selection consistency under a family of Irrepresentable Conditions that can be strictly weaker than the necessary and sufficient condition for the generalized Lasso. A data-adaptive early stopping rule can be developed based on the Knockoff method, which aims to control the false discovery rate of causal variables. The utility and benefit of the algorithm are illustrated by various applications, including sparse feature selection, learning graphical models, partial-order ranking, finding sparse deep neural networks, and Alzheimer's disease detection via neuroimaging.
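The inverse scale space dynamics above can be sketched via their standard discretization, the Linearized Bregman Iteration for sparse regression. The function name, the step-size choice, and the toy data in the test are illustrative assumptions; they show the mechanism, not the talk's exact algorithm.

```python
import numpy as np

def linearized_bregman(A, b, kappa=10.0, alpha=None, n_iters=500):
    """Linearized Bregman Iteration, a discretization of the inverse
    scale space dynamics, for the sparse regression problem Ax ~ b.

    Returns the whole iterate path: significant coefficients tend to
    enter the path earlier than noise, so early stopping on the path
    acts as regularization.
    """
    n, p = A.shape
    if alpha is None:
        # conservative step size: alpha * kappa * ||A||^2 <= 1 for stability
        alpha = 1.0 / (kappa * np.linalg.norm(A, 2) ** 2)
    z = np.zeros(p)          # dual (mirror) variable, accumulates gradients
    x = np.zeros(p)          # sparse primal iterate
    path = []
    for _ in range(n_iters):
        z -= alpha * A.T @ (A @ x - b)                             # gradient step
        x = kappa * np.sign(z) * np.maximum(np.abs(z) - 1.0, 0.0)  # soft threshold
        path.append(x.copy())
    return np.array(path)
```

On noiseless data with a few large true coefficients, those coordinates become nonzero within the first few iterations, while the rest of the path stays sparse.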
For better interpretability and prediction power in medical imaging, one can exploit the sparsity pattern among features to select only the disease-related ones for prediction. Common types of such sparsity patterns include pure sparsity, non-negativity constraints, and spatial coherence, which can be unified into a structural sparsity formulation. We first review existing methods under this formulation, using Alzheimer's Disease as an example, and show their limitations in terms of feature selection and prediction accuracy. Next, we introduce an iterative regularization path method that incorporates a variable-splitting scheme into a differential inclusion. Equipped with this variable splitting, our method is not only theoretically guaranteed to select better variables, but can also explore beyond sparsity to empirically achieve better prediction results. To further control the false discovery rate under this structural sparsity, we introduce two approaches, based respectively on a wrapper method called Knockoff and on the perspective of Empirical Bayes.
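The variable-splitting idea can be sketched as follows: a dense estimate `beta` fits the data, a sparse estimate `gamma` is coupled to it through an extra quadratic term, and only `gamma` follows the inverse-scale-space dynamics. This sketch assumes pure sparsity (identity transform); the function name, step-size bound, and hyperparameters are illustrative assumptions, not the talk's exact algorithm.

```python
import numpy as np

def split_lbi(X, y, nu=1.0, kappa=5.0, n_iters=3000):
    """Sketch of a split Linearized Bregman Iteration for pure sparsity.

    The loss couples a dense estimate beta with a sparse estimate gamma:
        l(beta, gamma) = ||y - X beta||^2 / (2n) + ||beta - gamma||^2 / (2 nu)
    beta follows plain gradient descent; gamma follows the sparsifying
    inverse-scale-space dynamics via the dual variable z.
    """
    n, p = X.shape
    # conservative step size from a bound on the gradient's Lipschitz constant
    alpha = 1.0 / (kappa * (np.linalg.norm(X, 2) ** 2 / n + 2.0 / nu))
    beta, gamma, z = np.zeros(p), np.zeros(p), np.zeros(p)
    for _ in range(n_iters):
        grad_beta = -X.T @ (y - X @ beta) / n + (beta - gamma) / nu
        grad_gamma = (gamma - beta) / nu
        beta = beta - kappa * alpha * grad_beta                     # dense path
        z = z - alpha * grad_gamma                                  # dual update
        gamma = kappa * np.sign(z) * np.maximum(np.abs(z) - 1.0, 0.0)  # sparse path
    return beta, gamma
```

The point of the split is that `beta` is free to "explore beyond sparsity" while `gamma` carries the structural sparsity used for variable selection.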
Learning to Optimize, or L2O, is a method of developing optimization algorithms and improving their performance through offline training. It has achieved significant success in deep learning acceleration, signal processing, inverse problems, and SAT and MIP problems. In this talk, after a short introduction to L2O methods, we will focus on Algorithmic Unrolling (AU), an L2O methodology that has evolved quickly over the past ten years. Early AUs imitated DNNs, so the resulting algorithms tended to have a large number of trainable parameters, which were expensive to train and hard to generalize. Recent AU methods have significantly reduced the number of parameters, in some cases to only a few. By examining the algorithms generated by L2O, we can obtain extremely high-performance, almost universal algorithms that no longer need training. This tutorial is based on the sparse coding problem and the LISTA series of methods.
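The unrolling idea behind LISTA can be sketched by writing ISTA for sparse coding as a feed-forward network: each iteration becomes one layer with weights `W1`, `W2` and threshold `theta`. In LISTA these are trained offline; the sketch below keeps them at their classical ISTA initialization, so each layer is exactly one ISTA step. The function name and toy problem are illustrative assumptions.

```python
import numpy as np

def soft(x, theta):
    # soft-thresholding: proximal operator of theta * ||x||_1
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def unrolled_ista(A, b, lam=0.1, n_layers=16):
    """Forward pass of an unrolled ISTA 'network' for
    min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.

    LISTA would learn W1, W2, theta from data; here they are fixed at
    the ISTA initialization, so n_layers layers = n_layers iterations.
    """
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    W1 = A.T / L                           # input weight
    W2 = np.eye(A.shape[1]) - A.T @ A / L  # recurrent weight
    theta = lam / L                        # per-layer threshold
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):
        x = soft(W1 @ b + W2 @ x, theta)   # one layer = one ISTA step
    return x
```

Recent AU methods reduce the trainable parameters of such a network, e.g., to a few scalar step sizes and thresholds shared across layers.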
Sessions | Title (with slides) | Video | Speakers |
--- | --- | --- | --- |
8:30 - 8:40 | Opening Remarks | | Yanwei Fu |
8:40 - 9:10 | Introduction | [Youtube][Bilibili] | Yanwei Fu |
9:10 - 9:30 | Sparse Learning in Noisy Data Detection | [Youtube][Bilibili] | Yanwei Fu and Yikai Wang |
9:30 - 10:15 | Inverse Scale Space Method for Sparse Learning and Statistical Properties | [Youtube][Bilibili] | Yuan Yao |
10:15 - 10:30 | Break | | |
10:30 - 11:00 | Sparsity Learning in Medical Imaging | [Youtube][Bilibili] | Xinwei Sun |
11:00 - 12:00 | Learning to Optimize: Algorithm Unrolling | [Youtube][Bilibili] | Wotao Yin |
Contact the Organizing Committee: yanweifu@fudan.edu.cn, yi-kai.wang@outlook.com