Preprint, working paper. Year: 2020

Learning Sparse deep neural networks using efficient structured projections on convex constraints for green AI

Abstract

In recent years, deep neural networks (DNNs) have been applied to many domains and have achieved dramatic performance improvements over state-of-the-art classical methods. These results, however, were often obtained with networks containing millions of parameters whose training required heavy computational power. To cope with this computational issue, a large body of literature deals with proximal regularization methods, which are time-consuming. In this paper, we propose instead a constrained approach. We provide the general framework for our new splitting projection-gradient method. Our splitting algorithm iterates a gradient step and a projection onto convex constraints. We study algorithms for different constraints: the classical unstructured ℓ1 constraint and structured constraints such as the nuclear norm and the ℓ2,1 constraint (Group LASSO). We propose a new ℓ1,1 structured constraint for which we provide a new projection algorithm. Finally, we use the recent "Lottery optimizer", replacing the thresholding by our ℓ1,1 projection. We demonstrate the effectiveness of our method on three popular datasets (MNIST, Fashion-MNIST and CIFAR). Experiments on these datasets show that our projection method with our new ℓ1,1 structured constraint provides the best reduction of memory and computational power. Experiments also show that fully connected linear DNNs are more efficient for memory and MACCs reduction, and thus for green AI.

I. MOTIVATION

In recent years, deep neural networks have been applied to different domains and achieved dramatic accuracy improvements in image recognition [33], speech recognition [44] and natural language processing [46]. These works rely on deep networks with millions or even billions of parameters. For instance, the original training of ResNet-50 [26] (image classification), which contains 25.6M parameters, required 29 hours on 8 GPUs. Storing the model requires 98 MB, the memory cost of inference on a single 224x224 image is about 103 MB, and 4 GFLOPs are needed [6]. The recent development of DNN hardware accelerators such as GPUs and the availability of deep learning frameworks for smartphones [30] suggest a seamless transfer of DNN models trained on servers onto mobile devices. However, it turns out that memory [48] and energy consumption [21] remain the main bottlenecks for running DNNs on such devices, and this computational cost has an impact on the carbon footprint. The authors of [53] argued that this trend is environmentally unfriendly, and the authors of [50] advocate a practical solution through an efficient evaluation criterion. In this paper, we propose a new splitting projection-gradient method with an efficient structured constraint to cope with these computational and memory issues. In our formulation, a constraint defines a convex set and the regularization is replaced by a projection onto this convex set. The benefits of this formulation are twofold: the constraint has a direct geometric interpretation, whereas the impact of parameter values in traditional regularization methods is more difficult to understand, and the convergence of this new method is formally proved. The paper is organized as follows. We first present related works in Section II, then we develop in Section III the theoretical background of our constrained projection method. In Section IV, we give experimental comparisons between methods; the tests involve several datasets with different neural network architectures.
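As described in the abstract, the splitting scheme alternates a gradient step on the training loss with a projection of the weights onto the convex constraint set. The following is a minimal NumPy sketch of one such iteration for the unstructured ℓ1-ball constraint, using the standard sorting-based ℓ1 projection; it is only an illustration of the principle, not the authors' implementation, and the function names, learning-rate and radius parameters are ours.

    import numpy as np

    def project_l1_ball(v, radius):
        """Euclidean projection of v onto the l1 ball {x : ||x||_1 <= radius},
        using the standard sorting-based algorithm."""
        if np.abs(v).sum() <= radius:
            return v                                  # already feasible
        u = np.sort(np.abs(v))[::-1]                  # magnitudes, descending
        cumsum = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, v.size + 1) > cumsum - radius)[0][-1]
        theta = (cumsum[rho] - radius) / (rho + 1.0)  # soft-threshold level
        return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

    def splitting_iteration(w, grad, lr, radius):
        """One step of a projected-gradient (splitting) scheme:
        gradient descent on the loss, then projection onto the constraint set."""
        return project_l1_ball(w - lr * grad, radius)

The same iteration applies to the structured constraints mentioned above (nuclear norm, ℓ2,1, ℓ1,1); only the projection operator changes.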
II. RELATED WORKS

Weight sparsification. It is well known [12] that DNN models are largely over-parametrized and that, in practice, relatively few network weights are actually necessary to learn data features accurately. Based on this result, numerous methods have been proposed to remove network weights (weight sparsification), either on pre-trained models or during the training phase. A basic idea for sparsifying the weights of a neural network is to use the Least Absolute Shrinkage and Selection Operator (LASSO) formulation [55], [24], [17], [25], [1]. The ℓ1 penalty added to the classification cost can be interpreted as a convexification of the ℓ0 penalty [13]. In [22], weights with the smallest amplitude in pre-trained networks are removed. Model sensitivity to weights can also be used [54], [19], where weights with weak influence on the network output are pruned. Constrained optimization is used in [7] in order to learn sparse networks with ℓ0, ℓ1 or ℓ2 constraints on the weights.
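As an illustration of the magnitude-based pruning of [22] mentioned above, the sketch below zeroes out the smallest-magnitude weights of a trained layer. The keep_ratio parameter and the function name are ours and serve only to make the idea concrete.

    import numpy as np

    def magnitude_prune(weights, keep_ratio=0.1):
        """Keep only the fraction keep_ratio of weights with largest magnitude,
        setting all others to zero (magnitude pruning of a trained layer)."""
        flat = np.abs(weights).ravel()
        k = max(1, int(round(keep_ratio * flat.size)))
        threshold = np.partition(flat, flat.size - k)[flat.size - k]
        return np.where(np.abs(weights) >= threshold, weights, 0.0)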
Main file: DNN-ICPR-Final.pdf (689.49 KB)
Origin: files produced by the author(s)

Dates and versions

hal-02556382, version 1 (28-04-2020)
hal-02556382, version 2 (16-09-2020)
hal-02556382, version 3 (28-10-2020)

Identifiers

  • HAL Id: hal-02556382, version 2

Cite

Frederic Guyard, Michel Barlaud. Learning Sparse deep neural networks using efficient structured projections on convex constraints for green AI. 2020. ⟨hal-02556382v2⟩
