Abstract
Selecting an appropriate learning rate is crucial for training neural networks, but classical line-search methods, while effective, often rely on costly matrix–vector multiplications that limit their practical use. We propose two relaxed line-search formulations that avoid matrix–vector multiplications and can be solved efficiently: a naive relaxation based on a global Lipschitz constant and a more adaptive relaxation using a local Lipschitz constant. Instead of updating all layers simultaneously with a single learning rate, we adopt a layerwise update strategy in which one layer is updated at a time, simplifying the analysis and enabling rigorous convergence results. Numerical experiments further demonstrate that optimal learning rates vary significantly across layers, supporting the effectiveness of the proposed approach.
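The layerwise idea can be illustrated on a toy problem. The sketch below is not the paper's relaxation and does not avoid matrix products. It assumes a two-layer linear network with squared loss, for which the Lipschitz constant of the gradient with respect to each layer (with the other layer frozen) has a closed form, and it uses the reciprocal of that constant as the layer's learning rate. The function name `layerwise_gd` and all variable names are hypothetical.

```python
import numpy as np

def layerwise_gd(X, Y, W1, W2, sweeps=50):
    """Alternate one gradient step per layer on ||W2 W1 X - Y||_F^2 / (2n).

    Each layer uses step 1/L, where L is the Lipschitz constant of the
    gradient with respect to that layer while the other layer is frozen
    (an assumption of this toy model, not the paper's relaxation).
    """
    n = X.shape[1]
    loss = lambda A, B: 0.5 * np.sum((B @ A @ X - Y) ** 2) / n
    losses, steps = [loss(W1, W2)], []
    x_norm_sq = np.linalg.norm(X, 2) ** 2  # squared spectral norm of the data
    for _ in range(sweeps):
        # Layer 1, W2 frozen: L1 = ||W2||_2^2 * ||X||_2^2 / n
        R = W2 @ W1 @ X - Y
        g1 = W2.T @ R @ X.T / n
        L1 = max(np.linalg.norm(W2, 2) ** 2 * x_norm_sq / n, 1e-12)
        W1 = W1 - g1 / L1
        # Layer 2, W1 frozen: L2 = ||W1 X||_2^2 / n
        H = W1 @ X
        g2 = (W2 @ H - Y) @ H.T / n
        L2 = max(np.linalg.norm(H, 2) ** 2 / n, 1e-12)
        W2 = W2 - g2 / L2
        losses.append(loss(W1, W2))
        steps.append((1.0 / L1, 1.0 / L2))
    return W1, W2, losses, steps

# Synthetic data; shapes and scales are arbitrary choices for illustration.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 40))
Y = rng.standard_normal((3, 5)) @ X
W1 = 0.5 * rng.standard_normal((4, 5))
W2 = 0.5 * rng.standard_normal((3, 4))
W1, W2, losses, steps = layerwise_gd(X, Y, W1, W2)
print("initial loss:", losses[0], "final loss:", losses[-1])
print("last per-layer learning rates:", steps[-1])
```

Because each step size is the reciprocal of that layer's gradient Lipschitz constant, the standard descent lemma guarantees the loss does not increase after any single-layer update. The two step sizes generally differ, which is the motivation for choosing learning rates per layer.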
| Original language | English |
|---|---|
| Article number | 100807 |
| Journal | Array |
| Volume | 30 |
| DOIs | |
| State | Published - Jul 2026 |
Bibliographical note
Publisher Copyright: © 2026 The Authors
Keywords
- Layerwise training
- Learning rate selection
- Line-search minimization
- Lipschitz constant
- Neural network
Title
A relaxation approach to layerwise determination of learning rates in deep neural networks