Article

Gradient Descent Learning With Floats

Journal

IEEE TRANSACTIONS ON CYBERNETICS
Volume 52, Issue 3, Pages 1763-1771

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCYB.2020.2997399

Keywords

Complexity; convergence; floats; gradient descent; learning

Funding

  1. National Key Research and Development Program of China [2018YFB0204300]
  2. National Natural Science Foundation of China [61932001, 61672478, 61906200]
  3. Guangdong Provincial Key Laboratory [2020B121201001]
  4. Program for Guangdong Introducing Innovative and Entrepreneurial Teams [2017ZT07X386]

Abstract

This article investigates gradient descent learning with floating-point numbers as used in computers, and examines the performance of three gradient descent methods for smooth objective functions. The results show that the convergence speed improves when the objective function additionally satisfies a Polyak-Łojasiewicz (PL) condition.
Gradient descent learning is the main workhorse of training tasks in artificial intelligence and machine-learning research. Current theoretical studies of gradient descent consider only continuous domains, which is unrealistic since electronic computers store and process data as floating-point numbers. Although existing results are sufficient for the extremely tiny errors of high-precision machines, they need to be improved for low-precision cases. This article presents an understanding of the learning algorithm on computers with floats. The performance of three gradient descent methods over the floating-point domain is investigated when the objective function is smooth. When the function is further assumed to satisfy the Polyak-Łojasiewicz (PL) condition, the convergence speed can be improved. We prove that for floating gradient descent to reach an error ε, the iteration complexity is O(1/ε) in the general smooth case and O(ln(1/ε)) in the PL case, but ε must be no smaller than the s-bit machine epsilon δ_s in the deterministic case, that is, ε ≥ Ω(δ_s), while ε ≥ Ω(√δ_s) in the stochastic case. Floating stochastic and sign gradient descents can both output an ε-noised result in O(1/ε²) iterations.
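For illustration only (this is not the authors' construction), the following minimal Python sketch runs the same plain gradient descent update in 64-bit and 16-bit floating-point arithmetic on a simple smooth, PL objective f(x) = 0.5(x - 1/3)². The objective, step size, and iteration count are arbitrary choices; the point is that each run stalls at an error floor on the order of its format's machine epsilon, mirroring the ε ≥ Ω(δ_s) barrier stated above.

```python
import numpy as np

def float_gd(dtype, steps=500, lr=0.1):
    """Gradient descent on f(x) = 0.5 * (x - 1/3)**2 (smooth and PL),
    with every quantity stored and updated in the given float format."""
    target = dtype(1.0) / dtype(3.0)   # minimizer 1/3, rounded to the format
    x = dtype(1.0)                     # starting point
    lr = dtype(lr)                     # step size, also kept in the format
    for _ in range(steps):
        grad = x - target              # exact gradient of the objective
        x = dtype(x - lr * grad)       # update rounded to the working format
    return abs(float(x) - 1.0 / 3.0)   # final error, measured in double

print("float64 error:", float_gd(np.float64))   # floor near float64 epsilon (~1e-16)
print("float16 error:", float_gd(np.float16))   # floor near float16 epsilon (~1e-3)
print("float16 machine epsilon:", np.finfo(np.float16).eps)
```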
