Article

Gradient Descent Learning With Floats

Journal

IEEE TRANSACTIONS ON CYBERNETICS
Volume 52, Issue 3, Pages 1763-1771

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCYB.2020.2997399

Keywords

Complexity; convergence; floats; gradient descent; learning

Funding

  1. National Key Research and Development Program of China [2018YFB0204300]
  2. National Natural Science Foundation of China [61932001, 61672478, 61906200]
  3. Guangdong Provincial Key Laboratory [2020B121201001]
  4. Program for Guangdong Introducing Innovative and Entrepreneurial Teams [2017ZT07X386]

Abstract

This article investigates gradient descent learning with floating-point numbers as used in computers, and examines the performance of three gradient descent methods for smooth objective functions. The results show that the convergence speed improves when the objective function additionally satisfies a Polyak-Łojasiewicz (PL) condition.
Gradient descent learning is the main workhorse of training tasks in artificial intelligence and machine-learning research. Current theoretical studies of gradient descent consider only continuous domains, which is unrealistic since electronic computers store and process data as floating-point numbers. Although existing results are sufficient for the extremely tiny errors of high-precision machines, they need to be improved for low-precision cases. This article presents an understanding of the learning algorithm on computers with floats. The performance of three gradient descent methods over the floating-point domain is investigated when the objective function is smooth. When the function is further assumed to satisfy the Polyak-Łojasiewicz (PL) condition, the convergence speed can be improved. We prove that for floating gradient descent to reach an error ε, the iteration complexity is O(1/ε) in the general smooth case and O(ln(1/ε)) in the PL case, but ε must be no smaller than the s-bit machine epsilon δ_s in the deterministic case, that is, ε ≥ Ω(δ_s), while ε ≥ Ω(√δ_s) in the stochastic case. Floating stochastic and sign gradient descents can both output an ε-noised result in O(1/ε²) iterations.
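For illustration only (this is not the authors' construction), the following minimal Python sketch runs the same plain gradient descent update in 64-bit and 16-bit floating-point arithmetic on a simple smooth, PL objective f(x) = 0.5(x - 1/3)². The objective, step size, and iteration count are arbitrary choices; the point is that each run stalls at an error floor on the order of its format's machine epsilon, mirroring the ε ≥ Ω(δ_s) barrier stated above.

```python
import numpy as np

def float_gd(dtype, steps=500, lr=0.1):
    """Gradient descent on f(x) = 0.5 * (x - 1/3)**2 (smooth and PL),
    with every quantity stored and updated in the given float format."""
    target = dtype(1.0) / dtype(3.0)   # minimizer 1/3, rounded to the format
    x = dtype(1.0)                     # starting point
    lr = dtype(lr)                     # step size, also kept in the format
    for _ in range(steps):
        grad = x - target              # exact gradient of the objective
        x = dtype(x - lr * grad)       # update rounded to the working format
    return abs(float(x) - 1.0 / 3.0)   # final error, measured in double

print("float64 error:", float_gd(np.float64))   # floor near float64 epsilon (~1e-16)
print("float16 error:", float_gd(np.float16))   # floor near float16 epsilon (~1e-3)
print("float16 machine epsilon:", np.finfo(np.float16).eps)
```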
