Article
Computer Science, Software Engineering
Marko Lange
Summary: The study introduces a new accurate summation algorithm based on error-free summation, providing faithfully rounded floating-point approximations. The algorithm is compared with other accurate and high-precision summation approaches.
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE
(2022)
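As a rough illustration of what an error-free transformation buys in summation, the sketch below uses the classical TwoSum building block in a cascaded compensated sum. This is not the algorithm from the paper above, and it does not guarantee faithful rounding; the function names and test data are assumptions chosen for illustration.

```python
def two_sum(a, b):
    """Classical TwoSum: return (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    bp = s - a
    e = (a - (s - bp)) + (b - bp)
    return s, e


def naive_sum(values):
    """Plain left-to-right recursive summation, rounding after every addition."""
    total = 0.0
    for v in values:
        total = total + v
    return total


def compensated_sum(values):
    """Cascaded compensated summation: accumulate the exact rounding errors
    reported by two_sum and add them back once at the end."""
    total = 0.0
    err = 0.0
    for v in values:
        total, e = two_sum(total, v)
        err += e
    return total + err


# Illustrative data with massive cancellation (an assumption, not from the paper).
data = [1.0, 1e100, 1.0, -1e100]
naive_result = naive_sum(data)
compensated_result = compensated_sum(data)
print(naive_result, compensated_result)
```

The naive loop loses both unit terms to absorption by the large summand, while the compensated version recovers them from the accumulated error terms.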
Article
Computer Science, Hardware & Architecture
Sylvie Boldo, Christoph Lauter, Jean-Michel Muller
Summary: The 2019 version of the IEEE 754 Standard recommends new augmented operations for binary formats, which use a new rounding direction: round-to-nearest ties-to-zero. The paper shows that these operations can be implemented with the currently available round-to-nearest ties-to-even operations, and gives a partial formal proof of correctness.
IEEE TRANSACTIONS ON COMPUTERS
(2021)
Article
Engineering, Electrical & Electronic
Gennaro Di Meo, Antonio Giuseppe Maria Strollo, Davide De Caro
Summary: This paper proposes a new method for approximating floating-point division by calculating coefficients that minimize the Mean Relative Error Distance (MRED). The hardware implementation uses a lookup table, multipliers, and an adder, with aggressive coefficient quantization to optimize the design. The results show that the proposed design outperforms the state of the art, offering the best trade-off between hardware complexity and accuracy, and performs well in image processing applications.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS
(2023)
Article
Mathematics, Applied
Michael P. Connolly, Nicholas J. Higham, Theo Mary
Summary: Stochastic rounding rounds a real number to either the next larger or the next smaller floating-point number, choosing between the two with specified probabilities, and shows potential benefits in low-precision computations for deep learning. It is compared with round to nearest, revealing both similarities and significant differences in their properties. The analysis demonstrates that rounding errors in stochastic rounding are mean independent random variables, resulting in unconditionally bounded backward errors for a range of linear algebra computations.
SIAM JOURNAL ON SCIENTIFIC COMPUTING
(2021)
Article
Mathematics, Applied
M. Croci, M. B. Giles
Summary: Motivated by machine learning, low-precision hardware-supported computing has regained attention in recent years. This paper studies the accumulation of rounding errors in the solution of the heat equation using different rounding methods. It demonstrates how to implement the scheme to reduce rounding errors and provides estimates for local and global rounding errors.
IMA JOURNAL OF NUMERICAL ANALYSIS
(2023)
Article
Mathematics, Applied
Massimiliano Fasi, Nicholas J. Higham, Florent Lopez, Theo Mary, Mantas Mikaitis
Summary: This paper investigates the use of multiword arithmetic to improve the performance-accuracy tradeoff of matrix multiplication with mixed precision block fused multiply-add (FMA) hardware, focusing on NVIDIA GPUs' tensor cores. The authors develop an error analysis of multiword matrix multiplication and implement several algorithms using double-fp16 arithmetic. However, they find that double-fp16 is less accurate than fp32 arithmetic, despite satisfying the same worst-case error bound. By using probabilistic error analysis, they identify the rounding mode used by the NVIDIA tensor cores as the likely cause and propose a parameterized blocked summation algorithm to alleviate the problem and improve the performance-accuracy tradeoff.
SIAM JOURNAL ON SCIENTIFIC COMPUTING
(2023)
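The idea of a double-fp16 (two-word) representation mentioned in the summary above can be sketched as follows: a value is stored as an unevaluated sum of a leading fp16 word and an fp16 correction word. This is only a minimal illustration of the splitting step, not the matrix multiplication algorithms from the paper; the helper names are hypothetical, and fp16 rounding is emulated with the half-precision format code of Python's `struct` module.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 binary16 value
    by packing and unpacking it with the half-precision format code 'e'."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def split_fp16(x):
    """Split x into two fp16 words (hi, lo) so that hi + lo approximates x
    more closely than hi alone. Assumes x lies within the fp16 range."""
    hi = to_fp16(x)
    lo = to_fp16(x - hi)  # the residual is small, so it gets its own fp16 word
    return hi, lo


x = 3.14159  # illustrative value, an assumption
hi, lo = split_fp16(x)
err_one_word = abs(x - hi)
err_two_words = abs(x - (hi + lo))
print(hi, lo, err_one_word, err_two_words)
```

The second word captures most of the rounding error of the first, which is why a two-word fp16 value can carry roughly twice the significand bits of a single fp16 number, as long as the residual does not underflow.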
Review
Multidisciplinary Sciences
Matteo Croci, Massimiliano Fasi, Nicholas J. Higham, Theo Mary, Mantas Mikaitis
Summary: Stochastic rounding is a rounding mode that randomly maps a real number to one of the two closest values in a finite precision number system. It has been proposed for use in computer arithmetic and has gained renewed interest. Compared to round to nearest, stochastic rounding is immune to stagnation and satisfies probabilistic error bounds that are smaller than the corresponding worst-case bounds.
ROYAL SOCIETY OPEN SCIENCE
(2022)
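The stagnation effect mentioned in the summary above can be reproduced with a small experiment. The sketch below is a simplification under stated assumptions: it rounds onto a uniform grid of spacing `h` instead of a true floating-point format, and the function names, grid spacing, increment, and seed are all hypothetical choices for illustration.

```python
import math
import random


def round_nearest(x, h):
    """Round x to the nearest multiple of the grid spacing h."""
    return round(x / h) * h


def round_stochastic(x, h, rng):
    """Round x to one of the two neighbouring grid points, choosing the upper
    one with probability equal to the fractional distance from the lower one."""
    lo = math.floor(x / h) * h
    frac = (x - lo) / h
    return lo + h if rng.random() < frac else lo


def accumulate(increment, steps, h, rounder):
    """Repeatedly add a small increment to 1.0, rounding after every addition."""
    total = 1.0
    for _ in range(steps):
        total = rounder(total + increment, h)
    return total


h = 2.0 ** -4               # coarse grid, standing in for a low-precision format
rng = random.Random(12345)  # fixed seed so the run is repeatable

rn_total = accumulate(0.01, 10_000, h, round_nearest)
sr_total = accumulate(0.01, 10_000, h, lambda x, g: round_stochastic(x, g, rng))
print(rn_total, sr_total)   # the exact sum would be about 101
```

With round to nearest every increment is smaller than half the grid spacing and is rounded away, so the total never leaves 1.0. With stochastic rounding the increments survive on average, and the total ends up close to the exact sum.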
Article
Computer Science, Information Systems
Filippo Savi, Amin Farjudian, Giampaolo Buticchi, Davide Barater, Giovanni Franceschini
Summary: This paper proposes a method to assess the numerical stability of control algorithms using interval analysis, with a case study on electric drive systems. The results show that resonant control is more robust in terms of numerical stability compared to vector space decomposition, making it the preferred choice for mission-critical electric drive control.
Article
Mathematics, Applied
Xiaojun Lei, Tongxiang Gu, Stef Graillat, Hao Jiang, Jin Qi
Summary: This paper presents a new parallel accurate algorithm, PAccSumK, for computing the summation of floating-point numbers. Experimental results show that the proposed algorithm outperforms the PSumK algorithm in both accuracy and computing time for summation problems with large condition numbers.
JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS
(2022)
Article
Computer Science, Information Systems
Raphael Seidel, Nikolay Tcholtchev, Sebastian Bock, Colin Kai-Uwe Becker, Manfred Hauswirth
Summary: One of the major promises of quantum computing is the realization of SIMD operations using superposition. This paper introduces the formalism of encoding semi-boolean polynomials, which can be used for generating arithmetic quantum circuits. The application of these methods to integer multiplication shows a significant reduction in circuit depth.
Article
Computer Science, Information Systems
Massimiliano Fasi, Mantas Mikaitis
Summary: This research presents algorithms for performing the five elementary arithmetic operations in floating point arithmetic with stochastic rounding, which can simulate the rounding mode when hardware does not support it, enabling exploration of the behavior of this rounding mode without specific hardware.
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING
(2021)
Article
Engineering, Electrical & Electronic
Reza Omidi, Sepehr Sharifzadeh
Summary: This paper proposes an approximate floating-point adder built from an inexact mantissa adder and an inexact exponent subtractor to reduce power consumption and delay. Experimental results show that the proposed method reduces power consumption and delay by 37% and 62%, respectively, compared to an IEEE 754 single-precision floating-point adder.
INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS
(2021)
Article
Computer Science, Information Systems
N. S. Sathyavathi, P. Augusta Sophy Beulet
Summary: Floating-point numbers are widely used in computer and signal processing applications, but errors can occur in floating-point representation due to rounding. This paper introduces various rounding methods and proposes a new LGRS method. The LGRS method allows for the prediction of errors caused by rounding. Statistical analysis and graphical illustrations are used to analyze the various rounding methods.
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY
(2023)
Article
Mathematics, Applied
Stephen F. McCormick, Joseph Benzaken, Rasmus Tamstorf
Summary: This paper establishes the first theoretical framework for analyzing the rounding-error effects on multigrid methods using mixed-precision iterative-refinement solvers, providing normwise forward error analysis and introducing the notion of progressive precision for multigrid solvers. The theoretical results show that rounding an exact result to finite precision causes an error in the energy norm proportional to the square root of the matrix condition number κ, indicating that the limiting accuracy for both V-cycles and full multigrid is proportional to κ^(1/2) in energy. Additionally, the loss of convergence rate due to rounding grows in proportion to κ^(1/2), but is argued to be insignificant in practice.
SIAM JOURNAL ON SCIENTIFIC COMPUTING
(2021)
Article
Engineering, Electrical & Electronic
Zijing Niu, Tingting Zhang, Honglan Jiang, Bruce F. Cockburn, Leibo Liu, Jie Han
Summary: This article proposes five hardware-efficient logarithmic floating-point multipliers, which use simple operators and radix-4 logarithms to reduce hardware complexity and achieve better trade-offs between accuracy and hardware cost. The proposed multipliers show superior image quality and accuracy in JPEG image compression and neural network applications, while consuming less energy and occupying less area than state-of-the-art designs.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS
(2023)