Problem description
I am writing a program for embedded hardware that supports only 32-bit single-precision floating-point arithmetic. The algorithm I am implementing, however, requires 64-bit double-precision addition and comparison. I am trying to emulate the double datatype using a tuple of two floats, so a double d would be emulated as a struct holding two float members: a high-order part and a low-order part, d.low.
The comparison should be straightforward using lexicographic ordering. The addition, however, is trickier because I am not sure which base I should use. Should it be FLT_MAX? And how can I detect a carry?
How can this be done?
Edit (clarity): I need the extra significant digits rather than the extra range.
Recommended answer
Double-float is a technique that uses pairs of single-precision numbers to achieve almost twice the precision of single-precision arithmetic, at the cost of a slight reduction in the single-precision exponent range (due to intermediate underflow and overflow at the far ends of the range). The basic algorithms were developed by T.J. Dekker and William Kahan in the 1970s. Below I list two fairly recent papers that show how these techniques can be adapted to GPUs; much of the material covered in these papers applies independent of platform, so it should be useful for the task at hand.
https://hal.archives-ouvertes.fr/hal-00021443 Guillaume Da Graça, David Defour: Implementation of float-float operators on graphics hardware. 7th conference on Real Numbers and Computers, RNC7.
http://andrewthall.org/papers/df64_qf128.pdf Andrew Thall: Extended-Precision Floating-Point Numbers for GPU Computation.
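To make the technique concrete for the addition and comparison asked about in the question, here is a minimal sketch in C. It is one common formulation of double-float addition built on error-free "two-sum" steps, not necessarily the exact code from either paper. There is no base such as FLT_MAX involved: the "carry" is simply the rounding error of each single-precision addition, which two-sum recovers exactly and stores in the low part. All names (df64, two_sum, quick_two_sum, df64_add, df64_cmp) are hypothetical, and the sketch assumes every float operation is rounded to IEEE-754 single precision with no compiler reassociation (so no -ffast-math and no x87-style excess precision).

```c
/* Sketch only. The names df64, two_sum, quick_two_sum, df64_add and df64_cmp
 * are hypothetical. Assumes each float operation is rounded to IEEE-754
 * single precision (FLT_EVAL_METHOD == 0) and that the compiler does not
 * reassociate floating-point expressions (no -ffast-math). */

typedef struct {
    float hi;  /* leading part: the value rounded to single precision */
    float lo;  /* trailing part: the remaining error, |lo| <= ulp(hi)/2 */
} df64;

static df64 df64_from_float(float x)
{
    df64 r = { x, 0.0f };
    return r;
}

/* Error-free addition: s = fl(a + b), and s + e == a + b exactly
 * (barring overflow). */
static df64 two_sum(float a, float b)
{
    float s  = a + b;
    float bb = s - a;                      /* the part of b absorbed into s */
    float e  = (a - (s - bb)) + (b - bb);  /* what the rounding discarded */
    df64 r = { s, e };
    return r;
}

/* Cheaper variant of two_sum; the exactness guarantee needs |a| >= |b|. */
static df64 quick_two_sum(float a, float b)
{
    float s = a + b;
    float e = b - (s - a);
    df64 r = { s, e };
    return r;
}

/* Double-float addition: add high and low parts with two_sum, then fold
 * the error terms back in and renormalize so that hi carries the rounded
 * value and lo carries the remainder. */
static df64 df64_add(df64 a, df64 b)
{
    df64 s = two_sum(a.hi, b.hi);
    df64 t = two_sum(a.lo, b.lo);
    s.lo += t.hi;
    s = quick_two_sum(s.hi, s.lo);
    s.lo += t.lo;
    return quick_two_sum(s.hi, s.lo);
}

/* Lexicographic comparison: returns -1, 0 or 1. Only meaningful when both
 * operands are normalized (as df64_add returns them). NaNs are not handled. */
static int df64_cmp(df64 a, df64 b)
{
    if (a.hi < b.hi) return -1;
    if (a.hi > b.hi) return  1;
    if (a.lo < b.lo) return -1;
    if (a.lo > b.lo) return  1;
    return 0;
}
```

As a usage example, adding 1.0f and 2^-30 (written 0x1p-30f in C99) in plain single precision just returns 1.0f, because the small term is below half an ulp of 1.0f. With df64_add the result keeps hi = 1.0f and lo = 0x1p-30f, so the small term survives and df64_cmp reports the sum as greater than 1.0f. The lexicographic comparison relies on the pairs being normalized; values built by hand with an oversized lo should be passed through quick_two_sum first.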