Why don't compilers merge redundant std::atomic writes?

Question

I'm wondering why no compilers are prepared to merge consecutive writes of the same value to a single atomic variable, e.g.:

#include <atomic>
std::atomic<int> y(0);
void f() {
  auto order = std::memory_order_relaxed;
  y.store(1, order);
  y.store(1, order);
  y.store(1, order);
}

Every compiler I've tried will issue the above write three times. What legitimate, race-free observer could see a difference between the above code and an optimized version with a single write (i.e. doesn't the 'as-if' rule apply)?

If the variable had been volatile, then obviously no optimization is applicable. What's preventing it in my case?

Here is the code on Compiler Explorer.

Recommended answer

The C++11 / C++14 standards as written do allow the three stores to be folded/coalesced into one store of the final value. Even in a case like this:

  y.store(1, order);
  y.store(2, order);
  y.store(3, order); // inlining + constant-folding could produce this in real code

The standard does not guarantee that an observer spinning on y (with an atomic load or CAS) will ever see y == 2. A program that depended on this would have a race-condition bug, but only the garden-variety kind, not the C++ Undefined Behaviour kind of data race. (It's UB only with non-atomic variables.) A program that expects to sometimes see it is not necessarily even buggy. (See below re: progress bars.)

Any ordering that's possible on the C++ abstract machine can be picked (at compile time) as the ordering that will always happen. This is the as-if rule in action. In this case, it's as if all three stores happened back-to-back in the global order, with no loads or stores from other threads happening between the y=1 and y=3.
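To make the as-if argument concrete, here is a minimal sketch of the transformation the standard would permit. The names three_stores, one_store and check_same_final_value are hypothetical, and one_store is written by hand to show what a coalescing compiler could emit; as noted in the question, current compilers do not actually do this.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> y(0);

// What the source says: three relaxed stores in program order.
void three_stores() {
    auto order = std::memory_order_relaxed;
    y.store(1, order);
    y.store(2, order);
    y.store(3, order);
}

// What a coalescing compiler would be allowed to emit instead (hand-written here).
// It matches the execution in which no other thread touches y between the
// first and the last store, which is always one of the legal executions.
void one_store() {
    y.store(3, std::memory_order_relaxed);
}

// Single-threaded sanity check: both versions leave the same final value in y.
void check_same_final_value() {
    y.store(0, std::memory_order_relaxed);
    three_stores();
    assert(y.load(std::memory_order_relaxed) == 3);

    y.store(0, std::memory_order_relaxed);
    one_store();
    assert(y.load(std::memory_order_relaxed) == 3);
}
```

An observer on another thread that happened to read 1 or 2 from the first version was never promised that outcome, so it cannot tell the two versions apart in any way the standard guarantees.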

It doesn't depend on the target architecture or hardware, just as compile-time reordering of relaxed atomic operations is allowed even when targeting strongly-ordered x86. The compiler doesn't have to preserve anything you might expect from thinking about the hardware you're compiling for, so you need barriers. The barriers may compile into zero asm instructions.

It's a quality-of-implementation issue, and can change observed performance / behaviour on real hardware.

The most obvious case where it's a problem is a progress bar. Sinking the stores out of a loop (that contains no other atomic operations) and folding them all into one would result in a progress bar staying at 0 and then going to 100% right at the end.
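The progress-bar scenario might look something like the sketch below. The names progress, do_work, watch and run_demo are made up for illustration, and the "work" is a placeholder. With today's compilers the worker emits one store per iteration, so the watcher usually sees intermediate values; if the stores were sunk out of the loop and merged, it could only ever see 0 and then 100.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> progress(0);   // percent complete, polled by a UI-style thread

// Worker: publishes progress after each unit of (placeholder) work.
// The loop contains no other atomic operations, which is exactly the
// situation in which merging the stores would be permitted.
void do_work() {
    for (int i = 1; i <= 100; ++i) {
        // ... one unit of real work would go here ...
        progress.store(i, std::memory_order_relaxed);
    }
}

// Watcher: polls until it sees 100 and records each distinct value it observed.
std::vector<int> watch() {
    std::vector<int> seen;
    int p = 0;
    while (p < 100) {
        p = progress.load(std::memory_order_relaxed);
        if (seen.empty() || seen.back() != p) seen.push_back(p);
    }
    return seen;
}

// Runs one worker and one watcher and returns what the watcher observed.
std::vector<int> run_demo() {
    progress.store(0, std::memory_order_relaxed);
    std::vector<int> seen;
    std::thread watcher([&seen] { seen = watch(); });
    std::thread worker(do_work);
    worker.join();
    watcher.join();
    assert(!seen.empty() && seen.back() == 100);
    return seen;
}
```

How many intermediate values the watcher records depends on scheduling; the only thing the standard promises is that the values it does see appear in increasing order and that 100 eventually becomes visible.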

There's no C++11 std::atomic way to stop them from doing it in cases where you don't want it, so for now compilers simply choose never to coalesce multiple atomic operations into one. (Coalescing them all into one operation doesn't change their order relative to each other.)

Compiler-writers have correctly noticed that programmers expect that an atomic store will actually happen to memory every time the source does y.store(). (See most of the other answers to this question, which claim the stores are required to happen separately because of possible readers waiting to see an intermediate value.) i.e. coalescing would violate the principle of least surprise.

However, there are cases where it would be very helpful, for example avoiding useless shared_ptr ref count inc/dec in a loop.
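As an illustration of the shared_ptr case, the sketch below contrasts a loop that copies a std::shared_ptr on every iteration with one that does not. All function names are hypothetical. Each copy typically costs an atomic increment of the reference count and each destruction an atomic decrement; since compilers don't cancel these pairs, the programmer currently has to avoid the copy by hand.

```cpp
#include <cassert>
#include <memory>

// Takes the pointer by value: every call copies the shared_ptr, which
// typically means one atomic ref-count increment and one decrement.
int get_by_value(std::shared_ptr<int> p) {
    assert(p != nullptr);
    return *p;
}

// Takes a const reference: no copy is made, so the ref count is untouched.
int get_by_ref(const std::shared_ptr<int>& p) {
    assert(p != nullptr);
    return *p;
}

// One inc/dec pair per iteration that a coalescing compiler could cancel out.
int sum_by_value(const std::shared_ptr<int>& p, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += get_by_value(p);
    return total;
}

// Same result with no ref-count updates at all.
int sum_by_ref(const std::shared_ptr<int>& p, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += get_by_ref(p);
    return total;
}
```

Both loops compute the same sum and leave the reference count where it started; only the amount of atomic traffic differs.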

Obviously any reordering or coalescing can't violate any other ordering rules. For example, num++; num--; would still have to be a full barrier to runtime and compile-time reordering, even if it no longer touched the memory at num.

Discussion is under way to extend the std::atomic API to give programmers control of such optimizations, at which point compilers will be able to optimize when useful, which can happen even in carefully-written code that isn't intentionally inefficient. Some examples of useful cases for optimization are mentioned in the following working-group discussion / proposal links:

  • http://wg21.link/n4455: N4455 No Sane Compiler Would Optimize Atomics
  • http://wg21.link/p0062: WG21/P0062R1: When should compilers optimize atomics?

See also discussion about this same topic on Richard Hodges' answer to Can num++ be atomic for 'int num'? (see the comments). See also the last section of my answer to the same question, where I argue in more detail that this optimization is allowed. (Leaving it short here, because those C++ working-group links already acknowledge that the current standard as written does allow it, and that current compilers just don't optimize on purpose.)

Within the current standard, volatile atomic<int> y would be one way to ensure that stores to it are not allowed to be optimized away. (As Herb Sutter points out in an SO answer, volatile and atomic already share some requirements, but they are different). See also std::memory_order's relationship with volatile on cppreference.

Accesses to volatile objects are not allowed to be optimized away (because they could be memory-mapped IO registers, for example).
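A minimal sketch of the volatile variant of the code from the question follows. The names vy, store_three_times and check are made up for illustration. std::atomic provides volatile-qualified overloads of store() and load(), so the calls look the same as before.

```cpp
#include <atomic>
#include <cassert>

// volatile-qualified atomic: every store below is a volatile access,
// so the compiler has to emit all three instead of merging them.
volatile std::atomic<int> vy(0);

void store_three_times() {
    auto order = std::memory_order_relaxed;
    vy.store(1, order);
    vy.store(1, order);
    vy.store(1, order);
}

// Single-threaded sanity check: the final value is what the last store wrote.
void check() {
    store_three_times();
    assert(vy.load(std::memory_order_relaxed) == 1);
}
```

The observable result is the same as with a plain std::atomic<int>; the difference is only in what the compiler is allowed to remove.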

Using volatile atomic<T> mostly fixes the progress-bar problem, but it's kind of ugly and might look silly in a few years if/when C++ decides on different syntax for controlling optimization so compilers can start doing it in practice.

I think we can be confident that compilers won't start doing this optimization until there's a way to control it. Hopefully it will be some kind of opt-in (like a memory_order_release_coalesce) that doesn't change the behaviour of existing C++11/14 code when compiled as C++whatever. But it could be like the proposal in wg21/p0062: tag don't-optimize cases with [[brittle_atomic]].

wg21/p0062 warns that even volatile atomic doesn't solve everything, and discourages its use for this purpose. It gives this example:

if(x) {
    foo();
    y.store(0);
} else {
    bar();
    y.store(0);  // release a lock before a long-running loop
    for() {...} // loop contains no atomics or volatiles
}
// A compiler can merge the stores into a y.store(0) here.

Even with volatile atomic<int> y, a compiler is allowed to sink the y.store() out of the if/else and just do it once, because it's still doing exactly 1 store with the same value. (Which would be after the long loop in the else branch). Especially if the store is only relaxed or release instead of seq_cst.

volatile does stop the coalescing discussed in the question, but this points out that other optimizations on atomic<> can also be problematic for real performance.

Other reasons for not optimizing include: nobody's written the complicated code that would allow the compiler to do these optimizations safely (without ever getting it wrong). That alone is not a sufficient explanation, because N4455 says LLVM already implements or could easily implement several of the optimizations it mentioned.

The confusing-for-programmers reason is certainly plausible, though. Lock-free code is hard enough to write correctly in the first place.

Don't be casual in your use of atomic weapons: they aren't cheap and don't optimize much (currently not at all). It's not always easy to avoid redundant atomic operations with std::shared_ptr<T>, though, since there's no non-atomic version of it (although one of the answers here gives an easy way to define a shared_ptr_unsynchronized<T> for gcc).
