pbootcms网站模板|日韩1区2区|织梦模板||网站源码|日韩1区2区|jquery建站特效-html5模板网

如何使用 c++11 CAS 實現 ABA 計數器?

How can I implement ABA counter with c++11 CAS?(如何使用 c++11 CAS 實現 ABA 計數器?)
本文介紹了如何使用 c++11 CAS 實現 ABA 計數器?的處理方法,對大家解決問題具有一定的參考價值,需要的朋友們下面隨著小編來一起學習吧!

問題描述

我正在實現一個基于此算法的無鎖隊列,它使用一個計數器來解決 ABA 問題.但我不知道如何用 c++11 CAS 實現這個計數器.例如從算法:

I am implementing a lock-free queue based on this algorithm, which uses a counter to solve the ABA problem. But I don't know how to implement this counter with c++11 CAS. For example, from the algorithm:

E9:    if CAS(&tail.ptr->next, next, <node, next.count+1>)

這是一個原子操作,意思是如果tail.ptr->next等于next,讓tail.ptr->nextcode> 指向 node同時(原子地)使 next.count+1.但是,使用C++11 CAS,我只能實現:

It is an atomic operation, meaning if tail.ptr->next is equal to next, let tail.ptr->next point to node and simultaneously (atomically) make next.count+1. However, using C++11 CAS, I can only implement:

std::atomic_compare_exchange_weak(&tail.ptr->next, next, node);

不能使 next.count+1 同時發生.

推薦答案

要通過一次原子操作同時原子地修改兩個東西,您需要將它們放在相鄰的內存中,例如在兩個成員的結構中.然后你可以使用 std::atomic 讓 gcc 發出 在 x86-64 上鎖定 cmpxchg16b,例如.

To atomically modify two things at once with a single atomic operation, you need to put them in adjacent memory, e.g. in a two-member struct. Then you can use std::atomic<my_struct> to get gcc to emit lock cmpxchg16b on x86-64, for example.

為此您不需要內聯匯編,為了避免它,需要一點 C++ 語法痛苦.https://gcc.gnu.org/wiki/DontUseInlineAsm.

You don't need inline asm for this, and it's worth a bit of C++ syntax pain to avoid it. https://gcc.gnu.org/wiki/DontUseInlineAsm.

不幸的是,對于當前的編譯器,您需要使用 union 來獲得高效的代碼以僅讀取一對中的一個.顯而易見"對結構進行原子加載然后只使用一個成員的方法仍然會導致 lock cmpxchg16b 讀取整個結構,即使我們只需要一個成員.(慢得多,并且弄臟了緩存線,因此讀者與其他讀者競爭).我相信指針的正常 64b 負載仍然可以正確實現 x86 上的獲取內存排序語義(以及原子性),但是當前的編譯器即使對于 std::memory_order_relaxed<也不會進行這種優化/code>,所以我們用聯合來欺騙他們.

Unfortunately, with current compilers, you need to use a union to get efficient code for reading just one of the pair. The "obvious" way of doing an atomic load of the struct and then only using one member still leads to a lock cmpxchg16b to read the whole struct even though we only need one member. (Much slower, and dirties the cache line so readers contend with other readers). I'm confident that a normal 64b load of the pointer would still correctly implement the acquire memory-ordering semantics on x86 (as well as atomicity), but current compilers don't do that optimization even for std::memory_order_relaxed, so we trick them into it with a union.

(提交了 GCC 錯誤 80835 關于此.TODO:同樣適用于如果這是一個有用的想法,請叮一下.)

(submitted GCC bug 80835 about this. TODO: same for clang if this is a useful idea.)

清單:

  • 確保您的編譯器生成有效代碼,以便在只讀情況下僅加載一個成員,而不是一對 lock cmpxchg16b.例如使用聯合.

確保您的編譯器保證在編寫不同的聯合成員后訪問聯合的一個成員在該實現中具有明確定義的行為.聯合類型雙關在 C99 中是合法的(所以這應該適用于 C11 stdatomic),但它在 ISO C++11 中是 UB.但是,它在 C++ 的 GNU 方言中是合法的(受 gcc、clang 和 ICC 等支持).

Make sure your compiler guarantees that accessing one member of a union after writing a different union member has well-defined behaviour in that implementation. Union type-punning is legal in in C99 (so this should work well with C11 stdatomic), but it's UB in ISO C++11. However, it's legal in the GNU dialect of C++ (supported by gcc, clang, and ICC, among others).

確保您的對象是 16B 對齊的,或 32 位指針的 8B 對齊.更一般地說, alignas(2*sizeof(void*)) 應該可以工作.未對齊的 locked 指令在 x86 上可能非常,特別是當它們跨越緩存線邊界時.如果對象未對齊,clang3.8 甚至會將其編譯為庫調用.

Make sure your object is 16B-aligned, or 8B-aligned for 32-bit pointers. More generally, alignas(2*sizeof(void*)) should work. Misaligned locked instructions can be very slow on x86, especially if they cross a cache-line boundary. clang3.8 even compiles it to a library call if the object isn't aligned.

使用 -mcx16 編譯 x86-64 版本.cmpxchg16b 不被最早的 x86-64 CPU (AMD K8) 支持,但之后的一切都應該支持.如果沒有 -mcx16,您會得到一個庫函數調用(可能使用全局鎖).32 位等效項 cmpxchg8b 已經足夠老,現代編譯器假定支持它.(并且可以使用 SSE、MMX 甚至 x87 來執行 64b 原子加載/存儲,因此在讀取一個成員時使用聯合對于獲得良好性能來說不太重要).

Compile with -mcx16 for x86-64 builds. cmpxchg16b was not supported by the earliest x86-64 CPUs (AMD K8), but should be on everything after that. Without -mcx16, you get a library function call (which probably uses a global lock). The 32-bit equivalent, cmpxchg8b, is old enough that modern compilers assume its support. (And can use SSE, MMX, or even x87 to do 64b atomic loads/stores, so using a union is somewhat less important for good performance when reading one member).

確保指針+uintptr_t 原子對象是無鎖的.這對于 x32 和 32 位 ABI(8B 對象)幾乎是有保證的,但對于 16B 對象則不然.例如MSVC 使用 x86-64 的鎖.

Make sure that a pointer+uintptr_t atomic object is lock-free. This is pretty much guaranteed for x32 and 32-bit ABIs (8B object), but not for 16B objects. e.g. MSVC uses a lock for x86-64.

gcc7 及更高版本將調用 libatomic 而不是內聯 lock cmpxchg16b,并且將從 atomic_is_lock_free ( 的原因包括它太慢了,這不是用戶期望 is_lock_free 的意思),但是至少現在,libatomic 實現仍然在該指令可用的目標上使用 lock cmpxchg16b.(它甚至可以為只讀原子對象出現段錯誤,所以它真的不理想.更重要的是,讀取器與其他讀取器爭奪對緩存行的獨占訪問.這就是為什么我們要如此麻煩地避免 lockcmpxchg16b 當我們只需要一個 8 字節的一半時,這里的讀取端.)

gcc7 and later will call libatomic instead of inlining lock cmpxchg16b, and will return false from atomic_is_lock_free (for reasons including that it's so slow it's not what users expect is_lock_free to mean), but at least for now the libatomic implementation still uses lock cmpxchg16b on targets where that instruction is available. (It can even segfault for read-only atomic objects, so it's really not ideal. More importantly, readers contend with other readers for exclusive access to the cache line. That's why we're going to so much trouble to avoid lock cmpxchg16b for the read side here when we only want one 8-byte half.)

這是一個帶有 CAS 重試循環的代碼示例,它編譯為看起來正確的 asm,并且我認為對于允許聯合類型雙關語的實現,沒有 UB 或其他不安全的 C++.它是用 C 風格編寫的(非成員函數等),但如果你編寫成員函數也一樣.

Here's an example of code with a CAS retry-loop that compiles to asm that looks right, and I think is free of UB or other unsafe C++ for implementations that allow union type punning. It's written in C style (non-member functions, etc.) but it would be the same if you wrote member functions.

請參閱與ASM輸出的來自 Godbolt 編譯器瀏覽器上的 gcc6.3.對于 -m32,它使用 cmpxchg8b 的方式與 64 位代碼使用 cmpxchg16b 的方式相同.使用 -mx32(長模式下的 32 位指針),它可以簡單地使用 64 位 cmpxchg 和普通的 64 位整數加載來獲取一個原子中的兩個成員加載.

See the code with asm output from gcc6.3 on the Godbolt compiler explorer. With -m32, it uses cmpxchg8b the same way 64-bit code uses cmpxchg16b. With -mx32 (32-bit pointers in long mode) it can simply use a 64-bit cmpxchg, and normal 64-bit integer loads to grab both members in one atomic load.

這是可移植的 C++11(聯合類型雙關除外),沒有任何特定于 x86 的內容.但是,它僅在可以 CAS 兩個指針大小的對象的目標上高效.它編譯為對 ARM/ARM64 和 MIPS64 的 __atomic_compare_exchange_16 庫函數的調用,正如您在 Godbolt 上看到的那樣.

This is portable C++11 (except the union type-punning), with nothing x86-specific. It's only efficient on targets that can CAS an object the size of two pointers, though. e.g. it compiles to a call to a __atomic_compare_exchange_16 library function for ARM / ARM64 and MIPS64, as you can see on Godbolt.

它不會在 MSVC 上編譯,其中 atomic 大于 counted_ptr_separate,所以 static_assert 會捕獲它.據推測,MSVC 在原子對象中包含一個鎖成員.

It doesn't compile on MSVC, where atomic<counted_ptr> is larger than counted_ptr_separate, so the static_assert catches it. Presumably MSVC includes a lock member in the atomic object.

#include <atomic>
#include <stdint.h>

using namespace std;

struct node {
  // This alignas is essential for clang to use cmpxchg16b instead of a function call
  // Apparently just having it on the union member isn't enough.
  struct alignas(2*sizeof(node*)) counted_ptr {
    node *    ptr;
    uintptr_t count;  // use pointer-sized integers to avoid padding
  };
  
  // hack to allow reading just the pointer without lock-cmpxchg16b,
  // but still without any C++ data race
  struct counted_ptr_separate {
    atomic<node *>    ptr;
    atomic<uintptr_t> count_separate;  // var name emphasizes that accessing this way isn't atomic with ptr
  };

  static_assert(sizeof(atomic<counted_ptr>) == sizeof(counted_ptr_separate), "atomic<counted_ptr> isn't the same size as the separate version; union type-punning will be bogus");
  //static_assert(std::atomic<counted_ptr>{}.is_lock_free());
  
  union {  // anonymous union: the members are directly part of struct node
    alignas(2*sizeof(node*)) atomic<counted_ptr> next_and_count;
    counted_ptr_separate  next;
  };
  // TODO: write member functions to read next.ptr or read/write next_and_count

  int data[4];
};


// make sure read-only access is efficient.
node *follow(node *p) {   // good asm, just a mov load
  return p->next.ptr.load(memory_order_acquire);
}
node *follow_nounion(node *p) {  // really bad asm, using cmpxchg16b to load the whole thing
  return p->next_and_count.load(memory_order_acquire).ptr;
}


void update_next(node &target, node *desired)
{
  // read the old value efficiently to avoid overhead for the no-contention case
  // tearing (or stale data from a relaxed load) will just lead to a retry
  node::counted_ptr expected = {
      target.next.ptr.load(memory_order_relaxed),
      target.next.count_separate.load(memory_order_relaxed) };

  bool success;
  do {
    node::counted_ptr newval = { desired, expected.count + 1 };
    // x86-64: compiles to cmpxchg16b
    success = target.next_and_count.compare_exchange_weak(
                           expected, newval, memory_order_acq_rel);
    // updates exected on failure
  } while( !success );
}

clang 4.0 -O3 -mcx16 的 asm 輸出是:

The asm output from clang 4.0 -O3 -mcx16 is:

update_next(node&, node*):
    push    rbx             # cmpxchg16b uses rbx implicitly so it has to be saved/restored
    mov     rbx, rsi
    mov     rax, qword ptr [rdi]     # load the pointer
    mov     rdx, qword ptr [rdi + 8] # load the counter
.LBB2_1:                        # =>This Inner Loop Header: Depth=1
    lea     rcx, [rdx + 1]
    lock
    cmpxchg16b      xmmword ptr [rdi]
    jne     .LBB2_1
    pop     rbx
    ret

gcc 做了一些笨重的存儲/重新加載,但基本上是相同的邏輯.

gcc does some clunky store/reloads, but is basically the same logic.

follow(node*) 編譯為 mov rax, [rdi]/ret,因此對指針的只讀訪問為由于工會黑客,它應該很便宜.

follow(node*) compiles to mov rax, [rdi] / ret, so read-only access to the pointer is as cheap as it should be, thanks to the union hack.

它確實依賴于通過一個成員編寫聯合并通過另一個成員讀取它,以便在不使用 lock cmpxchg16b 的情況下高效讀取指針.這保證在 GNU C++(和 ISO C99/C11)中有效,但在 ISO C++ 中無效.許多其他 C++ 編譯器確實保證聯合類型雙關有效,但即使沒有它它也可能仍然有效:我們總是使用 std::atomic 加載,它必須假設值是異步修改的.所以我們應該避免類似別名的問題,即在通過另一個指針(或聯合成員)寫入值后,寄存器中的值仍然被認為是有效的.不過,編譯器認為是獨立的東西的編譯時重新排序可能是一個問題.

It does depend on writing a union through one member and reading it through another, to do efficient reads of just the pointer without using a lock cmpxchg16b. This is guaranteed to work in GNU C++ (and ISO C99/C11), but not in ISO C++. Many other C++ compilers do guarantee that union type-punning works, but even without that it would probably still work: We're always using std::atomic loads which have to assume that the value was modified asynchronously. So we should be immune from aliasing-like problems where values in registers are still considered live after writing the value through another pointer (or union member). Compile-time reordering of things the compiler thinks are independent could be a problem, though.

在指針 + 計數器的原子 cmpxchg 之后原子地讀取指針仍然應該為您提供 x86 上的獲取/釋放語義,但我認為 ISO C++ 對此沒有任何說明.猜測廣泛的發布存儲(作為 compare_exchange_weak 的一部分將與大多數架構上來自同一地址的更窄的負載同步(就像在 x86 上一樣),但是 AFAIK C++ std::atomic 不保證任何類型雙關.

Atomically reading just the pointer after an atomic cmpxchg of the pointer+counter should still give you acquire/release semantics on x86, but I don't think ISO C++ says anything about it. I would guess that a wide release-store (as part of the compare_exchange_weak will synchronize with a narrower load from the same address on most architectures (like it does on x86), but AFAIK C++ std::atomic doesn't guarantee anything about type-punning.

與指針 + ABA 計數器無關,但可以用于其他使用聯合以允許訪問更大原子對象的子集的應用程序:不要使用聯合來允許原子存儲僅指向指針或者只是柜臺.至少如果您關心與該對的獲取負載同步,則不會.即使是強序 x86 也可以 重新排序一個狹窄的商店,用更寬的負載完全包含它.一切仍然是原子的,但就內存排序而言,你進入了奇怪的領域.

Not relevant for pointer + ABA-counter, but could come up for other applications of using a union to allow access to subsets of larger atomic object: Don't use the union to allow atomic stores to just the pointer or just the counter. At least not if you care about synchronization with an acquire-load of the pair. Even strongly-ordered x86 can reorder a narrow store with a wider load that fully contains it. Everything is still atomic, but you get into weird territory as far as memory-ordering goes.

在 x86-64 上,原子 16B 加載需要 lock cmpxchg16b(這是一個完整的內存屏障,防止前面的窄存儲在它之后變得全局可見).但是,如果您將其與 32 位指針(或 32 位數組索引)一起使用,則很容易遇到問題,因為可以使用常規 64b 加載來加載兩半.如果您需要與其他線程同步而不僅僅是原子性,我不知道您會在其他架構上看到什么問題.

On x86-64, an atomic 16B load requires a lock cmpxchg16b (which is a full memory barrier, preventing a preceding narrow store from becoming globally visible after it). But you could easily have a problem if you were using this with 32-bit pointers (or 32-bit array indices), since both halves could be loaded with a regular 64b load. And I have no idea what problems you could see on other architectures, if you need synchronization with other threads and not just atomicity.

要了解有關 std::memory_order 獲取和釋放的更多信息,請參閱 Jeff Preshing 的優秀文章.

這篇關于如何使用 c++11 CAS 實現 ABA 計數器?的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網!

【網站聲明】本站部分內容來源于互聯網,旨在幫助大家更快的解決問題,如果有圖片或者內容侵犯了您的權益,請聯系我們刪除處理,感謝您的支持!

相關文檔推薦

What is the fastest way to transpose a matrix in C++?(在 C++ 中轉置矩陣的最快方法是什么?)
Sorting zipped (locked) containers in C++ using boost or the STL(使用 boost 或 STL 在 C++ 中對壓縮(鎖定)容器進行排序)
Rotating a point about another point (2D)(圍繞另一個點旋轉一個點 (2D))
Image Processing: Algorithm Improvement for #39;Coca-Cola Can#39; Recognition(圖像處理:Coca-Cola Can 識別的算法改進)
How do I construct an ISO 8601 datetime in C++?(如何在 C++ 中構建 ISO 8601 日期時間?)
Sort list using STL sort function(使用 STL 排序功能對列表進行排序)
主站蜘蛛池模板: 口臭的治疗方法,口臭怎么办,怎么除口臭,口臭的原因-口臭治疗网 | 十二星座查询(性格特点分析、星座运势解读) - 玄米星座网 | 磁力反应釜,高压釜,实验室反应釜,高温高压反应釜-威海自控反应釜有限公司 | 挤塑板-XPS挤塑板-挤塑板设备厂家[襄阳欧格] | Trimos测长机_测高仪_TESA_mahr,WYLER水平仪,PWB对刀仪-德瑞华测量技术(苏州)有限公司 | 澳洁干洗店加盟-洗衣店干洗连锁「澳洁干洗免费一对一贴心服务」 干洗加盟网-洗衣店品牌排行-干洗设备价格-干洗连锁加盟指南 | 冲击式破碎机-冲击式制砂机-移动碎石机厂家_青州市富康机械有限公司 | 光伏家 - 太阳能光伏发电_分布式光伏发电_太阳能光伏网 | 碎石机设备-欧版反击破-欧版颚式破碎机(站)厂家_山东奥凯诺机械 高低温试验箱-模拟高低温试验箱订制-北京普桑达仪器科技有限公司【官网】 | 不锈钢/气体/液体玻璃转子流量计(防腐,选型,规格)-常州天晟热工仪表有限公司【官网】 | 抓斗式清污机|螺杆式|卷扬式启闭机|底轴驱动钢坝|污水处理闸门-方源水利机械 | 实体店商新零售|微赢|波后|波后合作|微赢集团 | 上海单片机培训|重庆曙海培训分支机构—CortexM3+uC/OS培训班,北京linux培训,Windows驱动开发培训|上海IC版图设计,西安linux培训,北京汽车电子EMC培训,ARM培训,MTK培训,Android培训 | 防腐木批发价格_深圳_惠州_东莞防腐木厂家_森源(深圳)防腐木有限公司 | YJLV22铝芯铠装电缆-MYPTJ矿用高压橡套电缆-天津市电缆总厂 | 四川成都干燥设备_回转筒干燥机_脉冲除尘器_输送设备_热风炉_成都川工星科机电设备有限公司 | 四探针电阻率测试仪-振实密度仪-粉末流动性测定仪-宁波瑞柯微智能 | 卫生型双针压力表-高温防腐差压表-安徽康泰电气有限公司 | 没斑啦-专业的祛斑美白嫩肤知识网站-去斑经验分享 | 昆明网络公司|云南网络公司|昆明网站建设公司|昆明网页设计|云南网站制作|新媒体运营公司|APP开发|小程序研发|尽在昆明奥远科技有限公司 | 工业淬火油烟净化器,北京油烟净化器厂家,热处理油烟净化器-北京众鑫百科 | 福州甲醛检测-福建室内空气检测_环境检测_水质检测-福建中凯检测技术有限公司 | 次氯酸钠厂家,涉水级次氯酸钠,三氯化铁生产厂家-淄博吉灿化工 | 正压密封性测试仪-静态发色仪-导丝头柔软性测试仪-济南恒品机电技术有限公司 | 北京软件开发_软件开发公司_北京软件公司-北京宜天信达软件开发公司 | 六维力传感器_六分量力传感器_模腔压力传感器-南京数智微传感科技有限公司 | 工业车间焊接-整体|集中除尘设备-激光|等离子切割机配套除尘-粉尘烟尘净化治理厂家-山东美蓝环保科技有限公司 | 拖鞋定制厂家-品牌拖鞋代加工厂-振扬实业中国高端拖鞋大型制造商 | 水热合成反应釜-防爆高压消解罐-西安常仪仪器设备有限公司 | ★店家乐|服装销售管理软件|服装店收银系统|内衣店鞋店进销存软件|连锁店管理软件|收银软件手机版|会员管理系统-手机版,云版,App | 理化生实验室设备,吊装实验室设备,顶装实验室设备,实验室成套设备厂家,校园功能室设备,智慧书法教室方案 - 东莞市惠森教学设备有限公司 | 螺钉式热电偶_便携式温度传感器_压簧式热电偶|无锡联泰仪表有限公司|首页 | 冷却塔减速机器_冷却塔皮带箱维修厂家_凉水塔风机电机更换-广东康明冷却塔厂家 | 大型果蔬切片机-水果冬瓜削皮机-洗菜机切菜机-肇庆市凤翔餐饮设备有限公司 | TYPE-C厂家|TYPE-C接口|TYPE-C防水母座|TYPE-C贴片-深圳步步精 | PCB厂|线路板厂|深圳线路板厂|软硬结合板厂|电路板生产厂家|线路板|深圳电路板厂家|铝基板厂家|深联电路-专业生产PCB研发制造 | 河北中仪伟创试验仪器有限公司是专业生产沥青,土工,水泥,混凝土等试验仪器的厂家,咨询电话:13373070969 | 衬氟止回阀_衬氟闸阀_衬氟三通球阀_衬四氟阀门_衬氟阀门厂-浙江利尔多阀门有限公司 | 板框压滤机-隔膜压滤机-厢式压滤机生产厂家-禹州市君工机械设备有限公司 | 不锈钢拉手厂家|浴室门拉手厂家|江门市蓬江区金志翔五金制品有限公司 | 耳模扫描仪-定制耳机设计软件-DLP打印机-asiga打印机-fitshape「飞特西普」 |