
Is inline assembly language slower than native C++ code?

Question

I tried to compare the performance of inline assembly language and C++ code, so I wrote a function that adds two arrays of size 2000, 100000 times. Here's the code:

#define TIMES 100000
void calcuC(int *x,int *y,int length)
{
    for(int i = 0; i < TIMES; i++)
    {
        for(int j = 0; j < length; j++)
            x[j] += y[j];
    }
}


void calcuAsm(int *x,int *y,int lengthOfArray)
{
    __asm
    {
        mov edi,TIMES
        start:
        mov esi,0
        mov ecx,lengthOfArray
        label:
        mov edx,x
        push edx
        mov eax,DWORD PTR [edx + esi*4]
        mov edx,y
        mov ebx,DWORD PTR [edx + esi*4]
        add eax,ebx
        pop edx
        mov [edx + esi*4],eax
        inc esi
        loop label
        dec edi
        cmp edi,0
        jnz start
    };
}

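Here's main():

#include <cstdio>    // setbuf
#include <ctime>     // clock, CLOCKS_PER_SEC
#include <iostream>  // cout, endl
using namespace std;

int main() {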
    bool errorOccured = false;
    setbuf(stdout,NULL);
    int *xC,*xAsm,*yC,*yAsm;
    xC = new int[2000];
    xAsm = new int[2000];
    yC = new int[2000];
    yAsm = new int[2000];
    for(int i = 0; i < 2000; i++)
    {
        xC[i] = 0;
        xAsm[i] = 0;
        yC[i] = i;
        yAsm[i] = i;
    }
    time_t start = clock();
    calcuC(xC,yC,2000);

    //    calcuAsm(xAsm,yAsm,2000);
    //    for(int i = 0; i < 2000; i++)
    //    {
    //        if(xC[i] != xAsm[i])
    //        {
    //            cout<<"xC["<<i<<"]="<<xC[i]<<" "<<"xAsm["<<i<<"]="<<xAsm[i]<<endl;
    //            errorOccured = true;
    //            break;
    //        }
    //    }
    //    if(errorOccured)
    //        cout<<"Error occurs!"<<endl;
    //    else
    //        cout<<"Works fine!"<<endl;

    time_t end = clock();

    //    cout<<"time = "<<(float)(end - start) / CLOCKS_PER_SEC<<"\n";

    cout<<"time = "<<end - start<<endl;
    return 0;
}

Then I ran the program five times to get the processor clock ticks, which can be treated as time. Each run calls only one of the functions above.

Here are the results.

Assembly version:

Debug   Release
---------------
732        668
733        680
659        672
667        675
684        694
Release average: 677

C++ version:

Debug     Release
-----------------
1068      168
 999      166
1072      231
1002      166
1114      183
Release average: 182

The C++ code in release mode is almost 3.7 times faster than the assembly code. Why?

I guess that the assembly code I wrote is not as efficient as the code generated by GCC. It's hard for an ordinary programmer like me to write code that is faster than what a compiler generates. Does that mean I should not trust the performance of assembly language written by hand, and should focus on C++ and forget about assembly language?

Answer

Yes, most of the time.

First of all, you start from the wrong assumption that a low-level language (assembly in this case) will always produce faster code than a high-level language (C++ and C in this case). That's not true. Is C code always faster than Java code? No, because there is another variable: the programmer. The way you write code and your knowledge of architecture details greatly influence performance (as you saw in this case).

You can always produce an example where handmade assembly code is better than compiled code, but usually it's a fictional example or a single routine, not a true program of 500,000+ lines of C++ code. I think compilers will produce better assembly code 95% of the time, and only in some rare cases may you need to write assembly code for a few short, highly used, performance-critical routines, or when you have to access features your favorite high-level language does not expose. Do you want a touch of this complexity? Read this awesome answer here on SO.

Why is that?

First of all, because compilers can do optimizations that we can't even imagine (see this short list), and they do them in seconds (when we may need days).

When you code in assembly you have to make well-defined functions with a well-defined call interface. Compilers, however, can take into account whole-program and inter-procedural optimization, and apply techniques such as register allocation, constant propagation, common subexpression elimination, instruction scheduling and other complex, non-obvious optimizations (the polytope model, for example). On RISC architectures people stopped worrying about this many years ago (instruction scheduling, for example, is very hard to tune by hand), and modern CISC CPUs have very long pipelines too.
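To make two of those optimizations a bit more concrete, here is a minimal sketch of what constant propagation and common subexpression elimination amount to. The names blendNaive, blendOptimized and checkBlendAgreement are made up for illustration, and the second function is only a hand-written approximation of what an optimizer might produce, not actual compiler output.

```cpp
#include <cassert>  // assert, used to check that both versions agree

// As a programmer might naturally write it (hypothetical example function).
int blendNaive(int a, int b)
{
    const int weight = 3;               // value known at compile time
    int first  = (a + b) * weight;      // (a + b) * weight computed here...
    int second = (a + b) * weight + 1;  // ...and written out a second time here
    return first + second;
}

// Hand-written approximation of the same function after optimization.
int blendOptimized(int a, int b)
{
    // Constant propagation: 'weight' is replaced by the literal 3.
    // Common subexpression elimination: (a + b) * 3 is computed only once.
    const int common = (a + b) * 3;
    return common + common + 1;
}

// Small helper: both versions must return the same value for the same input.
void checkBlendAgreement(int a, int b)
{
    assert(blendNaive(a, b) == blendOptimized(a, b));
}
```

A compiler does this kind of rewriting across a whole function (and often across function boundaries) without being asked, whereas in hand-written assembly every such saving has to be found and maintained manually.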

For some complex microcontrollers even system libraries are written in C instead of assembly, because their compilers produce better (and easier to maintain) final code.

Compilers can sometimes use MMX/SIMDx instructions automatically, and if you don't use them your code simply can't compete (other answers already reviewed your assembly code very well). Just for loops, this is a short list of the loop optimizations a compiler commonly checks for (do you think you could do them all yourself when your deadline has already been fixed for a C# program?). If you write something in assembly, I think you have to consider at least some simple optimizations. The textbook example for arrays is to unroll the loop (its size is known at compile time). Do it and run your test again.
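As a rough illustration of unrolling, here is a minimal C++ sketch of the inner loop from the question with four additions per iteration. The function name addArraysUnrolled and the unroll factor of four are assumptions chosen for the example; the best factor depends on the CPU, and an optimizing compiler may well do this (or vectorize) on its own.

```cpp
#include <cassert>
#include <cstddef>

// Adds y into x element-wise, handling four elements per loop iteration.
// Hypothetical helper; not part of the question's original code.
void addArraysUnrolled(int* x, const int* y, std::size_t length)
{
    assert(x != nullptr && y != nullptr);

    std::size_t j = 0;

    // Unrolled body: one loop test and one index update per four additions.
    for (; j + 4 <= length; j += 4)
    {
        x[j]     += y[j];
        x[j + 1] += y[j + 1];
        x[j + 2] += y[j + 2];
        x[j + 3] += y[j + 3];
    }

    // Remainder loop for lengths that are not a multiple of four.
    for (; j < length; ++j)
        x[j] += y[j];
}
```

Calling addArraysUnrolled(xC, yC, 2000) inside the outer TIMES loop would stand in for the inner loop of calcuC; the same idea applies to the assembly version, where it also removes most of the per-element loop overhead.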

These days it's also really uncommon to need assembly language for another reason: the plethora of different CPUs. Do you want to support them all? Each has a specific microarchitecture and some specific instruction sets. They have different numbers of functional units, and assembly instructions should be arranged to keep them all busy. If you write in C you may use PGO, but in assembly you will need deep knowledge of that specific architecture (and you will have to rethink and redo everything for another architecture). For small tasks the compiler usually does it better, and for complex tasks the work usually isn't repaid (and the compiler may do better anyway).

If you sit down and take a look at your code, you'll probably see that you gain more by redesigning your algorithm than by translating it to assembly (read this great post here on SO). There are high-level optimizations (and hints to the compiler) you can effectively apply before you need to resort to assembly language. It's probably worth mentioning that by using intrinsics you will often get the performance gain you're looking for, and the compiler will still be able to perform most of its optimizations.
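For the array addition in the question, an intrinsics version might look like the sketch below. It assumes an x86 target with SSE2 and falls back to a plain loop elsewhere; the function name addArraysSimd and the macro EXAMPLE_HAVE_SSE2 are invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Detect SSE2 on GCC/Clang (__SSE2__) and MSVC (_M_X64 or _M_IX86_FP >= 2).
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define EXAMPLE_HAVE_SSE2 1
#else
#define EXAMPLE_HAVE_SSE2 0
#endif

// Adds y into x element-wise, four 32-bit integers at a time where SSE2 exists.
// Hypothetical helper; not part of the question's original code.
void addArraysSimd(int* x, const int* y, std::size_t length)
{
    assert(x != nullptr && y != nullptr);

    std::size_t j = 0;

#if EXAMPLE_HAVE_SSE2
    for (; j + 4 <= length; j += 4)
    {
        // Unaligned loads, so the arrays need no special alignment.
        __m128i vx = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x + j));
        __m128i vy = _mm_loadu_si128(reinterpret_cast<const __m128i*>(y + j));
        // One instruction adds four pairs of 32-bit integers.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(x + j), _mm_add_epi32(vx, vy));
    }
#endif

    // Scalar tail (or the whole array when SSE2 is not available).
    for (; j < length; ++j)
        x[j] += y[j];
}
```

Because this is still ordinary C++, the compiler remains free to allocate registers, inline the call and schedule the instructions, which is the advantage over an opaque inline assembly block.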

All this said, even when you can produce assembly code that is 5 to 10 times faster, you should ask your customers whether they would rather pay for one week of your time or spend $50 on a faster CPU. More often than not (and especially in LOB applications), extreme optimization is simply not required of most of us.
