
Is inline assembly language slower than native C++ code?

Question

I tried to compare the performance of inline assembly language and C++ code, so I wrote a function that adds two arrays of size 2000, 100000 times over. Here's the code:

#include <cstdio>
#include <ctime>
#include <iostream>
using namespace std;

#define TIMES 100000
void calcuC(int *x,int *y,int length)
{
    for(int i = 0; i < TIMES; i++)
    {
        for(int j = 0; j < length; j++)
            x[j] += y[j];
    }
}


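void calcuAsm(int *x,int *y,int lengthOfArray)
{
    __asm
    {
        mov edi,TIMES                       // edi = outer loop counter
        start:
        mov esi,0                           // esi = element index
        mov ecx,lengthOfArray               // ecx = inner loop counter used by `loop`
        label:
        mov edx,x                           // reload pointer x on every iteration
        push edx                            // save x on the stack
        mov eax,DWORD PTR [edx + esi*4]     // eax = x[esi]
        mov edx,y                           // reload pointer y
        mov ebx,DWORD PTR [edx + esi*4]     // ebx = y[esi]
        add eax,ebx                         // eax = x[esi] + y[esi]
        pop edx                             // restore x
        mov [edx + esi*4],eax               // x[esi] = eax
        inc esi                             // next element
        loop label                          // dec ecx; jump to label while ecx != 0
        dec edi                             // one outer iteration done
        cmp edi,0                           // redundant: dec already sets ZF
        jnz start                           // repeat the outer loop TIMES times
    };
}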

Here's main():

int main() {
    bool errorOccured = false;
    setbuf(stdout,NULL);
    int *xC,*xAsm,*yC,*yAsm;
    xC = new int[2000];
    xAsm = new int[2000];
    yC = new int[2000];
    yAsm = new int[2000];
    for(int i = 0; i < 2000; i++)
    {
        xC[i] = 0;
        xAsm[i] = 0;
        yC[i] = i;
        yAsm[i] = i;
    }
    clock_t start = clock();
    calcuC(xC,yC,2000);

    //    calcuAsm(xAsm,yAsm,2000);
    //    for(int i = 0; i < 2000; i++)
    //    {
    //        if(xC[i] != xAsm[i])
    //        {
    //            cout<<"xC["<<i<<"]="<<xC[i]<<" "<<"xAsm["<<i<<"]="<<xAsm[i]<<endl;
    //            errorOccured = true;
    //            break;
    //        }
    //    }
    //    if(errorOccured)
    //        cout<<"Error occurs!"<<endl;
    //    else
    //        cout<<"Works fine!"<<endl;

    clock_t end = clock();

    //    cout<<"time = "<<(float)(end - start) / CLOCKS_PER_SEC<<"\n";

    cout<<"time = "<<end - start<<endl;
    return 0;
}

Then I ran the program five times to get the processor cycles, which can be read as time. Each run calls only one of the functions mentioned above.
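For readers who want to reproduce the measurement, here is a minimal timing sketch. It is not part of the original post: `bestTimeMs` is a hypothetical helper, and it uses `std::chrono::steady_clock` instead of `clock()`, which reports ticks in units of `CLOCKS_PER_SEC` rather than processor cycles. It assumes a C++11 compiler.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper (not from the original post): calls `work` `runs` times
// and returns the fastest wall-clock time of a single call, in milliseconds.
// With runs <= 0 nothing is measured and the largest double is returned.
template <typename Work>
double bestTimeMs(Work work, int runs)
{
    using Clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::max();
    for (int r = 0; r < runs; ++r)
    {
        const Clock::time_point begin = Clock::now();
        work();
        const Clock::time_point end = Clock::now();
        const std::chrono::duration<double, std::milli> elapsed = end - begin;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

A call such as `bestTimeMs([&] { calcuC(xC, yC, 2000); }, 5)` would time five runs of the C++ version in one process. The arrays keep accumulating between runs, which does not matter for timing. Taking the minimum rather than the average is one possible choice to reduce noise from other processes.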

And here comes the result.

Function of assembly version:

Debug   Release
---------------
732        668
733        680
659        672
667        675
684        694
Average:   677

Function of C++ version:

Debug     Release
-----------------
1068      168
 999      166
1072      231
1002      166
1114      183
Average:  182

The C++ code in release mode is almost 3.7 times faster than the assembly code. Why?

I guess that the assembly code I wrote is not as efficient as the code generated by GCC. It's hard for an ordinary programmer like me to write code faster than what a compiler generates. Does that mean I should not trust the performance of assembly language written by hand, and should focus on C++ and forget about assembly language?

Recommended answer

Yes, most times.

First of all, you start from the wrong assumption that a low-level language (assembly in this case) will always produce faster code than a high-level language (C++ and C in this case). That's not true. Is C code always faster than Java code? No, because there is another variable: the programmer. The way you write code and your knowledge of architecture details greatly influence performance (as you saw in this case).

You can always produce an example where handmade assembly code is better than compiled code, but usually it's a fictional example or a single routine, not a real program of 500,000+ lines of C++ code. I think compilers will produce better assembly code 95% of the time, and only in some rare cases will you need to write assembly code for a few short, highly used, performance-critical routines, or when you have to access features your favorite high-level language does not expose. Do you want a touch of this complexity? Read this awesome answer here on SO.

Why does this happen?

First of all, because compilers can do optimizations that we can't even imagine (see this short list), and they will do them in seconds (when we may need days).

When you code in assembly you have to write well-defined functions with a well-defined call interface. Compilers, however, can take into account whole-program optimization and inter-procedural optimization, along with register allocation, constant propagation, common subexpression elimination, instruction scheduling and other complex, non-obvious optimizations (the polytope model, for example). On RISC architectures people stopped worrying about this many years ago (instruction scheduling, for example, is very hard to tune by hand), and modern CISC CPUs have very long pipelines too.
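To make one of the optimizations named above concrete, here is a small source-level illustration of constant propagation and folding. Both functions and their names are hypothetical and not taken from the original post; whether a given compiler actually performs this transformation depends on the compiler and its settings, so treat it as a sketch of the idea rather than a description of any specific tool.

```cpp
// As a programmer might write it: `scale` and `offset` are named constants,
// and `scale * offset` appears inside the loop body.
int sumScaledAsWritten(const int *data, int length)
{
    const int scale = 4;
    const int offset = 3;
    int total = 0;
    for (int i = 0; i < length; ++i)
        total += data[i] * scale + scale * offset;
    return total;
}

// Roughly what an optimizer is allowed to turn it into: the constants are
// propagated into the expression and `4 * 3` is folded to 12 at compile time.
int sumScaledFolded(const int *data, int length)
{
    int total = 0;
    for (int i = 0; i < length; ++i)
        total += data[i] * 4 + 12;
    return total;
}
```

Both versions compute the same result. The point is that the readable form costs nothing once the compiler has done this work, whereas in hand-written assembly the author has to do the folding manually.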

For some complex microcontrollers even the system libraries are written in C instead of assembly, because their compilers produce better (and easier to maintain) final code.

Compilers can sometimes use MMX/SIMDx instructions automatically, and if you don't use them you simply can't compare (other answers have already reviewed your assembly code very well). For loops alone, there is a short list of loop optimizations that a compiler commonly checks for (do you think you could apply them all yourself when the schedule has already been set for a C# program?). If you write something in assembly, I think you have to consider at least some simple optimizations. The textbook example for arrays is to unroll the loop (its size is known at compile time). Do that and run your test again.
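The unrolling mentioned above can be sketched at the C++ level as follows. `addArraysUnrolled` is a hypothetical name, the unroll factor of four is an arbitrary choice for illustration, and an optimizing compiler may well apply this (or vectorize the loop) on its own, so this is not a claim about what performs best.

```cpp
// Hypothetical variant of the inner loop of calcuC, unrolled by four.
void addArraysUnrolled(int *x, const int *y, int length)
{
    int j = 0;
    // Main body: four independent additions per iteration, so the loop
    // counter and branch are paid for once per four elements.
    for (; j + 3 < length; j += 4)
    {
        x[j]     += y[j];
        x[j + 1] += y[j + 1];
        x[j + 2] += y[j + 2];
        x[j + 3] += y[j + 3];
    }
    // Remainder: handles lengths that are not a multiple of four.
    for (; j < length; ++j)
        x[j] += y[j];
}
```

The same idea carries over to the assembly version: do several elements per pass through the loop body and keep the array pointers in registers instead of reloading them each time.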

These days it's also really uncommon to need assembly language for another reason: the plethora of different CPUs. Do you want to support them all? Each has a specific microarchitecture and some specific instruction sets. They have different numbers of functional units, and assembly instructions should be arranged to keep them all busy. If you write in C you may use PGO, but in assembly you will need great knowledge of that specific architecture (and you will have to rethink and redo everything for another architecture). For small tasks the compiler usually does it better, and for complex tasks the work usually isn't repaid (and the compiler may do better anyway).

If you sit down and take a look at your code, you'll probably see that you gain more by redesigning your algorithm than by translating it to assembly (read this great post here on SO). There are high-level optimizations (and hints to the compiler) you can effectively apply before you need to resort to assembly language. It's probably worth mentioning that by using intrinsics you will often get the performance gain you're looking for, and the compiler will still be able to perform most of its optimizations.
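Both suggestions can be sketched for the array addition from the question. The function names below are hypothetical. The first function assumes an x86 target with SSE2 (it falls back to the scalar loop elsewhere) and uses the `_mm_loadu_si128`, `_mm_add_epi32` and `_mm_storeu_si128` intrinsics from `<emmintrin.h>`. The second shows the algorithmic route: adding `y[j]` to `x[j]` a fixed number of times equals a single multiply-add, assuming `times * y[j]` and the final sum fit in an `int`.

```cpp
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h> // SSE2 intrinsics
#define EXAMPLE_HAS_SSE2 1
#else
#define EXAMPLE_HAS_SSE2 0
#endif

// Intrinsics route: adds four 32-bit integers per step where SSE2 is available.
void addArraysSimd(int *x, const int *y, int length)
{
    int j = 0;
#if EXAMPLE_HAS_SSE2
    for (; j + 4 <= length; j += 4)
    {
        // Unaligned loads and stores, so the arrays need no special alignment.
        const __m128i vx = _mm_loadu_si128(reinterpret_cast<const __m128i *>(x + j));
        const __m128i vy = _mm_loadu_si128(reinterpret_cast<const __m128i *>(y + j));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(x + j), _mm_add_epi32(vx, vy));
    }
#endif
    // Scalar tail (and the whole loop on targets without SSE2).
    for (; j < length; ++j)
        x[j] += y[j];
}

// Algorithmic route: replaces `times` repeated additions with one multiply-add.
// Assumption: no intermediate or final value overflows int.
void addArraysRepeated(int *x, const int *y, int length, int times)
{
    for (int j = 0; j < length; ++j)
        x[j] += times * y[j];
}
```

For the benchmark in the question, `addArraysRepeated(x, y, 2000, 100000)` produces the same array contents as `calcuC` while touching each element once, which is the kind of gain no amount of instruction-level tuning of the original loop can match.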

All this said, even when you can produce assembly code that is 5 to 10 times faster, you should ask your customers whether they prefer to pay for one week of your time or to buy a $50 faster CPU. More often than not (and especially in LOB applications), extreme optimization is simply not required from most of us.
