Problem description
I tried to compare the performance of inline assembly and C++ code, so I wrote a function that adds two arrays of size 2000, repeated 100000 times. Here's the code:
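#include <cstdio>
#include <ctime>
#include <iostream>
using namespace std;
#define TIMES 100000
// Plain C++ version: adds y into x, repeated TIMES times.
void calcuC(int *x,int *y,int length)
{
    for(int i = 0; i < TIMES; i++)
    {
        for(int j = 0; j < length; j++)
            x[j] += y[j];
    }
}
// Inline-assembly version of the same loops (32-bit x86).
void calcuAsm(int *x,int *y,int lengthOfArray)
{
    __asm
    {
        mov edi,TIMES                   ; outer loop counter
    start:
        mov esi,0                       ; element index
        mov ecx,lengthOfArray           ; inner loop counter, consumed by LOOP
    label:
        mov edx,x
        push edx
        mov eax,DWORD PTR [edx + esi*4] ; eax = x[esi]
        mov edx,y
        mov ebx,DWORD PTR [edx + esi*4] ; ebx = y[esi]
        add eax,ebx
        pop edx
        mov [edx + esi*4],eax           ; x[esi] = eax
        inc esi
        loop label
        dec edi
        cmp edi,0
        jnz start
    };
}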
Here's main():
int main() {
    bool errorOccured = false;
    setbuf(stdout,NULL);
    int *xC,*xAsm,*yC,*yAsm;
    xC = new int[2000];
    xAsm = new int[2000];
    yC = new int[2000];
    yAsm = new int[2000];
    for(int i = 0; i < 2000; i++)
    {
        xC[i] = 0;
        xAsm[i] = 0;
        yC[i] = i;
        yAsm[i] = i;
    }
    clock_t start = clock();   // clock() returns clock_t, not time_t
    calcuC(xC,yC,2000);
    // calcuAsm(xAsm,yAsm,2000);
    // for(int i = 0; i < 2000; i++)
    // {
    //     if(xC[i] != xAsm[i])
    //     {
    //         cout<<"xC["<<i<<"]="<<xC[i]<<" "<<"xAsm["<<i<<"]="<<xAsm[i]<<endl;
    //         errorOccured = true;
    //         break;
    //     }
    // }
    // if(errorOccured)
    //     cout<<"Error occurs!"<<endl;
    // else
    //     cout<<"Works fine!"<<endl;
    clock_t end = clock();
    // cout<<"time = "<<(float)(end - start) / CLOCKS_PER_SEC<<"\n";
    cout<<"time = "<<end - start<<endl;
    return 0;
}
Then I ran the program five times to get the elapsed clock ticks reported by clock(), which can be treated as time. Each run calls only one of the two functions above.
Here are the results.
Assembly version of the function:
Debug Release
---------------
732 668
733 680
659 672
667 675
684 694
Average: 677
C++ version of the function:
Debug Release
-----------------
1068 168
999 166
1072 231
1002 166
1114 183
Average: 182
The C++ code in release mode is almost 3.7 times faster than the assembly code. Why?
I guess the assembly code I wrote is not as efficient as the code generated by GCC. It's hard for an ordinary programmer like me to write code that is faster than what a compiler generates. Does that mean I should not trust the performance of assembly I write by hand, focus on C++, and forget about assembly language?
Recommended answer
Yes, most of the time.
First of all, you start from the wrong assumption that a low-level language (assembly in this case) will always produce faster code than a high-level language (C++ and C in this case). That is not true. Is C code always faster than Java code? No, because there is another variable: the programmer. The way you write code and your knowledge of architecture details greatly influence performance (as you saw in this case).
You can always produce an example where handmade assembly code is better than compiled code, but usually it's a contrived example or a single routine, not a real program of 500,000+ lines of C++ code. I think compilers will produce better assembly code 95% of the time, and only in rare cases will you need to write assembly: for a few short, heavily used, performance-critical routines, or when you have to access features your favorite high-level language does not expose. Do you want a taste of this complexity? Read this awesome answer here on SO.
Why is this?
First of all, because compilers can do optimizations that we can't even imagine (see this short list), and they do them in seconds (when we might need days).
When you code in assembly you have to make well-defined functions with a well-defined call interface. Compilers, however, can take into account whole-program optimization and inter-procedural optimization, along with register allocation, constant propagation, common subexpression elimination, instruction scheduling, and other complex, non-obvious optimizations (the polytope model, for example). On RISC architectures people stopped worrying about this many years ago (instruction scheduling, for example, is very hard to tune by hand), and modern CISC CPUs have very long pipelines too.
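To make two of the optimizations named above concrete, here is a minimal sketch of constant propagation and common subexpression elimination, written out by hand at source level. Both function names are hypothetical and invented for this illustration; a real compiler performs the transformation on its internal representation, and whether it applies in a given build depends on the compiler and its settings.

```cpp
#include <cassert>

// Hypothetical example: the form a programmer might naturally write.
int scaledSumNaive(int a, int b) {
    const int factor = 4;               // value known at compile time
    int first  = (a + b) * factor;      // (a + b) * factor computed here...
    int second = (a + b) * factor + 1;  // ...and written out again here
    return first + second;
}

// Roughly what an optimizer may reduce it to: the constant 4 is propagated
// into the expression and the shared subexpression is computed only once.
int scaledSumOptimized(int a, int b) {
    int common = (a + b) * 4;
    return common + common + 1;
}
```

Both functions return the same value for any inputs that do not overflow; the point is that the compiler can find the second form without the programmer having to write it.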
For some complex microcontrollers, even system libraries are written in C instead of assembly because their compilers produce better (and more maintainable) final code.
Compilers can sometimes use MMX/SIMDx instructions automatically, and if you don't use them your code simply can't compete (other answers have already reviewed your assembly code very well).
For loops alone, this is a short list of the loop optimizations a compiler commonly checks for (do you think you could do them all by hand when your schedule was planned for a C# program?). If you write something in assembly, I think you have to consider at least some simple optimizations. The textbook example for arrays is to unroll the loop (its size is known at compile time). Do it and run your test again.
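As an illustration of the unrolling mentioned above, here is a sketch of the question's inner loop unrolled by a factor of four in C++. The function names addArrays and addArraysUnrolled4 and the unroll factor are assumptions chosen for the example, not something taken from the question or from any particular compiler's output.

```cpp
#include <cassert>
#include <cstddef>

// Straightforward version of the inner loop: one element per iteration.
void addArrays(int* x, const int* y, std::size_t length) {
    assert(x != nullptr && y != nullptr);
    for (std::size_t j = 0; j < length; ++j)
        x[j] += y[j];
}

// Same work, unrolled by four: fewer loop-counter updates and branches
// per element processed.
void addArraysUnrolled4(int* x, const int* y, std::size_t length) {
    assert(x != nullptr && y != nullptr);
    std::size_t j = 0;
    for (; j + 4 <= length; j += 4) {
        x[j]     += y[j];
        x[j + 1] += y[j + 1];
        x[j + 2] += y[j + 2];
        x[j + 3] += y[j + 3];
    }
    // Handle the leftover elements when length is not a multiple of 4.
    for (; j < length; ++j)
        x[j] += y[j];
}
```

An optimizing compiler may apply this transformation on its own, so it is worth measuring both versions in a release build before keeping the manual one.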
These days it's also really uncommon to need assembly language for another reason: the plethora of different CPUs. Do you want to support them all? Each has a specific microarchitecture and some specific instruction sets. They have different numbers of functional units, and assembly instructions should be arranged to keep them all busy. If you write in C you can use profile-guided optimization (PGO), but in assembly you need deep knowledge of that specific architecture (and you have to rethink and redo everything for another architecture). For small tasks the compiler usually does it better, and for complex tasks the work usually doesn't pay off (and the compiler may do better anyway).
If you sit down and take a look at your code, you'll probably see that you gain more from redesigning your algorithm than from translating it to assembly (read this great post here on SO). There are high-level optimizations (and hints to the compiler) that you can apply effectively before you need to resort to assembly language. It's probably worth mentioning that intrinsics will often give you the performance gain you're looking for, while the compiler is still able to perform most of its optimizations.
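As a sketch of the intrinsics route, the array addition from the question could be written with SSE2 intrinsics, processing four 32-bit integers per step. The function name addArraysSimd and the macro EXAMPLE_HAVE_SSE2 are hypothetical names introduced for this example; the intrinsics themselves (_mm_loadu_si128, _mm_add_epi32, _mm_storeu_si128) come from <emmintrin.h>. The code assumes an x86 target with SSE2 and falls back to a plain scalar loop elsewhere.

```cpp
#include <cassert>
#include <cstddef>

// EXAMPLE_HAVE_SSE2 is a hypothetical macro for this sketch only.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define EXAMPLE_HAVE_SSE2 1
#endif

// Adds y into x element by element, four ints at a time where SSE2 exists.
void addArraysSimd(int* x, const int* y, std::size_t length) {
    assert(x != nullptr && y != nullptr);
    std::size_t j = 0;
#ifdef EXAMPLE_HAVE_SSE2
    for (; j + 4 <= length; j += 4) {
        // Unaligned loads, so the arrays need no special alignment.
        __m128i vx = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x + j));
        __m128i vy = _mm_loadu_si128(reinterpret_cast<const __m128i*>(y + j));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(x + j), _mm_add_epi32(vx, vy));
    }
#endif
    // Scalar tail; also the complete loop when SSE2 is not available.
    for (; j < length; ++j)
        x[j] += y[j];
}
```

Because this stays in C++, the compiler still handles register allocation and can inline and schedule the function, which is the advantage over a hand-written assembly block that the paragraph above describes.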
All this said, even when you can produce assembly code that is 5 to 10 times faster, you should ask your customers whether they would rather pay for one week of your time or buy a faster CPU for $50. More often than not (and especially in line-of-business applications), extreme optimization is simply not required of most of us.