亚洲乱亚洲乱妇24p,久久精品7亚洲午夜a,亚洲Av无码国产情品久久

求整數(shù)中比特為1的二進制位數(shù) （轉）

求整數(shù)中比特為1的二進制位數(shù)

分類： C/C++ 2009-11-14 01:05 2306人閱讀評論(3) 收藏舉報

performance 匯編測試 methods iostream express

好幾次在CSDN上看到別人討論如何求出一個整數(shù)的二進制表示中狀態(tài)為1的比特位數(shù)。今天寫了個程序把從網(wǎng)上看來的加上自己想出來的共有5種方法測試了一下，覺得好玩，就寫了這篇博客。

首先簡單介紹一下這五種方法。

第一種：最簡單的，通過移位操作逐位測試并計數(shù)，不妨稱為“逐位測試法”；

第二種：注意到對于“單比特二進制”而言，其狀態(tài)與數(shù)值“相同”。即對于單個比特的“數(shù)”而言，為0即表示數(shù)值0，“全部為1”即表示數(shù)值1（注意多比特數(shù)值顯然沒有這個特點，比如一個字節(jié)有8個比特，當8個比特全為1時并不代表整數(shù)8，而代表255）。利用這一特點，從單個比特開始，以相鄰的一位、兩位、四位、八位和十六位為分組，一組一組地相加并逐步累計，最終得出結果；不妨稱為“分組統(tǒng)計法”；

第三種：注意到一個整數(shù)減1的會把最后一位1變成0，而其后的0全變成1，利用這一特點，把一個數(shù)不斷與它減1后的結果做“按位與”，直到它變成0，那么循環(huán)的次數(shù)便是其狀態(tài)為1的比特位數(shù)。不妨稱之為“循環(huán)減一相與法”；

第四種：考虛到多數(shù)處理器都提供“找到第一個為1的比特位”這種指令，在我的PC上直接調(diào)用X86處理器的BSF或BSR指令，每次直接找到第一個不為0的位并消掉，直到該數(shù)變成0，那么循環(huán)的次數(shù)即為狀態(tài)為1的比特位數(shù)。這種不妨稱為“匯編嵌入法”；

第五種，一個字節(jié)一共有256種狀態(tài)，將每一個取值所對應的0比特位數(shù)的事先寫在程序中（注意這些數(shù)值是有規(guī)律的），也耗不了太多內(nèi)存，然后程序運行的時候，把整數(shù)的四個字節(jié)拆開逐字節(jié)查表并相加，這種可稱為“查表法”。

以下是程序。程序中對應以上五種方法的函數(shù)分別命名為f1到f5。另外還有三個函數(shù)，correctness_test通過幾個簡單而又特殊的取值測試各個函數(shù)的正確性，相當于一個小單元測試；performance_test則分別將這5個函數(shù)作用于一億個隨機整數(shù)同進行性能測試；prepare_test_data則是準備1億個隨機整數(shù)的程序（這個函數(shù)實際并沒有為測試數(shù)據(jù)提供足夠的隨機性，但這一點對性能測試的結果應該沒有太大影響）

[cpp] view plain copy print ?

#include <iostream>
#include <cstdlib>
#include <ctime>
using namespace std;
int f1(unsigned int num);
int f2(unsigned int num);
int f3(unsigned int num);
int f4(unsigned int num);
int f5(unsigned int num);
void correctness_test();
void performance_test();
void prepare_test_data(unsigned int data[], int size);
int main() {
correctness_test();
performance_test();
return 0;
}
int f1(unsigned int num) {
int count = 0;
while(num) {
if(num & 1) ++count;
num >>= 1;
}
return count;
}
int f2(unsigned int num) {
const unsigned int M1 = 0x55555555;
const unsigned int M2 = 0x33333333;
const unsigned int M4 = 0x0f0f0f0f;
const unsigned int M8 = 0x00ff00ff;
const unsigned int M16 = 0x0000ffff;
num = (num & M1) + ((num >> 1) & M1);
num = (num & M2) + ((num >> 2) & M2);
num = (num & M4) + ((num >> 4) & M4);
num = (num & M8) + ((num >> 8) & M8);
num = (num & M16) + ((num >> 16) & M16);
return num;
}
int f3(unsigned int num) {
int count = 0;
while(num) {
num &= (num - 1);
++count;
}
return count;
}
int f4(unsigned int num) {
int count = 0;
while(num) {
int n;
__asm {
bsr eax, num
mov n, eax
}
++count;
num ^= (1 << n);
}
return count;
}
int f5(unsigned int num) {
static const signed char TABLE[256] = {
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
};
unsigned char* p = reinterpret_cast<unsigned char*>(&num);
int count = 0;
while(p != reinterpret_cast<unsigned char*>(&num + 1)) {
count += TABLE[*p++];
}
return count;
}
void correctness_test() {
unsigned int test_data[] = {0, 1, 2, 3, 0x01234567, 0x89abcdef, 0xffffffff};
unsigned int corect_result[] = {0, 1, 1, 2, 12, 20, 32};
int (*fn[])(unsigned int) = {f1, f2, f3, f4, f5};
for(int i = 0; i < sizeof(fn) / sizeof(*fn); ++i) {
for(int j = 0; j < sizeof(test_data) / sizeof(*test_data); ++j) {
if(fn[i](test_data[j]) != corect_result[j]) {
cout << "f" << (i + 1) << " failed!" << endl;
exit(-1);
}
}
}
cout << "All methods passed correctness test." << endl;
}
void performance_test() {
const int TEST_DATA_SIZE = 100000000;
unsigned int* test_data = new unsigned int[TEST_DATA_SIZE];
prepare_test_data(test_data, TEST_DATA_SIZE);
int (*fn[])(unsigned int) = {f1, f2, f3, f4, f5};
for(int i = 0; i < sizeof(fn) / sizeof(*fn); ++i) {
clock_t start = clock();
for(int j = 0; j < TEST_DATA_SIZE; ++j) {
fn[i](test_data[j]);
}
int ticks = clock() - start;
double seconds = ticks * 1.0 / CLOCKS_PER_SEC;
cout << "f" << (i + 1) << " consumed " << seconds << " seconds." << endl;
}
delete[] test_data;
}
void prepare_test_data(unsigned int* data, int len) {
srand(0);
for(int i = 0; i < len; ++i) {
data[i] = static_cast<unsigned int>(rand() * 1.0 / RAND_MAX * 0xffffffff);
}
}

#include <iostream> 
#include <cstdlib>
#include <ctime>
using namespace std;

int f1(unsigned int num);
int f2(unsigned int num);
int f3(unsigned int num);
int f4(unsigned int num);
int f5(unsigned int num);

void correctness_test();
void performance_test();
void prepare_test_data(unsigned int data[], int size);

int main() {
	correctness_test();
	performance_test();
	return 0; 
}

int f1(unsigned int num) {
	int count = 0;
	while(num) {
		if(num & 1) ++count;
		num >>= 1;
	}
	return count;
}

int f2(unsigned int num) {
	const unsigned int M1 = 0x55555555;
	const unsigned int M2 = 0x33333333;
	const unsigned int M4 = 0x0f0f0f0f;
	const unsigned int M8 = 0x00ff00ff;
	const unsigned int M16 = 0x0000ffff;

num = (num & M1) + ((num >> 1) & M1);
	num = (num & M2) + ((num >> 2) & M2);
	num = (num & M4) + ((num >> 4) & M4);
	num = (num & M8) + ((num >> 8) & M8);
	num = (num & M16) + ((num >> 16) & M16);
	return num;
}

int f3(unsigned int num) {
	int count = 0;
	while(num) {
		num &= (num - 1);
		++count;
	}
	return count;
}

int f4(unsigned int num) {
	int count = 0;
	while(num) {
		int n;
		__asm {
			bsr eax, num
			mov n, eax
		}
		++count;
		num ^= (1 << n);
	}
	return count;
}

int f5(unsigned int num) {
	static const signed char TABLE[256] = { 
		0, 1, 1, 2, 1, 2, 2, 3,	1, 2, 2, 3, 2, 3, 3, 4,
		1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
		1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
		2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
		1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
		2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
		2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
		3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
		1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
		2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
		2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
		3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
		2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
		3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
		3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
		4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
	};

unsigned char* p = reinterpret_cast<unsigned char*>(&num);
	int count = 0;
	while(p != reinterpret_cast<unsigned char*>(&num + 1)) {
		count += TABLE[*p++];		
	}
	return count;
}

void correctness_test() {
	unsigned int test_data[] = {0, 1, 2, 3, 0x01234567, 0x89abcdef, 0xffffffff};
	unsigned int corect_result[] = {0, 1, 1, 2, 12, 20, 32};

int (*fn[])(unsigned int) = {f1, f2, f3, f4, f5};
	for(int i = 0; i < sizeof(fn) / sizeof(*fn); ++i) {
		for(int j = 0; j < sizeof(test_data) / sizeof(*test_data); ++j) {
			if(fn[i](test_data[j]) != corect_result[j]) {
				cout << "f" << (i + 1) << " failed!" << endl;
				exit(-1);
			}
		}
	}
	cout << "All methods passed correctness test." << endl;
}

void performance_test() {
	const int TEST_DATA_SIZE = 100000000;
	unsigned int* test_data = new unsigned int[TEST_DATA_SIZE];
	prepare_test_data(test_data, TEST_DATA_SIZE);

int (*fn[])(unsigned int) = {f1, f2, f3, f4, f5};
	for(int i = 0; i < sizeof(fn) / sizeof(*fn); ++i) {
		clock_t start = clock();
		for(int j = 0; j < TEST_DATA_SIZE; ++j) {
			fn[i](test_data[j]);
		}
		int ticks = clock() - start;
		double seconds = ticks * 1.0 / CLOCKS_PER_SEC;

cout << "f" << (i + 1) << " consumed " << seconds << " seconds." << endl;
	}
	delete[] test_data;
}

void prepare_test_data(unsigned int* data, int len) {
	srand(0);
	for(int i = 0; i < len; ++i) {
		data[i] = static_cast<unsigned int>(rand() * 1.0 / RAND_MAX * 0xffffffff);
	}
}

在我的機器上（AMD Phenom 8560處理器，Windows XP SP2），使用Visual C++ 2008 Express Edition編譯并運行（Release版），某一次得到的輸出如下：

All methods passed correctness test.

f1 consumed 14.156 seconds.

f2 consumed 1.032 seconds.

f3 consumed 4.656 seconds.

f4 consumed 13.687 seconds.

f5 consumed 1.422 seconds.

從結果來看，最慢的是第一種“逐位測試法”，最快的是第二種“分組統(tǒng)計法”。兩者相差近14倍！

第三種“循環(huán)減一相與法”表現(xiàn)也很不錯，雖然跟最快的相比遜色很多，但比最慢的強多了；

第四種“匯編嵌入法”，表面上看，其復雜度是跟數(shù)值中1的位數(shù)相關，這一點與方法三一樣。而不像方法一那樣復雜度跟整個數(shù)的位數(shù)有關。但其表現(xiàn)并不令人滿意，結果幾乎跟方法一一樣差，而無法跟方法三相比。查了一下指令手冊，發(fā)現(xiàn)BSR指令并不是一條固定周期的指令，作用于32位整數(shù)時，快的時候它只需幾個CPU時鐘周期，慢的時候需要40幾個時鐘周期，我想它極有可能是在CPU內(nèi)部通過類似于“逐位查找”的微命令實現(xiàn)的。

第五種“查表法”的表現(xiàn)倒讓人相當滿意，性能逼近方法二。注意我只用了基于字節(jié)編碼的表，如果實際使用中需要這種運算，我們完全可以構造一個基于unsigned short編碼的表，這樣一個表將占用64K內(nèi)存，在現(xiàn)代PC上仍屬小菜一碟，而每個32位數(shù)只需要把前后兩半查表的結果一加即可。那樣一來，其性能會不會比方法二還要強呢？有興趣的朋友可以試試。：P

最后，或許有朋友會問：第四種方法中既然采用嵌入?yún)R編，為何不把整個函數(shù)都用匯編來寫呢？那樣或許效率還會好一些。但那對其它幾種方法來說是不公平的，因為所有的方法都可以改用匯編來寫。所以，在這個函數(shù)中我只是把依賴于特定處理器（X86）、且無法使用C++語言及其標準庫直接實現(xiàn)的部分用匯編實現(xiàn)，其它的計算仍然用C++語言寫出。剩下的事情，跟其它幾種方法的實現(xiàn)一樣——讓編譯器看著辦吧，它愛怎么優(yōu)化就怎么優(yōu)化。

posted on 2013-11-11 18:57 ** 閱讀(230) 評論(0) 編輯收藏

新用戶注冊刷新評論列表


只有注冊用戶登錄后才能發(fā)表評論。




網(wǎng)站導航: 博客園 IT新聞 Chat2DB C++博客博問管理

Hopes

求整數(shù)中比特為1的二進制位數(shù) （轉）

求整數(shù)中比特為1的二進制位數(shù)

導航

統(tǒng)計

公告

常用鏈接

留言簿(2)

隨筆檔案

文章分類

文章檔案

新聞檔案

相冊

收藏夾

C#學習

友情鏈接

搜索

最新評論

閱讀排行榜

評論排行榜