I wonder whether a hand-crafted assembly routine, with a loop operating on just two registers, wouldn't be faster than any O(1) solution based on a precomputed table kept in RAM...
This will probably be considered a cheap trick, but the popcnt machine instruction, introduced with the SSE4 extensions, does the job (in constant time, obviously).
http://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT
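
To make the comparison concrete, here is a rough C sketch of the three approaches mentioned in these comments: a register-only loop, a table lookup, and the hardware instruction. The function names and the 16-entry nibble table are my own choices for illustration, not anything from the original post, and `__builtin_popcount` is a GCC/Clang builtin, so this assumes one of those compilers. No timings are claimed; which one wins would have to be measured on the target machine.

```c
#include <stdint.h>

/* Loop version (illustrative name): clears the lowest set bit on each
   iteration, so it runs once per set bit and needs only the value and
   a counter, i.e. two registers. */
unsigned popcount_loop(uint32_t x) {
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* drop the lowest set bit */
        count++;
    }
    return count;
}

/* Table version (illustrative): a small precomputed table holding the
   bit count of every 4-bit value, consulted 8 times per 32-bit word. */
static const unsigned char NIBBLE_BITS[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

unsigned popcount_table(uint32_t x) {
    unsigned count = 0;
    for (int i = 0; i < 8; i++) {
        count += NIBBLE_BITS[x & 0xFu];
        x >>= 4;
    }
    return count;
}

/* Hardware version: GCC and Clang can compile this builtin to a single
   popcnt instruction when the target supports it (for example with
   -mpopcnt); otherwise they fall back to a software routine. */
unsigned popcount_hw(uint32_t x) {
    return (unsigned)__builtin_popcount(x);
}
```

All three should agree on every input; for example, calling any of them with `0xF0F0F0F0u` gives 16.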