-->

Fastest way to sort a list of numbers and their indexes

Posted 2020-06-14 00:51

Question:

I have a question that could seem very basic, but it is in a context where "every CPU tick counts" (this is a part of a larger algorithm that will be used on supercomputers).

The problem is quite simple: what is the fastest way to sort a list of unsigned long long int numbers together with their original indexes? (Initially, the numbers are in a completely random order.)

Example :
Before
Numbers: 32 91 11 72
Indexes: 0 1 2 3
After
Numbers: 11 32 72 91
Indexes: 2 0 3 1 

By "fastest way", I mean: which algorithm to use (std::sort, C qsort, or another sorting algorithm available on the web)? Which container to use (C array, std::vector, std::map...)? How to sort the indexes at the same time (structures, std::pair, std::map...)?

How many elements to sort? Typically 4 GB of numbers.

Answer 1:

The obvious starting point would be a structure, plus a comparison function object that orders it by number:

struct data { 
    unsigned long long int number;
    size_t index;
};

struct by_number { 
    // const so the comparator can also be called on a const object
    bool operator()(data const &left, data const &right) const { 
        return left.number < right.number;
    }
};

...and an std::vector to hold the data:

 std::vector<data> items;

and std::sort to do the sorting:

 std::sort(items.begin(), items.end(), by_number());

The simple fact is that the standard containers and algorithms are efficient enough that using them doesn't make your code substantially slower. You might do better by writing some part differently, but you could just as easily do worse. Start from solid and readable code, and measure; don't (attempt to) optimize prematurely.

Edit: of course in C++11, you can use a lambda expression instead:

std::sort(items.begin(), items.end(), 
          [](data const &a, data const &b) { return a.number < b.number; });

This is generally a little more convenient to write. Readability depends: for something simple like this, I'd say sort ... by_number is pretty readable, but that depends (heavily) on the name you give to the comparison functor. The lambda makes the actual sorting criterion easier to find, so you don't need to choose a name carefully for the code to be readable.
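For reference, the pieces above can be put together roughly as follows. This is only a sketch: `Item` mirrors the answer's `data` struct, and `sort_with_indexes` is a hypothetical helper name, not something from the answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Mirrors the answer's struct: the value plus its original position.
struct Item {
    unsigned long long number;
    std::size_t index;
};

// Hypothetical helper: pairs each number with its position,
// then sorts the records by number using the lambda form.
std::vector<Item> sort_with_indexes(const std::vector<unsigned long long>& numbers) {
    std::vector<Item> items;
    items.reserve(numbers.size());  // one allocation up front
    for (std::size_t i = 0; i != numbers.size(); ++i) {
        items.push_back(Item{numbers[i], i});
    }
    std::sort(items.begin(), items.end(),
              [](Item const& a, Item const& b) { return a.number < b.number; });
    return items;
}
```

Called on the question's example (32 91 11 72), this should yield the numbers 11 32 72 91 with indexes 2 0 3 1.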



Answer 2:

std::pair and std::sort fit your requirements ideally: if you put the value into the pair.first and the index in pair.second, you can simply call a sort on a vector of pairs, like this:

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>
using namespace std;

int main() {
    // This is your original data. It does not need to be in a vector
    vector<long> orig;
    orig.push_back(10);
    orig.push_back(3);
    orig.push_back(6);
    orig.push_back(11);
    orig.push_back(2);
    orig.push_back(19);
    orig.push_back(7);
    // This is a vector of {value,index} pairs
    vector<pair<long,size_t> > vp;
    vp.reserve(orig.size());
    for (size_t i = 0 ; i != orig.size() ; i++) {
        vp.push_back(make_pair(orig[i], i));
    }
    // Sorting will put lower values ahead of larger ones,
    // resolving ties using the original index
    sort(vp.begin(), vp.end());
    for (size_t i = 0 ; i != vp.size() ; i++) {
        cout << vp[i].first << " " << vp[i].second << endl;
    }
    return 0;
}


Answer 3:

std::sort has proven to be faster than the old qsort because of the lack of indirection and the possibility of inlining critical operations.
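To make the difference concrete, here is a rough side-by-side sketch. The `Item` struct and the function names are illustrative assumptions, not part of the answer. With qsort the comparison goes through a function pointer on untyped memory, while std::sort receives a typed comparator that the compiler is free to inline.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct Item {
    unsigned long long number;
    std::size_t index;
};

// qsort comparator: invoked through a function pointer for every comparison.
// Compare explicitly; a difference of 64-bit unsigned values does not fit an int.
int compare_items(const void* lhs, const void* rhs) {
    const Item* a = static_cast<const Item*>(lhs);
    const Item* b = static_cast<const Item*>(rhs);
    if (a->number < b->number) return -1;
    if (a->number > b->number) return 1;
    return 0;
}

void sort_with_qsort(std::vector<Item>& items) {
    if (items.empty()) return;
    std::qsort(items.data(), items.size(), sizeof(Item), compare_items);
}

// std::sort sees the lambda's type at compile time and can inline the comparison.
void sort_with_std_sort(std::vector<Item>& items) {
    std::sort(items.begin(), items.end(),
              [](const Item& a, const Item& b) { return a.number < b.number; });
}
```

Both functions produce the same ordering when the numbers are distinct; which one is faster on a given machine is something to measure rather than assume.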

The implementations of std::sort are likely to be highly optimized and hard to beat, but not impossible. If your data is fixed length and short you might find Radix sort to be faster. Timsort is relatively new and has delivered good results for Python.
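As an illustration of the radix sort option, a minimal least-significant-digit sketch for 64-bit keys might look like this. It assumes unsigned long long is 64 bits wide (hence std::uint64_t), processes one byte per pass, and needs a second buffer as large as the input; `Item` and `radix_sort` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Item {
    std::uint64_t number;
    std::size_t index;
};

// Least-significant-digit radix sort on the 64-bit key, one byte per pass.
// Each pass is a stable counting sort, so 8 passes order the full key.
void radix_sort(std::vector<Item>& items) {
    std::vector<Item> buffer(items.size());
    for (unsigned shift = 0; shift < 64; shift += 8) {
        std::array<std::size_t, 257> offsets{};  // zero-initialized
        // Histogram of the current byte, stored one slot to the right.
        for (const Item& it : items) {
            ++offsets[((it.number >> shift) & 0xFF) + 1];
        }
        // Prefix sums turn the counts into starting positions per byte value.
        for (std::size_t b = 1; b < offsets.size(); ++b) {
            offsets[b] += offsets[b - 1];
        }
        // Stable scatter into the buffer, then make the buffer the new input.
        for (const Item& it : items) {
            buffer[offsets[(it.number >> shift) & 0xFF]++] = it;
        }
        items.swap(buffer);
    }
}
```

Because every pass is stable, items with equal numbers keep their original relative order. Whether this beats std::sort depends on the data size and the memory system, so it is a candidate to benchmark, not a guaranteed win.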

You might keep the index array separate from the value array, but I think the extra level of indirection will prove to be a speed killer. Better to keep them together in a struct or std::pair.

As always with any speed critical application, you must try some actual implementations and compare them to know for sure which is fastest.
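A comparison of that kind could be set up along these lines. Everything here is a sketch under assumptions: the helper names are made up, the input is pseudo-random data from a fixed seed, and a single timed run is far less reliable than repeated runs on realistic data sizes.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

using Pair = std::pair<unsigned long long, std::size_t>;

// Hypothetical helper: n pseudo-random 64-bit values from a fixed seed.
std::vector<unsigned long long> make_random_numbers(std::size_t n, unsigned seed) {
    std::mt19937_64 rng(seed);
    std::vector<unsigned long long> v(n);
    for (auto& x : v) x = rng();
    return v;
}

// Runs f once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Candidate 1: sort {value, index} pairs kept together.
std::vector<Pair> sort_pairs(const std::vector<unsigned long long>& numbers) {
    std::vector<Pair> pairs;
    pairs.reserve(numbers.size());
    for (std::size_t i = 0; i != numbers.size(); ++i) pairs.emplace_back(numbers[i], i);
    std::sort(pairs.begin(), pairs.end());
    return pairs;
}

// Candidate 2: sort the indexes only, comparing through the numbers array.
std::vector<std::size_t> sort_indexes(const std::vector<unsigned long long>& numbers) {
    std::vector<std::size_t> idx(numbers.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::sort(idx.begin(), idx.end(),
              [&numbers](std::size_t a, std::size_t b) { return numbers[a] < numbers[b]; });
    return idx;
}
```

A driver would generate one input with `make_random_numbers`, wrap each candidate in a lambda passed to `time_ms`, and compare the timings across several sizes and repetitions.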



Answer 4:

It might be worth separating numbers and indexes and then just sorting indexes, like this:

#include <vector>
#include <algorithm>
#include <iostream>
#include <cstdlib> // EXIT_SUCCESS

void PrintElements(const std::vector<unsigned long long>& numbers, const std::vector<size_t>& indexes) {

    std::cout << "\tNumbers:";
    for (auto i = indexes.begin(); i != indexes.end(); ++i)
        std::cout << '\t' << numbers[*i];
    std::cout << std::endl;

    std::cout << "\tIndexes:";
    for (auto i = indexes.begin(); i != indexes.end(); ++i)
        std::cout << '\t' << *i;
    std::cout << std::endl;

}

int main() {

    std::vector<unsigned long long> numbers;
    std::vector<size_t> indexes;

    numbers.reserve(4); // An overkill for this few elements, but important for billions.
    numbers.push_back(32);
    numbers.push_back(91);
    numbers.push_back(11);
    numbers.push_back(72);

    indexes.reserve(numbers.capacity());
    indexes.push_back(0);
    indexes.push_back(1);
    indexes.push_back(2);
    indexes.push_back(3);

    std::cout << "BEFORE:" << std::endl;
    PrintElements(numbers, indexes);

    std::sort(
        indexes.begin(),
        indexes.end(),
        [&numbers](size_t i1, size_t i2) {
            return numbers[i1] < numbers[i2];
        }
    );

    std::cout << "AFTER:" << std::endl;
    PrintElements(numbers, indexes);

    return EXIT_SUCCESS;

}

This prints:

BEFORE:
        Numbers:        32      91      11      72
        Indexes:        0       1       2       3
AFTER:
        Numbers:        11      32      72      91
        Indexes:        2       0       3       1

The idea is that the elements being sorted are small and thus fast to move around during the sort. On modern CPUs however, the effects of indirect access to numbers on caching could spoil these gains, so I recommend benchmarking on realistic amounts of data before making a final decision to use it.
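If the numbers themselves are also needed in sorted order afterwards, one extra pass over the sorted indexes can produce them. The sketch below uses hypothetical helper names and std::iota to fill the index vector, which scales better than pushing indexes one by one.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Builds 0, 1, ..., n-1 and sorts it by the number each index refers to.
std::vector<std::size_t> sorted_indexes(const std::vector<unsigned long long>& numbers) {
    std::vector<std::size_t> indexes(numbers.size());
    std::iota(indexes.begin(), indexes.end(), std::size_t{0});
    std::sort(indexes.begin(), indexes.end(),
              [&numbers](std::size_t i1, std::size_t i2) {
                  return numbers[i1] < numbers[i2];
              });
    return indexes;
}

// Copies the numbers out in the order given by the indexes.
// Assumes every index is a valid position in numbers.
std::vector<unsigned long long> gather(const std::vector<unsigned long long>& numbers,
                                       const std::vector<std::size_t>& indexes) {
    std::vector<unsigned long long> result;
    result.reserve(indexes.size());
    for (std::size_t i : indexes) {
        result.push_back(numbers[i]);
    }
    return result;
}
```

Note that the gather step reads the numbers in a scattered order, so it is subject to the same caching concerns as the indirect comparisons.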



Answer 5:

#include <algorithm>
#include <cstddef>
#include <vector>

struct SomeValue
{
    unsigned long long val;
    std::size_t index;
    // Order by value only; the index just rides along.
    bool operator<(const SomeValue& rhs) const
    {
        return val < rhs.val;
    }
};

std::vector<SomeValue> somevec;
//fill it...
std::sort(somevec.begin(), somevec.end());


Answer 6:

Use std::vector and std::sort. That should provide the fastest sort method. To find the original index, create a struct.

struct A {
    unsigned long long num; // wide enough for the values in the question
    size_t index;
};

Then write your own comparison predicate for sort that compares the num member of the struct.

struct Predicate {
    // Take the arguments by const reference to avoid copying each element.
    bool operator()(const A& first, const A& second) const {
        return first.num < second.num;
    }
};

std::sort(vec.begin(), vec.end(), Predicate());



Answer 7:

This will be used on supercomputers?

In that case you may want to look into parallel sorting algorithms. They only pay off for large data sets, but for those the gain can be substantial.



Answer 8:

You might find this to be an interesting read. I would start with the standard library's std::sort and only then try to improve on it if I could. I'm not sure whether you have access to a C++11 compiler (such as GCC 4.7) on this supercomputer, but I would suggest that std::sort combined with std::future and std::thread would get you quite a bit of the way toward parallelizing the problem in a maintainable way.
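One simple way to combine std::sort with std::future is sketched below: sort the two halves concurrently with std::async, then merge them. This is an illustrative assumption about what such code could look like, not the approach from the linked material; `Item`, `by_number`, and `parallel_sort_two_way` are made-up names, and a real implementation would split into more than two parts.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

struct Item {
    unsigned long long number;
    std::size_t index;
};

inline bool by_number(const Item& a, const Item& b) { return a.number < b.number; }

// Sorts the two halves concurrently, then merges them into one sorted range.
void parallel_sort_two_way(std::vector<Item>& items) {
    auto middle = items.begin() + static_cast<std::ptrdiff_t>(items.size() / 2);
    // Sort the first half on another thread...
    std::future<void> first_half = std::async(std::launch::async, [&items, middle] {
        std::sort(items.begin(), middle, by_number);
    });
    // ...while this thread sorts the second half. The ranges do not overlap.
    std::sort(middle, items.end(), by_number);
    first_half.get();  // wait for the other half (rethrows any exception)
    std::inplace_merge(items.begin(), middle, items.end(), by_number);
}
```

Depending on the toolchain, threading support may need an extra flag at build time (for example `-pthread` with GCC on some systems). The final merge is sequential, so the speedup from two threads stays below 2x.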

Here is another question that compares std::sort with qsort.

Finally, there is this article in Dr. Dobb's that compares the performance of parallel algorithms.