Looking up a C++ Hash Table with a pre-known hash

Posted by Jannik Glückert on Wed 22 May 2024

A recap of associative containers in C++

Associative containers allow for fast insertion, deletion, and most importantly search of elements.
The C++ standard library has provided associative containers since the beginning, but as with all things in C++, life wasn't always great.

I don't intend to go over all the details, so here's a quick overview:

std::unordered_set and std::unordered_map are by far the most important and most used of the bunch, and the rest of this article is exclusively about them.

Heterogeneous lookup

The initial function signatures for unordered container lookup were pretty simple, take for example find:

template<typename T>
std::unordered_set<T>::iterator std::unordered_set<T>::find(const T &key);

However, simple isn't always best - this requires always constructing an instance of T. The most common culprit is

std::unordered_set<std::string> my_set;
my_set.erase("foo");

This constructs a temporary std::string from the string literal, which is of course unnecessary as std::string can compare to literals directly.

The common solution is to use heterogeneous lookup as described in P0919R3:

struct StringHash {
    using is_transparent = void; // Enables heterogeneous operations.

    std::size_t operator()(std::string_view sv) const {
        return std::hash<std::string_view>{}(sv);
    }
};

void example() {
    std::unordered_set<std::string, StringHash, std::equal_to<>> my_set;

    // Converts "foo" to string_view and uses it for hash and compare.
    my_set.erase("foo");
}

Usage of non-heterogeneous associative string containers is also diagnosed by SonarLint.

However, there is another neat trick you can do, one that I haven't seen mentioned but can be quite helpful in some situations:
The transparent hash (and equality predicate) operators do not have to consume a type that is convertible to the key type.

Passing a hash to the heterogeneous lookup

And with that we get to today's big revelation: With heterogeneous lookup, you can pass a known hash directly!
This is helpful when you need to find an object in one of multiple sets. Where before each set would recompute the hash for itself, which can be quite expensive relative to the lookup, now you can just precompute the hash once and pass it to the lookup functions!

using key_hash_pair = std::tuple<std::string_view, std::size_t>;

struct Hash {
    using is_transparent = void;

    std::size_t operator()(std::string_view sv) const {
        return std::hash<std::string_view>{}(sv);
    }
    std::size_t operator()(key_hash_pair pair) const {
        return std::get<1>(pair);
    }
};

struct KeyEqual {
    using is_transparent = void;

    bool operator()(std::string_view lhs, std::string_view rhs) const {
        return lhs == rhs;
    }
    bool operator()(key_hash_pair lhs, std::string_view rhs) const {
        return std::get<0>(lhs) == rhs;
    }
};

using Set = std::unordered_set<std::string, Hash, KeyEqual>;

int main() {
    Set set{"foo"};

    const std::string string{"foo"};
    const std::size_t hash{std::hash<std::string>{}(string)};
    const key_hash_pair pair{string, hash};

    assert(set.contains(pair));
}

There's one issue with this - nothing prevents us from passing a hash from a different hash function, or reusing the pair with a set of different hash type, or just passing some bogus integer!
A better approach would be to use a rich type for key_hash_pair that takes care of this, such as

template<typename Hash>
class KeyHashPair {
private:
    std::string_view key_;
    std::size_t hash_;
public:
    KeyHashPair() = delete;
    KeyHashPair(std::string_view sv) : key_(sv), hash_(Hash{}(key_)) {}

    std::string_view key() const { return key_; }
    std::size_t hash() const { return hash_; }
};

struct Hash {
    using is_transparent = void;

    std::size_t operator()(std::string_view sv) const {
        return std::hash<std::string_view>{}(sv);
    }
    std::size_t operator()(KeyHashPair<Hash> pair) const {
        return pair.hash();
    }
};

struct KeyEqual {
    using is_transparent = void;

    bool operator()(std::string_view lhs, std::string_view rhs) const {
        return lhs == rhs;
    }
    template<typename Hash>
    bool operator()(KeyHashPair<Hash> lhs, std::string_view rhs) const {
        return lhs.key() == rhs;
    }
};

using Set = std::unordered_set<std::string, Hash, KeyEqual>;

int main() {
    Set set{"foo"};

    const std::string string{"foo"};
    const KeyHashPair<Set::hasher> pair{string};

    assert(set.contains(pair));
}

You can find a godbolt example here.

Closing remarks

Of course, this code is missing various decorators such as noexcept, [[nodiscard]], passing by const T&, and the (highly recommended) [[clang::lifetimebound]].
I shall leave that as an excercise for the reader.

The example makes use of std::string_view, but of course the transparent hash lookup can also be used for types which themselves can not be meaningfully used in transparent comparisons.

Heterogeneous lookup is also supported by the (blazingly fast) unordered containers from Boost.Unordered.

Tangentially related, if you find yourself storing multiple copies of the same object in various sets, perhaps consider using the Flyweight pattern to reduce memory overhead, for example via the lovely Boost.Flyweight.

tags: C++