开发者

non STL hash table type structure

开发者 https://www.devze.com 2023-03-04 12:18 出处:网络
Is there a way to write simple hashtable with the key as \"strings\" and value as the frequency, so that there are NO collisons?There will no be removal from the hashtable, and if the object already e

Is there a way to write simple hashtable with the key as "strings" and value as the frequency, so that there are NO collisons? There will no be removal from the hashtable, and if the object already exists in the hashtable, t开发者_开发技巧hen just update its frequency(add them together).

I was thinking there might be a algorithm that can compute a unique number from the string which will be used as the index.

Yes, i am avoiding the use of all STL construct including unordered_map.


You can use any perfect hash generator like gperf

See here for a list: http://en.wikipedia.org/wiki/Perfect_hash_function

PS. You'd still possibly want to use a map instead of flat array/vector in case the mapped domain gets too big/sparse


It really depends on what you mean by 'simple'.

The std::map is a fairly simple class. Still, it uses a red-black tree with all of the insertion, deletion, and balancing nicely hidden away, and it is templated to handle any orderable type as a key and any type as the value. Most map classes use a similar implementation, and avoid any sort of hashing functionality.

Hashing without collisions is not a trivial matter whatsoever. Perhaps the simplest method is Pearson Hashing.

It seems like you have 3 choices:

  1. Implement your own perfect hashing class. This would be a pretty good sized class with a lot of functionality and some decently complex algorithms. I don't think this is simple.

  2. Download and use a perfect hashing library that is already out there. Of course, you have to worry about deployability.

  3. Use STL's map class. It's embedded, well-documented, easy to use, type-flexible, and completely cross-platform. This seems like the 'simplest' solution.

If I may ask, Why are you avoiding STL?


If the set of possible strings is known beforehand, you can use a perfect hash function generator to do this. But otherwise, what you ask is impossible.

Now, it IS possible to make the likelihood of collisions extremely low by using a good hash function and making sure your table is huge. You basically need a big enough table to make the likelihood of invoking the Birthday Paradox low enough to suit you. Then you just use n bits of output from SHA-1, and 2^n will be your table size.

I'm also wondering if maybe you could use a Bloom filter and have an actual counter instead of bits. Keep a list of all the words you've stuffed into the bloom filter and what entries they've incremented (which will be the same each time) and you have yourself a gigantic linear function that you might be able to solve to get all the individual counts back out again.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号