How do we choose a hashing function ?
Desired property: expected number of keys/bucket is the same for all buckets (uniform distribution)
- and is relatively small
- ideally all buckets consist of only one page
Example of a bad hash function for a string attribute
- first three letters of the string value
A potentially good hash function
- (char1 + char2 + … + charN) modulo b, where b is the number of buckets
Read Knuth Vol. 3 for more on good hashing functions