Hash Map (Hash Table)
2015-11-10 01:37
Reference: Wikipedia, Princeton Algorithms
What Is a Hash Table
A hash table (hash map) is a data structure used to implement an associative array, a structure that maps keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.
Hash function: transforms the search key into an array index.
What Is a Hash Collision
A hash collision occurs when the hash function assigns two different keys to the same bucket.
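For a concrete illustration, the sketch below relies on a well-known pair of Java strings, "Aa" and "BB", whose String.hashCode() values are both 2112, so they collide for every table size. The class name CollisionDemo and its bucket() helper are made up for this example.

```java
// Two different String keys that String.hashCode() maps to the same value,
// so they land in the same bucket whatever the table size M is.
class CollisionDemo {
    // Bucket index derived from hashCode(), with the sign bit cleared.
    static int bucket(String key, int m) {
        return (key.hashCode() & 0x7fffffff) % m;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode()); // 2112 ('A' * 31 + 'a' = 65 * 31 + 97)
        System.out.println("BB".hashCode()); // 2112 ('B' * 31 + 'B' = 66 * 31 + 66)
        System.out.println(bucket("Aa", 97) == bucket("BB", 97)); // true: a collision
    }
}
```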
Perfect Hashing
A perfect hash function for a set S is a hash function that maps distinct elements in S to a set of integers, with no collisions.
Perfect hashing allows for constant time lookups in the worst case. This is in contrast to most chaining and open addressing methods, where the time for lookup is low on average, but may be very large, O(n), for some sets of keys.
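As a toy illustration of perfect hashing for a small, fixed set S of distinct non-negative integers, the sketch below simply searches for the smallest table size M (starting at |S|) for which k % M produces no collisions. This brute-force search is an assumption made for illustration, not a standard construction; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Toy perfect hashing: find the smallest M >= n such that k % M is collision-free
// on a fixed set of DISTINCT non-negative keys (duplicates would loop forever).
// The loop always terminates for distinct keys: once M exceeds the largest key, k % M == k.
class PerfectModulus {
    static int findPerfectM(int[] keys) {
        for (int m = Math.max(1, keys.length); ; m++) {
            Set<Integer> used = new HashSet<>();
            boolean collisionFree = true;
            for (int k : keys) {
                if (!used.add(k % m)) { // add() returns false if the slot is already taken
                    collisionFree = false;
                    break;
                }
            }
            if (collisionFree) return m;
        }
    }

    public static void main(String[] args) {
        int[] keys = {10, 22, 37, 41, 55};
        int m = findPerfectM(keys);
        System.out.println("M = " + m); // M = 8
        for (int k : keys) System.out.println(k + " -> " + (k % m));
    }
}
```

For the keys {10, 22, 37, 41, 55} the search stops at M = 8, giving the slots 2, 6, 5, 1 and 7. Once M is fixed, every lookup is one remainder and one comparison, which is the worst-case constant time described above; the catch is that the set must be known in advance.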
Load Factor
A critical statistic for a hash table.
Load factor = n / k
where:
n = number of entries
k = number of buckets
High load factor: the hash table becomes slower, and it may even fail to work (depending on the collision-resolution method used).
Low load factor: a large proportion of the buckets is unused, which wastes memory.
Also, one should examine the variance of the number of entries per bucket. For example, suppose two tables both have 1000 entries and 1000 buckets; one has exactly one entry in each bucket, while the other has all entries in the same bucket. Both have a load factor of 1, but clearly the hashing is not working in the second one.
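Both statistics can be computed directly from the number of entries in each bucket. The sketch below is a hypothetical helper (LoadStats is not a library class) that reproduces the two 1000-bucket tables from the example: both have a load factor of 1.0, but their variances are 0.0 and 999.0.

```java
import java.util.Arrays;

// Load factor n / k and the (population) variance of entries per bucket,
// computed from the size of each bucket.
class LoadStats {
    static double loadFactor(int[] bucketSizes) {
        long n = 0;
        for (int s : bucketSizes) n += s;           // total number of entries
        return (double) n / bucketSizes.length;     // n entries / k buckets
    }

    static double variance(int[] bucketSizes) {
        double mean = loadFactor(bucketSizes);      // mean entries per bucket = load factor
        double sumSq = 0;
        for (int s : bucketSizes) sumSq += (s - mean) * (s - mean);
        return sumSq / bucketSizes.length;
    }

    public static void main(String[] args) {
        int[] even = new int[1000];
        Arrays.fill(even, 1);                       // one entry in every bucket
        int[] skewed = new int[1000];
        skewed[0] = 1000;                           // all entries in one bucket
        System.out.println(loadFactor(even) + " " + variance(even));     // 1.0 0.0
        System.out.println(loadFactor(skewed) + " " + variance(skewed)); // 1.0 999.0
    }
}
```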
Hash Functions
There are three primary requirements for implementing a good hash function for a given data type:
It should be deterministic: equal keys must produce the same hash value.
It should be efficient to compute.
It should uniformly distribute the keys.
Suppose we have an array that can hold M key-value pairs. We then need a function that can transform any given key into an index into that array, that is, an integer between 0 and M-1.
Positive Integers
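Modular hashing
Choose the array size M to be prime and, for any positive integer key k, compute the remainder when dividing k by M (k % M). If M is not prime, some bits of the key may not contribute to the index, and the keys are spread less evenly.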
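The effect of choosing a prime M is easy to see with keys that share a common factor. The sketch below assumes keys 0, 10, 20, ..., 9990 (all multiples of 10, chosen only for illustration) and counts how many of the M buckets modular hashing actually reaches; ModularHashing is a hypothetical class name.

```java
import java.util.HashSet;
import java.util.Set;

// Counts how many distinct buckets k % M reaches for the keys 0, 10, 20, ..., 9990.
class ModularHashing {
    static int hash(int k, int m) {
        return k % m; // assumes k is a non-negative integer key
    }

    static int bucketsUsed(int m) {
        Set<Integer> used = new HashSet<>();
        for (int k = 0; k < 10000; k += 10) used.add(hash(k, m));
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println(bucketsUsed(100)); // 10
        System.out.println(bucketsUsed(97));  // 97
    }
}
```

With M = 100 only the 10 buckets 0, 10, ..., 90 are ever used, because a multiple of 10 taken modulo 100 is still a multiple of 10; with the prime M = 97, which shares no factor with 10, all 97 buckets are used.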
Floating-Point Numbers
Simple but defective way:
If the keys are real numbers between 0 and 1, we might just multiply by M and round down to get an index between 0 and M-1.
This approach is defective because it gives more weight to the most significant bits of the keys; the least significant bits play no role.
Better way (adopted by Java):
Use modular hashing on the binary representation of the key.
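The sketch below contrasts the two approaches for keys in [0, 1). For the Java-style version it folds the 64-bit representation returned by Double.doubleToLongBits into an int with bits ^ (bits >>> 32), which is how the JDK documents Double.hashCode, and then applies modular hashing. The class and method names are hypothetical, and the scaled version truncates so that the index stays between 0 and M-1.

```java
// Defective "scale by M" hashing versus modular hashing of the key's bit pattern.
class FloatHashing {
    // Defective: only the leading (most significant) part of the key matters.
    // Truncation keeps the index in 0..M-1 for keys in [0, 1).
    static int scaledIndex(double key, int m) {
        return (int) (key * m);
    }

    // Fold the 64-bit IEEE 754 representation into 32 bits (the folding
    // Double.hashCode uses), clear the sign bit, then apply modular hashing.
    static int bitsIndex(double key, int m) {
        long bits = Double.doubleToLongBits(key);
        int h = (int) (bits ^ (bits >>> 32));
        return (h & 0x7fffffff) % m;
    }

    public static void main(String[] args) {
        double a = 0.5, b = 0.5 + 1e-9; // keys that differ only in low-order bits
        System.out.println(scaledIndex(a, 97) == scaledIndex(b, 97)); // true: both 48
        System.out.println(bitsIndex(a, 97) + " " + bitsIndex(b, 97));
    }
}
```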
String
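Simply treat strings as huge integers and apply modular hashing. Pick a small prime R as the multiplier (Java uses 31) and process the string one character at a time (Horner's method), reducing modulo M at each step:
int hash = 0;
for (int i = 0; i < s.length(); i++)
    hash = (R * hash + s.charAt(i)) % M;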
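A runnable version of the string loop, with one assumption labeled in the code: r * m must fit in an int so that the intermediate product cannot overflow. The second method drops the modulus and lets int arithmetic wrap around; with R = 31 this is the recurrence that String.hashCode() is documented to compute, so the two results match. The class and method names are made up for this sketch.

```java
// Horner's method: treat the string as a base-R number, reducing mod M at every step.
class StringHashing {
    // Assumes r * m fits in an int, so r * hash (with hash < m) cannot overflow.
    static int modHash(String s, int r, int m) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++)
            hash = (r * hash + s.charAt(i)) % m;
        return hash;
    }

    // Same recurrence with R = 31 and no modulus, letting int arithmetic overflow;
    // this reproduces String.hashCode().
    static int hash31(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++)
            h = 31 * h + s.charAt(i);
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash31("hello") == "hello".hashCode()); // true
        System.out.println(modHash("Aa", 31, 97));                 // 75
    }
}
```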
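Alternatively, we can use MD5 to scramble (randomize) the input keys:
// md5(key) stands for a helper (not a standard Java method) that returns the MD5 digest of key as a non-negative integer
int hashFunc(String key) {
    return md5(key) % HASH_TABLE_SIZE;
}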
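The md5() call in the MD5 snippet is not a standard Java method. One way to get the same effect with the standard library is java.security.MessageDigest: the sketch below takes the first 4 bytes of the 16-byte digest as an int, clears the sign bit, and reduces it modulo the table size. Using only the first 4 bytes is an arbitrary choice made for this sketch; Md5Hashing and md5Index are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// MD5-based bucket index: digest the key, read the first 4 bytes as an int,
// clear the sign bit, and reduce modulo the table size.
class Md5Hashing {
    static int md5Index(String key, int tableSize) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            int h = ByteBuffer.wrap(digest).getInt(); // first 4 of the 16 bytes, big-endian
            return (h & 0x7fffffff) % tableSize;
        } catch (NoSuchAlgorithmException e) {
            // Java platforms are required to provide MD5, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Index("apple", 97));
        System.out.println(md5Index("apple", 97) == md5Index("apple", 97)); // true: deterministic
    }
}
```

Computing a full MD5 digest is much slower than the arithmetic hashes above, so this approach trades speed for well-scrambled indices.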
The APR (Apache Portable Runtime) hash function uses the magic number 33 as its multiplier; otherwise it works like the first method:
int hashFunc(String key) {
    int sum = 0;
    for (int i = 0; i < key.length(); i++) {
        sum = sum * 33 + (int) key.charAt(i);
        sum = sum % HASH_TABLE_SIZE;
    }
    return sum;
}
Compound Keys
We can handle them the same way as strings, combining the fields one at a time.
For example, if the key is a tuple (element1, element2, element3) of integers:
int hash = (((element1 * R + element2) % M) * R + element3) % M;
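A self-contained sketch of the tuple formula, assuming small non-negative integer fields so that no intermediate value overflows. TripleKey is a hypothetical class; it also overrides equals() and hashCode() (combining the fields the same way with R = 31), which a key class needs before it can be used with Java's hash-based collections.

```java
// Compound key of three integer fields, hashed field by field like the
// characters of a string.
class TripleKey {
    final int element1, element2, element3;

    TripleKey(int element1, int element2, int element3) {
        this.element1 = element1;
        this.element2 = element2;
        this.element3 = element3;
    }

    // Formula from the text. Assumes small non-negative fields and that r * m fits in an int.
    int hash(int r, int m) {
        return (((element1 * r + element2) % m) * r + element3) % m;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TripleKey)) return false;
        TripleKey t = (TripleKey) o;
        return element1 == t.element1 && element2 == t.element2 && element3 == t.element3;
    }

    @Override
    public int hashCode() {
        // Same combination with R = 31 and no modulus; int overflow simply wraps.
        return (element1 * 31 + element2) * 31 + element3;
    }

    public static void main(String[] args) {
        System.out.println(new TripleKey(1, 2, 3).hash(31, 97)); // 56
    }
}
```

For (1, 2, 3) with R = 31 and M = 97, the index is ((1 * 31 + 2) % 97 * 31 + 3) % 97 = 1026 % 97 = 56.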
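Using hashCode() in Java
Every Java object has a hashCode() method that returns a 32-bit int, which may be negative. Masking with 0x7fffffff clears the sign bit, so the remainder is a valid index between 0 and M-1:
private int hash(Key key) {
    return (key.hashCode() & 0x7fffffff) % M;
}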
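The mask is needed because hashCode() can return a negative value, and Java's % operator keeps the sign of the dividend, so a raw remainder can be a negative, invalid index. The sketch below shows this with an Integer key (Integer.hashCode() returns the value itself) and also shows Math.floorMod as an alternative that always returns a value in [0, M) for positive M. HashCodeIndex is a hypothetical class name.

```java
// Turning a possibly negative hashCode() into a valid index in [0, m).
class HashCodeIndex {
    static int hash(Object key, int m) {
        return (key.hashCode() & 0x7fffffff) % m; // clear the sign bit, then reduce
    }

    // Alternative: Math.floorMod returns a result in [0, m) for any int and m > 0.
    static int hashFloorMod(Object key, int m) {
        return Math.floorMod(key.hashCode(), m);
    }

    public static void main(String[] args) {
        Integer key = -7;                          // Integer.hashCode() is the value itself
        System.out.println(key.hashCode() % 97);   // -7: not a valid array index
        System.out.println(hash(key, 97));
        System.out.println(hashFloorMod(key, 97)); // 90
    }
}
```

Using Math.abs(key.hashCode()) instead is a common mistake: Math.abs(Integer.MIN_VALUE) is still negative. The masked and floorMod versions can put the same key in different buckets, which is fine as long as a table consistently uses one of them.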
Collision Resolution
Generally, there are four ways to resolve collisions:
Separate Chaining
Open Addressing
Robin Hood Hashing
2-Choice Hashing
For details, check here.
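To make the first strategy concrete, here is a minimal separate-chaining table: each bucket holds a linked list of the entries that hash to it, and the bucket array is doubled when the load factor n / k would exceed 0.75 (the same default that java.util.HashMap uses, chosen here as an assumption). ChainingMap is a hypothetical class, not a library API, and it does not support null keys.

```java
import java.util.LinkedList;

// Minimal separate-chaining hash table. Each bucket is a linked list of entries
// whose keys hash to it; the table doubles when n / k exceeds MAX_LOAD.
class ChainingMap<K, V> {
    private static final double MAX_LOAD = 0.75; // illustrative threshold

    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private LinkedList<Entry<K, V>>[] buckets;
    private int n; // number of entries

    ChainingMap() { this(16); }

    ChainingMap(int k) { buckets = newBuckets(k); }

    @SuppressWarnings("unchecked")
    private static <K, V> LinkedList<Entry<K, V>>[] newBuckets(int k) {
        LinkedList<Entry<K, V>>[] b = (LinkedList<Entry<K, V>>[]) new LinkedList[k];
        for (int i = 0; i < k; i++) b[i] = new LinkedList<>();
        return b;
    }

    // Non-negative bucket index for a table with k buckets (key must not be null).
    private static int index(Object key, int k) {
        return (key.hashCode() & 0x7fffffff) % k;
    }

    V get(K key) {
        for (Entry<K, V> e : buckets[index(key, buckets.length)])
            if (e.key.equals(key)) return e.value;
        return null; // key not present
    }

    void put(K key, V value) {
        LinkedList<Entry<K, V>> chain = buckets[index(key, buckets.length)];
        for (Entry<K, V> e : chain) {
            if (e.key.equals(key)) { e.value = value; return; } // update existing key
        }
        chain.add(new Entry<>(key, value));
        n++;
        if ((double) n / buckets.length > MAX_LOAD) resize(2 * buckets.length);
    }

    // Rehash every entry into a new bucket array of size newK.
    private void resize(int newK) {
        LinkedList<Entry<K, V>>[] fresh = newBuckets(newK);
        for (LinkedList<Entry<K, V>> chain : buckets)
            for (Entry<K, V> e : chain) fresh[index(e.key, newK)].add(e);
        buckets = fresh;
    }

    int size() { return n; }
    int bucketCount() { return buckets.length; }
}
```

Doubling keeps the average chain length bounded by a constant, so get and put take expected constant time under a reasonable hash function, while a poor hash function can still pile many keys into one chain.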