您的位置:首页 > 其它

[笔记]HashMap

2015-07-31 14:59 260 查看
in binary sort the algorithm complexcity is O(logn).

can we do better ?

is there an algorithm that it has O(1) time complexcity for searching?

yes there is.

that is hashing.

a hash table is a collection of items which can store in such a way as to make it easy to find them later.

each position of the hash table, often called a slot , can hold an item and is named by an integer value starting at 0.

the mapping between an item and the slot where that item belongs in the hash table is called the hash function

the hash function will take any item in the collection and return an integer in the range of slot names, between 0 and n-1.

using the “remainder method” as the hash function for example, using hash(item) = item % 11 as the hash function. then, there has hash(54) = 10, hash(26) = 4, hash(93) = 5…

once the hash value is computed, we can insert each item

into the hash table at the designed position.

note that, if a hash table can carry m items at most

and then the hash table is containing k items now.

there a parameter lambda is introduced, “load factor”,

which equals to k/m.

when we want to search for an item, we only need to compute the hash value of the item and use it as

the slot name to refer to the item. so, the entire time complexity is O(1).

however what will happen if two items have the same hash value?

for instance, hash(44) = 44%11 = 0, and hash(77) = 77%11 = 0 ,44 and 77 hash the same hash value now…

that fact is called collision. it happens when two or more items those have the same hash value.

Creating a hash table, it is necessary to solve the collisions.

how perfect it is if a hash function maps each item to distinct slot! such a hash function is called perfect hash function in general.

unfortunately, there is no systematic way to construct a perfect hash function, given an arbitrary collection of items.

luckily, we do not need the perfect hash function to still get performance efficiency enough.

one way to always have a better hash function is to increase the size of the hash table so that each possible value in the item range can be accommodated. this guarantees that each item will have a unique slot although it is practical for small numbers of items, it is not feasible when the number of possible items is for example, if the items were nine-digit social security numbers, this method would require almost one billion slots. if we want to store data for a class

of 25 students, we will be wasting an enormous amount of memory.

considering the goal is to minimize the collision times in a hash table, there are a number of methods to extend the simple remainder method.

the first one: folding method. divide the item into equal-size pieces.

add the pieces together to give the resulting hash value.

(436-555-4601)-> (43+65+55+46+01) = 210 hash(210) = 210%11 = 1

another method is mid-square method

44*44 = 1936

1936 % 11 = 5

collision resolution

when the collision happened, we must figure out a way to place the second item in the hash table

one method is look for another open slot to store

the second item. to implement this idea, a simple way is to start

at the original hash value position and then move

in a sequential manner through the slots until we

encounter the first empty slot.

when we arrive at the end of the slot, we should

return back to the start of the slot to check out the entire slots.

this resolution process is called open addressing

in that it tries to find the next open slot or

address in the hash table.

rehashing

the general name of the process of looking for another

slot after a collision is rehashing.

newhashvalue = rehash(oldhashvalue)

rehash(pos) = (pos+3)% sizeoftable

a variation of the linear probing idea is called quadratic probing.

instead of using a constant “skip” value, we use a refresh function that increments . the hash value by 1,3,5,7,9 and so on.this means if the first hash value is h,the successive value are h+1, h+4, h+9,

h+16,h+25…

an alternative method for handling the collision problem is to

allow each slot to hold a reference to a collect of items. Chaining allows many items to exit at the same location in the hash table.

when collisions happen, the item is still placed

in the proper slot of the hash table.

as more and more items hash to the same location, the difficulty of searching for the item in the collection increases.

class HashTable(object):
"""
a hash table have 11 slots better to be prime number
"""
def __init__(self):
self.size = 11
self.slots = [None] * self.size
self.data = [None] * self.size

def hash(self, key):
"""
key:int, value:orther
return the hash value of key
"""
return key % self.size

def rehash(self, oldhashvalue):
"""
using to deal with collision
oldhashvalue: the value from hash(key)
return: rehash value
"""
return (oldhashvalue + 1) % self.size

def put(self, key, value):
"""
put the key in slot s[hash value]
put the value in data[hash value]
"""
hsv = self.hash(key)
if self.slots[hsv] is None:
self.slots[hsv] = key
self.data[hsv] = value
else:
if key == self.slots[hsv]:
self.data[hsv] = value # replace

else:
rhsv = self.rehash(hsv)
while (self.slots[rhsv] is not None) and (hsv != rhsv):
rhsv = self.rehash(rhsv)

if self.slots[rhsv] is None:
self.slots[rhsv] = key
self.data[rhsv] = value
else:
self.data[rhsv] = value # replace
self.slots[rhsv] = key

def get(self, key):
"""
key : int
return the correspond value in self.data
"""
hsv = self.hash(key)
if self.slots[hsv] == key:
return self.data[hsv]
else:
rhsv = self.rehash(hsv)
while self.slots[rhsv] != key and rhsv != hsv:
rhsv = self.rehash(rhsv)
if self.slots[rhsv] == key:
return self.data[rhsv]
else:
return None

def __getitem__(self, key):
return self.get(key)

def __setitem__(self, key, value):
self.put(key, value)

if __name__ == "__main__":

H=HashTable()
H[54]='cat'
H[26]='dog'
H[93]='lion'
H[17]='tiger'
H[77]='bird'
H[31]='cow'
H[44]="goat"
H[55]='pig'
H[20]='chicken'
print(H.slots)
print(H.data)
print("add 44=goat")
H[44]="goat"
print(H.slots)
print(H.data)
print("add 88=hot")
H[88]='hot'
print(H.slots)
print(H.data)
print("add 99=bomb")
H[99]='bomb'
print(H.slots)
print(H.data)
print("add 66=fish")
H[66]='fish'
print(H.slots)
print(H.data)


[77, 44, 55, 20, 26, 93, 17, None, None, 31, 54]

[‘bird’, ‘goat’, ‘pig’, ‘chicken’, ‘dog’, ‘lion’, ‘tiger’, None, None, ‘cow’, ‘c

at’]

add 44=goat

[77, 44, 55, 20, 26, 93, 17, 44, None, 31, 54]

[‘bird’, ‘goat’, ‘pig’, ‘chicken’, ‘dog’, ‘lion’, ‘tiger’, ‘goat’, None, ‘cow’,

‘cat’]

add 88=hot

[77, 44, 55, 20, 26, 93, 17, 44, 88, 31, 54]

[‘bird’, ‘goat’, ‘pig’, ‘chicken’, ‘dog’, ‘lion’, ‘tiger’, ‘goat’, ‘hot’, ‘cow’,

‘cat’]

add 99=bomb

[99, 44, 55, 20, 26, 93, 17, 44, 88, 31, 54]

[‘bomb’, ‘goat’, ‘pig’, ‘chicken’, ‘dog’, ‘lion’, ‘tiger’, ‘goat’, ‘hot’, ‘cow’,

‘cat’]

add 66=fish (replace)

[66, 44, 55, 20, 26, 93, 17, 44, 88, 31, 54]

[‘fish’, ‘goat’, ‘pig’, ‘chicken’, ‘dog’, ‘lion’, ‘tiger’, ‘goat’, ‘hot’, ‘cow’,

‘cat’]
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: