走进STL - 哈希表，散装称重么

1、哈希表 - >散装称重表

哈希表（hash table）,英译为散列表。但这不是我称之为“散装称重表”的主要原因。

哈希表可提供对任何有名项的存取和删除操作，由于对象是有名项，所以哈希表也可以像和map有一点远亲关系，都可以被视为一种字典结构。哈希表的用意在于提供常数时间的基本操作。

如果使用一个有序排列的array数组，倒是可以实现的，不过占用的资源将大的不切实际。（意思就是先把空间开好，用不用就是另一回事了，反正先开了放那里）那么如何避免使用一个大的荒谬的array呢？办法之一就是某种映射函数，将某一元素映射为一个大小可接受的索引，这样的函数称为“散列函数”。

不过使用“散列函数“又会遇到另一个问题：可能有不同的元素被映射到相同的位置。这无法避免，因为元素个数大于array容量。解决碰撞问题的方法有很多，包括线性探测、二次探测、开链等做法。每一种实现都不难，如果看完本系列前面的铺垫文章的话。但是导出效率就各有千秋了，我个人是比较喜欢“开链法”的。

2、线性探测与二次探测

这一块我就带过思想。

2.1 负载系数

负载系数 = 元素个数/表格大小。如果是线性探测和二次探测法，负载系数永远在0-1之间，除非使用开链。

2.2 线性探测

当hash function（散列函数）计算出某个元素的插入位置，而该位置已经有“土著”了，线性探测法将循序往下一一寻找（如果到了尾端，就从头再找），直到找到一个可用空间。找不到就不太现实了，只要表格够大。

搜寻的时候也是这样，如果散列函数计算出来的位置上的元素值与我们搜寻的目标不符，就循序往下找，直到找到，或者遇上空格元素。

删除的时候要采用惰性删除，也就只是给它标个删除符号而已，真正的删除要到调整表格的时候。因为哈希表中的每个元素已经不止关系到它自己了，还关系到其他元素的排列。

当然，上面这些操作要花多少时间那就不好说了。

看上面这一些表述就很麻烦了。

2.2 二次探测

二次探测用来解决主集团问题，何为“主集团”问题？就是线性探测插着插着的有些元素就抱团（大块连续空间占用）了，那么这样不论是对循序插入或是循序搜寻都是个巨大的老鼠屎。

二次探测：以F(i) = i^2的方式来解决主集团问题。说白了，就是将普通线性探测的循序查找方式转化为以平方形式加速查找。比方说第一次往后 1个位置，第二次往后第四个位置，第三次往后第九个位置进行查找，如此下去。

虽然看着感觉费事有没用，不过效率是比上面的线性探测要好。

3、开链法

这种做法是在每一个表格元素中维护一个list，只要list够短，速度还是很快的。

使用开链法，负载系数就会大于1,。SGI STL的哈希表便采用这种方式。

看图：

这张图，你品，你细品。里面有vector，也有list。

4、哈希表的节点定义

template <class Value>

struct __hashtable_node
{
	__hashtable_node* next;
	Value val;
}

5、哈希表迭代器

template <class Value,class Key,class HashFcn,class ExtractKey,class EqualKey,class Alloc>
struct __hashtable_iterator
{
	typedef hashtable<Value,Key,HashFcn,ExtractKey,EqualKey,Alloc> hashtable;
	typedef __hashtable_iterator<Value,Key,HashFcn,ExtractKey,EqualKey,Alloc> iterator;
	typedef __hashtable_const_iterator<Value,Key,HashFcn,ExtractKey,EqualKey,Alloc> const_iterator;
	typedef __hashtable_node<Value> node;

	typedef forward_iterator_tag iterator_category;
	typedef Value value_type;
	typedef ptrdiff_t difference_type;
	typedef size_t size_type;
	typedef Value& reference;
	typedef Value* pointer;

	node* cur;	//当前节点
	hashtable* ht;	//保持对容器的连接关系（Vector的连接）

	__hashtable_iterator(node* n,hashtable* tab):cur(n),ht(tab){}
	__hashtable_iterator(){}
	reference operator*() const { return cur->val;}
	pointer operator->() const { return &(operator*());}
	iterator& operator++();
	iterator operator++(int);
	bool operator==(const iterator& it) const {return cur == it.cur;}
	bool operator!=(const iterator& it) const {return cur != it.cur;}
}

看到没，里面有两个指针。一个属于bucket list的，一个属于bucket vector的。如果不清楚要怎么遍历，那先带着疑问看下去。

先把上面声明的几个运算符重载实现一下：

template<class V,class K,class HF,class ExK,class EqK,class A>
__hashtable_iterator<V,K,HF,ExK,EqK,A>& __hashtable_iterator<V,K,HF,ExK,EqK,A>::operater++()
{
	const node* old = cur;
	cur = cur->next;
	if(!cur)
	{
		size_type bucket = ht->bkt_num(old->val);
		while(!cur && ++bucket < ht->buckets.size())
			cur = ht->buckets[bucket];
	}
	return *this;
}

template<class V,class K,class HF,class ExK,class EqK,class A>
inline __hashtable_iterator<V,K,HF,ExK,EqK,A>& __hashtable_iterator<V,K,HF,ExK,EqK,A>::operater++(int)
{
	iterator tmp = *this;
	++*this;
	return tmp;
}

哈希迭代器并没有撤退可言！

6、哈希表数据结构

template<class Value,class Key,class HashFcn,class ExtractKey,class EqualKey,class Alloc>
class hashtable
{
public:
	typedef HashFcn hasher;
	typedef EqualKey key_equal;
	typedef size_t size_type;

private:
	hasher hash;
	key_equal equals;
	ExtractKey get_key;

	typedef __hashtable_node<Value> node;
	typedef simple_alloc<node,Alloc> node_allocator;

	vector<node*,Alloc> buckets;	//迭代器中可见
	size_type 	num_elements;

public:
	size_type bucket_count() const { return buckets.size(); }

···	
}

7、哈希表插入操作

unique插入

下面直接看不需要重新配置表格的情况

template<class V,class K,class HF,class ExK,class EqK,class A>
pair<typename hashtable<V,K,HF,Ex,Eq,A>::iterator,bool> hashtable<V,K,HF,Ex,Eq,A>::insert_unique_noresize(const value_type& obj)
{
	const size_type n = bkt_num(obj);
	node* first = buckets[n];
	
	//判断键值是否唯一
	for(node* cur = first;cui;cur = cur->next)
	{
		if(equals(get_key(cur->val),get_key(obj)))
		{
			return pair<iterator,bool>(iterator(cur,this),false);
		}
	}
	node* tmp = new_node(obj);
	tmp->next = first;
	buckets[n]  =tmp;
	++num_elements;
	return pair<iterator,bool>(iterator(tmp,this),true);
}

8、哈希表运用实例

看了这么多，脑子都乱了。来看个运用实例》

#include <hash_set>	//客端不能直接包<stl_hashtable.h>,应该用含有hashtable的容器的头文件
#include <iostream>

using namespace std;

int main()
{
	hashtable<int,int,hash<int>,identity<int>,equal_to<int>,alloc>
		iht(50,hash<int>(),equal_to<int>);	//保留50个桶
	
	cout<<iht.size()<<endl;
	iht,insert_unique(11);
	iht,insert_unique(22);
	iht,insert_unique(33);		
	cout<<iht.size()<<endl;
	
	hashtable<int,int,hash<int>,identity<int>,equal_to<int>,alloc>::iterator ite = iht.begin();	//声明一个哈希表的迭代器
	//使用迭代器遍历hashtable打印
	for(int i = 0; i < iht.size();++i,++ite)
		cout<<*ite<<' ';
	cout<<endl;
	
	iht,insert_unique(111);
	iht,insert_unique(222);
	iht,insert_unique(333);
	
	//使用迭代器遍历所有的list打印
	for(i = 0;i<iht.bucket_count();++i)
	{
		int n = iht.elems_in_bucket(i);
		if( n!=0 )
			cout<<"BL["<<i<<"]has"<<n<<"elems"<<endl;
	}
}

下一期：hash_set与hash_map,哈希表与STL关联容器的有机结合！