
Developing a Simple Web Crawler (4) (C# version)

2008-05-20 13:38


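The AbsThreadManager class:

AbsThreadManager's main job is to start and manage the WorkThread worker threads together with a monitor thread. Each WorkThread object corresponds one-to-one with a Thread object, and the pair is wrapped in an ObjThread object. Here is the ObjThread source first: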

namespace WebSpider
{
    internal class ObjThread
    {
        private WorkThread _workThread;

        private System.Threading.Thread _thread;

        internal WorkThread WorkThread { get { return _workThread; } set { _workThread = value; } }

        internal System.Threading.Thread Thread { get { return _thread; } set { _thread = value; } }
    }
}

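The ObjThread class is very simple: it holds just one Thread object and one WorkThread object.

AbsThreadManager keeps these pairs of thread and WorkThread objects in a List<ObjThread>. It also adds a monitor thread that checks the state of the worker threads; if a worker thread has died, the monitor thread starts a new one in its place. The source is as follows: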

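using System;
using System.Collections.Generic;

namespace WebSpider
{
    public abstract class AbsThreadManager
    {
        // Maximum number of worker threads, read from the "MaxCount" appSettings key.
        public int _maxThread = Convert.ToInt32(System.Configuration.ConfigurationManager.AppSettings["MaxCount"]);

        internal List<ObjThread> list = new List<ObjThread>();

        // volatile: written by Start/Stop and read by the monitor thread.
        private volatile bool _isRun = false;

        private System.Threading.Thread _watchThread = null;

        /// <summary>
        /// Current depth (the number of URLs currently held in UrlStack)
        /// </summary>
        public int Current { get { return UrlStack.Instance.Count; } }

        /// <summary>
        /// Starts the service
        /// </summary>
        /// <param name="url">Seed URL</param>
        public void Start(string url)
        {
            UrlStack.Instance.Push(url);

            _isRun = true;

            for (int i = 0; i < _maxThread; i++)
            {
                AddObjThread();
            }
            _watchThread = new System.Threading.Thread(Watch);
            _watchThread.Start();
        }

        // Creates one WorkThread/Thread pair, registers it in the list and starts it.
        private void AddObjThread()
        {
            ObjThread thread = new ObjThread();
            thread.WorkThread = new WorkThread();
            thread.WorkThread.ChainMain.SetProcessHandler(GetChainHeader());
            thread.Thread = new System.Threading.Thread(thread.WorkThread.Start);
            list.Add(thread);
            thread.Thread.Start();
        }

        /// <summary>
        /// Stops the service
        /// </summary>
        public void Stop()
        {
            _isRun = false;

            // Wait for the monitor thread first so that it cannot start new workers
            // while the list is being shut down. Guard against Stop() before Start().
            if (_watchThread != null)
            {
                _watchThread.Join();
            }

            foreach (ObjThread obj in list)
            {
                obj.WorkThread.Stop();
                // Thread.Abort is a last resort; it throws PlatformNotSupportedException
                // on .NET Core and .NET 5+, so this line only works on .NET Framework.
                obj.Thread.Abort();
                obj.Thread.Join();
            }
            list.Clear();
        }

        /// <summary>
        /// Supplies the head node of the chain of responsibility
        /// </summary>
        /// <returns>The user-defined chain</returns>
        protected abstract AbsChain GetChainHeader();

        internal void Watch()
        {
            List<ObjThread> newList = new List<ObjThread>();
            while (_isRun)
            {
                try
                {
                    // Clear first, so entries left over from a failed pass are not added twice.
                    newList.Clear();

                    // Keep only the workers that are still running.
                    foreach (ObjThread temp in list)
                    {
                        if (temp.WorkThread.IsRun && temp.Thread.IsAlive)
                        {
                            newList.Add(temp);
                        }
                    }
                    list.Clear();
                    list.AddRange(newList);

                    // Start a replacement for every worker that has died.
                    int newCount = _maxThread - list.Count;
                    for (int i = 0; i < newCount; i++)
                    {
                        AddObjThread();
                    }
                }
                catch
                {
                    // Swallow errors so that the monitor thread itself keeps running.
                }

                // Sleep outside the try block so a failing pass cannot become a busy loop.
                System.Threading.Thread.Sleep(5 * 1000);
            }
        }
    }
}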
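
One consequence of the listing above is that Stop() can block for up to five seconds in _watchThread.Join(), because the monitor thread only checks _isRun between sleeps, and the workers are then ended with Thread.Abort. Below is a minimal sketch of a cooperative alternative, in which a wait handle doubles as both the stop flag and the sleep timer. It is not the author's implementation: WatchdogSketch, Worker, StartedCount and _stopSignal are hypothetical names, and the worker body is a placeholder that only polls the stop signal.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the monitor logic; not part of the WebSpider assembly.
public class WatchdogSketch
{
    // Signalled once by Stop(); every loop waits on it instead of calling Thread.Sleep.
    private readonly ManualResetEvent _stopSignal = new ManualResetEvent(false);
    private readonly List<Thread> _workers = new List<Thread>();
    private readonly int _maxThread;
    private Thread _watchThread;

    public WatchdogSketch(int maxThread) { _maxThread = maxThread; }

    // Total number of workers started so far (only touched by the monitor thread).
    public int StartedCount { get; private set; }

    public void Start()
    {
        _watchThread = new Thread(Watch);
        _watchThread.Start();
    }

    public void Stop()
    {
        _stopSignal.Set();                        // wakes the monitor and all workers
        if (_watchThread != null) _watchThread.Join();
        foreach (Thread t in _workers) t.Join();  // workers exit on their own, no Abort
    }

    // Placeholder worker: a real worker would fetch one URL per iteration.
    private void Worker()
    {
        while (!_stopSignal.WaitOne(50)) { }
    }

    private void Watch()
    {
        do
        {
            // Drop dead workers, then top the pool back up to _maxThread.
            _workers.RemoveAll(t => !t.IsAlive);
            while (_workers.Count < _maxThread)
            {
                Thread t = new Thread(Worker);
                _workers.Add(t);
                t.Start();
                StartedCount++;
            }
        }
        while (!_stopSignal.WaitOne(5000));       // returns true as soon as Stop() is called
    }
}
```

With this shape, calling new WatchdogSketch(3).Start() followed later by Stop() returns as soon as the workers notice the signal, rather than after the remainder of the five-second sleep. The trade-off is that each worker has to check the signal at a point where it is safe to stop.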

This class has only one abstract method, protected abstract AbsChain GetChainHeader(). Users override GetChainHeader to return an object of a class derived from AbsChain, and that object is then set as the _handler of ChainMain.
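
As a rough illustration of that extension point, a derived manager might look like the sketch below. The AbsChain and AbsThreadManager types here are cut-down, hypothetical stand-ins so the example compiles on its own; the real classes in the WebSpider assembly have more members, and Name, CreateHandlerForWorker, PrintTitleChain and MyThreadManager are invented for the example.

```csharp
using System;

// Hypothetical stand-in: the real AbsChain is defined elsewhere in the WebSpider assembly.
public abstract class AbsChain
{
    public abstract string Name { get; }
}

// Hypothetical stand-in that keeps only the abstract method discussed in the article.
public abstract class AbsThreadManager
{
    protected abstract AbsChain GetChainHeader();

    // Plays the role of AddObjThread(), which hands the result to ChainMain.SetProcessHandler.
    public AbsChain CreateHandlerForWorker()
    {
        return GetChainHeader();
    }
}

// Hypothetical user-defined chain node.
public class PrintTitleChain : AbsChain
{
    public override string Name { get { return "PrintTitle"; } }
}

public class MyThreadManager : AbsThreadManager
{
    // Invoked once per worker thread, so returning a new object here
    // gives every worker its own chain instance.
    protected override AbsChain GetChainHeader()
    {
        return new PrintTitleChain();
    }
}
```

Because AddObjThread() calls GetChainHeader() every time it creates a worker, returning a fresh object from the override avoids sharing chain state between threads. Returning a single shared instance would also compile, but the chain would then have to be thread-safe.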

That covers all the classes in the Spider assembly. The next article will use this assembly to build a complete spider program.

To be continued...