Quickly Finding a String in a Large File with VB
2017-08-01 15:07
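To find a string in a large file, the usual approach is to read the file's contents into memory and compare there. Few people realize that this approach has a serious weakness. The large memory allocations and deallocations it requires cause memory thrashing, which degrades system performance and severely slows the search. The problem is worse when searching many files recursively, and can even crash the VB program. To avoid this and to speed up string searches in large files, I wrote a general-purpose search function based on memory-mapped files and VB's simulated-pointer technique, which re-points a VB array's SAFEARRAY descriptor at arbitrary memory.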
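Python's standard `mmap` module wraps the same operating-system mapping facility, so the recursive multi-file case can be sketched with it. This is an illustrative analogue, not the author's VB code. The function name `files_containing` is hypothetical, and the caller must encode the needle in the files' own byte encoding. Each file is mapped and searched in place with `mmap.find`, then released by the `with` block before the next file is opened. The search therefore never copies a whole file into a bytes object.

```python
import mmap
import os


def files_containing(root: str, needle: bytes) -> list[str]:
    """Walk `root` recursively; return the sorted paths of files whose bytes contain `needle`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # mmap cannot map a zero-length file, so skip empty files up front.
            if os.path.getsize(path) == 0:
                continue
            # Map the file read-only; both the mapping and the file are closed on exit.
            with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                # find() scans the mapped pages directly; -1 means "not found".
                if mm.find(needle) != -1:
                    hits.append(path)
    return sorted(hits)
```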
First, let's look at an ordinary search function:
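'Ordinary approach: read the whole file into a String, then search (returns a character position)
Private Function FindText(ByVal strFileName As String, ByVal strText As String) As Long
    Dim fn As Integer
    Dim strFileText As String
    fn = FreeFile()
    Open strFileName For Binary Access Read As #fn   'open the input file
    'Input counts characters, not bytes, so Input(LOF(fn), fn) can run past the end of
    'the file on double-byte (e.g. Chinese) text. Read raw bytes with InputB, then convert.
    strFileText = StrConv(InputB(LOF(fn), fn), vbUnicode)
    Close #fn
    FindText = InStr(strFileText, strText)
End Function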
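For comparison, here is a rough Python analogue of this read-everything approach. It assumes the file is GBK-encoded, which is a guess for a Chinese novel and is not stated in the source. The name `find_text` is illustrative. Like `InStr`, it returns a 1-based character position, or 0 when the text is absent. It also reads and decodes the entire file before searching.

```python
def find_text(path: str, text: str, encoding: str = "gbk") -> int:
    """Read and decode the whole file, then search.

    Returns a 1-based character position (InStr convention), or 0 if absent.
    The GBK default is an assumption about the input file.
    """
    with open(path, "rb") as f:
        content = f.read().decode(encoding)  # full copy plus full decode of the file
    return content.find(text) + 1            # str.find gives -1 when absent, so this yields 0
```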
I tested it on a 400 KB text file, running the test 20 times. The test code is:
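'Place this Declare in the module's declarations section (before any procedure)
Private Declare Function GetTickCount Lib "kernel32" () As Long

Sub Main()
    Dim lStartTime As Long
    'Compare the running speed of the two approaches
    lStartTime = GetTickCount
    Call FindText("G:/Inst/小说/沧海凤歌.txt", "打打秋风")   'the return value is a character position
    Debug.Print GetTickCount - lStartTime
End Sub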
In these tests, the longest run took 2050 ms, the shortest 890 ms, and the average was about 950 ms.
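The following Python sketch follows the author's method: 20 runs, reporting the maximum, minimum and average. The helper `time_runs` is hypothetical. It uses `time.perf_counter` rather than a tick counter. Microsoft documents `GetTickCount` resolution as typically 10 to 16 ms, so a single run shorter than that usually reads as either 0 ms or one timer tick.

```python
import statistics
import time
from typing import Callable


def time_runs(fn: Callable[[], object], runs: int = 20) -> dict:
    """Call `fn` `runs` times; return max/min/mean wall-clock time in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()  # high-resolution monotonic clock
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "max": max(samples),
        "min": min(samples),
        "mean": statistics.mean(samples),
        "runs": len(samples),
    }
```

Pass any zero-argument callable, for example a `lambda` that wraps one search call.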
Next, let's look at the search function based on memory mapping and simulated pointers:
Option Explicit

Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As Long)
Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileA" (ByVal lpFileName As String, ByVal dwDesiredAccess As Long, ByVal dwShareMode As Long, ByVal lpSecurityAttributes As Long, ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, ByVal hTemplateFile As Long) As Long
Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
Private Declare Function GetFileSize Lib "kernel32" (ByVal hFile As Long, lpFileSizeHigh As Long) As Long
Private Declare Function CreateFileMapping Lib "kernel32" Alias "CreateFileMappingA" (ByVal hFile As Long, ByVal lpFileMappingAttributes As Long, ByVal flProtect As Long, ByVal dwMaximumSizeHigh As Long, ByVal dwMaximumSizeLow As Long, ByVal lpName As String) As Long
Private Declare Function MapViewOfFile Lib "kernel32" (ByVal hFileMappingObject As Long, ByVal dwDesiredAccess As Long, ByVal dwFileOffsetHigh As Long, ByVal dwFileOffsetLow As Long, ByVal dwNumberOfBytesToMap As Long) As Long
'The mapped base address must be passed by value
Private Declare Function UnmapViewOfFile Lib "kernel32" (ByVal lpBaseAddress As Long) As Long
Private Declare Function VarPtrArray Lib "msvbvm60.dll" Alias "VarPtr" (Ptr() As Any) As Long

Private Const GENERIC_READ = &H80000000
Private Const OPEN_EXISTING = 3
Private Const FILE_SHARE_READ = &H1
Private Const FILE_SHARE_WRITE = &H2
Private Const FILE_ATTRIBUTE_NORMAL = &H80
Private Const INVALID_HANDLE_VALUE = -1
Private Const PAGE_READONLY = &H2
Private Const FILE_MAP_READ = &H4

Private Type SAFEARRAYBOUND
    cElements As Long
    lLbound As Long
End Type
Private Type SAFEARRAY1D
    cDims As Integer
    fFeatures As Integer
    cbElements As Long
    cLocks As Long
    pvData As Long
    rgsabound(0) As SAFEARRAYBOUND
End Type

'Search a large file for a string using a memory-mapped view
'(returns a 1-based byte position, or 0 if not found)
Function FindTextInFile(ByVal strFileName As String, ByVal strText As String) As Long
    Dim hFile As Long, hFileMap As Long
    Dim nFileSize As Long, lpszFileText As Long, pbFileText() As Byte
    Dim ppSA As Long, pSA As Long
    Dim tagNewSA As SAFEARRAY1D, tagOldSA As SAFEARRAY1D

    'Open the file read-only; searching needs no write access
    hFile = CreateFile(strFileName, GENERIC_READ, FILE_SHARE_READ Or FILE_SHARE_WRITE, _
                       0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0)
    If hFile = INVALID_HANDLE_VALUE Then Exit Function    'CreateFile returns -1, not 0, on failure

    nFileSize = GetFileSize(hFile, ByVal 0&)              'low 32 bits only: files of 2 GB or more are not handled
    If nFileSize > 0 Then                                  'a zero-length file cannot be mapped
        hFileMap = CreateFileMapping(hFile, 0, PAGE_READONLY, 0, 0, vbNullString)   'create the file-mapping object
        If hFileMap <> 0 Then
            lpszFileText = MapViewOfFile(hFileMap, FILE_MAP_READ, 0, 0, 0)          'map the whole file into the process address space
            If lpszFileText <> 0 Then
                ReDim pbFileText(0)                                 'initialize the array
                ppSA = VarPtrArray(pbFileText)                      'get the pointer to the pointer to the SAFEARRAY
                CopyMemory pSA, ByVal ppSA, 4                       'get the pointer to the SAFEARRAY
                CopyMemory tagOldSA, ByVal pSA, Len(tagOldSA)       'save the original SAFEARRAY header
                CopyMemory tagNewSA, tagOldSA, Len(tagNewSA)        'copy the header
                tagNewSA.rgsabound(0).cElements = nFileSize         'set the element count to the file size
                tagNewSA.pvData = lpszFileText                      'point the array data at the mapped view
                CopyMemory ByVal pSA, tagNewSA, Len(tagNewSA)       'bind the mapped data to the array
                'InStrB compares bytes, so the result is a byte position
                FindTextInFile = InStrB(pbFileText, StrConv(strText, vbFromUnicode))
                CopyMemory ByVal pSA, tagOldSA, Len(tagOldSA)       'restore the original header before freeing
                Erase pbFileText                                    'free the array
                UnmapViewOfFile lpszFileText                        'unmap the view
            End If
            CloseHandle hFileMap                                    'close the file-mapping handle
        End If
    End If
    CloseHandle hFile                                               'close the file
End Function
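Here is the same idea expressed with Python's `mmap` module. It is a hedged sketch, not a call-by-call translation. The file is mapped read-only and searched in place. The result follows the `InStrB` convention: a 1-based byte position, or 0 when not found. The name `find_text_in_file` and the GBK default are assumptions. `mmap.find` works directly on the mapped bytes, so Python needs no array-descriptor trick.

```python
import mmap
import os


def find_text_in_file(path: str, text: str, encoding: str = "gbk") -> int:
    """Search a memory-mapped file for `text`.

    Returns a 1-based byte position (InStrB convention), or 0 if absent.
    The GBK default is an assumption about the input file.
    """
    # Like StrConv(strText, vbFromUnicode): search in the file's byte encoding.
    needle = text.encode(encoding)
    if os.path.getsize(path) == 0:
        return 0  # a zero-length file cannot be mapped
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return mm.find(needle) + 1  # find() gives -1 when absent, so this yields 0
```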
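This function is clearly much more complex than the previous one, so you might expect it to run correspondingly slower. Rather than jump to conclusions, let's test it. The calling code is: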
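'Place this Declare in the module's declarations section (before any procedure)
Private Declare Function GetTickCount Lib "kernel32" () As Long

Sub Main()
    Dim lStartTime As Long
    lStartTime = GetTickCount
    Call FindTextInFile("G:/Inst/小说/沧海凤歌.txt", "打打秋风")   'the return value is a byte position
    Debug.Print GetTickCount - lStartTime
End Sub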
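I used the same text file and again ran the test 20 times. Sure enough, the second function took at most 17 ms, at least 0 ms, and under 1 ms on average, which bears out my design goal.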
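The two functions do not return the same kind of position. `FindText` reports a character position, while `FindTextInFile` reports a byte position, and the two differ in double-byte text such as GBK-encoded Chinese. The hypothetical Python helper below converts a 1-based byte position into a 1-based character position by decoding only the bytes before the match (GBK assumed). A raw byte search can also start a match in the middle of a double-byte character, because GBK trail bytes overlap the lead-byte range. In that case the decode raises `UnicodeDecodeError`, which flags the match as spurious.

```python
def byte_pos_to_char_pos(data: bytes, byte_pos: int, encoding: str = "gbk") -> int:
    """Convert a 1-based byte position (InStrB style) to a 1-based character position (InStr style).

    Raises UnicodeDecodeError if byte_pos falls inside a multi-byte character.
    """
    if byte_pos <= 0:
        return 0  # 0 means "not found" in both conventions
    # Decode only the bytes that precede the match and count the characters.
    return len(data[: byte_pos - 1].decode(encoding)) + 1
```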
If you have better ideas or suggestions, please let me know. Many thanks in advance!
Original article: http://blog.csdn.net/lyserver/article/details/4106290