在參閱本篇文章之前,建議先參閱以下兩篇文章
計(jì)算字符串的相似度(VB2005)——思索之一
計(jì)算字符串的相似度(VB2005)——思索之二
在第二篇文章中的代碼完成之后,照例還是對(duì)代碼測(cè)試了一番。還是用兩個(gè)相似的字符串,長(zhǎng)度分別為1500和1507,結(jié)果能出來,但是效率差了點(diǎn)。在筆者的電腦上用了6秒中左右。僅僅是比較文本,就要6秒鐘,比較難以接受,而且從代碼看時(shí)間復(fù)雜度和空間復(fù)雜度都是O(n2)。
必須得改進(jìn)!??!
在看了代碼之后,發(fā)現(xiàn)代碼運(yùn)行速度慢可能出現(xiàn)在兩個(gè)地方。一個(gè)是mDic對(duì)象,用的是Dictionary對(duì)象,在運(yùn)行中反復(fù)讀取和存儲(chǔ)可能會(huì)影響速度,如果改為用數(shù)組可能效果會(huì)好點(diǎn)。哪位對(duì)這個(gè)有研究的同道,望不吝賜教。一個(gè)是String對(duì)象的Chars(Index)的方法??赡茉诿看螆?zhí)行到這一步時(shí),會(huì)先把字符串轉(zhuǎn)化為字符數(shù)組再返回一個(gè)字符,或者是遍歷這個(gè)字符串,返回一個(gè)字符。對(duì)于本例中,大約需要執(zhí)行1500×1500次,等于反復(fù)遍歷,時(shí)間就浪費(fèi)了。建議一開始就轉(zhuǎn)化為字符數(shù)組,等到比較時(shí)就不需要遍歷或轉(zhuǎn)化了。
按照這兩個(gè)思路對(duì)代碼進(jìn)行了修改,然后測(cè)試。效果很滿意,本例測(cè)試幾乎就是一瞬間。
現(xiàn)將代碼賦予其后,用的是VB2005
Public Class clsCalculateStringDistanceEx2
Implements IDistance
Private mStringA() As Char
Private mStringB() As Char
Private mLenA As Integer
Private mLenB As Integer
Private mIsSame As Boolean
Private mDic(,) As Integer
Private Function CalculateStringDistance( _
ByVal StartLower As Integer, _
ByVal EndUper As Integer) As Integer
Dim i As Integer, j As Integer
Dim T1 As Integer, T2 As Integer, T3 As Integer
Dim jA As Integer = mLenA - StartLower - EndUper + 2
Dim jB As Integer = mLenB - StartLower - EndUper + 2
ReDim mDic(jA, jB)
mDic(jA, jB) = 0
For i = jA - 1 To 0 Step -1
mDic(i, jB) = jA - i
Next
For i = jB - 1 To 0 Step -1
mDic(jA, i) = jB - i
Next
For i = jA - 1 To 0 Step -1
For j = jB - 1 To 0 Step -1
If mStringA(i - 1 + StartLower) = _
mStringB(j - 1 + StartLower) Then
mDic(i, j) = mDic(i + 1, j + 1)
Else
T1 = mDic(i + 1, j)
T2 = mDic(i, j + 1)
T3 = mDic(i + 1, j + 1)
mDic(i, j) = Min(T1, T2, T3) + 1
End If
Next
Next
Return mDic(0, 0)
End Function
Private Function Min(ByVal ParamArray M() As Integer) As Integer
Dim i As Integer, J As Integer
J = M(0)
For i = 1 To M.GetUpperBound(0)
If M(i) < J Then J = M(i)
Next
Return J
End Function
Public Function CalculateStringDistance() As Integer _
Implements IDistance.CalculateStringDistance
If mLenA = 0 Then Return mLenB
If mLenB = 0 Then Return mLenA
mIsSame = True
Dim i As Integer, j1 As Integer, j2 As Integer
For i = 1 To Min(mLenA, mLenB)
If mStringA(i - 1) <> mStringB(i - 1) Then
mIsSame = False
j1 = i
Exit For
End If
Next
If mIsSame = True Then Return Math.Abs(mLenA - mLenB)
For i = 1 To Min(mLenA, mLenB)
If mStringA(mLenA - i) <> mStringB(mLenB - i) Then
mIsSame = False
j2 = i
Exit For
End If
Next
If mIsSame = True Then Return Math.Abs(mLenA - mLenB)
Return CalculateStringDistance(j1, j2)
End Function
Public ReadOnly Property DicCount() As Integer _
Implements IDistance.DicCount
Get
Return mDic.Length
End Get
End Property
Public Sub SetString(ByVal S1 As String, ByVal S2 As String) _
Implements IDistance.SetString
mStringA = S1.ToCharArray
mStringB = S2.ToCharArray
mLenA = S1.Length
mLenB = S2.Length
End Sub
End Class
如對(duì)本文有疑問,請(qǐng)?zhí)峤坏浇涣髡搲瑥V大熱心網(wǎng)友會(huì)為你解答?。?點(diǎn)擊進(jìn)入論壇