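Determining Word Frequency (Microsoft Word)
When analyzing a document, you may wonder whether there is a way to create a word frequency list. In other words, you may want a list of every unique word in the document, along with the number of times each one occurs.
Unfortunately, Word does not include such a feature. You can, however, build your own with a macro. The following VBA macro is one example: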
Sub WordFrequency()
    Dim SingleWord As String           'Raw word pulled from doc
    Const maxwords = 9000              'Maximum unique words allowed
    Dim Words(maxwords) As String      'Array to hold unique words
    Dim Freq(maxwords) As Long         'Frequency counter (Long, so counts above 32,767 do not overflow)
    Dim WordNum As Integer             'Number of unique words
    Dim ByFreq As Boolean              'Flag for sorting order
    Dim ttlwds As Long                 'Total words in the document
    Dim Excludes As String             'Words to be excluded
    Dim Found As Boolean               'Temporary flag
    Dim j As Integer                   'Temporary variables
    Dim k As Integer                   '
    Dim l As Integer                   '
    Dim Temp As Long                   '
    Dim tword As String                '
    Dim ans As String                  'Answer typed into the input box
    Dim aword As Range                 'Current word in the document
    Dim tmpName As String              'Template of the source document

    ' Set up excluded words
    Excludes = "[the][a][of][is][to][for][this][that][by][be][and][are]"

    ' Find out how to sort
    ByFreq = True
    ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "WORD")
    If ans = "" Then Exit Sub          'Cancelled or left empty
    If UCase(ans) = "WORD" Then
        ByFreq = False
    End If

    Selection.HomeKey Unit:=wdStory
    System.Cursor = wdCursorWait
    WordNum = 0
    ttlwds = ActiveDocument.Words.Count

    ' Control the repeat
    For Each aword In ActiveDocument.Words
        SingleWord = Trim(LCase(aword.Text))
        ' Test only the first character, so words such as "zebra" are kept
        If Left(SingleWord, 1) < "a" Or Left(SingleWord, 1) > "z" Then SingleWord = "" 'Out of range?
        If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = "" 'On exclude list?
        If Len(SingleWord) > 0 Then
            Found = False
            For j = 1 To WordNum
                If Words(j) = SingleWord Then
                    Freq(j) = Freq(j) + 1
                    Found = True
                    Exit For
                End If
            Next j
            If Not Found Then
                WordNum = WordNum + 1
                Words(WordNum) = SingleWord
                Freq(WordNum) = 1
            End If
            If WordNum > maxwords - 1 Then
                j = MsgBox("The maximum array size has been exceeded. " & _
                  "Increase maxwords.", vbOKOnly)
                Exit For
            End If
        End If
        ttlwds = ttlwds - 1
        StatusBar = "Remaining: " & ttlwds & "     Unique: " & WordNum
    Next aword

    ' Now sort it into word order (or frequency order)
    For j = 1 To WordNum - 1
        k = j
        For l = j + 1 To WordNum
            If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq And Freq(l) > Freq(k)) Then k = l
        Next l
        If k <> j Then
            tword = Words(j)
            Words(j) = Words(k)
            Words(k) = tword
            Temp = Freq(j)
            Freq(j) = Freq(k)
            Freq(k) = Temp
        End If
        StatusBar = "Sorting: " & WordNum - j
    Next j

    ' Now write out the results
    tmpName = ActiveDocument.AttachedTemplate.FullName
    Documents.Add Template:=tmpName, NewTemplate:=False
    Selection.ParagraphFormat.TabStops.ClearAll
    With Selection
        For j = 1 To WordNum
            .TypeText Text:=Trim(Str(Freq(j))) & vbTab & Words(j) & vbCrLf
        Next j
    End With
    System.Cursor = wdCursorNormal
    j = MsgBox("There were " & Trim(Str(WordNum)) & _
      " different words ", vbOKOnly, "Finished")
End Sub
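When you open a document and run this macro, you are asked whether you want the list sorted by word or by frequency. If you choose word, the resulting list is shown in alphabetical order. If you choose frequency, the list is shown in descending order, based on how many times each word appears in the document.
While the macro is running, the status bar indicates what is happening.
Depending on the size of your document and the speed of your computer, the macro may take a while to complete. (I ran it on a 719-page document containing more than 349,000 words, and it took about five minutes.)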
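The choice of sort order comes down to a single comparison inside the macro's sorting loop. As a minimal sketch, that test can be pulled out into a helper of its own. The name ShouldPrecede and its parameters are hypothetical and do not appear in the macro.

```vba
' Hypothetical helper (not part of the macro): returns True when the
' candidate entry should be listed ahead of the current entry.
Function ShouldPrecede(ByVal candWord As String, ByVal candFreq As Long, _
                       ByVal curWord As String, ByVal curFreq As Long, _
                       ByVal ByFreq As Boolean) As Boolean
    If ByFreq Then
        ShouldPrecede = (candFreq > curFreq)   ' higher count goes first
    Else
        ShouldPrecede = (candWord < curWord)   ' alphabetical order
    End If
End Function
```

With ByFreq set to False the words alone decide the order. With ByFreq set to True only the counts matter, so words with equal counts are not ordered alphabetically among themselves.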
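Part of the running time may come from the macro searching its array of unique words from the start for every word in the document. One possible alternative, which is not the approach the macro uses, is to tally the words in a Scripting.Dictionary object. The sketch below assumes a Windows system where the scripting runtime is available; the function name TallyWords is hypothetical.

```vba
' Sketch of an alternative tally (not part of the macro), assuming the
' Windows Scripting runtime is available for CreateObject.
Function TallyWords(wordList As Variant) As Object
    Dim counts As Object
    Dim i As Long
    Dim w As String

    Set counts = CreateObject("Scripting.Dictionary")
    For i = LBound(wordList) To UBound(wordList)
        w = LCase$(Trim$(wordList(i)))
        If Len(w) > 0 Then
            If counts.Exists(w) Then
                counts.Item(w) = counts.Item(w) + 1   ' seen before: bump the count
            Else
                counts.Add w, 1                       ' first occurrence
            End If
        End If
    Next i
    Set TallyWords = counts
End Function
```

The same Exists/Add logic could be placed inside the macro's For Each loop in place of the inner search loop. Whether that shortens the run noticeably depends on the document, since stepping through ActiveDocument.Words takes time of its own.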
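Note that one line in the macro sets a value in the Excludes string. This string contains the words the macro ignores when it puts the word list together. To exclude more words, add them to the string, each one enclosed in [square brackets]. Also make sure the excluded words are lowercase.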
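The brackets are what keep the exclusion test from matching parts of words. A minimal sketch of the same test as a standalone function shows the effect; the name IsExcluded is hypothetical and is not part of the macro.

```vba
' Hypothetical helper mirroring the macro's exclusion test.
Function IsExcluded(ByVal candidate As String, ByVal excludeList As String) As Boolean
    ' Wrapping the word in brackets prevents partial matches:
    ' "[he]" is not found inside "[the]".
    IsExcluded = (InStr(excludeList, "[" & LCase$(candidate) & "]") > 0)
End Function
```

For example, IsExcluded("the", "[the][a][of]") returns True, while IsExcluded("he", "[the][a][of]") returns False. Because the search is case-sensitive by default and the macro lowercases each word before testing it, an entry such as "[The]" in the list would never match.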
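Note:
If you would like to know how to use the macros described on this page (or on any other page on the WordTips sites), I've prepared a special page that includes helpful information.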
(Microsoft Word is the most popular word processing software in the world.) This tip (879) applies to Microsoft Word 97, 2000, 2002, and 2003.