Question:
I have a simple requirement: I need to search for a string in a Word document and get back the matching line, or a few words around the match.
So far I can successfully search for a string in a folder of Word documents, but the script only returns True / False depending on whether it found the search string.
#ERROR REPORTING ALL
Set-StrictMode -Version latest

$path = "c:\MORLAB"
$files = Get-Childitem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
$output = "c:\wordfiletry.txt"
$application = New-Object -comobject word.application
$application.visible = $False
$findtext = "CRHPCD01"

Function getStringMatch
{
    # Loop through all *.doc and *.docx files in the $path directory
    Foreach ($file In $files)
    {
        # Open(FileName, ConfirmConversions, ReadOnly)
        $document = $application.documents.open($file.FullName,$false,$true)
        $range = $document.content
        # Find.Execute only reports whether the text was found ($true / $false)
        $wordFound = $range.find.execute($findText)

        if($wordFound)
        {
            "$($file.FullName) has $wordFound" | Out-File $output -Append
        }
        # Close each document before opening the next one
        $document.close()
    }
    $application.quit()
}

getStringMatch
Answer:
#ERROR REPORTING ALL
Set-StrictMode -Version latest

$path = "c:\Temp"
$files = Get-Childitem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
$output = "c:\temp\wordfiletry.csv"
$application = New-Object -comobject word.application
$application.visible = $False
$findtext = "First"
$charactersAround = 30
# Array (not a hashtable) so that result objects can be appended with +=
$results = @()

Function getStringMatch
{
    # Escape the search term and allow shorter context near the start or end of the text
    $pattern = ".{0,$charactersAround}$([regex]::Escape($findtext)).{0,$charactersAround}"

    # Loop through all *.doc and *.docx files in the $path directory
    Foreach ($file In $files)
    {
        $document = $application.documents.open($file.FullName,$false,$true)
        $range = $document.content

        # -match captures only the first occurrence in each document
        If($range.Text -match $pattern){
            $properties = @{
                File = $file.FullName
                Match = $findtext
                TextAround = $Matches[0]
            }
            $results += New-Object -TypeName PsCustomObject -Property $properties
        }
        # Close each document before opening the next one
        $document.close()
    }

    If($results){
        $results | Export-Csv $output -NoTypeInformation
    }

    $application.quit()
}

getStringMatch

import-csv $output
There are a couple of ways to get what you want. Since you already have the text of the document, a simple approach is to run a regex match against it and return the matched text along with some of its surroundings. This addresses the "some words around" part of your question.
The variable $charactersAround sets how many characters of context to capture on each side of $findtext. I also thought the output was a better fit for a CSV file, so $results collects one object per matching file, each built from a hashtable of properties, and the collection is written to a CSV file at the end.
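To see what the pattern does without opening Word, here is a minimal sketch that runs the same kind of match against a plain string. The sample text and the $sample, $pattern and $textAround names are made up for illustration. The `[regex]::Escape` call and the `{0,n}` quantifier are assumptions on my part about what you would want: they make special characters in the search term match literally and let a match succeed when there is less context available near the start or end of the text.

```powershell
# Hypothetical sample text standing in for $document.Content.Text
$sample = 'Bradley Air Services Limited dba First Air meets or exceeds all terms of the agreement.'
$findtext = 'First'
$charactersAround = 10

# Escape the search term so characters like . or ( are matched literally,
# and accept fewer than $charactersAround characters near the text boundaries
$pattern = ".{0,$charactersAround}$([regex]::Escape($findtext)).{0,$charactersAround}"

if ($sample -match $pattern) {
    # $Matches[0] holds the search term plus the context on both sides
    $textAround = $Matches[0]
    $textAround
}
```

Keep in mind that -match is case-insensitive by default; use -cmatch if the search should respect case.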
Be sure to change the variables for your own testing. Now that we are using regex to locate the matches, this opens up a world of possibilities.
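As one example of those possibilities, -match stops at the first hit, whereas the .NET `[regex]::Matches` method returns every non-overlapping occurrence. The sketch below assumes the document text is already in a string; the sample text, the $hits variable and the Position property are illustrative and not part of the script above. It uses the `[pscustomobject]` shorthand, which needs PowerShell 3.0 or later.

```powershell
# Hypothetical text containing two occurrences of the search term
$text = 'The First item ships today. Returns are accepted within the First week only.'
$findtext = 'First'
$charactersAround = 8

$pattern = ".{0,$charactersAround}$([regex]::Escape($findtext)).{0,$charactersAround}"

# [regex]::Matches yields one Match object per non-overlapping occurrence
$hits = foreach ($m in [regex]::Matches($text, $pattern)) {
    [pscustomobject]@{
        Match      = $findtext
        TextAround = $m.Value   # search term plus surrounding context
        Position   = $m.Index   # character offset where the context starts
    }
}

$hits | Format-Table -AutoSize
```

Unlike -match, `[regex]::Matches` is case-sensitive unless you pass a `[System.Text.RegularExpressions.RegexOptions]::IgnoreCase` option as a third argument. Occurrences that sit closer together than the context width may end up inside a single match.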
Sample Output
Match TextAround                                                        File
----- ----------                                                        ----
First dley Air Services Limited dba First Air meets or exceeds all term C:\Temp\20120315132117214.docx