Question:
Is there a way to determine whether a given file contains a given byte array (at any position) in PowerShell?
Something like:
    fgrep --binary-files=binary "$data" "$filepath"
Of course, I can write a naive implementation:
    function posOfArrayWithinArray {
        param ([byte[]] $arrayA, [byte[]]$arrayB)
        if ($arrayB.Length -ge $arrayA.Length) {
            foreach ($pos in 0..($arrayB.Length - $arrayA.Length)) {
                if ([System.Linq.Enumerable]::SequenceEqual(
                    $arrayA,
                    [System.Linq.Enumerable]::Skip($arrayB, $pos).Take($arrayA.Length)
                )) {return $pos}
            }
        }
        -1
    }

    function posOfArrayWithinFile {
        param ([byte[]] $array, [string]$filepath)
        posOfArrayWithinArray $array (Get-Content $filepath -Raw -AsByteStream)
    }

    # They return position or -1, but simple $false/$true are also enough for me.
However, it is extremely slow.
Answer:
I’ve determined that the following can work as a workaround:
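    # Ordinal comparison matches code units exactly instead of applying culture rules.
    (Get-Content $filepath -Raw -Encoding 28591).IndexOf($fragment, [System.StringComparison]::Ordinal)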
That is, PowerShell strings (in fact, .NET System.String objects) can match arbitrary bytes when a binary-safe encoding is specified. Of course, the same encoding must be used for both the file and the fragment, and the encoding must be truly binary-safe, meaning that every byte value decodes to its own distinct character. For example, 1250, 1000 and 28591 fit, but the various Unicode encodings (including the default BOM-less UTF-8) don't, because they convert every ill-formed code unit sequence to the same replacement character (U+FFFD). Thanks to Theo for the clarification.
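To make the binary-safety point concrete, here is a minimal sketch. The sample bytes and variable names are made up for illustration. It checks that code page 28591 round-trips all 256 byte values, shows that UTF-8 collapses two different invalid bytes into the same character, and builds a search fragment from a byte array using the same encoding as the data. The search uses an ordinal comparison because the default culture-sensitive String.IndexOf may treat some characters (such as U+00AD, the soft hyphen) as ignorable.

```powershell
# Code page 28591 (ISO-8859-1) maps each byte 0x00-0xFF to the code point with the same value.
$latin1 = [System.Text.Encoding]::GetEncoding(28591)

# Hypothetical sample data: every possible byte value, in order.
[byte[]] $allBytes = 0..255

# Decoding and re-encoding should give back exactly the original bytes.
$roundTrip = $latin1.GetBytes($latin1.GetString($allBytes))
$isLossless = ($allBytes -join ',') -eq ($roundTrip -join ',')

# UTF-8, by contrast, decodes two different invalid bytes to the same character (U+FFFD).
$utf8 = [System.Text.Encoding]::UTF8
$collapsed = $utf8.GetString([byte[]](0x80)) -ceq $utf8.GetString([byte[]](0xFF))

# Build the fragment with the same encoding as the data, then search ordinally.
[byte[]] $needle = 0xDE, 0xAD, 0xBE, 0xEF
$fragment = $latin1.GetString($needle)
$haystack = $latin1.GetString([byte[]](0x00, 0xDE, 0xAD, 0xBE, 0xEF, 0xFF))
$index = $haystack.IndexOf($fragment, [System.StringComparison]::Ordinal)
```

Because each byte becomes exactly one character under this encoding, the character index returned by IndexOf is also the byte offset in the original data.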
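In older PowerShell versions (for example Windows PowerShell 5.1, where Get-Content -Encoding accepts only a fixed set of named encodings), you can use: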
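    # Ordinal comparison matches code units exactly instead of applying culture rules.
    [System.Text.Encoding]::GetEncoding(28591).
        GetString([System.IO.File]::ReadAllBytes($filepath)).
        IndexOf($fragment, [System.StringComparison]::Ordinal)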
Sadly, I haven't found a way to match sequences universally (i.e. a common method that matches sequences of any item type: integers, objects, etc.). I believe it must exist in .NET, especially since a particular implementation for sequences of characters already exists. Hopefully, someone will suggest it.
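Until such a method turns up, one possible stopgap is sketched below. Find-SubArray is a hypothetical helper, not a built-in, and this is not a tuned search algorithm. It uses [Array]::IndexOf to jump to each occurrence of the needle's first item and then compares the remaining items one by one. Items are compared with Object.Equals, so both arrays are assumed to hold items of the same type (for example, both cast to [byte[]]).

```powershell
# Hypothetical helper: returns the first index of $Needle within $Haystack, or -1.
function Find-SubArray {
    param ([array] $Haystack, [array] $Needle)

    # By convention here, an empty needle matches at position 0.
    if ($Needle.Length -eq 0) { return 0 }

    # Last start position at which the needle can still fit.
    $last = $Haystack.Length - $Needle.Length
    $pos = 0
    while ($pos -le $last) {
        # Jump to the next occurrence of the needle's first item.
        $pos = [Array]::IndexOf($Haystack, $Needle[0], $pos)
        if ($pos -lt 0 -or $pos -gt $last) { return -1 }

        # Compare the remaining items; [object]::Equals also tolerates $null items.
        $i = 1
        while ($i -lt $Needle.Length -and [object]::Equals($Haystack[$pos + $i], $Needle[$i])) { $i++ }
        if ($i -eq $Needle.Length) { return $pos }

        $pos++
    }
    return -1
}

# Usage with byte arrays (both sides cast so that the item types match).
$found = Find-SubArray ([byte[]](1, 2, 3, 4, 5)) ([byte[]](3, 4))
$missing = Find-SubArray ([byte[]](1, 2, 3, 4, 5)) ([byte[]](4, 3))
```

The same function accepts arrays of other item types, such as strings, as long as Equals gives the intended notion of equality for them.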