Question:
I can count all the files in a folder and its sub-folders; the folders themselves are not counted.
(gci -Path *Fill_in_path_here* -Recurse -File | where Name -like "*STB*").Count
However, PowerShell is too slow for the number of files (up to 700k). I read that cmd is faster at this kind of task.
Unfortunately, I have no knowledge of cmd code at all. In the example above, I am counting all the files with STB in the file name.
That is what I would like to do in cmd as well.
Any help is appreciated.
Answer:
Theo's helpful answer based on direct use of .NET ([System.IO.Directory]::EnumerateFiles()) is the fastest option (in my tests; YMMV, see the benchmark code below[1]).
Its limitations in the .NET Framework (FullCLR) – on which Windows PowerShell is built – are:
- An exception is thrown when an inaccessible directory is encountered (due to lack of permissions). You can catch the exception, but you cannot continue the enumeration; that is, you cannot robustly enumerate all items that you can access while ignoring those that you cannot.
- Hidden items are invariably included.
- With recursive enumeration, symlinks / junctions to directories are invariably followed.
By contrast, the cross-platform .NET Core framework, on which PowerShell Core is built, has offered ways around these limitations since v2.1, via the EnumerationOptions class; see this answer for an example.
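As a minimal sketch of the EnumerationOptions approach (it assumes PowerShell Core 6.1 or later, i.e. .NET Core 2.1+, and will not work in Windows PowerShell), the following wraps EnumerateFiles() in a helper. The function name Get-MatchingFileCount and its parameters are hypothetical, chosen for illustration only; the choice to skip reparse points is likewise an assumption about what you want.

```powershell
# Sketch only: requires .NET Core 2.1+ (PowerShell Core 6.1+); not available in Windows PowerShell.
# Get-MatchingFileCount is a hypothetical helper name, not part of any library.
function Get-MatchingFileCount {
  param(
    [Parameter(Mandatory)] [string] $Path,
    [string] $Pattern = '*',
    [switch] $IncludeHidden
  )
  $opts = [System.IO.EnumerationOptions]::new()
  $opts.RecurseSubdirectories = $true
  $opts.IgnoreInaccessible = $true   # skip inaccessible directories instead of throwing
  # Skip symlinks / junctions (reparse points); also skip hidden and system items unless requested.
  $opts.AttributesToSkip = if ($IncludeHidden) {
    [System.IO.FileAttributes]::ReparsePoint
  } else {
    [System.IO.FileAttributes] 'Hidden, System, ReparsePoint'
  }
  # Pass a full path, because .NET's working directory can differ from PowerShell's.
  $fullPath = Convert-Path -LiteralPath $Path
  $count = 0
  foreach ($file in [System.IO.Directory]::EnumerateFiles($fullPath, $Pattern, $opts)) { $count++ }
  $count
}
```

Called as Get-MatchingFileCount -Path $somePath -Pattern '*STB*', this should count only the files that can actually be reached, without aborting at the first inaccessible directory.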
Note that you can also perform enumeration via the related [System.IO.DirectoryInfo] type, which, similar to Get-ChildItem, returns rich objects rather than mere path strings, allowing for much more versatile processing; e.g., to get an array of all file sizes (property .Length, implicitly applied to each file object):
([System.IO.DirectoryInfo] $somePath).EnumerateFiles('*STB*', 'AllDirectories').Length
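Because the enumerated items are [System.IO.FileInfo] objects, they can also be piped to other cmdlets. As a small sketch (the helper name Get-MatchingFileStats and the TotalBytes property are made up for this example), the following reports both the count and the combined size of the matching files:

```powershell
# Sketch: count and total size of files matching a pattern, via DirectoryInfo.EnumerateFiles().
# Get-MatchingFileStats is a hypothetical helper name.
function Get-MatchingFileStats {
  param(
    [Parameter(Mandatory)] [string] $Path,
    [string] $Pattern = '*'
  )
  # Resolve to a full path first, because .NET's working directory can differ from PowerShell's.
  $dir = [System.IO.DirectoryInfo] (Convert-Path -LiteralPath $Path)
  $dir.EnumerateFiles($Pattern, 'AllDirectories') |
    Measure-Object -Property Length -Sum |
    Select-Object Count, @{ Name = 'TotalBytes'; Expression = { [long] $_.Sum } }
}
```

For instance, Get-MatchingFileStats -Path $somePath -Pattern '*STB*' returns a single object with Count and TotalBytes properties.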
A native PowerShell solution that addresses these limitations and is still reasonably fast is to use Get-ChildItem with the -Filter parameter.
(Get-ChildItem -LiteralPath $somePath -Filter *STB* -Recurse -File).Count
- Hidden items are excluded by default; add -Force to include them.
- To ignore permission problems, add -ErrorAction SilentlyContinue or -ErrorAction Ignore; the advantage of SilentlyContinue is that you can later inspect the $Error collection to determine the specific errors that occurred, so as to ensure that the errors truly only stem from permission problems.
  - Note that PowerShell Core, unlike Windows PowerShell, helpfully ignores the inability to enumerate the contents of the hidden system junctions that exist for pre-Vista compatibility only, such as $env:USERPROFILE\Cookies.
- In Windows PowerShell, Get-ChildItem -Recurse invariably follows symlinks / junctions to directories, unfortunately; more sensibly, PowerShell Core by default does not, and offers opt-in via -FollowSymlink.
- Like the [System.IO.DirectoryInfo]-based solution, Get-ChildItem outputs rich objects ([System.IO.FileInfo] / [System.IO.DirectoryInfo]) describing each enumerated file-system item, allowing for versatile processing.
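To illustrate the error-inspection point, here is a hedged sketch that uses the common -ErrorVariable parameter instead of the global $Error collection, so that only the errors of this one call are examined. The helper name Get-FileCountWithErrors and the UnexpectedErrors property are hypothetical, and the assumption that permission problems surface as [System.UnauthorizedAccessException] should be verified on your system.

```powershell
# Sketch: count matching files while collecting (rather than displaying) enumeration errors.
# Get-FileCountWithErrors is a hypothetical helper name.
function Get-FileCountWithErrors {
  param(
    [Parameter(Mandatory)] [string] $Path,
    [string] $Filter = '*'
  )
  # -ErrorVariable still collects the errors that SilentlyContinue hides from the console.
  $files = Get-ChildItem -LiteralPath $Path -Filter $Filter -Recurse -File `
    -ErrorAction SilentlyContinue -ErrorVariable enumErrors
  [pscustomobject] @{
    Count            = @($files).Count
    # Assumption: access-denied errors carry an UnauthorizedAccessException; anything else is unexpected.
    UnexpectedErrors = @($enumErrors | Where-Object { $_.Exception -isnot [System.UnauthorizedAccessException] })
  }
}
```

If UnexpectedErrors is non-empty after a run, the count may be incomplete for reasons other than missing permissions.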
Note that while you can also pass wildcard arguments to -Path (the implied first positional parameter) and -Include (as in TobyU's answer), it is only -Filter that provides significant speed improvements, due to filtering at the source (the filesystem driver), so that PowerShell only receives the already-filtered results; by contrast, -Path / -Include must first enumerate everything and match against the wildcard pattern afterwards.[2]
Caveats re -Filter use:
- Its wildcard language is not the same as PowerShell's; notably, it doesn't support character sets/ranges (e.g. *[0-9]) and it has legacy quirks; see this answer.
- It only supports a single wildcard pattern, whereas -Include supports multiple (as an array).
That said, -Filter processes wildcards the same way as cmd.exe's dir.
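One way to work around both -Filter caveats, sketched here under the assumption that a single coarse pattern can pre-select the candidates, is to let -Filter do the fast matching at the source and then refine the results with PowerShell's own -like operator, which supports ranges and can be applied to several patterns. The helper name Get-FilesRefined and its parameters are hypothetical.

```powershell
# Sketch: coarse, fast pre-filtering with -Filter, then refinement with PowerShell wildcards.
# Get-FilesRefined is a hypothetical helper name.
function Get-FilesRefined {
  param(
    [Parameter(Mandatory)] [string] $Path,
    [Parameter(Mandatory)] [string] $Filter,    # single dir-style pattern, e.g. '*STB*'
    [Parameter(Mandatory)] [string[]] $Like     # one or more PowerShell patterns, e.g. '*STB[0-9]*'
  )
  Get-ChildItem -LiteralPath $Path -Filter $Filter -Recurse -File |
    Where-Object {
      $name = $_.Name
      # Keep the file if its name matches at least one of the PowerShell patterns.
      @($Like | Where-Object { $name -like $_ }).Count -gt 0
    }
}
```

For example, (Get-FilesRefined -Path $somePath -Filter '*STB*' -Like '*STB[0-9]*').Count counts only those files in which STB is directly followed by a digit.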
Finally, for the sake of completeness, you can adapt MC ND's helpful answer based on cmd.exe's dir command for use in PowerShell, which simplifies matters:
(cmd /c dir /s /b /a-d "$somePath/*STB*").Count
PowerShell captures an external program's stdout output as an array of lines, whose element count you can simply query with the .Count (or .Length) property.
That said, this may or may not be faster than PowerShell's own Get-ChildItem -Filter, depending on the filtering scenario; also note that dir /s can only ever return path strings, whereas Get-ChildItem returns rich objects whose properties you can query.
Caveats re dir use:
- /a-d excludes directories, i.e., only reports files, but then also includes hidden files, which dir doesn't do by default.
- dir /s invariably descends into hidden directories too during the recursive enumeration; an /a (attribute-based) filter is only applied to the leaf items of the enumeration (only to files in this case).
- dir /s invariably follows symlinks / junctions to other directories (assuming it has the requisite permissions; see next point).
- dir /s quietly ignores directories or symlinks / junctions to directories if it cannot enumerate their contents due to lack of permissions. While this is helpful in the specific case of the aforementioned hidden system junctions (you can find them all with cmd /c dir C:\ /s /ashl), it can cause you to miss the content of directories that you do want to enumerate, but can't for true lack of permissions, because dir /s will give no indication that such content may even exist (if you directly target an inaccessible directory, you get a somewhat misleading File Not Found error message, and the exit code is set to 1).
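Given the last caveat, it may be worth checking the exit code after calling dir from PowerShell. The following Windows-only sketch does so via the automatic $LASTEXITCODE variable; the helper name Get-DirFileCount is hypothetical, and the suppression of dir's error message with 2>$null assumes that the message is written to stderr.

```powershell
# Sketch (Windows only): count files via cmd.exe's dir and surface a non-zero exit code.
# Get-DirFileCount is a hypothetical helper name.
function Get-DirFileCount {
  param(
    [Parameter(Mandatory)] [string] $Path,
    [string] $Pattern = '*'
  )
  # @(...) ensures an array even if dir outputs nothing or only one line.
  $lines = @(cmd /c dir /s /b /a-d "$Path\$Pattern" 2>$null)
  if ($LASTEXITCODE -ne 0) {
    # dir gives no further detail: the target may be inaccessible, or nothing may have been listed.
    Write-Warning "dir exited with code $LASTEXITCODE for '$Path\$Pattern'; the count may be incomplete."
  }
  $lines.Count
}
```

Note that a warning here only tells you that something went wrong at the top level; inaccessible directories deeper in the tree are still skipped silently, as described above.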
Performance comparison:
- The following tests compare pure enumeration performance without filtering, for simplicity, using a sizable directory tree assumed to be present on all systems, c:\windows\winsxs; that said, it's easy to adapt the tests to also compare filtering performance.
- The tests are run from PowerShell, which means that some overhead is introduced by creating a child process for cmd.exe in order to invoke dir /s, though (a) that overhead should be relatively low and (b) the larger point is that staying in the realm of PowerShell is well worthwhile, given its vastly superior capabilities compared to cmd.exe.
- The tests use function Time-Command, which can be downloaded from this Gist and which averages 10 runs by default.
# Warm up the filesystem cache for the target dir.,
# both from PowerShell and cmd.exe, to be safe.
gci 'c:\windows\winsxs' -rec >$null; cmd /c dir /s 'c:\windows\winsxs' >$null

Time-Command `
  { @([System.IO.Directory]::EnumerateFiles('c:\windows\winsxs', '*', 'AllDirectories')).Count },
  { (Get-ChildItem -Force -Recurse -File 'c:\windows\winsxs').Count },
  { (cmd /c dir /s /b /a-d 'c:\windows\winsxs').Count },
  { cmd /c 'dir /s /b /a-d c:\windows\winsxs | find /c /v """"' }
On my single-core VMware Fusion VM with Windows PowerShell v5.1.17134.407 on Microsoft Windows 10 Pro (64-bit; Version 1803, OS Build: 17134.523) I get the following timings, from fastest to slowest (the Factor column on the far right shows relative performance):
Command                                                                                    Secs (10-run avg.) TimeSpan         Factor
-------                                                                                    ------------------ --------         ------
@([System.IO.Directory]::EnumerateFiles('c:\windows\winsxs', '*', 'AllDirectories')).Count 11.016             00:00:11.0158660 1.00
(cmd /c dir /s /b /a-d 'c:\windows\winsxs').Count                                          15.128             00:00:15.1277635 1.37
cmd /c 'dir /s /b /a-d c:\windows\winsxs | find /c /v """"'                                16.334             00:00:16.3343607 1.48
(Get-ChildItem -Force -Recurse -File 'c:\windows\winsxs').Count                            24.525             00:00:24.5254979 2.23
Interestingly, both [System.IO.Directory]::EnumerateFiles() and the Get-ChildItem solution are significantly faster in PowerShell Core, which runs on top of .NET Core (as of PowerShell Core 6.2.0-preview.4, .NET Core 2.1):
Command                                                                                    Secs (10-run avg.) TimeSpan         Factor
-------                                                                                    ------------------ --------         ------
@([System.IO.Directory]::EnumerateFiles('c:\windows\winsxs', '*', 'AllDirectories')).Count 5.094              00:00:05.0940364 1.00
(cmd /c dir /s /b /a-d 'c:\windows\winsxs').Count                                          12.961             00:00:12.9613440 2.54
cmd /c 'dir /s /b /a-d c:\windows\winsxs | find /c /v """"'                                14.999             00:00:14.9992965 2.94
(Get-ChildItem -Force -Recurse -File 'c:\windows\winsxs').Count                            16.736             00:00:16.7357536 3.29
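If you would rather not download Time-Command, a rough comparison of this kind can be approximated with the built-in Measure-Command cmdlet. The following is only a sketch of the idea, not the Gist's implementation; the function name Measure-Average and its output shape are made up for this example.

```powershell
# Sketch: average the wall-clock time of several script blocks over a number of runs.
# Measure-Average is a hypothetical stand-in, not the Time-Command function from the Gist.
function Measure-Average {
  param(
    [Parameter(Mandatory)] [scriptblock[]] $ScriptBlock,
    [int] $Count = 10
  )
  $results = foreach ($sb in $ScriptBlock) {
    $total = 0.0
    foreach ($i in 1..$Count) {
      # Measure-Command discards the script block's output and returns a TimeSpan.
      $total += (Measure-Command $sb).TotalSeconds
    }
    [pscustomobject] @{ Command = $sb.ToString().Trim(); Secs = $total / $Count }
  }
  # Express each average relative to the fastest command.
  $fastest = ($results | Measure-Object -Property Secs -Minimum).Minimum
  $results | Sort-Object Secs | Select-Object Command,
    @{ Name = 'Secs';   Expression = { [math]::Round($_.Secs, 3) } },
    @{ Name = 'Factor'; Expression = { [math]::Round($_.Secs / $fastest, 2) } }
}
```

It can be called with the same script blocks as above, e.g. Measure-Average -Count 3 { (Get-ChildItem -Force -Recurse -File 'c:\windows\winsxs').Count }, { (cmd /c dir /s /b /a-d 'c:\windows\winsxs').Count }; the results are sorted from fastest to slowest.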
[1] [System.IO.Directory]::EnumerateFiles() is inherently and undoubtedly faster than a Get-ChildItem solution. In my tests (see section "Performance comparison:" above), [System.IO.Directory]::EnumerateFiles() beat out cmd /c dir /s as well, slightly in Windows PowerShell, and clearly so in PowerShell Core, but others report different findings. That said, finding the overall fastest solution is not the only consideration, especially if more than just counting files is needed and if the enumeration needs to be robust. This answer discusses the tradeoffs of the various solutions.
[2] In fact, due to an inefficient implementation as of Windows PowerShell v5.1 / PowerShell Core 6.2.0-preview.4, use of -Path and -Include is actually slower than using Get-ChildItem unfiltered and instead using an additional pipeline segment with ... | Where-Object Name -like *STB*, as in the OP; see this GitHub issue.