Efficiently counting files with a specific name in a directory and its subfolders


With PowerShell, I can count all the files in a folder and its subfolders; the folders themselves are not counted.

However, PowerShell is too slow for the number of files involved (up to 700k). I read that cmd is faster at this kind of task.

Unfortunately, I have no knowledge of cmd syntax at all. My PowerShell code counts all the files that have STB in their name.

That is what I would like to do in cmd as well.

Any help is appreciated.


Theo’s helpful answer, based on direct use of .NET ([System.IO.Directory]::EnumerateFiles()), is the fastest option (in my tests; YMMV, see the performance comparison below[1]).

Its limitations in the .NET Framework (FullCLR) – on which Windows PowerShell is built – are:

  • An exception is thrown when an inaccessible directory is encountered (due to lack of permissions). You can catch the exception, but you cannot continue the enumeration; that is, you cannot robustly enumerate all items that you can access while ignoring those that you cannot.
  • Hidden items are invariably included.
  • With recursive enumeration, symlinks / junctions to directories are invariably followed.

By contrast, the cross-platform .NET Core (v2.1 and later), on which PowerShell Core is built, offers ways around these limitations via the EnumerationOptions class; see this answer for an example.
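To make the difference between aborting and continuing concrete, here is a minimal Python sketch. It is an analogy only, not the .NET API: count_matching_files is a hypothetical helper built on os.walk, whose onerror callback decides whether an unreadable directory stops the whole enumeration (the .NET Framework behavior described above) or is simply skipped, and whose followlinks flag controls whether directory symlinks are followed. Note that fnmatch is case-sensitive on POSIX systems, unlike PowerShell's wildcards.

```python
import fnmatch
import os


def count_matching_files(root, pattern="*STB*", *, skip_errors=True, follow_symlinks=False):
    """Count files below root whose names match a wildcard pattern.

    skip_errors=False mimics the .NET Framework behavior: the first directory
    that cannot be listed aborts the whole enumeration. skip_errors=True keeps
    going and counts everything that can be accessed.
    """

    def on_error(err):
        if not skip_errors:
            raise err  # os.walk stops as soon as its onerror handler raises
        # Otherwise the unreadable directory is silently skipped.

    count = 0
    for _dirpath, _dirnames, filenames in os.walk(
        root, onerror=on_error, followlinks=follow_symlinks
    ):
        # os.walk reports subdirectories separately, so only files are counted.
        count += sum(1 for name in filenames if fnmatch.fnmatch(name, pattern))
    return count
```

Hidden files are included here as well, since nothing in the sketch checks for them.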

Note that you can also perform enumeration via the related [System.IO.DirectoryInfo] type, which, similar to Get-ChildItem, returns rich objects rather than mere path strings, allowing for much more versatile processing; e.g., you can get an array of all file sizes by accessing the .Length property of each file object.
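The "rich objects versus path strings" distinction has a close Python analogue: os.scandir yields os.DirEntry objects that already carry metadata. The sketch below (matching_file_sizes is a hypothetical name) collects the sizes of matching files; it has no error handling, so an unreadable directory raises, much like the .NET Framework enumeration discussed above.

```python
import fnmatch
import os


def matching_file_sizes(root, pattern="*STB*"):
    """Return the sizes in bytes of all files below root whose names match pattern."""
    sizes = []
    pending = [root]  # directories still to be listed
    while pending:
        with os.scandir(pending.pop()) as entries:
            for entry in entries:
                # DirEntry objects carry metadata, not just a path string.
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)  # recurse, but don't follow dir symlinks
                elif entry.is_file() and fnmatch.fnmatch(entry.name, pattern):
                    sizes.append(entry.stat().st_size)
    return sizes
```

On Windows, DirEntry.stat() for a regular file typically needs no extra system call, because the size comes from the directory listing itself.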

A native PowerShell solution that addresses these limitations and is still reasonably fast is to use Get-ChildItem with the -Filter parameter.

  • Hidden items are excluded by default; add -Force to include them.
  • To ignore permission problems, add -ErrorAction SilentlyContinue or -ErrorAction Ignore; the advantage of SilentlyContinue is that you can later inspect the $Error collection to determine the specific errors that occurred, so as to ensure that the errors truly only stem from permission problems.
  • In Windows PowerShell, Get-ChildItem -Recurse invariably follows symlinks / junctions to directories, unfortunately; more sensibly, PowerShell Core by default does not, and offers opt-in via -FollowSymlink.
  • Like the [System.IO.DirectoryInfo]-based solution, Get-ChildItem outputs rich objects ([System.IO.FileInfo] / [System.IO.DirectoryInfo]) describing each enumerated file-system item, allowing for versatile processing.
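The -ErrorAction SilentlyContinue plus $Error pattern (skip failures, but keep them around for later inspection) can be sketched in Python as follows. The function names are hypothetical, and the errors list merely plays the role of $Error; the point is the final check that every recorded failure really is a permission problem.

```python
import os


def list_files_collecting_errors(root):
    """Return (file_paths, errors) for everything below root.

    Directories that cannot be listed are skipped, but the exception for each
    one is recorded, loosely like -ErrorAction SilentlyContinue plus $Error.
    """
    errors = []  # plays the role of PowerShell's $Error collection
    files = []
    for dirpath, _dirnames, filenames in os.walk(root, onerror=errors.append):
        files.extend(os.path.join(dirpath, name) for name in filenames)
    return files, errors


def only_permission_errors(errors):
    """True if every recorded error is a permission problem (also True if none)."""
    return all(isinstance(err, PermissionError) for err in errors)
```

If only_permission_errors returns False, something other than missing permissions went wrong, and the file count should not be trusted as-is.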

Note that while you can also pass wildcard arguments to -Path (the implied first positional parameter) and -Include (as in TobyU’s answer), it is only -Filter that provides significant speed improvements, due to filtering at the source (the filesystem driver), so that PowerShell only receives the already-filtered results; by contrast, -Path / -Include must first enumerate everything and match against the wildcard pattern afterwards.[2]

Caveats re -Filter use:

  • Its wildcard language is not the same as PowerShell’s; notably, it doesn’t support character sets/ranges (e.g. *[0-9]) and it has legacy quirks – see this answer.
  • It only supports a single wildcard pattern, whereas -Include supports multiple (as an array).

That said, -Filter processes wildcards the same way as cmd.exe’s dir.
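The two capabilities that -Include has over -Filter (several patterns at once, and character sets/ranges such as *[0-9]) are easy to picture with Python's fnmatch. This is a different wildcard engine from both PowerShell's and -Filter's, used here purely for illustration; matches_any is a hypothetical helper that lowercases both sides to resemble PowerShell's case-insensitive matching.

```python
import fnmatch


def matches_any(name, patterns):
    """Return True if name matches at least one of several wildcard patterns.

    Matching is made case-insensitive to resemble PowerShell's wildcards.
    """
    lowered = name.lower()
    return any(fnmatch.fnmatchcase(lowered, pattern.lower()) for pattern in patterns)


names = ["log1", "STB_a.txt", "readme.md"]
print([n for n in names if matches_any(n, ["*STB*", "*[0-9]"])])  # prints ['log1', 'STB_a.txt']
```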

Finally, for the sake of completeness, you can adapt MC ND’s helpful answer, based on cmd.exe’s dir command, for use in PowerShell, which simplifies matters.

PowerShell captures an external program’s stdout output as an array of lines, whose element count you can simply query with the .Count (or .Length) property.
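The same "capture stdout as lines, then count them" idea looks like this in Python. To keep the sketch runnable everywhere, a child Python process that prints three file names stands in for the actual cmd /c dir /s /b /a-d invocation, whose target path is not shown here; count_output_lines is a hypothetical name.

```python
import subprocess
import sys


def count_output_lines(command):
    """Run an external program and count the non-empty lines on its stdout.

    stderr (where dir writes messages such as "File Not Found") is captured
    separately and not counted; the exit code is deliberately not checked.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return sum(1 for line in result.stdout.splitlines() if line.strip())


# Portable stand-in for something like ["cmd", "/c", "dir", "/s", "/b", "/a-d", ...]:
# a child Python process that prints three file names, one per line.
demo = [sys.executable, "-c", "for n in ('STB1.txt', 'STB2.txt', 'STB3.txt'): print(n)"]
print(count_output_lines(demo))  # prints 3
```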

That said, this may or may not be faster than PowerShell’s own Get-ChildItem -Filter, depending on the filtering scenario; also note that dir /s can only ever return path strings, whereas Get-ChildItem returns rich objects whose properties you can query.

Caveats re dir use:

  • /a-d excludes directories, i.e., only reports files, but then also includes hidden files, which dir doesn’t do by default.
  • dir /s invariably descends into hidden directories too during the recursive enumeration; an /a (attribute-based) filter is only applied to the leaf items of the enumeration (only to files in this case).
  • dir /s invariably follows symlinks / junctions to other directories (assuming it has the requisite permissions – see next point).
  • dir /s quietly ignores directories, or symlinks / junctions to directories, whose contents it cannot enumerate due to lack of permissions. While this is helpful in the specific case of the aforementioned hidden system junctions (you can find them all with cmd /c dir C:\ /s /ashl), it can cause you to miss the content of directories that you do want to enumerate but can’t for true lack of permissions, because dir /s gives no indication that such content even exists. (If you directly target an inaccessible directory, you get a somewhat misleading File Not Found error message, and the exit code is set to 1.)
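The difference between skipping hidden directories entirely and applying a hidden-attribute filter only to the leaf items (as dir /s does) can be sketched in Python. Big assumption for portability: a leading dot stands in for the Windows hidden attribute; count_visible_files and is_hidden are hypothetical names.

```python
import os


def is_hidden(name):
    # ASSUMPTION: a leading dot stands in for the Windows "hidden" attribute so
    # the sketch runs anywhere; on Windows you would test
    # os.stat(path).st_file_attributes & stat.FILE_ATTRIBUTE_HIDDEN instead.
    return name.startswith(".")


def count_visible_files(root, prune_hidden_dirs):
    """Count non-hidden files below root.

    prune_hidden_dirs=True never enters hidden directories (roughly what
    Get-ChildItem -Recurse does without -Force); False still descends into them
    and applies the hidden check only to files (the dir /s behavior described above).
    """
    count = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        if prune_hidden_dirs:
            # Editing dirnames in place tells os.walk which subdirectories to visit.
            dirnames[:] = [d for d in dirnames if not is_hidden(d)]
        count += sum(1 for f in filenames if not is_hidden(f))
    return count
```

With a visible file inside a hidden directory, the two modes return different counts, which is exactly the kind of discrepancy to watch for when comparing dir /s results with Get-ChildItem results.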

Performance comparison:

  • The following tests compare pure enumeration performance without filtering, for simplicity, using a sizable directory tree assumed to be present on all Windows systems, c:\windows\winsxs; that said, it’s easy to adapt the tests to also compare filtering performance.
  • The tests are run from PowerShell, which means that some overhead is introduced by creating a child process for cmd.exe in order to invoke dir /s, though (a) that overhead should be relatively low and (b) the larger point is that staying in the realm of PowerShell is well worthwhile, given its vastly superior capabilities compared to cmd.exe.
  • The tests use the Time-Command function, which can be downloaded from this Gist and averages 10 runs by default.
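Time-Command itself is a PowerShell function from that Gist and is not reproduced here. The following Python sketch only illustrates the general idea behind such a harness, assuming that the Factor column means each command's average time relative to the fastest one; time_commands is a hypothetical name.

```python
import time


def time_commands(commands, runs=10):
    """Time each callable, averaged over `runs` runs, fastest first.

    Returns (name, average_seconds, factor) tuples, where factor is the
    average relative to the fastest entry (1.0 for the fastest).
    """
    averages = {}
    for name, func in commands.items():
        start = time.perf_counter()
        for _ in range(runs):
            func()
        averages[name] = (time.perf_counter() - start) / runs
    fastest = min(averages.values())
    rows = [
        (name, avg, avg / fastest if fastest > 0 else 1.0)
        for name, avg in averages.items()
    ]
    return sorted(rows, key=lambda row: row[1])


for name, avg, factor in time_commands({"sum": lambda: sum(range(10_000)), "noop": lambda: None}):
    print(f"{name:<6} {avg * 1000:8.3f} ms  Factor {factor:.2f}")
```

To compare enumeration strategies, you would pass callables that each wrap one approach; averaging over several runs smooths out file-system caching effects between the first and later runs.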

On my single-core VMware Fusion VM with Windows PowerShell v5.1.17134.407 on Microsoft Windows 10 Pro (64-bit; Version 1803, OS Build: 17134.523) I get the following timings, from fastest to slowest (the Factor column shows relative performance):

Interestingly, both [System.IO.Directory]::EnumerateFiles() and the Get-ChildItem solution are significantly faster in PowerShell Core, which runs on top of .NET Core (as of PowerShell Core 6.2.0-preview.4, .NET Core 2.1):

[1] [System.IO.Directory]::EnumerateFiles() is inherently and undoubtedly faster than a Get-ChildItem solution. In my tests (see section “Performance comparison:” above), [System.IO.Directory]::EnumerateFiles() beat out cmd /c dir /s as well, slightly in Windows PowerShell, and clearly so in PowerShell Core, but others report different findings. That said, finding the overall fastest solution is not the only consideration, especially if more than just counting files is needed and if the enumeration needs to be robust. This answer discusses the tradeoffs of the various solutions.

[2] In fact, due to an inefficient implementation as of Windows PowerShell v5.1 / PowerShell Core 6.2.0-preview.4, use of -Path and -Include is actually slower than using Get-ChildItem unfiltered and instead using an additional pipeline segment with ... | Where-Object Name -like *STB*, as in the OP – see this GitHub issue.


Licensed under CC BY-SA.
