Question:
I have a data file of about 10-15 columns from which I want to extract specific columns. Some of the columns I know the exact column header and others I only know that the first two letters will always be “FC”.
How do I select only the columns where I know the column header and those that start with “FC”?
Starting with just the “FC” columns, I have tried like this:
1 2 3 4 |
$myCSV = Import-CSV "mydata.txt" -Delimiter "`t" $FCcols = $myCSV[0].psobject.Properties | foreach { $_.Name } | Where {$_ -match "FC"} $myCSV | select $FCcols |
But I just get an error:
1 2 3 4 5 6 7 8 9 |
Select-Object : Cannot convert System.Management.Automation.PSObject to one of the following types {System.String, System.Management.Automation.ScriptBlock}. At line:3 char:16 + $myCSV | select <<<< $FCcols + CategoryInfo : InvalidArgument: (:) [Select-Object], NotSupport edException + FullyQualifiedErrorId : DictionaryKeyUnknownType,Microsoft.PowerShell.Co mmands.SelectObjectCommand |
Then, if I try:
1 2 3 4 5 |
$myCSV = Import-CSV "mydata.txt" -Delimiter "`t" $FCcols = [System.Collections.ArrayList]@() $myCSV[0].psobject.Properties | foreach { $_.Name } | Where {$_ -match "FC"} | %{$FCcols.Add($_)} $myCSV | select $FCcols |
I get the output I want except that it is in “column header : value” format, like this:
1 2 3 4 5 6 7 8 9 10 11 12 |
FC1839 : 0 FC1842 : 1 FC1843 : 6 FC1844 : 12 FC1845 : 4 FC1839 : 0 FC1842 : 0 FC1843 : 19 FC1844 : 22 FC1845 : 14 |
I am probably just missing something simple, but how do I get to the point that I am able to select these matching columns and then output them to another .txt file (without the header : value format)?
Answer:
First things first: Mathias R. Jessen’s helpful tip not only solves your problem, but significantly simplifies the approach (and also works in PSv2):
1 2 |
$myCSV | Select-Object FC* |
The (implied) -Property
parameter supports wildcard expressions, so FC*
matches all property (column names) that start with FC
.
As for the output format you’re seeing: Because you’re selecting 5 properties, PowerShell defaults to implicit Format-List
formatting, with each property name-value pair on its own line.
To fix this display problem, pipe to Format-Table
explicitly (which is what PowerShell would do implicitly if you had selected 4 or fewer properties):
1 2 |
$myCSV | Select-Object FC* | Format-Table |
To re-export the results to a CSV (TSV) file:
1 2 3 |
Import-Csv mydata.txt -Delimiter "`t" | Select-Object FC* | Export-Csv myresults.txt -Encoding Utf8 -Delimiter "`t" -NoTypeInformation |
To do so without a header line:
1 2 3 4 |
Import-Csv mydata.txt -Delimiter "`t" | Select-Object FC* | ConvertTo-Csv -Delimiter "`t" -NoTypeInformation | Select-Object -Skip 1 | Set-Content myresults.txt -Encoding Utf8 |
As for your specific symptom:
The problem occurs only in PSv2, and it smells like a bug to me.
The workaround is make your column-name array a strongly typed string array ([string[]]
):
1 2 |
[string[]] $FCcols = $myCSV[0].psobject.Properties | % { $_.Name } | ? { $_ -match '^FC' } |
Note that, for brevity, I’ve used built-in alias %
in lieu of ForEach-Object
and ?
in lieu of Where-Object
.
Also note that the regex passed to -match
was changed to ^FC
to ensure that only columns that start with FC
are matched.
Your code works as-is in PSv3+, but can be simplified:
1 2 |
$FCcols = $myCSV[0].psobject.Properties.Name -match "^FC" |
Note how .Name
is applied directly to .psobject.Properties
, which in v3+ causes the .Name
member to be invoked on each item of the collection, a feature called member-access enumeration.