How to merge two CSV files in PowerShell with the same headers and discard duplicate rows

Question:

I am collecting performance counters from the NetApp Performance Manager software (OPM). OPM keeps only 30 days' worth of data in its MySQL database, so I have to run two queries to retrieve the data:

  1. Run the first query on the 30th of every month and save the result to a CSV file.
  2. Run the second query on the 1st of every month and save the result to a CSV file.

Then I merge the two CSV files to get the data for months that have 31 days.

Both files look like this:

When I merge the two CSV files with the code below, I get duplicate rows with data from the same date/time.

How can I merge the two CSV files without getting duplicate data?
I have tried Select-Object -Unique; however, it returns just one row.
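To make the symptom concrete, here is a minimal, self-contained sketch of a naive merge. The file names and the two-column sample data (Time, Latency) are made up for illustration and are not the actual OPM export; the two files overlap on one timestamp, as consecutive exports would.

```powershell
# Hypothetical stand-ins for the two OPM exports; names and data are made up.
$dir = [System.IO.Path]::GetTempPath()
$f1  = Join-Path $dir 'opm-query1.csv'
$f2  = Join-Path $dir 'opm-query2.csv'
$out = Join-Path $dir 'opm-merged.csv'

@'
"Time","Latency"
"2021-01-01 00:00","1.5"
"2021-01-02 00:00","2.0"
'@ | Set-Content -Path $f1

@'
"Time","Latency"
"2021-01-02 00:00","2.0"
"2021-01-03 00:00","1.8"
'@ | Set-Content -Path $f2

# Naive merge: Import-Csv reads both files (each with its own header row)
# and Export-Csv writes every row, so the 2021-01-02 row appears twice.
Import-Csv -Path $f1, $f2 | Export-Csv -Path $out -NoTypeInformation
```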

Answer:

As for why Select-Object -Unique didn’t work:

  • Select-Object -Unique, when given instances of reference types (other than strings), compares their .ToString() values in order to determine uniqueness.
  • [pscustomobject] instances, such as the ones Import-Csv creates, regrettably return the empty string from their .ToString() method.
    • This long-standing bug, still present as of PowerShell (Core) 7.2, was originally reported in GitHub issue #6163.

Thus, all input objects compare the same, and only the first input object is ever returned.

S9uare’s helpful Select-Object -Property * -Unique approach overcomes this problem by forcing all properties to be compared individually, but comes with a performance caveat:
The input objects are effectively recreated, and comparing all property values is overkill in this case, because comparing Time values would suffice; with large input files, processing can take a long time.
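A sketch of the Select-Object -Property * -Unique approach, again with made-up file names and sample data: because -Property * makes Select-Object compare the property values of each row, the overlapping 2021-01-02 row is written only once.

```powershell
# Hypothetical input files and data, for illustration only.
$dir = [System.IO.Path]::GetTempPath()
$f1  = Join-Path $dir 'opm-query1.csv'
$f2  = Join-Path $dir 'opm-query2.csv'
$out = Join-Path $dir 'opm-merged.csv'

@'
"Time","Latency"
"2021-01-01 00:00","1.5"
"2021-01-02 00:00","2.0"
'@ | Set-Content -Path $f1

@'
"Time","Latency"
"2021-01-02 00:00","2.0"
"2021-01-03 00:00","1.8"
'@ | Set-Content -Path $f2

# -Property * recreates each object with all its properties, and -Unique
# then compares those property values, so identical rows collapse to one.
Import-Csv -Path $f1, $f2 |
  Select-Object -Property * -Unique |
  Export-Csv -Path $out -NoTypeInformation
```

Every row is rebuilt and every column compared, which is the cost described above.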


Since the data at hand comes from CSV files, the performance problem can be mitigated with plain-text processing, using Get-Content rather than Import-Csv:
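A minimal sketch of this string-based approach, with hypothetical file names and sample data. It relies on the header line being identical in both files, so that it is kept only once, and on duplicate rows being character-for-character identical lines.

```powershell
# Hypothetical input files and data, for illustration only.
$dir = [System.IO.Path]::GetTempPath()
$f1  = Join-Path $dir 'opm-query1.csv'
$f2  = Join-Path $dir 'opm-query2.csv'
$out = Join-Path $dir 'opm-merged.csv'

@'
"Time","Latency"
"2021-01-01 00:00","1.5"
"2021-01-02 00:00","2.0"
'@ | Set-Content -Path $f1

@'
"Time","Latency"
"2021-01-02 00:00","2.0"
"2021-01-03 00:00","1.8"
'@ | Set-Content -Path $f2

# Treat each line as a plain string: Select-Object -Unique compares strings
# by value, so the repeated header and the overlapping row are emitted once.
Get-Content -Path $f1, $f2 |
  Select-Object -Unique |
  Set-Content -Path $out -Encoding ASCII
```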

Note that I’m using -Encoding ASCII to mimic Export-Csv’s default behavior in Windows PowerShell (PowerShell (Core) 7+ defaults to UTF-8 without a BOM); change as needed.

With input objects that are strings, Select-Object -Unique works as expected – and is faster.
Note, however, that with large input files you may run out of memory, given that Select-Object needs to build up an in-memory data structure containing all rows in order to determine uniqueness.
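If memory is a concern, one option (not part of the original answer) is to remember only each row's Time value in a HashSet[string] instead of whole lines, building on the point that comparing Time values suffices. This sketch assumes, hypothetically, that Time is the first column and that its values contain no commas; a quoted field containing a comma would break the naive Split(','). File names and data are made up.

```powershell
# Hypothetical input files and data, for illustration only.
$dir = [System.IO.Path]::GetTempPath()
$f1  = Join-Path $dir 'opm-query1.csv'
$f2  = Join-Path $dir 'opm-query2.csv'
$out = Join-Path $dir 'opm-merged.csv'

@'
"Time","Latency"
"2021-01-01 00:00","1.5"
"2021-01-02 00:00","2.0"
'@ | Set-Content -Path $f1

@'
"Time","Latency"
"2021-01-02 00:00","2.0"
"2021-01-03 00:00","1.8"
'@ | Set-Content -Path $f2

# Assumption: Time is the first column and contains no commas.
$seenTimes = [System.Collections.Generic.HashSet[string]]::new()
# HashSet.Add() returns $true only the first time a key is seen, so each Time
# value passes the filter once (the first occurrence wins). The header's first
# field ("Time") is the same in both files, so the header is written once too.
Get-Content -Path $f1, $f2 |
  Where-Object { $seenTimes.Add($_.Split(',')[0]) } |
  Set-Content -Path $out -Encoding ASCII
```

Memory use still grows with the number of distinct timestamps, but only the keys are kept rather than entire rows.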

Source:

How to merge two csv files in powershell with same headers and discard duplicate rows by licensed under CC BY-SA
