Question:
I am doing some text stream processing on a series of PS1 & PSM1 files, and I ran into some issues with smart quotes and em-dashes (never, NEVER, cut and paste code from the MS Scripting Guy blog). I figured the issue was encoding, so I looked, and I have a mix of ASCII and UTF-8 files, but of course both have issues with my funky text. I have done some replacements and have that working, but I wonder if I shouldn’t also standardize on one encoding, and if so, which one?
Answer:
Not a direct answer to your question, but you may find it useful nonetheless. I wrote a tool to handle PS and SQL scripts, but quickly found that people were pasting code from their emails, which broke a ton of stuff. I had to implement the following to correct it all, and it should catch everything:
if ($code.IndexOf([Char]0x2013) -gt -1) { $code = $code.Replace(([Char]0x2013).ToString(), "-") }   # en dash
if ($code.IndexOf([Char]0x2014) -gt -1) { $code = $code.Replace(([Char]0x2014).ToString(), "--") }  # em dash
if ($code.IndexOf([Char]0x2015) -gt -1) { $code = $code.Replace(([Char]0x2015).ToString(), "-") }   # horizontal bar
if ($code.IndexOf([Char]0x2017) -gt -1) { $code = $code.Replace(([Char]0x2017).ToString(), "_") }   # double low line
if ($code.IndexOf([Char]0x2018) -gt -1) { $code = $code.Replace(([Char]0x2018).ToString(), "`'") }  # left single quotation mark
if ($code.IndexOf([Char]0x2019) -gt -1) { $code = $code.Replace(([Char]0x2019).ToString(), "`'") }  # right single quotation mark
if ($code.IndexOf([Char]0x201a) -gt -1) { $code = $code.Replace(([Char]0x201a).ToString(), ",") }   # single low-9 quotation mark
if ($code.IndexOf([Char]0x201b) -gt -1) { $code = $code.Replace(([Char]0x201b).ToString(), "`'") }  # single high-reversed-9 quotation mark
if ($code.IndexOf([Char]0x201c) -gt -1) { $code = $code.Replace(([Char]0x201c).ToString(), "`"") }  # left double quotation mark
if ($code.IndexOf([Char]0x201d) -gt -1) { $code = $code.Replace(([Char]0x201d).ToString(), "`"") }  # right double quotation mark
if ($code.IndexOf([Char]0x201e) -gt -1) { $code = $code.Replace(([Char]0x201e).ToString(), "`"") }  # double low-9 quotation mark
if ($code.IndexOf([Char]0x2026) -gt -1) { $code = $code.Replace(([Char]0x2026).ToString(), "...") } # horizontal ellipsis
if ($code.IndexOf([Char]0x2032) -gt -1) { $code = $code.Replace(([Char]0x2032).ToString(), "`'") }  # prime
if ($code.IndexOf([Char]0x2033) -gt -1) { $code = $code.Replace(([Char]0x2033).ToString(), "`"") }  # double prime
if ($code.IndexOf([Char]0x0009) -gt -1) { $code = $code.Replace(([Char]0x0009).ToString(), " ") }   # tab
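If you would rather not maintain a wall of near-identical lines, the same idea can be sketched as a lookup table plus a loop. Treat this as a rough sketch rather than what my tool actually does: the function name Convert-SmartPunctuation is made up for illustration, and while the table covers the same code points, the replacement strings (in particular a single hyphen for the en dash, two for the em dash, and an apostrophe for the prime) are my own assumptions, so adjust them to whatever your scripts expect.

```powershell
# Hypothetical helper (name and replacement choices are assumptions, not a built-in cmdlet).
function Convert-SmartPunctuation {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string] $Text
    )

    # Unicode code point -> plain ASCII replacement.
    $map = @{
        0x2013 = '-'     # en dash
        0x2014 = '--'    # em dash
        0x2015 = '-'     # horizontal bar
        0x2017 = '_'     # double low line
        0x2018 = "'"     # left single quotation mark
        0x2019 = "'"     # right single quotation mark
        0x201A = ','     # single low-9 quotation mark
        0x201B = "'"     # single high-reversed-9 quotation mark
        0x201C = '"'     # left double quotation mark
        0x201D = '"'     # right double quotation mark
        0x201E = '"'     # double low-9 quotation mark
        0x2026 = '...'   # horizontal ellipsis
        0x2032 = "'"     # prime
        0x2033 = '"'     # double prime
        0x0009 = ' '     # tab
    }

    # String.Replace leaves the text unchanged when the character is absent,
    # so no IndexOf check is needed before each replacement.
    foreach ($entry in $map.GetEnumerator()) {
        $Text = $Text.Replace([string][char]($entry.Key), [string]$entry.Value)
    }

    return $Text
}

# Build a sample from code points so this snippet itself stays pure ASCII.
$sample = [string][char]0x201C + 'hi' + [char]0x201D + [char]0x2026
Convert-SmartPunctuation -Text $sample
```

Keeping the table as data means adding another offender is a one-line change, and the script file itself never has to contain a non-ASCII character.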