DotAll and multiline RegEx

Question:

I'm having a little trouble using regex in PowerShell. It seems like there is an implementation error or something.

The text I want to work with is an HTML file, which looks like this (Example1):

The problem is that, because of HTML editors, I may also get something like this (Example2):

So, as you can see, we get line breaks and HTML-escaped non-breaking spaces (&nbsp;).

My PowerShell regex looks like this:

and this

Basically, [ marks the beginning of a variable and ] the end of it. Two problems arise from this:

  1. Since we have two variables, mobile and fax, I’m using (.?){7} to allow SOME (here at most 7) characters and to avoid matching the whole part between the first [ near Mobile and the last ] near Fax (which would happen if I used (.*?) instead of (.?){7}). I’m not sure whether there is an alternative that allows ANY number of characters (not just 7) between the opening [ and the variable keyword, “Fax” for example. This would be useful to avoid mismatches when something like &nbsp; gets added (where 7 characters would not be enough and, as I said, (.*?) will fail). I hope I was able to explain it (it’s kind of hard); if not, please feel free to ask!
  2. PowerShell’s -replace operator doesn’t offer a way to set regex options, so I have to use (?ms) to set DotAll and multiline modes. As you can see, I’m using it within my regex pattern. However, when a newline is added, as in Example2 between the words Mobile: and %mobile%, the regex fails and nothing gets replaced!

I’m grateful for any help, and also for regex recommendations from the pros to avoid further problems I’m not thinking about right now…

EDIT:
(Example3):

Answer:

The trick around DotAll mode is to use [\s\S] instead of . (the dot). This character class matches any character, because it matches both whitespace and non-whitespace characters. (So do [\w\W] and [\d\D], but the whitespace version seems to be something of a convention.)
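As a minimal sketch of the difference (the sample string and the replacement text `X` are made up for illustration and are not taken from the question's HTML), the following compares a plain dot, `[\s\S]`, and the inline `(?s)` option when a line break sits between the label and the placeholder:

```powershell
# Hypothetical sample: a line break sits between the label and the placeholder.
$text = "Mobile:`n%mobile%"

# Without DotAll, '.' does not match the newline, so nothing is replaced.
$plain = $text -replace 'Mobile:.%mobile%', 'X'

# [\s\S] matches any character, including the newline.
$class = $text -replace 'Mobile:[\s\S]%mobile%', 'X'

# The inline (?s) option makes '.' match the newline as well.
$dotAll = $text -replace '(?s)Mobile:.%mobile%', 'X'

"plain  unchanged: $($plain -eq $text)"
"class  replaced : $($class -eq 'X')"
"dotAll replaced : $($dotAll -eq 'X')"
```

Note that this sample uses a bare line feed; a file with Windows line endings has two characters (CR and LF) at each break, so a single `.` or `[\s\S]` would not be enough there.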

To get around the 7, you can simply disallow a closing ] before the one you actually want to match. (By the way, that also makes DotAll unnecessary, because a negated character class matches line breaks too.) So something like this should work fine for you:

It looks a bit ugly, but it simply means this:

Further reading on character classes.
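The negated-class idea can be sketched roughly as follows. The input string, the `%mobile%` and `%fax%` placement, and the choice to replace each match with an empty string are all assumptions made for illustration; they are not the question's actual HTML or pattern.

```powershell
# Hypothetical input: two bracketed variables, the first one split across
# lines and padded with an escaped non-breaking space.
$text = "[Mobile:`n&nbsp;%mobile%] [Fax: %fax%]"

# [^\]]* matches any run of characters except ']', newlines included,
# so a match can never run past the closing bracket of its own variable.
$faxPattern    = '\[[^\]]*%fax%[^\]]*\]'
$mobilePattern = '\[[^\]]*%mobile%[^\]]*\]'

$withoutFax    = $text -replace $faxPattern, ''
$withoutMobile = $text -replace $mobilePattern, ''

# For contrast: a lazy dot in DotAll mode starts at the first '[' and
# swallows everything up to the ']' after %fax%.
$tooMuch = $text -replace '(?s)\[.*?%fax%.*?\]', ''

$withoutFax      # only the Mobile variable (plus the separating space) remains
$withoutMobile   # only the Fax variable (plus the separating space) remains
$tooMuch.Length  # 0, the whole string was consumed
```

No length limit such as {7} is needed here, since the class itself stops the match at the first ].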

Note that none of these patterns needs multiline mode m (neither yours nor mine), because all it does is make ^ and $ match at the beginning and end of each line, respectively. None of the patterns contains these metacharacters, so the modifier does nothing.
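A small sketch of that point, using a made-up two-line string: `(?m)` only changes the result when the pattern contains `^` or `$`.

```powershell
# Hypothetical two-line input.
$text = "first`nsecond"

# Without (?m), ^ matches only at the very start of the input.
$single = [regex]::Matches($text, '^\w+').Count

# With (?m), ^ also matches right after each line feed.
$multi = [regex]::Matches($text, '(?m)^\w+').Count

# A pattern without ^ or $ gives the same result with or without (?m).
$same = ($text -replace '(?m)s', 'S') -ceq ($text -replace 's', 'S')

"matches without (?m): $single"
"matches with (?m)   : $multi"
"anchor-free pattern unaffected: $same"
```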

My console output:

Source:

DotAll and multiline RegEx, licensed under CC BY-SA
