regex – Regular expression to match a line that doesn't contain a word

The notion that regex doesn't support inverse matching is not entirely true. You can mimic this behavior by using negative look-arounds:

^((?!hede).)*$

The regex above will match any string, or line without a line break, not containing the (sub)string hede. As mentioned, this is not something regex is good at (or should do), but still, it is possible.
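As a quick check, here is a minimal Python sketch of the pattern above using the standard re module. The sample strings and the name NO_HEDE are made up for illustration.

```python
import re

# The pattern from above: no position in the string may be the start of "hede".
NO_HEDE = re.compile(r'^((?!hede).)*$')

# Hypothetical sample inputs, including the empty string.
samples = ['hoho', 'hihi', 'haha', 'hede', 'ABhedeCD', '']

# Keep only the strings that do not contain "hede".
matching = [s for s in samples if NO_HEDE.match(s)]
print(matching)  # ['hoho', 'hihi', 'haha', '']
```

Note that the empty string matches too, since the group is allowed to repeat zero times.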

And if you need to match line break chars as well, use the DOT-ALL modifier (the trailing s in the following pattern):

/^((?!hede).)*$/s

or use it inline:

/(?s)^((?!hede).)*$/

(where the /.../ are the regex delimiters, i.e., not part of the pattern)
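In a language without /.../ delimiters the modifier is usually passed as a flag. A small sketch in Python, where DOT-ALL is spelled re.DOTALL and the inline form (?s) also works (the sample text is made up):

```python
import re

text = 'first line\nsecond line'

# Without DOT-ALL, '.' stops at the line break, so the whole input cannot match.
plain = re.compile(r'^((?!hede).)*$')
# With DOT-ALL, given as a flag or inline as (?s), '.' also consumes '\n'.
dotall_flag = re.compile(r'^((?!hede).)*$', re.DOTALL)
dotall_inline = re.compile(r'(?s)^((?!hede).)*$')

plain_ok = bool(plain.match(text))
flag_ok = bool(dotall_flag.match(text))
inline_ok = bool(dotall_inline.match(text))
# "hede" on a later line is still rejected when '.' crosses line breaks.
later_hede_ok = bool(dotall_flag.match('first line\nhede here'))

print(plain_ok, flag_ok, inline_ok, later_hede_ok)  # False True True False
```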

If the DOT-ALL modifier is not available, you can mimic the same behavior with the character class [\s\S]:

/^((?!hede)[\s\S])*$/
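The character class [\s\S] means "a whitespace character or a non-whitespace character", that is, any character at all, line breaks included. A short Python sketch without any flags (the inputs are invented for the example):

```python
import re

# No DOT-ALL flag here: [\s\S] matches any character, including '\n'.
no_hede_any = re.compile(r'^((?!hede)[\s\S])*$')

clean_ok = bool(no_hede_any.match('line one\nline two'))
dirty_ok = bool(no_hede_any.match('line one\nhede'))

print(clean_ok, dirty_ok)  # True False
```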

Explanation

A string is just a list of n characters. Before and after each character, there's an empty string. So a list of n characters will have n+1 empty strings. Consider the string ABhedeCD:

    ┌──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┐
S = │e1│ A │e2│ B │e3│ h │e4│ e │e5│ d │e6│ e │e7│ C │e8│ D │e9│
    └──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┘

index    0      1      2      3      4      5      6      7

where the e's are the empty strings. The regex (?!hede). looks ahead to see if there's no substring hede to be seen, and if that is the case (so something else is seen), then the . (dot) will match any character except a line break. Look-arounds are also called zero-width assertions because they don't consume any characters. They only assert/validate something.

So, in my example, every empty string is first validated to see if there's no hede up ahead, before a character is consumed by the . (dot). The regex (?!hede). will do that only once, so it is wrapped in a group and repeated zero or more times: ((?!hede).)*. Finally, the start and end of the input are anchored to make sure the entire input is consumed: ^((?!hede).)*$

As you can see, the input ABhedeCD will fail because on e3, the regex (?!hede) fails (there is hede up ahead!).
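The walk over the empty strings can be reproduced with a rough Python sketch. The helper first_blocked_position is hypothetical, written only for this illustration; it tries the bare look-ahead at every position and reports the first one where it fails.

```python
import re

def first_blocked_position(text, word='hede'):
    """Return the index of the first position where (?!word) fails, or None."""
    lookahead = re.compile('(?!' + re.escape(word) + ')')
    for pos in range(len(text) + 1):
        # An empty match at pos means the word does not start at pos.
        if lookahead.match(text, pos) is None:
            return pos
    return None

blocked = first_blocked_position('ABhedeCD')
unblocked = first_blocked_position('ABCD')
full_match = re.match(r'^((?!hede).)*$', 'ABhedeCD')

print(blocked)     # 2, the empty string e3 just before "h"
print(unblocked)   # None
print(full_match)  # None, because the group stops at e3 and $ cannot match there
```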

Note that the solution to the problem "does not start with hede":

^(?!hede).*$

is generally much more efficient than the solution to the problem "does not contain hede":

^((?!hede).)*$

The former checks for “hede” only at the input string’s first position, rather than at every position.
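The two patterns also accept different inputs, which is easy to mix up. The following Python sketch compares them on a few invented strings and then times both on a long line with timeit. The timings depend on the machine and the regex engine, so treat them as a rough indication only.

```python
import re
import timeit

not_starting = re.compile(r'^(?!hede).*$')
not_containing = re.compile(r'^((?!hede).)*$')

# Made-up inputs: (not_starting result, not_containing result) per string.
samples = ['hede first', 'ends with hede', 'no such word']
results = {s: (bool(not_starting.match(s)), bool(not_containing.match(s)))
           for s in samples}
for s, (starts_ok, contains_ok) in results.items():
    print(f'{s!r}: not starting = {starts_ok}, not containing = {contains_ok}')

# One look-ahead in total versus one look-ahead per character.
long_line = 'x' * 2000
t_start = timeit.timeit(lambda: not_starting.match(long_line), number=200)
t_contain = timeit.timeit(lambda: not_containing.match(long_line), number=200)
print(f'not starting: {t_start:.4f}s, not containing: {t_contain:.4f}s')
```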

If you're just using it for grep, you can use grep -v hede to get all lines which do not contain hede.
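The same "test for the word, then invert" idea carries over to a script, where no look-ahead is needed at all. A minimal Python sketch, assuming the lines are already in memory (the function name lines_without and the sample data are invented for this example):

```python
def lines_without(lines, word='hede'):
    """Return the lines that do not contain word, similar in spirit to grep -v."""
    return [line for line in lines if word not in line]

sample_lines = ['hoho', 'hihi', 'haha', 'hede']
kept = lines_without(sample_lines)
print(kept)  # ['hoho', 'hihi', 'haha']
```

Unlike grep, this does a plain substring test, not a regex match.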

ETA: Oh, rereading the question, grep -v is probably what you meant by "tools options".
