RegEx help pls

From: koswix 31 Oct 2015 10:14
To: CHYRON (DSMITHHFX) 9 of 14
I know, right?
From: Kenny J (WINGNUTKJ) 31 Oct 2015 10:59
To: Peter (BOUGHTONP) 10 of 14
Bravo!

After avoiding the things for years, the software I work with has suddenly sprouted a regex-based text file parsing engine, and I'm having to try to remember everything I ever knew about them.
From: koswix 31 Oct 2015 13:20
To: Kenny J (WINGNUTKJ) 11 of 14
Whenever I see regex I just pretend I'm hacking a terminal in Fallout 3.
From: Peter (BOUGHTONP) 31 Oct 2015 18:12
To: Kenny J (WINGNUTKJ) 12 of 14
Sometimes I wonder why people avoid regex, given that it's easier than most other languages, particularly any general purpose one.

Then I remember there's a reason software has so many bugs in. :/

From: Kenny J (WINGNUTKJ) 31 Oct 2015 19:54
To: Peter (BOUGHTONP) 13 of 14
There's a reason that software has so many bugs in, but I don't think it's anything to do with regular expressions. Top 3 in my office: Lack of testing, lack of review oversight, lack of understanding of requirements.

I think you answered your question about why people avoid regex up there, by posting a bit of Regex. It's an arcane and very compact syntax with very little in common to natural language, and is therefore a pain in the arse to maintain if you're not dealing with it day in, day out.

I took to adding exhaustive comments to any regex I wrote, because I knew that in six months' time, some idiot would be picking through the code trying to figure out what it was doing. I wanted to give them as much information as possible about what the expression was for, and how it worked, because there would be a fairly good chance that idiot would be me.
From: Peter (BOUGHTONP) 31 Oct 2015 21:43
To: Kenny J (WINGNUTKJ) 14 of 14
What? No, I wasn't saying it had anything to do with regex itself.

If developers don't understand the relatively simple concepts involved in regex, it's no surprise they don't understand and correctly apply the more advanced concepts found in complex programming.

Lack of understanding of requirements is definitely another problem, as is lack of testing (both automated and human).

Or put another way: most developers don't know what the problem is, don't really know how to solve it, and don't want to verify their thrown-together mess actually does what was asked for, beyond very superficial and cursory one-off checks.

Not certain what lack of review oversight means?


> It's an arcane and very compact syntax with very little in common to natural language

That's the thing - it's not really any worse than many other computer languages, when you take the whitespace away.

These are just as much gibberish to a newbie or non-programmer as a regex is:

   var url=$('a[href^=#]:eq(0)').val();
   <?=((int)$_POST['c'])?:__('none')?>

You might not write code like that, but equally regex can be written with whitespace and comments too (and it's a shame that's not the default).

   # match entire string
   ^.*$

   # don't match video files
   (?<!
      \.(?:mov|mpg|mp4)
   )

A contrived example, and that first comment is akin to an "increment by one" line, but it hopefully demonstrates that the problem is less the language itself and more how it usually gets written.

The second comment is more the sort that should be written - explaining the intent of the code, so if there's a bug then the idiot in six months' time has a chance of knowing whether something unusual is deliberate or the cause. (Of course, any deliberately unusual behaviour should also have suitable positive and negative test cases to prevent regression.)
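
For anyone who wants to try the commented form, here is a minimal sketch of the same pattern plus a few positive and negative test cases. It assumes Python, whose `re.VERBOSE` flag is one way of getting whitespace and comments inside a pattern (other engines spell it differently, e.g. an `x` modifier). The `is_not_video` name and the sample filenames are made up for illustration.

```python
import re

# Assumption: Python's re.VERBOSE flag stands in for whatever free-spacing
# mode your regex engine provides. In this mode, whitespace in the pattern
# is ignored and "#" starts a comment.
NOT_VIDEO = re.compile(
    r"""
    # match entire string
    ^.*$

    # don't match video files
    (?<!
        \.(?:mov|mpg|mp4)
    )
    """,
    re.VERBOSE,
)


def is_not_video(filename: str) -> bool:
    """Hypothetical helper: True when the name does not end in a video extension."""
    return NOT_VIDEO.search(filename) is not None


# Positive cases: these should match.
assert is_not_video("notes.txt")
assert is_not_video("movie.mp4.txt")  # extension is only checked at the very end

# Negative cases: these should not match.
assert not is_not_video("clip.mov")
assert not is_not_video("holiday.mpg")
assert not is_not_video("talk.mp4")
```

Note the pattern is case-sensitive as written, so a name like `clip.MOV` would still match. Whether that is deliberate is exactly the sort of thing the intent comment and a test case ought to pin down.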