Regular expression - snippets

Regular expression

Regular expressions are a basic concept in theoretical computer science. A look at the Wikipedia page for “Regular expression” shows how central the idea is to the field.

For a beginner web engineer, though, a simple explanation is enough: in a nutshell, a regular expression is just a “pattern”.

Regular expression is often abbreviated to regex.

Regex rules

(To be updated…) Here is some frequently used regex syntax.

  • \d: A digit.
  • \w: A word character (a letter, a digit, or an underscore).
  • {}: Repeat. {2} means “repeat twice”, and {3,5} means “repeat 3 to 5 times.”
  • [6-9]: A digit from 6 to 9.
  • (): Make a group.
  • (a|b): The character a or b.
  • .oo: Any one character (.) followed by oo, like “foo”.
  • ^The: Starts with (^) the characters “The”.
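
The rules above can be tried out quickly in Python. This is a minimal sketch using the standard re module; the sample strings are invented for illustration, and other regex engines may differ in details.

```python
import re

# Sample strings are made up for illustration.
digits = re.findall(r"\d", "room 42")                  # every single digit
words = re.findall(r"\w+", "foo_bar baz9")             # runs of word characters
twice = re.fullmatch(r"a{2}", "aa") is not None        # exactly two "a"
ranged = re.fullmatch(r"a{3,5}", "aaaa") is not None   # three to five "a"
six_to_nine = re.findall(r"[6-9]", "5678")             # digits from 6 to 9
either = re.findall(r"(a|b)", "abc")                   # "a" or "b"
any_oo = re.findall(r".oo", "foo boo")                 # any character, then "oo"
starts = re.match(r"^The", "The end") is not None      # string starts with "The"

print(digits, words, six_to_nine, either, any_oo)
print(twice, ranged, starts)
```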

A backslash is used for escaping.

  • \.: The character dot “.”.
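
To see what escaping changes, here is a small Python sketch. The version-like strings are made up; re.escape is the standard library helper that adds the backslashes for you.

```python
import re

# Unescaped, "." matches any character, so "3x14" also matches.
loose = re.fullmatch(r"3.14", "3x14") is not None

# Escaped, "\." matches only a literal dot.
strict = re.fullmatch(r"3\.14", "3x14") is not None
literal = re.fullmatch(r"3\.14", "3.14") is not None

# re.escape builds the escaped pattern from a plain string.
escaped = re.escape("a.b")

print(loose, strict, literal, escaped)
```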

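Regular expressions are not good at expressing “does not contain”. For single characters we can do it with a negated character class, [^...], which matches any one character that is not listed. Anything beyond that depends on the implementation in most cases, I think.

For example, in NLP we sometimes want to remove punctuation. If you use Python, the following page is very helpful.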
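
A short Python sketch of the [^] form, with invented sample strings: the class [^0-9] matches any single character that is not a digit.

```python
import re

# Runs of characters that are NOT digits.
non_digits = re.findall(r"[^0-9]+", "ab12cd")

# Delete everything that is not a digit, keeping only the digits.
only_digits = re.sub(r"[^0-9]", "", "tel: 03-1234")

print(non_digits, only_digits)
```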

https://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string
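
As a rough sketch of two common approaches in Python (the sample sentence is invented): one builds a deletion table from string.punctuation, the other uses a negated character class. They are not identical in general; for instance, the regex version keeps underscores because \w includes them.

```python
import re
import string

text = "Hello, world! It's 9 a.m."

# Option 1: str.translate with a table that deletes ASCII punctuation.
table = str.maketrans("", "", string.punctuation)
no_punct_translate = text.translate(table)

# Option 2: remove every character that is neither a word character
# nor whitespace.
no_punct_regex = re.sub(r"[^\w\s]", "", text)

print(no_punct_translate)
print(no_punct_regex)
```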

Example: IPv4

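grep -E "([0-9]{1,3}\.){3}[0-9]{1,3}" --only-matching subnet.txt
  • grep -E uses POSIX extended syntax, which does not define \d, so a digit is written as [0-9] here.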
  • \b allows you to perform a “whole words only” search using a regular expression in the form of \bword\b

https://www.regular-expressions.info/wordboundaries.html
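
The same IPv4-like pattern and the \b word boundary can be sketched in Python, where \d is available. The sample text is invented. Note that the pattern only checks the shape (four groups of 1 to 3 digits), not that each number is at most 255.

```python
import re

text = "gateway 192.168.0.1 mask 255.255.255.0 bogus 999.1.1.1"

# \b keeps the match from starting or ending in the middle of a longer token.
ipv4_like = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
found = ipv4_like.findall(text)

# "Whole words only": "concat" and "cats" are skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")

print(found)
print(whole_words)
```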

Nginx: [^/]+

Matches up to the next slash /: one or more characters that are not /.

Further reading
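
The behaviour of [^/]+ can be checked in Python as well. This is only a sketch: the path is hypothetical, and Nginx uses its own regex engine (PCRE), although this particular construct behaves the same way there.

```python
import re

# Hypothetical request path, invented for illustration.
path = "/users/42/posts"

# Every run of characters between slashes.
segments = re.findall(r"[^/]+", path)

# Capture only the segment that follows "/users/".
match = re.match(r"^/users/([^/]+)", path)
user_id = match.group(1) if match else None

print(segments, user_id)
```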

https://www.oreilly.com/library/view/regular-expressions-cookbook/9781449327453/