Regexp

Latest revision as of 10:34, 3 December 2018

Origin

Regexp is a portmanteau of the words regular and expression. It is the Emacs abbreviation for "regular expression". Many other computer languages and software use the abbreviation regex (no trailing p) instead.

Wikipedia has a nice article about regular expressions in general. This article focuses on the Emacs Lisp implementation of regular expressions.

Syntax

The following characters are special: . * + ? ^ $ \ [

Between brackets [], the following are special: ] - ^

Many characters are special when they follow a backslash – see below.

 .        any character (but newline)
 *        previous character or group, repeated 0 or more times
 +        previous character or group, repeated 1 or more times
 ?        previous character or group, repeated 0 or 1 time
 ^        start of line
 $        end of line
 [...]    any character between brackets
 [^..]    any character not in the brackets
 [a-z]    any character between a and z
 \        prevents interpretation of following special char
 \|       or
 \w       word constituent
 \b       word boundary
 \sc      character with c syntax (e.g. \s- for whitespace char)
 \( \)    start/end of group
 \< \>    start/end of word
 \` \'    start/end of buffer
 \1       string matched by the first group
 \n       string matched by the nth group (n is a digit from 1 to 9)
 \{3\}    previous character or group, repeated 3 times
 \{3,\}   previous character or group, repeated 3 or more times
 \{3,6\}  previous character or group, repeated 3 to 6 times

*?, +?, and ?? are non-greedy versions of *, +, and ?.

\W and \Sc match any character that \w and \sc do not match. \B matches any position that is not a word boundary.
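The constructs above can be tried from Emacs Lisp with string-match. The following is a minimal sketch, not a definitive recipe: the helper names (my-first-number, my-greedy-vs-lazy, my-doubled-word-index) are hypothetical and chosen only for illustration. Note that inside a Lisp string every regexp backslash has to be doubled.

```emacs-lisp
;; Hypothetical helpers for illustration; none of these names come from Emacs.

(defun my-first-number (text)
  "Return the first run of digits in TEXT, or nil if there is none."
  ;; The regexp \([0-9]+\) is written "\\([0-9]+\\)" in a Lisp string.
  (when (string-match "\\([0-9]+\\)" text)
    (match-string 1 text)))

(defun my-greedy-vs-lazy (text)
  "Return the greedy and the non-greedy match of a <b> element in TEXT."
  ;; .* takes as much as it can, .*? as little as it can.
  (list (and (string-match "<b>.*</b>" text)  (match-string 0 text))
        (and (string-match "<b>.*?</b>" text) (match-string 0 text))))

(defun my-doubled-word-index (text)
  "Return the index of the first word in TEXT that is immediately repeated."
  ;; \< anchors at a word start, \1 must repeat group 1, \> ends the word.
  (string-match-p "\\<\\(\\w+\\) +\\1\\>" text))

(my-first-number "price: 42 or 4711")       ; => "42"
(my-greedy-vs-lazy "<b>one</b><b>two</b>")  ; => ("<b>one</b><b>two</b>" "<b>one</b>")
(my-doubled-word-index "this is is a test") ; => 5
```

What \w matches depends on the syntax table of the current buffer, so the last result assumes a buffer where letters are word constituents and the space is not.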

Character category

 \ca      ascii character
 \Ca      non-ascii character (newline included)
 \cl      latin character
 \cg      greek character
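As a small sketch of how a category can be used from Lisp, the hypothetical helper below (my-first-greek-index is not an Emacs function) looks for the first character in the Greek category. It assumes the standard category table, in which the Greek block belongs to category g.

```emacs-lisp
;; Hypothetical helper for illustration; assumes the standard category table.
(defun my-first-greek-index (string)
  "Return the index of the first Greek character in STRING, or nil."
  ;; \cg is written "\\cg" in a Lisp string.
  (string-match-p "\\cg" string))

;; "\u03bb" is the Greek small letter lambda.
(my-first-greek-index "abc \u03bb")   ; => 4
(my-first-greek-index "abc")          ; => nil
```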

POSIX character classes

Character classes are only valid inside a bracket expression, for example [[:digit:]] or [^[:space:]].

 [:digit:]  a digit, same as [0-9]
 [:upper:]  a letter in uppercase
 [:space:]  a whitespace character, as defined by the syntax table
 [:xdigit:] a hexadecimal digit
 [:cntrl:]  a control character
 [:ascii:]  an ASCII character
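A character class has to sit inside a bracket expression, which is why the brackets appear doubled in practice. The sketch below illustrates this with a hypothetical predicate (my-signed-integer-p is an invented name) that accepts an optionally signed run of digits.

```emacs-lisp
;; Hypothetical helper for illustration.
(defun my-signed-integer-p (string)
  "Return non-nil if STRING is an optionally signed run of digits."
  ;; \` and \' anchor at the start and end of the string.
  ;; [-+] is a bracket expression; [[:digit:]] is a class inside brackets.
  (string-match-p "\\`[-+]?[[:digit:]]+\\'" string))

(my-signed-integer-p "-42")   ; => 0
(my-signed-integer-p "+7")    ; => 0
(my-signed-integer-p "4x2")   ; => nil
```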

Syntax classes

 \s-   whitespace character        \s/   character quote character
 \sw   word constituent            \s$   paired delimiter         
 \s_   symbol constituent          \s'   expression prefix        
 \s.   punctuation character       \s<   comment starter          
 \s(   open delimiter character    \s>   comment ender            
 \s)   close delimiter character   \s!   generic comment delimiter
 \s"   string quote character      \s|   generic string delimiter 
 \s\   escape character
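Because syntax classes are looked up in the current syntax table, the same regexp can match differently from one major mode to another. The following sketch makes that explicit by matching under the Emacs Lisp syntax table, where - is a symbol constituent rather than a word constituent. The helper name my-first-token is hypothetical.

```emacs-lisp
;; Hypothetical helper for illustration.
(defun my-first-token (regexp string)
  "Return the first match of REGEXP in STRING under Emacs Lisp syntax rules."
  (with-syntax-table emacs-lisp-mode-syntax-table
    (when (string-match regexp string)
      (match-string 0 string))))

;; Word constituents only: the match stops at the hyphen.
(my-first-token "\\sw+" "foo-bar baz")               ; => "foo"
;; Word or symbol constituents: the hyphen is included.
(my-first-token "\\(\\sw\\|\\s_\\)+" "foo-bar baz")  ; => "foo-bar"
```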

Embed Emacs Lisp

 \,expr   where expr is an Emacs Lisp expression.

This works only in the replacement string of interactive commands such as replace-regexp and query-replace-regexp, for example:

 \(foo\)\(bar\) -> \1\,(upcase \2)
 foobar         -> fooBAR
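The \, construct is not available when replacing from Lisp code. A comparable effect can be sketched with replace-regexp-in-string, which accepts a function as the replacement. The helper name my-upcase-second-group is hypothetical.

```emacs-lisp
;; Hypothetical helper for illustration.
(defun my-upcase-second-group (string)
  "Return STRING with every foobar replaced by fooBAR."
  (replace-regexp-in-string
   "\\(foo\\)\\(bar\\)"
   (lambda (match)
     ;; While the function runs, the match data refers to MATCH.
     (concat (match-string 1 match) (upcase (match-string 2 match))))
   string
   t    ; FIXEDCASE: do not adapt the replacement's case to the match
   t))  ; LITERAL: insert the function's result verbatim

(my-upcase-second-group "a foobar b")   ; => "a fooBAR b"
```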

Emacs Commands that Use Regular Expressions

C-M-s	incremental forward search matching regexp
C-M-r	incremental backward search matching regexp
replace-regexp	replace string matching regexp
query-replace-regexp	same, but query before each replacement
align-regexp	align, using strings matching regexp as delimiters (see also the interactive-align package for an interactive version)
highlight-regexp	highlight strings matching regexp
occur	show lines containing a match
multi-occur	show lines in all buffers containing a match
how-many	count the number of strings matching regexp
keep-lines	delete all lines except those containing matches
flush-lines	delete lines containing matches
grep	call unix grep command and put result in a buffer
lgrep	user-friendly interface to the grep command
rgrep	recursive grep
dired-do-copy-regexp	copy files with names matching regexp
dired-do-rename-regexp	rename files matching regexp
find-grep-dired	display files containing matches for regexp with Dired
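The commands above are meant for interactive use. In Lisp programs the usual pattern is a loop over re-search-forward and replace-match, and how-many can also be called directly to count matches. The sketch below shows both on a temporary buffer. The helper names my-strip-trailing-blanks and my-count-matches are hypothetical.

```emacs-lisp
;; Hypothetical helpers for illustration.
(defun my-strip-trailing-blanks (text)
  "Return TEXT with spaces and tabs removed from the end of every line."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; Programmatic counterpart of an interactive replace-regexp.
    (while (re-search-forward "[ \t]+$" nil t)
      (replace-match "" t t))
    (buffer-string)))

(defun my-count-matches (regexp text)
  "Return the number of matches of REGEXP in TEXT."
  (with-temp-buffer
    (insert text)
    (how-many regexp (point-min) (point-max))))

(my-strip-trailing-blanks "a  \nb\t\nc")   ; => "a\nb\nc"
(my-count-matches "[0-9]+" "a1 b22 c333")  ; => 3
```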

Tips and Tricks

Enter a newline character

To enter a newline character in a regexp, use the two keystroke sequence C-q C-j. It will appear in the minibuffer as ^J.
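C-q C-j is only needed when typing in the minibuffer. In a Lisp string the newline can be written as "\n". As a small sketch, the hypothetical helper below (my-join-wrapped-lines is an invented name) joins wrapped lines by replacing each newline and the indentation after it with a single space.

```emacs-lisp
;; Hypothetical helper for illustration.
(defun my-join-wrapped-lines (text)
  "Return TEXT with each newline and following indentation replaced by a space."
  ;; "\n" is a literal newline in the regexp; [ \t]* eats the indentation.
  (replace-regexp-in-string "\n[ \t]*" " " text t t))

(my-join-wrapped-lines "a\n  b\nc")   ; => "a b c"
```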

Build regexps interactively

Re-builder (M-x re-builder) lets you build a regexp interactively and highlights its matches in the buffer as you type.

You can also use helm-regexp.

Search and replace with visual feedback

You can have the equivalent of query-replace-regexp with visual feedback thanks to the package visual-regexp, available in MELPA. All explanations and screenshots are on its GitHub page.

Even more powerful is visual-regexp-steroids. It is an extension to visual-regexp that enables the use of modern regexp engines (no more escaped group parentheses, and other goodies). In addition, you can optionally use the same regexp syntax to power isearch-forward-regexp and isearch-backward-regexp. For now, Python and pcre2el are supported out of the box (tested on Linux and Windows).

[Screenshot: visual-regexp-steroids, showing the same visual feedback as visual-regexp but with a Python regexp]

Use Python regexp

This is possible with the aforementioned package, visual-regexp-steroids. A nice feature is that you can use a Python expression in the replacement, like (\1.upper()) (but remember that we can use elisp too).

Use foreign regexps

foreign-regexp.el lets you search and replace using the regexp syntax of another language.
