Intro to Regular Expressions

Character literals


/a/

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.

/Mary/

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.
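
A minimal sketch of literal matching, assuming Python's standard `re` module stands in for the `/.../` notation used here (the variable names are illustrative, not part of the original):

```python
import re

text = """Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go."""

# A literal pattern matches exactly the characters it spells out.
mary_hits = re.findall(r"Mary", text)

# Matching is case-sensitive by default, so the "A" in "And" is not counted.
a_count = len(re.findall(r"a", text))
```

Each occurrence is reported separately, so `Mary` is found once on the first line and once on the second.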

    

"Escaped" character literals


/.*/

Special characters must be escaped.*

/\.\*/
Special characters must be escaped.*
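
The contrast between the two patterns above can be sketched in Python, assuming the standard `re` module (names are illustrative). Unescaped, `.*` matches the whole line; escaped, it matches only the two literal characters at the end:

```python
import re

text = "Special characters must be escaped.*"

# Unescaped: "." is any character and "*" repeats it, so the whole line matches.
greedy = re.search(r".*", text).group()

# Escaped: only the literal two-character sequence ".*" matches.
literal = re.search(r"\.\*", text)

# re.escape builds the escaped form of a literal string.
escaped_pattern = re.escape(".*")
```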
    

Positional special characters


/^Mary/

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.

/Mary$/

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.
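
A hedged sketch of the anchors, assuming Python's `re` module and a line-by-line search (as line-oriented tools such as grep would do). The list names are illustrative:

```python
import re

lines = [
    "Mary had a little lamb.",
    "And everywhere that Mary",
    "went, the lamb was sure",
    "to go.",
]

# "^" anchors the match to the start of the line.
starts_with_mary = [ln for ln in lines if re.search(r"^Mary", ln)]

# "$" anchors the match to the end of the line.
ends_with_mary = [ln for ln in lines if re.search(r"Mary$", ln)]
```

Only the first line starts with `Mary`, and only the second line ends with it.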

    

The "wildcard" character


/.a/ 

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.
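
As a rough illustration in Python (assuming the standard `re` module), `.a` matches any single character followed by a lowercase `a`, including a space:

```python
import re

text = """Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go."""

# "." stands for any one character except a newline (by default).
pairs = re.findall(r".a", text)
```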


    

Grouping regular expressions


/(Mary)( )(had)/ 

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.
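
A small sketch of how groups can be inspected, assuming Python's `re` module (the variable names are illustrative). The overall match is the same as for `/Mary had/`, but each parenthesized part is also available separately:

```python
import re

text = "Mary had a little lamb."

m = re.search(r"(Mary)( )(had)", text)

whole = m.group(0)   # the entire matched text
parts = m.groups()   # one entry per parenthesized group, in order
```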

    

Character classes


/[a-z]a/ 

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.

    

Complement operator


/[^a-z]a/ 

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.
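
The character class `[a-z]` and its complement `[^a-z]` split the `.a` matches into two disjoint sets. A minimal Python sketch, assuming the standard `re` module:

```python
import re

text = """Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go."""

# A lowercase letter followed by "a".
lower_then_a = re.findall(r"[a-z]a", text)

# Anything that is NOT a lowercase letter, followed by "a".
other_then_a = re.findall(r"[^a-z]a", text)
```

The uppercase `M` and the space fall into the complement class, so `Ma` and ` a` appear only in the second list.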

    

Alternation of patterns


/cat|dog|bird/

The pet store sold cats, dogs, and birds.

/=first|second=/

=first first= # =second second= # =first= # =second=

/(=)(first)|(second)(=)/

=first first= # =second second= # =first= # =second=

/=(first|second)=/

=first first= # =second second= # =first= # =second=
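
The three `first`/`second` patterns differ only in how far the `|` reaches. A hedged Python sketch, assuming the standard `re` module (`full_matches` is a hypothetical helper, not from the original):

```python
import re

text = "=first first= # =second second= # =first= # =second="

def full_matches(pattern, s):
    """Return the complete text of every match, ignoring group contents."""
    return [m.group(0) for m in re.finditer(pattern, s)]

# Without grouping, "|" splits the whole pattern: "=first" OR "second=".
loose = full_matches(r"=first|second=", text)

# Grouping each side separately does not change what "|" separates.
grouped = full_matches(r"(=)(first)|(second)(=)", text)

# Parentheses around the alternatives confine "|" to the word itself.
tight = full_matches(r"=(first|second)=", text)
```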

    

The basic abstract quantifier


/@(=\+=)*@/ 

Match with zero in the middle: @@
Subexpression occurs, but...: @=+=ABC@
Lots of occurrences: @=+==+==+==+==+=@
Must repeat entire pattern: @=+==+=+==+=@
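
A sketch of the `*` quantifier applied to a group, assuming Python's `re` module. The `+` in the sample text is a literal character, so this sketch escapes it as `\+`; left unescaped, `+` would itself act as a quantifier. Names are illustrative:

```python
import re

# The group (=\+=) may repeat zero or more times between the two "@" signs.
pattern = re.compile(r"@(=\+=)*@")

samples = [
    "@@",
    "@=+=ABC@",
    "@=+==+==+==+==+=@",
    "@=+==+=+==+=@",
]

matched = []
for s in samples:
    m = pattern.search(s)
    matched.append(m.group(0) if m else None)
```

Only the first and third samples match: the second has `ABC` between the group and the closing `@`, and the fourth breaks the `=+=` unit partway through.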

    

Matching Patterns in Text: Intermediate


More abstract quantifiers


/A+B*C?D/

AAAD
ABBBBCD
BBBCD
ABCCD
AAABBBC
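
The three quantifiers can be checked against the sample lines with a short Python sketch (assuming the standard `re` module; names are illustrative): `+` means one or more, `*` zero or more, and `?` zero or one.

```python
import re

pattern = re.compile(r"A+B*C?D")

samples = ["AAAD", "ABBBBCD", "BBBCD", "ABCCD", "AAABBBC"]

# True where the pattern is found somewhere in the line.
results = [bool(pattern.search(s)) for s in samples]
```

`BBBCD` fails because at least one `A` is required, `ABCCD` fails because `C?` allows at most one `C`, and `AAABBBC` fails because the final `D` is missing.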

    

Numeric quantifiers


/a{5} b{,6} c{4,8}/

aaaaa bbbbb ccccc
aaa bbb ccc
aaaaa bbbbbbbbbbbbbb ccccc

/a+ b{3,} c?/

aaaaa bbbbb ccccc
aaa bbb ccc
aaaaa bbbbbbbbbbbbbb ccccc

/a{5} b{6,} c{4,8}/

aaaaa bbbbb ccccc
aaa bbb ccc
aaaaa bbbbbbbbbbbbbb ccccc
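
A hedged sketch of the numeric quantifiers, assuming Python's `re` module, which reads `{,6}` as "zero to six" (other tools may not accept an omitted lower bound). The dictionary layout is illustrative:

```python
import re

samples = [
    "aaaaa bbbbb ccccc",
    "aaa bbb ccc",
    "aaaaa bbbbbbbbbbbbbb ccccc",
]

patterns = [
    r"a{5} b{,6} c{4,8}",   # exactly 5, at most 6, between 4 and 8
    r"a+ b{3,} c?",         # one or more, at least 3, optional
    r"a{5} b{6,} c{4,8}",   # exactly 5, at least 6, between 4 and 8
]

# For each pattern, record which sample lines contain a match.
results = {p: [bool(re.search(p, s)) for s in samples] for p in patterns}
```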

    

Backreferences


/(abc|xyz) \1/

jkl abc xyz
jkl xyz abc
jkl abc abc
jkl xyz xyz

/(abc|xyz) (abc|xyz)/

jkl abc xyz
jkl xyz abc
jkl abc abc
jkl xyz xyz
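
The difference between a backreference and a repeated group can be sketched in Python (assuming the standard `re` module; names are illustrative). `\1` must match the same text the first group matched, whereas repeating the group lets each side choose independently:

```python
import re

samples = ["jkl abc xyz", "jkl xyz abc", "jkl abc abc", "jkl xyz xyz"]

# \1 refers back to whatever text group 1 actually matched.
same_twice = [bool(re.search(r"(abc|xyz) \1", s)) for s in samples]

# Two independent groups: any combination of abc/xyz is accepted.
any_pair = [bool(re.search(r"(abc|xyz) (abc|xyz)", s)) for s in samples]
```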

    

Don't match more than you want to


/th.*s/

-- I want to match the words that start
-- with 'th' and end with 's'.
this
thus
thistle
this line matches too much

    

Tricks for restraining matches


/th[^s]*./

-- I want to match the words that start
-- with 'th' and end with 's'.
this
thus
thistle
this line matches too much
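
A minimal Python comparison of the greedy pattern `th.*s` with the restrained pattern `th[^s]*.`, assuming the standard `re` module and using only the four sample words (names are illustrative):

```python
import re

words = ["this", "thus", "thistle", "this line matches too much"]

# ".*" is greedy: it runs to the LAST "s" on the line.
greedy = [re.search(r"th.*s", w).group(0) for w in words]

# "[^s]*" cannot cross an "s", so the match stops at the FIRST "s".
restrained = [re.search(r"th[^s]*.", w).group(0) for w in words]
```

Note that the restrained pattern still finds `this` inside `thistle`; it limits how far the match extends, not where a word ends.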

    

A literal-string modification example


s/cat/dog/g 

< The zoo had wild dogs, bobcats, lions, and other wild cats.
> The zoo had wild dogs, bobdogs, lions, and other wild dogs.
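
The `s/cat/dog/g` substitution above corresponds roughly to `re.sub` in Python, which replaces every match by default. A sketch assuming the standard `re` module:

```python
import re

before = "The zoo had wild dogs, bobcats, lions, and other wild cats."

# Replace every literal "cat", including the one inside "bobcats".
after = re.sub(r"cat", "dog", before)
```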
    

A pattern-match modification example


s/cat|dog/snake/g 

< The zoo had wild dogs, bobcats, lions, and other wild cats.
> The zoo had wild snakes, bobsnakes, lions, and other wild snakes.

s/[a-z]+i[a-z]*/nice/g 

< The zoo had wild dogs, bobcats, lions, and other wild cats.
> The zoo had nice dogs, bobcats, nice, and other nice cats.
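
The same two substitutions, sketched in Python under the assumption that `re.sub` plays the role of `s/.../.../g` (variable names are illustrative):

```python
import re

before = "The zoo had wild dogs, bobcats, lions, and other wild cats."

# Alternation: either "cat" or "dog" becomes "snake".
snakes = re.sub(r"cat|dog", "snake", before)

# Any run of lowercase letters with an "i" after the first letter becomes "nice".
nice = re.sub(r"[a-z]+i[a-z]*", "nice", before)
```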
    

Modification using backreferences


s/([A-Z])([0-9]{2,4}) /\2:\1 /g 

< A37 B4 C107 D54112 E1103 XXX
> 37:A B4 107:C D54112 1103:E XXX
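
A hedged Python sketch of the backreference substitution, assuming the standard `re` module, where `\1` and `\2` in the replacement string refer to the captured groups:

```python
import re

before = "A37 B4 C107 D54112 E1103 XXX"

# Group 1: one uppercase letter. Group 2: two to four digits.
# The trailing space in the pattern rejects longer digit runs such as "D54112".
after = re.sub(r"([A-Z])([0-9]{2,4}) ", r"\2:\1 ", before)
```

`B4` has too few digits and `D54112` too many, so both are left unchanged.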