Negated sets let us specify characters and sequences that should be left out of any matches.
New terms
- [^abc] - a negated set that matches any character except 'a', 'b', or 'c'.
- re.IGNORECASE or re.I - flag to make a search case-insensitive. re.match('A', 'apple', re.I) would find the 'a' in 'apple'.
- re.VERBOSE or re.X - flag that allows regular expressions to span multiple lines and contain (ignored) whitespace and comments.
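To make these terms concrete, here is a minimal sketch using Python's standard re module. The sample strings and variable names are made up for illustration.

```python
import re

# A negated set matches any single character that is NOT listed in it.
not_abc = re.findall(r'[^abc]', 'abcdef')
print(not_abc)  # ['d', 'e', 'f']

# re.I (short for re.IGNORECASE) lets 'A' match a lowercase 'a'.
match = re.match('A', 'apple', re.I)
print(match.group())  # a

# re.X (short for re.VERBOSE) ignores layout whitespace and # comments.
digits = re.findall(r'''
    \d+   # one or more digits
''', 'call 555 5555', re.X)
print(digits)  # ['555', '5555']
```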
Let's try a slightly harder one, slightly
weirder one perhaps.
0:00
So let's actually, let's see.
0:06
Let's comment these two both out, and
let's take our email address one.
0:08
And I wanna match all the email addresses,
just like we did before.
0:16
But if the email address ends in .gov, I
want to leave that part off.
0:19
Just pretend I have a good reason for this
cuz I really don't.
0:25
So, all right, this sounds like a really
good place for us to use a negated set.
0:30
And we can also write this out.
0:35
I mean, this is really, there's a lot to
this.
0:37
So, let's leave ourselves some comments,
make this a little bit easier.
0:41
So okay, first of all, yeah.
0:45
We can definitely use a negated set here.
0:48
So, let's make this a multiline string.
0:51
And we gotta end that multiline string.
0:58
You know what, we need to make this four
spaces.
1:02
There we go, all right.
1:09
And we'll end our multiline string, and
then we'll do stuff as usual against data.
1:11
All right.
So, let's take this.
1:18
I actually don't wanna catch the part
before.
1:20
I just wanna get the e-mail address.
1:23
So, let's do a \b and
1:27
then an @, and then that part.
1:32
And I don't care how many things are
there.
1:39
So find a word boundary, just leaving
myself a little note here,
1:43
an @, and then any number of characters.
1:48
All right, then what I want to ignore is
gov, and
1:55
I don't wanna get that tab that's in
there.
2:01
You can't necessarily see it, but
2:03
the space between each of these things is
a tab character.
2:07
And I know there's a tab character right
here, and
2:12
it just might catch it, so let's leave
that off.
2:14
So one or more of those is fine.
2:19
And let's leave another comment here of
ignore, wow, wow.
2:21
Ignore one or more instances of the
letters g,
2:29
o, or v and a tab.
2:36
All right.
And
2:41
then we have another \b here, so match
another word boundary, all right.
2:41
And then we do data.
2:49
Now, I've done something here that needs a flag, which is that
I've written the pattern across multiple lines.
2:50
So I need to use this VERBOSE flag.
2:57
And then, since we've got gov in there,
and we've got it in lowercase.
3:00
Just in case there was an uppercase
version, I'd want to add on the flag re.I.
3:04
And we add multiple flags with the pipe
symbol in between each of the flags.
3:09
It's a little weird.
3:15
It's just something you get used to.
3:17
You just kinda have to remember it.
3:19
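The pipe works because each flag is a bit flag, and | is Python's bitwise OR, so the combined value carries both settings. A small sketch, with a made-up pattern and sample string:

```python
import re

# Combine flags with the bitwise OR operator.
flags = re.VERBOSE | re.I

pattern = re.compile(r'''
    gov   # the literal letters g, o, v, in either case thanks to re.I
''', flags)

found = pattern.findall('us.gov and US.GOV')
print(found)  # ['gov', 'GOV']

# Both settings are present in the combined value.
print(bool(flags & re.VERBOSE), bool(flags & re.IGNORECASE))  # True True
```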
So, all right, let's try that out.
3:21
And there we go,
3:25
we've got @teamtreehouse.com,
@teamtreehouse.com, blah, blah, blah.
3:26
And then we get over here, and we've got
us, this was supposed to be us.gov, and
3:30
we've got just us.
3:34
And then we were supposed to have
empire.gov, as we've got up here, and
3:36
we just have empire.
3:39
And we're supposed to have spain.gov, and
we just got spain.
3:41
So, that's pretty cool,
3:43
we got all the email addresses, but we
left off the .gov on three of them.
3:46
So, I think that's pretty cool, pretty
handy.
3:51
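Pieced together from the narration, the pattern looks roughly like the sketch here. The exact character set after the @ is an assumption (the transcript only says "any number of characters"), and data is a small made-up stand-in for the course's tab-separated file.

```python
import re

# Hypothetical sample rows: name, email, job, separated by tabs.
data = (
    'Love, Kenneth\tkenneth@teamtreehouse.com\tTeacher, Treehouse\n'
    'Obama, Barack\tpresident.44@us.gov\tPresident, United States of America\n'
    'Vader, Darth\tdarth-vader@empire.gov\tSith Lord, Galactic Empire\n'
)

domains = re.findall(r'''
    \b@[-\w\d.]*   # a word boundary, an @, then any number of characters
    [^gov\t]+      # 1+ characters that are not g, o, v, or a tab
    \b             # another word boundary
''', data, re.VERBOSE | re.I)

print(domains)  # ['@teamtreehouse.com', '@us.', '@empire.']
```

With this sample the .gov addresses come back as '@us.' and '@empire.': the letters g, o, and v are dropped, but the dot before them is not in the negated set, so it stays.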
Let's try another one with our VERBOSE
flag,
3:56
just to get used to doing our VERBOSE
flag.
4:00
Gonna comment this out.
4:04
All right.
4:05
So let's try another verbose pattern that
will match our names.
4:08
It'll also match our jobs, but it's still
a good practice.
4:14
So we're gonna do print(re.findall.
4:18
And then we're gonna do a multi-line
string, cuz we're gonna use verbose.
4:23
So let's do \b[-\w]+.
4:26
So that would be: find a word boundary, 1+
4:33
hyphens or word characters.
4:40
We'll just say characters.
4:47
And a comma cuz that comma's in there.
4:49
It has to find that comma.
4:52
And then let's have it find
whitespace.
4:54
Find 1 whitespace.
5:00
And then let's have it find another
hyphen, a \w, or
5:02
a space as part of our set.
5:06
We'll talk about why that's different in
just a second.
5:07
1+ hyphens and characters, and explicit
spaces.
5:10
And then I want it to not find tabs or new
line characters.
5:21
Ignore tabs and newlines.
5:25
And then we wanna close this, we're gonna
run this against data, and
5:29
we're gonna do re.X.
5:34
All right.
5:36
So let's talk about this one for a second
before we run it.
5:36
So, when we do the verbose flag,
5:40
re.X, which if you didn't guess is the
shorthand version of re.VERBOSE.
5:43
When we do the verbose ones, the regular
expression engine ignores all of
5:49
the spaces that are just out in our
pattern.
5:54
So like, these spaces here and
5:57
these spaces here are completely ignored,
as is this comment.
6:00
So we have to mark those with this \s.
6:05
And that is whitespace.
6:09
So that matches spaces, it matches tabs,
it matches new lines.
6:12
It matches all sorts of stuff.
6:17
Actually, I don't remember if it matches
new lines or
6:18
not, but it matches spaces and tabs, and
other characters like that.
6:19
If you wanna go look up like, half tab or
letter space and
6:24
stuff like that, there's all sorts of
these spaces that are available.
6:28
So it matches all of those.
6:31
But inside of a set, we can use an
explicit space and
6:33
that will only match spaces.
6:36
It won't match tabs or newlines or
whatever.
6:40
And then down here we want to ignore tab
and newline.
6:42
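A quick check of these whitespace rules, as a sketch with a made-up sample string. It also settles the open question: \s does match newlines, along with spaces and tabs.

```python
import re

sample = 'a b\tc\nd'

# Under re.X, bare whitespace in the pattern is ignored: 'a b' means 'ab'.
ignored = re.fullmatch(r'a b', 'ab', re.X)
print(ignored is not None)  # True

# \s still matches whitespace in the text: space, tab, and newline.
any_space = re.findall(r'\s', sample, re.X)
print(any_space)  # [' ', '\t', '\n']

# A literal space inside a set is kept, and matches only a space.
only_space = re.findall(r'[ ]', sample, re.X)
print(only_space)  # [' ']
```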
Now, why didn't we have to use re.I in
this one, or re.IGNORECASE?
6:46
The reason is that we're not matching
any explicit characters.
6:50
We're not matching, like, the letter t,
that may be uppercase or lowercase.
6:54
Since we're not matching those things,
we're matching more generic stuff like
6:59
word characters, then we can leave off re.I.
7:03
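A short sketch of that point, using a made-up string: \w already covers both cases, so re.I only changes things once the pattern names a specific letter.

```python
import re

text = 'Teacher TEACHER teacher'

# \w matches uppercase and lowercase letters alike, no flag needed.
generic = re.findall(r'\w+', text)
print(generic)  # ['Teacher', 'TEACHER', 'teacher']

# An explicit lowercase 't' only matches lowercase without re.I...
strict = re.findall(r't\w+', text)
print(strict)  # ['teacher']

# ...and matches either case with it.
relaxed = re.findall(r't\w+', text, re.I)
print(relaxed)  # ['Teacher', 'TEACHER', 'teacher']
```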
So let's run this and see what it does.
7:09
And I forgot another character.
7:13
We should have a plus sign there as well.
7:15
So let's run that again.
7:19
There we go.
7:21
So now we've got Kenneth Love and Teacher
Treehouse, Dave MacFarlane, or
7:22
MacFarlane, Dave, Teacher Treehouse, and
so on.
7:26
So we got the names, and we got the where
they work.
7:29
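Reconstructed from the comments read out while typing, the second pattern might look like this sketch. The data string is a made-up sample in the same tab-separated shape as the course file, not the real file.

```python
import re

# Hypothetical sample rows: name, email, job, separated by tabs.
data = (
    'Love, Kenneth\tkenneth@teamtreehouse.com\tTeacher, Treehouse\n'
    'McFarland, Dave\tdave@teamtreehouse.com\tTeacher, Treehouse\n'
)

names = re.findall(r'''
    \b[-\w]+,   # a word boundary, 1+ hyphens or word characters, a comma
    \s          # one whitespace character
    [-\w ]+     # 1+ hyphens, word characters, and explicit spaces
    [^\t\n]     # one character that is not a tab or a newline
''', data, re.X)

print(names)
# ['Love, Kenneth', 'Teacher, Treehouse', 'McFarland, Dave', 'Teacher, Treehouse']
```

The job titles show up because "Teacher, Treehouse" has the same word, comma, space, words shape as a name.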
So, of course, if we want to get Tim in
there, we need to change this to a star.
7:34
Run this again and we should get Tim.
7:40
I don't see Tim actually.
7:44
So Tim's not in there, but we will fix
that later.
7:47
We'll select everybody before we get to
the end of this.
7:52
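One likely reason the star does not pick up Tim: \b needs a word character on one side of the position. The sketch assumes Tim's row starts with a bare comma (nothing in the last-name field), so the comma has only a newline before it and no boundary exists there. The rows here are hypothetical.

```python
import re

# Hypothetical rows; the second has an empty last-name field.
data = (
    'Love, Kenneth\tkenneth@teamtreehouse.com\n'
    ', Tim\ttim@example.com\n'
)

pattern = r'''
    \b[-\w]*,   # a word boundary, 0+ hyphens or word characters, a comma
    \s          # one whitespace character
    [-\w ]+     # 1+ hyphens, word characters, and explicit spaces
    [^\t\n]     # one character that is not a tab or a newline
'''
names = re.findall(pattern, data, re.X)
print(names)  # ['Love, Kenneth']

# No word boundary sits between the newline and the comma.
boundary = re.search(r'\b,', '\n, Tim')
print(boundary)  # None
```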
As you can tell though, it really, really
helps to break up our patterns over
7:55
multiple lines.
7:59
And being able to annotate each line with
a comment, so that we remember what we're
8:00
doing, what we're looking for and how to
make things again.
8:04
We have a ton of choices now when we write
patterns.
8:09
They can be as flexible or strict as we
need.
8:12
Our next video will cover the real meat of
what'll make our regular expressions
8:14
capable of solving our immediate problem.
8:18