Negation

8:20 with Kenneth Love

Negated sets let us specify characters and sequences that should be left out of any matches.

New terms

[^abc] - a negated set that matches any single character except the letters 'a', 'b', and 'c'.
re.IGNORECASE or re.I - flag to make a search case-insensitive. re.match('A', 'apple', re.I) would find the 'a' in 'apple'.
re.VERBOSE or re.X - flag that allows regular expressions to span multiple lines and contain (ignored) whitespace and comments.
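
A minimal sketch of the three terms above, using only the standard re module. The sample strings are made up for illustration and are not from the course files.

```python
import re

# Negated set: match every character that is not 'a', 'b', or 'c'.
not_abc = re.findall(r'[^abc]', 'abcdef')

# re.I makes the literal 'A' match the lowercase 'a' at the start of 'apple'.
match = re.match('A', 'apple', re.I)
first_letter = match.group() if match else None

# re.X lets the pattern span several lines and carry comments.
digits = re.findall(r'''
    \d+   # one or more digits
''', 'room 101, floor 3', re.X)

print(not_abc, first_letter, digits)
```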

Let's try a slightly harder one, slightly weirder one perhaps. 0:00

So let's actually, let's see. 0:06

Let's comment these two both out, and let's take our email address one. 0:08

And I wanna match all the email addresses, just like we did before. 0:16

But if the email address ends in .gov, I want to leave that part off. 0:19

Just pretend I have a good reason for this cuz I, I really don't. 0:25

So, all right, this sounds like a really good place for us to use a negative set. 0:30

And we can also write this out. 0:35

I mean, this is really, there's a lot to this. 0:37

So, let's leave ourselves some comments, make this a little bit easier. 0:41

So okay, first of all, yeah. 0:45

We can definitely use a negative set here. 0:48

So, let's make this a multiline string. 0:51

And we gotta end that multiline string. 0:58

You know what, we need to make this four spaces. 1:02

There we go, all right. 1:09

And we'll end our multiline string, and then we'll do stuff as usual against data. 1:11

All right. So, let's take this. 1:18

I actually don't wanna catch the part before. 1:20

I just wanna get the e-mail address. 1:23

So, let's do a \b and 1:27

then an @, and then that part. 1:32

And I don't care how many things are there. 1:39

So find a word boundary, just leaving myself a little note here, 1:43

an @, and then any number of characters. 1:48

All right, then what I want to ignore is gov, and 1:55

I don't wanna get that tab that's in there. 2:01

You can't necessarily see it, but 2:03

the space between each of these things is a tab character. 2:07

And I know there's a tab character right here, and 2:12

it just might catch it, so let's leave that off. 2:14

So one or more of those is fine. 2:19

And let's leave another comment here of ignore, wow, wow. 2:21

Ignore one or more instances of the letters g, 2:29

o, or v and a tab. 2:36

All right. And 2:41

then we have another b here, so match another word boundary, all right. 2:41

And then we do data. 2:49

Now, I've done a flag here, which is that I've done multiple lines. 2:50

So I need to use this VERBOSE flag. 2:57

And then, since we've got gov in there, and we've got it in lowercase. 3:00

Just in case there was an uppercase version, I'd want to add on the flag re.I. 3:04

And we add multiple flags with the pipe symbol in between each of the flags. 3:09

It's a little weird. 3:15

It's just something you get used to. 3:17

You just kinda have to remember it. 3:19
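
As a rough illustration of combining flags, here is a small sketch. The names combined, pattern, and found are hypothetical, and the sample text is invented.

```python
import re

# Flags behave like bit values, so the pipe (bitwise OR) merges them into one.
combined = re.VERBOSE | re.I

pattern = re.compile(r'''
    gov   # literal letters; re.I lets them match in any case
''', combined)

found = pattern.findall('us.gov, US.GOV, Us.Gov')
print(found)
```

The same combined value can be passed as the last argument to re.findall or re.search instead of compiling first.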

So, all right, let's try that out. 3:21

And there we go, 3:25

we've got @teamtreehouse.com, @teamtreehouse.com, blah, blah, blah. 3:26

And then we get over here, and we've got us, this was supposed to be us.gov, and 3:30

we've got just us. 3:34

And then we were supposed to have empire.gov, as we've got up here, and 3:36

we just have empire. 3:39

And we're supposed to have spain.gov, and we just got spain. 3:41

So, that's pretty cool, 3:43

we got all the email addresses, but we left off the .gov on two of them. 3:46

So, I think that's pretty cool, pretty handy. 3:51
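
Here is one way the pattern described above might look when written out. This is a sketch under assumptions: the transcript never spells out the set that follows the @, so [-\w\d.]* is a guess, the sample data is made up and tab-separated, and find_domains is a hypothetical helper name.

```python
import re

# Hypothetical tab-separated sample; the real course data file is not shown here.
data = (
    "Love, Kenneth\tkenneth@teamtreehouse.com\tTeacher, Treehouse\n"
    "Doe, Jane\tjane@us.gov\tAnalyst, Example Agency\n"
)


def find_domains(text):
    """Return the '@...' part of each address, leaving off trailing g/o/v letters."""
    return re.findall(r'''
        \b@[-\w\d.]*   # a word boundary, an @, then any number of characters (assumed set)
        [^gov\t]+      # 1+ characters that are not 'g', 'o', 'v', or a tab
        \b             # another word boundary
    ''', text, re.VERBOSE | re.I)


print(find_domains(data))
```

With this assumed set, the .com address comes back whole, while the .gov address comes back as '@us.': the letters g, o, and v are dropped, but the dot in front of them stays because a dot is not in the negated set.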

Let's try another one with our VERBOSE flag, 3:56

just to get used to doing our VERBOSE flag. 4:00

Gonna comment this out. 4:04

All right. 4:05

So let's try another verbose pattern that will match our names. 4:08

It'll also match our jobs, but it's still a good practice. 4:14

So we're gonna do print(re.findall. 4:18

And then we're gonna do a multi-line string, cuz we're gonna use verbose. 4:23

So let's do \b and then a set, [-\w]. 4:26

So that would be Find a word boundary 1+ 4:33

hyphens or word characters. 4:40

We'll just say characters. 4:47

And a comma cuz that comma's in there. 4:49

It has to find that comma. 4:52

And then let's have it find, find whitespace. 4:54

Find 1 whitespace. 5:00

And then let's have it find another hyphen, a w, or 5:02

a space as part of our set. 5:06

We'll talk about why that's different in just a second. 5:07

1+ hyphens and characters, and explicit spaces. 5:10

And then I want it to not find tabs or new line characters. 5:21

Ignore tabs and newlines. 5:25

And then we wanna close this, we're gonna run this against data, and 5:29

we're gonna do re.X. 5:34

All right. 5:36

So let's talk about this one for a second before we run it. 5:36

So, when we do the verbose flag, 5:40

which re.X, if you didn't guess, is the shorthand version of re.VERBOSE. 5:43

When we do the verbose ones, the regular expression engine ignores all of 5:49

the spaces that are just out in our pattern. 5:54

So like, these spaces here and 5:57

these spaces here are completely ignored, as is this comment. 6:00

So we have to mark those with this \s. 6:05

That, and, and that is whitespace. 6:09

So that matches spaces, it matches tabs, it matches new lines. 6:12

It matches all sorts of stuff. 6:17

Actually, I don't remember if it matches new lines or 6:18

not, but it matches spaces and tabs, and other characters like that. 6:19

If you wanna go look up like, half tab or letter space and 6:24

stuff like that, there's all sorts of these spaces that are available. 6:28

So it matches all of those. 6:31

But inside of a set, we can use an explicit space and 6:33

that will only match spaces. 6:36

It won't match tabs or newlines or whatever. 6:40
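
A quick check of how whitespace behaves under the verbose flag, as a sketch with an invented sample string. It also settles the question about newlines: \s does match them.

```python
import re

sample = 'a b\tc\nd'

# \s matches every kind of whitespace here: the space, the tab, and the newline.
any_whitespace = re.findall(r'\s', sample, re.X)

# A literal space inside a set survives re.X and matches only spaces.
only_spaces = re.findall(r'[ ]', sample, re.X)

# A bare space outside a set is dropped by re.X, so this pattern is really 'ab'.
bare_space = re.findall(r'a b', 'a b ab', re.X)

print(any_whitespace, only_spaces, bare_space)
```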

And then down here we want to ignore tab and newline. 6:42

Now, why didn't we have to use re.I in this one, or re.IGNORECASE? 6:46

The reason's because we're not matching any explicit characters. 6:50

We're not matching, like, the letter t, that may be uppercase or lowercase. 6:54

Since we're not matching those things, we're matching more generic stuff like 6:59

word characters, then we can use, or we can, we can leave off re.I. 7:03
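
To see why the ignore-case flag makes no difference for generic classes, here is a small sketch. The sample text and variable names are made up for illustration.

```python
import re

text = 'Tim tim TIM'

# \w already covers upper- and lowercase letters, so re.I changes nothing.
without_flag = re.findall(r'\w+', text)
with_flag = re.findall(r'\w+', text, re.I)

# A literal letter is different: 't' alone only matches a lowercase 't'.
literal_plain = re.findall(r't', text)
literal_ignorecase = re.findall(r't', text, re.I)

print(without_flag == with_flag, literal_plain, literal_ignorecase)
```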

So let's run this and see what it does. 7:09

And I forgot another character. 7:13

We should have a plus sign there as well. 7:15

So let's run that again. 7:19

There we go. 7:21

So now we've got Kenneth Love and Teacher Treehouse, Dave MacFarlane, or 7:22

MacFarlane, Dave, Teacher Treehouse, and so on. 7:26

So we got the names, and we got the where they work. 7:29
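
The names pattern walked through above could be written roughly like this, with the plus sign already added to the first set. It is a sketch: the sample data is invented and tab-separated, and find_names is a hypothetical helper name.

```python
import re

# Hypothetical tab-separated sample; the real course data file is not shown here.
data = (
    "Love, Kenneth\tkenneth@teamtreehouse.com\tTeacher, Treehouse\n"
    "Doe, Jane\tjane@us.gov\tAnalyst, Example Agency\n"
)


def find_names(text):
    """Return every 'Word, Words' chunk, which covers both names and jobs."""
    return re.findall(r'''
        \b[-\w]+,   # a word boundary, 1+ hyphens or word characters, and a comma
        \s          # 1 whitespace character
        [-\w ]+     # 1+ hyphens, word characters, and explicit spaces
        [^\t\n]     # one more character that is not a tab or a newline
    ''', text, re.X)


print(find_names(data))
```

The email column is skipped because it has no comma followed by whitespace, so only the name and job columns match.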

So, of course, if we want to get Tim in there, we need to change this to a star. 7:34

Run this again and we should get Tim. 7:40

I don't see Tim actually. 7:44

So Tim's not in there, but we will fix that later. 7:47

We'll select everybody before we get to the end of this. 7:52

As you can tell though, it really, really helps breaking up our patterns over 7:55

multiple lines. 7:59

And being able to annotate each line with a comment, so that we remember what we're 8:00

doing, what we're looking for and how to make things again. 8:04

We have a ton of choices now when we write patterns. 8:09

They can be as flexible or strict as we need. 8:12

Our next video will cover the real meat of what'll make our regular expressions 8:14

capable of solving our immediate problem. 8:18
