You have completed Regular Expressions in Python!
Now that we can search for just about anything, let's organize our results a bit better. Regular expressions give us indexed and named groups, both of which are super-handy.
New terms
- ([abc]) - creates a group that contains a set for the letters 'a', 'b', and 'c'. This could later be accessed from the Match object as .group(1).
- (?P<name>[abc]) - creates a named group that contains a set for the letters 'a', 'b', and 'c'. This could later be accessed from the Match object as .group('name').
- .groups() - method to show all of the groups on a Match object.
- re.MULTILINE or re.M - flag to make a pattern regard lines in your text as the beginning or end of a string.
- ^ - specifies, in a pattern, the beginning of the string.
- $ - specifies, in a pattern, the end of the string.
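To make the terms above a bit more concrete, here is a minimal sketch. The sample string "xxabyy" is made up for illustration and is not from the course files.

```python
import re

# Hypothetical sample text, chosen only to illustrate the terms above.
match = re.search(r"([abc])(?P<name>[abc])", "xxabyy")

first = match.group(1)        # indexed group: the first set of parentheses
named = match.group("name")   # named group, looked up by its name
everything = match.groups()   # tuple holding every group in the match

print(first, named, everything)
```

A named group still counts as an indexed group, which is why .groups() returns both values here.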
When we create our patterns, they just
match the entire thing.
0:00
We've seen it already where our match
objects only have one group in them.
0:02
It's often really handy to have multiple
groups defined inside of your pattern, so
0:06
that you can later access just parts of
the text that you care about.
0:10
Like for our case, making a group for the
email address, and a group for
0:13
the phone number, and a group for
0:16
the name would make it a lot simpler later
to pull those out and use them.
0:18
So, at this point,
0:22
we've gotten to where we can catch pretty
much anything in our text file.
0:23
So I think what we should do now is we
should kind of just use all these
0:28
things at once.
0:30
This might get a little confusing, so
0:33
what we're gonna do is we're actually
gonna break these up into groups.
0:34
So let's start this out with our
normal print and re.findall.
0:38
And then let's do a large verbose one and
we're gonna need re.X.
0:45
So, all right.
0:51
Now we can write our pattern.
0:53
I'm gonna add in some extra lines here just
so
0:55
I can make this a little bit more
readable.
0:57
All right, so we define groups with
parentheses.
1:00
So, our first group here, we wanna capture
the last name and the first name.
1:04
So for the last name, we need that.
1:09
So hyphens, word characters, and spaces.
1:17
Any number of those from zero on up.
1:20
And then we need a comma, and we need an
actual space, and
1:23
then we need hyphens, \w, and spaces again, and
that's our group.
1:27
So that's last name, comma space first
name.
1:33
And then there's gonna be a tab.
1:36
All right.
So let's make a little note of
1:38
that last and first names.
1:39
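Here's a rough sketch of what that first group might look like. The sample line is hypothetical, and writing the "actual space" as \s is an assumption: in verbose mode a bare space outside of a set is ignored, so it has to be spelled some other way.

```python
import re

# Hypothetical sample line; the course's text file is not reproduced here.
data = "Doe, Jane\tjane@example.com"

names = re.findall(r'''
    ([-\w ]*,\s[-\w ]+)\t   # Last and first names
''', data, re.X)

print(names)
```

With only one group in the pattern, re.findall returns a plain list of strings rather than a list of tuples.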
Okay.
So now for
1:42
our email address, which was our next
thing in our line.
1:43
Those, oops, those should cover our items.
1:49
So hyphens, word characters, numbers,
periods, and plus signs.
1:53
So we've got one or more of those.
2:00
We have an at symbol, and then we again
have hyphens,
2:01
word characters, digits, and periods.
2:05
One or more of those, and then there's a
tab.
2:08
And this is for our email.
2:12
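The email group described here could be sketched on its own like this. The address is invented for the example, and the exact sets are a best reading of the narration rather than a copy of the on-screen code.

```python
import re

# Hypothetical sample line with a made-up address.
data = "Doe, Jane\tjane.doe+news@mail.example.com\t555-555-0100"

emails = re.findall(r'''
    ([-\w\d.+]+@[-\w\d.]+)\t   # Email
''', data, re.X)

print(emails)
```

The \d inside each set is redundant because \w already covers digits, but it is kept to follow the narration.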
All right, so what comes next?
2:14
Well, next is the phone number.
2:16
So, remember we have to escape these
parentheses and
2:18
we wanna mark them as optional.
2:21
So, then there's three digits.
2:23
And there is a closing parenthesis that is
optional.
2:27
There is a hyphen that is optional, and a
space that is optional.
2:31
And then there are three numbers, a
hyphen, and then four numbers.
2:36
That's our group, and then there's a tab.
2:42
So we'll say that's phone.
2:43
Yep, all right.
2:47
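As a sketch of the phone group, assuming the pieces are written in the order they are described, the pattern below accepts a few common ways of writing a number. The sample numbers are made up.

```python
import re

phone = re.compile(r'''
    (\(?\d{3}\)?-?\s?\d{3}-\d{4})\t   # Phone
''', re.X)

# Hypothetical samples, each followed by the tab the pattern expects.
samples = ["(555) 555-0100\t", "555-555-0101\t", "555 555-0102\t"]
found = [phone.search(sample).group(1) for sample in samples]

print(found)
```

Each ? makes only the single item before it optional, which is why the parentheses, the hyphen, and the space can each be present or absent independently.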
Then we have the job and the company that
they work for.
2:49
So, this is a whole lot like our one that
captures the names, but
2:53
we don't have a lot of stuff in here.
2:59
So, it's pretty much just word characters
and spaces.
3:02
So there can be one or more of those,
3:05
a comma, some sort of white space, and
then again words and spaces.
3:07
And then of course there's a tab.
3:14
So job and company.
3:16
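A possible version of the job and company group, run against a made-up fragment that starts at the job field:

```python
import re

# Hypothetical fragment: job, company, a tab, then a handle.
data = "Teacher, Example School\t@janedoe"

jobs = re.findall(r'''
    ([\w\s]+,\s[\w\s]+)\t   # Job and company
''', data, re.X)

print(jobs)
```

One thing to keep in mind with this sketch: \s matches tabs and newlines as well as spaces, so a set like [\w\s] can reach further than a set that only lists a literal space.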
And then the last thing that we put in
there on some of the lines at
3:18
least is a Twitter account.
3:21
So, let's grab that, Twitter is actually
really easy to grab.
3:23
It's just \w and \d, because,
3:27
I guess, no, underscores are already included
in \w.
3:31
You can't have hyphens, you can't really
have special characters,
3:36
you can just have numbers and letters.
3:39
So, that's that for Twitter, and let's
mark that Twitter.
3:40
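A small sketch of the Twitter group. The leading @ is an assumption about how handles appear in the file, and the sample text is invented.

```python
import re

# Hypothetical text with two made-up handles.
text = "Find @jane_doe42 and @sam-roe"

handles = re.findall(r"(@[\w\d]+)", text)

print(handles)
```

The underscore is picked up because \w covers letters, digits, and underscores, while the hyphen in the second handle ends the match early.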
All right.
So, that's our pattern.
3:47
Now it's a really long regex, and there's
actually a couple of problems with this,
3:49
things that it won't catch.
3:53
But let's run it and see what we get.
3:55
So, we'll come down here, python
address_book.
3:58
And you'll notice that there's an
opening parenthesis, so there's
4:02
a tuple.
4:07
Yeah, you see the tuple?
4:08
And the tuple shows all of our little
groups that we caught.
4:09
Each item in the tuple is one of our
groups.
4:13
So that's pretty awesome.
4:16
We're gonna come back to that.
4:17
Do you notice there's anything missing?
4:19
Dave's not here.
4:21
And King Arthur isn't in here.
4:23
And the reason is because they don't have
some of the items that we're looking for.
4:25
So since they don't match exactly, they
don't get included.
4:32
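Putting the five groups together, a sketch of this first version of the pattern might look like the following. The two data lines are hypothetical stand-ins for the address file, and the second one deliberately has no Twitter handle.

```python
import re

# Hypothetical data, not the course's text file.
data = (
    "Doe, Jane\tjane@example.com\t(555) 555-0100\tTeacher, Example School\t@janedoe\n"
    "Roe, Sam\tsam@example.com\t555-555-0101\tEditor, Example Press\n"
)

rows = re.findall(r'''
    ([-\w ]*,\s[-\w ]+)\t             # Last and first names
    ([-\w\d.+]+@[-\w\d.]+)\t          # Email
    (\(?\d{3}\)?-?\s?\d{3}-\d{4})\t   # Phone
    ([\w\s]+,\s[\w\s]+)\t             # Job and company
    (@[\w\d]+)                        # Twitter
''', data, re.X)

print(rows)
```

Only the first line comes back, as one tuple with an item per group. The second line is skipped because every group is still required at this point.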
So what we should do is we should go back
and
4:36
mark a couple of things as being optional.
4:38
We're also gonna do a couple of other
tricks here.
4:41
So, let's see.
4:43
The first thing we're gonna do is we're
actually gonna add a symbol right here.
4:44
We're gonna add the caret, and that means
the beginning of the string.
4:49
Okay.
4:54
And to complement that, right down here
right after that closing parenthesis,
4:54
we're gonna put in a dollar sign, which
marks the end of the string.
4:59
'Kay, we've got another trick we're gonna
do for
5:04
that in just a minute, but remember those.
5:06
So Tim doesn't have a last name.
5:10
So we'll mark those as completely
optional.
5:12
And everybody's got email.
5:15
I don't think there's anything we need to
change on email.
5:17
And some of them.
5:20
Let's see.
5:22
I think they all have phone numbers.
5:23
Some of them, however, don't have jobs
listed.
5:25
So, rather, they have jobs listed.
5:30
They may not have if they don't have a
phone number,
5:33
then we mark, oh, sorry, yeah, we wanna
change this.
5:37
A phone number is optional.
5:41
We wanna make that phone number optional.
5:42
If they don't have a phone number, it
won't be there.
5:44
The tab after job, if they don't have a
Twitter account,
5:48
the tab after job will actually be a new
line.
5:51
So that tab won't be there.
5:54
We wanna mark that tab as being optional.
5:55
And really over here in the company name,
5:58
we should add in a dot as being a possible
character.
6:01
Because we've got that one, that "Co."
6:04
So we want to be able to mark that, or
catch it.
6:05
And then some of them don't have Twitter
accounts, so
6:08
let's make Twitter optional as well.
6:10
The other thing we need to add, because we
marked beginning and end of the string.
6:13
And our string is this entire thing.
6:17
We want our string to be in one line.
6:20
Right?
6:24
So what we need to do is we need to add in
re.MULTILINE.
6:25
And what that says is: take each line
break, each \n in our text.
6:29
Treat that as the end of the string.
6:33
So, it turns our one big string into a lot
of strings,
6:35
as far as the regular expression engine is
concerned.
6:39
Okay?
6:42
If we want we can do re.M, instead of
re.MULTILINE.
6:43
So either way that's gonna work.
6:48
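A minimal sketch of what the flag changes, using a made-up three-line string:

```python
import re

# Hypothetical three-line string.
text = "one\ntwo\nthree"

whole = re.findall(r"^\w+$", text)                     # no flag
per_line = re.findall(r"^\w+$", text, re.MULTILINE)    # ^ and $ apply to every line
short_flag = re.findall(r"^\w+$", text, re.M)          # same flag, shorter name

print(whole, per_line, short_flag)
```

Without the flag nothing matches, because the text as a whole is not a single run of word characters. With it, each line is checked against ^ and $ separately.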
All right, let's try this one out.
6:51
Look at that.
We've got a whole lot more stuff.
6:55
I do believe we've got everything for
everyone.
6:56
There's the doctor, even with his big
email address.
6:59
[BLANK_AUDIO]
7:02
We got Tim.
7:05
We got everybody in there.
7:06
All of our stuff is there.
7:07
So that's amazing.
7:09
That's awesome.
7:09
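Here is a sketch of the revised pattern with the anchors, the optional pieces, and the flag in place. The data is hypothetical: one full line, one with no Twitter handle, and one with no phone number.

```python
import re

# Hypothetical data, not the course's text file.
data = (
    "Doe, Jane\tjane@example.com\t(555) 555-0100\tTeacher, Example School\t@janedoe\n"
    "Roe, Sam\tsam@example.com\t555-555-0101\tEditor, Example Press Co.\n"
    "Poe, Alex\talex@example.com\t\tWriter, Example Books\t@alexpoe\n"
)

rows = re.findall(r'''
    ^([-\w ]*,\s[-\w ]+)\t             # Last and first names
    ([-\w\d.+]+@[-\w\d.]+)\t           # Email
    (\(?\d{3}\)?-?\s?\d{3}-\d{4})?\t   # Phone (optional)
    ([\w\s]+,\s[\w\s.]+)\t?            # Job and company, then an optional tab
    (@[\w\d]+)?$                       # Twitter (optional)
''', data, re.X | re.MULTILINE)

for row in rows:
    print(row)
```

All three lines now match. An optional group that finds nothing shows up as an empty string in its tuple, and the job group can hold on to a trailing tab because \s matches tabs too.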
So, what we wanna do now though, is we
wanna make this regular expression better.
7:11
It's really handy as it is, but
7:15
it's just giving us out a list of tuples
when we do this findall.
7:17
And no matter what we did, we would only
get tuples, and
7:21
we would get like index positions.
7:24
What I wanna do though, is I wanna be able
to turn this into a dictionary, so
7:26
that I can use that dictionary and do
something else with it.
7:29
So let's take our groups and make them
named groups.
7:34
So the way that we do that, we don't have
to change any of our code.
7:38
Our code gets to stay the same.
7:41
We just add on a couple of things.
7:42
We add a question mark and a capital P, and this
is what makes it a named group.
7:43
And then we specify the name inside of
less than and greater than signs.
7:48
So we're gonna name this first group name,
cuz that's what it is.
7:52
The second group we're going to name
email.
7:58
The third group we will name phone.
8:02
The fourth group we'll name job.
8:06
And the last group, we'll name Twitter.
8:10
All right?
I think that's pretty good.
8:16
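To show just the naming syntax, here is a cut-down sketch with only the first two groups. The sample line is made up.

```python
import re

# Hypothetical two-field line.
match = re.search(r'''
    ^(?P<name>[-\w ]*,\s[-\w ]+)\t     # Last and first names
    (?P<email>[-\w\d.+]+@[-\w\d.]+)$   # Email
''', "Doe, Jane\tjane@example.com", re.X)

name = match.group("name")
email = match.group("email")
by_index = match.group(1)   # named groups keep their index positions too

print(name, email, by_index)
```

The ?P<...> part only labels the group. What the group matches stays exactly the same.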
But let's actually, instead of doing all
of this here and
8:18
printing, let's make this a little easier
for ourselves.
8:23
Let's say line equals and let's do a
search.
8:25
[BLANK_AUDIO]
8:28
And then we need to get rid of one of
these.
8:30
All right?
So line is a search.
8:32
For right now it's just gonna be that
first line.
8:34
It's just gonna be me.
8:35
But we can print out what this gets.
8:37
So let's print out line.
8:39
[BLANK_AUDIO]
8:41
And then let's also print out
line.groupdict().
8:43
And let's see what these two things do.
8:50
So, okay, let's come down here, address
book.
8:52
So when we print out line, we get this
match object.
8:55
All right.
8:57
And the match object catches a whole bunch
of stuff.
8:58
But when we print the dictionary, look
what we get.
9:00
We've got the dictionary that has the name
and email address, and the job.
9:02
Yeah, it gets the \t on the job, but
that's okay.
9:06
And Twitter gets kennethlove and the phone
gets the phone number.
9:10
So we got all this stuff.
9:12
That's so much better than what we've
gotten before when we
9:14
were just getting these tuples.
9:17
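A sketch of the search and groupdict step with the named version of the pattern. As before, the data lines are hypothetical and the pattern is a best reading of the narration.

```python
import re

# Hypothetical data, not the course's text file.
data = (
    "Doe, Jane\tjane@example.com\t(555) 555-0100\tTeacher, Example School\t@janedoe\n"
    "Roe, Sam\tsam@example.com\t555-555-0101\tEditor, Example Press Co.\n"
)

line = re.search(r'''
    ^(?P<name>[-\w ]*,\s[-\w ]+)\t               # Last and first names
    (?P<email>[-\w\d.+]+@[-\w\d.]+)\t            # Email
    (?P<phone>\(?\d{3}\)?-?\s?\d{3}-\d{4})?\t    # Phone (optional)
    (?P<job>[\w\s]+,\s[\w\s.]+)\t?               # Job and company
    (?P<twitter>@[\w\d]+)?$                      # Twitter (optional)
''', data, re.X | re.MULTILINE)

print(line)               # the match object for the first matching line
info = line.groupdict()   # dictionary keyed by group name
print(info)
```

re.search stops at the first match, so only the first line is returned. The job value keeps its trailing tab, which could be cleaned up afterward with something like info["job"].strip().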
So our next video, we've got just two last
big steps and
9:20
we'll have turned this into something
absolutely amazing.
9:24
Wow, using groups, especially named
groups,
9:26
makes our string almost act like an object
or dictionary.
9:29
We've turned a simple string into really
useful data, good job, us.
9:32
All right, just a bit more to go, and
we'll have this in the bag.
9:37
In our next video, let's look at making
reusable patterns, and
9:39
how we can loop over our addresses in a
more useful manner.
9:43