You have completed Regular Expressions in Python!
Exact matches and escape patterns give us flexibility in what we're looking for, but when something occurs multiple times, we have to specify it multiple times. Counts make Python do that work for us.
New terms
- \w{3} - matches any three word characters in a row.
- \w{,3} - matches 0, 1, 2, or 3 word characters in a row.
- \w{3,} - matches 3 or more word characters in a row. There's no upper limit.
- \w{3,5} - matches 3, 4, or 5 word characters in a row. Don't put a space inside the braces: with a space, Python reads {3, 5} as literal characters rather than a count.
- \w? - matches 0 or 1 word characters.
- \w* - matches 0 or more word characters. Since there is no upper limit, this can match an arbitrarily long run of word characters.
- \w+ - matches 1 or more word characters. Like *, it has no upper limit, but it must occur at least once.
- .findall(pattern, text, flags) - finds all non-overlapping occurrences of the pattern in the text.
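The counts listed above can be compared side by side with a small sketch. The word list and the labels are made up for illustration, and `re.fullmatch` is used here (it is not part of the video) so that each pattern has to cover a whole word rather than just part of it.

```python
import re

# Hypothetical test words of increasing length (not from the course data).
words = ["", "a", "ab", "abc", "abcd", "abcdef"]

# Each label maps to one of the count patterns from the list above.
patterns = {
    "exactly 3": r"\w{3}",
    "0 to 3": r"\w{,3}",
    "3 or more": r"\w{3,}",
    "3 to 5": r"\w{3,5}",
    "0 or 1": r"\w?",
    "0 or more": r"\w*",
    "1 or more": r"\w+",
}


def matching_words(pattern):
    """Return the words that the pattern matches from start to end."""
    return [w for w in words if re.fullmatch(pattern, w)]


results = {label: matching_words(p) for label, p in patterns.items()}

for label, matched in results.items():
    print(f"{label:10} {matched}")
```

Running it shows, for example, that `\w{3,5}` accepts only "abc" and "abcd", while `\w*` accepts every word, including the empty one.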
As I'm sure you've noticed by now, having
to type \w eight times for
0:00
eight characters gets old pretty quick.
0:04
Python's regex engine lets us say that
something should occur a certain number
0:07
of times.
0:10
We can say that something occurs an exact
number of times by using the curly braces.
0:11
{3} says that something occurs exactly
three times, {,3} says that it
0:16
occurs 0 to 3 times, and {3,} says that it
occurs 3 or more times.
0:21
{3,5} says that it occurs 3, 4, or 5 times.
0:26
We can also use more generic counts.
0:29
The question mark says that something is
optional, it occurs 0 or 1 times.
0:32
The asterisk says that something occurs at
least zero times,
0:37
so it can not occur at all or it can occur
hundreds of times.
0:40
There's no upper bound to the asterisk.
0:44
And finally is the plus sign.
0:47
Like the asterisk, there's no upper bound,
but
0:49
the pattern has to occur at least once.
0:51
Let's look at using all of these to solve
some of the problems in our
0:54
blob of addresses.
0:57
So, like I said last time, I'm lazy, and
I'm not bored and
0:58
ridiculous enough to try and type \w as
many times as there are letters in a word.
1:03
So I'm pretty lucky that I can instead
1:09
use the plus sign to find places where we
have one or more letters.
1:15
So let's try that.
1:20
So, what we said here is that we want to
find word characters with a \w.
1:23
Those are any Unicode letters or digits,
plus the underscore.
1:30
I wanna find one or more of those and then
a comma, a space, and
1:37
again one or more word characters.
1:42
So let's give that a try.
1:45
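A minimal sketch of the name pattern just described. The sample line is a hypothetical stand-in for one row of the address data, which isn't reproduced in this transcript, and it assumes the lookup is done with `re.search`.

```python
import re

# Hypothetical row; the real course data file is not shown here.
line = "Lovelace, Ada\tada@example.com\t(555) 555-0100"

# One or more word characters, a comma, a space,
# then one or more word characters again.
name = re.search(r"\w+, \w+", line)

print(name.group())  # Lovelace, Ada
```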
And, look at that.
1:50
We've got my name.
1:51
So that's pretty cool.
1:53
If we go back,
1:54
we can actually take our number search
here, and we can clean this up too.
1:55
Because we know that we've got three
numbers here.
2:00
And we know that we've got three numbers
here.
2:05
And we know that we've got four numbers here.
2:09
I can say, hey, find the opening parenthesis, find
three digits, exactly three.
2:13
Find a closing parenthesis, find a space,
find three more numbers, a hyphen, and
2:19
then four more numbers.
2:24
So if we
2:25
run this one, then we get our phone number
match and our name match.
2:27
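Here is a sketch of that cleanup, again on a made-up row. It puts the repeated `\d` version next to the counted version to show they find the same text; the variable names are only illustrative.

```python
import re

# Hypothetical row; the phone number is invented for the example.
line = "Lovelace, Ada\tada@example.com\t(555) 555-0100"

# Spelled out: one \d per digit.
long_form = re.search(r"\(\d\d\d\) \d\d\d-\d\d\d\d", line)

# With counts: exactly three digits, three digits, then four digits.
short_form = re.search(r"\(\d{3}\) \d{3}-\d{4}", line)

print(short_form.group())  # (555) 555-0100
print(long_form.group() == short_form.group())  # True
```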
These are maybe a little bit less
readable, but
2:32
they're definitely more accepting.
2:36
In fact, let's make it just a little bit
more accepting.
2:38
We have some phone numbers.
2:40
If you look down here like say, this one
or this one,
2:42
that don't have parentheses on them, so
let's make these parentheses optional.
2:48
Now we do that by putting a question mark
after them.
2:53
Question mark says, this should show up
zero times or one time.
2:57
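The optional parentheses can be sketched like this. Both phone numbers are invented for illustration; the only change from the earlier pattern is a `?` after each escaped parenthesis.

```python
import re

# \(? and \)? each allow zero or one parenthesis.
pattern = r"\(?\d{3}\)? \d{3}-\d{4}"

# Hypothetical numbers, one with parentheses and one without.
with_parens = re.search(pattern, "(555) 555-0100")
without_parens = re.search(pattern, "555 555-0101")

print(with_parens.group())     # (555) 555-0100
print(without_parens.group())  # 555 555-0101
```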
So let's try that.
3:03
Oh, yeah, that's only gonna run on my
line.
3:08
So let's try this on multiple lines,
3:10
and let's change our search here to be
findall.
3:12
And what findall will do is it'll move
through the whole string,
3:17
the whole data variable, and find all the
non-overlapping places where this pattern matches.
3:21
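A rough sketch of the difference between a single search and `findall`, assuming a multi-line `data` string. The three rows are hypothetical; the third uses a hyphen after the area code, which this version of the pattern does not accept yet.

```python
import re

# Hypothetical multi-line data; the real file is not shown in the transcript.
data = (
    "Lovelace, Ada\t(555) 555-0100\n"
    "Hopper, Grace\t555 555-0101\n"
    "Turing, Alan\t555-555-0102\n"
)

pattern = r"\(?\d{3}\)? \d{3}-\d{4}"

# re.search stops at the first match it finds.
first = re.search(pattern, data).group()

# re.findall walks the whole string and returns every non-overlapping match.
all_found = re.findall(pattern, data)

print(first)      # (555) 555-0100
print(all_found)  # ['(555) 555-0100', '555 555-0101']
```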
So let's try running that again.
3:27
And we see we've got a bunch of phone
numbers here.
3:29
In fact, the only one we don't have is the
one that's like this, but
3:31
with a hyphen right there.
3:35
So let's put that one in too.
3:37
We've got the optional parentheses, but
3:39
let's put in a hyphen that is optional and
the space is actually optional too.
3:41
And let's do that as \s instead of the
actual space.
3:46
That will be a little bit more clear.
3:50
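One way the pattern could look after those changes, as a sketch on the same kind of made-up data: an optional hyphen and an optional `\s` sit between the area code and the rest of the number.

```python
import re

# Hypothetical rows covering the three phone formats discussed so far.
data = (
    "Lovelace, Ada\t(555) 555-0100\n"
    "Hopper, Grace\t555 555-0101\n"
    "Turing, Alan\t555-555-0102\n"
)

# Optional "(", three digits, optional ")", optional hyphen,
# optional whitespace character, three digits, a hyphen, four digits.
pattern = r"\(?\d{3}\)?-?\s?\d{3}-\d{4}"

phones = re.findall(pattern, data)

print(phones)  # ['(555) 555-0100', '555 555-0101', '555-555-0102']
```

Note that `\s` matches any whitespace character, not only a plain space, so it is slightly more accepting than typing the space itself.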
So let's save that and let's run this
again.
3:53
And we get the 555-1.
3:59
We did, there it is, 555 hyphen.
4:02
So that's great.
4:05
And you know what, I bet if we want to, we
can
4:08
take this a little bit further on our
findall.
4:14
And I bet, we can use this to find all of
our names.
4:21
So let's actually comment this one out,
just so
4:28
we've got a little bit less showing up.
4:30
And let's try running that again.
4:33
Well, we got all the names, but we also get
back the jobs and company.
4:36
So not exactly what we wanted.
4:41
We'll clean that up later, but if we look
at this,
4:43
we didn't get our name here on line five,
we didn't get Tim.
4:46
The reason we didn't get Tim is because we
said one or more.
4:52
There has to be something there.
4:56
I wonder if we can make it so that
doesn't have to be the case.
4:58
We also, if you see here, we got
Enchanter, Killer, for Tim.
5:02
And it's supposed to be Enchanter, Killer
Rabbit Cave.
5:07
So we'll worry about the Killer Rabbit
Cave in a little while.
5:12
We'll do that in a later video, but
for
5:15
now, let's see if we can get Tim in there.
5:17
So instead of this plus sign, let's do an
asterisk.
5:20
And the asterisk says it'll occur zero or
more times, with no upper limit.
5:24
Just, this thing, if it appears, cool,
show it to me.
5:29
If it doesn't, that's fine.
5:33
Move on.
5:35
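The switch from plus to asterisk can be sketched on two invented rows. The second row imitates the Tim line from the video, where nothing comes before the comma; the first row and both email addresses are hypothetical.

```python
import re

# Hypothetical rows: name, email, then "job, company".
data = (
    "Lovelace, Ada\tada@example.com\tMathematician, Analytical Engines\n"
    ", Tim\ttim@example.com\tEnchanter, Killer Rabbit Cave\n"
)

# With +, something has to come before the comma, so ", Tim" is skipped.
with_plus = re.findall(r"\w+, \w+", data)

# With *, the part before the comma may be empty, so ", Tim" is found.
with_star = re.findall(r"\w*, \w+", data)

print(with_plus)
# ['Lovelace, Ada', 'Mathematician, Analytical', 'Enchanter, Killer']
print(with_star)
# ['Lovelace, Ada', 'Mathematician, Analytical', ', Tim', 'Enchanter, Killer']
```

Both versions still pick up the job and company text, and only the first word of each company name, because `\w` does not match the spaces inside "Killer Rabbit Cave".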
So let's try running that.
5:36
And did we get Tim?
5:38
There we go, we got Tim.
5:40
So that's awesome.
5:43
We got Tim.
5:44
We'll do more about catching the company
names after we have a couple more tools at
5:45
our disposal.
5:49
Counts definitely help a lot in writing
smaller,
5:50
if sometimes less readable, patterns.
5:53
We'll talk about a way to make patterns
more readable in the next video,
5:55
along with ways to cut characters out of
being matched.
5:58
We'll also talk about how to make our
patterns a bit more restrictive than our
6:01
escape sequences are.
6:04