Serdar Halac
15,259 Points
Why does [^gov\t] not get rid of the ".com" part as well as the ".gov" part?
I understand that it gets rid of the .gov by essentially saying "oh, it seems there's a g, which is part of the set [gov\t], so I will ignore everything starting from the g".
But why do the .com extensions still get found? .com contains the letter 'o', which is part of the set [gov\t]. Therefore shouldn't it go through the document, see the 'o' in the .com, and say "I'm stopping from here on out, because the letter o is part of the set I'm supposed to ignore" and just return an email with '.c' at the end instead of '.com'?
And actually, why do the @treehouse extensions also get included? 'treehouse' contains the letter 'o', so why does it not stop after 'treeh', since it's supposed to exclude 'g', 'o', and 'v'?
I'm very confused!
Help would be greatly appreciated.
TLDR: [gov\t] should block both .GOV and .cOm, no?
1 Answer
 
Steven Parker
243,134 Points
The explanation given for this particular regex seems a bit misleading. But the actual explanation is a bit complicated, so let me see if I can straighten it out.
First off, the caret (^) does not exclude things from the match, but from the character class that is being defined. So "[^gov\t]" actually means "any character other than g, o, v, or tab". Then adding the "+" quantifier to it ("[^gov\t]+") means "a match must have at least one character in this position that is not a g, o, v, or tab".
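As a quick check of what the negated class does by itself, here is a minimal Python sketch (the thread does not say which language the regex runs in, so Python's `re` module and the sample string are assumptions). On its own, the class really does stop at every g, o, or v, which is the behavior the question expected from the whole regex:

```python
import re

# A negated class matches runs of characters that are NOT g, o, v, or tab.
NOT_GOV = re.compile(r"[^gov\t]+")

# The 'o' in "treehouse" splits the text into two runs, and the
# trailing "gov" is never part of any run.
runs = NOT_GOV.findall("treehouse.gov")
print(runs)  # ['treeh', 'use.']
```

The difference in the full regex is that other parts come before this class, and those parts are free to consume an 'o'.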
So there are four significant parts to this regex:
- @ : this exact symbol (the first "\b" is redundant here: since "@" is not a word character, it only requires a word character just before the "@", which an email address always has)
- [-\w.]+ : one or more "word" characters, hyphens, or periods (the \d is redundant since "\w" includes digits)
- [^gov\t]+ : one or more characters other than g, o, v, or tab
- \b : a word boundary
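The four parts can be written out as a commented pattern. This is only a sketch: it assumes Python's `re` module with `re.VERBOSE`, it uses the simplified pattern described in the list rather than the original course regex, and the sample addresses are made up for illustration.

```python
import re

# The simplified pattern from the list above, one part per line.
PATTERN = re.compile(r"""
    @           # 1. a literal '@'
    [-\w.]+     # 2. one or more word characters, hyphens, or periods
    [^gov\t]+   # 3. one or more characters other than g, o, v, or tab
    \b          # 4. a word boundary
""", re.VERBOSE)

# Hypothetical sample addresses, searched one at a time.
samples = ["someone@spain.gov", "someone@treehouse.com"]
found = [PATTERN.search(s).group() for s in samples]
print(found)  # ['@spain.', '@treehouse.com']
```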
So any match must qualify by having all 4 of these elements. So here's why "@spain." is a match:
- @ : this character matches
- spain : these characters match the second part
- . : just the period matches the third part
- the position right after the period is also a word boundary (between "." and "g")
The "gov" at the end is not included because a match that included it would have to end with the "v" (the only letter of "gov" that is followed by a word boundary), and "v" is not allowed by the final character class.
And here's why "@treehouse.com" is a match:
- @ : this character matches
- treehouse.co : these characters match the second part
- m : just the 'm' matches the third part
- the 'm' also ends on a word boundary since nothing follows it
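Both walkthroughs can be checked by putting the second and third parts into named groups. This is a sketch assuming Python's `re` module; the group names `second` and `third`, the helper `parts`, and the sample addresses are hypothetical and only there to show which part of the regex consumed which characters.

```python
import re

# Same simplified pattern, with the second and third parts captured.
SPLIT = re.compile(r"@(?P<second>[-\w.]+)(?P<third>[^gov\t]+)\b")

def parts(address):
    """Return the text matched by the second and third parts."""
    m = SPLIT.search(address)
    return m.group("second"), m.group("third")

print(parts("someone@spain.gov"))      # ('spain', '.')
print(parts("someone@treehouse.com"))  # ('treehouse.co', 'm')
```

The second part is greedy and may contain g, o, or v, so it swallows as much as it can and only gives characters back until the third part and the word boundary can both succeed. That is why the 'o' in "treehouse" and in ".com" does not end the match.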
And a regex is case-sensitive unless the "i" flag (ignore case) is given.
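In Python, the "i" flag is spelled `re.IGNORECASE` (or `re.I`). A small sketch, with a made-up uppercase sample, showing how the flag changes what the negated class excludes:

```python
import re

text = "SPAIN.GOV"

# Case-sensitive: uppercase G, O, V are not in the excluded set,
# so the whole string is one run.
sensitive = re.findall(r"[^gov\t]+", text)

# Case-insensitive: G, O, V are now excluded as well.
insensitive = re.findall(r"[^gov\t]+", text, re.IGNORECASE)

print(sensitive)    # ['SPAIN.GOV']
print(insensitive)  # ['SPAIN.']
```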
Serdar Halac
15,259 Points
Detailed and great answer! Thank you. I played around and used [^v\t] instead and got the same result, which seems to indicate that the excluded letter sitting right next to the word boundary plays a key role. Thanks for the answer!
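That experiment can be reproduced with a short sketch, again assuming Python's `re` module, the simplified pattern from the answer, and made-up sample addresses. The ".io" address is an extra hypothetical case added to show that the two patterns are not interchangeable in general:

```python
import re

ORIGINAL = re.compile(r"@[-\w.]+[^gov\t]+\b")
ONLY_V = re.compile(r"@[-\w.]+[^v\t]+\b")

def compare(address):
    """Return what each pattern matches in the given address."""
    return ORIGINAL.search(address).group(), ONLY_V.search(address).group()

print(compare("someone@spain.gov"))      # ('@spain.', '@spain.')
print(compare("someone@treehouse.com"))  # ('@treehouse.com', '@treehouse.com')

# An address ending in 'o' is where the two patterns diverge.
print(compare("someone@site.io"))        # ('@site.', '@site.io')
```

For ".gov" and ".com" only the final letter decides the outcome, since the third part has to end exactly at a word boundary. With ".io" the final letter is an 'o', so excluding it matters.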