You have completed Analyzing Books with Pandas!
Determine which author in the dataset has written the most pages and more.
Welcome back.
0:00
In this video we're going to tackle
some questions around pages.
0:01
Let's add them now.
0:04
Markdown, and here we go.
0:08
Who wrote the most pages?
0:12
Another markdown,
what's an author's average page count?
0:18
And one more.
0:32
How many books have been written
with less than 200 pages?
0:35
Let's tackle who wrote the most pages.
0:45
First, we need to find all of
the unique authors in the dataset
0:52
since many authors wrote multiple books.
0:57
We're going to run books,
1:00
select the authors column,
1:03
and call .unique() on it.
1:07
And you can see it gives us an array
of all the different authors.
1:12
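A minimal sketch of that step, assuming the DataFrame is named books and has an authors column as described in the video. The rows here are made up for illustration and are not the course dataset.

```python
import pandas as pd

# Hypothetical sample rows standing in for the course's books dataset.
books = pd.DataFrame({
    "authors": ["J.K. Rowling", "J.R.R. Tolkien", "Stephen King",
                "Stephen King", "J.K. Rowling"],
    "num_pages": [300, 700, 400, 500, 150],
})

# unique() returns an array holding each author once,
# in the order they first appear in the column.
all_authors = books["authors"].unique()
print(all_authors)
```

Authors with several books show up only once in the result, which is what we want before totaling pages per author.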
Now we need to work out
how to get the sum for
1:17
all of the pages related
to a specific author.
1:20
Let's set this equal to a variable
right now, all_authors.
1:25
And then let's do books.loc,
1:33
where books, authors, and
1:38
let's just do Stephen King.
1:43
Cuz I know he's a relatively
famous author, and
1:48
I know he's written multiple books, so
I feel like he's a good example to use.
1:52
Num_pages.
1:59
Okay, so we can see all of the IDs,
and then all of the page counts, or
2:03
the number of pages for all of the books
that have the author of Stephen King.
2:08
So it filtered the books down to
the rows where authors
2:13
equals Stephen King.
2:17
And it's only returning
the number of pages column.
2:19
So we can see it's quite a lot.
2:23
Now, a fun thing we can do here at the end
that makes our lives a lot easier.
2:25
We just add .sum, and
it will sum it all up for us.
2:29
So we get a total of 1,800.
2:34
No sorry, we get a total of
18,219 pages for Stephen King.
2:38
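Here is roughly what that filter and sum look like, assuming the column names authors and num_pages from the video. The sample rows are invented, so the total differs from the 18,219 seen on screen.

```python
import pandas as pd

# Hypothetical sample rows standing in for the course's books dataset.
books = pd.DataFrame({
    "authors": ["J.K. Rowling", "J.R.R. Tolkien", "Stephen King",
                "Stephen King", "J.K. Rowling"],
    "num_pages": [300, 700, 400, 500, 150],
})

# Row selector: True where the author matches.
# Column selector: only the num_pages column.
king_pages = books.loc[books["authors"] == "Stephen King", "num_pages"]
print(king_pages)  # index labels alongside page counts

# .sum() adds up the selected page counts.
king_total = king_pages.sum()
print(king_total)
```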
Now, let's think this through.
2:46
We know how to get all of our authors, and
2:48
we know how to get a single author's
page count to see who has the most.
2:51
So we're going to need to compare
all of the author's page totals, and
2:57
then see who has the highest value.
3:02
There are a few different
ways to tackle this.
3:05
One way is to create a max variable,
3:07
I'm going to put it up here at the top,
and set it equal to zero.
3:11
And then we can loop through our
authors to calculate their page total.
3:18
Compare it to this max value.
3:23
And if it's larger,
then we can update the value.
3:26
And let's also hold the author's name
as well so we know who we end up with.
3:30
And I'm going to do top_author, and
I'm going to set it equal to None for
3:35
now so
that we can set it as an author's name.
3:40
So let's turn this into a loop.
3:44
So we need to get all of our authors and
now we need to loop through them all.
3:47
So for author in all_authors.
3:52
We're going to do, tab this over, and
3:58
this is going to be our
total_pages equals.
4:02
And instead of Stephen King,
we need to pass in our author so
4:08
that we get to each author
as it loops through.
4:13
Awesome, and
then next we need to check if their
4:22
total_pages is greater than
the current max value.
4:27
If it is, then the max needs to
now be set equal to total pages so
4:35
that they now have the top spot.
4:40
And our top author now is going
to be set equal to that author.
4:45
And then let's print out the max value.
4:53
And let's print out the author or
the top_author.
4:57
Actually, it doesn't matter cuz
they will be the same thing.
5:01
And then at the end, outside of our for
5:05
loop, I'm going to print the max again,
5:10
and print the author.
5:15
And this is just so
we can see as the for loop is running,
5:20
which authors kind of
take over the top spot,
5:24
the leaderboard and then at the end,
who came out on top.
5:28
And I think something, I think this
one I need to do as top_author.
5:34
That was my mistake.
5:40
Let's run it again.
5:42
Okay, so we can see it's running and
we got J.K. Rowling, and
5:44
then another form of J.K. Rowling, because
sometimes the names in the authors column aren't split up, but
5:47
that's okay for
what we're doing right now.
5:52
And then we got J.R.R. Tolkien, and
then we got Stephen King, and
5:55
then Stephen King ended up being
our top author with 18,219 pages.
5:59
Awesome.
6:04
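The loop described above can be sketched like this. It is a reconstruction from the narration, not the exact notebook cell: the sample rows are made up, the placement of the print inside the if is a guess, and the variable the video calls max is renamed max_pages here so it doesn't shadow Python's built-in max.

```python
import pandas as pd

# Hypothetical sample rows standing in for the course's books dataset.
books = pd.DataFrame({
    "authors": ["J.K. Rowling", "J.R.R. Tolkien", "Stephen King",
                "Stephen King", "J.K. Rowling"],
    "num_pages": [300, 700, 400, 500, 150],
})

all_authors = books["authors"].unique()

max_pages = 0      # highest page total seen so far
top_author = None  # author who holds that total

for author in all_authors:
    # Total pages across every book by this author.
    total_pages = books.loc[books["authors"] == author, "num_pages"].sum()
    if total_pages > max_pages:
        # This author takes over the top spot.
        max_pages = total_pages
        top_author = author
        print(max_pages, top_author)

# Final result after the loop has checked everyone.
print(max_pages, top_author)

# One of the "few different ways" mentioned: group by author,
# sum the pages, and ask for the label of the largest total.
totals = books.groupby("authors")["num_pages"].sum()
print(totals.idxmax(), totals.max())
```

The groupby version is not shown in the video. It is included only as a cross-check that gives the same top author without a hand-written loop.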
Now our next question,
what's an author's average page count?
6:05
We can reuse the page count code from above.
6:09
So.
6:12
We got our total pages.
6:15
I'm just going to copy this
6:17
row here, and paste it.
6:21
And I'm going to do their pages.
6:25
And then the same thing as before,
6:29
I'm just going to use
Stephen King as our example.
6:31
Just cuz he just won the top
number of pages written.
6:37
So we've got their number of
pages. Now we need to know the number of
6:45
books that they've written.
6:49
So their books, we can do
6:51
books where the authors is
6:56
equal to Stephen King.
7:02
And we can do.
7:07
Okay, so we can wrap this in
parentheses and then do value_counts.
7:11
And it looks like we have some Trues and
Falses for whether that comparison is equal.
7:19
And let's return just this first value.
7:25
So we can see that they have 40
books where the author ends up being
7:27
Stephen King.
7:32
So this will be their books.
7:35
Now for a bit of math.
7:41
Their average_pages is
7:43
their pages divided by their books.
7:48
And then we can print their average pages,
7:56
and we get about 455 pages per book.
8:02
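A small sketch of the average calculation, using made-up rows. One swap to note: the video gets the book count from value_counts and picks out the True count, while this sketch calls .sum() on the True/False comparison, which counts the True values directly. The names their_pages, their_books, and average_pages follow the narration.

```python
import pandas as pd

# Hypothetical sample rows standing in for the course's books dataset.
books = pd.DataFrame({
    "authors": ["J.K. Rowling", "J.R.R. Tolkien", "Stephen King",
                "Stephen King", "J.K. Rowling"],
    "num_pages": [300, 700, 400, 500, 150],
})

author = "Stephen King"  # switch this name to check someone else

# True for each row written by this author, False otherwise.
is_author = books["authors"] == author

# Total pages for the author.
their_pages = books.loc[is_author, "num_pages"].sum()

# True counts as 1 and False as 0, so the sum is the number of books.
their_books = is_author.sum()

average_pages = their_pages / their_books
print(average_pages)
```

With the figures from the video, 18,219 pages over 40 books works out to about 455 pages per book.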
And then all you have to do if you wanna
see a different author is just switch out
8:10
the author's name to someone else.
8:14
I could do J.R.R. Tolkien,
make sure I spell that right.
8:16
I did not, I-E-N.
8:23
And make sure I have the right
number of periods and things, cool.
8:27
Copy it, and paste it,
and run it, 737 pages.
8:33
Wow.
8:40
I can't imagine writing
that many pages for a novel.
8:41
Lastly, we have how many books have
been written with less than 200 pages?
8:45
I wanna give you this to
try on your own first.
8:50
Pause me and see what you come up with,
then unpause me and see what I wrote.
8:53
Okay, so we need to filter our books
9:00
where the number of
pages is less than 200.
9:06
Cool.
9:14
And then to figure out how many there are,
9:15
we can actually just use
len to get the length.
9:18
And it looks like there are 2,898 books
with less than 200 pages in our dataset.
9:23
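For reference, one way to write that filter and count, again with invented rows and the assumed num_pages column name, so the count here is not the 2,898 from the real dataset.

```python
import pandas as pd

# Hypothetical sample rows standing in for the course's books dataset.
books = pd.DataFrame({
    "authors": ["J.K. Rowling", "J.R.R. Tolkien", "Stephen King",
                "Stephen King", "J.K. Rowling"],
    "num_pages": [300, 700, 400, 500, 150],
})

# Keep only the rows whose page count is below 200.
short_books = books[books["num_pages"] < 200]

# len() on a DataFrame returns its number of rows.
short_count = len(short_books)
print(short_count)
```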
Nice job, Pythonistas.
9:30