How to count the number of pages in a PDF file in Python

This article has no special prerequisites; we will use the PyPDF2 library. PyPDF2 is a free and open-source pure-Python PDF library that can split, merge, crop, and transform the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs. Refer to “Working with PDF files in Python” to explore PyPDF2 further.

Installing required library

Run the command below in the command prompt or terminal to install the PyPDF2 library.

pip install PyPDF2

Steps to count the number of pages in a PDF file

Step 1: Import the PyPDF2 library into the Python program

import PyPDF2

Step 2: Open the PDF file in read-binary mode ('rb') using file handling

file = open('your pdf file path', 'rb')

Step 3: Read the PDF using the PdfReader class of the PyPDF2 library

pdfReader = PyPDF2.PdfReader(file)

Note: These three steps are the same for every method shown below.

Methods to count PDF pages

We are going to learn three methods to count the number of pages in a PDF file, as follows:

  1. By using len(pdfReader.pages).
  2. By using the getNumPages() method.
  3. By using the pages property and the len() function.

Method 1: Using len(pdfReader.pages)

len(pdfReader.pages) returns the total number of pages in the PDF file. It replaces the numPages property of the PdfReader class found in older PyPDF2 code, which is deprecated.

totalPages1 = len(pdfReader.pages)

For Example:

Output:

Total Pages: 10

In the above example, we imported the PyPDF2 module and opened the file in read-binary mode. We then read the opened PDF file with the PdfReader class, counted its pages with len(pdfReader.pages), stored the total in a variable (totalPages1 in the snippet above) for further use, and finally printed that total page count.

Method 2: Using getNumPages() method

getNumPages() is a method of the PdfReader class that takes no arguments and returns the total number of pages as an integer. It has been deprecated since version 1.28.0, and PyPDF2 3.0 and later raise an error when it is called. Its replacement, the pages property combined with len(), is the next method discussed.

totalPages2 = pdfReader.getNumPages()

Output:

Total Pages: 10

In the above example, we imported the PyPDF2 module and opened the file in read-binary mode. We then read the opened PDF file with the PdfReader class, counted its pages with the getNumPages() method, stored the total in the variable totalPages2 for further use, and finally printed that total page count.

Method 3: Using pages property and len() function

pages is a read-only property of the PdfReader class that emulates a list of page objects. Passing it to len(), Python's built-in function for getting the length of a sequence, gives the total number of pages in the PDF. This is the same expression used in Method 1; here the focus is on the pages sequence itself.

totalPages3 = len(pdfReader.pages)

Output:

Total Pages: 10

In the above example, we imported the PyPDF2 module and opened the file in read-binary mode. We then read the opened PDF file with the PdfReader class, used the pages property to get the sequence of all pages in the PDF file, and counted them with the len() function. Finally, we stored the total in the variable totalPages3 for further use and printed that total page count.