Introduction

I recently needed to take a couple of pages out of a PDF and save them to a new PDF. This is a fairly simple task, but every time I do it, it takes some time to figure out the right command line parameters to make it work. In addition, my co-workers wanted similar functionality, and since they are not comfortable on the command line, I wanted to build a small graphical front end for this task.

One solution is to use Gooey, which is a really good option that I cover in my prior article. However, I wanted to try out another library and decided to give appJar a try. This article walks through an example of using appJar to create a GUI that allows a user to select a PDF, strip out one or more pages and save them to a new file. This approach is simple, useful and shows how to integrate a GUI into other python applications you create.

The State of GUIs in Python

One of the most common questions on the python subreddit is something along the lines of “What GUI should I use?” There is no shortage of options, but there’s a pretty steep learning curve for many of them. In addition, some work to varying degrees on different platforms and many have been dormant for quite a while. It is not an easy question to answer.

From a high level, the big GUI categories are:

Qt

wxWidgets (formerly wxWindows)

Tkinter

Custom libraries (Kivy, Toga, etc)

Web technology based solutions (HTML, Chrome-based, etc.)

In addition to this ecosystem, there are several types of wrapper and helper apps to make development simpler. For example, Gooey is a nice way to leverage argparse to build a wxPython GUI for free. I have had a lot of success using this approach to enable end users to interact with my python scripts. I highly recommend it, especially since wxPython now works on python 3.

The downside to Gooey is that there is limited ability to construct an application outside of the “Gooey way.” I wanted to see what else was out there that met the following requirements:

Is simple to use for a quick and dirty script

Provides more interaction options than a typical command line approach

Works and looks decent on Windows

Is easy to install

Is actively maintained

Works on python 3

Runs quickly

Cross-platform on Linux is a plus

It turns out that appJar fits my criteria pretty well.

What is appJar

appJar was developed by an educator who wanted a simpler GUI creation process for his students. The library provides a wrapper around Tkinter (which ships by default with python) and takes away a lot of the challenging boilerplate of creating an application.

The library is under active development. In fact, a new release was made as I pulled this article together. The documentation is extensive and has pretty good examples. It only took me a couple of hours of playing around with the code to get a useful application up and running. I suspect I will use this final application on a frequent basis when I need to pull select pages out of a PDF document. I may also expand it to allow concatenation of multiple documents into a new one.

Before I go much further, I want to address Tkinter. I know that Tkinter has a really bad reputation for not looking very modern. However, the newer ttk themes do look much better, and I think that the final app looks pretty decent on Windows. On Linux, it’s not a work of art, but it does work. At the end of the day, this blog is about helping you create solutions that are quick and powerful and get the job done. If you want a really polished GUI that looks native on your OS, you may need to investigate some of the more full-featured options. If you want something that works and that you can get done quickly, then appJar is worth considering.

In order to give you a sense of how it looks, here is the final app running on Windows:

[Image: the final app running on Windows]

It’s pretty good looking in my opinion.

Solving the Problem

The goal of this program is to make it quick and easy to take a subset of pages out of a PDF file and save it into a new file. There are many programs that can do this in Windows, but I have found that many of the “free” ones have ads or other bloated components. The command line works, but sometimes a GUI is much simpler, especially when navigating lots of file paths or trying to explain to less technical users.

In order to do the actual PDF manipulation, I’m using the PyPDF2 library. The python PDF toolkit ecosystem is kind of confusing, but this library seems to have been around a long time and more recently has seen an uptick of activity on GitHub. The other nice aspect is that PyPDF2 is covered in Automate The Boring Stuff, so there is a body of additional examples out there.

Here’s the start of a simple script that has a hardcoded input, output and page range.

    from PyPDF2 import PdfFileWriter, PdfFileReader

    infile = "Input.pdf"
    outfile = "Output.pdf"

    page_range = "1-2,6"

Next, we instantiate the PdfFileWriter and PdfFileReader objects and create the actual Output.pdf file:

    output = PdfFileWriter()
    input_pdf = PdfFileReader(open(infile, "rb"))
    output_file = open(outfile, "wb")

The most complicated aspect of the code is splitting up the page_range into a sequential python list of pages to extract. Stack Overflow to the rescue!

    page_ranges = (x.split("-") for x in page_range.split(","))
    range_list = [i for r in page_ranges for i in range(int(r[0]), int(r[-1]) + 1)]

The final step is to copy each page from the input and save it to the output:

    for p in range_list:
        # Subtract 1 to deal with 0 index
        output.addPage(input_pdf.getPage(p - 1))
    output.write(output_file)

That is all pretty simple and is yet another example of how powerful python can be when it comes to solving real world problems. The challenge is that this approach is not very useful when you want to let other people interact with it.
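To make the range parsing easier to follow, here is a minimal sketch of the same logic written as a plain loop. The function name parse_page_ranges is my own and is not part of the script or of PyPDF2; the only behavioral difference from the one-liner is that it strips whitespace around each comma-separated part.

```python
def parse_page_ranges(page_range):
    """Expand a string such as "1-2,6" into a list of 1-based page numbers."""
    pages = []
    for part in page_range.split(","):
        # "4-10" splits into ["4", "10"]; a single page "6" splits into ["6"],
        # so the first and last elements are the start and end of the range.
        bounds = part.strip().split("-")
        start, end = int(bounds[0]), int(bounds[-1])
        pages.extend(range(start, end + 1))
    return pages


print(parse_page_ranges("1-2,6"))     # [1, 2, 6]
print(parse_page_ranges("1,3,4-10"))  # [1, 3, 4, 5, 6, 7, 8, 9, 10]
```

Like the original one-liner, this raises a ValueError on input that is not made of integers, so it still needs some validation in front of it.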

Building the appJar GUI

Now we can walk through integrating that code snippet into a GUI that will:

Allow the user to select a PDF file using a standard file explorer GUI

Select an output directory and file name

Type in a custom range to extract pages

Have some error checking to make sure users enter the right information

The first step is to install appJar with pip install appjar.

The actual coding starts with importing all the components we need:

    from appJar import gui
    from PyPDF2 import PdfFileWriter, PdfFileReader
    from pathlib import Path

Next, we can build up the basic GUI app:

    # Create the GUI Window
    app = gui("PDF Splitter", useTtk=True)
    app.setTtkTheme("default")
    app.setSize(500, 200)

These 3 lines set up the basic structure of the app. I have decided to set useTtk=True because the app looks a little better when this is enabled. The downside is that ttk support is still in beta, but for this simple app it works well for me.

I also chose to set the theme to default in this article. On a Windows system, I set it to ‘vista’, which looks better in my opinion. If you want to see all the themes available on a system, use app.getTtkThemes() and experiment with those values.

Here is a summary of how the different themes look on Windows and Ubuntu.

[Image: comparison of the ttk themes on Windows and Ubuntu]

Some of the distinctions are subtle, so feel free to experiment and see what you prefer.

The next step is to add the labels and data entry widgets:

    # Add the interactive components
    app.addLabel("Choose Source PDF File")
    app.addFileEntry("Input_File")

    app.addLabel("Select Output Directory")
    app.addDirectoryEntry("Output_Directory")

    app.addLabel("Output file name")
    app.addEntry("Output_name")

    app.addLabel("Page Ranges: 1,3,4-10")
    app.addEntry("Page_Ranges")

For this application, I chose to explicitly call out the Label, then the Entry. appJar also supports a combined widget called LabelEntry which puts everything on one line. In my experience, the choice comes down to aesthetics, so play around with the options and see which ones look good in your application.

The most important thing to remember at this point is that the name passed to each Entry widget (for example "Input_File") is what you will use later to get the value the user entered.
The next step is to add the buttons. This code will add a “Process” and a “Quit” button. When either button is pressed, it will call the press function:

    # link the buttons to the function called press
    app.addButtons(["Process", "Quit"], press)

Finally, make the application go:

    # start the GUI
    app.go()

This basic structure accomplishes most of the GUI work. Now, the program needs to read in any input, validate it and execute the PDF splitting (similar to the example above). The first function we need to define is press. This function will be called when either of the buttons is pressed.

    def press(button):
        if button == "Process":
            src_file = app.getEntry("Input_File")
            dest_dir = app.getEntry("Output_Directory")
            page_range = app.getEntry("Page_Ranges")
            out_file = app.getEntry("Output_name")
            errors, error_msg = validate_inputs(src_file, dest_dir, page_range, out_file)
            if errors:
                app.errorBox("Error", "\n".join(error_msg), parent=None)
            else:
                split_pages(src_file, page_range, Path(dest_dir, out_file))
        else:
            app.stop()

This function takes one parameter, button, which will be either “Process” or “Quit”. If the user selects Quit, then app.stop() will shut down the app.

If the Process button is clicked, then the input values are retrieved using app.getEntry(). Each value is stored and then validated by calling the validate_inputs function. If there are errors, we can display them using a popup box, app.errorBox. If there are no errors, we can split the file up using split_pages.

Let’s look at the validate_inputs function.

    def validate_inputs(input_file, output_dir, range, file_name):
        errors = False
        error_msgs = []

        # Make sure a PDF is selected
        if Path(input_file).suffix.upper() != ".PDF":
            errors = True
            error_msgs.append("Please select a PDF input file")

        # Make sure a range is selected
        if len(range) < 1:
            errors = True
            error_msgs.append("Please enter a valid page range")

        # Check for a valid directory
        if not (Path(output_dir)).exists():
            errors = True
            error_msgs.append("Please Select a valid output directory")

        # Check for a file name
        if len(file_name) < 1:
            errors = True
            error_msgs.append("Please enter a file name")

        return (errors, error_msgs)

This function executes a couple of checks to make sure there is data in the fields and that it is valid. I do not claim this will stop all errors, but it does give you an idea of how to check everything and how to collect errors in a list.

Now that all the data is collected and validated, we can call the split function to process the input file and create an output file with a subset of the data.
    def split_pages(input_file, page_range, out_file):
        output = PdfFileWriter()
        input_pdf = PdfFileReader(open(input_file, "rb"))
        output_file = open(out_file, "wb")

        # https://stackoverflow.com/questions/5704931/parse-string-of-integer-sets-with-intervals-to-list
        page_ranges = (x.split("-") for x in page_range.split(","))
        range_list = [i for r in page_ranges for i in range(int(r[0]), int(r[-1]) + 1)]

        for p in range_list:
            # Need to subtract 1 because pages are 0 indexed
            try:
                output.addPage(input_pdf.getPage(p - 1))
            except IndexError:
                # Alert the user and stop adding pages
                app.infoBox("Info", "Range exceeded number of pages in input.\nFile will still be saved.")
                break
        output.write(output_file)

        if (app.questionBox("File Save", "Output PDF saved. Do you want to quit?")):
            app.stop()

This function introduces a couple of additional appJar concepts. First, app.infoBox is used to let the user know when they enter a range that includes more pages than are in the document. I have made the decision to just process through the end of the file and let the user know.

Once that file is saved, the program uses app.questionBox to ask the user whether they want to quit. If so, then we use app.stop() to gracefully exit.
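As written, validate_inputs only checks that the page range field is not empty, and split_pages only discovers an out-of-range page when getPage raises an IndexError. If you want to catch more problems before any file is touched, a stricter check might look like the sketch below. The helper name page_range_errors and its messages are my own invention and not part of appJar or PyPDF2; the optional num_pages argument is assumed to be the page count of the input file (for example from the reader's getNumPages() method).

```python
import re

# One or more comma-separated items, each a page number or a "start-end" pair.
RANGE_PATTERN = re.compile(r"\s*\d+\s*(-\s*\d+\s*)?(,\s*\d+\s*(-\s*\d+\s*)?)*")


def page_range_errors(page_range, num_pages=None):
    """Return a list of problems found in a page range string (hypothetical helper)."""
    if not RANGE_PATTERN.fullmatch(page_range):
        return ["Please enter a page range such as 1,3,4-10"]

    errors = []
    for part in page_range.split(","):
        bounds = [int(b) for b in part.split("-")]
        start, end = bounds[0], bounds[-1]
        if start < 1:
            errors.append(f"Page numbers start at 1: {part.strip()}")
        elif end < start:
            errors.append(f"Range runs backwards: {part.strip()}")
        elif num_pages is not None and end > num_pages:
            errors.append(f"Document only has {num_pages} pages: {part.strip()}")
    return errors


print(page_range_errors("1,3,4-10"))          # []
print(page_range_errors("1-9", num_pages=4))  # one message about the page count
```

The returned messages could be appended to the error_msgs list in validate_inputs so they show up in the same error popup. Whether to reject an oversized range up front or keep the current behavior of saving what is available is a design choice; the original script deliberately does the latter.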

The Complete Code

All of the code will be stored on GitHub, but here is the final solution:

    from appJar import gui
    from PyPDF2 import PdfFileWriter, PdfFileReader
    from pathlib import Path

    # Define all the functions needed to process the files


    def split_pages(input_file, page_range, out_file):
        """ Take a pdf file and copy a range of pages into a new pdf file

        Args:
            input_file: The source PDF file
            page_range: A string containing a range of pages to copy: 1-3,4
            out_file: File name for the destination PDF
        """
        output = PdfFileWriter()
        input_pdf = PdfFileReader(open(input_file, "rb"))
        output_file = open(out_file, "wb")

        # https://stackoverflow.com/questions/5704931/parse-string-of-integer-sets-with-intervals-to-list
        page_ranges = (x.split("-") for x in page_range.split(","))
        range_list = [i for r in page_ranges for i in range(int(r[0]), int(r[-1]) + 1)]

        for p in range_list:
            # Need to subtract 1 because pages are 0 indexed
            try:
                output.addPage(input_pdf.getPage(p - 1))
            except IndexError:
                # Alert the user and stop adding pages
                app.infoBox("Info", "Range exceeded number of pages in input.\nFile will still be saved.")
                break
        output.write(output_file)

        if (app.questionBox("File Save", "Output PDF saved. Do you want to quit?")):
            app.stop()


    def validate_inputs(input_file, output_dir, range, file_name):
        """ Verify that the input values provided by the user are valid

        Args:
            input_file: The source PDF file
            output_dir: Directory to store the completed file
            range: A string containing a range of pages to copy: 1-3,4
            file_name: Output name for the resulting PDF

        Returns:
            True if error and False otherwise
            List of error messages
        """
        errors = False
        error_msgs = []

        # Make sure a PDF is selected
        if Path(input_file).suffix.upper() != ".PDF":
            errors = True
            error_msgs.append("Please select a PDF input file")

        # Make sure a range is selected
        if len(range) < 1:
            errors = True
            error_msgs.append("Please enter a valid page range")

        # Check for a valid directory
        if not (Path(output_dir)).exists():
            errors = True
            error_msgs.append("Please Select a valid output directory")

        # Check for a file name
        if len(file_name) < 1:
            errors = True
            error_msgs.append("Please enter a file name")

        return (errors, error_msgs)


    def press(button):
        """ Process a button press

        Args:
            button: The name of the button. Either Process or Quit
        """
        if button == "Process":
            src_file = app.getEntry("Input_File")
            dest_dir = app.getEntry("Output_Directory")
            page_range = app.getEntry("Page_Ranges")
            out_file = app.getEntry("Output_name")
            errors, error_msg = validate_inputs(src_file, dest_dir, page_range, out_file)
            if errors:
                app.errorBox("Error", "\n".join(error_msg), parent=None)
            else:
                split_pages(src_file, page_range, Path(dest_dir, out_file))
        else:
            app.stop()


    # Create the GUI Window
    app = gui("PDF Splitter", useTtk=True)
    app.setTtkTheme("default")
    app.setSize(500, 200)

    # Add the interactive components
    app.addLabel("Choose Source PDF File")
    app.addFileEntry("Input_File")

    app.addLabel("Select Output Directory")
    app.addDirectoryEntry("Output_Directory")

    app.addLabel("Output file name")
    app.addEntry("Output_name")

    app.addLabel("Page Ranges: 1,3,4-10")
    app.addEntry("Page_Ranges")

    # link the buttons to the function called press
    app.addButtons(["Process", "Quit"], press)

    # start the GUI
    app.go()