Read and Write PDF using OpenPDF








OpenPDF is based on a fork of iText version 4. iText is a widely used PDF library, but it changed its license and moved to AGPL. In this article, we will see how to write and read PDF files, how to extract text from a PDF, and how to create a password-protected PDF.

Maven Dependency

<dependency>
    <groupId>com.github.librepdf</groupId>
    <artifactId>openpdf</artifactId>
    <version>1.2.7</version>
</dependency>

<!-- Bouncy Castle is required if you want to encrypt or password-protect a PDF -->
<dependency>
    <groupId>org.bouncycastle</groupId>
    <artifactId>bcprov-jdk15on</artifactId>
    <version>1.60</version>
</dependency>

The sample below creates a two-page PDF. Create a Document instance with the page size and margins, and pass it to a PdfWriter together with the output stream. Then add text to the document as Paragraph objects.

import java.io.FileOutputStream;
import com.lowagie.text.Document;
import com.lowagie.text.PageSize;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

public void createPDF(String filename) {
    try {
        // A4 page with left, right, top and bottom margins of 50 points
        Document document = new Document(PageSize.A4, 50, 50, 50, 50);

        // Create a PDF writer instance and pass it the output stream
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(filename));

        document.open();
        document.addAuthor("Author-Name");
        document.addCreationDate();

        document.add(new Paragraph("Hello World -- Page 1"));
        document.add(new Paragraph("This is my first PDF."));

        // Start the second page
        document.newPage();

        document.add(new Paragraph("Hello World -- Page 2"));
        document.close();
    }
    catch(Exception exp) {
        System.out.println(exp.getMessage());
    }
}
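
The four numbers passed to the Document constructor are the left, right, top and bottom margins. Like the page size, they are measured in points, where one point is 1/72 of an inch. The sketch below is plain Java with no OpenPDF dependency. It assumes an A4 page of 595 x 842 points, and the class and method names are made up for illustration.

```java
// Illustrative helper, not part of OpenPDF.
class PageGeometry {
    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    // Converts a length in PDF points to millimetres.
    static double pointsToMillimetres(double points) {
        return points / POINTS_PER_INCH * MM_PER_INCH;
    }

    // Space left along one axis after subtracting the two margins on that axis.
    static double usable(double pageLength, double marginStart, double marginEnd) {
        return pageLength - marginStart - marginEnd;
    }

    public static void main(String[] args) {
        double width = 595, height = 842, margin = 50; // assumed A4 size in points
        System.out.println("Usable width in points:  " + usable(width, margin, margin));
        System.out.println("Usable height in points: " + usable(height, margin, margin));
        System.out.println("Margin in millimetres:   " + pointsToMillimetres(margin));
    }
}
```

A margin of 50 points is a little under 18 mm, so the samples in this article leave a fairly narrow border around the text.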

Let's create a PDF that contains an image.

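import java.io.FileOutputStream;
import com.lowagie.text.Document;
import com.lowagie.text.Image;
import com.lowagie.text.PageSize;
import com.lowagie.text.pdf.PdfWriter;

public void createImagePDF(String inImageFilename, String outFilename) {
    try {
        Document document = new Document(PageSize.A4, 50, 50, 50, 50);

        PdfWriter.getInstance(document, new FileOutputStream(outFilename));
        document.open();
        document.addAuthor("Author-Name");
        document.addCreationDate();

        // Load the image from the given file and add it to the page
        document.add(Image.getInstance(inImageFilename));
        document.close();
    }
    catch(Exception exp) {
        System.out.println(exp.getMessage());
    }
}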

The sample below creates a password-protected PDF. The setEncryption method of the PdfWriter instance takes the user password, the owner password, the document permissions and the encryption type as arguments, in that order. It must be called before the document is opened. When the document is opened with the user password, access is restricted: only copying and printing are allowed. When it is opened with the owner password, the document has full access.

import java.io.FileOutputStream;
import com.lowagie.text.Document;
import com.lowagie.text.PageSize;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

public void createPasswordProtectedPDF(String outFilename, String password) {
    try {
        Document document = new Document(PageSize.A4, 50, 50, 50, 50);
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outFilename));

        // setEncryption takes the user password first, then the owner password.
        // The owner password is hard-coded here only to keep the sample short.
        writer.setEncryption(password.getBytes(),
                "World".getBytes(),
                PdfWriter.ALLOW_COPY | PdfWriter.ALLOW_PRINTING,
                PdfWriter.STANDARD_ENCRYPTION_128);

        document.open();
        document.addAuthor("Author-Name");
        document.addCreationDate();

        document.add(new Paragraph("Hello World -- Page 1"));
        document.add(new Paragraph("This is my first PDF."));

        document.newPage();

        document.add(new Paragraph("Hello World -- Page 2"));
        document.close();
    }
    catch(Exception exp) {
        System.out.println(exp.getMessage());
    }
}
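
The permissions argument is a bit mask: each ALLOW_ constant is an int, and several permissions are combined with the bitwise OR operator. The sketch below shows the idea in plain Java. The flag values and the isAllowed helper are hypothetical and chosen only for illustration; they are not the values OpenPDF defines in PdfWriter.

```java
// Illustrative bit-mask handling, not part of OpenPDF.
class PermissionFlags {
    // Hypothetical flag values; each one occupies its own bit.
    static final int ALLOW_PRINTING = 1 << 0;
    static final int ALLOW_COPY = 1 << 1;
    static final int ALLOW_MODIFY = 1 << 2;

    // True when every bit of the flag is set in the permissions mask.
    static boolean isAllowed(int permissions, int flag) {
        return (permissions & flag) == flag;
    }

    public static void main(String[] args) {
        // Same pattern as the sample above: copy and print, nothing else.
        int permissions = ALLOW_COPY | ALLOW_PRINTING;
        System.out.println("copy allowed:   " + isAllowed(permissions, ALLOW_COPY));
        System.out.println("print allowed:  " + isAllowed(permissions, ALLOW_PRINTING));
        System.out.println("modify allowed: " + isAllowed(permissions, ALLOW_MODIFY));
    }
}
```

Any permission that is left out of the mask is denied to a reader who opens the file with the user password.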

Let's read the PDF we created. The sample below prints the document metadata and the raw content stream of each page. If you want to know how a PDF is structured, this code can help.

import com.lowagie.text.pdf.PdfReader;

public void readPDF(String filename) {
    try {
        PdfReader reader = new PdfReader(filename);
        System.out.println("Document Metadata");
        System.out.println(reader.getInfo().toString());
        System.out.println("--");

        // Page numbers start at 1
        int numPages = reader.getNumberOfPages();
        for (int index = 1; index <= numPages; index++) {
            // Raw content stream of the page (PDF drawing operators)
            byte[] pageBuf = reader.getPageContent(index);
            String pageContent = new String(pageBuf);
            System.out.println("Page - " + index);
            System.out.println(pageContent);
            System.out.println("");
        }
        reader.close();
    }
    catch(Exception exp) {
        System.out.println(exp.getMessage());
    }
}
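
The page content printed by this sample is a stream of PDF drawing operators rather than plain text. Text is usually drawn between BT and ET, and a simple string is shown with the Tj operator. The sketch below is plain Java with no OpenPDF dependency and pulls such strings out of a content stream. The sample stream is hand-written for illustration and is not actual OpenPDF output, and the class and method names are made up. Real streams can also use TJ arrays, hexadecimal strings and font-specific encodings, which this naive approach does not handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper, not part of OpenPDF.
class ContentStreamPeek {
    // A literal string in parentheses followed by the Tj (show text) operator.
    private static final Pattern SHOW_TEXT =
            Pattern.compile("\\(((?:\\\\.|[^\\\\()])*)\\)\\s*Tj");

    static List<String> shownStrings(String contentStream) {
        List<String> result = new ArrayList<>();
        Matcher m = SHOW_TEXT.matcher(contentStream);
        while (m.find()) {
            // Undo the backslash escapes for ( ) and \ ; other escapes are left as they are.
            result.add(m.group(1).replaceAll("\\\\([()\\\\])", "$1"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written sample stream, assumed for illustration.
        String sample = "BT\n/F1 12 Tf\n50 780 Td\n(Hello World -- Page 1) Tj\nET\n";
        for (String s : shownStrings(sample)) {
            System.out.println(s);
        }
    }
}
```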

Now that you have a PDF with some text, you may want to extract that text. This is a typical requirement when you want to search the content of a PDF.

import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.parser.PdfTextExtractor;

public void extractPDF(String filename) {
    try {
        PdfReader reader = new PdfReader(filename);
        System.out.println("Document Metadata");
        System.out.println(reader.getInfo().toString());
        System.out.println("--");

        // Text extraction, one page at a time (page numbers start at 1)
        int numPages = reader.getNumberOfPages();
        PdfTextExtractor textExtractor = new PdfTextExtractor(reader);
        for (int index = 1; index <= numPages; index++) {
            String pageContent = textExtractor.getTextFromPage(index);
            System.out.println("Page - " + index);
            System.out.println(pageContent);
        }
        reader.close();
    }
    catch(Exception exp) {
        System.out.println(exp.getMessage());
    }
}
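
Once the text of each page is available as a String, searching is ordinary string handling. The sketch below is plain Java with no OpenPDF dependency. It assumes the page texts have already been collected into a list, for example from the getTextFromPage calls above, and reports which pages contain a keyword. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative helper, not part of OpenPDF.
class PageSearch {
    // Returns the 1-based numbers of the pages whose text contains the keyword, ignoring case.
    static List<Integer> pagesContaining(List<String> pageTexts, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            if (pageTexts.get(i).toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(i + 1); // PDF page numbers start at 1
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from the two-page sample document.
        List<String> pages = Arrays.asList(
                "Hello World -- Page 1 This is my first PDF.",
                "Hello World -- Page 2");
        System.out.println("Pages mentioning 'first': " + pagesContaining(pages, "first"));
    }
}
```

Extracted text can contain unexpected line breaks or spacing, so a real search would probably normalise whitespace before matching.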

OpenPDF is a nice library with many more features than the ones shown here. If you need to generate PDFs on the backend, it is worth considering.

Reference:

GitHub repository - https://github.com/LibrePDF/OpenPDF

Javadocs - https://librepdf.github.io/OpenPDF/docs-1-2-7/