How to Convert Excel to PDF in Java

How to Convert Excel to PDF in Java

Converting Excel Files to PDF using Java

Excel, a widely used spreadsheet application, is a powerful tool for data analysis and management. However, sharing Excel files can be challenging due to compatibility issues and the potential for formatting changes. Conversely, PDF is a more universal file format that preserves document formatting and can be easily shared across different platforms. In this article, we’ll explore how to convert Excel to PDF in Java using Apache POI and iText.

Introduction

iText is a widely used library for working with PDF files and offers powerful features for PDF generation and manipulation. Apache POI is an open-source library that provides comprehensive tools for reading and writing Microsoft Office files, including Excel spreadsheets. It provides a comprehensive set of features for working with Excel files, including reading and manipulating cell styles. If we specifically need to extract detailed styling information, including font colour, background colour, font styles, and other formatting details from an Excel file, Apache POI is a more suitable library.

Implementation

Step 1: Add Apache POI Dependency

Firstly, to utilize Apache POI for Excel file handling and iText for PDF generation, include the necessary dependencies in your Maven project’s pom.xml file:

<dependencies>
        <!-- Apache POI for Excel handling -->
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>5.2.5</version>
        </dependency>

        <!-- iText for PDF generation -->
        <dependency>
            <groupId>com.itextpdf</groupId>
            <artifactId>itextpdf</artifactId>
            <version>5.5.13.3</version>
        </dependency>
    </dependencies>

Step 2: Load the Excel File

Load the Excel file using Apache POI’s XSSFWorkbook for .xlsx files:

public static XSSFWorkbook readExcelFile(String excelFilePath) throws IOException {
    FileInputStream inputStream = new FileInputStream(excelFilePath);
    XSSFWorkbook workbook = new XSSFWorkbook(inputStream);
    inputStream.close();
    return workbook;
}

This code snippet supports reading Excel files in the XSSF format. For older Excel formats (.xls), we need to use the HSSFWorkbook class instead.

Step 3: Create a PDF Document

Create a new PDF document using iText’s Document class:

private static Document createPDFDocument(String pdfFilePath) throws IOException, DocumentException {
    Document document = new Document();
    PdfWriter.getInstance(document, new FileOutputStream(pdfFilePath));
    document.open();
    return document;
}

Step 4: Parse Excel Data and Convert to PDF Elements

Next, let’s iterate through the Excel sheets and convert the extracted data into corresponding PDF elements:

private static void createAndAddTable(XSSFSheet worksheet, Document document) throws DocumentException {
    PdfPTable table = new PdfPTable(worksheet.getRow(0).getPhysicalNumberOfCells());
    table.setWidthPercentage(100);
    addTableHeader(worksheet, table);
    addTableData(worksheet, table);
    document.add(table);
}

private static void addTableHeader(XSSFSheet worksheet, PdfPTable table) {
    Row headerRow = worksheet.getRow(0);
    for (int i = 0; i < headerRow.getPhysicalNumberOfCells(); i++) {
        Cell cell = headerRow.getCell(i);
        String headerText = cell.getCellType() == CellType.STRING ? cell.getStringCellValue() : String.valueOf(cell.getNumericCellValue());
        PdfPCell headerCell = new PdfPCell(new Phrase(headerText, new Font(Font.FontFamily.HELVETICA, 12, Font.BOLD)));
        headerCell.setHorizontalAlignment(Element.ALIGN_CENTER);
        table.addCell(headerCell);
    }
}

private static void addTableData(XSSFSheet worksheet, PdfPTable table) {
    Iterator<Row> rowIterator = worksheet.iterator();
    while (rowIterator.hasNext()) {
        Row row = rowIterator.next();
        if (row.getRowNum() == 0) {
            continue;
        }
        for (int i = 0; i < row.getPhysicalNumberOfCells(); i++) {
            Cell cell = row.getCell(i);
            String cellValue;
            if (cell != null) {
                if (cell.getCellType() == CellType.STRING) {
                    cellValue = cell.getStringCellValue();
                } else if (cell.getCellType() == CellType.NUMERIC) {
                    cellValue = String.valueOf(cell.getNumericCellValue());
                } else {
                    cellValue = "";
                }
            } else {
                cellValue = "";
            }
            PdfPCell cellPdf = new PdfPCell(new Phrase(cellValue));
            cellPdf.setHorizontalAlignment(Element.ALIGN_CENTER);
            table.addCell(cellPdf);
        }
    }
}

The provided code snippet utilizes several PDF elements from the iText library to add table headers to a PDF document.

PdfPTable: This is the main element used for creating the table. It represents a table with rows and columns. The code snippet creates a new PdfPTable object and subsequently adds header cells to it.

PdfPCell: This element represents a single cell within the PdfPTable. Each header cell is created using a new PdfPCell object, which takes an Phrase object as its content.

Phrase: This element represents a formatted text string. The code snippet creates a new Phrase object for each header cell, setting its text and font properties.

Font: This element represents the font used to display the text within the Phrase.

Step 5: Save the Generated PDF Document

Lastly, we use iText’s PdfWriter class to write the generated PDF content to an output file:

public static void convertExcelToPDF(String excelFilePath, String pdfFilePath) throws IOException, DocumentException {
    XSSFWorkbook workbook = readExcelFile(excelFilePath);
    Document document = createPDFDocument(pdfFilePath);
    // Assuming you have only one sheet in the workbook
    XSSFSheet worksheet = workbook.getSheetAt(0);
    createAndAddTable(worksheet, document);
    document.close();
    workbook.close();
}

Testing

We’re ready to test the program. This code snippet demonstrates how to run the convertExcelToPDF() method and convert an Excel spreadsheet to a PDF document.

public static void main(String[] args) throws DocumentException, IOException {
    String excelFilePath = "/path/to/your/order_list.xlsx";
    String pdfFilePath = "/path/to/your/order_list.pdf";
    convertExcelToPDF(excelFilePath, pdfFilePath);
}

Once we have updated the file paths and imported the libraries, we can run the code snippet to convert your Excel file to PDF.

Screenshot-2023-12-07-at-10.35.27-AM-1024x648 How to Convert Excel to PDF in Java

Once the code finishes executing, check if the PDF file has been created at the specified pdfFilePath. Open the PDF to confirm that it contains all the data from our Excel spreadsheet.

Screenshot-2023-12-07-at-10.36.13-AM-1024x979 How to Convert Excel to PDF in Java

Advanced Features

Preserving Excel Formatting and Styles

Preserve Excel cell formatting and apply it to the corresponding PDF cell. This includes text alignment and font styles (bold and italic).

Background Color

The setBackgroundColor() function retrieves the background colour index from an Excel cell and applies it to the corresponding PDF cell.

private static void setBackgroundColor(Cell cell, PdfPCell cellPdf) {
    // Set background color
    short bgColorIndex = cell.getCellStyle()
        .getFillForegroundColor();
    if (bgColorIndex != IndexedColors.AUTOMATIC.getIndex()) {
        XSSFColor bgColor = (XSSFColor) cell.getCellStyle()
            .getFillForegroundColorColor();
        if (bgColor != null) {
            byte[] rgb = bgColor.getRGB();
            if (rgb != null && rgb.length == 3) {
                cellPdf.setBackgroundColor(new BaseColor(rgb[0] & 0xFF, rgb[1] & 0xFF, rgb[2] & 0xFF));
            }
        }
    }
}

Text Alignment

The setCellAlignment() method examines the Excel cell’s horizontal and vertical alignment settings and translates them to their corresponding iText7 Element alignment constants.

private static void setCellAlignment(Cell cell, PdfPCell cellPdf) {
    CellStyle cellStyle = cell.getCellStyle();

    HorizontalAlignment horizontalAlignment = cellStyle.getAlignment();
    VerticalAlignment verticalAlignment = cellStyle.getVerticalAlignment();

    switch (horizontalAlignment) {
    case LEFT:
        cellPdf.setHorizontalAlignment(Element.ALIGN_LEFT);
        break;
    case CENTER:
        cellPdf.setHorizontalAlignment(Element.ALIGN_CENTER);
        break;
    case JUSTIFY:
    case FILL:
        cellPdf.setVerticalAlignment(Element.ALIGN_JUSTIFIED);
        break;
    case RIGHT:
        cellPdf.setHorizontalAlignment(Element.ALIGN_RIGHT);
        break;
    }

    switch (verticalAlignment) {
    case TOP:
        cellPdf.setVerticalAlignment(Element.ALIGN_TOP);
        break;
    case CENTER:
        cellPdf.setVerticalAlignment(Element.ALIGN_MIDDLE);
        break;
    case JUSTIFY:
        cellPdf.setVerticalAlignment(Element.ALIGN_JUSTIFIED);
        break;
    case BOTTOM:
        cellPdf.setVerticalAlignment(Element.ALIGN_BOTTOM);
        break;
    }
}

Font Style

The getCellStyle() method extracts various font properties from the Excel cell and translates them to iText7 Font settings. It handles font color, italics, strikethrough, underline, size, bold, and family name.

private static Font getCellStyle(Cell cell) throws DocumentException, IOException {
    Font font = new Font();
    CellStyle cellStyle = cell.getCellStyle();
    org.apache.poi.ss.usermodel.Font cellFont = cell.getSheet()
        .getWorkbook()
        .getFontAt(cellStyle.getFontIndexAsInt());

    short fontColorIndex = cellFont.getColor();
    if (fontColorIndex != IndexedColors.AUTOMATIC.getIndex() && cellFont instanceof XSSFFont) {
        XSSFColor fontColor = ((XSSFFont) cellFont).getXSSFColor();
        if (fontColor != null) {
            byte[] rgb = fontColor.getRGB();
            if (rgb != null && rgb.length == 3) {
                font.setColor(new BaseColor(rgb[0] & 0xFF, rgb[1] & 0xFF, rgb[2] & 0xFF));
            }
        }
    }

    if (cellFont.getItalic()) {
        font.setStyle(Font.ITALIC);
    }

    if (cellFont.getStrikeout()) {
        font.setStyle(Font.STRIKETHRU);
    }

    if (cellFont.getUnderline() == 1) {
        font.setStyle(Font.UNDERLINE);
    }

    short fontSize = cellFont.getFontHeightInPoints();
    font.setSize(fontSize);

    if (cellFont.getBold()) {
        font.setStyle(Font.BOLD);
    }

    String fontName = cellFont.getFontName();
    if (FontFactory.isRegistered(fontName)) {
        font.setFamily(fontName); // Use extracted font family if supported by iText
    } else {
        logger.warn("Unsupported font type: {}", fontName);
        // - Use a fallback font (e.g., Helvetica)
        font.setFamily("Helvetica");
    }

    return font;
}

Generating Multiple PDF Pages from a Single Excel File

If the Excel file contains multiple sheets or sections, we can generate multiple PDF pages by processing each sheet or section separately:

public static void convertExcelToPDF(String excelFilePath, String pdfFilePath) throws IOException, DocumentException {
    XSSFWorkbook workbook = readExcelFile(excelFilePath);
    Document document = createPDFDocument(pdfFilePath);

    for (int i = 0; i < workbook.getNumberOfSheets(); i++) {
        XSSFSheet worksheet = workbook.getSheetAt(i);
        createAndAddTable(worksheet, document);

        // Add a new page for each sheet (except the last one)
        if (i < workbook.getNumberOfSheets() - 1) {
            document.newPage();
        }
    }

    document.close();
    workbook.close();
}

Embedding Images and Other Embedded Objects

If the Excel file contains embedded images, extract and embed them into the PDF document:

// Add header with sheet name as title
Paragraph title = new Paragraph(worksheet.getSheetName(), new Font(Font.FontFamily.HELVETICA, 18, Font.BOLD));
title.setSpacingAfter(20f);
title.setAlignment(Element.ALIGN_CENTER);
document.add(title);

// Add logo
Image logo = Image.getInstance("/path/to/your/logo.png");
logo.setAlignment(Element.ALIGN_CENTER);
logo.scaleToFit(100, 100); 
document.add(logo);

Conclusion

In conclusion, this article has provided a step-by-step guide on converting Excel files to PDF in Java using Apache POI and iText. By combining the capabilities of Apache POI for Excel handling and iText for PDF generation, you can seamlessly preserve formatting, apply styles, and embed images from Excel to PDF. This approach ensures a smooth transition between these two widely used file formats, facilitating efficient data sharing and distribution.

Download the full source code at GitHub.

Share this content:

Leave a Comment

Discover more from nnyw@tech

Subscribe now to keep reading and get access to the full archive.

Continue reading