EasyXLS

How to convert HTML file to Excel in Python

EasyXLS Excel library can be used to convert Excel file formats with Python on Windows, Linux, Mac or other operating systems. The integration varies depending on the operating system and on whether the bridge for .NET Framework or for Java is chosen:

EasyXLS on Windows using .NET Framework with Python

If you opt for the .NET version of EasyXLS, the code below requires Pythonnet, a bridge between Python and .NET Framework.

Step 1: Download and install EasyXLS Excel Library for .NET

To download the trial version of EasyXLS Excel Library, use the button below:

Download EasyXLS™ Excel Library for .NET

If you already own a license key, you may log in and download EasyXLS from your account.

Step 2: Install Pythonnet

To install Pythonnet, run the "pip" command as follows. Pip is a package-management system used to install and manage software packages written in Python.
<Python installation path>\Scripts>pip install "pythonnet.whl"

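Before moving on, it can help to confirm that the interpreter you plan to use can actually see the bridge. The following is a minimal sketch using only the standard library; the helper name `module_available` is hypothetical, and the check relies on Pythonnet exposing itself through the `clr` module that the conversion script imports.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if Python can locate a top-level module called `name`."""
    return importlib.util.find_spec(name) is not None


# Pythonnet is imported as "clr" in the conversion script, so that is
# the module name to look for.
if module_available("clr"):
    print("Pythonnet found: 'import clr' should work.")
else:
    print("Pythonnet not found: re-run the pip command for this interpreter.")
```

If the check fails even though pip reported success, the package was most likely installed for a different Python installation than the one running the script.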
Step 3: Include EasyXLS library into project

EasyXLS.dll must be added to your project. After installing EasyXLS, EasyXLS.dll can be found in the "Dot NET version" folder.
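One way to make the DLL visible to a standalone script is to put its folder on `sys.path`, which Pythonnet consults when resolving `clr.AddReference`. The sketch below assumes that behaviour; the helper `add_assembly_folder` and the install path are hypothetical and must be adapted to your machine.

```python
import sys
from pathlib import Path


def add_assembly_folder(folder: str) -> bool:
    """Add `folder` to sys.path if it exists, so DLLs inside it can be found.

    Returns True when the folder exists (and is now on sys.path),
    False when it does not exist and nothing was changed.
    """
    resolved = str(Path(folder))
    if not Path(resolved).is_dir():
        return False
    if resolved not in sys.path:
        sys.path.append(resolved)
    return True


# Hypothetical install location; replace with the real "Dot NET version" folder.
EASYXLS_FOLDER = r"C:\Program Files\EasyXLS\Dot NET version"

if not add_assembly_folder(EASYXLS_FOLDER):
    print("Folder not found, check the path:", EASYXLS_FOLDER)

# With the folder on sys.path, the tutorial's own lines can follow:
#   import clr
#   clr.AddReference('EasyXLS')
```

Copying EasyXLS.dll next to the script is a simpler alternative when only one project uses it.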

Step 4: Run Python code that converts HTML file to Excel

Execute the following Python code to convert the HTML file to Excel.

"""-------------------------------------------------------------
Tutorial 40

This tutorial shows how to convert HTML file to Excel in Python.
The HTML file generated by Tutorial 31 is imported, some data is
modified and after that is exported as Excel file.
-------------------------------------------------------------"""

import clr
import gc

clr.AddReference('EasyXLS')
from EasyXLS import *

print("Tutorial 40\n-----------\n")

# Create an instance of the class used to import/export Excel files
workbook = ExcelDocument()

# Import HTML file
print("Reading file C:\\Samples\\Tutorial31.html")

if workbook.easy_LoadHTMLFile("C:\\Samples\\Tutorial31.html"):
    # Set worksheet name
    workbook.easy_getSheetAt(0).setSheetName("First tab")

    # Add new worksheet and add some data in cells (optional step)
    workbook.easy_addWorksheet("Second tab")
    xlsTable = workbook.easy_getSheetAt(1).easy_getExcelTable()
    xlsTable.easy_getCell("A1").setValue("Data added by Tutorial40")

    for column in range(5):
        xlsTable.easy_getCell(1, column).setValue("Data " + str(column + 1))

    # Export Excel file
    print("\nWriting file C:\\Samples\\Tutorial40 - convert HTML to Excel.xlsx.")
    workbook.easy_WriteXLSXFile("C:\\Samples\\Tutorial40 - convert HTML to Excel.xlsx")

    # Confirm conversion of HTML to Excel
    sError = workbook.easy_getError()

    if sError == "":
        print("\nFile successfully created.\n\n")
    else:
        print("\nError encountered: " + sError + "\n\n")
else:
    print("\nError reading file C:\\Samples\\Tutorial31.html: " + workbook.easy_getError() + "\n\n")

# Dispose memory
gc.collect()
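The script above spells each Windows path several times with doubled backslashes. As an optional tidy-up, the paths can be built once with the standard library; this is only a sketch, and the variable names `source` and `target` are illustrative. `PureWindowsPath` is used because it formats Windows paths regardless of the operating system the code runs on.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are taken literally, so no doubling is needed.
source = PureWindowsPath(r"C:\Samples\Tutorial31.html")

# Keep the folder, swap only the file name for the exported workbook.
target = source.with_name("Tutorial40 - convert HTML to Excel.xlsx")

print(str(source))
print(str(target))
```

Pass `str(source)` and `str(target)` to `easy_LoadHTMLFile` and `easy_WriteXLSXFile`, since those calls in the tutorial receive plain strings.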

EasyXLS on Linux, Mac, Windows using Java with Python

If you opt for the Java version of EasyXLS, code similar to the above requires Py4J, Pyjnius or any other bridge between Python and Java.

Step 1: Download and install EasyXLS Excel Library for Java

To download the trial version of EasyXLS Excel Library, use the button below:

Download EasyXLS™ Excel Library for Java

If you already own a license key, you may log in and download EasyXLS from your account.

Step 2: Install Py4j

To install Py4j, run the "pip" command as follows. Pip is a package-management system used to install and manage software packages written in Python.
<Python installation path>\Scripts>pip install "py4j.whl"

Step 3: Create additional Java program

The following Java code needs to be running in the background prior to executing the Python code.

import py4j.GatewayServer;

public class GatewayServerApp {
  public static void main(String[] args) {
    GatewayServerApp app = new GatewayServerApp();
    // app is now the gateway.entry_point
    GatewayServer server = new GatewayServer(app);
    server.start();
  }
}


Step 4: Add py4j library to CLASSPATH

py4j.jar must be added to the classpath of the additional Java program. After installing Py4j, py4j.jar can be found in the "<Python installation path>\share\py4j" folder.

Step 5: Add EasyXLS library to CLASSPATH

EasyXLS.jar must be added to the classpath of the additional Java program. After installing EasyXLS, EasyXLS.jar can be found in the "Lib" folder.
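Steps 4 and 5 amount to placing two jars on one classpath. The sketch below assembles the corresponding `java` command line; `build_gateway_command` is a hypothetical helper, the jar locations are placeholders, and it assumes GatewayServerApp has already been compiled into the current directory. `os.pathsep` supplies the right classpath separator for the platform (";" on Windows, ":" elsewhere).

```python
import os


def build_gateway_command(jars, main_class="GatewayServerApp", class_dir="."):
    """Return the argument list for launching the compiled gateway program."""
    classpath = os.pathsep.join([class_dir, *jars])
    return ["java", "-cp", classpath, main_class]


# Placeholder jar locations; use the folders named in Steps 4 and 5.
jars = ["py4j.jar", "EasyXLS.jar"]
command = build_gateway_command(jars)
print(" ".join(command))
```

The resulting list can be handed to `subprocess.Popen(command)` if you prefer to start the Java program from Python rather than from a terminal.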

Step 6: Run additional Java program

Start the gateway server application; this implicitly starts the Java Virtual Machine as well.
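A quick way to confirm that the gateway is up before running the Python script is to probe its TCP port. The sketch below is an assumption-laden check rather than part of EasyXLS or Py4J: `gateway_is_listening` is a hypothetical helper, and 25333 is Py4J's default gateway port, which applies here because the Java program constructs `GatewayServer` without a port argument.

```python
import socket


def gateway_is_listening(host="127.0.0.1", port=25333, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if gateway_is_listening():
    print("Gateway port is open, the Python script can connect.")
else:
    print("Nothing is listening yet, start GatewayServerApp first.")
```

The probe only shows that a process is listening on that port; it does not verify that the process is the Py4J gateway.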

Step 7: Run Python code that converts HTML file to Excel

Execute Python code like the following to convert the HTML file to Excel.

"""------------------------------------------------------------------
Tutorial 40

This tutorial shows how to convert HTML file to Excel in Python. The
HTML file generated by Tutorial 31 is imported, some data is modified
and after that is exported as Excel file.
------------------------------------------------------------------"""

import gc

from py4j.java_gateway import JavaGateway, java_import

gateway = JavaGateway()

java_import(gateway.jvm, 'EasyXLS.*')

print("Tutorial 40\n-----------\n")

# Create an instance of the class used to import/export Excel files
workbook = gateway.jvm.ExcelDocument()

# Import HTML file
print("Reading file C:\\Samples\\Tutorial31.html")

if workbook.easy_LoadHTMLFile("C:\\Samples\\Tutorial31.html"):
    # Set worksheet name
    workbook.easy_getSheetAt(0).setSheetName("First tab")

    # Add new worksheet and add some data in cells (optional step)
    workbook.easy_addWorksheet("Second tab")
    xlsTable = workbook.easy_getSheetAt(1).easy_getExcelTable()
    xlsTable.easy_getCell("A1").setValue("Data added by Tutorial40")

    for column in range(5):
        xlsTable.easy_getCell(1, column).setValue("Data " + str(column + 1))

    # Export Excel file
    print("\nWriting file C:\\Samples\\Tutorial40 - convert HTML to Excel.xlsx.")
    workbook.easy_WriteXLSXFile("C:\\Samples\\Tutorial40 - convert HTML to Excel.xlsx")

    # Confirm conversion of HTML to Excel
    sError = workbook.easy_getError()

    if sError == "":
        print("\nFile successfully created.\n")
    else:
        print("\nError encountered: " + sError + "\n")
else:
    print("\nError reading file C:\\Samples\\Tutorial31.html: " + workbook.easy_getError() + "\n")

# Dispose memory
gc.collect()
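Both the .NET and the Java scripts follow the same sequence: load the HTML file, write the XLSX file, then read `easy_getError()`. That sequence can be wrapped in one function that works with either bridge. The sketch below uses only the three method names shown in the tutorial; `convert_html_to_xlsx` and `FakeWorkbook` are hypothetical, and the fake class exists solely so the helper can be exercised without EasyXLS installed.

```python
def convert_html_to_xlsx(workbook, html_path, xlsx_path):
    """Load an HTML file into `workbook` and export it as XLSX.

    Returns an empty string on success, otherwise an error description.
    """
    if not workbook.easy_LoadHTMLFile(html_path):
        return "Error reading " + html_path + ": " + workbook.easy_getError()
    workbook.easy_WriteXLSXFile(xlsx_path)
    error = workbook.easy_getError()
    return "" if error == "" else "Error writing " + xlsx_path + ": " + error


class FakeWorkbook:
    """Hypothetical stand-in for ExcelDocument, used only to try the helper."""

    def __init__(self, load_ok=True, write_error=""):
        self.load_ok = load_ok
        self.error = "" if load_ok else "file not found"
        self.write_error = write_error

    def easy_LoadHTMLFile(self, path):
        return self.load_ok

    def easy_WriteXLSXFile(self, path):
        self.error = self.write_error
        return self.write_error == ""

    def easy_getError(self):
        return self.error


print(repr(convert_html_to_xlsx(FakeWorkbook(), "in.html", "out.xlsx")))
print(convert_html_to_xlsx(FakeWorkbook(load_ok=False), "in.html", "out.xlsx"))
```

In real use, pass the object created by `ExcelDocument()` or `gateway.jvm.ExcelDocument()` in place of `FakeWorkbook()`.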


File formats:

MS Excel 97 - 2003
MS Excel 2007 - 2010
MS Excel 2013
MS Excel 2016
MS Excel 2019
XLSX, XLSM, XLSB, XLS
XML, HTML, CSV, TXT