Skip to content

Python xlrd Module: A Comprehensive Guide

The xlrd module in Python is used for reading data from Excel files (.xls and .xlsx). It allows you to extract information from spreadsheets in a programmatic way. This guide covers the key features and functionalities of the xlrd module with detailed examples.

Introduction to xlrd

The xlrd module allows you to read data from Excel files. It supports older .xls files and newer .xlsx files, though the latter requires a version of xlrd that includes support for newer file formats.

Installation

To use xlrd, you need to install it via pip. You can install it with the following command:

pip install xlrd

Note: As of xlrd version 2.0.0, support for .xlsx files has been removed. To handle .xlsx files, you might need to use other libraries like openpyxl or pandas. For .xls files, xlrd is still appropriate.

Reading Excel Files

To read data from an Excel file, you first need to open it using xlrd and then access the worksheets.

import xlrd

# Open the Excel file
workbook = xlrd.open_workbook('example.xls')

# Access the first sheet
sheet = workbook.sheet_by_index(0)

Accessing Worksheet Data

You can access worksheet data and get information about the sheet, such as its name and the number of rows and columns.

import xlrd

# Open the Excel file
workbook = xlrd.open_workbook('example.xls')

# Access the first sheet
sheet = workbook.sheet_by_index(0)

# Get sheet name
print("Sheet name:", sheet.name)

# Get number of rows and columns
print("Number of rows:", sheet.nrows)
print("Number of columns:", sheet.ncols)

Reading Cell Values

To read the value of a specific cell, use the cell_value() method.

import xlrd

# Open the Excel file
workbook = xlrd.open_workbook('example.xls')

# Access the first sheet
sheet = workbook.sheet_by_index(0)

# Read the value of a specific cell
value = sheet.cell_value(0, 0)  # Read value from first row, first column
print("Cell value:", value)

Handling Formulas

xlrd does not evaluate formulas; it only reads the formula string. To handle formulas, you need to parse the formula string yourself or use other libraries that can evaluate formulas.

import xlrd

# Open the Excel file
workbook = xlrd.open_workbook('example_with_formula.xls')

# Access the first sheet
sheet = workbook.sheet_by_index(0)

# Read the formula string from a cell
formula = sheet.cell_formula(0, 1)  # Read formula from first row, second column
print("Cell formula:", formula)

Iterating Over Rows and Columns

You can iterate over rows and columns to read data from multiple cells.

Iterating Over Rows

import xlrd

# Open the Excel file
workbook = xlrd.open_workbook('example.xls')

# Access the first sheet
sheet = workbook.sheet_by_index(0)

# Iterate over rows
for row_num in range(sheet.nrows):
    row = sheet.row_values(row_num)
    print("Row {}: {}".format(row_num, row))

Iterating Over Columns

import xlrd

# Open the Excel file
workbook = xlrd.open_workbook('example.xls')

# Access the first sheet
sheet = workbook.sheet_by_index(0)

# Iterate over columns
for col_num in range(sheet.ncols):
    col = sheet.col_values(col_num)
    print("Column {}: {}".format(col_num, col))

Handling Multiple Sheets

You can work with multiple sheets by accessing them by name or index.

Accessing Sheets by Name

import xlrd

# Open the Excel file
workbook = xlrd.open_workbook('example.xls')

# Access a sheet by name
sheet = workbook.sheet_by_name('Sheet1')
print("Sheet name:", sheet.name)

Listing All Sheet Names

import xlrd

# Open the Excel file
workbook = xlrd.open_workbook('example.xls')

# List all sheet names
sheet_names = workbook.sheet_names()
print("Sheet names:", sheet_names)

Error Handling

Handling errors is important when dealing with file operations and data extraction.

import xlrd

try:
    # Open the Excel file
    workbook = xlrd.open_workbook('example.xls')

    # Access the first sheet
    sheet = workbook.sheet_by_index(0)

    # Read the value of a specific cell
    value = sheet.cell_value(0, 0)
    print("Cell value:", value)

except FileNotFoundError:
    print("The file was not found.")

except xlrd.biffh.XLRDError as e:
    print(f"An error occurred while reading the Excel file: {e}")

Conclusion

The xlrd module is a useful tool for reading data from Excel files. It provides a straightforward way to access and extract information from .xls files, although support for .xlsx files has been removed in recent versions. For more advanced features or .xlsx support, consider using libraries like openpyxl or pandas. By understanding the key functionalities of xlrd, you can effectively work with Excel files and integrate them into your Python applications.