5.2. Format Specific Table Loader Classes

5.2.1. AbstractTableReader class

class pytablereader.interface.AbstractTableReader(source, quoting_flags, type_hints, type_hint_rules=None)

Bases: TableLoaderInterface

The abstract base class for table data file loaders.

table_name

Table name string.

source

Table data source to load.

5.2.2. CSV Loader Classes

5.2.2.1. CSV Table Loader

class pytablereader.csv.core.CsvTableLoader(source, quoting_flags, type_hints, type_hint_rules)

The abstract class of CSV table loaders.

headers

Attribute names of the table. If headers is empty, the first line of the CSV file is used as the attribute list.

delimiter

A one-character string used to separate fields. Defaults to ",".

quotechar

A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters. Defaults to '"'.

encoding

Encoding of the CSV data.
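As a rough illustration of how these attributes relate to the standard library's csv.reader(), the sketch below parses CSV text with a custom delimiter and quotechar and falls back to the first line when no headers are given. The helper name parse_csv_text is hypothetical and is not part of pytablereader; the code only mirrors the documented behavior, not the library's implementation.

```python
import csv
import io


def parse_csv_text(text, headers=None, delimiter=",", quotechar='"'):
    """Hypothetical helper: return (headers, rows) parsed from CSV text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    rows = [row for row in reader if row]  # skip blank lines
    if not headers:
        # No explicit headers: treat the first line as the attribute list.
        headers, rows = rows[0], rows[1:]
    return list(headers), rows


headers, rows = parse_csv_text('a;b\n1;"x;y"\n', delimiter=";")
print(headers)  # ['a', 'b']
print(rows)     # [['1', 'x;y']]
```

The quoted field keeps its embedded delimiter because quotechar protects it from being split.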

5.2.2.2. CSV File Loader

class pytablereader.CsvTableFileLoader(file_path, quoting_flags=None, type_hints=None, type_hint_rules=None)

Bases: CsvTableLoader

A file loader class to extract tabular data from CSV files.

Parameters:

file_path (str) – Path to the CSV file to load.

table_name

Table name string. Defaults to %(filename)s.

Examples:

Load table data from CSV

load()

Extract tabular data as TableData instances from a CSV file. The source attribute should contain the path to the file to load.

Returns:

Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(format_name)s     "csv"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the CSV data is invalid.

See also

csv.reader()
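The format specifiers use Python's %-style mapping keys, so the replacement can be pictured as ordinary string formatting against a dictionary. The sketch below is a minimal illustration under that assumption; expand_table_name is a hypothetical helper, and the ID values passed in are made up for the example rather than taken from the library.

```python
from pathlib import Path


def expand_table_name(template, file_path, format_name, format_id, global_id):
    """Hypothetical helper: expand a table_name template with %-style mapping keys."""
    values = {
        "filename": Path(file_path).stem,  # filename without extension
        "format_name": format_name,
        "format_id": format_id,
        "global_id": global_id,
    }
    return template % values


print(expand_table_name("%(filename)s", "data/sample_data.csv", "csv", 0, 0))
# sample_data
print(expand_table_name("%(format_name)s%(format_id)s", "data/sample_data.csv", "csv", 2, 5))
# csv2
```

Keys that a template does not mention are simply ignored, which is why one dictionary can serve every template.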

5.2.2.3. CSV Text Loader

class pytablereader.CsvTableTextLoader(text, quoting_flags=None, type_hints=None, type_hint_rules=None)

Bases: CsvTableLoader

A text loader class to extract tabular data from CSV text data.

Parameters:

text (str) – CSV text to load.

table_name

Table name string. Defaults to %(format_name)s%(format_id)s.

Examples:

Load table data from CSV

load()

Extract tabular data as TableData instances from a CSV text object. The source attribute should contain the text object to load.

Returns:

Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        ""
%(format_name)s     "csv"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the CSV data is invalid.

See also

csv.reader()

5.2.3. HTML Loader Classes

5.2.3.1. HTML File Loader

class pytablereader.HtmlTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A file loader class to extract tabular data from HTML files.

Parameters:

file_path (str) – Path to the HTML file to load.

table_name

Table name string. Defaults to %(title)s_%(key)s.

encoding

HTML file encoding. Defaults to "utf-8".

load()

Extract tabular data as TableData instances from HTML table tags in an HTML file. The source attribute should contain the path to the file to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(title)s           <title> tag value of the HTML.
%(key)s             The id attribute of the table tag, or
                    %(format_name)s%(format_id)s if the table tag
                    has no id attribute.
%(format_name)s     "html"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the HTML data is invalid or empty.

Note

Table tag attributes are ignored in the loaded TableData.
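To make the %(title)s and %(key)s rules concrete, the following sketch uses the standard library's html.parser to collect the page title and each table's id attribute, then builds names with the default %(title)s_%(key)s template. TableNameCollector, table_names, and the first_format_id starting value are all hypothetical; the way the library actually numbers tables may differ.

```python
from html.parser import HTMLParser


class TableNameCollector(HTMLParser):
    """Hypothetical sketch: collect the <title> text and each table's id attribute."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.table_ids = []  # one entry per <table>; None when id is absent
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "table":
            self.table_ids.append(dict(attrs).get("id"))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()


def table_names(html, first_format_id=0):
    """Build "%(title)s_%(key)s" names; first_format_id is an assumed starting counter."""
    collector = TableNameCollector()
    collector.feed(html)
    names = []
    for offset, table_id in enumerate(collector.table_ids):
        # Fall back to format name + ID when the table tag has no id attribute.
        key = table_id if table_id else "html{}".format(first_format_id + offset)
        names.append("%(title)s_%(key)s" % {"title": collector.title, "key": key})
    return names


html = """<html><head><title>report</title></head><body>
<table id="sales"><tr><th>a</th></tr><tr><td>1</td></tr></table>
<table><tr><th>b</th></tr><tr><td>2</td></tr></table>
</body></html>"""
print(table_names(html))  # ['report_sales', 'report_html1']
```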

5.2.3.2. HTML Text Loader

class pytablereader.HtmlTableTextLoader(text, quoting_flags=None, type_hints=None, type_hint_rules=None)

A text loader class to extract tabular data from HTML text data.

Parameters:

text (str) – HTML text to load.

table_name

Table name string. Defaults to %(title)s_%(key)s.

load()

Extract tabular data as TableData instances from HTML table tags in an HTML text object. The source attribute should contain the text object to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        ""
%(title)s           <title> tag value of the HTML.
%(key)s             The id attribute of the table tag, or
                    %(format_name)s%(format_id)s if the table tag
                    has no id attribute.
%(format_name)s     "html"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the HTML data is invalid or empty.

5.2.4. JSON Loader Classes

5.2.4.1. JSON File Loader

class pytablereader.JsonTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A file loader class to extract tabular data from JSON files.

Parameters:

file_path (str) – Path to the JSON file to load.

table_name

Table name string. Defaults to %(filename)s_%(key)s.

load()

Extract tabular data as TableData instances from a JSON file. The source attribute should contain the path to the file to load.

This method can load the following six types of JSON format:

(1) Single table data in a file:

Acceptable JSON Schema (1): single table
{
    "type": "array",
    "items": {
        "type": "object",
        "additionalProperties": {
            "anyOf": [
                {"type": "string"},
                {"type": "number"},
                {"type": "boolean"},
                {"type": "null"}
            ]
        }
    }
}
Acceptable JSON example for the JSON schema (1)
[
    {"attr_b": 4, "attr_c": "a", "attr_a": 1},
    {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2},
    {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3}
]

The example data will be loaded as the following tabular data:

attr_a    attr_b    attr_c
1         4.0       a
2         2.1       bb
3         120.9     ccc
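The mapping from schema (1) to rows can be sketched with the standard json module. records_to_table is a hypothetical helper, not a pytablereader function, and it does not reproduce the per-column type normalization that turns 4 into 4.0 in the loaded table.

```python
import json


def records_to_table(json_text):
    """Hypothetical helper: convert a JSON array of objects into (headers, rows)."""
    records = json.loads(json_text)
    # Collect every key that appears in any record, in sorted order.
    headers = sorted({key for record in records for key in record})
    rows = [[record.get(header) for header in headers] for record in records]
    return headers, rows


text = '[{"attr_b": 4, "attr_c": "a", "attr_a": 1}, {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2}]'
headers, rows = records_to_table(text)
print(headers)  # ['attr_a', 'attr_b', 'attr_c']
print(rows)     # [[1, 4, 'a'], [2, 2.1, 'bb']]
```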

(2) Single table data in a file:

Acceptable JSON Schema (2): single table
{
    "type": "object",
    "additionalProperties": {
        "type": "array",
        "items": {
            "anyOf": [
                {"type": "string"},
                {"type": "number"},
                {"type": "boolean"},
                {"type": "null"}
            ]
        }
    }
}
Acceptable JSON example for the JSON schema (2)
{
    "attr_a": [1, 2, 3],
    "attr_b": [4, 2.1, 120.9],
    "attr_c": ["a", "bb", "ccc"]
}

The example data will be loaded as the following tabular data:

attr_a    attr_b    attr_c
1         4.0       a
2         2.1       bb
3         120.9     ccc
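Schema (2) is column-oriented, so turning it into rows amounts to a transpose. A minimal sketch, assuming all columns have the same length (zip() would otherwise truncate to the shortest one); columns_to_table is a hypothetical name:

```python
import json


def columns_to_table(json_text):
    """Hypothetical helper: convert {"column": [values, ...]} JSON into (headers, rows)."""
    columns = json.loads(json_text)
    headers = list(columns)
    # zip() pairs up the i-th value of every column to form the i-th row.
    rows = [list(row) for row in zip(*columns.values())]
    return headers, rows


headers, rows = columns_to_table(
    '{"attr_a": [1, 2, 3], "attr_b": [4, 2.1, 120.9], "attr_c": ["a", "bb", "ccc"]}'
)
print(headers)  # ['attr_a', 'attr_b', 'attr_c']
print(rows)     # [[1, 4, 'a'], [2, 2.1, 'bb'], [3, 120.9, 'ccc']]
```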

(3) Single table data in a file:

Acceptable JSON Schema (3): single table
{
    "type": "object",
    "additionalProperties": {
        "anyOf": [
            {"type": "string"},
            {"type": "number"},
            {"type": "boolean"},
            {"type": "null"}
        ]
    }
}
Acceptable JSON example for the JSON schema (3)
{
    "num_ratings": 27,
    "support_threads": 1,
    "downloaded": 925716,
    "last_updated":"2017-12-01 6:22am GMT",
    "added":"2010-01-20",
    "num": 1.1,
    "hoge": null
}

The example data will be loaded as the following tabular data:

key                value
num_ratings        27
support_threads    1
downloaded         925716
last_updated       2017-12-01 6:22am GMT
added              2010-01-20
num                1.1
hoge               None

(4) Multiple table data in a file:

Acceptable JSON Schema (4): multiple tables
{
    "type": "object",
    "additionalProperties": {
        "type": "array",
        "items": {
            "type": "object",
            "additionalProperties": {
                "anyOf": [
                    {"type": "string"},
                    {"type": "number"},
                    {"type": "boolean"},
                    {"type": "null"}
                ]
            }
        }
    }
}
Acceptable JSON example for the JSON schema (4)
{
    "table_a" : [
        {"attr_b": 4, "attr_c": "a", "attr_a": 1},
        {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2},
        {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3}
    ],
    "table_b" : [
        {"a": 1, "b": 4},
        {"a": 2 },
        {"a": 3, "b": 120.9}
    ]
}

The example data will be loaded as the following tabular data:

table_a

attr_a    attr_b    attr_c
1         4.0       a
2         2.1       bb
3         120.9     ccc

table_b

a    b
1    4.0
2    None
3    120.9
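Note that the second record of table_b has no "b" key and is loaded as None. The sketch below shows one way such gaps could be filled when each top-level key is treated as a separate table; load_tables is a hypothetical helper, and numeric type normalization (4 becoming 4.0) is again left out.

```python
import json


def load_tables(json_text):
    """Hypothetical helper: map each top-level key to (headers, rows)."""
    tables = {}
    for table_name, records in json.loads(json_text).items():
        headers = sorted({key for record in records for key in record})
        # dict.get() yields None for keys a record does not define.
        rows = [[record.get(header) for header in headers] for record in records]
        tables[table_name] = (headers, rows)
    return tables


text = '{"table_b": [{"a": 1, "b": 4}, {"a": 2}, {"a": 3, "b": 120.9}]}'
print(load_tables(text))
# {'table_b': (['a', 'b'], [[1, 4], [2, None], [3, 120.9]])}
```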

(5) Multiple table data in a file:

Acceptable JSON Schema (5): multiple tables
{
    "type": "object",
    "additionalProperties": {
        "type": "object",
        "additionalProperties": {
            "type": "array",
            "items": {
                "anyOf": [
                    {"type": "string"},
                    {"type": "number"},
                    {"type": "boolean"},
                    {"type": "null"}
                ]
            }
        }
    }
}
Acceptable JSON example for the JSON schema (5)
{
    "table_a" : {
        "attr_a": [1, 2, 3],
        "attr_b": [4, 2.1, 120.9],
        "attr_c": ["a", "bb", "ccc"]
    },
    "table_b" : {
        "a": [1, 3],
        "b": [4, 120.9]
    }
}

The example data will be loaded as the following tabular data:

table_a

attr_a    attr_b    attr_c
1         4.0       a
2         2.1       bb
3         120.9     ccc

table_b

a    b
1    4.0
3    120.9

(6) Multiple table data in a file:

Acceptable JSON Schema (6): multiple tables
{
    "type": "object",
    "additionalProperties": {
        "type": "object",
        "additionalProperties": {
            "anyOf": [
                {"type": "string"},
                {"type": "number"},
                {"type": "boolean"},
                {"type": "null"}
            ]
        }
    }
}
Acceptable JSON example for the JSON schema (6)
{
    "table_a": {
        "num_ratings": 27,
        "support_threads": 1,
        "downloaded": 925716,
        "last_updated":"2017-12-01 6:22am GMT",
        "added":"2010-01-20",
        "num": 1.1,
        "hoge": null
    },
    "table_b": {
        "a": 4,
        "b": 120.9
    }
}

The example data will be loaded as the following tabular data:

table_a

key                value
num_ratings        27
support_threads    1
downloaded         925716
last_updated       2017-12-01 6:22am GMT
added              2010-01-20
num                1.1
hoge               None

table_b

key    value
a      4.0
b      120.9
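The six schemas differ only in nesting: schemas (4) to (6) wrap schemas (1) to (3) under one key per table. The sketch below is a purely structural check that reports which schema number a parsed JSON value matches. It is an illustration only, not a JSON Schema validator and not the detection logic pytablereader uses; is_scalar, classify_single, and classify are hypothetical names.

```python
import json


def is_scalar(value):
    """True for the JSON value types the schemas allow as cell values."""
    return value is None or isinstance(value, (str, int, float, bool))


def classify_single(data):
    """Return 1, 2, or 3 if data matches a single-table schema, else None."""
    if isinstance(data, list):
        ok = all(isinstance(r, dict) and all(is_scalar(v) for v in r.values()) for r in data)
        return 1 if ok else None
    if isinstance(data, dict):
        values = list(data.values())
        if values and all(isinstance(v, list) and all(is_scalar(x) for x in v) for v in values):
            return 2
        if all(is_scalar(v) for v in values):
            return 3
    return None


def classify(data):
    """Return the schema number (1-6) that data matches, or None."""
    single = classify_single(data)
    if single is not None:
        return single
    if isinstance(data, dict) and data:
        kinds = {classify_single(v) for v in data.values()}
        if len(kinds) == 1 and None not in kinds:
            return kinds.pop() + 3  # schemas 4-6 wrap schemas 1-3 per table
    return None


print(classify(json.loads('[{"a": 1}]')))            # 1
print(classify(json.loads('{"a": [1, 2]}')))         # 2
print(classify(json.loads('{"a": 1, "b": null}')))   # 3
print(classify(json.loads('{"t": [{"a": 1}]}')))     # 4
print(classify(json.loads('{"t": {"a": [1, 2]}}')))  # 5
print(classify(json.loads('{"t": {"a": 1}}')))       # 6
```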

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(key)s             For a single JSON table: %(format_name)s%(format_id)s.
                    For multiple JSON tables: the table data key.
%(format_name)s     "json"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.

Return type:

TableData iterator

Raises:

5.2.4.2. JSON Text Loader

class pytablereader.JsonTableTextLoader(text, quoting_flags=None, type_hints=None, type_hint_rules=None)

A text loader class to extract tabular data from JSON text data.

Parameters:

text (str) – JSON text to load.

table_name

Table name string. Defaults to %(key)s.

load()

Extract tabular data as TableData instances from a JSON text object. The source attribute should contain the text object to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        ""
%(key)s             For a single JSON table: %(format_name)s%(format_id)s.
                    For multiple JSON tables: the table data key.
%(format_name)s     "json"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.

Return type:

TableData iterator

5.2.4.3. Line-delimited JSON File Loader

class pytablereader.JsonLinesTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A file loader class to extract tabular data from line-delimited JSON files.

Parameters:

file_path (str) – Path to the line-delimited JSON file to load.

table_name

Table name string. Defaults to %(filename)s_%(key)s.

load()

Extract tabular data as TableData instances from a line-delimited JSON file. The source attribute should contain the path to the file to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name, in which format specifiers are replaced with specific strings.

Return type:

TableData iterator

Raises:

5.2.4.4. Line-delimited JSON Text Loader

class pytablereader.JsonLinesTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A text loader class to extract tabular data from line-delimited JSON text data.

Parameters:

text (str) – Line-delimited JSON text to load.

table_name

Table name string. Defaults to %(key)s.

load()

Extract tabular data as TableData instances from a line-delimited JSON text object. The source attribute should contain the text object to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name, in which format specifiers are replaced with specific strings.

Return type:

TableData iterator
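Line-delimited JSON stores one JSON value per line, so each line can be decoded independently. A minimal sketch with the standard json module; parse_json_lines is a hypothetical helper and says nothing about how the loaders themselves read the data:

```python
import json


def parse_json_lines(text):
    """Hypothetical helper: parse line-delimited JSON into a list of records."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # ignore blank lines
            records.append(json.loads(line))
    return records


text = '{"a": 1, "b": "x"}\n{"a": 2, "b": "y"}\n'
print(parse_json_lines(text))  # [{'a': 1, 'b': 'x'}, {'a': 2, 'b': 'y'}]
```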

5.2.5. LTSV Loader Classes

5.2.5.1. LTSV File Loader

class pytablereader.LtsvTableFileLoader(file_path, quoting_flags=None, type_hints=None, type_hint_rules=None)

Bases: LtsvTableLoader

Labeled Tab-separated Values (LTSV) format file loader class.

Parameters:

file_path (str) – Path to the LTSV file to load.

table_name

Table name string. Defaults to %(filename)s.

load()

Extract tabular data as TableData instances from an LTSV file. The source attribute should contain the path to the file to load.

Returns:

Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(format_name)s     "ltsv"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.

Return type:

TableData iterator

Raises:
  • pytablereader.InvalidHeaderNameError – If an invalid label name is included in the LTSV file.

  • pytablereader.DataError – If the LTSV data is invalid.

5.2.5.2. LTSV Text Loader

class pytablereader.LtsvTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

Bases: LtsvTableLoader

Labeled Tab-separated Values (LTSV) format text loader class.

Parameters:

text (str) – LTSV text to load.

table_name

Table name string. Defaults to %(format_name)s%(format_id)s.

load()

Extract tabular data as TableData instances from an LTSV text object. The source attribute should contain the text object to load.

Returns:

Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        ""
%(format_name)s     "ltsv"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.

Return type:

TableData iterator

Raises:
  • pytablereader.InvalidHeaderNameError – If an invalid label name is included in the LTSV text.

  • pytablereader.DataError – If the LTSV data is invalid.
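For readers unfamiliar with LTSV, each record is a line of tab-separated label:value fields, and the LTSV specification restricts labels to digits, ASCII letters, "_", ".", and "-". The sketch below parses one record and rejects invalid labels. parse_ltsv_line is a hypothetical helper; it raises ValueError for simplicity, whereas the loaders document InvalidHeaderNameError and DataError.

```python
import re

# Label characters permitted by the LTSV specification: 0-9, A-Z, a-z, "_", ".", "-".
LABEL_PATTERN = re.compile(r"[0-9A-Za-z_.-]+")


def parse_ltsv_line(line):
    """Hypothetical helper: parse one LTSV record into a dict of label -> value."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        # Only the first colon separates label and value; later colons stay in the value.
        label, separator, value = field.partition(":")
        if not separator:
            raise ValueError("missing ':' in field: {!r}".format(field))
        if not LABEL_PATTERN.fullmatch(label):
            raise ValueError("invalid label: {!r}".format(label))
        record[label] = value
    return record


print(parse_ltsv_line("host:127.0.0.1\tstatus:200\ttime:[10/Oct/2000:13:55:36]"))
# {'host': '127.0.0.1', 'status': '200', 'time': '[10/Oct/2000:13:55:36]'}
```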

5.2.6. Markdown Loader Classes

5.2.6.1. Markdown File Loader

class pytablereader.MarkdownTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A file loader class to extract tabular data from Markdown files.

Parameters:

file_path (str) – Path to the Markdown file to load.

table_name

Table name string. Defaults to %(filename)s_%(key)s.

load()

Extract tabular data as TableData instances from a Markdown file. The source attribute should contain the path to the file to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(key)s             %(format_name)s%(format_id)s
%(format_name)s     "markdown"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the Markdown data is invalid or empty.

5.2.6.2. Markdown Text Loader

class pytablereader.MarkdownTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A text loader class to extract tabular data from Markdown text data.

Parameters:

text (str) – Markdown text to load.

table_name

Table name string. Defaults to %(key)s.

load()

Extract tabular data as TableData instances from a Markdown text object. The source attribute should contain the text object to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        ""
%(key)s             %(format_name)s%(format_id)s
%(format_name)s     "markdown"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the Markdown data is invalid or empty.
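As a rough picture of what extracting a table from Markdown involves, the sketch below splits a single pipe-delimited table into headers and rows. parse_markdown_table is a hypothetical helper that assumes one well-formed table with a delimiter row on the second line and no escaped pipes inside cells; it is not how the loaders parse Markdown.

```python
def parse_markdown_table(text):
    """Hypothetical helper: parse one pipe-delimited Markdown table into (headers, rows)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    def split_row(line):
        # Drop the optional leading/trailing pipes, then split the cells.
        return [cell.strip() for cell in line.strip("|").split("|")]

    headers = split_row(lines[0])
    # lines[1] is the delimiter row (e.g. |---|---|) and carries no data.
    rows = [split_row(line) for line in lines[2:]]
    return headers, rows


text = """
| a | b   |
|---|-----|
| 1 | foo |
| 2 | bar |
"""
print(parse_markdown_table(text))  # (['a', 'b'], [['1', 'foo'], ['2', 'bar']])
```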

5.2.7. MediaWiki Loader Classes

5.2.7.1. MediaWiki File Loader

class pytablereader.MediaWikiTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A file loader class to extract tabular data from MediaWiki files.

Parameters:

file_path (str) – Path to the MediaWiki file to load.

table_name

Table name string. Defaults to %(filename)s_%(key)s.

load()

Extract tabular data as TableData instances from a MediaWiki file. The source attribute should contain the path to the file to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(key)s             The caption mark of the table, or
                    %(format_name)s%(format_id)s if the table
                    has no caption mark.
%(format_name)s     "mediawiki"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the MediaWiki data is invalid or empty.

5.2.7.2. MediaWiki Text Loader

class pytablereader.MediaWikiTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A text loader class to extract tabular data from MediaWiki text data.

Parameters:

text (str) – MediaWiki text to load.

table_name

Table name string. Defaults to %(key)s.

load()

Extract tabular data as TableData instances from a MediaWiki text object. The source attribute should contain the text object to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        ""
%(key)s             The caption mark of the table, or
                    %(format_name)s%(format_id)s if the table
                    has no caption mark.
%(format_name)s     "mediawiki"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the MediaWiki data is invalid or empty.
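In MediaWiki table markup a caption is written on a line starting with "|+". The following sketch shows how the %(key)s value could be resolved from that line, with a fallback to the format name plus an ID when no caption is present. resolve_key and its default format_id of 0 are assumptions made for illustration, not the library's code.

```python
import re

# A caption line inside a MediaWiki table starts with "|+".
CAPTION_PATTERN = re.compile(r"^\|\+(.*)$", re.MULTILINE)


def resolve_key(table_markup, format_name="mediawiki", format_id=0):
    """Hypothetical helper: return the caption text, or a fallback name without one."""
    match = CAPTION_PATTERN.search(table_markup)
    if match and match.group(1).strip():
        return match.group(1).strip()
    return "{}{}".format(format_name, format_id)


with_caption = """{| class="wikitable"
|+ Food prices
! item !! price
|-
| apple || 1.2
|}"""
without_caption = """{| class="wikitable"
! item !! price
|-
| apple || 1.2
|}"""
print(resolve_key(with_caption))     # Food prices
print(resolve_key(without_caption))  # mediawiki0
```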

5.2.8. Spreadsheet Loader Classes

5.2.8.1. Excel File Loader

class pytablereader.ExcelTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A file loader class to extract tabular data from Microsoft Excel™ files.

Parameters:

file_path (str) – Path to the Excel workbook file to load.

table_name

Table name string. Defaults to %(sheet)s.

start_row

The first row from which to search for the header row.

load()

Extract tabular data as TableData instances from an Excel file. This method automatically searches for the header row of the table, starting from start_row. A row qualifies as the header row when all of its columns have a value (empty columns excepted).
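One possible reading of the header row rule is sketched below: columns that are empty in every row are ignored, and the header is the first row at or after start_row in which every remaining column has a value. This interpretation, the helper name find_header_row, and the list-of-lists sheet representation are all assumptions; the library's actual search may differ in detail.

```python
def find_header_row(rows, start_row=0):
    """Hypothetical helper: return the index of the first row usable as a header."""

    def is_empty(cell):
        return cell is None or str(cell).strip() == ""

    candidate_rows = rows[start_row:]
    width = max((len(row) for row in candidate_rows), default=0)
    # Columns that are empty in every row are ignored.
    used_columns = [
        col for col in range(width)
        if any(col < len(row) and not is_empty(row[col]) for row in candidate_rows)
    ]
    for offset, row in enumerate(candidate_rows):
        if used_columns and all(col < len(row) and not is_empty(row[col]) for col in used_columns):
            return start_row + offset
    return None  # no row has a value in every used column


sheet = [
    ["Monthly report", "", ""],
    ["", "", ""],
    ["name", "", "price"],
    ["apple", "", 1.2],
]
print(find_header_row(sheet))  # 2
```

The title row is skipped because its third column is empty, while the always-empty middle column does not disqualify the "name"/"price" row.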

Returns:

Loaded TableData iterator. A TableData instance is created for each sheet in the workbook. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        Filename of the workbook
%(sheet)s           Name of the sheet
%(format_name)s     "spreadsheet"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.

Return type:

TableData iterator

Raises:

5.2.8.2. Google Sheets Loader

class pytablereader.GoogleSheetsTableLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A concrete loader class for Google Sheets.

table_name

Table name string. Defaults to %(sheet)s.

Parameters:

file_path (str) – Path to the Google Sheets credential JSON file.

Dependency Packages:

Examples:

Load table data from Google Sheets

load()

Load table data from a Google Spreadsheet.

This method treats source as the path to the credential JSON file used to access the Google Sheets API.

The method automatically searches for the header row, starting from start_row. A row qualifies as the header row when all of its columns have a value (empty columns excepted).

Returns:

Loaded table data. One TableData is returned for each sheet in the workbook. The table name of each is determined by make_table_name().

Return type:

iterator of TableData

Raises:
  • pytablereader.DataError – If the header row is not found.

  • pytablereader.OpenError – If the spreadsheet is not found.

5.2.9. Database Loader Classes

5.2.9.1. SQLite File Loader

class pytablereader.SqliteFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A file loader class to extract tabular data from SQLite database files.

Parameters:

file_path (str) – Path to the SQLite database file to load.

table_name

Table name string. Defaults to %(filename)s_%(key)s.

Dependency Packages:

load()

Extract tabular data as TableData instances from a SQLite database file. The source attribute should contain the path to the file to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(key)s             %(format_name)s%(format_id)s
%(format_name)s     "sqlite"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the SQLite database file data is invalid or empty.
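To show what extracting tabular data from a SQLite database involves at the lowest level, the sketch below uses only the standard sqlite3 module: it lists the tables recorded in sqlite_master and reads each one's column names and rows. load_sqlite_tables is a hypothetical helper working on an in-memory database; it is independent of pytablereader and of whatever dependency packages the loader relies on.

```python
import sqlite3


def load_sqlite_tables(connection):
    """Hypothetical helper: return {table_name: (headers, rows)} for every table."""
    cursor = connection.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    table_names = [name for (name,) in cursor.fetchall()]
    tables = {}
    for name in table_names:
        # Table names cannot be bound as parameters, so quote the identifier.
        result = connection.execute('SELECT * FROM "{}"'.format(name.replace('"', '""')))
        headers = [column[0] for column in result.description]
        tables[name] = (headers, result.fetchall())
    return tables


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE fruit (name TEXT, price REAL)")
connection.executemany("INSERT INTO fruit VALUES (?, ?)", [("apple", 1.2), ("pear", 0.8)])
print(load_sqlite_tables(connection))
# {'fruit': (['name', 'price'], [('apple', 1.2), ('pear', 0.8)])}
connection.close()
```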