5.2. Format Specific Table Loader Classes

5.2.1. AbstractTableReader class

class pytablereader.interface.AbstractTableReader(source, quoting_flags, type_hints, type_hint_rules=None)[source]

Bases: pytablereader.interface.TableLoaderInterface

The abstract base class for table data file loaders.

table_name

Table name string.

source

Table data source to load.

5.2.2. CSV Loader Classes

5.2.2.1. CSV Table Loader

class pytablereader.csv.core.CsvTableLoader(source, quoting_flags, type_hints, type_hint_rules)[source]

The abstract base class for CSV table loaders.

headers

Attribute names of the table. If headers is empty, the first line of the CSV file is used as the attribute list.

delimiter

A one-character string used to separate fields. Defaults to ",".

quotechar

A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters. Defaults to '"'.

encoding

Encoding of the CSV data.
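The delimiter and quotechar attributes carry the same meaning as the parameters of the same names in Python's standard csv module. The following is a minimal sketch using only csv.reader(), not pytablereader itself; the sample text and variable names are made up for illustration.

```python
import csv
import io

# Hypothetical sample: fields separated by ";" and quoted with "'".
text = "name;comment\nalice;'likes; semicolons'\nbob;'multi\nline'\n"

# newline="" keeps embedded line breaks intact, as the csv module recommends.
reader = csv.reader(io.StringIO(text, newline=""), delimiter=";", quotechar="'")
rows = list(reader)

# With empty headers, the first line acts as the attribute list.
headers, records = rows[0], rows[1:]
print(headers)
print(records)
```

Quoted fields may contain the delimiter or a line break without being split.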

5.2.2.2. CSV File Loader

class pytablereader.CsvTableFileLoader(file_path, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

Bases: pytablereader.csv.core.CsvTableLoader

A file loader class to extract tabular data from CSV files.

Parameters

file_path (str) – Path to the CSV file to load.

table_name

Table name string. Defaults to %(filename)s.

Examples

Load table data from CSV

load()[source]

Extract tabular data as TableData instances from a CSV file. The source attribute should contain the path to the file to load.

Returns

Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

================  ============================================
Format specifier  Value after the replacement
================  ============================================
%(filename)s      Filename (without extension)
%(format_name)s   "csv"
%(format_id)s     A number unique within the same format.
%(global_id)s     A number unique across all formats.
================  ============================================

Return type

TableData iterator

Raises

pytablereader.DataError – If the CSV data is invalid.

See also

csv.reader()
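The format specifiers listed for load() use the syntax of Python's mapping-style % string formatting. The sketch below shows how such a template could be expanded; expand_table_name is a hypothetical helper and the ID values are made up, so this is an illustration of the substitution rather than the library's actual code.

```python
import os.path


def expand_table_name(template, file_path, format_name, format_id, global_id):
    """Expand %(...)s specifiers in a table-name template (hypothetical helper)."""
    # %(filename)s is the file name without directory and extension.
    filename = os.path.splitext(os.path.basename(file_path))[0]
    return template % {
        "filename": filename,
        "format_name": format_name,
        "format_id": format_id,
        "global_id": global_id,
    }


default_name = expand_table_name("%(filename)s", "data/sample_data.csv", "csv", 0, 0)
custom_name = expand_table_name(
    "%(format_name)s%(format_id)s_%(filename)s", "data/sample_data.csv", "csv", 2, 5
)
print(default_name, custom_name)
```

Specifiers that a template does not use are simply ignored by the % operator.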

5.2.2.3. CSV Text Loader

class pytablereader.CsvTableTextLoader(text, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

Bases: pytablereader.csv.core.CsvTableLoader

A text loader class to extract tabular data from CSV text data.

Parameters

text (str) – CSV text to load.

table_name

Table name string. Defaults to %(format_name)s%(format_id)s.

Examples

Load table data from CSV

load()[source]

Extract tabular data as TableData instances from a CSV text object. The source attribute should contain the text object to load.

Returns

Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

================  ============================================
Format specifier  Value after the replacement
================  ============================================
%(filename)s      ""
%(format_name)s   "csv"
%(format_id)s     A number unique within the same format.
%(global_id)s     A number unique across all formats.
================  ============================================

Return type

TableData iterator

Raises

pytablereader.DataError – If the CSV data is invalid.

See also

csv.reader()

5.2.3. HTML Loader Classes

5.2.3.1. HTML File Loader

class pytablereader.HtmlTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

A file loader class to extract tabular data from HTML files.

Parameters

file_path (str) – Path to the HTML file to load.

table_name

Table name string. Defaults to %(title)s_%(key)s.

encoding

HTML file encoding. Defaults to "utf-8".

load()[source]

Extract tabular data as TableData instances from HTML table tags in an HTML file. The source attribute should contain the path to the file to load.

Returns

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

================  ============================================
Format specifier  Value after the replacement
================  ============================================
%(filename)s      Filename (without extension)
%(title)s         <title> tag value of the HTML.
%(key)s           Replaced with:
                  (1) the id attribute of the table tag, or
                  (2) %(format_name)s%(format_id)s if the
                  table tag has no id attribute.
%(format_name)s   "html"
%(format_id)s     A number unique within the same format.
%(global_id)s     A number unique across all formats.
================  ============================================

Return type

TableData iterator

Raises

pytablereader.DataError – If the HTML data is invalid or empty.

Note

Table tag attributes are not included in the loaded TableData.
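The %(key)s fallback described for load() can be sketched with the standard html.parser module. TableIdCollector, the sample document, and the numbering used for the fallback name are assumptions made for illustration; the library's own parsing and ID assignment may differ.

```python
from html.parser import HTMLParser


class TableIdCollector(HTMLParser):
    """Collect the <title> text and the id attribute of each <table> tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.table_ids = []  # None where a table tag has no id attribute
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "table":
            self.table_ids.append(dict(attrs).get("id"))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


document = """<html><head><title>report</title></head><body>
<table id="sales"><tr><th>a</th></tr><tr><td>1</td></tr></table>
<table><tr><th>b</th></tr><tr><td>2</td></tr></table>
</body></html>"""

collector = TableIdCollector()
collector.feed(document)

names = []
for format_id, table_id in enumerate(collector.table_ids):
    # Fall back to format_name + format_id when the table has no id.
    key = table_id if table_id is not None else "html%d" % format_id
    names.append("%(title)s_%(key)s" % {"title": collector.title, "key": key})
print(names)
```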

5.2.3.2. HTML Text Loader

class pytablereader.HtmlTableTextLoader(text, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

A text loader class to extract tabular data from HTML text data.

Parameters

text (str) – HTML text to load.

table_name

Table name string. Defaults to %(title)s_%(key)s.

load()[source]

Extract tabular data as TableData instances from HTML table tags in an HTML text object. The source attribute should contain the text object to load.

Returns

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

================  ============================================
Format specifier  Value after the replacement
================  ============================================
%(filename)s      ""
%(title)s         <title> tag value of the HTML.
%(key)s           Replaced with:
                  (1) the id attribute of the table tag, or
                  (2) %(format_name)s%(format_id)s if the
                  table tag has no id attribute.
%(format_name)s   "html"
%(format_id)s     A number unique within the same format.
%(global_id)s     A number unique across all formats.
================  ============================================

Return type

TableData iterator

Raises

pytablereader.DataError – If the HTML data is invalid or empty.

5.2.4. JSON Loader Classes

5.2.4.1. JSON File Loader

class pytablereader.JsonTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

A file loader class to extract tabular data from JSON files.

Parameters

file_path (str) – Path to the JSON file to load.

table_name

Table name string. Defaults to %(filename)s_%(key)s.

load()[source]

Extract tabular data as TableData instances from a JSON file. The source attribute should contain the path to the file to load.

This method can load the following six JSON formats:

(1) Single table data in a file:

Acceptable JSON Schema (1): single table
{
    "type": "array",
    "items": {
        "type": "object",
        "additionalProperties": {
            "anyOf": [
                {"type": "string"},
                {"type": "number"},
                {"type": "boolean"},
                {"type": "null"}
            ]
        }
    }
}
Acceptable JSON example for the JSON schema (1)
[
    {"attr_b": 4, "attr_c": "a", "attr_a": 1},
    {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2},
    {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3}
]

The example data will be loaded as the following tabular data:

======  ======  ======
attr_a  attr_b  attr_c
======  ======  ======
1       4.0     a
2       2.1     bb
3       120.9   ccc
======  ======  ======
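The mapping from layout (1) to rows and columns can be sketched with the standard json module. This is an illustration only: sorting the keys to obtain the header order is an assumption chosen to match the column order shown above, and no type normalization is applied (so 4 stays an integer here instead of becoming 4.0).

```python
import json

text = """[
    {"attr_b": 4, "attr_c": "a", "attr_a": 1},
    {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2},
    {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3}
]"""

records = json.loads(text)

# Headers: the sorted union of all keys found in the records.
headers = sorted({key for record in records for key in record})
# One row per record; a key missing from a record yields None.
rows = [[record.get(key) for key in headers] for record in records]
print(headers)
print(rows)
```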

(2) Single table data in a file:

Acceptable JSON Schema (2): single table
{
    "type": "object",
    "additionalProperties": {
        "type": "array",
        "items": {
            "anyOf": [
                {"type": "string"},
                {"type": "number"},
                {"type": "boolean"},
                {"type": "null"}
            ]
        }
    }
}
Acceptable JSON example for the JSON schema (2)
{
    "attr_a": [1, 2, 3],
    "attr_b": [4, 2.1, 120.9],
    "attr_c": ["a", "bb", "ccc"]
}

The example data will be loaded as the following tabular data:

======  ======  ======
attr_a  attr_b  attr_c
======  ======  ======
1       4.0     a
2       2.1     bb
3       120.9   ccc
======  ======  ======
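Layout (2) stores one array per column, so producing rows amounts to a transpose. A minimal stdlib-only sketch, assuming all columns have the same length and again without any type normalization:

```python
import json

text = """{
    "attr_a": [1, 2, 3],
    "attr_b": [4, 2.1, 120.9],
    "attr_c": ["a", "bb", "ccc"]
}"""

columns = json.loads(text)

headers = list(columns)  # dicts keep the key order of the JSON text
# zip(*...) transposes the column arrays into rows.
rows = [list(row) for row in zip(*columns.values())]
print(headers)
print(rows)
```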

(3) Single table data in a file:

Acceptable JSON Schema (3): single table
{
    "type": "object",
    "additionalProperties": {
        "anyOf": [
            {"type": "string"},
            {"type": "number"},
            {"type": "boolean"},
            {"type": "null"}
        ]
    }
}
Acceptable JSON example for the JSON schema (3)
{
    "num_ratings": 27,
    "support_threads": 1,
    "downloaded": 925716,
    "last_updated":"2017-12-01 6:22am GMT",
    "added":"2010-01-20",
    "num": 1.1,
    "hoge": null
}

The example data will be loaded as the following tabular data:

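===============  =====================
key              value
===============  =====================
num_ratings      27
support_threads  1
downloaded       925716
last_updated     2017-12-01 6:22am GMT
added            2010-01-20
num              1.1
hoge             None
===============  =====================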
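For layout (3), each property of the object becomes one row of a two-column key/value table. A short sketch using only the json module, with a shortened version of the example data:

```python
import json

text = '{"num_ratings": 27, "support_threads": 1, "num": 1.1, "hoge": null}'

obj = json.loads(text)

headers = ["key", "value"]
# JSON null is parsed as None, which is what the table above displays.
rows = [[key, value] for key, value in obj.items()]
print(rows)
```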

(4) Multiple table data in a file:

Acceptable JSON Schema (4): multiple tables
{
    "type": "object",
    "additionalProperties": {
        "type": "array",
        "items": {
            "type": "object",
            "additionalProperties": {
                "anyOf": [
                    {"type": "string"},
                    {"type": "number"},
                    {"type": "boolean"},
                    {"type": "null"}
                ]
            }
        }
    }
}
Acceptable JSON example for the JSON schema (4)
{
    "table_a" : [
        {"attr_b": 4, "attr_c": "a", "attr_a": 1},
        {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2},
        {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3}
    ],
    "table_b" : [
        {"a": 1, "b": 4},
        {"a": 2 },
        {"a": 3, "b": 120.9}
    ]
}

The example data will be loaded as the following tabular data:

table_a

======  ======  ======
attr_a  attr_b  attr_c
======  ======  ======
1       4.0     a
2       2.1     bb
3       120.9   ccc
======  ======  ======

table_b

===  =====
a    b
===  =====
1    4.0
2    None
3    120.9
===  =====
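In layout (4), each top-level key names a table, and a record that lacks a key produces None in that column, as table_b shows. A stdlib-only sketch under the same assumptions as before (sorted headers, no type normalization):

```python
import json

text = """{
    "table_a": [{"attr_b": 4, "attr_c": "a", "attr_a": 1}],
    "table_b": [{"a": 1, "b": 4}, {"a": 2}, {"a": 3, "b": 120.9}]
}"""

tables = {}
for table_name, records in json.loads(text).items():
    headers = sorted({key for record in records for key in record})
    # dict.get() returns None for keys a record does not contain.
    rows = [[record.get(key) for key in headers] for record in records]
    tables[table_name] = (headers, rows)

print(tables["table_b"])
```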

(5) Multiple table data in a file:

Acceptable JSON Schema (5): multiple tables
{
    "type": "object",
    "additionalProperties": {
        "type": "object",
        "additionalProperties": {
            "type": "array",
            "items": {
                "anyOf": [
                    {"type": "string"},
                    {"type": "number"},
                    {"type": "boolean"},
                    {"type": "null"}
                ]
            }
        }
    }
}
Acceptable JSON example for the JSON schema (5)
{
    "table_a" : {
        "attr_a": [1, 2, 3],
        "attr_b": [4, 2.1, 120.9],
        "attr_c": ["a", "bb", "ccc"]
    },
    "table_b" : {
        "a": [1, 3],
        "b": [4, 120.9]
    }
}

The example data will be loaded as the following tabular data:

table_a

======  ======  ======
attr_a  attr_b  attr_c
======  ======  ======
1       4.0     a
2       2.1     bb
3       120.9   ccc
======  ======  ======

table_b

===  =====
a    b
===  =====
1    4.0
3    120.9
===  =====

(6) Multiple table data in a file:

Acceptable JSON Schema (6): multiple tables
{
    "type": "object",
    "additionalProperties": {
        "type": "object",
        "additionalProperties": {
            "anyOf": [
                {"type": "string"},
                {"type": "number"},
                {"type": "boolean"},
                {"type": "null"}
            ]
        }
    }
}
Acceptable JSON example for the JSON schema (6)
{
    "table_a": {
        "num_ratings": 27,
        "support_threads": 1,
        "downloaded": 925716,
        "last_updated":"2017-12-01 6:22am GMT",
        "added":"2010-01-20",
        "num": 1.1,
        "hoge": null
    },
    "table_b": {
        "a": 4,
        "b": 120.9
    }
}

The example data will be loaded as the following tabular data:

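table_a

===============  =====================
key              value
===============  =====================
num_ratings      27
support_threads  1
downloaded       925716
last_updated     2017-12-01 6:22am GMT
added            2010-01-20
num              1.1
hoge             None
===============  =====================

table_b

===  =====
key  value
===  =====
a    4.0
b    120.9
===  =====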
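The six layouts (1) to (6) listed above differ only in how arrays, objects, and scalar values are nested. The sketch below distinguishes them with plain type checks; classify_layout is a hypothetical helper and a deliberate simplification, not the schema validation the library performs (for instance, it does not resolve ambiguous inputs such as empty arrays).

```python
def is_scalar(value):
    """True for the JSON scalar types: string, number, boolean, null."""
    return value is None or isinstance(value, (str, int, float, bool))


def classify_layout(data):
    """Return the layout number (1-6) of parsed JSON data, or None."""
    if isinstance(data, list) and all(isinstance(x, dict) for x in data):
        return 1
    if not isinstance(data, dict) or not data:
        return None
    values = list(data.values())
    if all(isinstance(v, list) and all(is_scalar(x) for x in v) for v in values):
        return 2
    if all(is_scalar(v) for v in values):
        return 3
    if all(isinstance(v, list) and all(isinstance(x, dict) for x in v) for v in values):
        return 4
    if all(
        isinstance(v, dict) and v and all(isinstance(x, list) for x in v.values())
        for v in values
    ):
        return 5
    if all(isinstance(v, dict) and all(is_scalar(x) for x in v.values()) for v in values):
        return 6
    return None


print(classify_layout({"table_a": {"attr_a": [1, 2, 3]}}))
```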

Returns

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

================  ============================================
Format specifier  Value after the replacement
================  ============================================
%(filename)s      Filename (without extension)
%(key)s           Depends on the JSON layout:
                  [single table] %(format_name)s%(format_id)s
                  [multiple tables] the table data key
%(format_name)s   "json"
%(format_id)s     A number unique within the same format.
%(global_id)s     A number unique across all formats.
================  ============================================

Return type

TableData iterator

Raises
  • pytablereader.DataError – If the data is invalid JSON.

  • pytablereader.error.ValidationError – If the data is not in an acceptable JSON format.

5.2.4.2. JSON Text Loader

class pytablereader.JsonTableTextLoader(text, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

A text loader class to extract tabular data from JSON text data.

Parameters

text (str) – JSON text to load.

table_name

Table name string. Defaults to %(key)s.

load()[source]

Extract tabular data as TableData instances from a JSON text object. The source attribute should contain the text object to load.

Returns

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

================  ============================================
Format specifier  Value after the replacement
================  ============================================
%(filename)s      ""
%(key)s           Depends on the JSON layout:
                  [single table] %(format_name)s%(format_id)s
                  [multiple tables] the table data key
%(format_name)s   "json"
%(format_id)s     A number unique within the same format.
%(global_id)s     A number unique across all formats.
================  ============================================

Return type

TableData iterator

5.2.4.3. Line-delimited JSON File Loader

class pytablereader.JsonLinesTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

A file loader class to extract tabular data from Line-delimited JSON files.

Parameters

file_path (str) – Path to the Line-delimited JSON file to load.

table_name

Table name string. Defaults to %(filename)s_%(key)s.

load()[source]

Extract tabular data as TableData instances from a Line-delimited JSON file. The source attribute should contain the path to the file to load.

Returns

Loaded table data iterator. The table name is determined by the value of table_name, in which format specifiers are replaced with specific strings.

Return type

TableData iterator

Raises
  • pytablereader.DataError – If the data is invalid Line-delimited JSON.

  • pytablereader.error.ValidationError – If the data is not in an acceptable Line-delimited JSON format.
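In Line-delimited JSON, every line is a complete JSON value, typically one object per record. The following stdlib-only sketch shows how such text maps to rows; skipping blank lines and sorting the keys are assumptions of this example, not documented behavior of the loader.

```python
import json

text = '{"a": 1, "b": "x"}\n{"a": 2, "b": "y"}\n\n{"a": 3}\n'

# Parse each non-blank line as an independent JSON document.
records = [json.loads(line) for line in text.splitlines() if line.strip()]

headers = sorted({key for record in records for key in record})
rows = [[record.get(key) for key in headers] for record in records]
print(headers)
print(rows)
```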

5.2.4.4. Line-delimited JSON Text Loader

class pytablereader.JsonLinesTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

A text loader class to extract tabular data from Line-delimited JSON text data.

Parameters

text (str) – Line-delimited JSON text to load.

table_name

Table name string. Defaults to %(key)s.

load()[source]

Extract tabular data as TableData instances from a Line-delimited JSON text object. The source attribute should contain the text object to load.

Returns

Loaded table data iterator. The table name is determined by the value of table_name, in which format specifiers are replaced with specific strings.

Return type

TableData iterator

5.2.5. LTSV Loader Classes

5.2.5.1. LTSV File Loader

class pytablereader.LtsvTableFileLoader(file_path, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

Bases: pytablereader.ltsv.core.LtsvTableLoader

Labeled Tab-separated Values (LTSV) format file loader class.

Parameters

file_path (str) – Path to the LTSV file to load.

table_name

Table name string. Defaults to %(filename)s.

load()[source]

Extract tabular data as TableData instances from an LTSV file. The source attribute should contain the path to the file to load.

Returns

Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

================  ============================================
Format specifier  Value after the replacement
================  ============================================
%(filename)s      Filename (without extension)
%(format_name)s   "ltsv"
%(format_id)s     A number unique within the same format.
%(global_id)s     A number unique across all formats.
================  ============================================

Return type

TableData iterator

Raises
  • pytablereader.InvalidHeaderNameError – If an invalid label name is included in the LTSV file.

  • pytablereader.DataError – If the LTSV data is invalid.
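In LTSV, each line is one record made of tab-separated fields, and each field has the form label:value. The sketch below parses that structure with plain string operations; parse_ltsv_line and the sample log lines are hypothetical, and no label validation is performed (the loader raises InvalidHeaderNameError for invalid labels).

```python
def parse_ltsv_line(line):
    """Split one LTSV line into a {label: value} dict (hypothetical helper)."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        if not field:
            continue
        # Only the first ":" separates label and value; values may contain ":".
        label, _, value = field.partition(":")
        record[label] = value
    return record


text = (
    "host:127.0.0.1\tstatus:200\ttime:2024-01-01T00:00:00\n"
    "host:10.0.0.2\tstatus:404\ttime:2024-01-01T00:00:05\n"
)

records = [parse_ltsv_line(line) for line in text.splitlines() if line]
print(records)
```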

5.2.5.2. LTSV Text Loader

class pytablereader.LtsvTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

Bases: pytablereader.ltsv.core.LtsvTableLoader

Labeled Tab-separated Values (LTSV) format text loader class.

Parameters

text (str) – LTSV text to load.

table_name

Table name string. Defaults to %(format_name)s%(format_id)s.

load()[source]

Extract tabular data as TableData instances from an LTSV text object. The source attribute should contain the text object to load.

Returns

Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

================  ============================================
Format specifier  Value after the replacement
================  ============================================
%(filename)s      ""
%(format_name)s   "ltsv"
%(format_id)s     A number unique within the same format.
%(global_id)s     A number unique across all formats.
================  ============================================

Return type

TableData iterator

Raises
  • pytablereader.InvalidHeaderNameError – If an invalid label name is included in the LTSV text.

  • pytablereader.DataError – If the LTSV data is invalid.

5.2.6. Markdown Loader Classes

5.2.6.1. Markdown File Loader

class pytablereader.MarkdownTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

A file loader class to extract tabular data from Markdown files.

Parameters

file_path (str) – Path to the Markdown file to load.

table_name

Table name string. Defaults to %(filename)s_%(key)s.

load()[source]

Extract tabular data as TableData instances from a Markdown file. The source attribute should contain the path to the file to load.

Returns

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

================  ============================================
Format specifier  Value after the replacement
================  ============================================
%(filename)s      Filename (without extension)
%(key)s           %(format_name)s%(format_id)s
%(format_name)s   "markdown"
%(format_id)s     A number unique within the same format.
%(global_id)s     A number unique across all formats.
================  ============================================

Return type

TableData iterator

Raises

pytablereader.DataError – If the Markdown data is invalid or empty.
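To make the input concrete, the sketch below reads a pipe-delimited Markdown table with plain string operations. parse_markdown_table is a hypothetical helper and a simplification: it assumes a single table with leading and trailing pipes and does not handle escaped pipes, so it should not be read as the loader's parsing logic.

```python
def parse_markdown_table(text):
    """Return (headers, rows) for the first pipe table in text (simplified)."""
    lines = [line.strip() for line in text.splitlines() if line.strip().startswith("|")]

    def split_row(line):
        return [cell.strip() for cell in line.strip("|").split("|")]

    headers = split_row(lines[0])
    # lines[1] is the delimiter row (for example |---|---|) and is skipped.
    rows = [split_row(line) for line in lines[2:]]
    return headers, rows


text = "Intro\n\n| a | b |\n|---|---|\n| 1 | x |\n| 2 | y |\n"
headers, rows = parse_markdown_table(text)
print(headers, rows)
```

All cells come back as strings here; converting them to numbers is left out.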

5.2.6.2. Markdown Text Loader

class pytablereader.MarkdownTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

A text loader class to extract tabular data from Markdown text data.

Parameters

text (str) – Markdown text to load.

table_name

Table name string. Defaults to %(key)s.

load()[source]

Extract tabular data as TableData instances from a Markdown text object. The source attribute should contain the text object to load.

Returns

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

================  ============================================
Format specifier  Value after the replacement
================  ============================================
%(filename)s      ""
%(key)s           %(format_name)s%(format_id)s
%(format_name)s   "markdown"
%(format_id)s     A number unique within the same format.
%(global_id)s     A number unique across all formats.
================  ============================================

Return type

TableData iterator

Raises

pytablereader.DataError – If the Markdown data is invalid or empty.

5.2.7. MediaWiki Loader Classes

5.2.7.1. MediaWiki File Loader

class pytablereader.MediaWikiTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

A file loader class to extract tabular data from MediaWiki files.

Parameters

file_path (str) – Path to the MediaWiki file to load.

table_name

Table name string. Defaults to %(filename)s_%(key)s.

load()[source]

Extract tabular data as TableData instances from a MediaWiki file. The source attribute should contain the path to the file to load.

Returns

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

================  ============================================
Format specifier  Value after the replacement
================  ============================================
%(filename)s      Filename (without extension)
%(key)s           Replaced with:
                  (1) the caption mark of the table, or
                  (2) %(format_name)s%(format_id)s if the
                  table has no caption mark.
%(format_name)s   "mediawiki"
%(format_id)s     A number unique within the same format.
%(global_id)s     A number unique across all formats.
================  ============================================

Return type

TableData iterator

Raises

pytablereader.DataError – If the MediaWiki data is invalid or empty.
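In MediaWiki table markup, a caption is written on a line that starts with |+. The sketch below illustrates the %(key)s rule described for load(): use the caption when present, otherwise fall back to the format name plus an ID. table_key is a hypothetical helper, and the ID value passed in is made up.

```python
def table_key(wikitext, format_id):
    """Return the table caption, or "mediawiki<format_id>" if there is none."""
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("|+"):
            return line[2:].strip()
    return "mediawiki%d" % format_id


with_caption = '{| class="wikitable"\n|+ Fruit prices\n! name !! price\n|-\n| apple || 1.5\n|}'
without_caption = "{|\n! a\n|-\n| 1\n|}"

print(table_key(with_caption, 0))
print(table_key(without_caption, 0))
```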

5.2.7.2. MediaWiki Text Loader

class pytablereader.MediaWikiTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

A text loader class to extract tabular data from MediaWiki text data.

Parameters

text (str) – MediaWiki text to load.

table_name

Table name string. Defaults to %(key)s.

load()[source]

Extract tabular data as TableData instances from a MediaWiki text object. The source attribute should contain the text object to load.

Returns

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

================  ============================================
Format specifier  Value after the replacement
================  ============================================
%(filename)s      ""
%(key)s           Replaced with:
                  (1) the caption mark of the table, or
                  (2) %(format_name)s%(format_id)s if the
                  table has no caption mark.
%(format_name)s   "mediawiki"
%(format_id)s     A number unique within the same format.
%(global_id)s     A number unique across all formats.
================  ============================================

Return type

TableData iterator

Raises

pytablereader.DataError – If the MediaWiki data is invalid or empty.

5.2.8. Spreadsheet Loader Classes

5.2.8.1. Excel File Loader

class pytablereader.ExcelTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

A file loader class to extract tabular data from Microsoft Excel™ files.

Parameters

file_path (str) – Path to the Excel workbook file to load.

table_name

Table name string. Defaults to %(sheet)s.

start_row

The row from which to start searching for the header row.

load()[source]

Extract tabular data as TableData instances from an Excel file. This method automatically searches for the header row of the table, starting from start_row. A row qualifies as the header row only if all of its columns have a value (except empty columns).
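One possible reading of the header-row rule is sketched below on a sheet represented as a list of rows. find_header_row is a hypothetical helper: treating "empty columns" as columns with no value in any row at or after start_row is an assumption, and it returns None where the loader would raise DataError.

```python
def find_header_row(rows, start_row=0):
    """Return the index of the first row whose used columns all hold a value."""
    width = max((len(row) for row in rows), default=0)
    # Pad short rows so every row has the same number of cells.
    padded = [list(row) + [None] * (width - len(row)) for row in rows]

    def is_empty(cell):
        return cell is None or str(cell).strip() == ""

    # Columns that hold no value at or after start_row are ignored.
    used_columns = [
        col
        for col in range(width)
        if any(not is_empty(row[col]) for row in padded[start_row:])
    ]
    for index in range(start_row, len(padded)):
        if used_columns and all(not is_empty(padded[index][col]) for col in used_columns):
            return index
    return None


sheet = [
    ["Report", None, None, None],
    [None, None, None, None],
    ["id", None, "name", "price"],
    [1, None, "apple", 1.5],
]
print(find_header_row(sheet))
```

In this sample, the title row and the blank row are skipped, and the second column is ignored because it never holds a value.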

Returns

Loaded TableData iterator. A TableData is created for each sheet in the workbook. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

================  ============================================
Format specifier  Value after the replacement
================  ============================================
%(filename)s      Filename of the workbook
%(sheet)s         Name of the sheet
%(format_name)s   "spreadsheet"
%(format_id)s     A number unique within the same format.
%(global_id)s     A number unique across all formats.
================  ============================================

Return type

TableData iterator

Raises
  • pytablereader.DataError – If the header row is not found.

  • pytablereader.error.OpenError – If the source file cannot be opened.

5.2.8.2. Google Sheets Loader

class pytablereader.GoogleSheetsTableLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

Concrete class of Google Spreadsheet loader.

table_name

Table name string. Defaults to %(sheet)s.

Parameters

file_path (str) – Path to the Google Sheets credential JSON file.

Dependency Packages

Examples

Load table data from Google Sheets

load()[source]

Load table data from a Google Spreadsheet.

This method treats source as the path to the credential JSON file used to access the Google Sheets API.

The method automatically searches for the header row, starting from start_row. A row qualifies as the header row only if all of its columns have a value (except empty columns).

Returns

Loaded table data. One TableData is returned for each sheet in the workbook. The table name is determined by make_table_name().

Return type

iterator of TableData

Raises
  • pytablereader.DataError – If the header row is not found.

  • pytablereader.OpenError – If the spreadsheet is not found.

5.2.9. Database Loader Classes

5.2.9.1. SQLite File Loader

class pytablereader.SqliteFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]

A file loader class to extract tabular data from SQLite database files.

Parameters

file_path (str) – Path to the SQLite database file to load.

table_name

Table name string. Defaults to %(filename)s_%(key)s.

Dependency Packages

load()[source]

Extract tabular data as TableData instances from a SQLite database file. The source attribute should contain the path to the file to load.

Returns

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

================  ============================================
Format specifier  Value after the replacement
================  ============================================
%(filename)s      Filename (without extension)
%(key)s           %(format_name)s%(format_id)s
%(format_name)s   "sqlite"
%(format_id)s     A number unique within the same format.
%(global_id)s     A number unique across all formats.
================  ============================================

Return type

TableData iterator

Raises

pytablereader.DataError – If the SQLite database file data is invalid or empty.
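What "tabular data in a SQLite database file" amounts to can be shown with the standard sqlite3 module: each table in the database yields a set of column names and rows. This is a stdlib-only sketch on an in-memory database with made-up table names, not the loader's implementation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fruit (name TEXT, price REAL)")
con.executemany("INSERT INTO fruit VALUES (?, ?)", [("apple", 1.5), ("pear", 2.0)])
con.execute("CREATE TABLE stock (name TEXT, amount INTEGER)")
con.execute("INSERT INTO stock VALUES ('apple', 10)")

# sqlite_master lists the schema objects of the database.
names = [
    row[0]
    for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

tables = {}
for name in names:
    cursor = con.execute('SELECT * FROM "%s" ORDER BY rowid' % name)
    headers = [column[0] for column in cursor.description]
    tables[name] = (headers, cursor.fetchall())
con.close()

print(tables)
```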