5.2. Format Specific Table Loader Classes

5.2.1. AbstractTableReader class

class pytablereader.interface.AbstractTableReader(source, quoting_flags, type_hints, type_hint_rules=None)

Bases: TableLoaderInterface

The abstract base class of table data file loaders.

table_name: Table name string.

source: Table data source to load.

5.2.2. CSV Loader Classes

5.2.2.1. CSV Table Loader

class pytablereader.csv.core.CsvTableLoader(source, quoting_flags, type_hints, type_hint_rules)

The abstract class of CSV table loaders.

headers: Attribute names of the table. If headers is empty, the first line of the CSV data is used as the attribute list.

delimiter: A one-character string used to separate fields. Defaults to ",".

quotechar: A one-character string used to quote fields that contain special characters, such as the delimiter or the quotechar itself, or that contain newline characters. Defaults to '"'.

encoding: Encoding of the CSV data.
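As a rough illustration of what delimiter and quotechar control, the sketch below uses csv.reader() from the Python standard library rather than the loader classes themselves. The sample text and variable names are made up for this example; taking the first row as the attribute list mirrors the documented behavior when headers is empty.

```python
import csv
import io

# Hypothetical sample: "|" separates fields, and '"' quotes fields that
# contain the delimiter or an embedded quote character.
csv_text = 'name|note\n"a|b"|"said ""hi"""\n'

reader = csv.reader(io.StringIO(csv_text), delimiter="|", quotechar='"')
rows = list(reader)

# With no explicit headers, the first line serves as the attribute list.
headers, records = rows[0], rows[1:]

print(headers)
print(records)
```

A doubled quote character inside a quoted field is read as one literal quote, which is the default behavior of csv.reader().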

5.2.2.2. CSV File Loader

class pytablereader.CsvTableFileLoader(file_path, quoting_flags=None, type_hints=None, type_hint_rules=None)

Bases: CsvTableLoader

A file loader class to extract tabular data from CSV files.

Parameters: file_path (str) – Path to the CSV file to load.

table_name: Table name string. Defaults to %(filename)s.

Examples: Load table data from CSV

load()

Extract tabular data as TableData instances from a CSV file. The source attribute should contain the path to the file to load.

Returns:

Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier	Value after the replacement
`%(filename)s`	Filename (without extension)
`%(format_name)s`	`"csv"`
`%(format_id)s`	A number that is unique within the same format.
`%(global_id)s`	A number that is unique across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the CSV data is invalid.

See also

csv.reader()
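The format specifiers in table_name follow the syntax of Python's printf-style mapping keys, so their effect can be sketched with the % operator. The helper name expand_table_name and the sample values are hypothetical, and the library may perform the substitution differently internally.

```python
import os


def expand_table_name(template, file_path, format_name, format_id, global_id):
    """Expand %(...)s specifiers in a table name template (illustrative only)."""
    # "%(filename)s" is documented as the filename without its extension.
    filename = os.path.splitext(os.path.basename(file_path))[0]
    values = {
        "filename": filename,
        "format_name": format_name,
        "format_id": format_id,
        "global_id": global_id,
    }
    return template % values


default_name = expand_table_name("%(filename)s", "/tmp/sample_data.csv", "csv", 0, 0)
custom_name = expand_table_name(
    "%(filename)s_%(format_name)s%(format_id)s", "/tmp/sample_data.csv", "csv", 2, 5
)
print(default_name)
print(custom_name)
```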

5.2.2.3. CSV Text Loader

class pytablereader.CsvTableTextLoader(text, quoting_flags=None, type_hints=None, type_hint_rules=None)

Bases: CsvTableLoader

A text loader class to extract tabular data from CSV text data.

Parameters: text (str) – CSV text to load.

table_name: Table name string. Defaults to %(format_name)s%(format_id)s.

Examples: Load table data from CSV

load()

Extract tabular data as TableData instances from a CSV text object. The source attribute should contain the text object to load.

Returns:

Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier	Value after the replacement
`%(filename)s`	`""`
`%(format_name)s`	`"csv"`
`%(format_id)s`	A number that is unique within the same format.
`%(global_id)s`	A number that is unique across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the CSV data is invalid.

See also

csv.reader()

5.2.3. HTML Loader Classes

5.2.3.1. HTML File Loader

class pytablereader.HtmlTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A file loader class to extract tabular data from HTML files.

Parameters: file_path (str) – Path to the HTML file to load.

table_name: Table name string. Defaults to %(title)s_%(key)s.

encoding: HTML file encoding. Defaults to "utf-8".

load()

Extract tabular data as TableData instances from the table tags in an HTML file. The source attribute should contain the path to the file to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier	Value after the replacement
`%(filename)s`	Filename (without extension)
`%(title)s`	Value of the `<title>` tag of the HTML.
`%(key)s`	Replaced with: (1) the `id` attribute of the table tag, or (2) `%(format_name)s%(format_id)s` if the table tag has no `id` attribute.
`%(format_name)s`	`"html"`
`%(format_id)s`	A number that is unique within the same format.
`%(global_id)s`	A number that is unique across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the HTML data is invalid or empty.

Note

Table tag attributes are ignored in the loaded TableData.
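The default name %(title)s_%(key)s and the fallback rule for %(key)s can be sketched with html.parser from the standard library. This is not the loader's own parser: the class name TableNameSketch is hypothetical, and using the table's position as format_id is an assumption made only to keep the example self-contained.

```python
from html.parser import HTMLParser


class TableNameSketch(HTMLParser):
    """Collect the <title> text and the id attribute of each <table> tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.table_ids = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "table":
            # None is stored when the table tag has no id attribute.
            self.table_ids.append(dict(attrs).get("id"))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


html_text = """<html><head><title>report</title></head><body>
<table id="sales"><tr><th>a</th></tr><tr><td>1</td></tr></table>
<table><tr><th>b</th></tr><tr><td>2</td></tr></table>
</body></html>"""

parser = TableNameSketch()
parser.feed(html_text)

names = []
for format_id, table_id in enumerate(parser.table_ids):
    # Assumption: format_id is simply the table's position in the document.
    fallback = "%(format_name)s%(format_id)s" % {
        "format_name": "html",
        "format_id": format_id,
    }
    key = table_id if table_id else fallback
    names.append("%(title)s_%(key)s" % {"title": parser.title, "key": key})

print(names)
```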

5.2.3.2. HTML Text Loader

class pytablereader.HtmlTableTextLoader(text, quoting_flags=None, type_hints=None, type_hint_rules=None)

A text loader class to extract tabular data from HTML text data.

Parameters: text (str) – HTML text to load.

table_name: Table name string. Defaults to %(title)s_%(key)s.

load()

Extract tabular data as TableData instances from the table tags in an HTML text object. The source attribute should contain the text object to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier	Value after the replacement
`%(filename)s`	`""`
`%(title)s`	Value of the `<title>` tag of the HTML.
`%(key)s`	Replaced with: (1) the `id` attribute of the table tag, or (2) `%(format_name)s%(format_id)s` if the table tag has no `id` attribute.
`%(format_name)s`	`"html"`
`%(format_id)s`	A number that is unique within the same format.
`%(global_id)s`	A number that is unique across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the HTML data is invalid or empty.

5.2.4. JSON Loader Classes

5.2.4.1. JSON File Loader

class pytablereader.JsonTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A file loader class to extract tabular data from JSON files.

Parameters: file_path (str) – Path to the JSON file to load.

table_name: Table name string. Defaults to %(filename)s_%(key)s.

load()

Extract tabular data as TableData instances from a JSON file. The source attribute should contain the path to the file to load.

This method can load the following six JSON formats:

(1) Single table data in a file:

Acceptable JSON Schema (1): single table¶

{
    "type": "array",
    "items": {
        "type": "object",
        "additionalProperties": {
            "anyOf": [
                {"type": "string"},
                {"type": "number"},
                {"type": "boolean"},
                {"type": "null"}
            ]
        }
    }
}

Acceptable JSON example for the JSON schema (1)¶

[
    {"attr_b": 4, "attr_c": "a", "attr_a": 1},
    {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2},
    {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3}
]

The example data will be loaded as the following tabular data:

attr_a	attr_b	attr_c
1	4.0	a
2	2.1	bb
3	120.9	ccc

(2) Single table data in a file:

Acceptable JSON Schema (2): single table¶

{
    "type": "object",
    "additionalProperties": {
        "type": "array",
        "items": {
            "anyOf": [
                {"type": "string"},
                {"type": "number"},
                {"type": "boolean"},
                {"type": "null"}
            ]
        }
    }
}

Acceptable JSON example for the JSON schema (2)¶

{
    "attr_a": [1, 2, 3],
    "attr_b": [4, 2.1, 120.9],
    "attr_c": ["a", "bb", "ccc"]
}

The example data will be loaded as the following tabular data:

attr_a	attr_b	attr_c
1	4.0	a
2	2.1	bb
3	120.9	ccc

(3) Single table data in a file:

Acceptable JSON Schema (3): single table

{
    "type": "object",
    "additionalProperties": {
        "anyOf": [
            {"type": "string"},
            {"type": "number"},
            {"type": "boolean"},
            {"type": "null"}
        ]
    }
}

Acceptable JSON example for the JSON schema (3)¶

{
    "num_ratings": 27,
    "support_threads": 1,
    "downloaded": 925716,
    "last_updated":"2017-12-01 6:22am GMT",
    "added":"2010-01-20",
    "num": 1.1,
    "hoge": null
}

The example data will be loaded as the following tabular data:

key	value
num_ratings	27
support_threads	1
downloaded	925716
last_updated	2017-12-01 6:22am GMT
added	2010-01-20
num	1.1
hoge	None

(4) Multiple table data in a file:

Acceptable JSON Schema (4): multiple tables¶

{
    "type": "object",
    "additionalProperties": {
        "type": "array",
        "items": {
            "type": "object",
            "additionalProperties": {
                "anyOf": [
                    {"type": "string"},
                    {"type": "number"},
                    {"type": "boolean"},
                    {"type": "null"}
                ]
            }
        }
    }
}

Acceptable JSON example for the JSON schema (4)¶

{
    "table_a" : [
        {"attr_b": 4, "attr_c": "a", "attr_a": 1},
        {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2},
        {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3}
    ],
    "table_b" : [
        {"a": 1, "b": 4},
        {"a": 2 },
        {"a": 3, "b": 120.9}
    ]
}

The example data will be loaded as the following tabular data:

table_a

attr_a	attr_b	attr_c
1	4.0	a
2	2.1	bb
3	120.9	ccc

table_b

a	b
1	4.0
2	None
3	120.9

(5) Multiple table data in a file:

Acceptable JSON Schema (5): multiple tables¶

{
    "type": "object",
    "additionalProperties": {
        "type": "object",
        "additionalProperties": {
            "type": "array",
            "items": {
                "anyOf": [
                    {"type": "string"},
                    {"type": "number"},
                    {"type": "boolean"},
                    {"type": "null"}
                ]
            }
        }
    }
}

Acceptable JSON example for the JSON schema (5)¶

{
    "table_a" : {
        "attr_a": [1, 2, 3],
        "attr_b": [4, 2.1, 120.9],
        "attr_c": ["a", "bb", "ccc"]
    },
    "table_b" : {
        "a": [1, 3],
        "b": [4, 120.9]
    }
}

The example data will be loaded as the following tabular data:

table_a

attr_a	attr_b	attr_c
1	4.0	a
2	2.1	bb
3	120.9	ccc

table_b

a	b
1	4.0
3	120.9

(6) Multiple table data in a file:

Acceptable JSON Schema (6): multiple tables¶

{
    "type": "object",
    "additionalProperties": {
        "type": "object",
        "additionalProperties": {
            "anyOf": [
                {"type": "string"},
                {"type": "number"},
                {"type": "boolean"},
                {"type": "null"}
            ]
        }
    }
}

Acceptable JSON example for the JSON schema (6)¶

{
    "table_a": {
        "num_ratings": 27,
        "support_threads": 1,
        "downloaded": 925716,
        "last_updated":"2017-12-01 6:22am GMT",
        "added":"2010-01-20",
        "num": 1.1,
        "hoge": null
    },
    "table_b": {
        "a": 4,
        "b": 120.9
    }
}

The example data will be loaded as the following tabular data:

table_a

key	value
num_ratings	27
support_threads	1
downloaded	925716
last_updated	2017-12-01 6:22am GMT
added	2010-01-20
num	1.1
hoge	None

table_b

key	value
a	4.0
b	120.9

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier	Value after the replacement
`%(filename)s`	Filename (without extension)
`%(key)s`	Replaced with a different value for single and multiple JSON tables: [single JSON table] `%(format_name)s%(format_id)s`; [multiple JSON tables] the table data key.
`%(format_name)s`	`"json"`
`%(format_id)s`	A number that is unique within the same format.
`%(global_id)s`	A number that is unique across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the data is invalid JSON.
pytablereader.error.ValidationError – If the data is not acceptable JSON format.
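To make the three single-table shapes (1) to (3) concrete, the following sketch turns each of them into a header list and rows using only the standard json module. The function name to_table is hypothetical, sorting the keys for shape (1) is an assumption based on the column order of the example output, and no type coercion (such as 4 becoming 4.0) is attempted. It illustrates the data shapes, not the loader's implementation.

```python
import json


def to_table(data):
    """Return (headers, rows) for the single-table JSON shapes (1) to (3)."""
    if isinstance(data, list):
        # Shape (1): an array of objects, one object per row.
        headers = sorted({key for record in data for key in record})
        rows = [[record.get(header) for header in headers] for record in data]
    elif data and all(isinstance(value, list) for value in data.values()):
        # Shape (2): an object mapping each attribute to a column of values.
        headers = list(data)
        rows = [list(row) for row in zip(*data.values())]
    else:
        # Shape (3): a flat object, loaded as key/value pairs.
        headers = ["key", "value"]
        rows = [[key, value] for key, value in data.items()]
    return headers, rows


shape1 = json.loads('[{"attr_b": 4, "attr_c": "a", "attr_a": 1},'
                    ' {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2}]')
shape2 = json.loads('{"attr_a": [1, 2], "attr_b": [4, 2.1], "attr_c": ["a", "bb"]}')
shape3 = json.loads('{"num": 1.1, "hoge": null}')

for sample in (shape1, shape2, shape3):
    print(to_table(sample))
```

Keys missing from a record come out as None, matching the None cell shown for table_b in example (4).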

5.2.4.2. JSON Text Loader

class pytablereader.JsonTableTextLoader(text, quoting_flags=None, type_hints=None, type_hint_rules=None)

A text loader class to extract tabular data from JSON text data.

Parameters: text (str) – JSON text to load.

table_name: Table name string. Defaults to %(key)s.

load()

Extract tabular data as TableData instances from a JSON text object. The source attribute should contain the text object to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier	Value after the replacement
`%(filename)s`	`""`
`%(key)s`	Replaced with a different value for single and multiple JSON tables: [single JSON table] `%(format_name)s%(format_id)s`; [multiple JSON tables] the table data key.
`%(format_name)s`	`"json"`
`%(format_id)s`	A number that is unique within the same format.
`%(global_id)s`	A number that is unique across all formats.

Return type:

TableData iterator

See also

JsonTableFileLoader.load()

5.2.4.3. Line-delimited JSON File Loader

class pytablereader.JsonLinesTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A file loader class to extract tabular data from line-delimited JSON files.

Parameters: file_path (str) – Path to the line-delimited JSON file to load.

table_name: Table name string. Defaults to %(filename)s_%(key)s.

load()

Extract tabular data as TableData instances from a line-delimited JSON file. The source attribute should contain the path to the file to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. Format specifiers in table_name are replaced with specific strings.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the data is invalid Line-delimited JSON.
pytablereader.error.ValidationError – If the data is not acceptable Line-delimited JSON format.
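Line-delimited JSON stores one JSON value per line. A minimal sketch of reading such text with the standard json module is shown below, assuming each line holds one flat object that becomes one table row. The sample text is made up, and the sorted header order is an assumption rather than documented loader behavior.

```python
import json

# Hypothetical sample: one JSON object per line, keys may be missing.
jsonl_text = '{"a": 1, "b": 4}\n{"a": 2}\n{"a": 3, "b": 120.9}\n'

# Parse each non-blank line independently.
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

# Use the union of all keys as headers; missing values become None.
headers = sorted({key for record in records for key in record})
rows = [[record.get(header) for header in headers] for record in records]

print(headers)
print(rows)
```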

5.2.4.4. Line-delimited JSON Text Loader

class pytablereader.JsonLinesTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A text loader class to extract tabular data from line-delimited JSON text data.

Parameters: text (str) – Line-delimited JSON text to load.

table_name: Table name string. Defaults to %(key)s.

load()

Extract tabular data as TableData instances from a line-delimited JSON text object. The source attribute should contain the text object to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. Format specifiers in table_name are replaced with specific strings.

Return type:

TableData iterator

See also

JsonLinesTableFileLoader.load()

5.2.5. LTSV Loader Classes

5.2.5.1. LTSV File Loader

class pytablereader.LtsvTableFileLoader(file_path, quoting_flags=None, type_hints=None, type_hint_rules=None)

Bases: LtsvTableLoader

Labeled Tab-separated Values (LTSV) format file loader class.

Parameters: file_path (str) – Path to the LTSV file to load.

table_name: Table name string. Defaults to %(filename)s.

load()

Extract tabular data as TableData instances from an LTSV file. The source attribute should contain the path to the file to load.

Returns:

Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier	Value after the replacement
`%(filename)s`	Filename (without extension)
`%(format_name)s`	`"ltsv"`
`%(format_id)s`	A number that is unique within the same format.
`%(global_id)s`	A number that is unique across all formats.

Return type:

TableData iterator

Raises:

pytablereader.InvalidHeaderNameError – If an invalid label name is included in the LTSV file.
pytablereader.DataError – If the LTSV data is invalid.
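For readers unfamiliar with LTSV, each line is one record made of tab-separated label:value fields. The sketch below parses such text with plain string operations. The function name parse_ltsv is hypothetical, the label character set is taken from the LTSV specification as an assumption about what counts as an invalid label, and a plain ValueError stands in for the library's InvalidHeaderNameError.

```python
import re

# Assumption: labels are limited to the character set given in the LTSV spec.
LABEL_PATTERN = re.compile(r"[0-9A-Za-z_.-]+")


def parse_ltsv(text):
    """Parse LTSV text into a list of {label: value} records (values stay strings)."""
    records = []
    for line in text.splitlines():
        if not line:
            continue
        record = {}
        for field in line.split("\t"):
            # Split on the first colon only, so values may contain colons.
            label, _, value = field.partition(":")
            if not LABEL_PATTERN.fullmatch(label):
                raise ValueError(f"invalid label: {label!r}")
            record[label] = value
        records.append(record)
    return records


ltsv_text = "host:127.0.0.1\ttime:12:00:01\tstatus:200\nhost:10.0.0.2\ttime:12:00:05\tstatus:404\n"
records = parse_ltsv(ltsv_text)
print(records)
```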

5.2.5.2. LTSV Text Loader

class pytablereader.LtsvTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

Bases: LtsvTableLoader

Labeled Tab-separated Values (LTSV) format text loader class.

Parameters: text (str) – LTSV text to load.

table_name: Table name string. Defaults to %(format_name)s%(format_id)s.

load()

Extract tabular data as TableData instances from an LTSV text object. The source attribute should contain the text object to load.

Returns:

Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier	Value after the replacement
`%(filename)s`	`""`
`%(format_name)s`	`"ltsv"`
`%(format_id)s`	A number that is unique within the same format.
`%(global_id)s`	A number that is unique across all formats.

Return type:

TableData iterator

Raises:

pytablereader.InvalidHeaderNameError – If an invalid label name is included in the LTSV text.
pytablereader.DataError – If the LTSV data is invalid.

5.2.6. Markdown Loader Classes

5.2.6.1. Markdown File Loader

class pytablereader.MarkdownTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A file loader class to extract tabular data from Markdown files.

Parameters: file_path (str) – Path to the Markdown file to load.

table_name: Table name string. Defaults to %(filename)s_%(key)s.

load()

Extract tabular data as TableData instances from a Markdown file. The source attribute should contain the path to the file to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier	Value after the replacement
`%(filename)s`	Filename (without extension)
`%(key)s`	`%(format_name)s%(format_id)s`
`%(format_name)s`	`"markdown"`
`%(format_id)s`	A number that is unique within the same format.
`%(global_id)s`	A number that is unique across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the Markdown data is invalid or empty.

5.2.6.2. Markdown Text Loader

class pytablereader.MarkdownTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A text loader class to extract tabular data from Markdown text data.

Parameters: text (str) – Markdown text to load.

table_name: Table name string. Defaults to %(key)s.

load()

Extract tabular data as TableData instances from a Markdown text object. The source attribute should contain the text object to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier	Value after the replacement
`%(filename)s`	`""`
`%(key)s`	`%(format_name)s%(format_id)s`
`%(format_name)s`	`"markdown"`
`%(format_id)s`	A number that is unique within the same format.
`%(global_id)s`	A number that is unique across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the Markdown data is invalid or empty.

5.2.7. MediaWiki Loader Classes

5.2.7.1. MediaWiki File Loader

class pytablereader.MediaWikiTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A file loader class to extract tabular data from MediaWiki files.

Parameters: file_path (str) – Path to the MediaWiki file to load.

table_name: Table name string. Defaults to %(filename)s_%(key)s.

load()

Extract tabular data as TableData instances from a MediaWiki file. The source attribute should contain the path to the file to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier	Value after the replacement
`%(filename)s`	Filename (without extension)
`%(key)s`	Replaced with: (1) the `caption` mark of the table, or (2) `%(format_name)s%(format_id)s` if the table has no `caption` mark.
`%(format_name)s`	`"mediawiki"`
`%(format_id)s`	A number that is unique within the same format.
`%(global_id)s`	A number that is unique across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the MediaWiki data is invalid or empty.

5.2.7.2. MediaWiki Text Loader

class pytablereader.MediaWikiTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A text loader class to extract tabular data from MediaWiki text data.

Parameters: text (str) – MediaWiki text to load.

table_name: Table name string. Defaults to %(key)s.

load()

Extract tabular data as TableData instances from a MediaWiki text object. The source attribute should contain the text object to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier	Value after the replacement
`%(filename)s`	`""`
`%(key)s`	Replaced with: (1) the `caption` mark of the table, or (2) `%(format_name)s%(format_id)s` if the table has no `caption` mark.
`%(format_name)s`	`"mediawiki"`
`%(format_id)s`	A number that is unique within the same format.
`%(global_id)s`	A number that is unique across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the MediaWiki data is invalid or empty.

5.2.8. Spreadsheet Loader Classes

5.2.8.1. Excel File Loader

class pytablereader.ExcelTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A file loader class to extract tabular data from Microsoft Excel™ files.

Parameters: file_path (str) – Path to the Excel workbook file to load.

table_name: Table name string. Defaults to %(sheet)s.

start_row: The row from which the search for the header row starts.

load()

Extract tabular data as TableData instances from an Excel file. This method automatically searches for the header row of the table, starting from start_row. A header row must have a value in every column (except empty columns).

Returns:

Loaded TableData iterator. One TableData is created for each sheet in the workbook. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier	Value after the replacement
`%(filename)s`	Filename of the workbook
`%(sheet)s`	Name of the sheet
`%(format_name)s`	`"spreadsheet"`
`%(format_id)s`	A number that is unique within the same format.
`%(global_id)s`	A number that is unique across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the header row is not found.
pytablereader.error.OpenError – If the source file cannot be opened.
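The header row rule can be read as follows: starting at start_row, take the first row that has a value in every column that is used anywhere in the sheet. The sketch below implements that reading on a plain list of rows. The function name find_header_row and the sample sheet are hypothetical, and the loader's actual search may differ in detail.

```python
def find_header_row(rows, start_row=0):
    """Return the index of the first row at or after start_row that has a
    value in every non-empty column, or None if there is no such row."""

    def is_empty(cell):
        return cell is None or cell == ""

    # Pad rows to a common width so columns can be inspected uniformly.
    width = max((len(row) for row in rows), default=0)
    padded = [list(row) + [None] * (width - len(row)) for row in rows]

    # Columns that are empty in every row are excluded from the check.
    used_columns = [
        col for col in range(width)
        if any(not is_empty(row[col]) for row in padded)
    ]
    if not used_columns:
        return None

    for index in range(start_row, len(padded)):
        if all(not is_empty(padded[index][col]) for col in used_columns):
            return index
    return None


sheet = [
    ["Monthly report", None, None],  # title line, not a header
    [None, None, None],
    ["id", None, "name"],            # header: the middle column is empty everywhere
    [1, None, "a"],
]
header_index = find_header_row(sheet)
print(header_index)
```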

5.2.8.2. Google Sheets Loader

class pytablereader.GoogleSheetsTableLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

Concrete class of the Google Sheets loader.

table_name: Table name string. Defaults to %(sheet)s.

Parameters:

file_path (str) – Path to the Google Sheets credential JSON file.

Dependency Packages:

Examples:

Load table data from Google Sheets

load()

Load table data from a Google spreadsheet.

This method treats source as the path to the credential JSON file used to access the Google Sheets API.

The method automatically searches for the header row, starting from start_row. A header row must have a value in every column (except empty columns).

Returns:

Loaded table data. One TableData is returned for each sheet in the workbook. The table name is determined by make_table_name().

Return type:

iterator of TableData

Raises:

pytablereader.DataError – If the header row is not found.
pytablereader.OpenError – If the spreadsheet is not found.

5.2.9. Database Loader Classes

5.2.9.1. SQLite File Loader

class pytablereader.SqliteFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)

A file loader class to extract tabular data from SQLite database files.

Parameters: file_path (str) – Path to the SQLite database file to load.

table_name: Table name string. Defaults to %(filename)s_%(key)s.

Dependency Packages:

SimpleSQLite

load()

Extract tabular data as TableData instances from a SQLite database file. The source attribute should contain the path to the file to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier	Value after the replacement
`%(filename)s`	Filename (without extension)
`%(key)s`	`%(format_name)s%(format_id)s`
`%(format_name)s`	`"sqlite"`
`%(format_id)s`	A number that is unique within the same format.
`%(global_id)s`	A number that is unique across all formats.

Return type:

TableData iterator

Raises:

pytablereader.DataError – If the SQLite database file data is invalid or empty.
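As a rough picture of what extracting tables from a SQLite database involves, the sketch below enumerates the tables of a database and reads each one as headers plus rows. It uses the standard sqlite3 module instead of SimpleSQLite, and an in-memory database with a made-up table instead of a file, so it is an illustration rather than the loader's implementation.

```python
import sqlite3

# Hypothetical in-memory database standing in for a database file.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sample (a INTEGER, b TEXT)")
connection.executemany("INSERT INTO sample VALUES (?, ?)", [(1, "x"), (2, "y")])

# List user tables from the schema catalog.
table_names = [
    row[0]
    for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

tables = {}
for name in table_names:
    cursor = connection.execute(f'SELECT * FROM "{name}"')
    # cursor.description holds one entry per column; item 0 is the column name.
    headers = [column[0] for column in cursor.description]
    tables[name] = (headers, cursor.fetchall())

connection.close()
print(tables)
```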