5.2. Format Specific Table Loader Classes

5.2.1. TableLoader class

class pytablereader.interface.TableLoader(source, quoting_flags)

Bases: pytablereader.interface.TableLoaderInterface

The abstract base class for table data loaders.

table_name

Table name string.

source

Table data source to load.

5.2.2. CSV Loader Classes

5.2.2.1. CSV Table Loader

class pytablereader.csv.core.CsvTableLoader(source, quoting_flags)

The abstract class of CSV table loaders.

header_list

Attribute names of the table. If header_list is empty, the first line of the CSV data is used as the attribute list.

delimiter

A one-character string used to separate fields. Defaults to ",".

quotechar

A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters. Defaults to '"'.

encoding

Encoding of the CSV data.
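The delimiter and quotechar attributes correspond to the options of the same name in the standard library's csv.reader(). The following is a minimal stdlib-only sketch of how these attributes and an empty header_list interact. It is not pytablereader's implementation, and the helper name parse_csv_text is hypothetical.

```python
import csv
import io


def parse_csv_text(text, header_list=(), delimiter=",", quotechar='"'):
    """Hypothetical helper: return (header, rows) parsed from CSV text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    records = [row for row in reader if row]  # skip blank lines
    if header_list:
        # Explicit attribute names: every record is treated as a data row.
        return list(header_list), records
    # Empty header_list: the first line supplies the attribute names.
    return records[0], records[1:]


# A quoted field may contain the delimiter itself.
header, rows = parse_csv_text('a;b\n1;"x;y"\n', delimiter=";")
print(header, rows)
```

Passing a non-empty header_list keeps the first line as data instead of consuming it as the header.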

5.2.2.2. CSV File Loader

class pytablereader.CsvTableFileLoader(file_path, quoting_flags=None)

Bases: pytablereader.csv.core.CsvTableLoader

A file loader class to extract tabular data from CSV files.

Parameters: file_path (str) – Path to the CSV file to load.
table_name

Table name string. Defaults to %(filename)s.

Examples: Load table data from CSV
load()

Extract tabular data as TableData instances from a CSV file. source attribute should contain a path to the file to load.

Returns: Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(format_name)s     "csv"
%(format_id)s       A number unique within the same format.
%(global_id)s       A number unique across all formats.
Return type: TableData iterator
Raises: pytablereader.DataError – If the CSV data is invalid.

See also

csv.reader()
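The format specifiers accepted in table_name have the same shape as Python's printf-style mapping keys. The sketch below illustrates the substitution with the % operator. How pytablereader expands the template internally is not shown here; the helper name expand_table_name and the counter values are assumptions for illustration.

```python
def expand_table_name(template, **values):
    """Hypothetical helper: expand %(name)s specifiers in a table name template."""
    return template % values


name = expand_table_name(
    "%(filename)s_%(format_name)s%(format_id)s",
    filename="sample",   # file name without extension
    format_name="csv",
    format_id=0,         # assumed counter value, for illustration only
    global_id=0,         # unused specifiers are simply ignored
)
print(name)  # sample_csv0
```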

5.2.2.3. CSV Text Loader

class pytablereader.CsvTableTextLoader(text, quoting_flags=None)

Bases: pytablereader.csv.core.CsvTableLoader

A text loader class to extract tabular data from CSV text data.

Parameters: text (str) – CSV text to load.
table_name

Table name string. Defaults to %(format_name)s%(format_id)s.

Examples: Load table data from CSV
load()

Extract tabular data as TableData instances from a CSV text object. source attribute should contain a text object to load.

Returns: Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        ""
%(format_name)s     "csv"
%(format_id)s       A number unique within the same format.
%(global_id)s       A number unique across all formats.
Return type: TableData iterator
Raises: pytablereader.DataError – If the CSV data is invalid.

See also

csv.reader()

5.2.3. HTML Loader Classes

5.2.3.1. HTML File Loader

class pytablereader.HtmlTableFileLoader(file_path=None, quoting_flags=None)

A file loader class to extract tabular data from HTML files.

Parameters: file_path (str) – Path to the HTML file to load.
table_name

Table name string. Defaults to %(title)s_%(key)s.

encoding

HTML file encoding. Defaults to "utf-8".

load()

Extract tabular data as TableData instances from HTML table tags in an HTML file. The source attribute should contain the path to the file to load.

Returns: Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(title)s           <title> tag value of the HTML.
%(key)s             Replaced with:
                    (1) the id attribute of the table tag, or
                    (2) %(format_name)s%(format_id)s
                        if the table tag has no id attribute.
%(format_name)s     "html"
%(format_id)s       A number unique within the same format.
%(global_id)s       A number unique across all formats.
Return type: TableData iterator
Raises: pytablereader.DataError – If the HTML data is invalid or empty.

Note

Table tag attributes are ignored in the loaded TableData.
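The %(key)s rule (use the table's id attribute, otherwise fall back to a format-based name) can be illustrated with the standard library's html.parser. This is a rough sketch, not the parser pytablereader uses: the class name TableCollector is hypothetical, nested tables are not handled, and the fallback counter is assumed to be the table's position in the document.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Hypothetical sketch: collect (key, rows) for each <table> element."""

    def __init__(self):
        super().__init__()
        self.tables = []    # list of (key, rows)
        self._rows = None   # rows of the table currently being parsed
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Use the id attribute as the key, else a format-based fallback.
            key = dict(attrs).get("id") or "html{}".format(len(self.tables))
            self._rows = []
            self.tables.append((key, self._rows))
        elif tag == "tr" and self._rows is not None:
            self._rows.append([])
        elif tag in ("th", "td") and self._rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._rows[-1].append("".join(self._cell).strip())
            self._cell = None
        elif tag == "table":
            self._rows = None


collector = TableCollector()
collector.feed(
    "<table id='t1'><tr><th>a</th><th>b</th></tr><tr><td>1</td><td>2</td></tr></table>"
    "<table><tr><th>x</th></tr><tr><td>9</td></tr></table>"
)
print(collector.tables)
```

Only the id attribute is read; every other attribute of the table tag is discarded, in line with the note above.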

5.2.3.2. HTML Text Loader

class pytablereader.HtmlTableTextLoader(text, quoting_flags=None)

A text loader class to extract tabular data from HTML text data.

Parameters: text (str) – HTML text to load.
table_name

Table name string. Defaults to %(title)s_%(key)s.

load()

Extract tabular data as TableData instances from HTML table tags in an HTML text object. The source attribute should contain the text object to load.

Returns: Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        ""
%(title)s           <title> tag value of the HTML.
%(key)s             Replaced with:
                    (1) the id attribute of the table tag, or
                    (2) %(format_name)s%(format_id)s
                        if the table tag has no id attribute.
%(format_name)s     "html"
%(format_id)s       A number unique within the same format.
%(global_id)s       A number unique across all formats.
Return type: TableData iterator
Raises: pytablereader.DataError – If the HTML data is invalid or empty.

5.2.4. JSON Loader Classes

5.2.4.1. JSON File Loader

class pytablereader.JsonTableFileLoader(file_path=None, quoting_flags=None)

A file loader class to extract tabular data from JSON files.

Parameters: file_path (str) – Path to the JSON file to load.
table_name

Table name string. Defaults to %(filename)s_%(key)s.

load()

Extract tabular data as TableData instances from a JSON file. source attribute should contain a path to the file to load.

This method can load the following six JSON formats:

(1) Single table data in a file:

Acceptable JSON Schema (1): single table
{
    "type": "array",
    "items": {
        "type": "object",
        "additionalProperties": {
            "anyOf": [
                {"type": "string"},
                {"type": "number"},
                {"type": "boolean"},
                {"type": "null"}
            ]
        }
    }
}
Valid JSON example for the JSON schema (1)
[
    {"attr_b": 4, "attr_c": "a", "attr_a": 1},
    {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2},
    {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3}
]

The example data will be loaded as the following tabular data:

attr_a attr_b attr_c
1 4.0 a
2 2.1 bb
3 120.9 ccc
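As a rough illustration of format (1), an array of objects can be turned into a header and rows with the standard library alone. This sketch is not pytablereader's implementation: sorting the attribute names is an assumption chosen to match the column order shown above, and no type coercion (such as 4 becoming 4.0) is performed.

```python
import json

records = json.loads(
    '[{"attr_b": 4, "attr_c": "a", "attr_a": 1},'
    ' {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2}]'
)

# Collect attribute names across all records, sorted for a stable column order.
header = sorted({key for record in records for key in record})
# One row per object; an attribute missing from an object becomes None.
rows = [[record.get(key) for key in header] for record in records]

print(header)
print(rows)
```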

(2) Single table data in a file:

Acceptable JSON Schema (2): single table
{
    "type": "object",
    "additionalProperties": {
        "type": "array",
        "items": {
            "anyOf": [
                {"type": "string"},
                {"type": "number"},
                {"type": "boolean"},
                {"type": "null"}
            ]
        }
    }
}
Valid JSON example for the JSON schema (2)
{
    "attr_a": [1, 2, 3],
    "attr_b": [4, 2.1, 120.9],
    "attr_c": ["a", "bb", "ccc"]
}

The example data will be loaded as the following tabular data:

attr_a attr_b attr_c
1 4.0 a
2 2.1 bb
3 120.9 ccc
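Format (2) stores one array per attribute, so building rows amounts to transposing the arrays. A minimal stdlib sketch under that reading (not pytablereader's implementation, and without any type coercion):

```python
import json

columns = json.loads(
    '{"attr_a": [1, 2, 3], "attr_b": [4, 2.1, 120.9], "attr_c": ["a", "bb", "ccc"]}'
)

# The object keys are the attribute names, in document order.
header = list(columns)
# zip() transposes the per-attribute arrays into rows.
rows = [list(row) for row in zip(*columns.values())]

print(header)
print(rows)
```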

(3) Single table data in a file:

Acceptable JSON Schema (3): single table
{
    "type": "object",
    "additionalProperties": {
        "anyOf": [
            {"type": "string"},
            {"type": "number"},
            {"type": "boolean"},
            {"type": "null"}
        ]
    }
}
Valid JSON example for the JSON schema (3)
{
    "num_ratings": 27,
    "support_threads": 1,
    "downloaded": 925716,
    "last_updated":"2017-12-01 6:22am GMT",
    "added":"2010-01-20",
    "num": 1.1,
    "hoge": null
}

The example data will be loaded as the following tabular data:

key value
num_ratings 27
support_threads 1
downloaded 925716
last_updated 2017-12-01 6:22am GMT
added 2010-01-20
num 1.1
hoge None
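Format (3) is a flat object, which maps naturally onto a two-column key/value table. A minimal stdlib sketch of that mapping (an illustration only, not pytablereader's implementation):

```python
import json

flat = json.loads('{"num_ratings": 27, "num": 1.1, "hoge": null}')

# Every property becomes one row of a fixed two-column table.
header = ["key", "value"]
rows = [[key, value] for key, value in flat.items()]

print(header)
print(rows)
```

JSON null is decoded as Python None, which is why the hoge row above shows None.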

(4) Multiple table data in a file:

Acceptable JSON Schema (4): multiple tables
{
    "type": "object",
    "additionalProperties": {
        "type": "array",
        "items": {
            "type": "object",
            "additionalProperties": {
                "anyOf": [
                    {"type": "string"},
                    {"type": "number"},
                    {"type": "boolean"},
                    {"type": "null"}
                ]
            }
        }
    }
}
Valid JSON example for the JSON schema (4)
{
    "table_a" : [
        {"attr_b": 4, "attr_c": "a", "attr_a": 1},
        {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2},
        {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3}
    ],
    "table_b" : [
        {"a": 1, "b": 4},
        {"a": 2 },
        {"a": 3, "b": 120.9}
    ]
}

The example data will be loaded as the following tabular data:

table_a
attr_a  attr_b  attr_c
1       4.0     a
2       2.1     bb
3       120.9   ccc

table_b
a  b
1  4.0
2  None
3  120.9
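In format (4) each top-level key names a table, and objects within a table may omit attributes. The sketch below shows one way to produce the None cell seen in table_b. The helper records_to_table is hypothetical, sorted column order is an assumption, and this is not pytablereader's implementation.

```python
import json


def records_to_table(records):
    """Hypothetical helper: convert a list of objects to (header, rows)."""
    header = sorted({key for record in records for key in record})
    # dict.get() yields None for attributes an object does not define.
    return header, [[record.get(key) for key in header] for record in records]


document = json.loads(
    '{"table_a": [{"attr_a": 1}],'
    ' "table_b": [{"a": 1, "b": 4}, {"a": 2}, {"a": 3, "b": 120.9}]}'
)

# One table per top-level key; the key doubles as the table data key.
tables = {name: records_to_table(records) for name, records in document.items()}
print(tables["table_b"])
```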

(5) Multiple table data in a file:

Acceptable JSON Schema (5): multiple tables
{
    "type": "object",
    "additionalProperties": {
        "type": "object",
        "additionalProperties": {
            "type": "array",
            "items": {
                "anyOf": [
                    {"type": "string"},
                    {"type": "number"},
                    {"type": "boolean"},
                    {"type": "null"}
                ]
            }
        }
    }
}
Valid JSON example for the JSON schema (5)
{
    "table_a" : {
        "attr_a": [1, 2, 3],
        "attr_b": [4, 2.1, 120.9],
        "attr_c": ["a", "bb", "ccc"]
    },
    "table_b" : {
        "a": [1, 3],
        "b": [4, 120.9]
    }
}

The example data will be loaded as the following tabular data:

table_a
attr_a  attr_b  attr_c
1       4.0     a
2       2.1     bb
3       120.9   ccc

table_b
a  b
1  4.0
3  120.9

(6) Multiple table data in a file:

Acceptable JSON Schema (6): multiple tables
{
    "type": "object",
    "additionalProperties": {
        "type": "object",
        "additionalProperties": {
            "anyOf": [
                {"type": "string"},
                {"type": "number"},
                {"type": "boolean"},
                {"type": "null"}
            ]
        }
    }
}
Valid JSON example for the JSON schema (6)
{
    "table_a": {
        "num_ratings": 27,
        "support_threads": 1,
        "downloaded": 925716,
        "last_updated":"2017-12-01 6:22am GMT",
        "added":"2010-01-20",
        "num": 1.1,
        "hoge": null
    },
    "table_b": {
        "a": 4,
        "b": 120.9
    }
}

The example data will be loaded as the following tabular data:

table_a
key              value
num_ratings      27
support_threads  1
downloaded       925716
last_updated     2017-12-01 6:22am GMT
added            2010-01-20
num              1.1
hoge             None

table_b
key  value
a    4.0
b    120.9

Returns:

Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(key)s             Depends on whether the JSON holds a single table or multiple tables:
                    [single JSON table] %(format_name)s%(format_id)s
                    [multiple JSON tables] Table data key.
%(format_name)s     "json"
%(format_id)s       A number unique within the same format.
%(global_id)s       A number unique across all formats.

Return type:

TableData iterator

Raises:
  • pytablereader.DataError – If the data is invalid JSON.
  • pytablereader.error.ValidationError – If the data is not acceptable JSON format.

5.2.4.2. JSON Text Loader

class pytablereader.JsonTableTextLoader(text, quoting_flags=None)

A text loader class to extract tabular data from JSON text data.

Parameters: text (str) – JSON text to load.
table_name

Table name string. Defaults to %(key)s.

load()

Extract tabular data as TableData instances from a JSON text object. source attribute should contain a text object to load.

Returns: Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        ""
%(key)s             Depends on whether the JSON holds a single table or multiple tables:
                    [single JSON table] %(format_name)s%(format_id)s
                    [multiple JSON tables] Table data key.
%(format_name)s     "json"
%(format_id)s       A number unique within the same format.
%(global_id)s       A number unique across all formats.
Return type: TableData iterator

5.2.4.3. Line-delimited JSON File Loader

class pytablereader.JsonLinesTableFileLoader(file_path=None, quoting_flags=None)

A file loader class to extract tabular data from Line-delimited JSON files.

Parameters: file_path (str) – Path to the Line-delimited JSON file to load.
table_name

Table name string. Defaults to %(filename)s_%(key)s.

load()

Extract tabular data as TableData instances from a Line-delimited JSON file. source attribute should contain a path to the file to load.

Returns:

Loaded table data iterator. The table name is determined by the value of table_name, in which format specifiers are replaced with specific strings.

Return type:

TableData iterator

Raises:
  • pytablereader.DataError – If the data is invalid Line-delimited JSON.
  • pytablereader.error.ValidationError – If the data is not acceptable Line-delimited JSON format.
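Line-delimited JSON holds one JSON value per line, so it can be decoded line by line. The following stdlib sketch shows the idea. The helper name parse_json_lines is hypothetical, skipping blank lines is an assumption, and the error handling of the real loader (DataError, ValidationError) is not reproduced.

```python
import json


def parse_json_lines(text):
    """Hypothetical helper: decode one JSON value per non-blank line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_json_lines('{"a": 1, "b": "x"}\n\n{"a": 2, "b": "y"}\n')
print(records)
```

A line that is not valid JSON makes json.loads() raise json.JSONDecodeError, a subclass of ValueError.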

5.2.4.4. Line-delimited JSON Text Loader

class pytablereader.JsonLinesTableTextLoader(text, quoting_flags=None)

A text loader class to extract tabular data from Line-delimited JSON text data.

Parameters: text (str) – Line-delimited JSON text to load.
table_name

Table name string. Defaults to %(key)s.

load()

Extract tabular data as TableData instances from a Line-delimited JSON text object. source attribute should contain a text object to load.

Returns: Loaded table data iterator. The table name is determined by the value of table_name, in which format specifiers are replaced with specific strings.
Return type: TableData iterator

5.2.5. LTSV Loader Classes

5.2.5.1. LTSV File Loader

class pytablereader.LtsvTableFileLoader(file_path, quoting_flags=None)

Bases: pytablereader.ltsv.core.LtsvTableLoader

Labeled Tab-separated Values (LTSV) format file loader class.

Parameters: file_path (str) – Path to the LTSV file to load.
table_name

Table name string. Defaults to %(filename)s.

load()

Extract tabular data as TableData instances from an LTSV file. The source attribute should contain the path to the file to load.

Returns:

Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(format_name)s     "ltsv"
%(format_id)s       A number unique within the same format.
%(global_id)s       A number unique across all formats.

Return type:

TableData iterator

Raises:
  • pytablereader.InvalidHeaderNameError – If an invalid label name is included in the LTSV file.
  • pytablereader.DataError – If the LTSV data is invalid.
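An LTSV record is a single line of tab-separated fields, each written as label:value. The sketch below splits one record with the standard library only. The helper name parse_ltsv_line is hypothetical, and the label validation that leads to InvalidHeaderNameError in the real loader is not reproduced.

```python
def parse_ltsv_line(line):
    """Hypothetical helper: split one LTSV record into a label -> value dict."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        # Only the first colon separates label and value,
        # so values may themselves contain colons.
        label, _, value = field.partition(":")
        record[label] = value
    return record


record = parse_ltsv_line("host:127.0.0.1\ttime:2017-12-01T06:22:00\tstatus:200\n")
print(record)
```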

5.2.5.2. LTSV Text Loader

class pytablereader.LtsvTableTextLoader(text)

Bases: pytablereader.ltsv.core.LtsvTableLoader

Labeled Tab-separated Values (LTSV) format text loader class.

Parameters: text (str) – LTSV text to load.
table_name

Table name string. Defaults to %(format_name)s%(format_id)s.

load()

Extract tabular data as TableData instances from an LTSV text object. The source attribute should contain the text object to load.

Returns:

Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        ""
%(format_name)s     "ltsv"
%(format_id)s       A number unique within the same format.
%(global_id)s       A number unique across all formats.

Return type:

TableData iterator

Raises:
  • pytablereader.InvalidHeaderNameError – If an invalid label name is included in the LTSV text.
  • pytablereader.DataError – If the LTSV data is invalid.

5.2.6. Markdown Loader Classes

5.2.6.1. Markdown File Loader

class pytablereader.MarkdownTableFileLoader(file_path=None, quoting_flags=None)

A file loader class to extract tabular data from Markdown files.

Parameters: file_path (str) – Path to the Markdown file to load.
table_name

Table name string. Defaults to %(filename)s_%(key)s.

load()

Extract tabular data as TableData instances from a Markdown file. source attribute should contain a path to the file to load.

Returns: Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(key)s             %(format_name)s%(format_id)s
%(format_name)s     "markdown"
%(format_id)s       A number unique within the same format.
%(global_id)s       A number unique across all formats.
Return type: TableData iterator
Raises: pytablereader.DataError – If the Markdown data is invalid or empty.
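For orientation, a Markdown pipe table consists of a header row, a delimiter row, and data rows. The following is a deliberately small stdlib sketch of reading one such table. The helper name parse_markdown_table is hypothetical, it handles a single well-formed table only, and it is not how pytablereader parses Markdown.

```python
def parse_markdown_table(text):
    """Hypothetical helper: return (header, rows) for a single pipe table."""

    def split_cells(line):
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip("|").split("|")]

    lines = [line.strip() for line in text.splitlines() if line.strip().startswith("|")]
    header = split_cells(lines[0])
    # lines[1] is the delimiter row (|---|---|), so data starts at index 2.
    rows = [split_cells(line) for line in lines[2:]]
    return header, rows


header, rows = parse_markdown_table("| a | b |\n|---|---|\n| 1 | 2 |\n| 3 | 4 |\n")
print(header, rows)
```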

5.2.6.2. Markdown Text Loader

class pytablereader.MarkdownTableTextLoader(text, quoting_flags=None)

A text loader class to extract tabular data from Markdown text data.

Parameters: text (str) – Markdown text to load.
table_name

Table name string. Defaults to %(key)s.

load()

Extract tabular data as TableData instances from a Markdown text object. source attribute should contain a text object to load.

Returns: Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        ""
%(key)s             %(format_name)s%(format_id)s
%(format_name)s     "markdown"
%(format_id)s       A number unique within the same format.
%(global_id)s       A number unique across all formats.
Return type: TableData iterator
Raises: pytablereader.DataError – If the Markdown data is invalid or empty.

5.2.7. MediaWiki Loader Classes

5.2.7.1. MediaWiki File Loader

class pytablereader.MediaWikiTableFileLoader(file_path=None, quoting_flags=None)

A file loader class to extract tabular data from MediaWiki files.

Parameters: file_path (str) – Path to the MediaWiki file to load.
table_name

Table name string. Defaults to %(filename)s_%(key)s.

load()

Extract tabular data as TableData instances from a MediaWiki file. source attribute should contain a path to the file to load.

Returns: Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(key)s             Replaced with:
                    (1) the caption mark of the table, or
                    (2) %(format_name)s%(format_id)s
                        if the table has no caption mark.
%(format_name)s     "mediawiki"
%(format_id)s       A number unique within the same format.
%(global_id)s       A number unique across all formats.
Return type: TableData iterator
Raises: pytablereader.DataError – If the MediaWiki data is invalid or empty.
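The caption rule for %(key)s can be pictured with a small line-oriented reader of MediaWiki table markup, where "|+" introduces the caption, "!" header cells, "|-" a new row, and "|" data cells. This is a simplified sketch, not pytablereader's parser: the helper name parse_mediawiki_table and the fallback_key default are hypothetical, and cell attributes and multi-line cells are not handled.

```python
def parse_mediawiki_table(text, fallback_key="mediawiki0"):
    """Hypothetical helper: return (key, header, rows) for one wikitable."""
    key, header, rows = fallback_key, [], []
    for line in (raw.strip() for raw in text.splitlines()):
        if line.startswith("|+"):
            key = line[2:].strip()      # caption mark becomes the key
        elif line.startswith("|-"):
            rows.append([])             # row separator
        elif line.startswith("!"):
            header.extend(cell.strip() for cell in line[1:].split("!!"))
        elif line.startswith("|") and not line.startswith("|}"):
            if not rows:
                rows.append([])
            rows[-1].extend(cell.strip() for cell in line[1:].split("||"))
    # Drop empty rows created by separators that precede the header.
    return key, header, [row for row in rows if row]


wikitext = """{| class="wikitable"
|+ scores
! name !! value
|-
| a || 1
|-
| b || 2
|}"""
print(parse_mediawiki_table(wikitext))
```

Without a "|+" line the key stays at the fallback value, mirroring case (2) in the table above.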

5.2.7.2. MediaWiki Text Loader

class pytablereader.MediaWikiTableTextLoader(text, quoting_flags=None)

A text loader class to extract tabular data from MediaWiki text data.

Parameters: text (str) – MediaWiki text to load.
table_name

Table name string. Defaults to %(key)s.

load()

Extract tabular data as TableData instances from a MediaWiki text object. source attribute should contain a text object to load.

Returns: Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        ""
%(key)s             Replaced with:
                    (1) the caption mark of the table, or
                    (2) %(format_name)s%(format_id)s
                        if the table has no caption mark.
%(format_name)s     "mediawiki"
%(format_id)s       A number unique within the same format.
%(global_id)s       A number unique across all formats.
Return type: TableData iterator
Raises: pytablereader.DataError – If the MediaWiki data is invalid or empty.

5.2.8. Spreadsheet Loader Classes

5.2.8.1. Excel File Loader

class pytablereader.ExcelTableFileLoader(file_path=None, quoting_flags=None)

A file loader class to extract tabular data from Microsoft Excel™ files.

Parameters: file_path (str) – Path to the Excel workbook file to load.
table_name

Table name string. Defaults to %(sheet)s.

start_row

The row from which to start searching for the header row.

load()

Extract tabular data as TableData instances from an Excel file. This method automatically searches for the header row of the table, starting from start_row. A row qualifies as the header row only if all of its columns have values (empty columns excepted).

Returns:

Loaded TableData iterator. One TableData is created for each sheet in the workbook. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:

Format specifier    Value after the replacement
%(filename)s        Filename of the workbook
%(sheet)s           Name of the sheet
%(format_name)s     "spreadsheet"
%(format_id)s       A number unique within the same format.
%(global_id)s       A number unique across all formats.

Return type:

TableData iterator

Raises:
  • pytablereader.DataError – If the header row is not found.
  • pytablereader.error.OpenError – If failed to open the source file.
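The header row search can be read as: starting at start_row, pick the first row that has a value in every column that is used anywhere in the sheet. The sketch below implements that reading over plain lists. It is an interpretation, not pytablereader's code: the helper name find_header_row is hypothetical, and ValueError stands in for the DataError the loader raises.

```python
def find_header_row(rows, start_row=0):
    """Hypothetical helper: index of the first row, at or after start_row,
    that has a value in every column holding any data."""
    width = max(len(row) for row in rows)
    padded = [list(row) + [None] * (width - len(row)) for row in rows]
    # Columns that are empty in every row are ignored.
    used = [
        col for col in range(width)
        if any(row[col] not in (None, "") for row in padded)
    ]
    for index in range(start_row, len(padded)):
        if used and all(padded[index][col] not in (None, "") for col in used):
            return index
    raise ValueError("header row not found")


sheet = [
    [None, "report", None],   # title line: not every used column is filled
    [None, None, None],
    [None, "id", "name"],     # first fully populated row
    [None, 1, "a"],
]
print(find_header_row(sheet))  # 2
```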

5.2.8.2. Google Sheets Loader

class pytablereader.GoogleSheetsTableLoader(file_path=None, quoting_flags=None)

Concrete loader class for Google Sheets.

table_name

Table name string. Defaults to %(sheet)s.

Parameters:

file_path (str) – Path to the Google Sheets credential JSON file.

Dependency Packages:
 
Examples:

Load table data from Google Sheets

load()

Load table data from a Google Spreadsheet.

This method treats source as the path to the credential JSON file used to access the Google Sheets API.

The method automatically searches for the header row, starting from start_row. A row qualifies as the header row only if all of its columns have values (empty columns excepted).

Returns:

Loaded table data. One TableData is returned for each sheet in the workbook. The table name is determined by make_table_name().

Return type:

iterator of TableData

Raises:
  • pytablereader.DataError – If the header row is not found.
  • pytablereader.OpenError – If the spreadsheet is not found.

5.2.9. Database Loader Classes

5.2.9.1. SQLite File Loader

class pytablereader.SqliteFileLoader(file_path=None, quoting_flags=None)

A file loader class to extract tabular data from SQLite database files.

Parameters: file_path (str) – Path to the SQLite database file to load.
table_name

Table name string. Defaults to %(filename)s_%(key)s.

Dependency Packages:
 
load()

Extract tabular data as TableData instances from a SQLite database file. source attribute should contain a path to the file to load.

Returns: Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(key)s             %(format_name)s%(format_id)s
%(format_name)s     "sqlite"
%(format_id)s       A number unique within the same format.
%(global_id)s       A number unique across all formats.
Return type: TableData iterator
Raises: pytablereader.DataError – If the SQLite database file data is invalid or empty.
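To show what "one table per database table" means in practice, the sketch below walks an SQLite database with the standard library's sqlite3 module and yields a header and rows for each table. It is not pytablereader's implementation. The helper name iter_sqlite_tables is hypothetical, and an in-memory database is used in place of a file.

```python
import sqlite3


def iter_sqlite_tables(connection):
    """Hypothetical helper: yield (table_name, header, rows) per table."""
    names = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    for name in names:
        cursor = connection.execute('SELECT * FROM "{}"'.format(name))
        # cursor.description holds one 7-tuple per column; item 0 is its name.
        header = [column[0] for column in cursor.description]
        yield name, header, cursor.fetchall()


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sample (a INTEGER, b TEXT)")
connection.executemany("INSERT INTO sample VALUES (?, ?)", [(1, "x"), (2, "y")])
tables = list(iter_sqlite_tables(connection))
connection.close()
print(tables)
```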