5.2. Format Specific Table Loader Classes
5.2.1. AbstractTableReader class
5.2.2. CSV Loader Classes
5.2.2.1. CSV Table Loader
- class pytablereader.csv.core.CsvTableLoader(source, quoting_flags, type_hints, type_hint_rules)
The abstract class of CSV table loaders.
- headers
Attribute names of the table. If headers is empty, the first line of the CSV file is used as the attribute list.
- delimiter
A one-character string used to separate fields. Defaults to ",".
- quotechar
A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters. Defaults to '"'.
- encoding
Encoding of the CSV data.
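The attributes above correspond closely to options of Python's standard csv module. The following is a minimal sketch that uses only the standard library; it is not pytablereader's implementation, and the helper name read_csv_text and the sample data are hypothetical. It illustrates how delimiter and quotechar affect parsing, and how the first line can serve as the header when no headers are supplied.

```python
import csv
import io


def read_csv_text(text, headers=None, delimiter=",", quotechar='"'):
    """Parse CSV text into (headers, rows).

    Hypothetical helper: if headers is empty, the first line is used as
    the attribute names, mirroring the behaviour described above.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    rows = [row for row in reader if row]  # skip blank lines
    if not headers:
        headers, rows = rows[0], rows[1:]
    return list(headers), rows


# Quoted fields may contain the delimiter or a new-line character.
sample = 'name;note\nalice;"contains ; delimiter"\nbob;"two\nlines"\n'
headers, rows = read_csv_text(sample, delimiter=";")
print(headers)
print(rows)
```

Passing an explicit headers list keeps every line of the input as a data row.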
5.2.2.2. CSV File Loader
- class pytablereader.CsvTableFileLoader(file_path, quoting_flags=None, type_hints=None, type_hint_rules=None)
Bases: CsvTableLoader
A file loader class to extract tabular data from CSV files.
- Parameters:
file_path (str) – Path to the CSV file to load.
- table_name
Table name string. Defaults to %(filename)s.
- load()
Extract tabular data as TableData instances from a CSV file. The source attribute should contain a path to the file to load.
- Returns:
Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(format_name)s     "csv"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the CSV data is invalid.
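The format specifiers use Python's %-style named placeholders, so the substitution can be sketched with plain string formatting. The helper make_table_name below is hypothetical and takes the two ids as explicit arguments; how the library tracks those counters is not shown here.

```python
import os


def make_table_name(template, file_path, format_name, format_id, global_id):
    """Expand a table_name template with Python %-style named specifiers.

    Hypothetical helper for illustration; the ids are passed in explicitly
    rather than tracked by a loader.
    """
    filename = os.path.splitext(os.path.basename(file_path))[0] if file_path else ""
    values = {
        "filename": filename,  # file name without extension, "" for text input
        "format_name": format_name,
        "format_id": format_id,
        "global_id": global_id,
    }
    # Keys that the template does not mention are simply ignored.
    return template % values


print(make_table_name("%(filename)s", "/tmp/sample_data.csv", "csv", 0, 0))
print(make_table_name("%(format_name)s%(format_id)s", "", "csv", 1, 3))
```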
5.2.2.3. CSV Text Loader
- class pytablereader.CsvTableTextLoader(text, quoting_flags=None, type_hints=None, type_hint_rules=None)
Bases: CsvTableLoader
A text loader class to extract tabular data from CSV text data.
- Parameters:
text (str) – CSV text to load.
- table_name
Table name string. Defaults to %(format_name)s%(format_id)s.
- load()
Extract tabular data as TableData instances from a CSV text object. The source attribute should contain a text object to load.
- Returns:
Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        ""
%(format_name)s     "csv"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the CSV data is invalid.
5.2.3. HTML Loader Classes
5.2.3.1. HTML File Loader
- class pytablereader.HtmlTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)
A file loader class to extract tabular data from HTML files.
- Parameters:
file_path (str) – Path to the HTML file to load.
- table_name
Table name string. Defaults to %(title)s_%(key)s.
- encoding
HTML file encoding. Defaults to "utf-8".
- load()
Extract tabular data as TableData instances from HTML table tags in an HTML file. The source attribute should contain a path to the file to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(title)s           <title> tag value of the HTML.
%(key)s             Replaced with: (1) the id attribute of the table tag; (2) %(format_name)s%(format_id)s if the table tag has no id attribute.
%(format_name)s     "html"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the HTML data is invalid or empty.
Note
Table tag attributes are not carried over into the loaded TableData.
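The %(key)s rule (use the table's id attribute, otherwise fall back to the format name plus a number) can be sketched with the standard library's html.parser. This is an illustration only, not the library's parser: the class name TableKeyCollector is hypothetical, and numbering the fallback by table position is an assumption of this sketch.

```python
from html.parser import HTMLParser


class TableKeyCollector(HTMLParser):
    """Collect the <title> text and one key per <table> tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keys = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "table":
            table_id = dict(attrs).get("id")
            # Fall back to "<format_name><number>" when the id is absent.
            # Numbering by table position is an assumption of this sketch.
            self.keys.append(table_id or "html{}".format(len(self.keys) + 1))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()


html_text = """<html><head><title>report</title></head><body>
<table id="sales"><tr><th>a</th></tr><tr><td>1</td></tr></table>
<table><tr><th>b</th></tr><tr><td>2</td></tr></table>
</body></html>"""

collector = TableKeyCollector()
collector.feed(html_text)
# Apply the default template %(title)s_%(key)s to each table found.
names = ["%(title)s_%(key)s" % {"title": collector.title, "key": key} for key in collector.keys]
print(names)
```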
5.2.3.2. HTML Text Loader
- class pytablereader.HtmlTableTextLoader(text, quoting_flags=None, type_hints=None, type_hint_rules=None)
A text loader class to extract tabular data from HTML text data.
- Parameters:
text (str) – HTML text to load.
- table_name
Table name string. Defaults to %(title)s_%(key)s.
- load()
Extract tabular data as TableData instances from HTML table tags in an HTML text object. The source attribute should contain a text object to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        ""
%(title)s           <title> tag value of the HTML.
%(key)s             Replaced with: (1) the id attribute of the table tag; (2) %(format_name)s%(format_id)s if the table tag has no id attribute.
%(format_name)s     "html"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the HTML data is invalid or empty.
5.2.4. JSON Loader Classes
5.2.4.1. JSON File Loader
- class pytablereader.JsonTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)
A file loader class to extract tabular data from JSON files.
- Parameters:
file_path (str) – Path to the JSON file to load.
- table_name
Table name string. Defaults to %(filename)s_%(key)s.
- load()
Extract tabular data as TableData instances from a JSON file. The source attribute should contain a path to the file to load.
This method can load the following JSON formats:
(1) Single table data in a file:
{ "type": "array", "items": { "type": "object", "additionalProperties": { "anyOf": [ {"type": "string"}, {"type": "number"}, {"type": "boolean"}, {"type": "null"} ] } } }
[ {"attr_b": 4, "attr_c": "a", "attr_a": 1}, {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2}, {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3} ]
The example data will be loaded as the following tabular data:
attr_a    attr_b    attr_c
1         4.0       a
2         2.1       bb
3         120.9     ccc
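The mapping from format (1), a list of records, to headers and rows can be sketched with the standard json module. This is an illustration rather than the loader's code: sorting the keys is an assumption made to reproduce the column order shown above, and no column type unification is performed, so the integer 4 is kept as-is where the table above shows 4.0.

```python
import json

text = """[
    {"attr_b": 4, "attr_c": "a", "attr_a": 1},
    {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2},
    {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3}
]"""

records = json.loads(text)
# Sort the keys so the column order matches the table shown above.
headers = sorted({key for record in records for key in record})
# dict.get() yields None for keys that a record lacks.
rows = [[record.get(header) for header in headers] for record in records]

print(headers)
for row in rows:
    print(row)
```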
(2) Single table data in a file:
{ "type": "object", "additionalProperties": { "type": "array", "items": { "anyOf": [ {"type": "string"}, {"type": "number"}, {"type": "boolean"}, {"type": "null"} ] } } }
{ "attr_a": [1, 2, 3], "attr_b": [4, 2.1, 120.9], "attr_c": ["a", "bb", "ccc"] }
The example data will be loaded as the following tabular data:
attr_a    attr_b    attr_c
1         4.0       a
2         2.1       bb
3         120.9     ccc
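Format (2) stores one list per column, so producing rows amounts to a transpose. A minimal standard-library sketch, not the loader's implementation; note that zip() stops at the shortest column, which may differ from how the library treats columns of unequal length.

```python
import json

text = '{"attr_a": [1, 2, 3], "attr_b": [4, 2.1, 120.9], "attr_c": ["a", "bb", "ccc"]}'

columns = json.loads(text)
headers = list(columns)  # JSON object keys, in document order
# zip(*...) transposes the column lists into row tuples.
rows = [list(row) for row in zip(*(columns[header] for header in headers))]

print(headers)
for row in rows:
    print(row)
```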
(3) Single table data in a file:
{ "type": "object", "additionalProperties": { "anyOf": [ {"type": "string"}, {"type": "number"}, {"type": "boolean"}, {"type": "null"} ] } }
{ "num_ratings": 27, "support_threads": 1, "downloaded": 925716, "last_updated":"2017-12-01 6:22am GMT", "added":"2010-01-20", "num": 1.1, "hoge": null }
The example data will be loaded as the following tabular data:
key                value
num_ratings        27
support_threads    1
downloaded         925716
last_updated       2017-12-01 6:22am GMT
added              2010-01-20
num                1.1
hoge               None
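Format (3) is a flat object, which becomes a two-column key/value table. A minimal sketch using the standard json module, for illustration only:

```python
import json

text = """{
    "num_ratings": 27, "support_threads": 1, "downloaded": 925716,
    "last_updated": "2017-12-01 6:22am GMT", "added": "2010-01-20",
    "num": 1.1, "hoge": null
}"""

obj = json.loads(text)
headers = ["key", "value"]
# Each member of the object becomes one row; JSON null becomes None.
rows = [[key, value] for key, value in obj.items()]

for row in rows:
    print(row)
```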
(4) Multiple table data in a file:
{ "type": "object", "additionalProperties": { "type": "array", "items": { "type": "object", "additionalProperties": { "anyOf": [ {"type": "string"}, {"type": "number"}, {"type": "boolean"}, {"type": "null"} ] } } } }
{ "table_a" : [ {"attr_b": 4, "attr_c": "a", "attr_a": 1}, {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2}, {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3} ], "table_b" : [ {"a": 1, "b": 4}, {"a": 2 }, {"a": 3, "b": 120.9} ] }
The example data will be loaded as the following tabular data:
(5) Multiple table data in a file:
{ "type": "object", "additionalProperties": { "type": "object", "additionalProperties": { "type": "array", "items": { "anyOf": [ {"type": "string"}, {"type": "number"}, {"type": "boolean"}, {"type": "null"} ] } } } }
{ "table_a" : { "attr_a": [1, 2, 3], "attr_b": [4, 2.1, 120.9], "attr_c": ["a", "bb", "ccc"] }, "table_b" : { "a": [1, 3], "b": [4, 120.9] } }
The example data will be loaded as the following tabular data:
(6) Multiple table data in a file:
{ "type": "object", "additionalProperties": { "type": "object", "additionalProperties": { "anyOf": [ {"type": "string"}, {"type": "number"}, {"type": "boolean"}, {"type": "null"} ] } } }
{ "table_a": { "num_ratings": 27, "support_threads": 1, "downloaded": 925716, "last_updated":"2017-12-01 6:22am GMT", "added":"2010-01-20", "num": 1.1, "hoge": null }, "table_b": { "a": 4, "b": 120.9 } }
The example data will be loaded as the following tabular data:
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(key)s             Replaced with a different value depending on whether the JSON holds a single table or multiple tables: [single JSON table] %(format_name)s%(format_id)s; [multiple JSON tables] the key of the table data.
%(format_name)s     "json"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the data is invalid JSON.
pytablereader.error.ValidationError – If the data is not acceptable JSON format.
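For the multiple-table formats, each top-level key identifies one table and is what %(key)s expands to. The sketch below applies this to the example data of format (4) using only the standard library. The helper records_to_table is hypothetical, the file name "sample" is a placeholder, and filling a missing attribute with None is an assumption of this sketch rather than documented behaviour.

```python
import json

text = """{
    "table_a": [
        {"attr_b": 4, "attr_c": "a", "attr_a": 1},
        {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2},
        {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3}
    ],
    "table_b": [
        {"a": 1, "b": 4},
        {"a": 2},
        {"a": 3, "b": 120.9}
    ]
}"""


def records_to_table(records):
    """Turn a list of record dicts into (headers, rows)."""
    headers = sorted({key for record in records for key in record})
    # Assumption: attributes missing from a record are filled with None.
    rows = [[record.get(header) for header in headers] for record in records]
    return headers, rows


tables = {key: records_to_table(records) for key, records in json.loads(text).items()}
for key, (headers, rows) in tables.items():
    # "sample" stands in for the file name of a hypothetical sample.json.
    table_name = "%(filename)s_%(key)s" % {"filename": "sample", "key": key}
    print(table_name, headers, rows)
```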
5.2.4.2. JSON Text Loader
- class pytablereader.JsonTableTextLoader(text, quoting_flags=None, type_hints=None, type_hint_rules=None)
A text loader class to extract tabular data from JSON text data.
- Parameters:
text (str) – JSON text to load.
- table_name
Table name string. Defaults to %(key)s.
- load()
Extract tabular data as TableData instances from a JSON text object. The source attribute should contain a text object to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        ""
%(key)s             Replaced with a different value depending on whether the JSON holds a single table or multiple tables: [single JSON table] %(format_name)s%(format_id)s; [multiple JSON tables] the key of the table data.
%(format_name)s     "json"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.
- Return type:
TableData iterator
5.2.4.3. Line-delimited JSON File Loader
- class pytablereader.JsonLinesTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)
A file loader class to extract tabular data from Line-delimited JSON files.
- Parameters:
file_path (str) – Path to the Line-delimited JSON file to load.
- table_name
Table name string. Defaults to %(filename)s_%(key)s.
- load()
Extract tabular data as TableData instances from a Line-delimited JSON file. The source attribute should contain a path to the file to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. Format specifiers in table_name are replaced with specific strings.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the data is invalid Line-delimited JSON.
pytablereader.error.ValidationError – If the data is not acceptable Line-delimited JSON format.
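Line-delimited JSON holds one JSON value per line. As a rough standard-library sketch (not the loader's implementation), each non-blank line can be parsed on its own and the resulting objects gathered into rows. The sample data is hypothetical, and skipping blank lines is an assumption of this sketch.

```python
import json

text = """{"a": 1, "b": "x"}
{"a": 2, "b": "y"}

{"a": 3, "b": "z"}
"""

# Parse every non-blank line as an independent JSON document.
records = [json.loads(line) for line in text.splitlines() if line.strip()]
headers = sorted({key for record in records for key in record})
rows = [[record.get(header) for header in headers] for record in records]

print(headers)
for row in rows:
    print(row)
```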
5.2.4.4. Line-delimited JSON Text Loader
- class pytablereader.JsonLinesTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)
A text loader class to extract tabular data from Line-delimited JSON text data.
- Parameters:
text (str) – Line-delimited JSON text to load.
- table_name
Table name string. Defaults to %(key)s.
- load()
Extract tabular data as TableData instances from a Line-delimited JSON text object. The source attribute should contain a text object to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. Format specifiers in table_name are replaced with specific strings.
- Return type:
TableData iterator
5.2.5. LTSV Loader Classes
5.2.5.1. LTSV File Loader
- class pytablereader.LtsvTableFileLoader(file_path, quoting_flags=None, type_hints=None, type_hint_rules=None)
Bases: LtsvTableLoader
Labeled Tab-separated Values (LTSV) format file loader class.
- Parameters:
file_path (str) – Path to the LTSV file to load.
- table_name
Table name string. Defaults to %(filename)s.
- load()
Extract tabular data as TableData instances from an LTSV file. The source attribute should contain a path to the file to load.
- Returns:
Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(format_name)s     "ltsv"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.InvalidHeaderNameError – If an invalid label name is included in the LTSV file.
pytablereader.DataError – If the LTSV data is invalid.
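In LTSV, each line is a record of tab-separated label:value fields. The following minimal parser is a sketch for illustration, not the loader's implementation; the function name parse_ltsv and the sample log lines are hypothetical, and unlike the loader it does not validate label names.

```python
def parse_ltsv(text):
    """Parse LTSV text into a list of {label: value} dicts."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        record = {}
        for field in line.split("\t"):
            # Split at the first colon only; values may contain colons.
            label, _, value = field.partition(":")
            record[label] = value
        records.append(record)
    return records


ltsv_text = (
    "host:127.0.0.1\ttime:10:15:00\tstatus:200\n"
    "host:10.0.0.2\ttime:10:15:01\tstatus:404\n"
)
records = parse_ltsv(ltsv_text)
for record in records:
    print(record)
```

All values stay strings here; any type conversion would be a separate step.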
5.2.5.2. LTSV Text Loader
- class pytablereader.LtsvTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)
Bases: LtsvTableLoader
Labeled Tab-separated Values (LTSV) format text loader class.
- Parameters:
text (str) – LTSV text to load.
- table_name
Table name string. Defaults to %(format_name)s%(format_id)s.
- load()
Extract tabular data as TableData instances from an LTSV text object. The source attribute should contain a text object to load.
- Returns:
Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        ""
%(format_name)s     "ltsv"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.InvalidHeaderNameError – If an invalid label name is included in the LTSV data.
pytablereader.DataError – If the LTSV data is invalid.
5.2.6. Markdown Loader Classes
5.2.6.1. Markdown File Loader
- class pytablereader.MarkdownTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)
A file loader class to extract tabular data from Markdown files.
- Parameters:
file_path (str) – Path to the Markdown file to load.
- table_name
Table name string. Defaults to %(filename)s_%(key)s.
- load()
Extract tabular data as TableData instances from a Markdown file. The source attribute should contain a path to the file to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(key)s             %(format_name)s%(format_id)s
%(format_name)s     "markdown"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the Markdown data is invalid or empty.
5.2.6.2. Markdown Text Loader
- class pytablereader.MarkdownTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)
A text loader class to extract tabular data from Markdown text data.
- Parameters:
text (str) – Markdown text to load.
- table_name
Table name string. Defaults to %(key)s.
- load()
Extract tabular data as TableData instances from a Markdown text object. The source attribute should contain a text object to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        ""
%(key)s             %(format_name)s%(format_id)s
%(format_name)s     "markdown"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the Markdown data is invalid or empty.
5.2.7. MediaWiki Loader Classes
5.2.7.1. MediaWiki File Loader
- class pytablereader.MediaWikiTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)
A file loader class to extract tabular data from MediaWiki files.
- Parameters:
file_path (str) – Path to the file to load.
- table_name
Table name string. Defaults to %(filename)s_%(key)s.
- load()
Extract tabular data as TableData instances from a MediaWiki file. The source attribute should contain a path to the file to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(key)s             Replaced with: (1) the caption mark of the table; (2) %(format_name)s%(format_id)s if the table has no caption mark.
%(format_name)s     "mediawiki"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the MediaWiki data is invalid or empty.
5.2.7.2. MediaWiki Text Loader
- class pytablereader.MediaWikiTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)
A text loader class to extract tabular data from MediaWiki text data.
- Parameters:
text (str) – MediaWiki text to load.
- table_name
Table name string. Defaults to %(key)s.
- load()
Extract tabular data as TableData instances from a MediaWiki text object. The source attribute should contain a text object to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        ""
%(key)s             Replaced with: (1) the caption mark of the table; (2) %(format_name)s%(format_id)s if the table has no caption mark.
%(format_name)s     "mediawiki"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the MediaWiki data is invalid or empty.
5.2.8. Spreadsheet Loader Classes
5.2.8.1. Excel File Loader
- class pytablereader.ExcelTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)
A file loader class to extract tabular data from Microsoft Excel™ files.
- Parameters:
file_path (str) – Path to the Excel workbook file to load.
- table_name
Table name string. Defaults to %(sheet)s.
- start_row
The first row at which to start searching for the header row.
- load()
Extract tabular data as TableData instances from an Excel file. This method automatically searches for the header row of the table, starting from start_row. A row qualifies as the header row when all of its columns have a value (except empty columns).
- Returns:
Loaded TableData iterator. A TableData is created for each sheet in the workbook. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        Filename of the workbook
%(sheet)s           Name of the sheet
%(format_name)s     "spreadsheet"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the header row is not found.
pytablereader.error.OpenError – If the source file could not be opened.
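The header-row search can be sketched as follows. This is one plausible reading of the rule above, not the library's code: "empty columns" is taken to mean columns with no value in any row, the function name find_header_row is hypothetical, and a plain ValueError stands in for pytablereader.DataError.

```python
def find_header_row(rows, start_row=0):
    """Return the index of the first row, at or after start_row, whose cells
    are all non-empty in every column that holds any data at all."""
    width = max((len(row) for row in rows), default=0)
    padded = [list(row) + [None] * (width - len(row)) for row in rows]
    # Columns that are empty in every row are ignored (assumed meaning of
    # "empty columns" in the description above).
    used_columns = [
        col for col in range(width)
        if any(row[col] not in (None, "") for row in padded)
    ]
    for index in range(start_row, len(padded)):
        if used_columns and all(padded[index][col] not in (None, "") for col in used_columns):
            return index
    raise ValueError("header row not found")


sheet = [
    ["Report", None, None, None],
    [None, None, None, None],
    ["id", "name", None, "price"],
    [1, "apple", None, 1.5],
]
print(find_header_row(sheet))
```

Raising start_row skips leading rows, such as a title row above the table.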
5.2.8.2. Google Sheets Loader
- class pytablereader.GoogleSheetsTableLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)
Concrete class of Google Spreadsheet loader.
- table_name
Table name string. Defaults to %(sheet)s.
- Parameters:
file_path (str) – Path to the Google Sheets credential JSON file.
- load()
Load table data from a Google Spreadsheet.
This method treats source as a path to the credential JSON file used to access the Google Sheets API. The method automatically searches for the header row, starting from start_row. A row qualifies as the header row when all of its columns have a value (except empty columns).
- Returns:
Loaded table data. One TableData is returned for each sheet in the workbook. The table name is determined by make_table_name().
- Return type:
iterator of TableData
- Raises:
pytablereader.DataError – If the header row is not found.
pytablereader.OpenError – If the spreadsheet is not found.
5.2.9. Database Loader Classes
5.2.9.1. SQLite File Loader
- class pytablereader.SqliteFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)
A file loader class to extract tabular data from SQLite database files.
- Parameters:
file_path (str) – Path to the SQLite database file to load.
- table_name
Table name string. Defaults to %(filename)s_%(key)s.
- load()
Extract tabular data as TableData instances from a SQLite database file. The source attribute should contain a path to the file to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier    Value after the replacement
%(filename)s        Filename (without extension)
%(key)s             %(format_name)s%(format_id)s
%(format_name)s     "sqlite"
%(format_id)s       A unique number within the same format.
%(global_id)s       A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the SQLite database file data is invalid or empty.
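Reading every table of a SQLite database into headers and rows can be sketched with the standard sqlite3 module. This illustrates the general idea only and is not the loader's implementation; the in-memory database and the fruit table are hypothetical sample data.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE fruit (id INTEGER, name TEXT)")
connection.executemany("INSERT INTO fruit VALUES (?, ?)", [(1, "apple"), (2, "banana")])

# sqlite_master lists the schema objects; keep only ordinary tables.
table_names = [
    row[0]
    for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

tables = {}
for name in table_names:
    # Table names cannot be bound as parameters, so quote the identifier.
    cursor = connection.execute('SELECT * FROM "{}"'.format(name.replace('"', '""')))
    headers = [column[0] for column in cursor.description]
    tables[name] = (headers, cursor.fetchall())

connection.close()
print(tables)
```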