5.2. Format Specific Table Loader Classes¶
5.2.1. AbstractTableReader class¶
5.2.2. CSV Loader Classes¶
5.2.2.1. CSV Table Loader¶
- class pytablereader.csv.core.CsvTableLoader(source, quoting_flags, type_hints, type_hint_rules)[source]¶
The abstract class of CSV table loaders.
- headers¶
Attribute names of the table. If headers is empty, the first line of the CSV file is used as the attribute list.
- delimiter¶
A one-character string used to separate fields. Defaults to ",".
- quotechar¶
A one-character string used to quote fields that contain special characters, such as the delimiter or quotechar, or that contain new-line characters. Defaults to '"'.
- encoding¶
Encoding of the CSV data.
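As a rough illustration of what the headers, delimiter, and quotechar attributes control, the sketch below uses only the standard csv module. The helper read_csv_table is hypothetical and is not part of pytablereader; it only mirrors the behavior described above (the first line is used as the attribute list when no headers are given).

```python
import csv
import io


def read_csv_table(text, headers=None, delimiter=",", quotechar='"'):
    """Parse CSV text into (headers, rows).

    Hypothetical helper for illustration; not pytablereader code.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    rows = [row for row in reader if row]  # skip blank lines
    if not headers:
        # No explicit headers: treat the first line as the attribute list.
        headers, rows = rows[0], rows[1:]
    return list(headers), rows


# Fields containing the delimiter or a new-line must be quoted.
sample = 'name;comment\nalice;"likes; semicolons"\nbob;"multi\nline"\n'
headers, rows = read_csv_table(sample, delimiter=";")
print(headers)
print(rows)
```

Passing an explicit headers list keeps every line of the input as data.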
5.2.2.2. CSV File Loader¶
- class pytablereader.CsvTableFileLoader(file_path, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
Bases: CsvTableLoader
A file loader class to extract tabular data from CSV files.
- Parameters:
file_path (str) – Path to the CSV file to load.
- table_name¶
Table name string. Defaults to %(filename)s.
- Examples:
- load()[source]¶
Extract tabular data as TableData instances from a CSV file. The source attribute should contain a path to the file to load.
- Returns:
Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier – Value after the replacement
%(filename)s – Filename (without extension)
%(format_name)s – "csv"
%(format_id)s – A unique number within the same format.
%(global_id)s – A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the CSV data is invalid.
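The format specifiers use the same syntax as Python's %-style string formatting with a mapping, so the replacement can be pictured as follows. This is a minimal sketch, not the library's implementation: expand_table_name and its parameters are hypothetical names, and the numeric IDs passed in are made-up values.

```python
import os


def expand_table_name(template, source_path, format_name, format_id, global_id):
    """Expand a table_name template; hypothetical helper for illustration."""
    if source_path:
        # Filename without directory and without extension.
        filename = os.path.splitext(os.path.basename(source_path))[0]
    else:
        # Text loaders have no file, so the specifier expands to "".
        filename = ""
    values = {
        "filename": filename,
        "format_name": format_name,
        "format_id": format_id,
        "global_id": global_id,
    }
    return template % values


file_name = expand_table_name("%(filename)s", "/tmp/sample_data.csv", "csv", 0, 0)
text_name = expand_table_name("%(format_name)s%(format_id)s", None, "csv", 1, 5)
print(file_name)
print(text_name)
```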
5.2.2.3. CSV Text Loader¶
- class pytablereader.CsvTableTextLoader(text, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
Bases: CsvTableLoader
A text loader class to extract tabular data from CSV text data.
- Parameters:
text (str) – CSV text to load.
- table_name¶
Table name string. Defaults to %(format_name)s%(format_id)s.
- Examples:
- load()[source]¶
Extract tabular data as TableData instances from a CSV text object. The source attribute should contain a text object to load.
- Returns:
Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier – Value after the replacement
%(filename)s – ""
%(format_name)s – "csv"
%(format_id)s – A unique number within the same format.
%(global_id)s – A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the CSV data is invalid.
5.2.3. HTML Loader Classes¶
5.2.3.1. HTML File Loader¶
- class pytablereader.HtmlTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
A file loader class to extract tabular data from HTML files.
- Parameters:
file_path (str) – Path to the HTML file to load.
- table_name¶
Table name string. Defaults to %(title)s_%(key)s.
- encoding¶
HTML file encoding. Defaults to "utf-8".
- load()[source]¶
Extract tabular data as TableData instances from HTML table tags in an HTML file. The source attribute should contain a path to the file to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier – Value after the replacement
%(filename)s – Filename (without extension)
%(title)s – Value of the <title> tag of the HTML.
%(key)s – Replaced with: (1) the id attribute of the table tag; (2) %(format_name)s%(format_id)s if the table tag has no id attribute.
%(format_name)s – "html"
%(format_id)s – A unique number within the same format.
%(global_id)s – A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the HTML data is invalid or empty.
Note
Table tag attributes are ignored in the loaded TableData.
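The %(title)s and %(key)s rule can be sketched with the standard html.parser module. This is only an illustration under stated assumptions: TableKeyCollector and html_table_names are hypothetical names, and the fallback counter simply uses the position of the table in the document, which may differ from the IDs the library assigns.

```python
from html.parser import HTMLParser


class TableKeyCollector(HTMLParser):
    """Collect the <title> text and the id attribute of every <table> tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.table_ids = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "table":
            # None is stored when the table tag has no id attribute.
            self.table_ids.append(dict(attrs).get("id"))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def html_table_names(html_text, template="%(title)s_%(key)s"):
    """Build one table name per <table> tag (hypothetical helper)."""
    collector = TableKeyCollector()
    collector.feed(html_text)
    names = []
    for format_id, table_id in enumerate(collector.table_ids):
        # Assumed fallback: "html" followed by a per-format counter.
        key = table_id if table_id else "html%d" % format_id
        names.append(template % {"title": collector.title.strip(), "key": key})
    return names


page = (
    "<html><head><title>Report</title></head><body>"
    '<table id="sales"><tr><td>1</td></tr></table>'
    "<table><tr><td>2</td></tr></table>"
    "</body></html>"
)
names = html_table_names(page)
print(names)
```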
5.2.3.2. HTML Text Loader¶
- class pytablereader.HtmlTableTextLoader(text, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
A text loader class to extract tabular data from HTML text data.
- Parameters:
text (str) – HTML text to load.
- table_name¶
Table name string. Defaults to %(title)s_%(key)s.
- load()[source]¶
Extract tabular data as TableData instances from HTML table tags in an HTML text object. The source attribute should contain a text object to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier – Value after the replacement
%(filename)s – ""
%(title)s – Value of the <title> tag of the HTML.
%(key)s – Replaced with: (1) the id attribute of the table tag; (2) %(format_name)s%(format_id)s if the table tag has no id attribute.
%(format_name)s – "html"
%(format_id)s – A unique number within the same format.
%(global_id)s – A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the HTML data is invalid or empty.
5.2.4. JSON Loader Classes¶
5.2.4.1. Json File Loader¶
- class pytablereader.JsonTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
A file loader class to extract tabular data from JSON files.
- Parameters:
file_path (str) – Path to the JSON file to load.
- table_name¶
Table name string. Defaults to %(filename)s_%(key)s.
- load()[source]¶
Extract tabular data as TableData instances from a JSON file. The source attribute should contain a path to the file to load.
This method can load six types of JSON formats:
(1) Single table data in a file:
Acceptable JSON Schema (1): single table¶
{
    "type": "array",
    "items": {
        "type": "object",
        "additionalProperties": {
            "anyOf": [{"type": "string"}, {"type": "number"}, {"type": "boolean"}, {"type": "null"}]
        }
    }
}
Acceptable JSON example for the JSON schema (1)¶
[
    {"attr_b": 4, "attr_c": "a", "attr_a": 1},
    {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2},
    {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3}
]
The example data will be loaded as the following tabular data:
attr_a  attr_b  attr_c
1       4.0     a
2       2.1     bb
3       120.9   ccc
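The mapping from format (1) to a table can be sketched in a few lines. This is an illustration only: records_to_table is a hypothetical helper, the alphabetical header order is an assumption chosen to match the table shown above, and the sketch does not reproduce the per-column type normalization visible there (4 displayed as 4.0).

```python
import json


def records_to_table(records):
    """Turn a list of JSON objects into (headers, rows); hypothetical helper."""
    # Assumption: headers are the union of all keys, sorted alphabetically.
    headers = sorted({key for record in records for key in record})
    rows = [[record.get(header) for header in headers] for record in records]
    return headers, rows


text = """[
    {"attr_b": 4, "attr_c": "a", "attr_a": 1},
    {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2},
    {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3}
]"""
headers, rows = records_to_table(json.loads(text))
print(headers)
for row in rows:
    print(row)
```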
(2) Single table data in a file:
Acceptable JSON Schema (2): single table¶
{
    "type": "object",
    "additionalProperties": {
        "type": "array",
        "items": {
            "anyOf": [{"type": "string"}, {"type": "number"}, {"type": "boolean"}, {"type": "null"}]
        }
    }
}
Acceptable JSON example for the JSON schema (2)¶
{
    "attr_a": [1, 2, 3],
    "attr_b": [4, 2.1, 120.9],
    "attr_c": ["a", "bb", "ccc"]
}
The example data will be loaded as the following tabular data:
attr_a  attr_b  attr_c
1       4.0     a
2       2.1     bb
3       120.9   ccc
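Format (2) stores one array per column, so producing rows amounts to a transpose. The sketch below is a minimal illustration; columns_to_table is a hypothetical helper, and rejecting columns of unequal length is an assumption rather than documented library behavior.

```python
import json


def columns_to_table(columns):
    """Turn a {column name: values} object into (headers, rows); hypothetical helper."""
    headers = list(columns)
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        # Assumption: columns of different lengths are treated as invalid.
        raise ValueError("all columns must have the same number of values")
    # zip(*...) transposes the column arrays into rows.
    rows = [list(row) for row in zip(*columns.values())]
    return headers, rows


text = '{"attr_a": [1, 2, 3], "attr_b": [4, 2.1, 120.9], "attr_c": ["a", "bb", "ccc"]}'
headers, rows = columns_to_table(json.loads(text))
print(headers)
for row in rows:
    print(row)
```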
(3) Single table data in a file:
Acceptable JSON Schema (3): single table¶
{
    "type": "object",
    "additionalProperties": {
        "anyOf": [{"type": "string"}, {"type": "number"}, {"type": "boolean"}, {"type": "null"}]
    }
}
Acceptable JSON example for the JSON schema (3)¶
{
    "num_ratings": 27,
    "support_threads": 1,
    "downloaded": 925716,
    "last_updated": "2017-12-01 6:22am GMT",
    "added": "2010-01-20",
    "num": 1.1,
    "hoge": null
}
The example data will be loaded as the following tabular data:
key              value
num_ratings      27
support_threads  1
downloaded       925716
last_updated     2017-12-01 6:22am GMT
added            2010-01-20
num              1.1
hoge             None
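For format (3), each property of the object becomes one row of a two-column table. A minimal sketch, assuming the column names key and value shown above; mapping_to_table is a hypothetical helper, not a pytablereader function.

```python
import json


def mapping_to_table(mapping):
    """Turn a flat JSON object into a key/value table; hypothetical helper."""
    headers = ["key", "value"]
    rows = [[key, value] for key, value in mapping.items()]
    return headers, rows


text = '{"num_ratings": 27, "added": "2010-01-20", "num": 1.1, "hoge": null}'
headers, rows = mapping_to_table(json.loads(text))
print(headers)
for row in rows:
    print(row)  # JSON null is loaded as Python None
```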
(4) Multiple table data in a file:
Acceptable JSON Schema (4): multiple tables¶
{
    "type": "object",
    "additionalProperties": {
        "type": "array",
        "items": {
            "type": "object",
            "additionalProperties": {
                "anyOf": [{"type": "string"}, {"type": "number"}, {"type": "boolean"}, {"type": "null"}]
            }
        }
    }
}
Acceptable JSON example for the JSON schema (4)¶
{
    "table_a": [
        {"attr_b": 4, "attr_c": "a", "attr_a": 1},
        {"attr_b": 2.1, "attr_c": "bb", "attr_a": 2},
        {"attr_b": 120.9, "attr_c": "ccc", "attr_a": 3}
    ],
    "table_b": [
        {"a": 1, "b": 4},
        {"a": 2},
        {"a": 3, "b": 120.9}
    ]
}
The example data will be loaded as the following tabular data:
(5) Multiple table data in a file:
Acceptable JSON Schema (5): multiple tables¶
{
    "type": "object",
    "additionalProperties": {
        "type": "object",
        "additionalProperties": {
            "type": "array",
            "items": {
                "anyOf": [{"type": "string"}, {"type": "number"}, {"type": "boolean"}, {"type": "null"}]
            }
        }
    }
}
Acceptable JSON example for the JSON schema (5)¶
{
    "table_a": {
        "attr_a": [1, 2, 3],
        "attr_b": [4, 2.1, 120.9],
        "attr_c": ["a", "bb", "ccc"]
    },
    "table_b": {
        "a": [1, 3],
        "b": [4, 120.9]
    }
}
The example data will be loaded as the following tabular data:
(6) Multiple table data in a file:
Acceptable JSON Schema (6): multiple tables¶
{
    "type": "object",
    "additionalProperties": {
        "type": "object",
        "additionalProperties": {
            "anyOf": [{"type": "string"}, {"type": "number"}, {"type": "boolean"}, {"type": "null"}]
        }
    }
}
Acceptable JSON example for the JSON schema (6)¶
{
    "table_a": {
        "num_ratings": 27,
        "support_threads": 1,
        "downloaded": 925716,
        "last_updated": "2017-12-01 6:22am GMT",
        "added": "2010-01-20",
        "num": 1.1,
        "hoge": null
    },
    "table_b": {
        "a": 4,
        "b": 120.9
    }
}
The example data will be loaded as the following tabular data:
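In the multiple-table formats, each top-level key names one table. The sketch below illustrates that idea for format (4) only. It is not the library's implementation: split_tables is a hypothetical helper, headers are kept in first-seen order, and filling attributes missing from a record with None is an assumption.

```python
import json


def split_tables(document):
    """Split a {table key: [records]} object into per-table (headers, rows)."""
    tables = {}
    for table_key, records in document.items():
        headers = []
        for record in records:
            for key in record:
                if key not in headers:
                    headers.append(key)  # keep first-seen order
        # Assumption: an attribute missing from a record becomes None.
        rows = [[record.get(header) for header in headers] for record in records]
        tables[table_key] = (headers, rows)
    return tables


text = """{
    "table_a": [{"attr_b": 4, "attr_c": "a", "attr_a": 1}],
    "table_b": [{"a": 1, "b": 4}, {"a": 2}, {"a": 3, "b": 120.9}]
}"""
tables = split_tables(json.loads(text))
for name, (headers, rows) in tables.items():
    print(name, headers, rows)
```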
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier – Value after the replacement
%(filename)s – Filename (without extension)
%(key)s – Replaced with a different value for single and multiple JSON tables: [single JSON table] %(format_name)s%(format_id)s; [multiple JSON table] the table data key.
%(format_name)s – "json"
%(format_id)s – A unique number within the same format.
%(global_id)s – A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the data is invalid JSON.
pytablereader.error.ValidationError – If the data is not in an acceptable JSON format.
5.2.4.2. Json Text Loader¶
- class pytablereader.JsonTableTextLoader(text, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
A text loader class to extract tabular data from JSON text data.
- Parameters:
text (str) – JSON text to load.
- table_name¶
Table name string. Defaults to %(key)s.
- load()[source]¶
Extract tabular data as TableData instances from a JSON text object. The source attribute should contain a text object to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier – Value after the replacement
%(filename)s – ""
%(key)s – Replaced with a different value for single and multiple JSON tables: [single JSON table] %(format_name)s%(format_id)s; [multiple JSON table] the table data key.
%(format_name)s – "json"
%(format_id)s – A unique number within the same format.
%(global_id)s – A unique number across all formats.
- Return type:
TableData iterator
5.2.4.3. Line-delimited Json File Loader¶
- class pytablereader.JsonLinesTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
A file loader class to extract tabular data from Line-delimited JSON files.
- Parameters:
file_path (str) – Path to the Line-delimited JSON file to load.
- table_name¶
Table name string. Defaults to %(filename)s_%(key)s.
- load()[source]¶
Extract tabular data as TableData instances from a Line-delimited JSON file. The source attribute should contain a path to the file to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. Format specifiers in table_name are replaced with specific strings.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the data is invalid Line-delimited JSON.
pytablereader.error.ValidationError – If the data is not in an acceptable Line-delimited JSON format.
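Line-delimited JSON holds one JSON value per line. The following sketch shows the general parsing idea with the standard json module; parse_json_lines is a hypothetical helper, and raising ValueError for a malformed line (and skipping blank lines) are assumptions standing in for the library's own error handling.

```python
import json


def parse_json_lines(text):
    """Parse line-delimited JSON text into a list of records; hypothetical helper."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are skipped
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError("line %d is not valid JSON" % line_number) from exc
    return records


text = '{"a": 1, "b": "x"}\n\n{"a": 2, "b": "y"}\n'
records = parse_json_lines(text)
print(records)
```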
5.2.4.4. Line-delimited Json Text Loader¶
- class pytablereader.JsonLinesTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
A text loader class to extract tabular data from Line-delimited JSON text data.
- Parameters:
text (str) – Line-delimited JSON text to load.
- table_name¶
Table name string. Defaults to %(key)s.
- load()[source]¶
Extract tabular data as TableData instances from a Line-delimited JSON text object. The source attribute should contain a text object to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. Format specifiers in table_name are replaced with specific strings.
- Return type:
TableData iterator
5.2.5. LTSV Loader Classes¶
5.2.5.1. LTSV File Loader¶
- class pytablereader.LtsvTableFileLoader(file_path, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
Bases: LtsvTableLoader
Labeled Tab-separated Values (LTSV) format file loader class.
- Parameters:
file_path (str) – Path to the LTSV file to load.
- table_name¶
Table name string. Defaults to %(filename)s.
- load()[source]¶
Extract tabular data as TableData instances from an LTSV file. The source attribute should contain a path to the file to load.
- Returns:
Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier – Value after the replacement
%(filename)s – Filename (without extension)
%(format_name)s – "ltsv"
%(format_id)s – A unique number within the same format.
%(global_id)s – A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.InvalidHeaderNameError – If an invalid label name is included in the LTSV file.
pytablereader.DataError – If the LTSV data is invalid.
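In LTSV, each line is a record of tab-separated label:value fields. The sketch below shows that structure with plain string operations; parse_ltsv is a hypothetical helper, and using ValueError for a field without a label is an assumption that stands in for the library's own exceptions.

```python
def parse_ltsv(text):
    """Parse LTSV text into a list of {label: value} records; hypothetical helper."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are skipped
        record = {}
        for field in line.split("\t"):
            # Split on the first colon only, so values may contain colons.
            label, separator, value = field.partition(":")
            if not separator or not label:
                raise ValueError("invalid LTSV field: %r" % field)
            record[label] = value
        records.append(record)
    return records


text = "host:127.0.0.1\tstatus:200\ttime:2017-01-01T00:00:00\n"
records = parse_ltsv(text)
print(records)
```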
5.2.5.2. LTSV Text Loader¶
- class pytablereader.LtsvTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
Bases: LtsvTableLoader
Labeled Tab-separated Values (LTSV) format text loader class.
- Parameters:
text (str) – LTSV text to load.
- table_name¶
Table name string. Defaults to %(format_name)s%(format_id)s.
- load()[source]¶
Extract tabular data as TableData instances from an LTSV text object. The source attribute should contain a text object to load.
- Returns:
Loaded table data. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier – Value after the replacement
%(filename)s – ""
%(format_name)s – "ltsv"
%(format_id)s – A unique number within the same format.
%(global_id)s – A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.InvalidHeaderNameError – If an invalid label name is included in the LTSV text.
pytablereader.DataError – If the LTSV data is invalid.
5.2.6. Markdown Loader Classes¶
5.2.6.1. Markdown File Loader¶
- class pytablereader.MarkdownTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
A file loader class to extract tabular data from Markdown files.
- Parameters:
file_path (str) – Path to the Markdown file to load.
- table_name¶
Table name string. Defaults to %(filename)s_%(key)s.
- load()[source]¶
Extract tabular data as TableData instances from a Markdown file. The source attribute should contain a path to the file to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier – Value after the replacement
%(filename)s – Filename (without extension)
%(key)s – %(format_name)s%(format_id)s
%(format_name)s – "markdown"
%(format_id)s – A unique number within the same format.
%(global_id)s – A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the Markdown data is invalid or empty.
5.2.6.2. Markdown Text Loader¶
- class pytablereader.MarkdownTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
A text loader class to extract tabular data from Markdown text data.
- Parameters:
text (str) – Markdown text to load.
- table_name¶
Table name string. Defaults to %(key)s.
- load()[source]¶
Extract tabular data as TableData instances from a Markdown text object. The source attribute should contain a text object to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier – Value after the replacement
%(filename)s – ""
%(key)s – %(format_name)s%(format_id)s
%(format_name)s – "markdown"
%(format_id)s – A unique number within the same format.
%(global_id)s – A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the Markdown data is invalid or empty.
5.2.7. MediaWiki Loader Classes¶
5.2.7.1. MediaWiki File Loader¶
- class pytablereader.MediaWikiTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
A file loader class to extract tabular data from MediaWiki files.
- Parameters:
file_path (str) – Path to the MediaWiki file to load.
- table_name¶
Table name string. Defaults to %(filename)s_%(key)s.
- load()[source]¶
Extract tabular data as TableData instances from a MediaWiki file. The source attribute should contain a path to the file to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier – Value after the replacement
%(filename)s – Filename (without extension)
%(key)s – Replaced with: (1) the caption mark of the table; (2) %(format_name)s%(format_id)s if the table has no caption mark.
%(format_name)s – "mediawiki"
%(format_id)s – A unique number within the same format.
%(global_id)s – A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the MediaWiki data is invalid or empty.
5.2.7.2. MediaWiki Text Loader¶
- class pytablereader.MediaWikiTableTextLoader(text=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
A text loader class to extract tabular data from MediaWiki text data.
- Parameters:
text (str) – MediaWiki text to load.
- table_name¶
Table name string. Defaults to %(key)s.
- load()[source]¶
Extract tabular data as TableData instances from a MediaWiki text object. The source attribute should contain a text object to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier – Value after the replacement
%(filename)s – ""
%(key)s – Replaced with: (1) the caption mark of the table; (2) %(format_name)s%(format_id)s if the table has no caption mark.
%(format_name)s – "mediawiki"
%(format_id)s – A unique number within the same format.
%(global_id)s – A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the MediaWiki data is invalid or empty.
5.2.8. Spread Sheet Loader Classes¶
5.2.8.1. Excel File Loader¶
- class pytablereader.ExcelTableFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
A file loader class to extract tabular data from Microsoft Excel™ files.
- Parameters:
file_path (str) – Path to the Excel workbook file to load.
- table_name¶
Table name string. Defaults to %(sheet)s.
- start_row¶
The row from which the search for the header row starts.
- load()[source]¶
Extract tabular data as TableData instances from an Excel file. This method automatically searches for the header row of the table, starting from start_row. The header row must have a value in every column (except empty columns).
- Returns:
Loaded TableData iterator. A TableData is created for each sheet in the workbook. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier – Value after the replacement
%(filename)s – Filename of the workbook
%(sheet)s – Name of the sheet
%(format_name)s – "spreadsheet"
%(format_id)s – A unique number within the same format.
%(global_id)s – A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the header row is not found.
pytablereader.error.OpenError – If the source file cannot be opened.
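The header-row search described for load() can be pictured with the sketch below, which works on a sheet already read into a list of rows. It is an interpretation, not the library's code: find_header_row is a hypothetical helper, "empty columns" is assumed to mean columns with no value in any row, and ValueError stands in for pytablereader.DataError.

```python
def find_header_row(rows, start_row=0):
    """Return the index of the first row, from start_row on, that has a value
    in every non-empty column. Hypothetical helper for illustration."""
    width = max((len(row) for row in rows), default=0)
    # Pad short rows so that every row has the same number of cells.
    padded = [list(row) + [None] * (width - len(row)) for row in rows]

    def is_empty(value):
        return value is None or str(value).strip() == ""

    # Assumption: a column counts as empty when no row has a value in it.
    used_columns = [
        column
        for column in range(width)
        if any(not is_empty(row[column]) for row in padded)
    ]
    for index in range(start_row, len(padded)):
        if used_columns and all(not is_empty(padded[index][c]) for c in used_columns):
            return index
    raise ValueError("header row not found")


sheet = [
    ["Report", None, None, None],
    [None, None, None, None],
    ["id", None, "name", "price"],
    [1, None, "apple", 1.5],
]
header_index = find_header_row(sheet)
print(header_index, sheet[header_index])
```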
5.2.8.2. Google Sheets Loader¶
- class pytablereader.GoogleSheetsTableLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
Concrete class of Google Spreadsheet loader.
- table_name¶
Table name string. Defaults to %(sheet)s.
- Parameters:
file_path (str) – Path to the Google Sheets credential JSON file.
- Dependency Packages:
- Examples:
- load()[source]¶
Load table data from a Google Spreadsheet.
This method treats source as a path to the credential JSON file used to access the Google Sheets API. The method automatically searches for the header row, starting from start_row. The header row must have a value in every column (except empty columns).
- Returns:
Loaded table data. One TableData is returned for each sheet in the workbook. The table name is determined by make_table_name().
- Return type:
iterator of TableData
- Raises:
pytablereader.DataError – If the header row is not found.
pytablereader.OpenError – If the spreadsheet is not found.
5.2.9. Database Loader Classes¶
5.2.9.1. SQLite File Loader¶
- class pytablereader.SqliteFileLoader(file_path=None, quoting_flags=None, type_hints=None, type_hint_rules=None)[source]¶
A file loader class to extract tabular data from SQLite database files.
- Parameters:
file_path (str) – Path to the SQLite database file to load.
- table_name¶
Table name string. Defaults to %(filename)s_%(key)s.
- Dependency Packages:
- load()[source]¶
Extract tabular data as TableData instances from a SQLite database file. The source attribute should contain a path to the file to load.
- Returns:
Loaded table data iterator. The table name is determined by the value of table_name. The following format specifiers in table_name are replaced with specific strings:
Format specifier – Value after the replacement
%(filename)s – Filename (without extension)
%(key)s – %(format_name)s%(format_id)s
%(format_name)s – "sqlite"
%(format_id)s – A unique number within the same format.
%(global_id)s – A unique number across all formats.
- Return type:
TableData iterator
- Raises:
pytablereader.DataError – If the SQLite database file data is invalid or empty.
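For orientation, reading every table of a SQLite database into headers and rows can be sketched with the standard sqlite3 module. This is not how the library is implemented; load_sqlite_tables is a hypothetical helper, and the example uses an in-memory database instead of a file.

```python
import sqlite3


def load_sqlite_tables(connection):
    """Return {table name: (headers, rows)} for every table; hypothetical helper."""
    cursor = connection.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    table_names = [row[0] for row in cursor.fetchall()]
    tables = {}
    for name in table_names:
        # Quote the identifier so unusual table names stay valid SQL.
        quoted = '"' + name.replace('"', '""') + '"'
        cursor.execute("SELECT * FROM " + quoted)
        headers = [column[0] for column in cursor.description]
        tables[name] = (headers, [list(row) for row in cursor.fetchall()])
    return tables


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE fruit (id INTEGER, name TEXT)")
connection.executemany("INSERT INTO fruit VALUES (?, ?)", [(1, "apple"), (2, "banana")])
tables = load_sqlite_tables(connection)
print(tables)
```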