ClarTable

Installation

Works with Python ^3.9

git clone git@github.com:clarin-eric/resource-families-html-generator.git # via SSH or
git clone https://github.com/clarin-eric/resource-families-html-generator.git # via HTTPS
cd ./resource-families-html-generator/
pip install .

About

ClarTable is a Python module that generates an HTML presentation layer for tabular data stored in .csv files.

Usage

Locally:

usage: python -m rfhg [-h] -i PATH -r PATH -o PATH

Create html table from given data and rules. 
To navigate static resources within the module prepend `static.` 
to the path, eg. `-r static.rules/rules.json`

optional arguments:
  -h, --help  show this help message and exit
  -i PATH     path to a .csv file or folder with .csv files
  -r PATH     path to a .json file with rules
  -o PATH     path to file where output html table will be generated
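
When the generator has to be called from another Python script, the command line above can be assembled and run with the standard library. This is only a sketch: the helper names `build_rfhg_command` and `generate_table` and the example paths are hypothetical; only the `-i`, `-r` and `-o` flags and the `static.` prefix come from the help text above.

```python
import subprocess
import sys


def build_rfhg_command(input_path, rules_path, output_path):
    """Return the argument list for `python -m rfhg -i ... -r ... -o ...`."""
    return [
        sys.executable, "-m", "rfhg",
        "-i", str(input_path),    # .csv file or folder with .csv files
        "-r", str(rules_path),    # .json file with rules
        "-o", str(output_path),   # file where the html table is written
    ]


def generate_table(input_path, rules_path, output_path):
    """Run the generator; raises CalledProcessError on a non-zero exit code."""
    command = build_rfhg_command(input_path, rules_path, output_path)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical input and output paths, for illustration only.
    print(build_rfhg_command("corpora", "static.rules/rules.json", "table.html"))
```

Using `sys.executable` instead of a bare `python` keeps the call inside the interpreter (and virtual environment) where the package was installed.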

Via CI:

The HTML tables for resource families can also be generated via GitHub. Push new .csv files to /resouce_families and, once they have been processed, the generated tables will appear in the gh-pages branch.

CSV format

To create an HTML table from a .csv file with the default rules, the file must contain all of the following columns (their order does not matter). Note that column names are case sensitive. If you need the generator to consider additional columns, contact michal@clarin.eu or adjust rules.json.

Make sure that your .csv files use ; (semicolon) as the column separator.

A single cell may contain multiple paragraphs or structures separated by the #SEP separator. In the example below, the Description cell consists of 3 paragraphs. Some cells depend on others: the Buttons cell holds 2 button names split with the separator, and Buttons_URL holds the respective URLs in the same order.

Corpus	Corpus_URL	Language	Size	Annotation	Licence	Description	Buttons	Buttons_URL	Publication	Publication_URL	Note
Example Corpus Name	www.examplaryurl.com	English	100 million tokens	tokenised, PoS-tagged, lemmatised	CC-BY	First examplary sentence #SEPSecond examplary sentence to be started from new line #SEPExample with `<a href="http://some.url">hyperlink</a>` in it	Concordancer#SEPDownload	https://www.concordancer.com/ #SEPhttps://www.download.com	Smith et al. (3019)	https://publication.url	Note text to be displayed in button field
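
The conventions described in this section (semicolon-separated columns, case-sensitive column names, #SEP inside a cell, Buttons paired with Buttons_URL) can be checked before a file is pushed. The following is a minimal sketch and not the generator's own code: the function names are hypothetical, and stripping whitespace around each #SEP part is an assumption.

```python
import csv
import io

SEP = "#SEP"

# Column names taken from the example header in this section (case sensitive).
REQUIRED_COLUMNS = [
    "Corpus", "Corpus_URL", "Language", "Size", "Annotation", "Licence",
    "Description", "Buttons", "Buttons_URL", "Publication",
    "Publication_URL", "Note",
]


def missing_columns(header):
    """Return the required columns that are absent from the header."""
    return [name for name in REQUIRED_COLUMNS if name not in header]


def split_cell(cell):
    """Split a cell on #SEP; surrounding whitespace is dropped (assumption)."""
    return [part.strip() for part in cell.split(SEP)]


def pair_buttons(row):
    """Pair each button name with the URL at the same position."""
    names = split_cell(row["Buttons"])
    urls = split_cell(row["Buttons_URL"])
    if len(names) != len(urls):
        raise ValueError("Buttons and Buttons_URL have different lengths")
    return list(zip(names, urls))


def read_rows(text):
    """Parse semicolon-separated CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


# A reduced sample with only three columns, to keep the example short.
sample = (
    "Corpus;Buttons;Buttons_URL\n"
    "Example Corpus Name;Concordancer#SEPDownload;"
    "https://www.concordancer.com/ #SEPhttps://www.download.com\n"
)
rows = read_rows(sample)
print(pair_buttons(rows[0]))
print(missing_columns(rows[0].keys()))
```

The second `print` lists the nine columns the reduced sample leaves out, which is the kind of report that is useful to see before running the generator with the default rules.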

Resulting table:

Table titles and ordering

The table title is derived from the .csv file name, which must follow the format X-table_title.csv, where X is the index used for table ordering. Tables can be grouped into sections by storing them in an intermediate directory inside the corpora directory; the directory name follows the same indexing principle as the .csv files. For example, corpora with the structure:

Historical corpora
├── 1-Historical corpora in the CLARIN infrastructure
│   ├── 1-Monolingual corpora.csv
│   └── 2-Multilingual corpora.csv
└── 2-Other historical corpora
    ├── 1-Monolingual corpora.csv
    └── 2-Multilingual corpora.csv

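Will produce two sections, "Historical corpora in the CLARIN infrastructure" and "Other historical corpora", each containing a "Monolingual corpora" table followed by a "Multilingual corpora" table.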
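
The X-table_title naming convention from this section can be sketched as a small parser. This is an illustration under assumptions, not the generator's implementation: the helper names are hypothetical, and sorting the index numerically (so that 10 comes after 2) is assumed rather than confirmed.

```python
def parse_indexed_name(name):
    """Split 'X-table_title.csv' (or an 'X-section' directory) into (X, title)."""
    stem = name.removesuffix(".csv")  # str.removesuffix needs Python 3.9+
    index, dash, title = stem.partition("-")
    if not dash or not index.isdigit() or not title:
        raise ValueError(f"name does not follow the X-title format: {name!r}")
    return int(index), title


def ordered_titles(names):
    """Return the titles sorted by their numeric index."""
    return [title for _, title in sorted(parse_indexed_name(n) for n in names)]


print(ordered_titles(["2-Multilingual corpora.csv", "1-Monolingual corpora.csv"]))
print(parse_indexed_name("2-Other historical corpora"))
```

Only the first hyphen separates the index from the title, so titles that contain hyphens themselves are kept whole.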

Rules format

Rules are written as nested JSON made up of tags and fields. Given the rule:

{"tags": [
	{"tag": "<table class=\"table\" cellspacing=\"2\">", "tags": [
		{"tag": "<thead>", "tags": [
			{"tag": "<tr>", "tags": [
				{"tag": "<th>", "text": "Corpus name"}
			]}
		]},
		{"tag": "<tbody>", "tags": [
			{"tag": "<tr>", "tags": [
				{"tag": "<td valign=\"top\">", "tags": [
					{"tag": "<p>", "fields": [
						{"text": "<strong>Field data</strong> will be inserted here: %s", "columns": ["column_name_in_csv_file"]}
					]}
				]}
			]}
		]}
	]}
]}

The generated HTML table with the names of corpora, assuming there were only 2 rows in the .csv file:

<table class ="table" cellspacing="2">
        <thead>
                <tr>
                        <th valign="top">Corpus name
                        </th>
                </tr>
        </thead>
        <tbody>
                <tr>
                        <td valign="top">
                                <p>
                                <strong>Field data</strong> will be inserted here: NKJP 2.1.4
                                </p>
                        </td>
                </tr>
        </tbody>
        <tbody>
                <tr>
                        <td valign="top">
                                <p>
                                <strong>Field data</strong> will be inserted here: Common Crawl
                                </p>
                        </td>
                </tr>
        </tbody>
</table>

Rendered, this gives a single-column table:

Corpus name
Field data will be inserted here: NKJP 2.1.4
Field data will be inserted here: Common Crawl

The <tbody> tag encloses the tags and fields used to create a row, and it is repeated once per row of the .csv file. Only tags nested within <tbody> ... </tbody> can contain "fields": [].
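
To make the rule semantics concrete, here is a minimal sketch of how such a rule tree could be turned into HTML. It is not the rfhg implementation: the `render` and `closing_tag` helpers are hypothetical, the output is not indented, and cell values are inserted without escaping (an assumption, based on the CSV format allowing hyperlinks inside cells).

```python
import re


def closing_tag(opening):
    """Derive '</name>' from an opening tag such as '<td valign="top">'."""
    name = re.match(r"<\s*([A-Za-z][A-Za-z0-9]*)", opening).group(1)
    return f"</{name}>"


def render(node, rows, row=None):
    """Render a rule node; a <tbody> node is repeated once per data row."""
    tag = node.get("tag", "")
    if tag.startswith("<tbody") and row is None:
        return "".join(render(node, rows, current) for current in rows)

    parts = [tag]
    if "text" in node:
        parts.append(node["text"])
    for field in node.get("fields", []):
        if row is None:
            raise ValueError('"fields" may only appear inside <tbody>')
        # Fill each %s placeholder with the value of the named column.
        values = tuple(row[column] for column in field["columns"])
        parts.append(field["text"] % values)
    for child in node.get("tags", []):
        parts.append(render(child, rows, row))
    if tag:
        parts.append(closing_tag(tag))
    return "".join(parts)


# A reduced rule and two hypothetical rows keyed by column name.
rule = {"tags": [
    {"tag": "<table>", "tags": [
        {"tag": "<tbody>", "tags": [
            {"tag": "<tr>", "tags": [
                {"tag": "<td>", "fields": [
                    {"text": "%s", "columns": ["Corpus"]}
                ]}
            ]}
        ]}
    ]}
]}
rows = [{"Corpus": "NKJP 2.1.4"}, {"Corpus": "Common Crawl"}]
print(render(rule, rows))
```

Because the closing tag is derived from the opening one, a rule only has to spell out opening tags with their attributes, which matches the shape of the rule shown in this section.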
