Add Text+ LexFCS spec (WIP)
Querela committed Apr 10, 2024
1 parent 533ed82 commit 452dcf1
Showing 14 changed files with 631 additions and 2 deletions.
7 changes: 7 additions & 0 deletions .github/pages/index.html
@@ -102,6 +102,13 @@ <h3 class="h4">Work in Progress</h3>
href="https://github.com/clarin-eric/fcs-misc/tree/main/fcs-aai" rel="noopener"
target="_blank">Sources</a>
</li>
<li>
<svg class="bi mb-1" width="16" height="16">
<use xlink:href="#arrow-right-circle"></use>
</svg> LexFCS 1.0: <a href="lexfcs-specs/lexfcs.html" rel="noopener" target="_blank">HTML</a>, <a
href="lexfcs-specs/lexfcs.pdf" rel="noopener" target="_blank">PDF</a>, <a
href="https://github.com/clarin-eric/fcs-misc/tree/main/lexfcs" rel="noopener" target="_blank">Sources</a>
</li>
</ul>
</div>

37 changes: 37 additions & 0 deletions .github/workflows/build-lexfcs-adoc.yml
@@ -0,0 +1,37 @@
name: build <lexfcs> adocs

on:
  push:
    branches:
      - main
    paths:
      - 'lexfcs/**'
      - '.github/workflows/build-lexfcs-adoc.yml'
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    container: asciidoctor/docker-asciidoctor

    steps:
      - uses: actions/checkout@v4

      - name: Build HTML
        run: asciidoctor -v -D docs -a data-uri --backend=html5 -o lexfcs.html lexfcs/index.adoc

      - name: Build PDF
        run: asciidoctor-pdf -v -D docs -o lexfcs.pdf lexfcs/index.adoc

      - name: Copy attachments
        run: cp -R lexfcs/attachments docs/

      - name: Store results
        uses: actions/upload-artifact@v4
        with:
          name: lexfcs-specs
          path: docs/*
27 changes: 25 additions & 2 deletions .github/workflows/publish-gh-pages.yml
@@ -91,6 +91,28 @@ jobs:
          name: fcs-aai-specs
          path: docs/*

  lexfcs:
    runs-on: ubuntu-latest
    container: asciidoctor/docker-asciidoctor

    steps:
      - uses: actions/checkout@v4

      - name: Build HTML
        run: asciidoctor -v -D docs -a data-uri --backend=html5 -o lexfcs.html lexfcs/index.adoc

      - name: Build PDF
        run: asciidoctor-pdf -v -D docs -o lexfcs.pdf lexfcs/index.adoc

      - name: Copy attachments
        run: cp -R lexfcs/attachments docs/

      - name: Store results
        uses: actions/upload-artifact@v4
        with:
          name: lexfcs-specs
          path: docs/*

  fcs-endpoint-dev-slides:
    runs-on: ubuntu-latest
    container: asciidoctor/docker-asciidoctor
@@ -149,10 +171,11 @@ jobs:
     runs-on: ubuntu-latest
     needs:
       [
-        fcs-aai,
-        fcs-core-1-0,
         fcs-core-2-0,
+        fcs-core-1-0,
         fcs-dataviews-1-0,
+        fcs-aai,
+        lexfcs,
         fcs-endpoint-dev-slides,
         fcs-endpoint-dev-slides-as-doc,
       ]
1 change: 1 addition & 0 deletions README.md
@@ -12,6 +12,7 @@ Jump to: [[Specification Documents](#specification-documents)]
- [CLARIN Federated Content Search - FCS **Core 1.0**: `fcs-core-1.0/index.adoc`](fcs-core-1.0/index.adoc)
- [CLARIN Federated Content Search - FCS **Data Views 1.0**: `fcs-dataviews-1.0/index.adoc`](fcs-dataviews-1.0/index.adoc)
- _WIP_ [CLARIN Federated Content Search - FCS **AAI 1.0**: `fcs-aai/index.adoc`](fcs-aai/index.adoc)
- _WIP_ [Text+ **LexFCS 1.0**: `lexfcs/index.adoc`](lexfcs/index.adoc)

### Folder Structure

85 changes: 85 additions & 0 deletions lexfcs/attachments/DataView-LexHits.xsd
@@ -0,0 +1,85 @@
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:h="http://www.w3.org/1999/xhtml"
xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning"
xmlns:hits="http://textplus.org/fcs/dataview/hits"
xml:lang="en" vc:minVersion="1.0" vc:maxVersion="1.1"
targetNamespace="http://textplus.org/fcs/dataview/hits" elementFormDefault="qualified">

<xs:annotation>
<xs:documentation>
<h:p>
This schema defines the structure of a
<h:em>generic result</h:em> data view. All CLARIN-FCS endpoints
MUST support this data view.
</h:p>
<h:p>
The value <h:code>application/x-clarin-fcs-hits+xml</h:code>
MUST be used to indicate a <h:em>generic result</h:em> data view.
</h:p>
</xs:documentation>
</xs:annotation>

<xs:element name="Result">
<xs:annotation>
<xs:documentation>
<h:p>
A single result line with one or more marked hits.
White-space is considered <h:em>non-significant</h:em>,
except for delimiting tokens.
</h:p>
<h:p>
A CLARIN-FCS client MAY
normalize white-space, strip leading and trailing
white-space, and collapse all white-space between
tokens to a single #x20 character.
</h:p>
</xs:documentation>
</xs:annotation>
<xs:complexType mixed="true">
<xs:sequence>
<xs:element name="Hit" type="hits:hitType" maxOccurs="unbounded">
<xs:annotation>
<xs:documentation>
<h:p>
A hit highlight or a field type annotation. It SHALL NOT be empty.
</h:p>
<h:p>
One <h:code>Result</h:code> element MUST contain at least
one <h:code>Hit</h:code> element, but MAY
contain more than one.
</h:p>
</xs:documentation>
</xs:annotation>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>

<xs:complexType name="hitType" mixed="true">
<!-- nested content (highlighting?)
<xs:sequence>
<xs:element name="Hit" type="xs:string" minOccurs="0" maxOccurs="unbounded"></xs:element>
</xs:sequence>
-->
<xs:attribute name="kind" type="hits:fieldType" use="optional">
<xs:annotation>
<xs:documentation>
<h:p>
Field type identifier for this annotation. It is used on the &lt;Hit&gt; element to indicate the function of the annotated text.
</h:p>
</xs:documentation>
</xs:annotation>
</xs:attribute>
</xs:complexType>

<xs:simpleType name="fieldType">
<xs:restriction base="xs:string">
<!-- <xs:pattern value="[a-zA-Z][a-zA-Z0-9]*" /> -->
<xs:enumeration value="lex-lemma"/>
<xs:enumeration value="lex-pos"/>
<xs:enumeration value="lex-def"/>
<!-- <xs:enumeration value="query"/> -->
</xs:restriction>
</xs:simpleType>
</xs:schema>
6 changes: 6 additions & 0 deletions lexfcs/attachments/lexhits-example.xml
@@ -0,0 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<Result xmlns="http://textplus.org/fcs/dataview/hits"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://textplus.org/fcs/dataview/hits ./DataView-LexHits.xsd">
The quick brown <Hit kind="lex-lemma">fox</Hit> jumps <Hit kind="lex-pos">over</Hit> the lazy <Hit>dog</Hit>.
</Result>
40 changes: 40 additions & 0 deletions lexfcs/dataviews.adoc
@@ -0,0 +1,40 @@
= LexFCS Data Views
:description: FCS DataViews for Lex Search.


Data formats for result representation.


== Extension of the Hits Data View for LexFCS

Based on:

* <<ref:CLARIN-FCSCore20,FCS Core 2.0 specification>> (section "Basic Search", §2.2.3.2)
* Hits XML schema https://github.com/clarin-eric/fcs-misc/blob/main/schema/Core_2/DataView-Hits.xsd["DataView-Hits.xsd"]

.Example of basic *Hits* Data View
[source,xml]
----
<!-- potential @pid and @ref attributes omitted -->
<fcs:DataView type="application/x-clarin-fcs-hits+xml">
<hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy <hits:Hit>dog</hits:Hit>.</hits:Result>
</fcs:DataView>
----

The `<hits:Hit>` element is reused and extended with an optional `@kind` attribute that hints at the kind of content it marks. The following values are allowed:

* `lex-lemma`: Lemma,
* `lex-pos`: Part of speech,
* `lex-def`: Definition.

Textual content outside of `<hits:Hit>` is displayed unchanged.

.Example of extended *Hits* Data View with additional `@kind` attributes
[source,xml]
----
<fcs:DataView type="application/x-textplus-fcs-hits+xml">
<hits:Result xmlns:hits="http://textplus.org/fcs/dataview/hits"><hits:Hit kind="lex-lemma">Apple</hits:Hit>: <hits:Hit kind="lex-pos">NOUN</hits:Hit>. <hits:Hit kind="lex-def">An apple is an edible fruit produced by an apple tree.</hits:Hit></hits:Result>
</fcs:DataView>
----
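
A client might split such a result into plain-text and annotated segments roughly as follows. This is only an illustrative sketch using Python's standard `xml.etree.ElementTree`; the function name `split_segments` and the role labels `"text"` and `"hit"` are assumptions made for this example and are not defined by the specification.

```python
import xml.etree.ElementTree as ET

# Namespace of the extended Hits Data View, as used in the example above.
HITS_NS = "http://textplus.org/fcs/dataview/hits"

SAMPLE = (
    '<hits:Result xmlns:hits="http://textplus.org/fcs/dataview/hits">'
    '<hits:Hit kind="lex-lemma">Apple</hits:Hit>: '
    '<hits:Hit kind="lex-pos">NOUN</hits:Hit>. '
    '<hits:Hit kind="lex-def">An apple is an edible fruit produced by an apple tree.</hits:Hit>'
    '</hits:Result>'
)


def split_segments(result_xml):
    """Return (role, text) pairs in document order.

    role is the @kind value for annotated hits, "hit" for a Hit without
    @kind, and "text" for content outside of any Hit (hypothetical labels).
    """
    root = ET.fromstring(result_xml)
    segments = []
    if root.text:
        # Text before the first Hit element.
        segments.append(("text", root.text))
    for hit in root.findall(f"{{{HITS_NS}}}Hit"):
        segments.append((hit.get("kind", "hit"), "".join(hit.itertext())))
        if hit.tail:
            # Text between this Hit and the next element (or the end).
            segments.append(("text", hit.tail))
    return segments


for role, text in split_segments(SAMPLE):
    print(f"{role}: {text!r}")
```

Keeping the plain-text segments in the list lets a renderer output them unchanged while styling the annotated segments by role.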

Endpoints `MUST` generate responses that are valid according to the XML schema link:attachments/DataView-LexHits.xsd["DataView-LexHits.xsd"].
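
For a quick sanity check during endpoint development, some of the schema's constraints can be approximated without an XSD processor. The sketch below is an assumption-laden illustration, not a substitute for validating against the schema: it uses only Python's standard library, the function name `check_result` is hypothetical, and it covers just the root element name, the presence of at least one `Hit`, non-empty `Hit` content, and the allowed `@kind` values.

```python
import xml.etree.ElementTree as ET

HITS_NS = "http://textplus.org/fcs/dataview/hits"
# Values enumerated for the @kind attribute in the schema.
ALLOWED_KINDS = {"lex-lemma", "lex-pos", "lex-def"}


def check_result(result_xml):
    """Return a list of problem descriptions; an empty list means the
    basic checks passed (this is NOT full XSD validation)."""
    problems = []
    root = ET.fromstring(result_xml)
    if root.tag != f"{{{HITS_NS}}}Result":
        problems.append(f"unexpected root element: {root.tag}")
        return problems
    hits = root.findall(f"{{{HITS_NS}}}Hit")
    if not hits:
        problems.append("Result contains no Hit element")
    for index, hit in enumerate(hits, start=1):
        kind = hit.get("kind")
        if kind is not None and kind not in ALLOWED_KINDS:
            problems.append(f"Hit {index}: unknown kind {kind!r}")
        if not "".join(hit.itertext()).strip():
            problems.append(f"Hit {index}: empty content")
    return problems
```

An endpoint test suite could call `check_result` on each generated `Result` and fail when the returned list is non-empty; a schema-aware validator would still be needed to establish actual validity.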
44 changes: 44 additions & 0 deletions lexfcs/index.adoc
@@ -0,0 +1,44 @@
= Federated Content Search for Lexical Resources (LexFCS): Specification
Erik Körner <koerner@saw-leipzig.de>; Thomas Eckart <eckart@saw-leipzig.de>; Axel Herold <herold@bbaw.de>; Frank Wiegand <wiegand@bbaw.de>; Frank Michaelis <michaelis@ids-mannheim.de>; Matthias Bremm <bremm@uni-trier.de>; Louis Cotgrove <cotgrove@ids-mannheim.de>; Thorsten Trippel <trippel@ids-mannheim.de>; Felix Rau <f.rau@uni-koeln.de>
v0.1, 2023-05-04
// more metadata
:description: Specification extension of the CLARIN Federated Content Search (FCS) for Lexical Resources (LexFCS).
:organization: Text+
// settings
:doctype: book
// source code
:source-highlighter: rouge
:rouge-style: igor_pro
// toc and heading
:toc:
:toclevels: 4
:sectnums:
:sectnumlevels: 4
:appendix-caption!:
// directory stuff
:imagesdir: images
// pdf
ifdef::backend-pdf[]
:pdf-theme: textplus
:pdf-themesdir: {docdir}/themes
:title-logo-image: image:{docdir}/themes/textplus-logo.svg[pdfwidth=3.25in,align=center]
endif::[]

//ifdef::backend-pdf[]
//[%notitle]
//--
//[abstract]
//{description}
//--
//endif::[]

include::introduction.adoc[leveloffset=+1]

include::lexcql.adoc[leveloffset=+1]

include::dataviews.adoc[leveloffset=+1]

[appendix]
== Normative Appendix

include::lexcql-contextset.adoc[leveloffset=+2]