|
1 |
| -# Knowledge Graph Repository |
| 1 | +# MApp-KG (Old Knowledge Graph Repository) |
2 | 2 |
|
3 |
| -The *KnowledgeGraphRepository* is developed as a Java-based Spring Boot service using the RDF4Jframework to build the hook with a GraphDB repository instance. |
| 3 | +*MApp-KG* is a Java-based Spring Boot service that uses the RDF4J framework to connect to a GraphDB repository instance. |
4 | 4 |
|
5 | 5 | ## Description
|
6 | 6 |
|
7 |
| -This software component provides and API for querying, updating and extracting knowledge from a graph database. |
| 7 | +This software component provides an API for querying, updating, and extracting knowledge from a graph database. |
8 | 8 |
|
9 |
| -## Used technologies |
| 9 | +## Used Technologies |
10 | 10 |
|
11 | 11 | | Component | Description | Version |
|
12 | 12 | |-------------|---------------------------------------------------------------------------------------|---------|
|
13 |
| -| Spring Boot | Collection of java libraries for creating REST APIs | 2.7.1 | |
| 13 | +| Spring Boot | Collection of Java libraries for creating REST APIs | 2.7.1 | |
14 | 14 | | RDF4J | Java library for manipulating RDF graphs | 3.0.0 |
|
15 |
| -| GraphDB | GraphDB is an enterprise ready Semantic Graph Database, compliant with W3C Standards. | 10.1.0 | |
| 15 | +| GraphDB | GraphDB is an enterprise-ready Semantic Graph Database, compliant with W3C Standards. | 10.1.0 | |
16 | 16 |
|
| 17 | +## How to Configure |
17 | 18 |
|
| 19 | +Configure the GraphDB connection by setting the proper values for `db.url`, `db.username`, and `db.password` in `src/main/resources/application.properties` and in either `src/main/resources/application-gessi.properties` (if you use Docker) or `src/main/resources/application-localhost.properties` (if you use localhost). |
18 | 20 |
|
19 |
| -## How to configure |
| 21 | +Configure the RML file path by setting the proper value for `rml.path` to use a custom RML file for schema integration. |
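The configuration keys named above can be checked before starting the service. The following is only a sketch, not part of the repository: the class name `ConfigCheck`, the helper `missingKeys`, and the sample values (including the GraphDB URL) are hypothetical. Only the key names `db.url`, `db.username`, `db.password`, and `rml.path` come from the README.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class ConfigCheck {
    // Keys named in the README; everything else in this class is illustrative.
    static final List<String> REQUIRED_KEYS =
            List.of("db.url", "db.username", "db.password", "rml.path");

    // Returns the required keys that are absent or blank in the given properties text.
    static List<String> missingKeys(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.isBlank()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder values, assumed for illustration only.
        String sample = String.join("\n",
                "db.url=http://localhost:7200/repositories/example",
                "db.username=username",
                "db.password=password");
        System.out.println(missingKeys(sample)); // prints [rml.path]
    }
}
```

In practice the text would be read from whichever `application*.properties` file is active; the sketch takes a string so it can run without the project on the classpath.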
20 | 22 |
|
21 |
| -Configure the GraphDB connection by setting the proper values for ```db.url```, ```db.username``` and ```db.password``` in ```src/main/resources/application.properties```. |
22 |
| - |
23 |
| -Configure the RML file path by setting proper value for ```rml.path``` to use a custom RML file for schema integration. |
24 |
| - |
25 |
| -## How to build |
| 23 | +## How to Build |
26 | 24 |
|
27 | 25 | To build the project, run the following command:
|
28 | 26 |
|
29 |
| -```mvn clean install package``` |
| 27 | +```sh |
| 28 | +mvn clean install package |
| 29 | +``` |
30 | 30 |
|
31 |
| -## How to use |
| 31 | +## How to Use |
32 | 32 |
|
33 |
| -To run the service using Java from the generated package (.jar), run the following command: |
| 33 | +To run the service using Java from the generated package (`.jar`), run the following command: |
34 | 34 |
|
35 |
| -```java -jar target/repo-0.0.1-SNAPSHOT.jar``` |
| 35 | +```sh |
| 36 | +java -jar target/repo-0.0.1-SNAPSHOT.jar |
| 37 | +``` |
36 | 38 |
|
37 |
| -To deploy the service in a Docker container, run the following commands from project root: |
| 39 | +To deploy the service in a Docker container, follow these steps: |
38 | 40 |
|
39 |
| -```docker build -t {image-name}``` |
40 |
| -```docker run -d -p {port#}:{port#} {image-name}``` |
| 41 | +### Build Docker Image |
| 42 | +```sh |
| 43 | +docker build -t kg_repository . |
| 44 | +``` |
41 | 45 |
|
| 46 | +### Run Docker Container |
| 47 | +```sh |
| 48 | +docker run -d -p 3003:3003 --name KG_Repository kg_repository |
| 49 | +``` |
42 | 50 |
|
43 |
| -## How to deploy (old) |
44 |
| -1. |
45 |
| - `docker build -t knowledge_graph_repository:latest .` |
46 |
| -2. |
47 |
| - `docker run -d --name KG_Repository -p 3003:3003 knowledge_graph_repository:latest` |
| 51 | +## How to Deploy (New Method) |
48 | 52 |
|
49 |
| -## How to deploy (new) |
| 53 | +### Step 1: Pull Image |
| 54 | +```sh |
| 55 | +docker pull mtiessler/kg_repository:latest |
| 56 | +``` |
50 | 57 |
|
51 |
| -### Step 1: Pull image |
52 |
| -`docker pull mtiessler/kg_repository:latest` |
53 |
| -### Step 2: Build image |
54 |
| -`docker build -t mtiessler/kg_repository:latest .` |
55 |
| -### Step 3: Create kg_repository.env file |
56 |
| -Here go the credentials to access the SPARQL Database. |
57 |
| -The .env file has to be in the directory where the commands are being run. |
| 58 | +### Step 2: Build Image (if needed) |
| 59 | +```sh |
| 60 | +docker build -t mtiessler/kg_repository:latest . |
| 61 | +``` |
| 62 | + |
| 63 | +### Step 3: Create `kg_repository.env` File |
| 64 | +This file contains the credentials required to access the SPARQL database. |
| 65 | +The `.env` file must be in the same directory where the commands are executed. |
58 | 66 |
|
59 | 67 | ```
|
60 | 68 | DB_USERNAME=username
|
61 |
| -DB_PASSWORD=password |
| 69 | +DB_PASSWORD=password |
62 | 70 | ```
|
63 |
| -## Features |
64 | 71 |
|
65 |
| -The API of the App Data Repository is available here: http://localhost:8080/swagger-ui/. Below we provide a brief summarization of the main functionalities integrated in the last version of this service. |
| 72 | +## Features |
66 | 73 |
|
67 |
| -Main methods for data import are listed below: |
| 74 | +The API of MApp-KG is available in the [Postman Collection](https://www.postman.com/gessi-fib-upc/gessi-nlp4se/collection/ak3s503/mapp-kg-old-app-repo?action=share&source=copy-link&creator=32448387). |
68 | 75 |
|
69 |
| -- **Add Mobile Apps (JSON format)**: Store a list of mobile apps using a JSON Array of mobile apps as body for the HTTP request. See the Swagger doc for the schema. |
70 |
| -- **Add Mobile Apps (RDF format)**: Store all triplets withina given RDF file. |
| 76 | +### Main Data Import Methods: |
| 77 | +- **Add Mobile Apps (JSON format)**: Store a list of mobile apps using a JSON Array of mobile apps as the body of the HTTP request. See the Swagger documentation for the schema. |
| 78 | +- **Add Mobile Apps (RDF format)**: Store all triples within a given RDF file. |
71 | 79 | - **Add Mobile Apps (RML-based)**: Store all mobile apps extracted from a JSON file using a given RML mapping instance.
|
72 | 80 |
|
73 |
| -In addition, based on inductive knowledge generation techniques: |
74 |
| - |
75 |
| -- Send a POST request to /derivedNLFeatures to send textual data (i.e. descriptions, summaries, changelogs and/or reviews) through a natural language pipeline in order to extract potential app features. This requests needs the following query parameters: |
76 |
| - - documentType: the type of document to be processed. Possible values are: DESCRIPTION, SUMMARY, CHANGELOG, REVIEWS, USER_ANNOTATED and ALL. |
77 |
| - - batch-size: the number of documents to be processed at once. |
78 |
| - - from: offset. A value of n tells the service to start processing documents from the n-th onwards. |
79 |
| - - (optional) maxSubj: The subjectivity threshold. When processing reviews, all reviews above this threshold won't go through the NL pipeline. |
80 |
| -- Send a POST request to /computeFeatureSimilarity to find and match synonyms between app features. This method accepts a "threshold" request parameter between 0 and 1. Default value is 0.5. |
81 |
| -- Send a DELETE request to /deleteFeatureSimilarities to undo feature synonymy computed with /computeFeatureSimilarity. |
82 |
| - |
83 |
| -## File structure |
84 |
| - |
85 |
| -- \src\main\java\upc.edu.gessi.repo |
86 |
| - - AppGraphRepoApplication.java: Main class. |
87 |
| - - \controller: this package contains the repositories for processing HTTP requests. |
88 |
| - - GraphDBController.java: Logic for storing and retrieving data from the GraphDB repository. |
89 |
| - - InductiveKnowledgeController.java: auxiliary repository handling extended knowledge generation embedded into the system. |
90 |
| - - \domain: this package contains entities for the domain. |
91 |
| - - \service: this package includes the services that build this application. |
92 |
| - - GraphDBService.java: main service. It contains methods for querying and updating the database. |
93 |
| - - NLFeatureService.java: auxiliary service that communicates with a remote NL service for feature extraction. |
94 |
| - - \utils: package with several auxiliary functions. |
95 |
| - |
96 |
| -## RDF graph example |
97 |
| -You can find an RDF graph instance already populated with app info in [data/statements.zip](https://github.com/gessi-chatbots/app_data_repository/tree/master/data). The data was originally obtained using the https://github.com/gessi-chatbots/app_data_scanner_service service. |
98 |
| -App info includes, among other info: |
99 |
| - |
| 81 | +### Inductive Knowledge Generation: |
| 82 | +- **Extract Features**: Send a `POST` request to `/derivedNLFeatures` with textual data (descriptions, summaries, changelogs, and/or reviews) to extract potential app features. |
| 83 | + - **Query parameters:** |
| 84 | + - `documentType`: Type of document to be processed (DESCRIPTION, SUMMARY, CHANGELOG, REVIEWS, USER_ANNOTATED, ALL). |
| 85 | + - `batch-size`: Number of documents processed at once. |
| 86 | + - `from`: Offset to start processing from the nth document. |
| 87 | + - (Optional) `maxSubj`: Subjectivity threshold (reviews above this won't be processed). |
| 88 | +- **Feature Similarity Matching**: Send a `POST` request to `/computeFeatureSimilarity` to find and match synonyms between app features. |
| 89 | + - Accepts a `threshold` parameter (between 0 and 1, default is 0.5). |
| 90 | +- **Undo Feature Synonymy**: Send a `DELETE` request to `/deleteFeatureSimilarities` to undo feature synonymy computed with `/computeFeatureSimilarity`. |
| 91 | + |
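As a rough illustration of the feature-extraction call described above, the sketch below builds a `POST` request to `/derivedNLFeatures` with the listed query parameters, using the JDK's `java.net.http` API (Java 11+). The class name `DerivedFeaturesRequest`, the base URL `http://localhost:3003` (taken from the port published in the Docker example), and the empty request body are assumptions; the Postman collection remains the reference for the actual contract.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class DerivedFeaturesRequest {
    // Builds a POST request for /derivedNLFeatures.
    // Parameter names come from the README; baseUrl and the empty body are assumptions.
    static HttpRequest build(String baseUrl, String documentType, int batchSize, int from) {
        String query = "documentType=" + documentType
                + "&batch-size=" + batchSize
                + "&from=" + from;
        URI uri = URI.create(baseUrl + "/derivedNLFeatures?" + query);
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:3003", "REVIEWS", 10, 0);
        // Only constructs and prints the request; nothing is sent.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To send the request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The optional `maxSubj` parameter could be appended to the query string in the same way.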
| 92 | +## File Structure |
| 93 | + |
| 94 | +- `src/main/java/upc/edu/gessi/repo` |
| 95 | + - **AppGraphRepoApplication.java**: Main class. |
| 96 | + - **Controller Package**: Handles HTTP requests. |
| 97 | + - `GraphDBController.java`: Logic for storing and retrieving data from the GraphDB repository. |
| 98 | + - `InductiveKnowledgeController.java`: Auxiliary repository handling extended knowledge generation. |
| 99 | + - **Domain Package**: Contains domain-specific entities. |
| 100 | + - **Service Package**: Business logic and database interaction. |
| 101 | + - `GraphDBService.java`: Main service containing methods for querying and updating the database. |
| 102 | + - `NLFeatureService.java`: Auxiliary service that communicates with a remote NL service for feature extraction. |
| 103 | + - **Utils Package**: Auxiliary functions. |
| 104 | + |
| 105 | +## RDF Graph Example |
| 106 | + |
| 107 | +You can find an RDF graph instance already populated with app info in [statements.zip](https://github.com/gessi-chatbots/app_data_repository/tree/master/data). |
| 108 | + |
| 109 | +The data was originally obtained using the [App Data Scanner Service](https://github.com/gessi-chatbots/app_data_scanner_service). |
| 110 | + |
| 111 | +App info includes: |
100 | 112 | - Package name
|
101 | 113 | - Description
|
102 | 114 | - Summary
|
103 | 115 | - Changelog
|
104 | 116 | - Reviews
|
105 | 117 | - Annotated features
|
106 |
| - |
107 |
| - |
108 |
| -## Queries |
109 |
| - |
110 |
| -### Find most 100 000 recent reviews of a given market segment |
111 |
| - |
112 |
| -``` |
113 |
| -PREFIX schema: <https://schema.org/> |
114 |
| -PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> |
115 |
| -
|
116 |
| -SELECT ?subject ?predicate ?object |
117 |
| -WHERE { |
118 |
| - ?mobileApp rdf:type schema:MobileApplication ; |
119 |
| - schema:applicationCategory "COMMUNICATION" ; |
120 |
| - schema:review ?subject . |
121 |
| - ?subject rdf:type schema:Review ; |
122 |
| - ?predicate ?object . |
123 |
| -} |
124 |
| -ORDER BY DESC(?datePublished) |
125 |
| -LIMIT 100000 |
126 |
| -``` |
127 |
| - |
128 |
| -### Feature Occurrences |
129 |
| -``` |
130 |
| -PREFIX schema: <https://schema.org/> |
131 |
| -PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> |
132 |
| -
|
133 |
| -SELECT ?identifier (SUM(?occurrences) AS ?totalOccurrences) |
134 |
| -WHERE { |
135 |
| - { |
136 |
| - SELECT ?identifierDigitalDocument (COUNT(?identifierDigitalDocument) AS ?occurrences) |
137 |
| - WHERE { |
138 |
| - ?digitalDocument rdf:type schema:DigitalDocument ; |
139 |
| - schema:keywords ?keywords . |
140 |
| - ?keywords rdf:type schema:DefinedTerm ; |
141 |
| - schema:identifier ?identifierDigitalDocument . |
142 |
| - } |
143 |
| - GROUP BY ?identifierDigitalDocument |
144 |
| - } |
145 |
| - UNION |
146 |
| - { |
147 |
| - SELECT ?identifierReview (COUNT(?identifierReview) AS ?occurrences) |
148 |
| - WHERE { |
149 |
| - ?review rdf:type schema:Review; |
150 |
| - schema:keywords ?definedReviewTerm. |
151 |
| - ?definedReviewTerm rdf:type schema:DefinedTerm; |
152 |
| - schema:identifier ?identifierReview . |
153 |
| - } |
154 |
| - GROUP BY ?identifierReview |
155 |
| - } |
156 |
| -} |
157 |
| -GROUP BY (COALESCE(?identifierDigitalDocument, ?identifierReview) AS ?identifier) |
158 |
| -ORDER BY DESC(?totalOccurrences) |
159 |
| -
|
160 |
| -``` |
161 |
| -### Clean unrefenced reviews |
162 |
| -``` |
163 |
| -PREFIX schema: <https://schema.org/> |
164 |
| -DELETE { |
165 |
| - ?mobileApp schema:review ?emptyReview . |
166 |
| -} |
167 |
| -WHERE { |
168 |
| - ?mobileApp schema:review ?emptyReview . |
169 |
| - FILTER NOT EXISTS { ?emptyReview ?p ?o } |
170 |
| -} |
171 |
| -``` |
172 |
| - |
173 |
| -### Count property document features |
174 |
| -``` |
175 |
| -PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> |
176 |
| -PREFIX schema: <https://schema.org/> |
177 |
| -
|
178 |
| -SELECT ?appName ?summary ?description (COUNT(DISTINCT ?summaryFeatures) AS ?countSummaryFeatures) (COUNT(DISTINCT ?descriptionFeatures) AS ?countDescriptionFeatures) |
179 |
| -WHERE { |
180 |
| - ?app rdf:type schema:MobileApplication; |
181 |
| - schema:identifier ?appName; |
182 |
| - schema:abstract ?summary; |
183 |
| - schema:description ?description . |
184 |
| - |
185 |
| - OPTIONAL { |
186 |
| - ?summary schema:keywords ?summaryFeatures . |
187 |
| - } |
188 |
| - OPTIONAL { |
189 |
| - ?description schema:keywords ?descriptionFeatures . |
190 |
| - } |
191 |
| -} |
192 |
| -GROUP BY ?appName ?summary ?description |
193 |
| -
|
194 |
| -``` |