Commit 8894044

updated README.md
1 parent e323686 commit 8894044

1 file changed, 77 additions and 154 deletions


README.md

````diff
@@ -1,194 +1,117 @@
-# Knowledge Graph Repository
+# MApp-KG (Old Knowledge Graph Repository)
 
-The *KnowledgeGraphRepository* is developed as a Java-based Spring Boot service using the RDF4Jframework to build the hook with a GraphDB repository instance.
+The *MApp-KG* is developed as a Java-based Spring Boot service using the RDF4J framework to build the hook with a GraphDB repository instance.
 
 ## Description
 
-This software component provides and API for querying, updating and extracting knowledge from a graph database.
+This software component provides an API for querying, updating, and extracting knowledge from a graph database.
 
-## Used technologies
+## Used Technologies
 
 | Component | Description | Version |
 |-------------|---------------------------------------------------------------------------------------|---------|
-| Spring Boot | Collection of java libraries for creating REST APIs | 2.7.1 |
+| Spring Boot | Collection of Java libraries for creating REST APIs | 2.7.1 |
 | RDF4J | Java library for manipulating RDF graphs | 3.0.0 |
-| GraphDB | GraphDB is an enterprise ready Semantic Graph Database, compliant with W3C Standards. | 10.1.0 |
+| GraphDB | GraphDB is an enterprise-ready Semantic Graph Database, compliant with W3C Standards. | 10.1.0 |
 
````
````diff
+## How to Configure
 
+Configure the GraphDB connection by setting the proper values for `db.url`, `db.username`, and `db.password` in `src/main/resources/application.properties` and in either `src/main/resources/application-gessi.properties` (if you use Docker) or `src/main/resources/application-localhost.properties` (if you use localhost).
 
-## How to configure
+Configure the RML file path by setting the proper value for `rml.path` to use a custom RML file for schema integration.
 
-Configure the GraphDB connection by setting the proper values for ```db.url```, ```db.username``` and ```db.password``` in ```src/main/resources/application.properties```.
-
-Configure the RML file path by setting proper value for ```rml.path``` to use a custom RML file for schema integration.
-
````
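Spring Boot loads these keys from the properties file of the active profile. A small pre-flight check can catch a misconfigured file before the service is started. The sketch below assumes plain `key=value` properties syntax and Java 11 or later. `ConfigCheck` and `missingKeys` are hypothetical names and are not part of the repository. `rml.path` is not checked because the README presents it as optional.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-flight check for the GraphDB connection settings (not part of the repository). */
final class ConfigCheck {
    /** Keys the README says must be set for the GraphDB connection. */
    static final List<String> REQUIRED = List.of("db.url", "db.username", "db.password");

    private ConfigCheck() {}

    /** Returns the required keys that are absent or blank, in the order listed in REQUIRED. */
    static List<String> missingKeys(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source); // standard java.util.Properties key=value parsing
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.isBlank()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

For example, `ConfigCheck.missingKeys(Files.newBufferedReader(Path.of("src/main/resources/application-localhost.properties")))` returns an empty list when all three connection keys are set.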
````diff
-## How to build
+## How to Build
 
 To build the project, run the following command:
 
-```mvn clean install package```
+```sh
+mvn clean install package
+```
 
-## How to use
+## How to Use
 
-To run the service using Java from the generated package (.jar), run the following command:
+To run the service using Java from the generated package (`.jar`), run the following command:
 
-```java -jar target/repo-0.0.1-SNAPSHOT.jar```
+```sh
+java -jar target/repo-0.0.1-SNAPSHOT.jar
+```
 
-To deploy the service in a Docker container, run the following commands from project root:
+To deploy the service in a Docker container, follow these steps:
 
-```docker build -t {image-name}```
-```docker run -d -p {port#}:{port#} {image-name}```
+### Build Docker Image
+```sh
+docker build -t kg_repository .
+```
 
+### Run Docker Container
+```sh
+docker run -d -p 3003:3003 --name KG_Repository kg_repository
+```
 
````
````diff
-## How to deploy (old)
-1.
-`docker build -t knowledge_graph_repository:latest .`
-2.
-`docker run -d --name KG_Repository -p 3003:3003 knowledge_graph_repository:latest`
+## How to Deploy (New Method)
 
-## How to deploy (new)
+### Step 1: Pull Image
+```sh
+docker pull mtiessler/kg_repository:latest
+```
 
-### Step 1: Pull image
-`docker pull mtiessler/kg_repository:latest`
-### Step 2: Build image
-`docker build -t mtiessler/kg_repository:latest .`
-### Step 3: Create kg_repository.env file
-Here go the credentials to access the SPARQL Database.
-The .env file has to be in the directory where the commands are being run.
+### Step 2: Build Image (if needed)
+```sh
+docker build -t mtiessler/kg_repository:latest .
+```
+
+### Step 3: Create `kg_repository.env` File
+This file contains the credentials required to access the SPARQL database.
+The `.env` file must be in the same directory where the commands are executed.
 
 ```
 DB_USERNAME=username
-DB_PASSWORD=password
+DB_PASSWORD=password
 ```
````
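The container depends on these two credentials, so a small pre-flight check can confirm that `kg_repository.env` defines both before anything is started. This sketch assumes the simple `KEY=VALUE` layout shown above. It skips blank lines and `#` comments and does not handle quoting or variable expansion. `EnvFileCheck`, `parse`, and `hasCredentials` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that parses KEY=VALUE lines of an env file such as kg_repository.env. */
final class EnvFileCheck {
    private EnvFileCheck() {}

    /** Parses lines into an ordered map; blank lines and lines starting with '#' are skipped. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> env = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed line: " + raw);
            }
            // Split at the first '=' only, so a password may itself contain '='.
            env.put(line.substring(0, eq).strip(), line.substring(eq + 1));
        }
        return env;
    }

    /** True if both credentials named in the README are present and non-blank. */
    static boolean hasCredentials(Map<String, String> env) {
        return !env.getOrDefault("DB_USERNAME", "").isBlank()
            && !env.getOrDefault("DB_PASSWORD", "").isBlank();
    }
}
```

Run from the directory that holds the file, `EnvFileCheck.hasCredentials(EnvFileCheck.parse(Files.readAllLines(Path.of("kg_repository.env"))))` reports whether both variables are set.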
````diff
-## Features
 
-The API of the App Data Repository is available here: http://localhost:8080/swagger-ui/. Below we provide a brief summarization of the main functionalities integrated in the last version of this service.
+## Features
 
-Main methods for data import are listed below:
+The API of the MApp-KG is available in the [Postman Collection](https://www.postman.com/gessi-fib-upc/gessi-nlp4se/collection/ak3s503/mapp-kg-old-app-repo?action=share&source=copy-link&creator=32448387).
 
-- **Add Mobile Apps (JSON format)**: Store a list of mobile apps using a JSON Array of mobile apps as body for the HTTP request. See the Swagger doc for the schema.
-- **Add Mobile Apps (RDF format)**: Store all triplets withina given RDF file.
+### Main Data Import Methods:
+- **Add Mobile Apps (JSON format)**: Store a list of mobile apps using a JSON Array of mobile apps as the body of the HTTP request. See the Swagger documentation for the schema.
+- **Add Mobile Apps (RDF format)**: Store all triples within a given RDF file.
 - **Add Mobile Apps (RML-based)**: Store all mobile apps extracted from a JSON file using a given RML mapping instance.
 
````
````diff
-In addition, based on inductive knowledge generation techniques:
-
-- Send a POST request to /derivedNLFeatures to send textual data (i.e. descriptions, summaries, changelogs and/or reviews) through a natural language pipeline in order to extract potential app features. This requests needs the following query parameters:
-  - documentType: the type of document to be processed. Possible values are: DESCRIPTION, SUMMARY, CHANGELOG, REVIEWS, USER_ANNOTATED and ALL.
-  - batch-size: the number of documents to be processed at once.
-  - from: offset. A value of n tells the service to start processing documents from the n-th onwards.
-  - (optional) maxSubj: The subjectivity threshold. When processing reviews, all reviews above this threshold won't go through the NL pipeline.
-- Send a POST request to /computeFeatureSimilarity to find and match synonyms between app features. This method accepts a "threshold" request parameter between 0 and 1. Default value is 0.5.
-- Send a DELETE request to /deleteFeatureSimilarities to undo feature synonymy computed with /computeFeatureSimilarity.
-
-## File structure
-
-- \src\main\java\upc.edu.gessi.repo
-  - AppGraphRepoApplication.java: Main class.
-  - \controller: this package contains the repositories for processing HTTP requests.
-    - GraphDBController.java: Logic for storing and retrieving data from the GraphDB repository.
-    - InductiveKnowledgeController.java: auxiliary repository handling extended knowledge generation embedded into the system.
-  - \domain: this package contains entities for the domain.
-  - \service: this package includes the services that build this application.
-    - GraphDBService.java: main service. It contains methods for querying and updating the database.
-    - NLFeatureService.java: auxiliary service that communicates with a remote NL service for feature extraction.
-  - \utils: package with several auxiliary functions.
-
-## RDF graph example
-You can find an RDF graph instance already populated with app info in [data/statements.zip](https://github.com/gessi-chatbots/app_data_repository/tree/master/data). The data was originally obtained using the https://github.com/gessi-chatbots/app_data_scanner_service service.
-App info includes, among other info:
-
+### Inductive Knowledge Generation:
+- **Extract Features**: Send a `POST` request to `/derivedNLFeatures` with textual data (descriptions, summaries, changelogs, and/or reviews) to extract potential app features.
+  - **Query parameters:**
+    - `documentType`: Type of document to be processed (DESCRIPTION, SUMMARY, CHANGELOG, REVIEWS, USER_ANNOTATED, ALL).
+    - `batch-size`: Number of documents processed at once.
+    - `from`: Offset to start processing from the nth document.
+    - (Optional) `maxSubj`: Subjectivity threshold (reviews above this won't be processed).
+- **Feature Similarity Matching**: Send a `POST` request to `/computeFeatureSimilarity` to find and match synonyms between app features.
+  - Accepts a `threshold` parameter (between 0 and 1, default is 0.5).
+- **Undo Feature Synonymy**: Send a `DELETE` request to `/deleteFeatureSimilarities` to undo feature synonymy computed with `/computeFeatureSimilarity`.
````
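The query parameters listed above map directly onto a request URI. The following sketch builds the URIs for `/derivedNLFeatures` and `/computeFeatureSimilarity`; it is not the service's own client code. Only the parameter names, the `documentType` values, and the 0 to 1 threshold range come from the README. `FeatureRequests` and its methods are hypothetical. The base URL `http://localhost:3003` in the usage line is an assumption taken from the `docker run -p 3003:3003` command.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds URIs for the inductive-knowledge endpoints. */
final class FeatureRequests {
    /** Accepted values for documentType, as listed in the README. */
    enum DocumentType { DESCRIPTION, SUMMARY, CHANGELOG, REVIEWS, USER_ANNOTATED, ALL }

    private FeatureRequests() {}

    /** URI for POST /derivedNLFeatures; maxSubj is optional and omitted when null. */
    static URI derivedNLFeaturesUri(String baseUrl, DocumentType type, int batchSize, int from, Double maxSubj) {
        if (batchSize <= 0 || from < 0) {
            throw new IllegalArgumentException("batch-size must be > 0 and from must be >= 0");
        }
        Map<String, String> params = new LinkedHashMap<>();
        params.put("documentType", type.name());
        params.put("batch-size", Integer.toString(batchSize));
        params.put("from", Integer.toString(from));
        if (maxSubj != null) {
            params.put("maxSubj", maxSubj.toString());
        }
        return URI.create(baseUrl + "/derivedNLFeatures?" + encode(params));
    }

    /** URI for POST /computeFeatureSimilarity; the README allows thresholds between 0 and 1. */
    static URI featureSimilarityUri(String baseUrl, double threshold) {
        if (threshold < 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be between 0 and 1");
        }
        return URI.create(baseUrl + "/computeFeatureSimilarity?threshold=" + threshold);
    }

    // Joins key=value pairs with '&', URL-encoding both sides.
    private static String encode(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        params.forEach((k, v) -> query.add(
            URLEncoder.encode(k, StandardCharsets.UTF_8) + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return query.toString();
    }
}
```

For example, `FeatureRequests.derivedNLFeaturesUri("http://localhost:3003", FeatureRequests.DocumentType.REVIEWS, 50, 0, 0.8)` can be sent with `HttpRequest.newBuilder(uri).POST(HttpRequest.BodyPublishers.noBody()).build()`. The sketch sends no request body, so check the Postman collection for anything else the endpoints expect.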
````diff
+
+## File Structure
+
+- `src/main/java/upc/edu/gessi/repo`
+  - **AppGraphRepoApplication.java**: Main class.
+  - **Controller Package**: Handles HTTP requests.
+    - `GraphDBController.java`: Logic for storing and retrieving data from the GraphDB repository.
+    - `InductiveKnowledgeController.java`: Auxiliary controller handling extended knowledge generation.
+  - **Domain Package**: Contains domain-specific entities.
+  - **Service Package**: Business logic and database interaction.
+    - `GraphDBService.java`: Main service containing methods for querying and updating the database.
+    - `NLFeatureService.java`: Auxiliary service that communicates with a remote NL service for feature extraction.
+  - **Utils Package**: Auxiliary functions.
+
+## RDF Graph Example
+
+You can find an RDF graph instance already populated with app info in [statements.zip](https://github.com/gessi-chatbots/app_data_repository/tree/master/data).
+
+The data was originally obtained using the [App Data Scanner Service](https://github.com/gessi-chatbots/app_data_scanner_service).
+
+App info includes:
 - Package name
 - Description
 - Summary
 - Changelog
 - Reviews
 - Annotated features
````
````diff
-
-
-## Queries
-
-### Find most 100 000 recent reviews of a given market segment
-
-```
-PREFIX schema: <https://schema.org/>
-PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
-
-SELECT ?subject ?predicate ?object
-WHERE {
-  ?mobileApp rdf:type schema:MobileApplication ;
-             schema:applicationCategory "COMMUNICATION" ;
-             schema:review ?subject .
-  ?subject rdf:type schema:Review ;
-           ?predicate ?object .
-}
-ORDER BY DESC(?datePublished)
-LIMIT 100000
-```
-
````
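As written, the removed query orders by `?datePublished` without ever binding that variable, so its results are not actually sorted by publication date. Binding it would need a triple pattern such as `?subject schema:datePublished ?datePublished .`, assuming reviews carry that property. In addition, `LIMIT 100000` caps result rows (one per review triple) rather than reviews. The intended selection, the N newest reviews of apps in one category, is sketched below over in-memory data, assuming Java 17. The `Review` record and the `RecentReviews` class are hypothetical stand-ins for the RDF data, not types from the repository.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a review node: its id, the app's category, and the publication date. */
record Review(String id, String appCategory, LocalDate datePublished) {}

final class RecentReviews {
    private RecentReviews() {}

    /** Returns at most `limit` reviews of apps in `category`, newest first. */
    static List<Review> mostRecent(List<Review> reviews, String category, int limit) {
        return reviews.stream()
            .filter(r -> r.appCategory().equals(category))                    // applicationCategory filter
            .sorted(Comparator.comparing(Review::datePublished).reversed())   // ORDER BY DESC(date)
            .limit(limit)                                                      // LIMIT, counted in reviews
            .toList();
    }
}
```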
````diff
-### Feature Occurrences
-```
-PREFIX schema: <https://schema.org/>
-PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
-
-SELECT ?identifier (SUM(?occurrences) AS ?totalOccurrences)
-WHERE {
-  {
-    SELECT ?identifierDigitalDocument (COUNT(?identifierDigitalDocument) AS ?occurrences)
-    WHERE {
-      ?digitalDocument rdf:type schema:DigitalDocument ;
-                       schema:keywords ?keywords .
-      ?keywords rdf:type schema:DefinedTerm ;
-                schema:identifier ?identifierDigitalDocument .
-    }
-    GROUP BY ?identifierDigitalDocument
-  }
-  UNION
-  {
-    SELECT ?identifierReview (COUNT(?identifierReview) AS ?occurrences)
-    WHERE {
-      ?review rdf:type schema:Review;
-              schema:keywords ?definedReviewTerm.
-      ?definedReviewTerm rdf:type schema:DefinedTerm;
-                         schema:identifier ?identifierReview .
-    }
-    GROUP BY ?identifierReview
-  }
-}
-GROUP BY (COALESCE(?identifierDigitalDocument, ?identifierReview) AS ?identifier)
-ORDER BY DESC(?totalOccurrences)
-
-```
````
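The removed Feature Occurrences query counts each keyword identifier separately on documents and on reviews. It then merges the two result sets with `UNION` and `COALESCE` and sums per identifier. The sketch below performs the same aggregation in plain Java. The two input lists of identifiers are hypothetical in-memory stand-ins for the keyword triples stored in GraphDB.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical in-memory version of the Feature Occurrences aggregation. */
final class FeatureOccurrences {
    private FeatureOccurrences() {}

    /**
     * Sums how often each feature identifier occurs as a keyword of documents and of reviews,
     * then orders identifiers by total, highest first (like SUM plus ORDER BY DESC).
     */
    static Map<String, Long> totals(List<String> documentKeywordIds, List<String> reviewKeywordIds) {
        Map<String, Long> counts = new HashMap<>();
        // One count per occurrence, as in the two COUNT subqueries; merge() adds to any existing total.
        documentKeywordIds.forEach(id -> counts.merge(id, 1L, Long::sum));
        reviewKeywordIds.forEach(id -> counts.merge(id, 1L, Long::sum));
        // Collect into a LinkedHashMap so the caller sees the sorted order.
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (a, b) -> a, LinkedHashMap::new));
    }
}
```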
````diff
-### Clean unrefenced reviews
-```
-PREFIX schema: <https://schema.org/>
-DELETE {
-  ?mobileApp schema:review ?emptyReview .
-}
-WHERE {
-  ?mobileApp schema:review ?emptyReview .
-  FILTER NOT EXISTS { ?emptyReview ?p ?o }
-}
-```
````
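The removed cleanup query deletes `schema:review` links whose target node has no triples of its own (`FILTER NOT EXISTS { ?emptyReview ?p ?o }`). The sketch below applies the same rule to an in-memory list of triples, assuming Java 17. The `Triple` record and the plain-string IRIs are hypothetical simplifications, not the RDF4J model used by the repository.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical simplified triple with subject, predicate, and object as plain strings. */
record Triple(String s, String p, String o) {}

final class ReviewCleanup {
    /** Full IRI of schema:review (the query uses PREFIX schema: <https://schema.org/>). */
    static final String REVIEW = "https://schema.org/review";

    private ReviewCleanup() {}

    /** Drops ?app schema:review ?r whenever ?r has no outgoing triple, mirroring FILTER NOT EXISTS { ?r ?p ?o }. */
    static List<Triple> removeEmptyReviewLinks(List<Triple> graph) {
        // A node has at least one outgoing triple exactly when it appears as a subject.
        Set<String> subjects = graph.stream().map(Triple::s).collect(Collectors.toSet());
        return graph.stream()
            .filter(t -> !(t.p().equals(REVIEW) && !subjects.contains(t.o())))
            .toList();
    }
}
```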
````diff
-
-### Count property document features
-```
-PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
-PREFIX schema: <https://schema.org/>
-
-SELECT ?appName ?summary ?description (COUNT(DISTINCT ?summaryFeatures) AS ?countSummaryFeatures) (COUNT(DISTINCT ?descriptionFeatures) AS ?countDescriptionFeatures)
-WHERE {
-  ?app rdf:type schema:MobileApplication;
-       schema:identifier ?appName;
-       schema:abstract ?summary;
-       schema:description ?description .
-
-  OPTIONAL {
-    ?summary schema:keywords ?summaryFeatures .
-  }
-  OPTIONAL {
-    ?description schema:keywords ?descriptionFeatures .
-  }
-}
-GROUP BY ?appName ?summary ?description
-
-```
````
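The last removed query counts, per app, the distinct keywords attached to its summary and to its description. `DISTINCT` matters there because the two `OPTIONAL` blocks multiply rows: two summary features and three description features yield six solutions for one app. An app with no matching keywords still appears, with a count of 0. The sketch below does the same per-app counting in Java 17. The `AppDocs` and `FeatureCounts` records are hypothetical stand-ins for the RDF data.

```java
import java.util.HashSet;
import java.util.List;

/** Hypothetical per-app input: feature keywords found in the summary and in the description. */
record AppDocs(String appName, List<String> summaryFeatures, List<String> descriptionFeatures) {}

/** One result row: distinct feature counts per document. */
record FeatureCounts(String appName, int summaryCount, int descriptionCount) {}

final class DocumentFeatureCounts {
    private DocumentFeatureCounts() {}

    /** Counts distinct features per document, like COUNT(DISTINCT ...); an empty list gives 0, like an unmatched OPTIONAL. */
    static List<FeatureCounts> count(List<AppDocs> apps) {
        return apps.stream()
            .map(a -> new FeatureCounts(
                a.appName(),
                new HashSet<>(a.summaryFeatures()).size(),
                new HashSet<>(a.descriptionFeatures()).size()))
            .toList();
    }
}
```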
