updated README.md

mtiessler · mtiessler · commit 8894044f35cf · 2025-02-17T11:08:54.000+01:00
diff --git a/README.md b/README.md
@@ -1,194 +1,117 @@
-# Knowledge Graph Repository
+# MApp-KG (Old Knowledge Graph Repository)
 
-The *KnowledgeGraphRepository* is developed as a Java-based Spring Boot service using the RDF4Jframework to build the hook with a GraphDB repository instance. 
+*MApp-KG* is a Java-based Spring Boot service that uses the RDF4J framework to connect to a GraphDB repository instance.
 
 ## Description
 
-This software component provides and API for querying, updating and extracting knowledge from a graph database. 
+This software component provides an API for querying, updating, and extracting knowledge from a graph database.
 
-## Used technologies
+## Used Technologies
 
 | Component   | Description                                                                           | Version |
 |-------------|---------------------------------------------------------------------------------------|---------|
-| Spring Boot | Collection of java libraries for creating REST APIs                                   | 2.7.1   |
+| Spring Boot | Collection of Java libraries for creating REST APIs                                   | 2.7.1   |
 | RDF4J       | Java library for manipulating RDF graphs                                              | 3.0.0   | 
-| GraphDB     | GraphDB is an enterprise ready Semantic Graph Database, compliant with W3C Standards. | 10.1.0  |
+| GraphDB     | GraphDB is an enterprise-ready Semantic Graph Database, compliant with W3C Standards. | 10.1.0  |
 
+## How to Configure
 
+Configure the GraphDB connection by setting the proper values for `db.url`, `db.username`, and `db.password` in `src/main/resources/application.properties` and in either `src/main/resources/application-gessi.properties` (if you use Docker) or `src/main/resources/application-localhost.properties` (if you run on localhost).
 
-## How to configure
+Configure the RML file path by setting the proper value for `rml.path` to use a custom RML file for schema integration.
 
-Configure the GraphDB connection by setting the proper values for ```db.url```, ```db.username``` and ```db.password``` in ```src/main/resources/application.properties```.
-
-Configure the RML file path by setting proper value for ```rml.path``` to use a custom RML file for schema integration.
-
-## How to build
+## How to Build
 
 To build the project, run the following command:
 
-```mvn clean install package```
+```sh
+mvn clean install package
+```
 
-## How to use
+## How to Use
 
-To run the service using Java from the generated package (.jar), run the following command:
+To run the service using Java from the generated package (`.jar`), run the following command:
 
-```java -jar target/repo-0.0.1-SNAPSHOT.jar```
+```sh
+java -jar target/repo-0.0.1-SNAPSHOT.jar
+```
 
-To deploy the service in a Docker container, run the following commands from project root:
+To deploy the service in a Docker container, follow these steps:
 
-```docker build -t {image-name}```
-```docker run -d -p {port#}:{port#} {image-name}```
+### Build Docker Image
+```sh
+docker build -t kg_repository .
+```
 
+### Run Docker Container
+```sh
+docker run -d -p 3003:3003 --name KG_Repository kg_repository
+```
 
-## How to deploy (old)
-1. 
-    `docker build -t knowledge_graph_repository:latest .`
-2. 
-    `docker run -d --name KG_Repository -p 3003:3003 knowledge_graph_repository:latest`
+## How to Deploy (New Method)
 
-## How to deploy (new)
+### Step 1: Pull Image
+```sh
+docker pull mtiessler/kg_repository:latest
+```
 
-### Step 1: Pull image
-`docker pull mtiessler/kg_repository:latest`
-### Step 2: Build image
-`docker build -t mtiessler/kg_repository:latest .`
-### Step 3: Create kg_repository.env file
-Here go the credentials to access the SPARQL Database. 
-The .env file has to be in the directory where the commands are being run. 
+### Step 2: Build Image (if needed)
+```sh
+docker build -t mtiessler/kg_repository:latest .
+```
+
+### Step 3: Create `kg_repository.env` File
+This file contains the credentials required to access the SPARQL database.
+The `.env` file must be in the directory from which the commands are run.
 
 ```
 DB_USERNAME=username
-DB_PASSWORD=password 
+DB_PASSWORD=password
 ```
-## Features
 
-The API of the App Data Repository is available here: http://localhost:8080/swagger-ui/. Below we provide a brief summarization of the main functionalities integrated in the last version of this service.
+## Features
 
-Main methods for data import are listed below:
+The MApp-KG API is documented in the [Postman Collection](https://www.postman.com/gessi-fib-upc/gessi-nlp4se/collection/ak3s503/mapp-kg-old-app-repo?action=share&source=copy-link&creator=32448387).
 
-- **Add Mobile Apps (JSON format)**: Store a list of mobile apps using a JSON Array of mobile apps as body for the HTTP request. See the Swagger doc for the schema.
-- **Add Mobile Apps (RDF format)**: Store all triplets withina given RDF file.
+### Main Data Import Methods:
+- **Add Mobile Apps (JSON format)**: Store a list of mobile apps using a JSON Array of mobile apps as the body of the HTTP request. See the Swagger documentation for the schema.
+- **Add Mobile Apps (RDF format)**: Store all triples within a given RDF file.
 - **Add Mobile Apps (RML-based)**: Store all mobile apps extracted from a JSON file using a given RML mapping instance.
 
-In addition, based on inductive knowledge generation techniques:
-
-- Send a POST request to /derivedNLFeatures to send textual data (i.e. descriptions, summaries, changelogs and/or reviews) through a natural language pipeline in order to extract potential app features. This requests needs the following query parameters:
-  - documentType: the type of document to be processed. Possible values are: DESCRIPTION, SUMMARY, CHANGELOG, REVIEWS, USER_ANNOTATED and ALL.
-  - batch-size: the number of documents to be processed at once.
-  - from: offset. A value of n tells the service to start processing documents from the n-th onwards.
-  - (optional) maxSubj: The subjectivity threshold. When processing reviews, all reviews above this threshold won't go through the NL pipeline.
-- Send a POST request to /computeFeatureSimilarity to find and match synonyms between app features. This method accepts a "threshold" request parameter between 0 and 1. Default value is 0.5.
-- Send a DELETE request to /deleteFeatureSimilarities to undo feature synonymy computed with /computeFeatureSimilarity.
-
-## File structure
-
-- \src\main\java\upc.edu.gessi.repo
-  - AppGraphRepoApplication.java: Main class.
-  - \controller: this package contains the repositories for processing HTTP requests.
-  	- GraphDBController.java: Logic for storing and retrieving data from the GraphDB repository.
-    - InductiveKnowledgeController.java: auxiliary repository handling extended knowledge generation embedded into the system.
-  - \domain: this package contains entities for the domain.
-  - \service: this package includes the services that build this application.
-    - GraphDBService.java: main service. It contains methods for querying and updating the database.
-    - NLFeatureService.java: auxiliary service that communicates with a remote NL service for feature extraction.
-  - \utils: package with several auxiliary functions.
-
-## RDF graph example
-You can find an RDF graph instance already populated with app info in [data/statements.zip](https://github.com/gessi-chatbots/app_data_repository/tree/master/data). The data was originally obtained using the https://github.com/gessi-chatbots/app_data_scanner_service service.
-App info includes, among other info:
-
+### Inductive Knowledge Generation:
+- **Extract Features**: Send a `POST` request to `/derivedNLFeatures` with textual data (descriptions, summaries, changelogs, and/or reviews) to extract potential app features.
+  - **Query parameters:**
+    - `documentType`: Type of document to be processed (DESCRIPTION, SUMMARY, CHANGELOG, REVIEWS, USER_ANNOTATED, ALL).
+    - `batch-size`: Number of documents processed at once.
+    - `from`: Offset. A value of n starts processing from the n-th document onwards.
+    - (Optional) `maxSubj`: Subjectivity threshold. Reviews above this threshold are not sent through the NL pipeline.
+- **Feature Similarity Matching**: Send a `POST` request to `/computeFeatureSimilarity` to find and match synonyms between app features.
+  - Accepts a `threshold` parameter (between 0 and 1, default is 0.5).
+- **Undo Feature Synonymy**: Send a `DELETE` request to `/deleteFeatureSimilarities` to undo feature synonymy computed with `/computeFeatureSimilarity`.
+
+## File Structure
+
+- `src/main/java/upc/edu/gessi/repo`
+  - **AppGraphRepoApplication.java**: Main class.
+  - **Controller Package**: Handles HTTP requests.
+    - `GraphDBController.java`: Logic for storing and retrieving data from the GraphDB repository.
+    - `InductiveKnowledgeController.java`: Auxiliary controller handling extended knowledge generation.
+  - **Domain Package**: Contains domain-specific entities.
+  - **Service Package**: Business logic and database interaction.
+    - `GraphDBService.java`: Main service containing methods for querying and updating the database.
+    - `NLFeatureService.java`: Auxiliary service that communicates with a remote NL service for feature extraction.
+  - **Utils Package**: Auxiliary functions.
+
+## RDF Graph Example
+
+You can find an RDF graph instance already populated with app info in [statements.zip](https://github.com/gessi-chatbots/app_data_repository/tree/master/data).
+
+The data was originally obtained using the [App Data Scanner Service](https://github.com/gessi-chatbots/app_data_scanner_service).
+
+App info includes:
 - Package name
 - Description
 - Summary
 - Changelog
 - Reviews
 - Annotated features
-
-
-## Queries
-
-### Find most 100 000 recent reviews of a given market segment
-
-``` 
-PREFIX schema: <https://schema.org/>
-PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
-
-SELECT ?subject ?predicate ?object
-WHERE {
-    ?mobileApp rdf:type schema:MobileApplication ;
-               schema:applicationCategory "COMMUNICATION" ;
-               schema:review ?subject .
-    ?subject rdf:type schema:Review ;
-             ?predicate ?object .
-}
-ORDER BY DESC(?datePublished)
-LIMIT 100000
-```
-
-### Feature Occurrences
-```
-PREFIX schema: <https://schema.org/>
-PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
-
-SELECT ?identifier (SUM(?occurrences) AS ?totalOccurrences)
-WHERE {
-  {
-    SELECT ?identifierDigitalDocument (COUNT(?identifierDigitalDocument) AS ?occurrences)
-    WHERE {
-      ?digitalDocument rdf:type schema:DigitalDocument ;
-                       schema:keywords ?keywords .
-      ?keywords rdf:type schema:DefinedTerm ;
-                schema:identifier ?identifierDigitalDocument .
-    }
-    GROUP BY ?identifierDigitalDocument
-  }
-  UNION
-  {
-    SELECT ?identifierReview (COUNT(?identifierReview) AS ?occurrences)
-    WHERE {
-      ?review rdf:type schema:Review;
-              schema:keywords ?definedReviewTerm.
-      ?definedReviewTerm rdf:type schema:DefinedTerm;
-                         schema:identifier ?identifierReview .
-    }
-    GROUP BY ?identifierReview
-  }
-}
-GROUP BY (COALESCE(?identifierDigitalDocument, ?identifierReview) AS ?identifier)
-ORDER BY DESC(?totalOccurrences)
-
-```
-### Clean unrefenced reviews
-```
-PREFIX schema: <https://schema.org/>
-DELETE {
-    ?mobileApp schema:review ?emptyReview .
-}
-WHERE {
-    ?mobileApp schema:review ?emptyReview .
-    FILTER NOT EXISTS { ?emptyReview ?p ?o }
-}
-```
-
-### Count property document features
-```
-PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
-PREFIX schema: <https://schema.org/>
-
-SELECT ?appName ?summary ?description (COUNT(DISTINCT ?summaryFeatures) AS ?countSummaryFeatures) (COUNT(DISTINCT ?descriptionFeatures) AS ?countDescriptionFeatures)
-WHERE {
-  ?app rdf:type schema:MobileApplication;
-       schema:identifier ?appName;
-       schema:abstract ?summary;
-       schema:description ?description .
-       
-  OPTIONAL {
-    ?summary schema:keywords ?summaryFeatures .
-  }
-  OPTIONAL {
-    ?description schema:keywords ?descriptionFeatures .
-  }
-}
-GROUP BY ?appName ?summary ?description
-
-```
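
The inductive knowledge endpoints listed in the updated README take their inputs as query parameters, so a small dry-run script can show how the request URLs are assembled. The sketch below is not part of this commit. The base URL is an assumption (the port comes from the `docker run -p 3003:3003` example; the host and the absence of a context path are guesses), the helper function names are hypothetical, and any request body the service may expect is not described in the README and is left out.

```shell
#!/bin/sh
# Assumed base URL: port 3003 is taken from the README's docker run example;
# the host and the lack of a context path are assumptions.
BASE_URL="http://localhost:3003"

# Hypothetical helper: build the /derivedNLFeatures URL.
# Arguments: documentType, batch-size, from, and an optional maxSubj.
derived_features_url() {
    url="${BASE_URL}/derivedNLFeatures?documentType=$1&batch-size=$2&from=$3"
    if [ -n "${4:-}" ]; then
        url="${url}&maxSubj=$4"
    fi
    printf '%s\n' "$url"
}

# Hypothetical helper: build the /computeFeatureSimilarity URL.
# The threshold falls back to 0.5, the default the README documents.
feature_similarity_url() {
    printf '%s\n' "${BASE_URL}/computeFeatureSimilarity?threshold=${1:-0.5}"
}

# Dry run: print the curl commands instead of sending them.
echo "curl -X POST '$(derived_features_url REVIEWS 50 0 0.6)'"
echo "curl -X POST '$(feature_similarity_url)'"
echo "curl -X DELETE '${BASE_URL}/deleteFeatureSimilarities'"
```

Printing the commands rather than executing them keeps the script safe to run without a live service; once the base URL is confirmed against the Postman Collection, the `echo` wrappers can be dropped.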