Skip to content

Commit 43c7b0b

Browse files
lqdevCopilot
andcommitted
feat: add natural language query endpoint (/api/ask)
Add NL-to-SPARQL translation using GitHub Models GPT-4o-mini with schema-injected few-shot prompting. The /api/ask endpoint accepts natural language questions, translates to SPARQL, validates syntax via RDFLib, enforces safety constraints (block mutating queries, inject LIMIT), caches results, and retries with error feedback. - api/nl_to_sparql.py: translation module with prompt, validation, cache - api/function_app.py: new /api/ask route alongside existing /api/sparql - api/requirements.txt: add openai>=1.0 - tests/test_nl_to_sparql.py: 28 tests covering validation, safety, cache, LLM mock - README.md: document /api/ask with curl/JS/Python examples Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 7fbe580 commit 43c7b0b

5 files changed

Lines changed: 626 additions & 4 deletions

File tree

README.md

Lines changed: 54 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -73,6 +73,58 @@ Use [[wikilinks]] to link between articles.
7373

7474
The knowledge graph is queryable at `/api/sparql`. It accepts standard [W3C SPARQL 1.1 Protocol](https://www.w3.org/TR/sparql11-protocol/) requests and returns [SPARQL Results JSON](https://www.w3.org/TR/sparql11-results-json/).
7575

76+
### Asking Questions in Natural Language
77+
78+
Don't know SPARQL? Use the `/api/ask` endpoint to query the knowledge graph with plain English. It uses the same LLM (GitHub Models GPT-4o-mini) to translate your question into SPARQL, execute it, and return the results alongside the generated query.
79+
80+
> **Note:** Requires `GITHUB_TOKEN` environment variable (or configured as an Azure Static Web Apps app setting).
81+
82+
**cURL:**
83+
```bash
84+
curl -X POST \
85+
-H "Content-Type: application/json" \
86+
-d '{"question": "What entities are in the knowledge graph?"}' \
87+
https://<your-swa-domain>/api/ask
88+
```
89+
90+
**JavaScript:**
91+
```js
92+
const res = await fetch("/api/ask", {
93+
method: "POST",
94+
headers: { "Content-Type": "application/json" },
95+
body: JSON.stringify({ question: "Which articles mention SPARQL?" }),
96+
});
97+
const data = await res.json();
98+
console.log("Generated SPARQL:", data.sparql);
99+
data.results.results.bindings.forEach(row => console.log(row));
100+
```
101+
102+
**Python:**
103+
```python
104+
import requests
105+
106+
res = requests.post(
107+
"https://<your-swa-domain>/api/ask",
108+
json={"question": "Find all organizations"},
109+
).json()
110+
111+
print("SPARQL:", res["sparql"])
112+
for row in res["results"]["results"]["bindings"]:
113+
print(row)
114+
```
115+
116+
The response includes the generated SPARQL query so you can learn the query language as you go:
117+
118+
```json
119+
{
120+
"question": "What entities are in the knowledge graph?",
121+
"sparql": "PREFIX schema: <https://schema.org/>\nSELECT DISTINCT ?entity ?name ?type WHERE {\n ?entity a ?type ;\n schema:name ?name .\n FILTER(?type != schema:Article)\n}\nLIMIT 100",
122+
"results": { "head": { "vars": ["entity", "name", "type"] }, "results": { "bindings": [...] } }
123+
}
124+
```
125+
126+
### Querying with SPARQL Directly
127+
76128
**From a browser** — paste the URL with a `query` parameter (URL-encoded):
77129
```
78130
https://<your-swa-domain>/api/sparql?query=PREFIX%20schema%3A%20...
@@ -163,7 +215,7 @@ LIMIT 50
163215
│ ├── views/ # Precomputed JSON views
164216
│ ├── cache/ # Per-chunk extraction cache
165217
│ └── manifest.json # Build metadata
166-
├── api/ # Azure Function (SPARQL endpoint)
218+
├── api/ # Azure Function (SPARQL + NL query endpoints)
167219
├── app/ # Static web app
168220
├── tests/ # Test suite
169221
└── .github/workflows/
@@ -176,7 +228,7 @@ LIMIT 50
176228
| Decision | Choice | Rationale |
177229
|----------|--------|-----------|
178230
| LLM Provider | GitHub Models (free) | Zero cost, GITHUB_TOKEN auth |
179-
| LLM Model | `openai/gpt-4o-mini` | Best quality/limit ratio (150 req/day) |
231+
| NL→SPARQL | GPT-4o-mini + schema-injected few-shot | Same LLM as extraction; schema injection prevents hallucinated predicates |
180232
| SPARQL Engine | RDFLib | Pure Python, small footprint, built-in JSON-LD |
181233
| Validation | pySHACL | Standard W3C SHACL, works with RDFLib |
182234
| Batching | 3-5 chunks/request | Stay under 8K input token limit |

api/function_app.py

Lines changed: 76 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,8 @@
1-
"""Azure Function: SPARQL endpoint using RDFLib.
1+
"""Azure Function: SPARQL endpoint and natural language query interface.
22
33
Loads all .ttl files from the graph/articles/ directory into a combined
4-
RDFLib Dataset, then serves SPARQL queries via HTTP GET/POST.
4+
RDFLib Dataset, then serves SPARQL queries via HTTP GET/POST and
5+
natural language questions via the /api/ask endpoint.
56
"""
67

78
import json
@@ -13,6 +14,8 @@
1314
import rdflib
1415
from rdflib import Dataset, Graph
1516

17+
from .nl_to_sparql import translate, validate_sparql, enforce_safety
18+
1619
app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)
1720

1821
# Module-level cache: load graph once per cold start
@@ -100,3 +103,74 @@ def sparql_endpoint(req: func.HttpRequest) -> func.HttpResponse:
100103
status_code=400,
101104
mimetype="application/json",
102105
)
106+
107+
108+
@app.route(route="ask", methods=["GET", "POST"])
109+
def ask_endpoint(req: func.HttpRequest) -> func.HttpResponse:
110+
"""Translate a natural language question to SPARQL and execute it."""
111+
# Extract question
112+
question = None
113+
if req.method == "GET":
114+
question = req.params.get("question")
115+
elif req.method == "POST":
116+
content_type = req.headers.get("Content-Type", "")
117+
if "application/json" in content_type:
118+
try:
119+
body = req.get_json()
120+
question = body.get("question")
121+
except ValueError:
122+
pass
123+
if not question:
124+
question = req.params.get("question")
125+
126+
if not question:
127+
return func.HttpResponse(
128+
json.dumps({"error": "Missing 'question' parameter"}),
129+
status_code=400,
130+
mimetype="application/json",
131+
)
132+
133+
# Translate to SPARQL
134+
sparql, error = translate(question)
135+
if error:
136+
status = 502 if "rate limit" in error.lower() or "api error" in error.lower() else 400
137+
return func.HttpResponse(
138+
json.dumps({"error": error, "question": question}),
139+
status_code=status,
140+
mimetype="application/json",
141+
headers={"Access-Control-Allow-Origin": "*"},
142+
)
143+
144+
# Execute query
145+
try:
146+
ds = _load_dataset()
147+
result = ds.query(sparql)
148+
serialized = result.serialize(format="json")
149+
if isinstance(serialized, bytes):
150+
serialized = serialized.decode("utf-8")
151+
results = json.loads(serialized)
152+
except Exception as e:
153+
logging.error(f"SPARQL execution error for NL query: {e}")
154+
return func.HttpResponse(
155+
json.dumps({
156+
"error": f"Query execution failed: {str(e)}",
157+
"question": question,
158+
"sparql": sparql,
159+
}),
160+
status_code=400,
161+
mimetype="application/json",
162+
headers={"Access-Control-Allow-Origin": "*"},
163+
)
164+
165+
return func.HttpResponse(
166+
json.dumps({
167+
"question": question,
168+
"sparql": sparql,
169+
"results": results,
170+
}),
171+
mimetype="application/json",
172+
headers={
173+
"Access-Control-Allow-Origin": "*",
174+
"Cache-Control": "public, max-age=300",
175+
},
176+
)

0 commit comments

Comments
 (0)