-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhancement] Add schema validation and placeholders to index mappings #3240
[Enhancement] Add schema validation and placeholders to index mappings #3240
Conversation
…ad of string constants Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
…ums rather than use their own mappings Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
…holders Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
…h_index_mappings_from_files
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
…cter issue Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
public static String getMappingFromFile(String path) throws IOException { | ||
URL url = IndexUtils.class.getClassLoader().getResource(path); | ||
if (url == null) { | ||
throw new IOException("Resource not found: " + path); | ||
} | ||
|
||
String mapping = Resources.toString(url, Charsets.UTF_8).trim(); | ||
if (mapping.isEmpty() || !StringUtils.isJson(mapping)) { | ||
throw new IllegalArgumentException("Invalid or non-JSON mapping at: " + path); | ||
if (mapping.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to have this check here considering we are going to check mapping.isBlank()
in the replacePlaceholders
method. Seems like redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added this check here since this method can be used on its own, incase used in other places, i believe it is safer to have this additional check
} | ||
|
||
String placeholderMapping = Resources.toString(url, Charsets.UTF_8); | ||
mapping = mapping.replace(placeholder.getKey(), placeholderMapping); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are creating string in every replace. May be we could use a StringBuilder to have in-place replacement?
May be something like this:
public static String replacePlaceholders(String mapping) throws IOException {
if (mapping == null || mapping.isBlank()) {
throw new IllegalArgumentException("Mapping cannot be null or empty");
}
// Preload resources into memory to avoid redundant I/O
Map<String, String> loadedPlaceholders = new HashMap<>();
for (Map.Entry<String, String> placeholder : MAPPING_PLACEHOLDERS.entrySet()) {
URL url = IndexUtils.class.getClassLoader().getResource(placeholder.getValue());
if (url == null) {
throw new IOException("Resource not found: " + placeholder.getValue());
}
// Load and cache the content
loadedPlaceholders.put(placeholder.getKey(), Resources.toString(url, Charsets.UTF_8));
}
// Use StringBuilder for efficient in-place replacements
StringBuilder result = new StringBuilder(mapping);
for (Map.Entry<String, String> entry : loadedPlaceholders.entrySet()) {
String placeholder = entry.getKey();
String replacement = entry.getValue();
// Replace all occurrences of the placeholder
int index;
while ((index = result.indexOf(placeholder)) != -1) {
result.replace(index, index + placeholder.length(), replacement);
}
}
return result.toString();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, thanks for the suggestion, saves on the I/O and is more efficient!
@@ -336,4 +347,25 @@ public static JsonObject getJsonObjectFromString(String jsonString) { | |||
return JsonParser.parseString(jsonString).getAsJsonObject(); | |||
} | |||
|
|||
public static void validateSchema(String schemaString, String instanceString) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add try catch block to catch any exceptions like:
catch (JsonProcessingException e) {
throw new IllegalArgumentException("Invalid JSON format: " + e.getMessage(), e);
} catch (Exception e) {
throw new OpenSearchParseException("Schema validation failed: " + e.getMessage(), e);
}
|
||
// Validate JSON node against the schema | ||
Set<ValidationMessage> errors = schema.validate(jsonNode); | ||
if (!errors.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about this:
if (!errors.isEmpty()) {
String errorMessage = errors.stream()
.map(ValidationMessage::getMessage)
.collect(Collectors.joining(", "));
throw new OpenSearchParseException(
"Validation failed: " + errorMessage +
" for instance: " + instanceString +
" with schema: " + schemaString
);
}
@@ -1,6 +1,6 @@ | |||
{ | |||
"_meta": { | |||
"schema_version": "1" | |||
"schema_version": 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually, I see that JSON files in the repository are named using snake_case. Let's follow the same convention.
…tion handling Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
@dhrubo-os addressed review comments, please review! |
- "_meta_.schema_version" provided type is integer | ||
|
||
Note: Validation can be made more strict if a specific schema is defined for each index. | ||
* Checks if mapping is a valid json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Nit] if you want to be fancy :D
/**
* Checks if the provided mapping is a valid JSON and validates it against a schema.
*
* <p>The schema is located at <code>mappings/schema.json</code> and enforces the following validations:</p>
*
* <ul>
* <li>Mandatory fields:
* <ul>
* <li><code>_meta</code></li>
* <li><code>_meta.schema_version</code></li>
* <li><code>properties</code></li>
* </ul>
* </li>
* <li>No additional fields are allowed at the root level.</li>
* <li>No additional fields are allowed in the <code>_meta</code> object.</li>
* <li><code>properties</code> must be an object type.</li>
* <li><code>_meta</code> must be an object type.</li>
* <li><code>_meta.schema_version</code> must be an integer.</li>
* </ul>
*
* <p><strong>Note:</strong> Validation can be made stricter if a specific schema is defined for each index.</p>
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that looks neat, ive added it
Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
#3240) * feat(index mappings): fetch mappings and version from json file instead of string constants Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: changing exception being thrown Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * chore: remove unused file Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * chore: fix typo in comment Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * chore: adding new line at the end of files Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * feat: add test cases Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fix: remove test code Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fix(test): in main the versions were not updated appropriately Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: move mapping templates under common module Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: ensure that conversationindexconstants reference mlindex enums rather than use their own mappings Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: update comment Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * feat: add enhancements to validate index schema and allow using placeholders Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: modifying comment Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * test: adding testcase for MLIndex to catch failures before runtime Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: rename dir from mappings to index-mappings Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fix: add null checks Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fix: modify mappin paths for placeholders Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fix: adding dependencies for testing Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fix(test): compare json object rather than strings to avoid eol character issue Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: combine if statements into single check Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactoring: null handling + clean code Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * spotless apply Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * tests: adding more UT Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fix: dependencies to handle jarhell Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * spotless apply Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: add header and use single instance of mapper Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fixed: doc syntax Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: renamed files, efficient loading of resources, better exception handling Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: cleaner comment Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> --------- Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> (cherry picked from commit 374cfd5)
opensearch-project#3240) * feat(index mappings): fetch mappings and version from json file instead of string constants Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: changing exception being thrown Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * chore: remove unused file Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * chore: fix typo in comment Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * chore: adding new line at the end of files Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * feat: add test cases Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fix: remove test code Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fix(test): in main the versions were not updated appropriately Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: move mapping templates under common module Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: ensure that conversationindexconstants reference mlindex enums rather than use their own mappings Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: update comment Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * feat: add enhancements to validate index schema and allow using placeholders Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: modifying comment Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * test: adding testcase for MLIndex to catch failures before runtime Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: rename dir from mappings to index-mappings Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fix: add null checks Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fix: modify mappin paths for placeholders Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fix: adding dependencies for testing Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fix(test): compare json object rather than strings to avoid eol character issue Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: combine if statements into single check Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactoring: null handling + clean code Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * spotless apply Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * tests: adding more UT Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fix: dependencies to handle jarhell Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * spotless apply Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: add header and use single instance of mapper Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * fixed: doc syntax Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: renamed files, efficient loading of resources, better exception handling Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> * refactor: cleaner comment Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com> --------- Signed-off-by: Pavan Yekbote <mail2pavanyekbote@gmail.com>
Description
Added schema validation and placeholders to index mappings
Related Issues
Resolves #2951
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.