-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Component name validation #1794
Component name validation #1794
Conversation
…ases for the same.
…ases for the same.
WalkthroughThis pull request introduces input validation in the MongoProcessor class by adding checks that ensure entity names (such as action names, intent names, synonyms, lookups, and slots) contain only letters, numbers, and underscores. When an invalid name is provided, an AppException is raised with standardized error messages. Complementary unit tests have been added to verify that the proper exceptions are raised for various invalid input cases. Changes
Sequence Diagram(s)sequenceDiagram
participant U as User
participant M as MongoProcessor
participant V as Validator
U->>+M: Call add_action(name, bot, user, ...)
M->>+V: Validate action name format
V-->>-M: Return validation result
alt Name valid
M-->>U: Process and add action
else Name invalid
M-->>U: Raise AppException with error message
end
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
kairon/shared/data/processor.py (1)
1531-1532
: Extract validation logic into a reusable method.The special character validation is repeated across multiple methods. Consider extracting this into a reusable validation method to improve maintainability and reduce code duplication.
# Add to the MongoProcessor class: +def validate_component_name(self, name: Text, component_type: Text): + """ + Validates component name for special characters. + + :param name: Name to validate + :param component_type: Type of component (e.g., "Intent", "Action", etc.) + :raises: AppException if validation fails + """ + if name and Utility.special_match(name): + raise AppException("Invalid name! Only letters, numbers, and underscores (_) are allowed.") # Then use it in methods like: -if text and Utility.special_match(text): - raise AppException("Invalid name! Only letters, numbers, and underscores (_) are allowed.") +self.validate_component_name(text, "Intent")
🧹 Nitpick comments (6)
tests/unit_test/data_processor/data_processor2_test.py (2)
51-285
: Add positive test cases for component name validation.While the negative test cases thoroughly verify that invalid names are rejected, it would be beneficial to add positive test cases to ensure that valid names (containing letters, numbers, and underscores) are accepted successfully.
Consider adding test cases like:
def test_add_intent_with_valid_name(): processor = MongoProcessor() processor.add_intent("greeting_message_123", "tests", "testUser", is_integration=False) # Add assertions to verify the intent was created successfully def test_add_utterance_with_valid_name(): processor = MongoProcessor() processor.add_utterance_name("test_add_123", "test", "testUser") # Add assertions to verify the utterance was created successfully
287-531
: Optimize test data for action name validation tests.The test cases include complete action configurations, but since we're only testing name validation, we can simplify the test data to focus on the name field. This would make the tests more maintainable and clearer in their intent.
For example, instead of:
def test_add_email_action_with_invalid_name(): processor = MongoProcessor() config = { "action_name": "email~config", "smtp_url": "test.test.com", "smtp_port": 25, "smtp_userid": None, "smtp_password": {"value": "test"}, "from_email": {"value": "from_email", "parameter_type": "slot"}, "to_email": {"value": ["test@test.com"], "parameter_type": "value"}, "subject": "Test Subject", "response": "Test Response", "tls": False, } with pytest.raises(AppException, match=re.escape("Invalid name! Only letters, numbers, and underscores (_) are allowed.")): processor.add_email_action(config, user, bot)Consider:
def test_add_email_action_with_invalid_name(): processor = MongoProcessor() config = { "action_name": "email~config", # Add only required fields for the test } with pytest.raises(AppException, match=re.escape("Invalid name! Only letters, numbers, and underscores (_) are allowed.")): processor.add_email_action(config, user, bot)kairon/shared/data/processor.py (4)
178-1007
: Consider splitting MongoProcessor into smaller, focused modules.The MongoProcessor class is quite large (8800+ lines) and handles multiple responsibilities. Consider splitting it into smaller, focused modules based on functionality:
- IntentProcessor: For intent-related operations
- ActionProcessor: For action-related operations
- SlotProcessor: For slot-related operations
- ResponseProcessor: For response-related operations
- ValidationProcessor: For validation logic
This would improve maintainability, testability and make the code easier to understand.
🧰 Tools
🪛 Ruff (0.8.2)
311-314: Return the condition
config.get("dynamic_url_slot_name") == slot
directlyReplace with
return config.get("dynamic_url_slot_name") == slot
(SIM103)
324-328: Use
return any(set_slot.get("name") == slot for set_slot in set_slots)
instead offor
loopReplace with
return any(set_slot.get("name") == slot for set_slot in set_slots)
(SIM110)
338-341: Return the condition directly
Inline condition
(SIM103)
352-356: Use
return any(MongoProcessor.is_slot_in_field(note, slot) for note in notes)
instead offor
loopReplace with
return any(MongoProcessor.is_slot_in_field(note, slot) for note in notes)
(SIM110)
361-364: Return the condition
set_slot == slot
directlyReplace with
return set_slot == slot
(SIM103)
378-382: Use
return any(set_slot.get("name") == slot for set_slot in set_slots)
instead offor
loopReplace with
return any(set_slot.get("name") == slot for set_slot in set_slots)
(SIM110)
397-401: Use
return any(value == slot for value in metadata.values())
instead offor
loopReplace with
return any(value == slot for value in metadata.values())
(SIM110)
410-413: Return the condition
config.get("set_slot") == slot
directlyReplace with
return config.get("set_slot") == slot
(SIM103)
418-421: Use
return any(set_slot.get("name") == slot for set_slot in set_slots)
instead offor
loopReplace with
return any(set_slot.get("name") == slot for set_slot in set_slots)
(SIM110)
453-457: Use
return any(set_slot.get("name") == slot for set_slot in set_slots)
instead offor
loopReplace with
return any(set_slot.get("name") == slot for set_slot in set_slots)
(SIM110)
498-498:
kairon.importer.validator.file_validator.TrainingDataValidator
imported but unusedRemove unused import:
kairon.importer.validator.file_validator.TrainingDataValidator
(F401)
553-553: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
569-569: Do not perform function call
REQUIREMENTS.copy
in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable(B008)
679-682: Use ternary operator
type_value = "text" if not metadata else "json"
instead ofif
-else
-blockReplace
if
-else
-block withtype_value = "text" if not metadata else "json"
(SIM108)
725-725: Do not perform function call
REQUIREMENTS.copy
in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable(B008)
806-806: Local variable
actions
is assigned to but never usedRemove assignment to unused variable
actions
(F841)
1531-2548
: Add validation for empty or whitespace-only names.The special character validation should also check for empty or whitespace-only names to prevent invalid data. Consider adding this check to the validation logic.
+def validate_component_name(self, name: Text, component_type: Text): + """ + Validates component name for special characters and empty values. + + :param name: Name to validate + :param component_type: Type of component (e.g., "Intent", "Action", etc.) + :raises: AppException if validation fails + """ + if Utility.check_empty_string(name): + raise AppException(f"{component_type} name cannot be empty or blank spaces") + if Utility.special_match(name): + raise AppException("Invalid name! Only letters, numbers, and underscores (_) are allowed.")🧰 Tools
🪛 Ruff (0.8.2)
1774-1774: Comparison to
None
should becond is None
Replace with
cond is None
(E711)
2162-2162: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
2329-2332: Use a single
if
statement instead of nestedif
statements(SIM102)
8034-8037
: Use regex pattern constant for FAQ intent name generation.The FAQ intent name generation uses string manipulation that could be improved with a regex pattern constant.
+FAQ_INTENT_NAME_PATTERN = re.compile(r'[^a-zA-Z0-9_]') -intent = re.sub(r"[^a-zA-Z0-9_]", "_", intent) if Utility.special_match(intent) else intent +intent = FAQ_INTENT_NAME_PATTERN.sub("_", intent) if Utility.special_match(intent) else intent
1531-1532
: Standardize error messages across validation checks.The error messages for special character validation are consistent but could be improved by including the component type in the message.
+def validate_component_name(self, name: Text, component_type: Text): + """ + Validates component name for special characters and empty values. + + :param name: Name to validate + :param component_type: Type of component (e.g., "Intent", "Action", etc.) + :raises: AppException if validation fails + """ + if Utility.check_empty_string(name): + raise AppException(f"{component_type} name cannot be empty or blank spaces") + if Utility.special_match(name): + raise AppException(f"Invalid {component_type.lower()} name! Only letters, numbers, and underscores (_) are allowed.")Also applies to: 4193-4194, 4295-4296, 4421-4422, 4570-4571, 4613-4614, 4761-4762, 4802-4803, 5801-5802, 7151-7152, 7391-7392, 7713-7714, 8392-8393, 8507-8508
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
kairon/shared/data/processor.py
(23 hunks)tests/unit_test/data_processor/data_processor2_test.py
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: Analyze (python)
- GitHub Check: Python CI
🔇 Additional comments (1)
tests/unit_test/data_processor/data_processor2_test.py (1)
534-787
: LGTM! Well-structured test cases for data validation.The test cases for data format correction and validation are comprehensive, covering:
- Data type validation for strings, integers, and floats
- Handling of missing columns
- Validation of metadata and payload
- Edge cases like int values in float fields
The test data and assertions effectively verify the expected behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed
Added validations for all components names and added and fixed test cases for the same.
Summary by CodeRabbit