Qgenie Prompt testing #46
Conversation
Signed-off-by: qraniumcitest <rmakar@qti.qualcomm.com>
From GHES (comment) by @qgeniecodeassistant[bot]

Code Assistant
Reviewed commits: cb7da87, efb34ea, 35d8fd8, 118100c, 7e8838f, 25236bb, b2dd328, be7511b, 04f1ad7, c75a637, ed965fd, c788f17, f4ff803, 44fe97b, a5056d7
Updated the correct code with updated syntax, removed device_group

Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Mohit Soni <mohisoni@qti.qualcomm.com>
📢 Expanded On-Device Sampling Support in QEfficient
Excited to share that On-Device Sampling—previously available only …
✅ Newly Supported Architectures:
| File | Lines Changed | Issues Found | Highest Severity |
|---|---|---|---|
| .github/workflows/Mirror_Fork_PRs_to_GHES.yml | +235 | 3 | High |
| QEfficient/cloud/infer.py | +12 | 0 | - |
| QEfficient/generation/cloud_infer.py | +4 | 1 | Medium |
| QEfficient/generation/embedding_handler.py | +367 | 2 | Medium |
| QEfficient/generation/text_generation_inference.py | +89 | 2 | Medium |
| QEfficient/generation/vlm_generation.py | +800 | 1 | Low |
| QEfficient/transformers/models/*/modeling_*.py | ~2000+ | 0 | - |
| Various model files | Multiple | 0 | - |
Key Changes
- CCL Infrastructure: Added `comp_ctx_lengths_prefill` and `comp_ctx_lengths_decode` parameters throughout the generation pipeline
- Vision-Language Models: New `VisionHandler` and `VisionLanguageGeneration` classes for VLM support
- Model Architecture Updates: CCL support added to 20+ model architectures (Llama, Mistral, Gemma, etc.)
- GitHub Workflow: New workflow for mirroring fork PRs to GHES
Critical Issues Identified
- [SECURITY] Hardcoded credentials exposure in GitHub workflow (High)
- [FUNCTIONALITY] Missing error handling in vision processing (Medium)
- [PERFORMANCE] Inefficient session management in vision inference (Medium)
- [MAINTAINABILITY] Incomplete state tracking in generation classes (Medium)
[SECURITY] Hardcoded credentials and token exposure in GitHub workflow - High Severity
The GitHub workflow file contains multiple security vulnerabilities:
- Token length logging (line 207): Logs the length of sensitive tokens which can aid attackers
- Verbose error output (lines 215-217): Prints full API responses that may contain sensitive data
- Insufficient access control validation (lines 153-161): Simple string comparison for authorization without rate limiting
Security Risks:
- Token metadata exposure aids brute force attacks
- API response leakage may expose internal infrastructure details
- No protection against authorization bypass attempts
Fixed Code Snippet:
```bash
# Remove token length logging
RESP="$(curl -sS -H "Authorization: token ${GHES_PAT}" \
  -H "Accept: application/vnd.github+json" \
  "${API}/pulls?state=open&head=${GHES_OWNER}:${BRANCH}" \
  -w "\n%{http_code}")"
# Don't log: echo "Token length: ${#GHES_PAT}"
HTTP_CODE="$(printf '%s\n' "$RESP" | tail -n1)"
JSON="$(printf '%s\n' "$RESP" | sed '$d')"
echo "HTTP_CODE=${HTTP_CODE}"
if [ "${HTTP_CODE}" != "200" ]; then
  echo "Non-200 response from GHES pulls query"
  # Don't echo full JSON: echo "$JSON"
  exit 78
fi
```
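For anyone scripting the same query against GHES from Python rather than curl, the same redaction rules apply. A minimal sketch using requests follows; the env-var names (GHES_PAT, API, GHES_OWNER, BRANCH) are taken from the workflow, everything else is illustrative:
```python
# Minimal Python equivalent of the hardened query above; env-var names come
# from the workflow, but the script itself is an illustrative sketch, not
# part of the reviewed change.
import os
import sys

import requests

token = os.environ["GHES_PAT"]
api = os.environ["API"]  # base API URL for the target repo on GHES
owner = os.environ["GHES_OWNER"]
branch = os.environ["BRANCH"]

resp = requests.get(
    f"{api}/pulls",
    params={"state": "open", "head": f"{owner}:{branch}"},
    headers={"Authorization": f"token {token}", "Accept": "application/vnd.github+json"},
    timeout=30,
)
# Log only the status code; never the token, its length, or the raw body.
print(f"HTTP_CODE={resp.status_code}")
if resp.status_code != 200:
    print("Non-200 response from GHES pulls query", file=sys.stderr)
    sys.exit(78)
```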
[FUNCTIONALITY] Missing error handling in vision session activation - Medium Severity
In QEfficient/generation/cloud_infer.py, the `is_active` flag is set but never used for validation. The code sets `self.is_active = True` after activation but doesn't check this flag before operations, potentially leading to operations on inactive sessions.
Problem:
- The `is_active` flag is introduced but not utilized for state validation
- No checks prevent operations on deactivated sessions
- Could lead to runtime errors if a session is used after deactivation
Fixed Code Snippet:
```python
def __init__(self, ...):
    # ... existing code ...
    self.is_active = False
    if activate:
        self.activate()
        self.is_active = True

def run(self, inputs):
    if not self.is_active:
        raise RuntimeError("Cannot run inference on inactive session. Call activate() first.")
    # ... existing run logic ...

def deactivate(self):
    if self.is_active:
        # ... deactivation logic ...
        self.is_active = False
```
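To see why the guard matters, here is a self-contained sketch of the pattern; the class below is a stand-in for illustration, not the actual QEfficient session implementation:
```python
# Self-contained sketch of the activation-guard pattern described above.
# Method names mirror the review's snippet; the bodies are placeholders.
class GuardedSession:
    def __init__(self, activate: bool = True):
        self.is_active = False
        if activate:
            self.activate()

    def activate(self):
        # ... a real implementation would bind device resources here ...
        self.is_active = True

    def run(self, inputs):
        if not self.is_active:
            raise RuntimeError("Cannot run inference on inactive session. Call activate() first.")
        return {"logits": inputs}  # placeholder output

    def deactivate(self):
        if self.is_active:
            # ... a real implementation would release device resources here ...
            self.is_active = False


session = GuardedSession(activate=False)
try:
    session.run({"input_ids": [1, 2, 3]})
except RuntimeError as err:
    print(err)  # fails fast instead of touching an inactive session
session.activate()
print(session.run({"input_ids": [1, 2, 3]}))
```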
[PERFORMANCE] Inefficient session activation/deactivation in vision inference - Medium Severity
In QEfficient/generation/embedding_handler.py, the run_vision_inference method performs session activation/deactivation on every inference call. This creates unnecessary overhead, especially for batch processing or multiple sequential inferences.
Performance Impact:
- Session activation/deactivation on every call adds latency
- For batch processing, this overhead multiplies
- Resource allocation/deallocation overhead
Recommendation:
Implement session pooling or keep sessions active for the duration of a batch operation.
Fixed Code Snippet:
```python
def run_vision_inference(self, vision_inputs: Dict[str, np.ndarray], keep_active: bool = False) -> Dict[str, np.ndarray]:
    """Execute vision model inference with optional session persistence

    Args:
        vision_inputs: Preprocessed vision inputs
        keep_active: If True, keep session active after inference for subsequent calls
    """
    if not self._vision_session:
        raise ValueError("Vision session not available")

    lang_was_active = False
    try:
        if self._lang_session and self._lang_session.is_active:
            logger.debug("Deactivating language session before vision inference")
            self._lang_session.deactivate()
            lang_was_active = True

        if not self._vision_session.is_active:
            logger.debug("Activating vision session for inference")
            self._vision_session.activate()

        vision_outputs = self._vision_session.run(vision_inputs)

        if not keep_active:
            logger.debug("Deactivating vision session after inference")
            self._vision_session.deactivate()

        if lang_was_active and self._lang_session:
            logger.debug("Reactivating language session after vision inference")
            self._lang_session.activate()

        return vision_outputs
    except Exception as e:
        # ... error handling (re-raise so callers see the failure) ...
        raise
```
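With the keep_active flag in place, a caller can amortize the activation cost across a batch. A hedged sketch of such a loop follows; the handler object is a hypothetical stand-in wired to the method above:
```python
# Hypothetical batch loop around run_vision_inference; `handler` is a
# stand-in object exposing the method shown above, not a QEfficient class.
from typing import Dict, List

import numpy as np


def run_batch(handler, batch: List[Dict[str, np.ndarray]]) -> List[Dict[str, np.ndarray]]:
    """Run vision inference over a batch, paying the activation cost only once."""
    outputs = []
    for i, vision_inputs in enumerate(batch):
        is_last = i == len(batch) - 1
        # Keep the session active between items; release it after the last one.
        outputs.append(handler.run_vision_inference(vision_inputs, keep_active=not is_last))
    return outputs
```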
[FUNCTIONALITY] Incomplete CCL initialization in decode phase - Medium Severity
In QEfficient/generation/text_generation_inference.py, the initialize_ccl method is defined, but the CCL state is not properly maintained across decode iterations. The method recalculates ccl_id on every call without tracking previous state, potentially causing inconsistent context-length management.
Issue:
- CCL ID calculation starts from `ccl_id_initial = 0` on every call
- No state persistence between decode iterations
- Could lead to incorrect context length selection during long sequences
Fixed Code Snippet:
```python
def initialize_ccl(self, decode_inputs):
    """Initialize CCL with state tracking"""
    if not hasattr(self, '_ccl_state'):
        self._ccl_state = {
            'list_of_comp_ctx_lengths': [np.zeros(length) for length in self.comp_ctx_lengths_decode],
            'current_ccl_id': 0,
            'max_ccl_id': len(self.comp_ctx_lengths_decode) - 1,
        }

    max_position_id = np.max(decode_inputs["position_ids"])

    # Update CCL ID based on current position
    for i in range(self._ccl_state['current_ccl_id'], len(self.comp_ctx_lengths_decode)):
        if max_position_id < self.comp_ctx_lengths_decode[i]:
            self._ccl_state['current_ccl_id'] = i
            break

    return self._ccl_state['current_ccl_id'], self._ccl_state['max_ccl_id']
```
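To make the bucket walk concrete, here is a small worked example; the context-length buckets below are made up for illustration:
```python
# Worked example of the bucket selection in initialize_ccl, using made-up
# CCL buckets. The scan resumes from the current id, so it never regresses
# across decode iterations.
comp_ctx_lengths_decode = [512, 1024, 4096]  # hypothetical buckets
current_ccl_id = 0

for max_position_id in (100, 600, 700, 2000):
    for i in range(current_ccl_id, len(comp_ctx_lengths_decode)):
        if max_position_id < comp_ctx_lengths_decode[i]:
            current_ccl_id = i
            break
    print(max_position_id, "->", comp_ctx_lengths_decode[current_ccl_id])
# Prints: 100 -> 512, 600 -> 1024, 700 -> 1024, 2000 -> 4096
```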
Source: https://github.qualcomm.com/qranium/efficient-transformers/pull/48#issuecomment-1010952
Signed-off-by: qraniumcitest <rmakar@qti.qualcomm.com>
From GHES (comment) by @qgeniecodeassistant[bot]

Code Assistant
Reviewed commit: c285190 "Update compile.py Signed-off-by: qraniumcitest rmakar@qti.qualcomm.com"

PR Overview
This PR modifies the compilation script for QEfficient cloud compilation. The change appears to be incomplete and introduces a critical syntax error.

Files Changed Summary
Critical Issues
Impact
The current change breaks the entire compilation script and must be fixed before merging. The script will not execute in its current state.

[FUNCTIONALITY] Incomplete import statement causes syntax error - Critical
Line 9 contains an incomplete `import` statement. The original code had a blank line at line 9, which has been replaced with a bare `import`, leaving the file syntactically invalid.
Impact:
Fixed Code Snippet:
```python
import argparse

import QEfficient
```
If a specific module was intended to be imported, it should be added. Otherwise, the blank line should be restored to maintain the original functionality.
Source: https://github.qualcomm.com/qranium/efficient-transformers/pull/48#issuecomment-1011030
No description provided.