Skip to content

Conversation

@noah-paige
Copy link
Contributor

@noah-paige noah-paige commented Oct 14, 2024

Feature or Bugfix

  • Feature

Detail

  • Add Natural Language Querying Feature in data.all
    • Text To SQL Generation
    • Text Document Analysis

Relates

Security

Please answer the questions below briefly where applicable, or write N/A. Based on
OWASP 10.

  • Does this PR introduce or modify any input fields or queries - this includes
    fetching data from storage outside the application (e.g. a database, an S3 bucket)?
    • Is the input sanitized?
    • What precautions are you taking before deserializing the data you consume?
    • Is injection prevented by parametrizing queries?
    • Have you ensured no eval or similar functions are used?
  • Does this PR introduce any functionality or component that requires authorization?
    • How have you ensured it respects the existing AuthN/AuthZ mechanisms?
    • Are you logging failed auth attempts?
  • Are you using or adding any cryptographic features?
    • Do you use a standard proven implementations?
    • Are the used keys controlled by the customer? Where are they stored?
  • Are you introducing any new policies/roles/users?
    • Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Contributor

@dlpzx dlpzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed and tested frontend and comments from yesterday

@noah-paige
Copy link
Contributor Author

Some example views from the UI:

Text To SQL:

Screenshot 2024-10-24 at 11 55 50 AM

Document Analyzer:

Screenshot 2024-10-24 at 11 55 33 AM

UserGuide Information in this PR:
documentation/userguide/docs/worksheets.md - changes in this PR

cc: @zsaltys @TejasRGitHub @anushka-singh

@noah-paige noah-paige linked an issue Oct 24, 2024 that may be closed by this pull request
@noah-paige
Copy link
Contributor Author

@dlpzx - added additional tests and believe I have addressed all comments

The one pending item is when testing the happy paths for both TextToSQL and analyzeTextDocument GQL operations - often the integration tests will fail with the following error:

An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 4): Too many requests, please wait before trying again. You have sent too many requests. Wait before trying again.

If you retry the test in isolation / after some time it will resolve itself. Looking at the service quota limits, Claude 3.5 Sonnet has a fairly low max invoke count per account per minute of 50 (note: we are still far under that per minute threshold but I think that or similar could be causing this throttling exception - other Claude 3 models have a much higher per minute tolerance of invocations)

Not sure if we should opt for a Claude 3 version for the time being to remediate this testing case and also to improve UX for a user using these features, thoughts?

Copy link
Contributor

@dlpzx dlpzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but needs to resolve conflicts

def __init__(self):
self._session = SessionHelper.get_session()
self._client = self._session.client('bedrock-runtime')
model_id = 'anthropic.claude-3-5-sonnet-20240620-v1:0'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am facing errors of the type: An error occurred (ValidationException) when calling the InvokeModel operation: Invocation of model ID anthropic.claude-3-5-sonnet-20240620-v1:0 with on-demand throughput isn’t supported. Retry your request with the ID or ARN of an inference profile that contains this model. I think it is related to the enforced cross-region inference

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you have faced something similar when using this model

# Conflicts:
#	backend/dataall/modules/s3_datasets/services/dataset_service.py
#	backend/dataall/modules/worksheets/api/resolvers.py
#	backend/dataall/modules/worksheets/services/worksheet_service.py
#	deploy/stacks/lambda_api.py
#	frontend/src/modules/Worksheets/views/WorksheetView.js
@petrkalos petrkalos force-pushed the feat/worksheet-nlq-beta branch from d42b9fc to 27e2b8b Compare October 10, 2025 09:54
petrkalos added a commit that referenced this pull request Oct 17, 2025
~~⚠️ merge after #1639~~ (cherry-picked the resource_thresholds feature)
### Feature or Bugfix
- Feature

### Detail
- Implements #1599 - see
design for full explanation

### Relates
- #1599

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use a standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Co-authored-by: Pelin Kuran <52086912+pelinKuran@users.noreply.github.com>
Co-authored-by: kalosp <kalosp@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Natural Language Querying (NLQ) using genAI

4 participants