ReIndex blob data files directly from Azure #630

Open
professorDante opened this issue Sep 14, 2023 · 3 comments
Labels
enhancement New feature or request ingestion

Comments

@professorDante

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Place new files into the Blob storage account in the resource group directly in Azure, then reindex the vector DB so that it includes them.

Any log messages given by the failure

Expected/desired behavior

see above

OS and Version?

Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)
N/A

azd version?

run azd version and copy paste here.
N/A

Versions

Mention any other details that might be useful

It would be very useful to be able to reindex the vector DB after pushing new files into Blob storage. You can already do this with the prepdocs.py script if the files are on your local machine.
[Screenshot: Screen Shot 2023-09-13 at 5 16 00 PM]

However, say we have files in Azure and wish to add them to the index: once we drag and drop them into the Blob container, how do we rerun that indexing procedure?


Thanks! We'll be in touch soon.

@kibnelbachyr

I am also interested in whether there is any option to do this from the Azure Portal.

@pamelafox
Collaborator

pamelafox commented Sep 28, 2023

I think the approach for this would be to use an Azure Function with a blob trigger, and then use prepdocs.py to process the blob that fired the trigger.

For example, here's what the function code might start with:

import asyncio
import azure.functions as func

app = func.FunctionApp()

@app.blob_trigger(arg_name="doc", path="unchunked", connection="AzureWebJobsStorage")
def create_chunks(doc: func.InputStream):
    return asyncio.run(create_chunks_async(doc))

You would need to call prepdocs from that code. It'd take some custom coding and provisioning, but should be possible.
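
The `create_chunks_async` helper called in the snippet above is not shown in the thread. As a rough sketch only, it might copy the triggering blob to a temporary file and hand that file to whatever indexing logic you extract from prepdocs.py. The `index_file` coroutine and the `indexer` parameter below are hypothetical placeholders, not part of prepdocs.py or the Azure Functions SDK. The code assumes only that `doc` exposes a `name` attribute and a `read()` method, as `func.InputStream` does.

```python
import asyncio
import tempfile
from pathlib import Path, PurePosixPath


async def index_file(path: Path) -> None:
    """Hypothetical placeholder: call your prepdocs-based indexing logic here."""
    raise NotImplementedError("wire this up to the prepdocs.py logic")


async def create_chunks_async(doc, indexer=index_file) -> str:
    """Copy the triggering blob to a temp file and pass it to the indexer.

    `doc` only needs a `name` attribute and a `read()` method.
    `indexer` is an assumed hook so the indexing step can be swapped or tested.
    """
    # Blob trigger names typically include the container path,
    # e.g. "unchunked/report.pdf"; keep only the final file name.
    filename = PurePosixPath(doc.name).name
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = Path(tmp_dir) / filename
        local_path.write_bytes(doc.read())
        # The temp file exists only while the indexer runs.
        await indexer(local_path)
    return filename
```

Passing the indexer in as a parameter keeps the blob handling separate from the indexing step, so the function can be exercised locally with a stand-in object before any Azure resources are provisioned.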


This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.

@github-actions github-actions bot added the Stale label Nov 28, 2023
@pamelafox pamelafox added enhancement New feature or request ingestion labels Nov 29, 2023
@github-actions github-actions bot removed the Stale label Nov 30, 2023
ratkinsoncinz pushed a commit to cinzlab/azure-search-openai-demo that referenced this issue Oct 6, 2024
Co-authored-by: Ian Seabock (Centific Technologies Inc) <v-ianseabock@microsoft.com>