Initial draft cosmos data extractor #116

Open: wants to merge 1 commit into base: main


@jyuu jyuu commented Sep 21, 2022

@Herman-Wu @scottveirs I finally had a cycle this morning to put together an extractor to pull the metadata from Cosmos. This uses the Microsoft.Azure.Cosmos library from NuGet to connect to the Cosmos DB SQL API account. Currently, it allows users to paginate through the results, but when I have another spare cycle, I can save the results to CSV (or another output type). Please let me know if these fields capture what is needed for training (or whether I should pare them down further).

Thank you!
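
For readers unfamiliar with how pagination works in the Microsoft.Azure.Cosmos SDK, here is a minimal sketch of the page-by-page read described above. The helper name `QueryPager.ReadAllAsync` is hypothetical and not taken from the PR; the PR's actual loop may differ.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class QueryPager
{
    // Hypothetical helper (not from the PR): reads every page of a query
    // and collects the items into a single list.
    public static async Task<List<T>> ReadAllAsync<T>(
        Container container, QueryDefinition query, int maxItemCount)
    {
        // MaxItemCount caps the number of items returned per page.
        var options = new QueryRequestOptions { MaxItemCount = maxItemCount };
        var results = new List<T>();

        using FeedIterator<T> iterator =
            container.GetItemQueryIterator<T>(query, requestOptions: options);

        // Each ReadNextAsync call fetches one page of results.
        while (iterator.HasMoreResults)
        {
            FeedResponse<T> page = await iterator.ReadNextAsync();
            results.AddRange(page);
        }

        return results;
    }
}
```

Collecting everything in memory is only reasonable for modest result sets; a streaming writer would be the safer choice for a large container.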

string endpoint = "https://aifororcasmetadatastore.documents.azure.com:443/";
string key = "[INSERT PRIMARY KEY HERE]";

CosmosClient client = new CosmosClient(endpoint, key);
Member

Is it possible to use a connection string instead of an endpoint and key? If so, we don't need to hardcode the endpoint.

It would also be nice to expose this as a flag (see the comment below).
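
A minimal sketch of the connection-string approach, assuming the same Microsoft.Azure.Cosmos package. The environment variable name `COSMOS_CONNECTION_STRING`, the "first argument" convention, and the `CosmosClientFactory` class are all hypothetical and only illustrate the idea.

```csharp
using System;
using Microsoft.Azure.Cosmos;

public static class CosmosClientFactory
{
    // Hypothetical variable name; not part of the PR.
    public const string ConnectionStringVariable = "COSMOS_CONNECTION_STRING";

    // Prefer an explicit first command-line argument, then fall back to
    // the environment variable, so nothing is hardcoded in source.
    public static string ResolveConnectionString(string[] args)
    {
        var value = args.Length > 0
            ? args[0]
            : Environment.GetEnvironmentVariable(ConnectionStringVariable);

        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Pass a connection string or set {ConnectionStringVariable}.");
        }

        return value;
    }

    // CosmosClient accepts a connection string directly, which carries
    // both the account endpoint and the key.
    public static CosmosClient Create(string[] args) =>
        new CosmosClient(ResolveConnectionString(args));
}
```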

Database database = await client.CreateDatabaseIfNotExistsAsync("predictions");
Container container = await database.CreateContainerIfNotExistsAsync("metadata", "/source_guid");

string sql = "SELECT m.id, m.modelId, m.audioUri, m.imageUri, m.reviewed, m.timestamp, m.whaleFoundConfidence, m.location.id AS location_id, m.location.name AS location_name, m.location.longitude AS location_long, m.location.latitude AS location_lat, m.source_guid, p.id AS prediction_id, p.startTime AS prediction_startTime, p.duration AS prediction_duration, p.confidence AS prediction_confidence FROM metadata m JOIN p IN m.predictions";
Member

Can we add line breaks? :) See http://net-informations.com/q/faq/multilines.html for example.
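
One way the line breaks could look, using a C# verbatim string literal. The query text is the same as in the diff; only the whitespace changes, and the `MetadataQuery` class name is hypothetical.

```csharp
public static class MetadataQuery
{
    // Same query as the single-line version, split one column per line.
    public const string Sql = @"SELECT
    m.id,
    m.modelId,
    m.audioUri,
    m.imageUri,
    m.reviewed,
    m.timestamp,
    m.whaleFoundConfidence,
    m.location.id AS location_id,
    m.location.name AS location_name,
    m.location.longitude AS location_long,
    m.location.latitude AS location_lat,
    m.source_guid,
    p.id AS prediction_id,
    p.startTime AS prediction_startTime,
    p.duration AS prediction_duration,
    p.confidence AS prediction_confidence
FROM metadata m
JOIN p IN m.predictions";
}
```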

Member

Also have a dumb question: what is "p"?
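
For context: in the Cosmos DB SQL API, `JOIN p IN m.predictions` is an intra-document join, so `p` is an alias for each element of the `predictions` array inside document `m`, and the query emits one row per prediction. A rough in-memory analogue using LINQ `SelectMany` is sketched below. The record types and their field types are hypothetical stand-ins, not the real schema.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified shapes; the real documents have more fields.
public record Prediction(int Id, double StartTime, double Duration, double Confidence);
public record Metadata(string Id, List<Prediction> Predictions);

public static class JoinAnalogy
{
    // Pairs each document m with each element p of its predictions array,
    // mirroring "FROM metadata m JOIN p IN m.predictions".
    public static List<(string MetadataId, int PredictionId)> Flatten(
        IEnumerable<Metadata> docs) =>
        docs.SelectMany(
                m => m.Predictions,
                (m, p) => (MetadataId: m.Id, PredictionId: p.Id))
            .ToList();
}
```

As with the SQL join, a document whose `predictions` array is empty contributes no rows in this analogue.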

QueryDefinition query = new (sql);

QueryRequestOptions options = new ();
options.MaxItemCount = 50;
Member

Is it possible to have this set from an external flag? This would be useful if we distribute the tool as a binary.
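
A small sketch of reading the page size from the command line, falling back to the current value of 50. The flag name `--max-item-count` and the `ExtractorOptions` class are hypothetical; a dedicated argument-parsing library would work equally well.

```csharp
public static class ExtractorOptions
{
    // Matches the value currently hardcoded in the PR.
    public const int DefaultMaxItemCount = 50;

    // Looks for "--max-item-count <positive integer>" (hypothetical flag)
    // and returns the default when the flag is absent or invalid.
    public static int ParseMaxItemCount(string[] args)
    {
        for (int i = 0; i < args.Length - 1; i++)
        {
            if (args[i] == "--max-item-count"
                && int.TryParse(args[i + 1], out int value)
                && value > 0)
            {
                return value;
            }
        }

        return DefaultMaxItemCount;
    }
}
```

The result could then be assigned in place of the literal, for example `options.MaxItemCount = ExtractorOptions.ParseMaxItemCount(args);`.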

Collaborator

If the developer is using Visual Studio, they can put the URL and key in their User Secrets so that they do not go into the checked-in code. Otherwise they will need to put them into an appsettings.Development.json file and make sure that file does not get checked in as part of the pull request.
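
A sketch of how that could be wired up, assuming the Microsoft.Extensions.Configuration, Microsoft.Extensions.Configuration.Json, and Microsoft.Extensions.Configuration.UserSecrets packages are referenced. The configuration keys `Cosmos:Endpoint` and `Cosmos:Key` and the `ExtractorConfiguration` class are hypothetical.

```csharp
using System;
using Microsoft.Extensions.Configuration;

public static class ExtractorConfiguration
{
    // Later sources override earlier ones, so User Secrets take
    // precedence over the local JSON file. Both are optional.
    public static IConfiguration Build() =>
        new ConfigurationBuilder()
            .AddJsonFile("appsettings.Development.json", optional: true)
            .AddUserSecrets(typeof(ExtractorConfiguration).Assembly, optional: true)
            .Build();

    // Hypothetical key names; fails fast when a value is missing.
    public static (string Endpoint, string Key) ReadCosmosSettings(IConfiguration config)
    {
        string endpoint = config["Cosmos:Endpoint"]
            ?? throw new InvalidOperationException("Cosmos:Endpoint is not configured.");
        string key = config["Cosmos:Key"]
            ?? throw new InvalidOperationException("Cosmos:Key is not configured.");
        return (endpoint, key);
    }
}
```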

Member

I thought it was a console application, hence my suggestion. User secrets/appsettings.Development.json are generally used for web apps.

Collaborator

dthaler commented Sep 18, 2024

@pastorep what should we do with this PR now?

Collaborator

dthaler commented Sep 18, 2024

@micya what should we do with this PR now?

Collaborator

micowan commented Sep 18, 2024

@dthaler I don't think this will affect any of the existing UI/APIs. It might be worth holding onto for future reference, since it purports to export the data in CSV format.
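
For future reference, a minimal sketch of what the CSV step could look like using only the standard library. This is not code from the PR (the original description lists CSV output as future work), and the `CsvSketch` class is hypothetical. It quotes fields in the common RFC 4180 style.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CsvSketch
{
    // Quotes a field only when it contains a comma, quote, or line break,
    // doubling any embedded quotes.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0)
        {
            return field;
        }

        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Builds the CSV text: one header line followed by one line per row.
    public static string ToCsv(
        IReadOnlyList<string> header,
        IEnumerable<IReadOnlyList<string>> rows)
    {
        var sb = new StringBuilder();
        sb.Append(string.Join(",", header.Select(Escape))).Append("\r\n");
        foreach (var row in rows)
        {
            sb.Append(string.Join(",", row.Select(Escape))).Append("\r\n");
        }

        return sb.ToString();
    }
}
```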
