Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try optimizing file processing. #60

Open
tdresser opened this issue Jan 6, 2024 · 5 comments
Open

Try optimizing file processing. #60

tdresser opened this issue Jan 6, 2024 · 5 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@tdresser
Copy link

tdresser commented Jan 6, 2024

Parsing a 10MB xml file is pretty slow currently.

I don't have a good intuition for what part is slow. Do you think there might be some low hanging fruit here? Any hunches?

The first step would be profiling.

@tdresser tdresser changed the title Consider adopting a faster xml parser Try optimizing parsing. Jan 7, 2024
@dangowans
Copy link
Member

Thanks for reaching out @tdresser . You are likely right that any slow downs would come from the parser step.

I'm by no means attached to the current parser, xml2js. It was most likely picked because of the significant usage and the Typescript support.

If there's something faster out there, I'm definitely not opposed to switching things up.

dangowans referenced this issue in cityssm/node-green-button-parser Jan 8, 2024
file downloaded from from https://green-button.github.io/samples/
to test issue #40
@dangowans
Copy link
Member

So it looks like with a 26 MB test file, GitHub Actions takes 3.5s and my development machine takes 2.5s. Are your results similar?

@dangowans dangowans added enhancement New feature or request help wanted Extra attention is needed labels Jan 8, 2024
@tdresser
Copy link
Author

tdresser commented Jan 8, 2024

I think I'm using the term "Parsing" inaccurately here.

What I'm trying to refer to is "Time from file upload until it's done processing and I can see the data on the dashboard."
I don't think that's dominated by the XML to json conversion. I'm not sure what all ends up getting done, but it takes O(minutes) for me.

I guess there's a bunch of database writes in there?

Oh, that likely means this issue is on the wrong repo. I'm happy to move this over if there is anything worth looking at.

@dangowans
Copy link
Member

Ok, I think I understand now. I'm going to move this issue to the EMILE repository as the parser doesn't do the uploading, database writing, or visualizing.

@dangowans dangowans transferred this issue from cityssm/node-green-button-parser Jan 9, 2024
@tdresser tdresser changed the title Try optimizing parsing. Try optimizing file processing. Jan 13, 2024
@tdresser
Copy link
Author

Thanks for moving.

I've done a little bit of poking around, and I think I've got a fairly minimal change in mind that should improve things a bunch, assuming I'm not completely misunderstanding how this works!

The vast majority of the time (based on a couple performance traces) is spent in refreshAggregatedEnergyDataTables.

IIUC, this needs to be run after any asset is updated.
Today, we run this after every data point is added via addEnergyData.
I think when processing a file, we could do a single call to refreshAggregatedEnergyDataTables per assetID modified.

Does that make sense? I guess today you might be able to view data from a partially processed file, and this would eliminate that possibility, but on my machine, I think this would be a ~10x speedup.

Am I understanding this correctly?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants