Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BTREE indexes to be case sensitive and GIN are case insensitive #335

Open
wakibi opened this issue Sep 21, 2020 · 18 comments
Open

BTREE indexes to be case sensitive and GIN are case insensitive #335

wakibi opened this issue Sep 21, 2020 · 18 comments

Comments

@wakibi
Copy link
Contributor

wakibi commented Sep 21, 2020

By their nature, BTREE indexes are case sensitive and that should be included in our documentation. Removing the case sensitivity on BTREEs removes the performance gains they were used in the first place. Users need made be aware of this

@stale
Copy link

stale bot commented Feb 17, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Feb 17, 2021
@stale stale bot closed this as completed Feb 24, 2021
@edwinmp edwinmp reopened this Feb 24, 2021
@stale stale bot removed the wontfix This will not be worked on label Feb 24, 2021
@stale
Copy link

stale bot commented Apr 26, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Apr 26, 2021
@k8hughes
Copy link
Collaborator

I think we still need this.

@stale stale bot removed the wontfix This will not be worked on label Apr 26, 2021
@edwinmp edwinmp added pinned and removed pinned labels Apr 26, 2021
@k8hughes
Copy link
Collaborator

To be able to work on this we'd need to know the collumns that are most searched by for it to be efficent.

@stale
Copy link

stale bot commented Jun 27, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Jun 27, 2021
@stale stale bot closed this as completed Jul 4, 2021
@edwinmp edwinmp reopened this Jul 4, 2021
@edwinmp edwinmp added pinned and removed wontfix This will not be worked on labels Jul 4, 2021
@wakibi
Copy link
Contributor Author

wakibi commented Jul 9, 2021

There is a blocker here, some of the fields to be indexed are rather too long and the DB can not index them. Suggestions on how to handle this are much welcome @akmiller01 @dean-breed @bill-anderson

@bill-anderson
Copy link

You don't want BTREE indexes on titles and descriptions, do you? In my old money (of my particular old currency) we had cross-reference indexes for free text fields. Is this GiST in new money?

@wakibi
Copy link
Contributor Author

wakibi commented Jul 9, 2021

That's right @bill-anderson. GiST and GIN are to be used for those extremely long text columns, but they too have a max string length (which is about 8000 characters) they can be applied to. Unfortunately, some of our data exceeds that length!

@bill-anderson
Copy link

So what I think we should do is move this into an ETL discussion to limit all free-text fields to, say, 5000 characters. I don't think that will affect the integrity of the data. But we should do a bit of research on the long culprits.

@dean-breed
Copy link
Collaborator

@wakibi Would it be possible to get a list of reporting organisations and columns which would be truncated?

@dean-breed
Copy link
Collaborator

@wakibi Can I get this list?

@edwinmp
Copy link
Collaborator

edwinmp commented Apr 21, 2022

@dean-breed sorry Chris is on leave ... let me stick this on Slack where he can't miss it.

@wakibi
Copy link
Contributor Author

wakibi commented Apr 25, 2022

@dean-breed Find attached. I have added all those with more than 7500 (my reasoning is those approaching that figure could easily soon hit the 8000 ceiling as well).
long_columns.csv

@dean-breed
Copy link
Collaborator

@wakibi - I could do something on this myself but to avoid mistake likelihood. Could you output the reporting org reference also?

@wakibi
Copy link
Contributor Author

wakibi commented Apr 25, 2022

@wakibi - I could do something on this myself but to avoid mistake likelihood. Could you output the reporting org reference also?

long_columns.csv

Attached. I had forgotten to format the first one.

@dean-breed
Copy link
Collaborator

Hi Chris @wakibi, is it possible to output this list as part of the ETL process? I could then include the information in the data quality file?

@wakibi
Copy link
Contributor Author

wakibi commented May 3, 2022

Hi @dean-breed, do you want it sent as an email attachment?

@dean-breed
Copy link
Collaborator

That would be great if it could. Or just stored in a repo somewhere. Either works @wakibi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants