Conversation

Collaborator

@micpst micpst commented Jul 10, 2024

This PR restructures the benchmarks folder, integrates the benchmarking pipeline with Hugging Face, and adds new metrics for IQL evaluation: accuracy/precision/recall for decision making, IQL generation, and parseability and correctness rates.
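For readers unfamiliar with the decision-making metrics named above, here is a minimal sketch of how accuracy, precision and recall are conventionally computed over binary decisions. It is not the code from this PR: the names `DecisionScores` and `score_decisions` are hypothetical, and it assumes each benchmark example reduces to an expected and a predicted boolean decision.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DecisionScores:
    """Hypothetical container for the three decision metrics."""

    accuracy: float
    precision: float
    recall: float


def score_decisions(expected: Sequence[bool], predicted: Sequence[bool]) -> DecisionScores:
    """Compute accuracy, precision and recall for paired binary decisions.

    Assumption: one boolean per benchmark example; this is an illustration,
    not the metric implementation added by the PR.
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("at least one example is required")

    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e and p)          # true positives
    fp = sum(1 for e, p in pairs if not e and p)      # false positives
    fn = sum(1 for e, p in pairs if e and not p)      # false negatives
    tn = sum(1 for e, p in pairs if not e and not p)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Report 0.0 when a denominator is empty, rather than dividing by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return DecisionScores(accuracy=accuracy, precision=precision, recall=recall)


scores = score_decisions(
    expected=[True, True, True, False],
    predicted=[True, True, False, False],
)
print(scores)
```

Returning 0.0 for an empty denominator is one common convention; a real pipeline might instead skip the metric or report it as undefined.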

@micpst micpst added the refactor label (Code change that neither fixes a bug nor adds a feature) Jul 10, 2024
@micpst micpst self-assigned this Jul 10, 2024
Trivy scanning results.

github-actions bot commented Jul 10, 2024

Code Coverage Summary

Filename                                                  Stmts    Miss  Cover    Missing
------------------------------------------------------  -------  ------  -------  ---------------------------------------------------------------------
dbally/_main.py                                              13       1  92.31%   10
dbally/_types.py                                              8       1  87.50%   24
dbally/exceptions.py                                          1       0  100.00%
dbally/assistants/base.py                                    24       0  100.00%
dbally/assistants/openai.py                                  59       2  96.61%   59-76
dbally/audit/event_tracker.py                                36       8  77.78%   38-40, 53, 64, 74, 91, 97
dbally/audit/events.py                                       34       0  100.00%
dbally/audit/spans.py                                         7       0  100.00%
dbally/audit/event_handlers/base.py                          15       0  100.00%
dbally/audit/event_handlers/buffer_event_handler.py           8       0  100.00%
dbally/audit/event_handlers/cli_event_handler.py             56      35  37.50%   11-13, 47-55, 65-66, 79-98, 120-127, 138-145
dbally/audit/event_handlers/langsmith_event_handler.py       29      25  13.79%   6-106
dbally/audit/event_handlers/otel_event_handler.py            74      27  63.51%   19, 123, 126, 138-139, 159-179, 191-199, 209-219, 222-223
dbally/collection/collection.py                             126       3  97.62%   136, 153, 325
dbally/collection/exceptions.py                              13       0  100.00%
dbally/collection/results.py                                 14       0  100.00%
dbally/embeddings/base.py                                     5       0  100.00%
dbally/embeddings/exceptions.py                              15       6  60.00%   10-11, 20, 29-30, 39
dbally/embeddings/litellm.py                                 28      12  57.14%   7-8, 44, 68-84
dbally/gradio/gradio_interface.py                           111     111  0.00%    1-301
dbally/iql/_exceptions.py                                    49       1  97.96%   74
dbally/iql/_processor.py                                     84       5  94.05%   20, 75, 81, 87, 102
dbally/iql/_query.py                                         17       1  94.12%   8
dbally/iql/_type_validators.py                               38       2  94.74%   24, 28
dbally/iql/syntax.py                                         36       9  75.00%   6-9, 27, 36, 60, 63-66
dbally/iql_generator/iql_generator.py                        31       2  93.55%   89-90
dbally/iql_generator/prompt.py                               16       1  93.75%   33
dbally/llms/base.py                                          28       1  96.43%   34
dbally/llms/litellm.py                                       24      10  58.33%   8-9, 48-54, 61, 78
dbally/llms/local.py                                         18      18  0.00%    1-60
dbally/llms/clients/base.py                                  23       2  91.30%   46-47
dbally/llms/clients/exceptions.py                            15       6  60.00%   10-11, 20, 29-30, 39
dbally/llms/clients/litellm.py                               44      21  52.27%   8-9, 65-71, 97-120
dbally/llms/clients/local.py                                 33      33  0.00%    1-95
dbally/nl_responder/nl_responder.py                          24       4  83.33%   74-85
dbally/nl_responder/prompts.py                               19       4  78.95%   64-67
dbally/prompt/elements.py                                    25       1  96.00%   29
dbally/prompt/template.py                                    65       7  89.23%   33, 41, 49, 110, 127, 153, 205
dbally/similarity/chroma_store.py                            37       0  100.00%
dbally/similarity/elastic_vector_search.py                   19      16  15.79%   5-102
dbally/similarity/elasticsearch_store.py                     22      19  13.64%   5-107
dbally/similarity/faiss_store.py                             38      35  7.90%    5-103
dbally/similarity/fetcher.py                                  5       0  100.00%
dbally/similarity/index.py                                   26       0  100.00%
dbally/similarity/sqlalchemy_base.py                         44      19  56.82%   35-37, 46, 68, 77, 86-89, 99-105, 123-126
dbally/similarity/store.py                                    7       0  100.00%
dbally/view_selection/base.py                                 7       0  100.00%
dbally/view_selection/llm_view_selector.py                   17       0  100.00%
dbally/view_selection/prompt.py                               9       0  100.00%
dbally/view_selection/random_view_selector.py                10      10  0.00%    1-36
dbally/views/base.py                                         16       1  93.75%   52
dbally/views/decorators.py                                    6       0  100.00%
dbally/views/exceptions.py                                    8       4  50.00%   23-26
dbally/views/exposed_functions.py                            33       1  96.97%   24
dbally/views/methods_base.py                                 34       2  94.12%   75, 83
dbally/views/pandas_base.py                                  33       1  96.97%   64
dbally/views/sqlalchemy_base.py                              37       7  81.08%   48, 63-65, 83-87
dbally/views/structured.py                                   46       4  91.30%   80-87
dbally/views/freeform/text2sql/config.py                     21       1  95.24%   47
dbally/views/freeform/text2sql/exceptions.py                  7       3  57.14%   12-14
dbally/views/freeform/text2sql/prompt.py                     11       0  100.00%
dbally/views/freeform/text2sql/view.py                       93      22  76.34%   49, 52, 55, 58, 61, 73, 151, 155-158, 161, 191, 206-207, 215, 228-233
dbally_cli/main.py                                            5       5  0.00%    1-13
dbally_cli/text2sql.py                                       94      94  0.00%    1-248
dbally_codegen/autodiscovery.py                             122      18  85.25%   54-57, 241-243, 264-278, 281-284, 353-354, 450-455
dbally_codegen/generator.py                                 175       7  96.00%   81, 91, 314, 342, 360, 374, 420
TOTAL                                                      2247     628  72.05%

Diff against main

Filename                                 Stmts    Miss  Cover
-------------------------------------  -------  ------  -------
dbally/iql_generator/iql_generator.py       +4      +2  -6.45%
dbally/views/exceptions.py                  +8      +4  +50.00%
dbally/views/structured.py                  +8      +4  -8.70%
TOTAL                                      +20     +10  -0.20%

Results for commit: cb70ee9

Minimum allowed coverage is 60%
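
As a sanity check on the summary above, the sketch below recomputes the TOTAL coverage from the reported statement and miss counts, derives the `main` baseline from the diff row, and compares the result against the 60% threshold. The helper name `coverage_percent` is hypothetical, and the formula `(Stmts - Miss) / Stmts` is the usual statement-coverage definition, assumed here rather than taken from the bot's source.

```python
def coverage_percent(statements: int, missed: int) -> float:
    """Statement coverage as a percentage, rounded to two decimals."""
    if statements <= 0:
        raise ValueError("statements must be positive")
    return round(100 * (statements - missed) / statements, 2)


# Counts copied from the TOTAL rows of the two tables in this comment.
TOTAL_STMTS, TOTAL_MISS = 2247, 628
DIFF_STMTS, DIFF_MISS = 20, 10
MINIMUM_ALLOWED = 60.0

current = coverage_percent(TOTAL_STMTS, TOTAL_MISS)
# The diff row reports changes relative to main, so subtract to get the baseline.
baseline = coverage_percent(TOTAL_STMTS - DIFF_STMTS, TOTAL_MISS - DIFF_MISS)
delta = round(current - baseline, 2)

print(f"current={current}% baseline={baseline}% delta={delta:+.2f}%")
print("passes threshold:", current >= MINIMUM_ALLOWED)
```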

♻️ This comment has been updated with latest results

@micpst micpst force-pushed the mp/refactor-benchmarks branch from 1731da4 to db2207c July 10, 2024 08:32
@micpst micpst marked this pull request as ready for review July 24, 2024 08:56
Member

@mhordynski mhordynski left a comment

impressive work, thanks @micpst 🥇

@micpst micpst merged commit cd2cece into main Aug 6, 2024
3 checks passed
@micpst micpst deleted the mp/refactor-benchmarks branch August 6, 2024 09:25
Labels
refactor Code change that neither fixes a bug nor adds a feature
Projects
Status: Done
2 participants