
Rework query handling #184

Merged: 47 commits merged on Apr 11, 2023

@frensing (Contributor) commented Oct 31, 2022

Rework of the query handling

Each Worker gets its own QueryHandler.

For example, the updated config will have the following structure:

```yaml
# ...
workers:
  - className: "WorkerName"
    queries:
      location: "path/to/file"
      format: "one-per-line"
      caching: true
      order: linear
      pattern:
        endpoint: "http://localhost:3030/sparql"
        outputFolder: "queryCache"
        limit: 10
      lang: "lang.SPARQL"
# ...
```

QueryHandler

Each QueryHandler has:

  • location of the query file or folder containing the query files
  • QuerySet, containing all the queries from the file (or folder)
  • QuerySelector, which generates the index of the next query
  • langProcessor to generate TripleStats
  • pattern to generate queries from pattern queries
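
The following Java sketch illustrates how a per-worker QueryHandler could combine a QuerySet with a QuerySelector to serve the next query. It is a minimal illustration under assumptions: the method names `getQueryAtIndex`, `getNextIndex`, and `getNextQuery` are hypothetical and not taken from the IGUANA code base, and the location, langProcessor, and pattern members are omitted.

```java
// Hypothetical, simplified abstractions; the real IGUANA signatures may differ.
interface QuerySet {
    int size();
    String getQueryAtIndex(int index);
}

interface QuerySelector {
    int getNextIndex();
}

// Sketch of a per-worker QueryHandler: the selector decides *which* query,
// the query set decides *how* it is loaded (from memory or from disk).
class QueryHandler {
    private final QuerySet querySet;
    private final QuerySelector selector;

    QueryHandler(QuerySet querySet, QuerySelector selector) {
        this.querySet = querySet;
        this.selector = selector;
    }

    // Asks the selector for the next index and loads that query from the set.
    String getNextQuery() {
        return querySet.getQueryAtIndex(selector.getNextIndex());
    }
}
```

Because the selector only produces indices, the same QuerySet can be driven linearly or randomly without changing how the queries are stored.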

QuerySet

The QuerySet is either in-memory or file-based.

  • InMemoryQuerySet loads all the queries into Strings in memory when initializing.
  • FileBasedQuerySet retrieves a query directly from the file when it is requested.

Set the config option caching to true for an in-memory QuerySet or to false for a file-based one.

Each QuerySet has a QuerySource from which the queries are read.
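
A minimal Java sketch of the two QuerySet variants follows. It assumes a simplified, hypothetical QuerySource interface with `size()` and `getQuery(int)`; the real IGUANA signatures may differ. The static `QuerySets.create` method (also hypothetical) mirrors the caching option.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical source abstraction; a real file-backed source would wrap I/O errors.
interface QuerySource {
    int size();
    String getQuery(int index);
}

interface QuerySet {
    int size();
    String getQueryAtIndex(int index);
}

// caching: true -> read every query from the source once and keep the Strings in memory.
class InMemoryQuerySet implements QuerySet {
    private final List<String> queries = new ArrayList<>();

    InMemoryQuerySet(QuerySource source) {
        int n = source.size();
        for (int i = 0; i < n; i++) {
            queries.add(source.getQuery(i));
        }
    }

    public int size() { return queries.size(); }
    public String getQueryAtIndex(int index) { return queries.get(index); }
}

// caching: false -> delegate every request to the source (e.g. re-read the file).
class FileBasedQuerySet implements QuerySet {
    private final QuerySource source;

    FileBasedQuerySet(QuerySource source) { this.source = source; }

    public int size() { return source.size(); }
    public String getQueryAtIndex(int index) { return source.getQuery(index); }
}

// Picks the QuerySet implementation from the caching flag.
class QuerySets {
    static QuerySet create(QuerySource source, boolean caching) {
        return caching ? new InMemoryQuerySet(source) : new FileBasedQuerySet(source);
    }
}
```

The trade-off is the usual one: the in-memory set pays the read cost once at startup and holds every query in RAM, while the file-based set keeps memory usage flat but goes back to the source on every request.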

QuerySource

A QuerySource wraps access to the query files.
Three QuerySources are implemented:

  • FileLineQuerySource expects a query file with one query per line
  • FileSeparatorQuerySource expects a file with (multi-line) queries separated by a separator line. The default separator line is "###".
  • FolderQuerySource expects a directory with query files that each contain one (multi-line) query
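
The splitting rules of the three sources can be sketched in Java as follows. The class and method names are hypothetical, and details such as skipping blank lines, trimming whitespace, and ordering folder entries by file name are assumptions rather than documented IGUANA behavior.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helpers showing how each QuerySource variant could split its input.
final class QuerySourceSketch {
    static final String DEFAULT_SEPARATOR = "###";

    // FileLineQuerySource: every non-blank line is one query (blank-line skipping assumed).
    static List<String> onePerLine(String content) {
        List<String> queries = new ArrayList<>();
        for (String line : content.split("\\R")) {
            if (!line.isBlank()) queries.add(line.strip());
        }
        return queries;
    }

    // FileSeparatorQuerySource: queries may span several lines and are
    // delimited by a line that consists only of the separator.
    static List<String> separated(String content, String separator) {
        List<String> queries = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : content.split("\\R")) {
            if (line.strip().equals(separator)) {
                addIfNotBlank(queries, current);
                current.setLength(0);
            } else {
                current.append(line).append('\n');
            }
        }
        addIfNotBlank(queries, current); // last query has no trailing separator
        return queries;
    }

    private static void addIfNotBlank(List<String> queries, StringBuilder sb) {
        String query = sb.toString().strip();
        if (!query.isEmpty()) queries.add(query);
    }

    // FolderQuerySource: each regular file in the directory holds one query,
    // read in file-name order (ordering is an assumption).
    static List<String> fromFolder(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            List<Path> sorted = files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            List<String> queries = new ArrayList<>();
            for (Path p : sorted) queries.add(Files.readString(p).strip());
            return queries;
        }
    }
}
```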

QuerySelector

A QuerySelector is essentially a number generator that returns the index of the next query to load.
Two QuerySelectors are implemented:

  • LinearQuerySelector returns each index in ascending order, restarting at 0 after the last one.
  • RandomQuerySelector uses java.util.Random to generate the next index. The seed is either provided in the config or, otherwise, the workerID is used.
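
A minimal Java sketch of the two selectors, assuming a hypothetical QuerySelector interface with a single `getNextIndex()` method; the constructor parameters are illustrative only. Seeding java.util.Random with a fixed value makes the index sequence reproducible, so falling back to the workerID still yields a deterministic order that typically differs between workers.

```java
import java.util.Random;

// Hypothetical interface; the real IGUANA QuerySelector API may differ.
interface QuerySelector {
    int getNextIndex();
}

// Returns 0, 1, ..., size - 1 and then wraps around to 0.
class LinearQuerySelector implements QuerySelector {
    private final int size;
    private int next = 0;

    LinearQuerySelector(int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        this.size = size;
    }

    public int getNextIndex() {
        int index = next;
        next = (next + 1) % size;
        return index;
    }
}

// Draws uniformly random indices; uses the configured seed if present,
// otherwise the worker ID (assumed fallback behavior).
class RandomQuerySelector implements QuerySelector {
    private final int size;
    private final Random random;

    RandomQuerySelector(int size, Long configSeed, int workerId) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        this.size = size;
        this.random = new Random(configSeed != null ? configSeed : workerId);
    }

    public int getNextIndex() {
        return random.nextInt(size); // in [0, size)
    }
}
```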

TODO

  • update documentation
  • update javadoc
  • update IGUANA schema

@frensing requested a review from bigerl on October 31, 2022 13:56
@frensing marked this pull request as ready for review on November 4, 2022 19:12
@frensing (Contributor, Author) commented Nov 4, 2022

All prior functionality is now covered by the new QueryHandler.
All prior test cases have been updated where applicable and pass.

The next step is to update the documentation.

@bigerl (Member) left a comment

Thank you for the PR. It looks very good. I've pointed out only a few minor things and added some questions here and there.

As you wrote, the remaining todos are:

  • JavaDoc
  • update the documentation

@bigerl linked an issue on Mar 8, 2023 that may be closed by this pull request
@bigerl linked an issue on Mar 22, 2023 that may be closed by this pull request
@nck-mlcnv (Contributor) commented Mar 22, 2023

Things left to do:

  • update the development section of the documentation
  • update the README.md file

@nck-mlcnv self-assigned this on Apr 3, 2023

@bigerl (Member) left a comment

The documentation looks very good, and so does the code around query handling. In general, though, there still seems to be a long way to go.

  • [ ] Besides the comments, please review whether we really need both interfaces and abstract classes for the various abstractions. It seems to me that the interfaces add no real value and could be merged into the abstract classes.

@bigerl (Member) left a comment

@nck-mlcnv

  • Was SPARQLWorker removed from the documentation?
  • Please double-check that the removed abstractions are not referenced in the documentation.
  • Update the documentation to reflect the renaming of QueryList and getQuery.

Please check the boxes here when done.

@bigerl merged commit efd02d5 into develop on Apr 11, 2023
@bigerl deleted the feature/rework-query-handling branch on April 11, 2023 10:52
Successfully merging this pull request may close these issues:

  • Support for non-random query chooser
  • UPDATEWorker does not initialize
3 participants