From 7479b340874401cd1ac905098117d364a759503f Mon Sep 17 00:00:00 2001 From: Matthew Middlehurst Date: Thu, 30 Mar 2023 15:34:58 +0100 Subject: [PATCH] 030323 meeting --- meeting_notes/03-03-2023.md | 133 ++++++++++++++++++++++++++++++++++++ 1 file changed, 133 insertions(+) create mode 100644 meeting_notes/03-03-2023.md diff --git a/meeting_notes/03-03-2023.md b/meeting_notes/03-03-2023.md new file mode 100644 index 0000000..efea2b3 --- /dev/null +++ b/meeting_notes/03-03-2023.md @@ -0,0 +1,133 @@ +# aeon meeting 3/3/23 +###### tags: `aeon` + +Suggested agenda/discussion points + +### 1. Fork: +name: public vote? final decision then pypi/conda/web domain/core package name/slack + +Decision about the name: `aeon`: + +- pypi: `aeon` +- python package: `aeon` +- github org: `aeon-toolbox`, `aeon-project`, `aeon-ml` +- github repo: `aeon` +- domain name: +- slack: + +__Summary__: Final decision made to use the name `aeon`, private poll to decide on the alternate name for github, domain name. + +### 2. Mission statement: +how do we differentiate? My initial text, but feel free to completely re-write! + +Hello sktime community, + +Eight of the core developers from the sktime project, including two community council members, are splitting from the project. The new repository will be a fork of the v0.16.0 sktime release. This decision was reached after much consideration and was influenced by various factors, including design decisions and interactions between developers. +We understand that some of the recent communication between sktime developers may have been perceived as hostile, and we apologize for any discomfort this may have caused. Our aim is to create a positive and collaborative environment for everyone involved in the project going forward. +Our new toolkit will be called `aeon`. +With `aeon`, we have an opportunity to refocus on our core values and make significant changes. While a lot will remain the same in the new project, the first release will have some breaking changes compared to sktime to meet our new design philosophy. We invite all members of the community to provide their input and share their ideas on how they would like to see the project evolve. We start with these principles: +a. Our governance structure will be simple and non-hierarchical. +b. Our code will lightweight and easy to maintain. +c. Our approach will be friendly and inclusive. + + +__Summary__: Still todo. Need to decide on governance, major design decisions (transformers). How much do we want link to and compare to sktime. Will this be posted with the first release on new social media? + +### 3. Transformers +the issues: + +3.1. input/output types: machine learning side wants numpy input/output not pd.multiindex +Nearly all of the panel transformers use nested_univ internal data type (so called X_inner_mtype). This should be changed to numpy, then extended to lists of numpy once introduced. + +__MM__: Probably make the switch for classification/regression/clustering usages i.e. ShapeletTransform, RandomIntervals. Some panel stuff is implemented for forecasting though? +__Tony__: Yes not proposing removing it, just allowing the tsml type classifiers to use numpy + +3.2. 2d numpy: Classification et al wants 2D numpy to represent [n_series][n_timepoints], forecasting wants 2D numpy to represent [n_timepoints][n_series] + +__MM__: I dont really see a way to solve thing other than one side giving up the dtype, or splitting back into 'panel' and 'series' or 'classification/regression/clustering' and 'forecasting' transformers. +If we want to be as compatible to sklearn as possible on the machine learning side, having 2d input accepted as a univariate dataset (shape n,1,m) is important. +__Tony__: The current model for doing it would be with a tag + +__Lukasz__: Before jumping to solutions, how about we look at bit closer at what problem specifically are we trying to address. I would really like to add at least what are are the candiate data represenations for learning tasks or algos to be able to find commonalities/differences before making a call. + +- could we define what an _observation_ is for different time series learning tasks? +- same vs different length series +- same vs different sampling frequencies +- ideally we would need a data container that is capable of mapping keys (identifiers) to sequences +- do we have specific examples where transformers are shared between different learning tasks i.e. classification & forecasting? +- could we require flag to transformers to specify what an observation it? +- for transformers could we have a common core tranformers and two interfaces: 1 for forecasting like problems, 2 for classiffication like problems? +- separate data converstion from format conversion + + +3.3. so called "Vectorization" .... + +__MM__: This could still be done using converters if we were to split transformations, but not implciitly anymore. + + +__Summary__: Still needs further discussion on actions to take. Seems to be generally agreed that the current implicit conversion of 2D data is not the design we want. A few options: +- Require a flag from the user telling transformers how to process 2D data when it is input +- Split ML and forecasting tasks by datatype, i.e. ML uses numpy and forecasting Pandas fully +- Split ML and forecasting tasks by transformer, i.e. have a separate package for both tasks and require explicit conversion between both + +### 4. Governance: evolution or revolution? +current roles: + +- Core developer +- Community council (CC) +- Code of conduct committee (CoCC) +- Algorithm maintainer (in codeowners file) + +__Lukasz__: I would be in favor of adjusting the bar for becoming a core dev to sth like: if you made a non-trivial contribution to the project you get write permission and become a commiter. I would very much like to have a technical team/commitee responsible for bigegr decisions regarding technical roadmap and strategy and providing guidance and coaching to the community members. It is also a good opportunity to discuss the responsibilities of CC since I'm not sure I understand what was the role before. We need to give someone admin/ownerhip of digital resources and ideally that group would make a committment to tenure to avoid gaps, who that should ideally be is up for discussion. Lastly I do not see value in specific role of algorithm maintainer, it creates territorial behaviors which might be counterproductive. I'm inerested in using CODEOWNERS file as a mechanism to "subscribe" to notification about specific module/file changes so that I can be in the loop for reviewing PRs and participaring in discussions. + +__Leo__: BDFL + +- [ ] ⚙️ Please put your thoughts in [this issue](https://github.com/scikit-time/scikit-time/issues/2) +- [ ] reach out to numfocus to get recommendations for gov + +__Summary__: Light discussion with no conclusion. Ideas for governance should be posted in the issue and generally agreed that running any decision by numfocus would be a good idea. + + +### 5. Project milestones + +6.1. renaming +6.2. release +6.3. pydata +6.4. numFOCUS + + +__Summary__: Agreed that applying for NumFocus affiliate status would be a good mid-term goal. + +## Not discussed (future topics): + + +6. Developer meetings + Start developer meeting if someone wants to call one (Tony to start with classification) + Do we want a roadmap for each module? e.g. +https://hackmd.io/7hzbOJGBRhSCJDEuZWs7kg + +__MW__: I think would be good to have a general community meeting (standup and get2know of 15 minutes) frequently to welcome new people. We could rotate in who is moderating the meeting. + +7. Docs +Tony: I'm not clear how it all gets generated, where it goes etc, some clarification would help + + 1. docs module: sphinx builds html pages and puts them on readthedocs, which is redirected to a website by a hook? All the files in docs are manually maintained + 2. docstrings: written to the api as determined by api_reference + 3. examples and docs/source/examples: examples directory manually maintained and only viewable through the source code. docs/source/examples autogenerated from docs. + 4. Root Dir: mark down files linked from readme + README.md + CODE_OF_CONDUCT.md + CODEOWNERS + CONTRIBUTING.md + ESTIMATOR_OVERVIEW.md + estimator_overview_table.md + GOVERNANCE.md + LICENCE + +8. First release? + +- when? what? versioning? + +9. Future topics + +- DL deps: tensorflow vs pytorch vs ... \ No newline at end of file