-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modifications made in response to suggestions 1 and 3 raised by Mr. Daniel during the TDML meeting #21
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to make pytorch related dependencies optional, I replaced requirements.txt with pyproject.toml, and I believe pyproject will perform better in this regard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the way to go. pyproject.toml offers more options and more modern tools are using it as the default option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to keep the basic configuration such as version number in setup.py so that I can obtain the version from .bunmversion.cfg, and I am wondering if it is appropriate to include TensorFlow related dependencies in optional dependencies as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, tensorflow is a big package. pandas is also big, but not that big (if I remember correctly, tensorflow will add 1-2 Gb to the installed dependencies). But we would need to see if the basic functionality is using pandas or not. If not, leaving to optional it would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, it seems that the entire project did not use pandas. Perhaps I can delete it directly in the next pull request...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the previous update, I made some modifications to the names of some classes. This are updaptes and adjustments for all files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although by definition, these two parameters have default values. But if these default values are not included in the user's dataset, an error will occur in the test, just like WHU-building.json. I think it might be more reasonable to set it to None here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. What is the standard specifying in here? (this is more a note for me to check next week)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the standard:
"confidence": {
"type": "number",
"default": 1.0,
"minimum": 0.0,
"maximum": 1.0
},
"isNegative": {
"type": "boolean",
"default": false
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the standard is defining a default value, so setting the values to None
will make the implementation not according with the standard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will change it to the previous form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I think we need to modify the JSON schemas case because these two default values are not included in the examples, such as WHU-building.json, which would cause our test to fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this seems unreasonable because by definition isNegative and confidence are not mandatory fields. But they also have default values. The definition here seems somewhat contradictory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the definition, the type of extension is EX-Extent. But in practical use, it may be necessary to refer to WHU-building.json to preserve the usage of simple arrays with multiple coordinates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another note to check next week, but I remember that when I was making a more deep review, there was something about the definition of extent
in the standard version 1.0 that is really difficult to model with pydantic. The changes in the version 1.1 made the pydantic code straightforward, but I don't recall it it was this particular case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I checked the version 1.1:
"extent": {
"oneOf": [
{
"type": "array",
"items": {
"type": "number"
},
"minItems": 4
},
{
"$ref": "https://raw.githubusercontent.com/opengeospatial/TrainingDML-AI_SWG/main/schemas/1.1/json_schema/ex_extent.json"
}
]
}
So I think the modification is right here.
As for the if branch, I have found that once Union is involved in the class definition, pydantic often cannot correctly implement the function, so there is such a code logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is the definition in version 1.1 and the one proposed in this PR, but as far as I know, the current implementation of the pytdml package should comply with the version 1.0 of the standard.
In this particular case, I am in doubt on how to proceed. The relation between EX_Extent
and EOTrainingDataset
in 1.0 and 1.1 are equivalent in practice, but coding the relation as it is defined in 1.0 would generate quite a lot of complexity, as defining different type roots in a pydantic class is not straightforward. We might need to ask about this in the working group, but I am inclined to model this after 1.1 (only for this relation) and add a comment indicating why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the current implementation of the pytdml package should comply with the version 1.0 of the standard.
Perhaps I can find a time to ask my teacher or bring it up at the next meeting. But I think this project is basically led by both of us independently. So I think just add a comment indicating why is OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some formatting issues with this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The modifications here ensure that the workflow is configured correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe these are related to pydantic 2.0 👍
One thing that would need to be solved is the conflict in the files, this is due to the fork not being updated. https://github.com/Relifest/pytdml needs to be rebased with the last changes in https://github.com/openrsgis/pytdml/ before being able to merge |
I have already resolved it. In addition, I have set a rule for the master branch that requires a review before merging. You can check and see if there are any other rules that need to be added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is lovely to see the green check in the tests, congratulations!
One question that might not be related with this Pull Request I have: how do we know the difference between basic types and extended types? I don't recall seeing the difference stated in the standard spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the way to go. pyproject.toml offers more options and more modern tools are using it as the default option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, tensorflow is a big package. pandas is also big, but not that big (if I remember correctly, tensorflow will add 1-2 Gb to the installed dependencies). But we would need to see if the basic functionality is using pandas or not. If not, leaving to optional it would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. What is the standard specifying in here? (this is more a note for me to check next week)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another note to check next week, but I remember that when I was making a more deep review, there was something about the definition of extent
in the standard version 1.0 that is really difficult to model with pydantic. The changes in the version 1.1 made the pydantic code straightforward, but I don't recall it it was this particular case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe these are related to pydantic 2.0 👍
The extranted_types here mainly refer to the extension of classes in basic_types, involving the extension of functionality or the division of functional boundaries. On the other hand, the classes within basic_types that have inheritance relationships with each other are mainly defined iterations. This does not seem to clearly indicate the difference between the two in the standard. It should just be an idea generated during the code implementation process at that time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in the second review, I have important comments that should have been addressed before merging. The current merged code is not compatible with the standard definition.
One is about the relation between EX_Extent
and EOTrainingDataset
; the other is about default values no longer being taken into consideration after merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is the definition in version 1.1 and the one proposed in this PR, but as far as I know, the current implementation of the pytdml package should comply with the version 1.0 of the standard.
In this particular case, I am in doubt on how to proceed. The relation between EX_Extent
and EOTrainingDataset
in 1.0 and 1.1 are equivalent in practice, but coding the relation as it is defined in 1.0 would generate quite a lot of complexity, as defining different type roots in a pydantic class is not straightforward. We might need to ask about this in the working group, but I am inclined to model this after 1.1 (only for this relation) and add a comment indicating why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the standard is defining a default value, so setting the values to None
will make the implementation not according with the standard
This modification mainly involves The structure of the package and separation of concerns和Continuous integration and configuring the GitHub actions and flows. I have already sent an email to Daniel last week regarding the specific implementation process. There are still something that may be not suitable and need to be modified, and I hope to receive discussion and review. Thanks very much.