Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifications made in response to suggestions 1 and 3 raised by Mr. Daniel during the TDML meeting #21

Merged
merged 15 commits into from
Oct 30, 2024

Conversation

Relifest
Copy link
Collaborator

This modification mainly involves The structure of the package and separation of concerns和Continuous integration and configuring the GitHub actions and flows. I have already sent an email to Daniel last week regarding the specific implementation process. There are still something that may be not suitable and need to be modified, and I hope to receive discussion and review. Thanks very much.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to make pytorch related dependencies optional, I replaced requirements.txt with pyproject.toml, and I believe pyproject will perform better in this regard.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the way to go. pyproject.toml offers more options and more modern tools are using it as the default option

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to keep the basic configuration such as version number in setup.py so that I can obtain the version from .bunmversion.cfg, and I am wondering if it is appropriate to include TensorFlow related dependencies in optional dependencies as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, tensorflow is a big package. pandas is also big, but not that big (if I remember correctly, tensorflow will add 1-2 Gb to the installed dependencies). But we would need to see if the basic functionality is using pandas or not. If not, leaving to optional it would be better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, it seems that the entire project did not use pandas. Perhaps I can delete it directly in the next pull request...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the previous update, I made some modifications to the names of some classes. This are updaptes and adjustments for all files.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although by definition, these two parameters have default values. But if these default values are not included in the user's dataset, an error will occur in the test, just like WHU-building.json. I think it might be more reasonable to set it to None here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. What is the standard specifying in here? (this is more a note for me to check next week)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the standard:
"confidence": {
"type": "number",
"default": 1.0,
"minimum": 0.0,
"maximum": 1.0
},
"isNegative": {
"type": "boolean",
"default": false
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the standard is defining a default value, so setting the values to None will make the implementation not according with the standard

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will change it to the previous form.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I think we need to modify the JSON schemas case because these two default values are not included in the examples, such as WHU-building.json, which would cause our test to fail.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this seems unreasonable because by definition isNegative and confidence are not mandatory fields. But they also have default values. The definition here seems somewhat contradictory.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the definition, the type of extension is EX-Extent. But in practical use, it may be necessary to refer to WHU-building.json to preserve the usage of simple arrays with multiple coordinates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another note to check next week, but I remember that when I was making a more deep review, there was something about the definition of extent in the standard version 1.0 that is really difficult to model with pydantic. The changes in the version 1.1 made the pydantic code straightforward, but I don't recall it it was this particular case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I checked the version 1.1:
"extent": {
"oneOf": [
{
"type": "array",
"items": {
"type": "number"
},
"minItems": 4
},
{
"$ref": "https://raw.githubusercontent.com/opengeospatial/TrainingDML-AI_SWG/main/schemas/1.1/json_schema/ex_extent.json"
}
]
}
So I think the modification is right here.
As for the if branch, I have found that once Union is involved in the class definition, pydantic often cannot correctly implement the function, so there is such a code logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is the definition in version 1.1 and the one proposed in this PR, but as far as I know, the current implementation of the pytdml package should comply with the version 1.0 of the standard.

In this particular case, I am in doubt on how to proceed. The relation between EX_Extent and EOTrainingDataset in 1.0 and 1.1 are equivalent in practice, but coding the relation as it is defined in 1.0 would generate quite a lot of complexity, as defining different type roots in a pydantic class is not straightforward. We might need to ask about this in the working group, but I am inclined to model this after 1.1 (only for this relation) and add a comment indicating why.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the current implementation of the pytdml package should comply with the version 1.0 of the standard.
Perhaps I can find a time to ask my teacher or bring it up at the next meeting. But I think this project is basically led by both of us independently. So I think just add a comment indicating why is OK.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some formatting issues with this case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The modifications here ensure that the workflow is configured correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe these are related to pydantic 2.0 👍

@photonbit
Copy link
Contributor

One thing that would need to be solved is the conflict in the files, this is due to the fork not being updated. https://github.com/Relifest/pytdml needs to be rebased with the last changes in https://github.com/openrsgis/pytdml/ before being able to merge

@Relifest Relifest closed this Oct 25, 2024
@Relifest Relifest reopened this Oct 25, 2024
@Relifest
Copy link
Collaborator Author

One thing that would need to be solved is the conflict in the files, this is due to the fork not being updated. https://github.com/Relifest/pytdml needs to be rebased with the last changes in https://github.com/openrsgis/pytdml/ before being able to merge

I have already resolved it. In addition, I have set a rule for the master branch that requires a review before merging. You can check and see if there are any other rules that need to be added.

Copy link
Contributor

@photonbit photonbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is lovely to see the green check in the tests, congratulations!

One question that might not be related with this Pull Request I have: how do we know the difference between basic types and extended types? I don't recall seeing the difference stated in the standard spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the way to go. pyproject.toml offers more options and more modern tools are using it as the default option

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, tensorflow is a big package. pandas is also big, but not that big (if I remember correctly, tensorflow will add 1-2 Gb to the installed dependencies). But we would need to see if the basic functionality is using pandas or not. If not, leaving to optional it would be better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. What is the standard specifying in here? (this is more a note for me to check next week)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another note to check next week, but I remember that when I was making a more deep review, there was something about the definition of extent in the standard version 1.0 that is really difficult to model with pydantic. The changes in the version 1.1 made the pydantic code straightforward, but I don't recall it it was this particular case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe these are related to pydantic 2.0 👍

@Relifest
Copy link
Collaborator Author

One question that might not be related with this Pull Request I have: how do we know the difference between basic types and extended types? I don't recall seeing the difference stated in the standard spec.

It is lovely to see the green check in the tests, congratulations!

One question that might not be related with this Pull Request I have: how do we know the difference between basic types and extended types? I don't recall seeing the difference stated in the standard spec.

The extranted_types here mainly refer to the extension of classes in basic_types, involving the extension of functionality or the division of functional boundaries. On the other hand, the classes within basic_types that have inheritance relationships with each other are mainly defined iterations. This does not seem to clearly indicate the difference between the two in the standard. It should just be an idea generated during the code implementation process at that time.

@Relifest Relifest merged commit e808920 into openrsgis:main Oct 30, 2024
2 checks passed
Copy link
Contributor

@photonbit photonbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in the second review, I have important comments that should have been addressed before merging. The current merged code is not compatible with the standard definition.

One is about the relation between EX_Extent and EOTrainingDataset; the other is about default values no longer being taken into consideration after merging.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is the definition in version 1.1 and the one proposed in this PR, but as far as I know, the current implementation of the pytdml package should comply with the version 1.0 of the standard.

In this particular case, I am in doubt on how to proceed. The relation between EX_Extent and EOTrainingDataset in 1.0 and 1.1 are equivalent in practice, but coding the relation as it is defined in 1.0 would generate quite a lot of complexity, as defining different type roots in a pydantic class is not straightforward. We might need to ask about this in the working group, but I am inclined to model this after 1.1 (only for this relation) and add a comment indicating why.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the standard is defining a default value, so setting the values to None will make the implementation not according with the standard

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants