-
Notifications
You must be signed in to change notification settings - Fork 705
[Design] Intermediate Representation #785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree to have IR now, we still need to look into the details of the IR struct.
| ExtraSelect map[string]string // e.g. {"validation": "select * from iris.val;"} | ||
| Estimator string // e.g. "DNNClassifier" | ||
| Attribute map[string]interface{} // e.g. {"train.epoch": 1000, "model.hidden_units": [10 10]} | ||
| Feature map[string]FieldMeta // e.g. {"sepal_length": {"float", "", [1], false}, ...} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feature only have information about how to parse the column data, we still need information about how to construct a feature column on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I added FeatureColumn field in FieldMeta.
| Select string // e.g. "select * from iris.train" | ||
| ExtraSelect map[string]string // e.g. {"validation": "select * from iris.val;"} | ||
| Estimator string // e.g. "DNNClassifier" | ||
| Attribute map[string]interface{} // e.g. {"train.epoch": 1000, "model.hidden_units": [10 10]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#774 here's some of my thoughts about refactoring attribute resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@typhoonzero Thanks for the link. I think Attribute map[string]interface{} here is fine.
I believe AttrMeta belong to the code generation package since different machine learning toolkits accept different attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe AttrMeta belong to the code generation package since different machine learning toolkits accept different attributes
It is. Each submitter's code generation package define it's AttrMeta and call a function to get an Attribute map[string]interface{}.
| ExtraConfig string // Extra configuration in JSON format. e.g. OSS credential | ||
| Select string // e.g. "select * from iris.train" | ||
| Estimator string // e.g. "DNNClassifier" | ||
| Attribute map[string]interface{} // e.g. {"predict.batch_size": 32} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attribute(s)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think singular is fine. The map type already indicates there are multiple attributes. The same reasoning applies to Feature.
| IsSparse bool // e.g. false | ||
| } | ||
|
|
||
| // TrainIR is the intermediate representation for code generation of a training job |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we don't need to implement an IR for each job, how about simplifying like:
type FeatureMeta struct {
DType string
Delimiter string
...
}
type DBConn struct {
Driver string
User string
....
}
type ClauseIR struct {
Estimator string
SelectClause string
Attributes map[string]interface{}
DBConn DBConn
Features map[string]FeatureMeta
...
}Each generator can extend the ClauseIR as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Yancey1989 Thanks for the suggestion. Combining all three IRs to a single ClauseIR does save some code. However, I still advocate using separate IRs for different job types. Here is my reasoning.
- Avoid confusion. The developer of
xgboost.Predictwould be confused by theValidationSelectfield inClauseIR. Also, as we adding more features to SQLFlow, more fields would be added toCluaseIR, and the confusion will increase. - Less work. We either distinguish the job type in
sqlor incodegen. However, there are manycodegens and only onesql. Distinguishing the job type insqlsaves works in allcodegens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM generally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.