-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[Project Infra] SPARK-1684: Merge script should standardize SPARK-XXX prefix #5149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Added text parsing capability to merge script so that titles are standardized to[MODULE] SPARK-XXX: Description.
Added accuracy tweaks based on test results
Replaced queue with deque
@@ -285,6 +286,56 @@ def resolve_jira_issues(title, merge_branches, comment): | |||
for jira_id in jira_ids: | |||
resolve_jira_issue(merge_branches, comment, jira_id) | |||
|
|||
def standardize_jira_ref(text): | |||
# Standardize the [MODULE] SPARK-XXXXX prefix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be pretty cool to add a doctest here. I've generally found doctests to be pretty helpful when developing parsing / string manipulation stuff like this, such as the pull request title parser that I wrote for spark-prs.appspot.com: https://github.com/databricks/spark-pr-dashboard/blob/0ee9102b7bb30fa61e2f47c3d282c5a018a7d198/sparkprs/utils.py#L55
Unfortunately, this script isn't well-structured to support doctests because the main code isn't in a method. If you want to test this, we can move the lines below this function (starting on 291/341) into a new main() function, then call that from a `if name == 'main' block. It will blow up the size of the diff a bit, but that's not a huge deal since we don't backport changes to this file anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea, Josh. Check out the latest diff and see if it's what you had in mind.
I like what you have in spark-pr-dashboard. Would it make more sense to reference your code from the merge script instead? It's a bit different & would require a bit of finagling, but I hate to have similar code duplicated in different places.
Also corrected a spelling mistake in one of the print statements.
ok to test |
Test build #29305 has finished for PR 5149 at commit
|
Test build #29857 has finished for PR 5149 at commit
|
return clean_text | ||
|
||
def main(): | ||
os.chdir(SPARK_HOME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be sure since it's a bit tricky with the diff here - all of this is simply re-organization, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right - everything inside main() was previously containerless. All statements not part of a function were moved into main() and doctest execution was added at the end. The only new code is the function standardize_jira_ref(), along with its reference on line 365.
Hey @texasmichelle thanks for contributing this. It slipped of my radar but it will be nice to get something like this in. One thing though, even though I originally intended the format to be "SPARK XXX", in practice, pretty much every contributor now puts brackets around that part /cc @srowen. So it has now sort of become the de-facto standard! We should probably update this page to simply tell people to put brackets: So I think what we really want now is to coerce the presence of brackets rather than remove it! If you look at some recent titles, a few of them have this problem. Sorry for some delay in reviewing this, I can address any updates promptly in the next week. Maybe we can start with that pretty simple rule, and then we can expand in subsequent patches to do fancier stuff. The broad organization here looks good. |
If we're being super pedantic, I write |
Thanks for the feedback @pwendell & @srowen. Just to be clear before I refactor the regex's - we're looking for: Also, is this the only place we want this functionality? I couldn't find any other references, but don't know the code base well yet. |
Test build #30502 has finished for PR 5149 at commit
|
Yeah, fair point that most PRs are writing |
OK - changes we discussed are complete. |
Test build #30543 has finished for PR 5149 at commit
|
pr_events = get_json("%s/issues/%s/events" % (GITHUB_API_BASE, pr_num)) | ||
|
||
url = pr["url"] | ||
title = standardize_jira_ref(pr["title"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the title has been modified, I'd prompt the user here for a yes/no as to whether to use the modified title. (i.e. just say "I've re-written the title as follows to match the format: . Would you like to use the new title?"... with and each on a new line). That way if our heuristic is broken in some way the committer can just ignore the re-writing.
This is looking good in terms of functionality. I made some comments to help simplify the code a bit. |
…orig or modified)
Great, detailed suggestions @pwendell - thanks! A few of the things you pointed out were remnants of the old format (no brackets, adding a colon) and could be removed. I initially chose deque for its performance over lists, but considering the scale here, those benefits are negligible. For better maintainability, deque was swapped out. Also, a prompt was added for choosing between the original and modified titles. |
url = pr["url"] | ||
|
||
# Decide whether to use the modified title or not | ||
print "I've re-written the title as follows to match the standard format:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we only do the prompt if we've made a modification? In many cases the title is properly formatted, so we don't need to do this. I'd compute standardize_jira_ref(pr["title"])
earlier on, then check whether it matches pr["title"]
. Only ask the user if it's different.
Left one minor comment about the usability of the prompt - but otherwise looking good. |
Test build #30595 has finished for PR 5149 at commit
|
Test build #30597 has finished for PR 5149 at commit
|
Prompt now disappears if the title is not modified - good call, @pwendell. |
LGTM |
Test build #30615 has finished for PR 5149 at commit
|
Okay I'll pull this in now. |
Ah actually looks like there is a small bug caused by the refactoring (ping @texasmichelle):
|
I should have caught that :/ Fix: I defined one variable as global since it is used in clean_up(). The others are added as additional parameters to the merge_pr() function call. |
Test build #30673 has finished for PR 5149 at commit
|
Thanks! I've merged this in. |
…X prefix Cleans up the pull request title in the merge script to follow conventions outlined in the wiki under Contributing Code. https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCode [MODULE] SPARK-XXXX: Description Author: texasmichelle <texasmichelle@gmail.com> Closes apache#5149 from texasmichelle/master and squashes the following commits: 9b6b0a7 [texasmichelle] resolved variable scope issue 7d5fa20 [texasmichelle] only prompt if title has been modified 8c195bb [texasmichelle] removed erroneous line 4f1ed46 [texasmichelle] Deque removal, logic simplifications, & prompt user to pick a title (orig or modified) df73f6a [texasmichelle] reworked regex's to enforce brackets around JIRA ref 43b5aed [texasmichelle] Merge remote-tracking branch 'apache/master' 25229c6 [texasmichelle] Merge remote-tracking branch 'apache/master' aa20a6e [texasmichelle] Move code into main() and add doctest for new text parsing method 48520ba [texasmichelle] SPARK-1684: Corrected import statement 042099d [texasmichelle] SPARK-1684 Merge script should standardize SPARK-XXX prefix 8f4a7d1 [texasmichelle] SPARK-1684 Merge script should standardize SPARK-XXX prefix
Cleans up the pull request title in the merge script to follow conventions outlined in the wiki under Contributing Code.
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCode
[MODULE] SPARK-XXXX: Description