[Project Infra] SPARK-1684: Merge script should standardize SPARK-XXX prefix #5149

texasmichelle · 2015-03-24T02:06:37Z

Cleans up the pull request title in the merge script to follow conventions outlined in the wiki under Contributing Code.
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCode

[MODULE] SPARK-XXXX: Description

Added text parsing capability to merge script so that titles are standardized to[MODULE] SPARK-XXX: Description.

Added accuracy tweaks based on test results

Replaced queue with deque

JoshRosen · 2015-03-24T07:40:45Z

dev/merge_spark_pr.py

@@ -285,6 +286,56 @@ def resolve_jira_issues(title, merge_branches, comment):
    for jira_id in jira_ids:
        resolve_jira_issue(merge_branches, comment, jira_id)

+def standardize_jira_ref(text):
+    # Standardize the [MODULE] SPARK-XXXXX prefix


It would be pretty cool to add a doctest here. I've generally found doctests to be pretty helpful when developing parsing / string manipulation stuff like this, such as the pull request title parser that I wrote for spark-prs.appspot.com: https://github.com/databricks/spark-pr-dashboard/blob/0ee9102b7bb30fa61e2f47c3d282c5a018a7d198/sparkprs/utils.py#L55

Unfortunately, this script isn't well-structured to support doctests because the main code isn't in a method. If you want to test this, we can move the lines below this function (starting on 291/341) into a new main() function, then call that from a `if name == 'main' block. It will blow up the size of the diff a bit, but that's not a huge deal since we don't backport changes to this file anyways.

Great idea, Josh. Check out the latest diff and see if it's what you had in mind.
I like what you have in spark-pr-dashboard. Would it make more sense to reference your code from the merge script instead? It's a bit different & would require a bit of finagling, but I hate to have similar code duplicated in different places.

Also corrected a spelling mistake in one of the print statements.

srowen · 2015-03-27T13:32:47Z

ok to test

SparkQA · 2015-03-27T15:02:15Z

Test build #29305 has finished for PR 5149 at commit aa20a6e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-04-08T14:23:25Z

Test build #29857 has finished for PR 5149 at commit 25229c6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

pwendell · 2015-04-17T04:26:44Z

dev/merge_spark_pr.py

+    return clean_text
+
+def main():
+    os.chdir(SPARK_HOME)


Just to be sure since it's a bit tricky with the diff here - all of this is simply re-organization, correct?

Right - everything inside main() was previously containerless. All statements not part of a function were moved into main() and doctest execution was added at the end. The only new code is the function standardize_jira_ref(), along with its reference on line 365.

pwendell · 2015-04-17T04:31:39Z

Hey @texasmichelle thanks for contributing this. It slipped of my radar but it will be nice to get something like this in. One thing though, even though I originally intended the format to be "SPARK XXX", in practice, pretty much every contributor now puts brackets around that part /cc @srowen. So it has now sort of become the de-facto standard!

We should probably update this page to simply tell people to put brackets:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

So I think what we really want now is to coerce the presence of brackets rather than remove it! If you look at some recent titles, a few of them have this problem.
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=shortlog

Sorry for some delay in reviewing this, I can address any updates promptly in the next week. Maybe we can start with that pretty simple rule, and then we can expand in subsequent patches to do fancier stuff.

The broad organization here looks good.

srowen · 2015-04-17T08:26:25Z

If we're being super pedantic, I write SPARK-xxxx [COMPONENT] ... and that's what I'll enshrine in the contributing wiki update. I'd say no brackets... well I don't feel strongly but if that's what's in the code and what I think people are mostly doing, let's leave that.

texasmichelle · 2015-04-17T22:03:56Z

Thanks for the feedback @pwendell & @srowen. Just to be clear before I refactor the regex's - we're looking for:
[SPARK-XXXX] [COMPONENT] Description
i.e. a space between closing/opening brackets, no colon before Description, & everything prior to Description forced to uppercase? I'm not crazy about having brackets around the JIRA ref, but it's in the vast majority of existing PRs & therefore makes sense.

Also, is this the only place we want this functionality? I couldn't find any other references, but don't know the code base well yet.

SparkQA · 2015-04-17T23:21:30Z

Test build #30502 has finished for PR 5149 at commit 43b5aed.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class Params(
- sealed abstract class Node extends Serializable
- sealed trait Split extends Serializable
- final class CategoricalSplit(
- final class ContinuousSplit(override val featureIndex: Int, val threshold: Double) extends Split
- trait DecisionTreeModel
This patch does not change any dependencies.

srowen · 2015-04-18T07:07:18Z

Yeah, fair point that most PRs are writing [SPARK-xxxx]. Honestly do whatever is less additional work at this point. Either standard going forward is fine. I'll reflect whatever it is in the updated contribution wiki.

texasmichelle · 2015-04-19T03:02:36Z

OK - changes we discussed are complete.

SparkQA · 2015-04-19T04:43:25Z

Test build #30543 has finished for PR 5149 at commit df73f6a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

pwendell · 2015-04-20T04:28:29Z

dev/merge_spark_pr.py

+    pr_events = get_json("%s/issues/%s/events" % (GITHUB_API_BASE, pr_num))
+
+    url = pr["url"]
+    title = standardize_jira_ref(pr["title"])


If the title has been modified, I'd prompt the user here for a yes/no as to whether to use the modified title. (i.e. just say "I've re-written the title as follows to match the format: . Would you like to use the new title?"... with and each on a new line). That way if our heuristic is broken in some way the committer can just ignore the re-writing.

pwendell · 2015-04-20T05:40:24Z

This is looking good in terms of functionality. I made some comments to help simplify the code a bit.

…orig or modified)

texasmichelle · 2015-04-20T17:54:44Z

Great, detailed suggestions @pwendell - thanks! A few of the things you pointed out were remnants of the old format (no brackets, adding a colon) and could be removed. I initially chose deque for its performance over lists, but considering the scale here, those benefits are negligible. For better maintainability, deque was swapped out. Also, a prompt was added for choosing between the original and modified titles.

pwendell · 2015-04-20T18:13:30Z

dev/merge_spark_pr.py

+    url = pr["url"]
+
+    # Decide whether to use the modified title or not
+    print "I've re-written the title as follows to match the standard format:"


Can we only do the prompt if we've made a modification? In many cases the title is properly formatted, so we don't need to do this. I'd compute standardize_jira_ref(pr["title"]) earlier on, then check whether it matches pr["title"]. Only ask the user if it's different.

pwendell · 2015-04-20T18:14:17Z

Left one minor comment about the usability of the prompt - but otherwise looking good.

SparkQA · 2015-04-20T19:33:19Z

Test build #30595 has finished for PR 5149 at commit 4f1ed46.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

SparkQA · 2015-04-20T19:56:23Z

Test build #30597 has finished for PR 5149 at commit 8c195bb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

texasmichelle · 2015-04-21T00:57:42Z

Prompt now disappears if the title is not modified - good call, @pwendell.
Is everyone happy with the rules here? I uploaded a doc to the JIRA that shows what the before and after would be for existing PR titles (spark_pulls_before_after.txt).

pwendell · 2015-04-21T01:05:07Z

LGTM

SparkQA · 2015-04-21T02:38:15Z

Test build #30615 has finished for PR 5149 at commit 7d5fa20.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

pwendell · 2015-04-21T06:09:26Z

Okay I'll pull this in now.

pwendell · 2015-04-21T06:12:40Z

Ah actually looks like there is a small bug caused by the refactoring (ping @texasmichelle):

Traceback (most recent call last):
  File "./dev/merge_spark_pr.py", line 432, in <module>
    main()
  File "./dev/merge_spark_pr.py", line 410, in main
    merge_hash = merge_pr(pr_num, target_ref)
  File "./dev/merge_spark_pr.py", line 125, in merge_pr
    merge_message_flags += ["-m", title]
NameError: global name 'title' is not defined

texasmichelle · 2015-04-21T13:12:43Z

I should have caught that :/ Fix: I defined one variable as global since it is used in clean_up(). The others are added as additional parameters to the merge_pr() function call.

SparkQA · 2015-04-21T14:58:35Z

Test build #30673 has finished for PR 5149 at commit 9b6b0a7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

pwendell · 2015-04-22T01:06:59Z

Thanks! I've merged this in.

…X prefix Cleans up the pull request title in the merge script to follow conventions outlined in the wiki under Contributing Code. https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCode [MODULE] SPARK-XXXX: Description Author: texasmichelle <texasmichelle@gmail.com> Closes apache#5149 from texasmichelle/master and squashes the following commits: 9b6b0a7 [texasmichelle] resolved variable scope issue 7d5fa20 [texasmichelle] only prompt if title has been modified 8c195bb [texasmichelle] removed erroneous line 4f1ed46 [texasmichelle] Deque removal, logic simplifications, & prompt user to pick a title (orig or modified) df73f6a [texasmichelle] reworked regex's to enforce brackets around JIRA ref 43b5aed [texasmichelle] Merge remote-tracking branch 'apache/master' 25229c6 [texasmichelle] Merge remote-tracking branch 'apache/master' aa20a6e [texasmichelle] Move code into main() and add doctest for new text parsing method 48520ba [texasmichelle] SPARK-1684: Corrected import statement 042099d [texasmichelle] SPARK-1684 Merge script should standardize SPARK-XXX prefix 8f4a7d1 [texasmichelle] SPARK-1684 Merge script should standardize SPARK-XXX prefix

texasmichelle added 3 commits March 23, 2015 20:18

SPARK-1684 Merge script should standardize SPARK-XXX prefix

8f4a7d1

Added text parsing capability to merge script so that titles are standardized to[MODULE] SPARK-XXX: Description.

SPARK-1684 Merge script should standardize SPARK-XXX prefix

042099d

Added accuracy tweaks based on test results

SPARK-1684: Corrected import statement

48520ba

Replaced queue with deque

texasmichelle changed the title ~~SPARK-1684: Merge script should standardize SPARK-XXX prefix~~ [Project Infra] SPARK-1684: Merge script should standardize SPARK-XXX prefix Mar 24, 2015

JoshRosen reviewed Mar 24, 2015
View reviewed changes

Move code into main() and add doctest for new text parsing method

aa20a6e

Also corrected a spelling mistake in one of the print statements.

Merge remote-tracking branch 'apache/master'

25229c6

pwendell reviewed Apr 17, 2015
View reviewed changes

Merge remote-tracking branch 'apache/master'

43b5aed

reworked regex's to enforce brackets around JIRA ref

df73f6a

pwendell reviewed Apr 20, 2015
View reviewed changes

Deque removal, logic simplifications, & prompt user to pick a title (…

4f1ed46

…orig or modified)

removed erroneous line

8c195bb

pwendell reviewed Apr 20, 2015
View reviewed changes

only prompt if title has been modified

7d5fa20

resolved variable scope issue

9b6b0a7

asfgit closed this in a0761ec Apr 22, 2015

[Project Infra] SPARK-1684: Merge script should standardize SPARK-XXX prefix #5149

[Project Infra] SPARK-1684: Merge script should standardize SPARK-XXX prefix #5149

Uh oh!

Conversation

texasmichelle commented Mar 24, 2015

Uh oh!

JoshRosen Mar 24, 2015

Choose a reason for hiding this comment

Uh oh!

texasmichelle Mar 25, 2015

Choose a reason for hiding this comment

Uh oh!

srowen commented Mar 27, 2015

Uh oh!

SparkQA commented Mar 27, 2015

Uh oh!

SparkQA commented Apr 8, 2015

Uh oh!

pwendell Apr 17, 2015

Choose a reason for hiding this comment

Uh oh!

texasmichelle Apr 17, 2015

Choose a reason for hiding this comment

Uh oh!

pwendell commented Apr 17, 2015

Uh oh!

srowen commented Apr 17, 2015

Uh oh!

texasmichelle commented Apr 17, 2015

Uh oh!

SparkQA commented Apr 17, 2015

Uh oh!

srowen commented Apr 18, 2015

Uh oh!

texasmichelle commented Apr 19, 2015

Uh oh!

SparkQA commented Apr 19, 2015

Uh oh!

pwendell Apr 20, 2015

Choose a reason for hiding this comment

Uh oh!

pwendell commented Apr 20, 2015

Uh oh!

texasmichelle commented Apr 20, 2015

Uh oh!

pwendell Apr 20, 2015

Choose a reason for hiding this comment

Uh oh!

pwendell commented Apr 20, 2015

Uh oh!

SparkQA commented Apr 20, 2015

Uh oh!

SparkQA commented Apr 20, 2015

Uh oh!

texasmichelle commented Apr 21, 2015

Uh oh!

pwendell commented Apr 21, 2015

Uh oh!

SparkQA commented Apr 21, 2015

Uh oh!

pwendell commented Apr 21, 2015

Uh oh!

pwendell commented Apr 21, 2015

Uh oh!

texasmichelle commented Apr 21, 2015

Uh oh!

SparkQA commented Apr 21, 2015

Uh oh!

pwendell commented Apr 22, 2015

Uh oh!

Uh oh!