Skip to content

[Project Infra] SPARK-1684: Merge script should standardize SPARK-XXX prefix #5149

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from

Conversation

texasmichelle
Copy link
Contributor

Cleans up the pull request title in the merge script to follow conventions outlined in the wiki under Contributing Code.
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCode

[MODULE] SPARK-XXXX: Description

Added text parsing capability to merge script so that titles are
standardized to[MODULE] SPARK-XXX: Description.
Added accuracy tweaks based on test results
Replaced queue with deque
@texasmichelle texasmichelle changed the title SPARK-1684: Merge script should standardize SPARK-XXX prefix [Project Infra] SPARK-1684: Merge script should standardize SPARK-XXX prefix Mar 24, 2015
@@ -285,6 +286,56 @@ def resolve_jira_issues(title, merge_branches, comment):
for jira_id in jira_ids:
resolve_jira_issue(merge_branches, comment, jira_id)

def standardize_jira_ref(text):
# Standardize the [MODULE] SPARK-XXXXX prefix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be pretty cool to add a doctest here. I've generally found doctests to be pretty helpful when developing parsing / string manipulation stuff like this, such as the pull request title parser that I wrote for spark-prs.appspot.com: https://github.com/databricks/spark-pr-dashboard/blob/0ee9102b7bb30fa61e2f47c3d282c5a018a7d198/sparkprs/utils.py#L55

Unfortunately, this script isn't well-structured to support doctests because the main code isn't in a method. If you want to test this, we can move the lines below this function (starting on 291/341) into a new main() function, then call that from a `if name == 'main' block. It will blow up the size of the diff a bit, but that's not a huge deal since we don't backport changes to this file anyways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, Josh. Check out the latest diff and see if it's what you had in mind.
I like what you have in spark-pr-dashboard. Would it make more sense to reference your code from the merge script instead? It's a bit different & would require a bit of finagling, but I hate to have similar code duplicated in different places.

Also corrected a spelling mistake in one of the print statements.
@srowen
Copy link
Member

srowen commented Mar 27, 2015

ok to test

@SparkQA
Copy link

SparkQA commented Mar 27, 2015

Test build #29305 has finished for PR 5149 at commit aa20a6e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 8, 2015

Test build #29857 has finished for PR 5149 at commit 25229c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

return clean_text

def main():
os.chdir(SPARK_HOME)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure since it's a bit tricky with the diff here - all of this is simply re-organization, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - everything inside main() was previously containerless. All statements not part of a function were moved into main() and doctest execution was added at the end. The only new code is the function standardize_jira_ref(), along with its reference on line 365.

@pwendell
Copy link
Contributor

Hey @texasmichelle thanks for contributing this. It slipped of my radar but it will be nice to get something like this in. One thing though, even though I originally intended the format to be "SPARK XXX", in practice, pretty much every contributor now puts brackets around that part /cc @srowen. So it has now sort of become the de-facto standard!

We should probably update this page to simply tell people to put brackets:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

So I think what we really want now is to coerce the presence of brackets rather than remove it! If you look at some recent titles, a few of them have this problem.
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=shortlog

Sorry for some delay in reviewing this, I can address any updates promptly in the next week. Maybe we can start with that pretty simple rule, and then we can expand in subsequent patches to do fancier stuff.

The broad organization here looks good.

@srowen
Copy link
Member

srowen commented Apr 17, 2015

If we're being super pedantic, I write SPARK-xxxx [COMPONENT] ... and that's what I'll enshrine in the contributing wiki update. I'd say no brackets... well I don't feel strongly but if that's what's in the code and what I think people are mostly doing, let's leave that.

@texasmichelle
Copy link
Contributor Author

Thanks for the feedback @pwendell & @srowen. Just to be clear before I refactor the regex's - we're looking for:
[SPARK-XXXX] [COMPONENT] Description
i.e. a space between closing/opening brackets, no colon before Description, & everything prior to Description forced to uppercase? I'm not crazy about having brackets around the JIRA ref, but it's in the vast majority of existing PRs & therefore makes sense.

Also, is this the only place we want this functionality? I couldn't find any other references, but don't know the code base well yet.

@SparkQA
Copy link

SparkQA commented Apr 17, 2015

Test build #30502 has finished for PR 5149 at commit 43b5aed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Params(
    • sealed abstract class Node extends Serializable
    • sealed trait Split extends Serializable
    • final class CategoricalSplit(
    • final class ContinuousSplit(override val featureIndex: Int, val threshold: Double) extends Split
    • trait DecisionTreeModel
  • This patch does not change any dependencies.

@srowen
Copy link
Member

srowen commented Apr 18, 2015

Yeah, fair point that most PRs are writing [SPARK-xxxx]. Honestly do whatever is less additional work at this point. Either standard going forward is fine. I'll reflect whatever it is in the updated contribution wiki.

@texasmichelle
Copy link
Contributor Author

OK - changes we discussed are complete.

@SparkQA
Copy link

SparkQA commented Apr 19, 2015

Test build #30543 has finished for PR 5149 at commit df73f6a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

pr_events = get_json("%s/issues/%s/events" % (GITHUB_API_BASE, pr_num))

url = pr["url"]
title = standardize_jira_ref(pr["title"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the title has been modified, I'd prompt the user here for a yes/no as to whether to use the modified title. (i.e. just say "I've re-written the title as follows to match the format: . Would you like to use the new title?"... with and each on a new line). That way if our heuristic is broken in some way the committer can just ignore the re-writing.

@pwendell
Copy link
Contributor

This is looking good in terms of functionality. I made some comments to help simplify the code a bit.

@texasmichelle
Copy link
Contributor Author

Great, detailed suggestions @pwendell - thanks! A few of the things you pointed out were remnants of the old format (no brackets, adding a colon) and could be removed. I initially chose deque for its performance over lists, but considering the scale here, those benefits are negligible. For better maintainability, deque was swapped out. Also, a prompt was added for choosing between the original and modified titles.

url = pr["url"]

# Decide whether to use the modified title or not
print "I've re-written the title as follows to match the standard format:"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we only do the prompt if we've made a modification? In many cases the title is properly formatted, so we don't need to do this. I'd compute standardize_jira_ref(pr["title"]) earlier on, then check whether it matches pr["title"]. Only ask the user if it's different.

@pwendell
Copy link
Contributor

Left one minor comment about the usability of the prompt - but otherwise looking good.

@SparkQA
Copy link

SparkQA commented Apr 20, 2015

Test build #30595 has finished for PR 5149 at commit 4f1ed46.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA
Copy link

SparkQA commented Apr 20, 2015

Test build #30597 has finished for PR 5149 at commit 8c195bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@texasmichelle
Copy link
Contributor Author

Prompt now disappears if the title is not modified - good call, @pwendell.
Is everyone happy with the rules here? I uploaded a doc to the JIRA that shows what the before and after would be for existing PR titles (spark_pulls_before_after.txt).

@pwendell
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Apr 21, 2015

Test build #30615 has finished for PR 5149 at commit 7d5fa20.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@pwendell
Copy link
Contributor

Okay I'll pull this in now.

@pwendell
Copy link
Contributor

Ah actually looks like there is a small bug caused by the refactoring (ping @texasmichelle):

Traceback (most recent call last):
  File "./dev/merge_spark_pr.py", line 432, in <module>
    main()
  File "./dev/merge_spark_pr.py", line 410, in main
    merge_hash = merge_pr(pr_num, target_ref)
  File "./dev/merge_spark_pr.py", line 125, in merge_pr
    merge_message_flags += ["-m", title]
NameError: global name 'title' is not defined

@texasmichelle
Copy link
Contributor Author

I should have caught that :/ Fix: I defined one variable as global since it is used in clean_up(). The others are added as additional parameters to the merge_pr() function call.

@SparkQA
Copy link

SparkQA commented Apr 21, 2015

Test build #30673 has finished for PR 5149 at commit 9b6b0a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@pwendell
Copy link
Contributor

Thanks! I've merged this in.

@asfgit asfgit closed this in a0761ec Apr 22, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
…X prefix

Cleans up the pull request title in the merge script to follow conventions outlined in the wiki under Contributing Code.
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCode

[MODULE] SPARK-XXXX: Description

Author: texasmichelle <texasmichelle@gmail.com>

Closes apache#5149 from texasmichelle/master and squashes the following commits:

9b6b0a7 [texasmichelle] resolved variable scope issue
7d5fa20 [texasmichelle] only prompt if title has been modified
8c195bb [texasmichelle] removed erroneous line
4f1ed46 [texasmichelle] Deque removal, logic simplifications, & prompt user to pick a title (orig or modified)
df73f6a [texasmichelle] reworked regex's to enforce brackets around JIRA ref
43b5aed [texasmichelle] Merge remote-tracking branch 'apache/master'
25229c6 [texasmichelle] Merge remote-tracking branch 'apache/master'
aa20a6e [texasmichelle] Move code into main() and add doctest for new text parsing method
48520ba [texasmichelle] SPARK-1684: Corrected import statement
042099d [texasmichelle] SPARK-1684 Merge script should standardize SPARK-XXX prefix
8f4a7d1 [texasmichelle] SPARK-1684 Merge script should standardize SPARK-XXX prefix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants