Adding more detailed and structured process to contributors' guide.
1 parent 882f4c2, commit bb6a495
Showing 1 changed file with 74 additions and 25 deletions.
@@ -1,34 +1,83 @@
 ## Interested in contributing to MMLSpark? We're excited to work with you.

-### You can contribute in many ways
+### You can contribute in many ways:

-* Use the library and give feedback
-* Report a bug
-* Request a feature
-* Fix a bug
-* Add examples and documentation
-* Code a new feature
-* Review pull requests
+* Use the library and give feedback: report bugs, request features.
+* Add sample Jupyter notebooks, Python or Scala code examples, documentation
+pages.
+* Fix bugs and issues.
+* Add new features, such as data transformations or machine learning algorithms.
+* Review pull requests from other contributors.

 ### How to contribute?

-You can give feedback, report bugs and request new features anytime by
-opening an issue. Also, you can up-vote and comment on existing issues.
+You can give feedback, report bugs and request new features anytime by opening
+an issue. Also, you can up-vote or comment on existing issues.

-To make a pull request into the repo, such as bug fixes, documentation
-or new features, follow these steps:
+If you want to add code, examples or documentation to the repository, follow
+this process:

-* If it's a new feature, open an issue for preliminary discussion with
-us, to ensure your contribution is a good fit and doesn't duplicate
+#### Propose a contribution
+
+* Preferably, get started by tackling existing issues to get yourself acquainted
+with the library source and the process.
+* Open an issue, or comment on an existing issue to discuss your contribution
+and design, to ensure your contribution is a good fit and doesn't duplicate
 on-going work.
-* Typically, you'll need to accept the Microsoft Contributor License
-Agreement (CLA).
-* Familiarize yourself with coding style and guidelines.
-* Fork the repository, code your contribution, and create a pull
-request.
-* Wait for an MMLSpark team member to review and accept it. Be patient
-as we iron out the process for a new project.
-
-A good way to get started contributing is to look for issues with a "help
-wanted" label. These are issues that we do want to fix, but don't have
-resources to work on currently.
+* Any algorithm you're planning to contribute should be well known and accepted
+for production use, and backed by research papers.
+* Algorithms should be highly scalable and suitable for very large datasets.
+* All contributions need to comply with the MIT License. Contributors external
+to Microsoft need to sign the CLA.
+
+#### Implement your contribution
+
+* Fork the MMLSpark repository.
+* Implement your algorithm in Scala, using our wrapper generation mechanism to
+produce PySpark bindings.
+* Use SparkML `PipelineStage`s so your algorithm can be used as part of a
+pipeline.
+* For parameters, use `MMLParam`s.
+* Implement model saving and loading by extending SparkML `MLReadable`.
+* Use good Scala style.
+* Binary dependencies should be on Maven Central.
+* See this [pull request](https://github.com/Azure/mmlspark/pull/22) for an
+example contribution.
+
+#### Implement tests
+
+* Set up the build environment. Use a Linux machine or VM (we use Ubuntu, but other
+distros should work too), and install the environment using the [`runme`
+script](runme).
+* Test your code locally.
+* Add tests using ScalaTest. Unit tests are required.
+* A sample notebook is required as an end-to-end test.
+
+#### Implement documentation
+
+* Add a [sample Jupyter notebook](notebooks/samples) that shows the intended use
+case of your algorithm, with step-by-step instructions. (The same
+notebook could be used for testing the code.)
+* Add in-line ScalaDoc comments to your source code, to generate the [API
+reference documentation](https://mmlspark.azureedge.net/docs/pyspark/).
+
+#### Open a pull request
+
+* In most cases, you should squash your commits into one.
+* Open a pull request, and link it to the discussion issue you created earlier.
+* An MMLSpark core team member will trigger a build to test your changes.
+* Fix any build failures. (The pull request will have comments from the build
+with useful links.)
+* Wait for code reviews from core team members and others.
+* Fix issues found in code review and iterate.
+
+#### Build and check-in
+
+* Wait for a core team member to merge your code in.
+* Your feature will be available through a Docker image and script installation
+in the next release, which typically happens around once a month. You can try
+out your features sooner by using build artifacts for the version that has
+your changes merged in (such versions end with a `.devN`).
+
+If in doubt about how to do something, see how it was done in existing code or
+pull requests, and don't hesitate to ask.
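
As a rough illustration of the "Implement your contribution" checklist in the guide above, here is a minimal sketch of a SparkML pipeline stage with parameters and save/load support. It is not taken from the MMLSpark code base. `UpperCaser` and its column parameters are hypothetical, it uses plain Spark `Param`s rather than `MMLParam`s, and it relies on Spark's `DefaultParamsWritable` and `DefaultParamsReadable` helpers (the latter extends `MLReadable`), assuming Spark 2.x or later is on the classpath.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, upper}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical example stage: upper-cases a string column.
// Extending Transformer makes it a PipelineStage, so it can sit in a Pipeline.
class UpperCaser(override val uid: String)
    extends Transformer with DefaultParamsWritable {

  // The uid-only constructor above is what DefaultParamsReadable uses on load.
  def this() = this(Identifiable.randomUID("UpperCaser"))

  // Plain Spark params (an assumption; the guide asks for MMLParams).
  final val inputCol: Param[String] =
    new Param[String](this, "inputCol", "name of the string column to read")
  final val outputCol: Param[String] =
    new Param[String](this, "outputCol", "name of the column to write")

  def getInputCol: String = $(inputCol)
  def getOutputCol: String = $(outputCol)
  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  // Validate the input schema and describe the output schema.
  override def transformSchema(schema: StructType): StructType = {
    require(schema($(inputCol)).dataType == StringType,
      s"Column ${$(inputCol)} must be of type StringType")
    schema.add(StructField($(outputCol), StringType, nullable = true))
  }

  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema)
    dataset.withColumn($(outputCol), upper(col($(inputCol))))
  }

  override def copy(extra: ParamMap): UpperCaser = defaultCopy(extra)
}

// Companion object providing UpperCaser.load(path).
object UpperCaser extends DefaultParamsReadable[UpperCaser]
```

With a stage like this, `new Pipeline().setStages(Array(stage))` should accept it alongside built-in stages, and `stage.write.overwrite().save(path)` followed by `UpperCaser.load(path)` should round-trip the parameters. A real contribution would follow the linked example pull request for the project's own parameter and wrapper-generation conventions.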