Merged
Binary file added .github/screenshot.png
73 changes: 73 additions & 0 deletions CODE_OF_CONDUCT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
# Contributor Covenant Code of Conduct
Review comment (Contributor):

Thanks for the effort put into improving the documentation!

Let me suggest some things; here is a list of the main ones:

  • README.md:
    • first explain the app's purpose, and then its motivation,
    • refactor the installation section to cover the full process from beginning to end, explaining the two alternatives (with and without Docker) and how to set up the backend DB,
    • fix the Docker commands, which do not work,
    • simplify the import/export section to its bare necessities,
    • add a Requirements section (example in landing),
    • add a link to Contributing#Development
  • Contributing.md
    • simplify the Contributing#Development section,


## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, gender identity and expression, level of experience,
education, socio-economic status, nationality, personal appearance, race,
religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at conduct@sourced.tech. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org
82 changes: 82 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
# Contributing Guidelines

The source{d} code annotation tool project is [GPLv3 licensed](LICENSE) and accepts
contributions via GitHub pull requests. This document outlines some of the
conventions on development workflow, commit message formatting, contact points,
and other resources to make it easier to get your contribution accepted.

## Certificate of Origin

By contributing to this project you agree to the [Developer Certificate of
Origin (DCO)](DCO). This document was created by the Linux Kernel community and is a
simple statement that you, as a contributor, have the legal right to make the
contribution.

To show your agreement with the DCO, include the following line at the end of each commit message,
using your real name: `Signed-off-by: John Doe <john.doe@example.com>`.

This can be done easily using the [`-s`](https://github.com/git/git/blob/b2c150d3aa82f6583b9aadfecc5f8fa1c74aca09/Documentation/git-commit.txt#L154-L161) flag of `git commit`.
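
As an illustration of the sign-off convention, a reviewer or a CI step could check a commit message for the trailer that `git commit -s` appends. The sketch below is hypothetical: `has_signoff` is not part of this project, and the pattern only checks the general `Signed-off-by: Name <email>` shape.

```bash
# Hypothetical helper (not part of code-annotation):
# succeeds when the commit message passed as $1 contains a
# "Signed-off-by: Name <email>" trailer line.
has_signoff() {
  printf '%s\n' "$1" | grep -Eq '^Signed-off-by: .+ <[^<>@ ]+@[^<>@ ]+>$'
}

# Example commit message with a sign-off trailer.
msg='docs: add contributing guidelines

Signed-off-by: John Doe <john.doe@example.com>'

if has_signoff "$msg"; then
  echo "sign-off found"
else
  echo "sign-off missing"
fi
```

In a real repository, the message would typically come from `git log -1 --format=%B`.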

## Support Channels

The official support channels, for both users and contributors, are:

* GitHub [issues](https://github.com/src-d/code-annotation/issues)\*

\*Before opening a new issue or submitting a new pull request, it's helpful to
search the project - it's likely that another user has already reported the
issue you're facing, or it's a known issue that we're already aware of.

## How to Contribute

Pull Requests (PRs) are the only way to contribute to the code-annotation project.
In order for a PR to be accepted, it needs to meet the following requirements:

* If the PR is a bug fix, it has to include a new unit test that fails without the patch.
* If the PR is a new feature, it has to come with a suite of unit tests that test the new functionality.
* In any case, all PRs have to pass the personal evaluation of at least one of the [maintainers](MAINTAINERS) of the code-annotation project.

### Format of the commit message

Every commit message should describe what was changed, in which context and, if applicable, the GitHub issue it relates to:

```
plumbing: packp, Skip argument validations for unknown capabilities. Fixes #623
```

Review comment:

In a single line?
Normally the `Fixes #123` tends to be on its own.

## Development
Review comment (Contributor):

I think this section needs a bit more detail.

For example:
I can run the frontend in dev mode with `yarn start`. Does that mean the backend is not needed? Or does it need to be started in some other way?
If I run `make gorun`, do I need to build the frontend code first?
What about the GitHub auth tokens mentioned in the README? Are they also needed for development?

Review comment (Contributor Author):

About the frontend: is it clearer in the description of #36?
I'll add info about the token.

Review comment (Contributor):

Yes, in that comment the steps are put more clearly.

> Please note: you will need a .env file configured with working GitHub OAuth credentials to run the application in development mode.
> Please follow the [README Installation section](./README.md#installation) for instructions on how to do it.

To build and run the tool, execute:

```bash
$ go get -d -u github.com/src-d/code-annotation/...
$ cd $GOPATH/src/github.com/src-d/code-annotation
$ make serve
```

Review comment (Contributor):

There is a missing step: setting `.env`. Otherwise `make serve` will complain.

Also, `make serve` will not inform that the server is listening on `:8080`.

Review comment (Contributor):

This is exactly the same as what is written in the README.md under the "Non-docker" section.

Review comment (Contributor):

This is the same as what is written in README.md under the "Installation section", so it is unnecessary here.

### Frontend:

If you want to benefit from the frontend hot reloading feature, first run the server:

```bash
$ UI_DOMAIN=http://127.0.0.1:3000 make gorun
```

Then run the frontend in dev mode:

```bash
$ yarn start
```

### Backend:

Shortcut to run `go run` with the environment variables set:

```bash
$ make gorun
```
36 changes: 36 additions & 0 deletions DCO
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
660 York Street, Suite 102,
San Francisco, CA 94110 USA

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or

(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or

(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.

(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
132 changes: 106 additions & 26 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,60 +1,140 @@
# Source Code Annotation application
[![Build Status](https://travis-ci.org/src-d/code-annotation.svg)](https://travis-ci.org/src-d/code-annotation)
![unstable](https://svg-badge.appspot.com/badge/stability/unstable?a)

In order to evaluate the quality of ML models, as well as to create an “ImageNet for source code”, there is a need for tools to automate the data collection/labeling/annotation.
# Source Code Annotation Tool

## Installation
Training Machine Learning models often requires large datasets to be duly annotated.
The nature of these annotations varies depending on the dataset considered: they can be
the number to be recognized in the [MNIST dataset](http://yann.lecun.com/exdb/mnist/),
the coordinates of the box containing the objects to be identified in an object detection problem, etc.

This tool provides a simple UI to add annotations to existing datasets, a command line tool
to fetch more elements to be annotated, and an export mechanism.

Currently, the project provides a single example, which consists of labeling two pieces of code
as being identical, similar, or different.

The source code annotation tool offers a UI to annotate source code and review these annotations, and a CLI to define the code to be annotated and export the annotations.

![Screenshot](.github/screenshot.png?raw=true)

## Requirements

### Global dependencies

You should already have [Go installed](https://golang.org/doc/install#install) and the [$GOPATH properly configured](https://github.com/golang/go/wiki/SettingGOPATH).

```
go version; # prints your go version
echo $GOPATH; # prints your $GOPATH path
```

Review comment:

Use `go env GOPATH` instead, so it prints the default GOPATH if none has been defined.

Review comment (Contributor Author):

Sorry? We need `$GOPATH` later to do `cd`.

The project must be under $GOPATH, as required by the Go tooling.
You should be able to navigate into the source code by running:

```
cd $GOPATH/src/github.com/src-d/code-annotation
```

You also need [Yarn v1.x.x installed](https://yarnpkg.com/en/docs/install):

```
yarn --version; # prints your Yarn version
```

### GitHub OAuth tokens

First you need OAuth application on github. [Read how to create it](https://developer.github.com/apps/building-oauth-apps/creating-an-oauth-app/).
1. You need an OAuth application on GitHub. See [how to create OAuth applications on GitHub](https://developer.github.com/apps/building-oauth-apps/creating-an-oauth-app/).

In order to be able to use this application while running the tool locally, make sure you add http://127.0.0.1:8080/oauth-callback to the authorization callback URL field.

On a [page](https://github.com/settings/developers) with your application you will need `Client ID` and `Client Secret`
2. Copy `.env.tpl` to `.env`.

Copy `.env.tpl` to `.env` and set tokens there.
3. Retrieve the values for your application's Client ID and Client Secret from the [GitHub Developer Settings page](https://github.com/settings/developers) and add them to the end of the corresponding lines in .env.
Review comment (Contributor):

I'd reword:

Edit the `.env` file to use the Client ID and Client Secret you obtained after creating your OAuth App. You can recover both codes from your registered OAuth Apps at GitHub Developer settings: OAuth Apps

Review comment (Contributor Author):

I guess "recover" isn't the correct English word for it.
This text was suggested by @campoy; I prefer to stick to it.
#29 (comment)
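
After copying `.env.tpl` to `.env` as described in the GitHub OAuth steps, it can help to confirm that no variable was left blank. The following is a minimal sketch, assuming `.env` consists of plain `KEY=value` lines; the helper name `empty_env_keys` is hypothetical, and the actual variable names in `.env.tpl` are not shown here.

```bash
# Hypothetical helper (not part of code-annotation):
# prints the names of variables in an env file that have no value yet.
# Usage: empty_env_keys <path-to-env-file>
empty_env_keys() {
  # Match lines of the form KEY= with nothing (or only spaces) after '=',
  # then keep only the part before '='.
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=[[:space:]]*$' "$1" | cut -d= -f1
}
```

Running `empty_env_keys .env` from the project root would list any keys that still need a value; no output means every line has one.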


### Docker
## Installation

You need to satisfy all [project requirements](#requirements), and then run:

```bash
docker build -t srcd/code-annotation .
docker run --env-file .env --rm -p 8080:8080 srcd/code-annotation
$ go get github.com/src-d/code-annotation/...
$ cd $GOPATH/src/github.com/src-d/code-annotation
$ make serve
```

### Non-docker
This will start a server locally, which you can access on [http://localhost:8080](http://localhost:8080)

## Importing and Exporting Data

### Import File Pairs for Annotation

The file pairs must be provided via an [SQLite](https://sqlite.org/) database. The database **must follow the expected schema**; see [this example](./cli/examples/import/example.sql).
Review comment (Contributor):

Is it assumed that the reader of this README.md will have any extra context or background about this project?
Or, on the other hand, should the README.md be understandable from beginning to end without extra context?

If we choose the second, considering the intro of the README.md:

Currently, the project provides one (feature) consisting on labeling two pieces of code as being identical, similar, or different.

I'd reword the first phrase of this section:

The pieces of code to be annotated as identical/similar/different (aka file pairs) must be provided via...

The `import` command will use those file pairs to create a new [SQLite](https://sqlite.org/) or [PostgreSQL](https://www.postgresql.org/) database that will be used internally by the Annotation Tool. The destination database does not need to be empty; newly imported file pairs can be added to previous imports.

_Note_: duplicate entries are not filtered, so running an import multiple times will result in repeated rows.

To use it, run it as:

```bash
go get github.com/src-d/code-annotation/...
cd $GOPATH/github.com/src-d/code-annotation
make serve
$ import <path-to-sqlite.db> <destination-DSN>
```

## Development
Where the `DSN` (Data Source Name) argument must be one of:

Backend:
* `sqlite:///path/to/db.db`
* `postgresql://[user[:password]@][netloc][:port][,...][/dbname]`
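
To make the two accepted formats concrete, the sketch below classifies a DSN by its scheme. It is an illustration only, not the `import` command's actual parsing; the function name `dsn_kind` is hypothetical, and treating `postgres://` as equivalent to `postgresql://` is an assumption based on the usage examples in this README.

```bash
# Hypothetical helper (not part of code-annotation):
# prints "sqlite" or "postgresql" depending on the DSN scheme,
# or "unknown" (with a non-zero exit status) for anything else.
dsn_kind() {
  case "$1" in
    sqlite://*) echo "sqlite" ;;
    postgres://*|postgresql://*) echo "postgresql" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

dsn_kind 'sqlite:///home/user/internal.db'
dsn_kind 'postgres://testing:testing@localhost:5432/input?sslmode=disable'
```

A check like this could run before calling `import` or `export`, so that a mistyped scheme is caught early.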

Some usage examples:

```bash
$ import ./input.db sqlite:///home/user/internal.db
Imported 989 file pairs successfully

$ import /home/user/input.db postgres://testing:testing@localhost:5432/input?sslmode=disable
Imported 562 file pairs successfully
```
make gorun
```

Frontend:
For a complete reference of the PostgreSQL connection string, see the [documentation for the lib/pq Go package](https://godoc.org/github.com/lib/pq#hdr-Connection_String_Parameters).

#### Set the Internal Database Connection

Before starting the application you will need to set the `DB_CONNECTION` variable in the `.env` file. It should point to the database created with the `import` command.

If you want to benefit from the frontend hot reloading feature, add this line to your `.env` file:
This variable uses the same `DSN` string as the `import` command to point to a SQLite or PostgreSQL database.

Some examples:

```
DB_CONNECTION=sqlite:///home/user/internal.db
```

```
UI_DOMAIN=http://127.0.0.1:3000
DB_CONNECTION=postgres://testing:testing@localhost:5432/input?sslmode=disable
```

And then restart server.
### Export Annotation Results

To run frontend in dev mode:
To work with the annotation results, the internal data can be extracted into a new SQLite database using the `export` command.

```
yarn
yarn start
```bash
$ export <origin-DSN> <path-to-sqlite.db>
```

The DSN argument uses the same format as the `import` tool, see the previous section.

In this case, origin will be the internal database, and destination the new database. This new database will have the same contents as the internal one.

The annotations made by the users will be stored in the **`assignments`** table.

## Contributing

Please take a look at [CONTRIBUTING](CONTRIBUTING.md) file to see how to contribute in this project, get more information about the dashboard [architecture](CONTRIBUTING.md#Architecture) and how to launch it for [development](CONTRIBUTING.md#Development) purposes.
[Contributions](https://github.com/src-d/code-annotation/issues) are more than welcome. If you are interested, please take a look at
our [Contributing Guidelines](CONTRIBUTING.md).
Review comment (Contributor):

I'd add:

If you're interested in running the application in development mode, take a look at the [development section](CONTRIBUTING.md#Development)

Review comment (Contributor Author):

I'm not sure we should change the template.

Review comment (Contributor):

The template is only a guideline, not a mandatory thing to follow. I understood we can change things to improve it, especially when linking to other places in our docs.

Review comment (Contributor):

Since links to other internal docs are welcome, could you add it? :D

## Code of Conduct

All activities under source{d} projects are governed by the [source{d} code of conduct](CODE_OF_CONDUCT.md).

## License
