[FEATURE] fairness scores should work on non-binary indicators #264

Open
MBrouns opened this issue Jan 15, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@MBrouns
Collaborator

MBrouns commented Jan 15, 2020

ValueError: equal_opportunity_score only supports binary indicator columns for column. Found values ['Black' 'White']
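
For context, a minimal sketch of a call that triggers this error, assuming a pandas DataFrame with a non-binary sensitive column; the stub estimator and column names are illustrative, not taken from sklego:

import numpy as np
import pandas as pd
from sklego.metrics import equal_opportunity_score

# Illustrative data: the sensitive column holds string labels rather than a 0/1 indicator
X = pd.DataFrame({
    "race": ["Black", "White", "White", "Black"],
    "x1": [0.2, 0.5, 0.9, 0.1],
})
y = np.array([1, 0, 1, 1])

class AlwaysPositive:
    """Hypothetical stub so the example is self-contained; any fitted classifier would do."""
    def predict(self, X):
        return np.ones(len(X), dtype=int)

# Raises:
# ValueError: equal_opportunity_score only supports binary indicator columns for column.
# Found values ['Black' 'White']
equal_opportunity_score("race")(AlwaysPositive(), X, y)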

@MBrouns MBrouns added the enhancement New feature or request label Jan 15, 2020
@tbezemer

tbezemer commented Sep 25, 2020

I'm currently having a go at this. I have an implementation that passes the existing equal_opportunity_score tests.
I'm just not sure whether my implementation is a valid extrapolation of the algorithm to the non-binary case, since I kind of based it on analysing the code alone; I still need to dive into the paper.

Thoughts @MBrouns ?

Mainly the score = ... line below is the part I'm unsure about.

import itertools as it
import warnings

import numpy as np


def equal_opportunity_score(sensitive_column, positive_target=1):
    r"""
    The equality opportunity score calculates the ratio between the probability of a **true positive** outcome
    given the sensitive attribute (column) being true and the same probability given the
    sensitive attribute being false.

    .. math::
        \min \left(\frac{P(\hat{y}=1 | z=1, y=1)}{P(\hat{y}=1 | z=0, y=1)},
        \frac{P(\hat{y}=1 | z=0, y=1)}{P(\hat{y}=1 | z=1, y=1)}\right)

    This is especially useful to use in situations where "fairness" is a theme.

    Usage:
    `equal_opportunity_score('gender')(clf, X, y)`

    Source:
    - M. Hardt, E. Price and N. Srebro (2016), Equality of Opportunity in Supervised Learning

    :param sensitive_column:
        Name of the column containing the binary sensitive attribute (when X is a dataframe)
        or the index of the column (when X is a numpy array).
    :param positive_target: The name of the class which is associated with a positive outcome
    :return: a function (clf, X, y_true) -> float that calculates the equal opportunity score for z = column
    """

    def impl(estimator, X, y_true):
        """Remember: X is the thing going *in* to your pipeline."""
        sensitive_col = (
            X[:, sensitive_column] if isinstance(X, np.ndarray) else X[sensitive_column]
        )

        y_hat = estimator.predict(X)

        p_ys_zs = []

        for subgroup in np.unique(sensitive_col):
            y_given_zi_yi = y_hat[(sensitive_col == subgroup) & (y_true == positive_target)]

            # If a subgroup has no samples with the positive label, the conditional probability
            # is undefined, so we treat the model as unfair and return 0
            if len(y_given_zi_yi) == 0:
                warnings.warn(
                    f"No samples with y_true == {positive_target} for {sensitive_column} == {subgroup}, returning 0",
                    RuntimeWarning,
                )
                return 0

            p_ys_zs.append(np.mean(y_given_zi_yi == positive_target))

        # Getting the min of all pair-wise divisions is the same as getting the min of each mirror pair?
        # (np.min over the full list so this also works for more than two subgroups)
        score = np.min([pair[0] / pair[1] for pair in it.permutations(p_ys_zs, 2)])

        return score if not np.isnan(score) else 1.

    return impl
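
For what it's worth, a quick numeric check of the question in that comment: the minimum over all ordered pair-wise ratios equals min(p_ys_zs) / max(p_ys_zs), so with exactly two subgroups it reduces to the binary formula (this sketch assumes all rates are non-zero; the numbers are made up):

import itertools as it
import numpy as np

# Illustrative P(y_hat = positive | z = subgroup, y = positive) for three subgroups
p_ys_zs = [0.8, 0.5, 0.6]

pairwise_min = min(a / b for a, b in it.permutations(p_ys_zs, 2))
assert np.isclose(pairwise_min, min(p_ys_zs) / max(p_ys_zs))  # both equal 0.5 / 0.8 = 0.625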

@MBrouns
Collaborator Author

MBrouns commented Sep 25, 2020

I think it makes sense the way you've currently written it, although it would be nice to see a few test cases to show the impact and behaviour.

@tbezemer

Thanks, good to know the logic works! I'll start working on some tests for it.
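
In case it helps, a rough sketch of the kind of test case this could be, assuming the non-binary implementation above and a pytest-style test; the data, stub estimator, and expected value are illustrative, not from the existing test suite:

import numpy as np
import pandas as pd

def test_equal_opportunity_score_three_groups():
    # Three subgroups with true-positive rates 1.0, 0.5 and 0.75;
    # the score should be the worst pair-wise ratio, i.e. 0.5 / 1.0.
    X = pd.DataFrame({
        "group": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
        "x1": np.arange(12),
    })
    y_true = np.ones(12, dtype=int)  # every sample carries the positive label

    class FixedPredictions:
        """Stub estimator returning predetermined predictions per subgroup."""
        def predict(self, X):
            # group a: 4/4 positive, group b: 2/4 positive, group c: 3/4 positive
            return np.array([1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0])

    # equal_opportunity_score here refers to the non-binary implementation above
    score = equal_opportunity_score(sensitive_column="group")(FixedPredictions(), X, y_true)
    assert np.isclose(score, 0.5)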
