Discussion: Idiomatic F# APIs #41

@cartermp

This user experience item describes idiomatic APIs for C# and F#: https://github.com/dotnet/spark/blob/master/ROADMAP.md#user-experience-1

I think this would be a good issue for discussing what idiomatic looks like for F# in the context of Spark.

Here's the (basic) sample from the .NET homepage:

// Create a Spark session
let spark =
    SparkSession.Builder()
        .AppName("word_count_sample")
        .GetOrCreate()

// Create a DataFrame
let df = spark.Read().Text("input.txt")

let words = df.Select(Split(df.["value"], " ").Alias("words"))

words.Select(Explode(words.["words"]).Alias("word"))
     .GroupBy("word")
     .Count()

Although this certainly isn't bad, a more idiomatic API could look something like this:

// Create a Spark session
let spark =
    SparkSession.initiate()
    |> SparkSession.appName "word_count_sample"
    |> SparkSession.getOrCreate

// Create a DataFrame
let df = spark |> Spark.readText "input.txt"

let words = df |> DataFrame.map (Split(df.["value"], " ").Alias("words"))

words
|> DataFrame.map (Explode(words.["words"]).Alias("word"))
|> DataFrame.groupBy "word"
|> DataFrame.count

The above is just a starting point for a conversation. It assumes a module of combinators for data frames (and potentially other collection-like structures). This wouldn't be difficult to implement or maintain; the cost would be proportional to maintaining the one-liners in the C# LINQ-style implementation. Still, I wonder what else could be done to make it feel more natural for F#, and where we'd get the best bang for our buck.
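
To make the "module of combinators" idea concrete, here is a minimal sketch of what such wrappers might look like. Everything in the `SparkFSharpSketch` namespace is hypothetical and does not ship with Microsoft.Spark; the sketch only assumes the existing `SparkSession.Builder()`, `Builder.AppName`, `Builder.GetOrCreate`, `DataFrame.Select`, `DataFrame.GroupBy`, and `RelationalGroupedDataset.Count` members. It names the projection function `select` (the sample above calls it `map`) and puts `count` in a separate `Grouped` module, since `GroupBy` returns a `RelationalGroupedDataset` rather than a `DataFrame`.

```fsharp
// Hypothetical wrapper modules; assumes the Microsoft.Spark NuGet package is referenced.
namespace SparkFSharpSketch

open Microsoft.Spark.Sql

module SparkSession =
    /// Start a session builder (wraps SparkSession.Builder()).
    let initiate () : Builder = Microsoft.Spark.Sql.SparkSession.Builder()

    /// Set the application name on the builder.
    let appName (name: string) (builder: Builder) : Builder = builder.AppName(name)

    /// Get an existing session or create a new one.
    let getOrCreate (builder: Builder) : SparkSession = builder.GetOrCreate()

    /// Read a text file into a DataFrame with a single "value" column.
    let readText (path: string) (spark: SparkSession) : DataFrame = spark.Read().Text(path)

module DataFrame =
    /// Project a single column expression (wraps DataFrame.Select).
    let select (column: Column) (df: DataFrame) : DataFrame = df.Select(column)

    /// Group by one named column (wraps DataFrame.GroupBy).
    let groupBy (column: string) (df: DataFrame) : RelationalGroupedDataset = df.GroupBy(column)

module Grouped =
    /// Count rows per group (wraps RelationalGroupedDataset.Count).
    let count (grouped: RelationalGroupedDataset) : DataFrame = grouped.Count()

module Example =
    /// The word-count pipeline from the sample, written with the wrappers above.
    let wordCount (path: string) : DataFrame =
        let spark =
            SparkSession.initiate ()
            |> SparkSession.appName "word_count_sample"
            |> SparkSession.getOrCreate

        let df = spark |> SparkSession.readText path
        let words = df |> DataFrame.select (Functions.Split(df.["value"], " ").Alias("words"))

        words
        |> DataFrame.select (Functions.Explode(words.["words"]).Alias("word"))
        |> DataFrame.groupBy "word"
        |> Grouped.count
```

Each wrapper is a one-liner that takes the data frame as its last argument so that it composes with `|>`, which is roughly the maintenance cost described above. Running `Example.wordCount` still requires a working Spark installation and the usual `spark-submit` setup.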

In other words, I'd love to solicit feedback on the kinds of things that matter most to F# developers interested in using Spark, so that it's possible to stack these up relative to their implementation and maintenance costs.

Also including @isaacabraham, as he tends to be a lot more creative than I am when it comes to these things 😄
