This user experience item describes idiomatic APIs for C# and F#: https://github.com/dotnet/spark/blob/master/ROADMAP.md#user-experience-1
I think this would be a good issue for discussing what "idiomatic" looks like for F# in the context of Spark.
Here's the (basic) sample from the .NET homepage:

    // Create a Spark session
    let spark =
        SparkSession.Builder()
            .AppName("word_count_sample")
            .GetOrCreate()

    // Create a DataFrame
    let df = spark.Read().Text("input.txt")
    let words = df.Select(Split(df.["value"], " ").Alias("words"))
    words.Select(Explode(words.["words"]).Alias("word"))
        .GroupBy("word")
        .Count()

Although this certainly isn't bad, a more idiomatic API could look something like this:

    // Create a Spark session
    let spark =
        SparkSession.initiate()
        |> SparkSession.appName "word_count_sample"
        |> SparkSession.getOrCreate

    // Create a DataFrame
    let df = spark |> Spark.readText "input.txt"
    let words = df |> DataFrame.map (Split(df.["value"], " ").Alias("words"))
    words
    |> DataFrame.map (Explode(words.["words"]).Alias("word"))
    |> DataFrame.groupBy "word"
    |> DataFrame.count

The above is just a starting point for a conversation. It would assume a module of combinators for data frames (and potentially other collection-like structures). This wouldn't be difficult to implement or maintain - the cost would be proportional to maintaining the one-liners in the C# LINQ-style implementation - but I wonder what else could be done to make it feel more natural for F#, and where the best bang for our buck is.
In other words, I'd love to solicit feedback on the kinds of things that matter most to F# developers interested in using Spark, so that it's possible to stack these up relative to their implementation and maintenance costs.
Also including @isaacabraham, as he tends to be a lot more creative than I am when it comes to these things 😄
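
To make the "module of combinators" idea a bit more concrete, here is a minimal sketch of what such wrappers might look like. It assumes the Microsoft.Spark NuGet package is referenced. The module and function names (`DataFrame.select`, `DataFrame.groupBy`, `RelationalGroupedDataset.count`, `wordCounts`) are hypothetical and not part of any existing API; only the wrapped methods (`Select`, `GroupBy`, `Count`, `Functions.Split`, `Functions.Explode`) come from the library.

```fsharp
// Sketch only: assumes the Microsoft.Spark NuGet package is referenced.
open Microsoft.Spark.Sql

/// Hypothetical combinators. Each takes the DataFrame as its last
/// argument so that it composes with the |> operator.
module DataFrame =
    /// Wraps DataFrame.Select(params Column[]).
    let select (columns: Column list) (df: DataFrame) : DataFrame =
        df.Select(List.toArray columns)

    /// Wraps DataFrame.GroupBy(string, params string[]) for a single column.
    let groupBy (column: string) (df: DataFrame) : RelationalGroupedDataset =
        df.GroupBy(column)

/// Hypothetical module for the grouped type returned by GroupBy.
module RelationalGroupedDataset =
    /// Wraps RelationalGroupedDataset.Count().
    let count (grouped: RelationalGroupedDataset) : DataFrame =
        grouped.Count()

/// Word count over a DataFrame with a single string column named "value".
let wordCounts (df: DataFrame) : DataFrame =
    let words =
        df |> DataFrame.select [ Functions.Split(df.["value"], " ").Alias("words") ]
    words
    |> DataFrame.select [ Functions.Explode(words.["words"]).Alias("word") ]
    |> DataFrame.groupBy "word"
    |> RelationalGroupedDataset.count
```

One thing the sketch surfaces: `GroupBy` returns a `RelationalGroupedDataset` rather than a `DataFrame`, so a function-based API has to decide whether `count` lives in its own module (as here) or whether something like `DataFrame.countBy` should hide the intermediate type.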