Documenting Scala functional chains [closed]

Scala (and functional programming in general) advocates a style of programming where you produce functional "chains" of the form

collection.operation1(...).operation2(...)...

where the operations are various combinations of map, filter, etc.

Where the equivalent Java code might require 50 lines, the Scala code can be done in 1 or 2 lines. The functional chain can change an input collection to something completely different.

The disadvantage of the Scala code is that 10 minutes later (never mind 6 months later), I can't figure out what I was thinking, because the notation is so compact and lacks type information (the types are inferred).

How do you document this? Do you put a large block comment before the chain, changing an elegant 1 line solution into a bulky 40 line solution consisting of 39 lines of comment? Do you intersperse your comments like this?

collection.
  // Select the items that meet condition X
  filter(predicate_function).
  // Change these items from A's to B's
  map(transformation_function).
  // etc.

Something else? No documentation? (Leave them guessing. They'll never "downsize" you then, because no one else can maintain the code. :-))

If you find yourself writing comments at that detail level, you're just repeating what the code says.

For long functional chains, define new functions to replace parts of the chain. Give these meaningful names. Then you might be able to avoid comments. The names of these functions themselves should explain what they do.

The best comments are the ones that explain why the code does something. Well-written code should make the "how" obvious from the code itself.
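A minimal sketch of the named-function advice above. The Order type, the helper names, and the invoicing rule in the comment are all hypothetical, invented only to illustrate the idea:

```scala
// Hypothetical domain type, used only for illustration.
final case class Order(customer: String, total: Double, shipped: Boolean)

object OrderReport {
  // Why (assumed business rule): shipped orders are already invoiced,
  // so only unshipped ones count as outstanding.
  def unshipped(orders: Seq[Order]): Seq[Order] =
    orders.filter(o => !o.shipped)

  // Sum the order totals per customer name.
  def totalsByCustomer(orders: Seq[Order]): Map[String, Double] =
    orders.groupBy(_.customer).map { case (name, os) => name -> os.map(_.total).sum }

  // The chain is now two named steps, and the explicit return type
  // documents what comes out of it.
  def outstandingByCustomer(orders: Seq[Order]): Map[String, Double] =
    totalsByCustomer(unshipped(orders))
}
```

The only comment left is the one that explains why; the helper names carry the "what".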

I don't write that code to begin with (unless it's a script for one-time use or playing around in the REPL).

If I can explain what the code does in one comment and the code reads okay, then I keep it as a one-liner:

// Find all real-valued square roots and group them in integer bins
ds.filter(_ >= 0).map(math.sqrt).groupBy(_.toInt).map(_._2)

If I can't understand this by reading carefully through the chain of commands, then I should break it up more into functionally distinct units. For example, if I expected someone to not realize that the square root of a negative number is not real-valued, I would say:

// Only non-negative numbers have a real-valued square root
val nonneg = ds.filter(_ >= 0)
// Find square roots and group them in integer bins
nonneg.map(math.sqrt).groupBy(_.toInt).map(_._2)

In particular, if someone doesn't know the Scala collections library well and doesn't have the patience to spend five to ten minutes understanding one line of code, then one of two things is true. Either they shouldn't be working on my code (or on anything else nontrivial that they don't understand and don't have the patience to understand), or I should know in advance that I'm providing, e.g., a language and mathematics tutorial in addition to writing working code. In the latter case I write a paragraph explaining how the following line works, or break it out command by command, or include comments at the start of each anonymous function explaining what is going on (as appropriate).

Anyway, if you can't understand what it does, you probably need some intermediate values. They are very helpful for mental-resetting ("I can't see how to get from A to C!...but...okay, I can understand A to B. And I can understand B to C.")

If your chained operations are all monadic transforms: map, flatMap, filter, then it's often much, much clearer to rewrite the logic as a for-comprehension.

coll.filter(predicate).map(transform)

could become

for (elem <- coll if predicate(elem)) yield transform(elem)

It's even easier to show off the power of the technique if you have a longer sequence of operations, such as with Kassen's example:

def eligibleCustomers(products: Seq[Product]) = for {
  product <- products
  customer <- product.customers
  if customer.isPremium   // keep paying customers only
  if customer.age < 20    // keep eligible customers only
} yield customer
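To check that the comprehension and the chained form really select the same customers, here is a self-contained sketch. The Customer and Product classes are assumptions (the answers never show them); only the customers, isPremium and age members come from the example:

```scala
// Hypothetical model classes; the original answers do not define them.
final case class Customer(name: String, age: Int, isPremium: Boolean)
final case class Product(name: String, customers: Seq[Customer])

object Eligibility {
  // Chained form: flatten, then filter twice.
  def viaChain(products: Seq[Product]): Seq[Customer] =
    products.flatMap(_.customers).filter(_.isPremium).filter(_.age < 20)

  // For-comprehension form: generators become flatMap/map calls,
  // and each guard becomes a withFilter call.
  def viaFor(products: Seq[Product]): Seq[Customer] =
    for {
      product  <- products
      customer <- product.customers
      if customer.isPremium
      if customer.age < 20
    } yield customer
}
```

Both methods return the same sequence for any input, so the choice between them is purely about which one reads better.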

If you don't want to split it into multiple methods, as hammar suggested, you can split the line and give the intermediate values names (and optionally types).

def eligibleCustomers: List[Customer] = {
  val customers = products.flatMap(_.customers)
  val paying = customers.filter(_.isPremium)
  val eligible = paying.filter(_.age < 20)
  eligible
}

Line length is a fairly natural indicator that your chain is getting too long. :)

Of course, it will depend upon how trivial the chain is:

customerdata.filter (_.age < 40).filter (_.city == "Rio").
             filter (_.income > 3000).filter (_.joined < 2005).
             filter (_.sex == 'f'). ...

I recently had the same impression with an application of 3 files, one of them a bit lengthy, consisting of 4 classes, one of them nontrivial, and about 10 to 20 methods. Each method was about 5 to 10 lines, and any two of them could easily have been combined into a larger one. But I had to convince myself that, although measuring elegance in lines of code saved isn't completely wrong, saving lines isn't the goal in itself.

Then again, splitting a method in two often lowers the complexity per line, but not the overall complexity of understanding the whole program.

If the problem domain is complex - filter data at different levels, rowwise, columnwise, map it, group it, build averages, build graphs, paginate them ... - the complicated job has to be done somewhere.

The program isn't any easier to understand; you just have to hit Page Down less often. The adjustment is that you have to read each line of code more slowly.

It doesn't bother me that much now that I'm used to Scala. If you want to be more explicit with types, you can always, for example, replace things like map(_.foo) with map { a: A => a.foo } to make the code more readable in lengthy or complex operations. Not that I usually find the need to do that.
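As a rough illustration of that last point, the sketch below writes the same map three ways. Reading and its fields are made-up names; the lambda parameter is parenthesized here, which is another way to write the same annotation:

```scala
// Hypothetical type, for illustration only.
final case class Reading(sensor: String, celsius: Double)

object ExplicitTypes {
  val readings: Seq[Reading] = Seq(Reading("a", 21.5), Reading("b", 19.0))

  // Terse: both the parameter type and the result type are inferred.
  val terse = readings.map(_.celsius)

  // Parameter type spelled out in the lambda.
  val annotatedParam = readings.map { (r: Reading) => r.celsius }

  // Result type spelled out on the val, which documents the whole step.
  val annotatedResult: Seq[Double] = readings.map(_.celsius)
}
```

All three values are equal; the annotations change only what a reader can see without an IDE.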