Scala performance insights — from StackOverflow

Scala makes it very easy to use enormous amounts of memory without realizing it. This is usually very powerful, but occasionally can be annoying. For example, suppose you have an array of strings (called array), and a map from those strings to files (called mapping). Suppose you want to get all files that are in the map and come from strings of length greater than two. In Java, you might

int n = 0;
for (String s: array) {
  if (s.length > 2 && mapping.containsKey(s)) n++;
}
String[] bigEnough = new String[n];
n = 0;
for (String s: array) {
  if (s.length <= 2) continue;
  bigEnough[n++] = map.get(s);
}

Whew! Hard work. In Scala, the most compact way to do the same thing is:

val bigEnough = array.filter(_.length > 2).flatMap(mapping.get)

Easy! But, unless you’re fairly familiar with how the collections work, what you might not realize is that this way of doing this created an extra intermediate array (with filter), and an extra object for every element of the array (with mapping.get, which returns an option). It also creates two function objects (one for the filter and one for the flatMap), though that is rarely a major issue since function objects are small.

So basically, the memory usage is, at a primitive level, the same. But Scala’s libraries have many powerful methods that let you create enormous numbers of (usually short-lived) objects very easily. The garbage collector is usually pretty good with that kind of garbage, but if you go in completely oblivious to what memory is being used, you’ll probably run into trouble sooner in Scala than Java.

Note that the Computer Languages Benchmark Game Scala code is written in a rather Java-like style in order to get Java-like performance, and thus has Java-like memory usage. You can do this in Scala: if you write your code to look like high-performance Java code, it will be high-performance Scala code. (You may be able to write it in a more idiomatic Scala style and still get good performance, but it depends on the specifics.)

I should add that per amount of time spent programming, my Scala code is usually faster than my Java code since in Scala I can get the tedious not-performance-critical parts done with less effort, and spend more of my attention optimizing the algorithms and code for the performance-critical parts.

Leave a comment