January 15, 2005

My Java Killer Feature

While I love parameterized types and the enhanced for loop, the one feature of Java 5 that I most adore is the old regionMatches() [1, 2] family of functions on java.lang.String. Here are a couple of sample use-cases that previously required the creation of new Strings or tedious loops using charAt():

Case-folding startsWith()

If you’ve ever written code like

    if (foo.toLowerCase().startsWith(bar.toLowerCase())) ...

and winced because you were creating and throwing away two Strings, then rejoice! Now you can write

    if (foo.regionMatches(true, 0, bar, 0, bar.length())) ...
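
If you do this in more than one place, it may be worth wrapping the call in a small helper so the intent stays readable. This is only a sketch: the class name PrefixMatch and the method name startsWithIgnoreCase are made up for illustration and are not part of java.lang.String.

```java
final class PrefixMatch {
    private PrefixMatch() {}

    /** Returns true if text begins with prefix, ignoring case. Hypothetical helper. */
    static boolean startsWithIgnoreCase(String text, String prefix) {
        // regionMatches returns false (it does not throw) when the prefix
        // is longer than the text, so no separate length check is needed.
        return text.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        System.out.println(startsWithIgnoreCase("Content-Type: text/plain", "content-type")); // true
        System.out.println(startsWithIgnoreCase("Con", "content-type"));                      // false
    }
}
```

No intermediate Strings are created, and the call site reads much like the original startsWith().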

Comparison of arbitrary string regions

Before:

    if (foo.equals(bar.substring(m))) ...

After:

    if (foo.length() == bar.length() - m && foo.regionMatches(0, bar, m, foo.length())) ...
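
One subtlety: regionMatches() looks only at the region it is given, so on its own it would also accept a foo that merely starts with the tail of bar. To behave like equals(), the lengths have to agree as well. Here is a minimal sketch of that check, with a hypothetical class SuffixEquals and method equalsSuffix:

```java
final class SuffixEquals {
    private SuffixEquals() {}

    /**
     * Hypothetical helper: true if foo equals bar.substring(m), without
     * creating the substring. Unlike substring(), an out-of-range m
     * yields false instead of an exception.
     */
    static boolean equalsSuffix(String foo, String bar, int m) {
        int len = bar.length() - m;
        // The length check is what turns a region match into an equality test.
        return foo.length() == len && foo.regionMatches(0, bar, m, len);
    }

    public static void main(String[] args) {
        String bar = "hello world";
        System.out.println(equalsSuffix("world", bar, 6));             // true
        System.out.println(equalsSuffix("world!", bar, 6));            // false
        // Without the length check, the longer string still matches:
        System.out.println("world!".regionMatches(0, bar, 6, 5));      // true
    }
}
```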

Look through your own code for opportunities to use these efficient new functions.

Update - Jan 16, 2005

In response to the skeptical inquiries below, here’s some benchmark code. In my opinion, the results support my idea that regionMatches() is a win for efficiency in the context of time-critical applications. For what it’s worth, my own recent use of regionMatches() is in a service where one fifth of a millisecond is the average response time, and where frequent garbage collection would be disruptive.
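
The linked code isn't reproduced here. As a rough sketch of the kind of comparison involved, something like the following would do. The class name RegionBench, the method names, the test strings, and the iteration count are all made up for illustration, and this is not the code that produced any of the numbers quoted in the comments.

```java
final class RegionBench {
    private RegionBench() {}

    /** Old style: asks for a new String from substring() on every iteration. */
    static int countWithSubstring(String foo, String bar, int m, int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (foo.equals(bar.substring(m))) {
                hits++;
            }
        }
        return hits;
    }

    /** regionMatches style: compares in place, with a length check to mirror equals(). */
    static int countWithRegionMatches(String foo, String bar, int m, int iterations) {
        int hits = 0;
        int len = bar.length() - m;
        for (int i = 0; i < iterations; i++) {
            if (foo.length() == len && foo.regionMatches(0, bar, m, len)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String foo = "payload";
        String bar = "prefix:payload";
        int m = 7; // index just past "prefix:"
        int iterations = 5000000;

        // Warm-up pass so both loops have a chance to be compiled before timing.
        countWithSubstring(foo, bar, m, iterations);
        countWithRegionMatches(foo, bar, m, iterations);

        long t0 = System.nanoTime();
        int a = countWithSubstring(foo, bar, m, iterations);
        long t1 = System.nanoTime();
        int b = countWithRegionMatches(foo, bar, m, iterations);
        long t2 = System.nanoTime();

        System.out.println("substring:     " + (t1 - t0) / 1000000 + " ms, hits=" + a);
        System.out.println("regionMatches: " + (t2 - t1) / 1000000 + " ms, hits=" + b);
    }
}
```

Treat the timings from a loop like this with suspicion: the JIT may optimize away some of the allocation, and results will vary by JVM and heap settings.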

Posted by MrFeinberg at January 15, 2005 12:02 PM
Comments

You've gone from a self-explanatory line of code (easy to follow, modify, maintain) to one far less so. I'm not saying it's not useful, just maybe not a good idea unless required.

Posted by: rob at January 15, 2005 08:47 PM

+1 Rob,

I wonder if it performs any better, or even whether the performance gain would be worth it if it were there? It certainly does beg for a comment!

Posted by: Kirk at January 16, 2005 05:07 AM

I did my own benchmark based off of yours just to see how big the win could be. Interestingly enough, the regionMatches technique doesn't create ANY garbage. Of course, the substring method creates a lot of garbage, so much so that I could not prevent garbage collection from running during the metered portion of the code.

So, I'd say that you've found a worthwhile win in performance, even if readability does suffer with the new method signature. I've attached some results.

-Xmx512 -Xms512
GC called between timings

Region
Average: 289.0
Variance: 85.6
Standard Deviation: 9.2
Median: 290
Maximum: 330
Minimum: 280
----------------
Old String
Average: 413.0
Variance: 65.9
Standard Deviation: 8.1
Median: 411
Maximum: 431
Minimum: 400
----------------

GC run between tests.
Region
Average: 286.0
Variance: 123.9
Standard Deviation: 11.1
Median: 281
Maximum: 321
Minimum: 270
----------------
Old String
Average: 411.0
Variance: 59.7
Standard Deviation: 7.7
Median: 411
Maximum: 431
Minimum: 400
----------------

Posted by: Kirk at January 16, 2005 05:44 PM

While I see useful applications for regionMatches, the examples above rather show that startsWith() lacks a startsWithIgnoreCase() or a startsWith(boolean ignoreCase, ..) counterpart! Speaking of performance, I remember that tedious loops with charAt() were quite fast in many cases.

Posted by: ini at January 17, 2005 03:48 AM

It doesn't seem hard to read to me... it's just like System.arraycopy. And if you're comparing substrings that don't start at zero, it's far better.

Posted by: Martin at January 17, 2005 12:39 PM

Just so you are all aware, Java 1.4.2 also has this, and it may be as far back (I have not verified) as Java 1.4.0.

Still GREAT information though.

Posted by: Jeffrey A. Krzysztow at January 19, 2005 05:23 PM