January 15, 2005
My Java Killer Feature
While I love parameterized types and the enhanced for loop, the one feature of Java 5 that I most adore is the old regionMatches() [1, 2] family of functions on java.lang.String. Here are a couple of sample use cases that previously required the creation of new Strings or tedious loops using charAt():
Case-folding startsWith()
If you’ve ever written code like
if (foo.toLowerCase().startsWith(bar.toLowerCase())) ...
and winced, because you were creating and throwing away two Strings, then rejoice! Now you may
if (foo.regionMatches(true, 0, bar, 0, bar.length())) ...
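To make the call site read more like the original startsWith(), the five-argument form can be wrapped in a small helper. This is only a sketch: the class name CaseFoldingPrefix and the method startsWithIgnoreCase are hypothetical names chosen for illustration, not part of java.lang.String.

```java
final class CaseFoldingPrefix {

    // Hypothetical helper: case-insensitive startsWith that creates no
    // intermediate lower-cased Strings.
    static boolean startsWithIgnoreCase(String text, String prefix) {
        // regionMatches returns false (it does not throw) when the prefix
        // is longer than the text.
        return text.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        System.out.println(startsWithIgnoreCase("Content-Type: text/html", "content-type")); // true
        System.out.println(startsWithIgnoreCase("Host: example.org", "content-type"));       // false
        System.out.println(startsWithIgnoreCase("Hi", "Hello"));                             // false
    }
}
```

Note that regionMatches() folds case character by character, so for unusual locale-specific mappings it may not agree exactly with comparing two toLowerCase() results.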
Comparison of arbitrary string regions
Before:
if (foo.equals(bar.substring(m))) ...
After:
if (foo.length() == bar.length() - m && foo.regionMatches(0, bar, m, foo.length())) ...
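One subtlety when replacing equals() this way: regionMatches() compares only the number of characters it is given, so on its own it also succeeds when foo is longer than the region and merely begins with it. A length check restores the equals() semantics. The sketch below uses a hypothetical helper named equalsSuffix to show the difference.

```java
final class SuffixEquals {

    // Hypothetical helper: true when foo equals bar.substring(m), without
    // creating the substring. Unlike substring(), an out-of-range m yields
    // false instead of an exception.
    static boolean equalsSuffix(String foo, String bar, int m) {
        int len = bar.length() - m;
        return m >= 0 && len >= 0
                && foo.length() == len
                && foo.regionMatches(0, bar, m, len);
    }

    public static void main(String[] args) {
        String bar = "hello world";
        System.out.println(equalsSuffix("world", bar, 6));          // true
        System.out.println(equalsSuffix("world!", bar, 6));         // false
        // Without the length check, the longer string still matches:
        System.out.println("world!".regionMatches(0, bar, 6, 5));   // true
    }
}
```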
Look through your own code for opportunities to use these efficient new functions.
Update - Jan 16, 2005
In response to the skeptical inquiries below, here’s some benchmark code. In my opinion, the results support my idea that regionMatches() is a win for efficiency in the context of time-critical applications. For what it’s worth, my own recent use of regionMatches() is in a service where one fifth of a millisecond is the average response time, and where frequent garbage collection would be disruptive.
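For readers who want to try a comparison themselves, here is a rough timing sketch. It is not the benchmark referred to above; the class name, test strings, and iteration count are all assumptions made for illustration. Hand-rolled loops like this are sensitive to JIT warm-up and to optimizations that may remove the substring allocation entirely, so treat any numbers as indicative at best.

```java
final class RegionMatchesTiming {

    // Variant that allocates a new String on each comparison.
    static int countWithSubstring(String foo, String bar, int m, int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (foo.equals(bar.substring(m))) {
                hits++;
            }
        }
        return hits;
    }

    // Variant that compares in place, with a length check to mirror equals().
    static int countWithRegionMatches(String foo, String bar, int m, int iterations) {
        int hits = 0;
        int len = bar.length() - m;
        for (int i = 0; i < iterations; i++) {
            if (foo.length() == len && foo.regionMatches(0, bar, m, len)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String bar = "prefix:payload-to-compare";
        String foo = "payload-to-compare";
        int m = 7; // length of "prefix:"
        int iterations = 5_000_000; // arbitrary; adjust to taste

        // Warm-up passes so both methods are likely compiled before timing.
        countWithSubstring(foo, bar, m, iterations);
        countWithRegionMatches(foo, bar, m, iterations);

        long t0 = System.nanoTime();
        int a = countWithSubstring(foo, bar, m, iterations);
        long t1 = System.nanoTime();
        int b = countWithRegionMatches(foo, bar, m, iterations);
        long t2 = System.nanoTime();

        // Printing the hit counts keeps the loops from being discarded as dead code.
        System.out.println("substring:     " + (t1 - t0) / 1_000_000 + " ms, hits=" + a);
        System.out.println("regionMatches: " + (t2 - t1) / 1_000_000 + " ms, hits=" + b);
    }
}
```

A dedicated harness such as JMH would give more trustworthy measurements than System.nanoTime() around a loop.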
You've gone from a self-explanatory line of code (easy to follow, modify, maintain) to one far less so. I'm not saying it's not useful, just maybe not a good idea, unless required.
Posted by: rob at January 15, 2005 08:47 PM
+1 Rob,
I wonder if it performs any better, or even if the performance gain would be worth it if it was there? It certainly does beg for a comment!
Posted by: Kirk at January 16, 2005 05:07 AM
I did my own benchmark based off of yours just to see how big the win could be. Interestingly enough, the regionMatches technique doesn't create ANY garbage. Of course the substring method creates a lot of garbage. So much so that I could not prevent garbage collection from running while the metered portion of the code was running.
So, I'd say that you've found a worthwhile win in performance even if readability does suffer with the new method signature. I've attached some results
-Xmx512 -Xms512
GC called between timings
Region
Average: 289.0
Variance: 85.6
Standard Deviation: 9.2
Median: 290
Maximum: 330
Minimum: 280
----------------
Old String
Average: 413.0
Variance: 65.9
Standard Deviation: 8.1
Median: 411
Maximum: 431
Minimum: 400
----------------
GC run between tests.
Region
Average: 286.0
Variance: 123.8899
Standard Deviation: 11.130583991866734
Median: 281
Maximum: 321
Minimum: 270
----------------
Old String
Average: 411.0
Variance: 59.7
Standard Deviation: 7.7
Median: 411
Maximum: 431
Minimum: 400
----------------
While I see useful applications for regionMatches, the examples above rather show that startsWith() lacks a startsWithIgnoreCase() or a startsWith(boolean ignoreCase, ..) counterpart! Speaking of performance, I remember that tedious loops with charAt() were quite fast in many cases.
Posted by: ini at January 17, 2005 03:48 AM
It doesn't seem hard to read to me... it's just like System.arraycopy. And if you're comparing substrings that don't start at zero, it's far better.
Posted by: Martin at January 17, 2005 12:39 PM
Just so you are all aware, Java 1.4.2 also has this, and it may go as far back (I have not verified) as Java 1.4.0.
Still GREAT information though.
Posted by: Jeffrey A. Krzysztow at January 19, 2005 05:23 PM