The first issue I can see with that code is that it's not doing what he expects. He does this to read the file into a StringBuffer:
bf.lines().forEach(s -> sb.append(s));
However, this ends up reading all the lines into one giant line, since the Strings that lines() produces have the newline character stripped. As a result, the second lines() call reads a single 23MB line (the file produced by gen.py). This is less than optimal.
The fastest version I managed to write was:
public void readString5(String data) throws IOException {
    int lastIdx = 0;
    for (int idx = data.indexOf('\n'); idx > -1; idx = data.indexOf('\n', lastIdx))
    {
        parseLine(data.substring(lastIdx, idx));
        lastIdx = idx + 1;
    }
    parseLine(data.substring(lastIdx));
}
Not the prettiest thing, but it went from 0.594 GB/s to 1.047 GB/s. Also, it doesn't quite do the same as the lines() method, but that's easily changed.
I'm confused.
Where is the second lines call?
Lines 19 and 27 in the source: https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/...
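For anyone who wants to see the newline-stripping effect from the top of the thread in isolation, here is a minimal sketch. It is not the benchmark code from the repository: the class name LinesDemo, its helper methods and the three-line sample string are made up for illustration, and it reads from a StringReader into a StringBuilder rather than from a file into a StringBuffer.

```java
import java.io.BufferedReader;
import java.io.StringReader;

public class LinesDemo {
    // Same pattern as in the thread: append each line with no separator.
    // lines() strips the line terminator, so the result has no newlines left.
    static String joinWithoutNewlines(String data) {
        StringBuilder sb = new StringBuilder();
        new BufferedReader(new StringReader(data)).lines().forEach(s -> sb.append(s));
        return sb.toString();
    }

    // Possible fix: put the terminator back after every line.
    static String joinWithNewlines(String data) {
        StringBuilder sb = new StringBuilder();
        new BufferedReader(new StringReader(data)).lines().forEach(s -> sb.append(s).append('\n'));
        return sb.toString();
    }

    // Counts how many lines a later lines() call would see.
    static long countLines(String data) {
        return new BufferedReader(new StringReader(data)).lines().count();
    }

    public static void main(String[] args) {
        String data = "a,b\nc,d\ne,f\n"; // hypothetical sample input
        System.out.println(countLines(joinWithoutNewlines(data))); // 1
        System.out.println(countLines(joinWithNewlines(data)));    // 3
    }
}
```

With the separator dropped, a second pass over the buffer sees one line regardless of how many the input had, which is what the 23MB single line amounts to.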