Chris Lott's Weblog


Tue, 02 Feb 2010

PlumbPak: Just Say No
I've been using flexible hoses covered in stainless-steel mesh to connect faucets to the water-supply pipes for some time now. They're far easier to install than the old solid-copper lines. I am not good at bending solid copper (they kink easily) and I had extreme trouble getting them to seat properly. I figured the new ones were an DIY guy's best friend .. until yesterday.

Look at that hole! Not only did the flexible line blow out, the mesh completely failed to contain the bubble. I left the tag on it at the time out of sheer laziness and now I'm glad I did because now I know what brand to avoid. Company's response? We'll be glad to send you a new one. Gotta love those cheap chinese imports. Thank goodness the kids were home and turned off the water quickly.

posted at: 08:52 | path: | permanent link to this entry

Sun, 10 Jan 2010

Space verus Time in Java's ObjectOutputStream

This past week I helped my colleague D. B. investigate a memory-use issue in a component he wrote. The component reads and sorts records (basically database rows). Because the number of rows can be large, it uses a disk-based mergesort once the count exceeds a user-defined threshold in order to keep memory use in check. Under the covers it uses Java's ObjectOutputStream to write each chunk of sorted input and ObjectInputStream to read those chunks while merging. Well, despite our best intention and design, the sorter component was using far more memory than we expected, resulting in the dreaded "java.lang.OutOfMemoryError: Java heap space" exception. As a first step I installed TPTP trying to localize the problem. It showed me tens of thousands of live char[] instances in use but refused to report for me the origin of those instances, so that tool wasn't much help. Finally he hit upon the answer, namely that ObjectOutputStream was not releasing references. We thought that using writeUnshared() would be enough but it was not. The javadoc hints at this behavior, but I guess does not state it quite as clearly as we would like.

The solution is to call the class's reset() method at suitable intervals. We experimented and found some significant performance differences based on how often we call that method. So I wrote a little code (see below) and gathered some numbers. The tables below show results from processing an input CSV file of 544,222 lines and size 86Mb, on a 2Ghz dual-core laptop running WinXP on an ordinary disk drive. All tests launched from Eclipse in debug mode with a Java breakpoint set on the last executable line. The interval column shows how many records were written between each call to the object output stream's reset() method. The program reports its time in milliseconds. Memory numbers were gathered from the windoze taskmgr "Mem Usage" column when the breakpoint was reached.

Using writeObject() Using writeUnshared()
IntervalTime/MSMemory/KB IntervalTime/MSMemory/KB
1 22,781 10,460 1 23,093 10,464
5 18,579 10,468 5 18,531 10,480
10 17,875 10,496 10 17,907 10,540
50 17,484 10,524 50 17,375 10,516
100 17,453 11,128 100 17,500 10,712
1,00019,781 14,692 1,00018,782 14,540
10,00020,32921,368 10,00019,84421,812

Calling reset at every write yields minimal memory use but it's huge overkill that significantly increases the run time. On the other end, wait too long to call reset and the memory use blows up the JVM, and the run time starts increases also (which I really didn't expect).

A person on the Sun forum (see the link in the code) pointed that this undesired behavior pops up out when the object written out has references within it. In my first attempt at this assessment I simply wrote out String objects read from the CSV file. But in that scenario, writeUnshared works perfectly, does what I expect, and the program uses very little memory. The way to reproduce the excessive memory use problem in this little example is to split the lines and write out the resulting array of string. Then the numbers showed no difference between the writeObject() and writeUnshared() methods.

The conclusion: calling reset at some interval is essential, and there's a sweet spot at about every 100th write where the space/time tradeoff is reasonable. I didn't investigate whether the sweet spot is really at 50 or 150; here's the code if you want to go after that one.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.ObjectOutputStream;
import java.util.Date;

/**
 * Program to measure the space/time tradeoff in performance
 * when using {@link ObjectOutputStream#writeObject(Object)}
 * and {@link ObjectOutputStream#writeUnshared(Object)}.
 * 
 * Also see:
 * http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4332184
 * http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4363937
 * http://forums.sun.com/thread.jspa?threadID=5419725
 * 
 * @author Chris Lott, 10 January 2009
 */
public class SpaceVsTime {

	public static void main(String[] args) {
		
		if (args.length != 3) {
			System.err.println("Usage: SpaceVsTime csv-file object-file reset-interval");
			return;
		}

		File csvFile = new File(args[0]);
		File objFile = new File(args[1]);
		int recInterval = Integer.parseInt(args[2]);
		
		if (! csvFile.exists()) { 
			System.err.println("Not found or not a file: " + csvFile.getPath());
			return;
		}
		
		try {
			Date start = new Date();
			System.out.println(start + ": started with interval " + recInterval);

			int count = 0;
			BufferedReader bufRdr = new BufferedReader(
					new FileReader(csvFile));
			ObjectOutputStream oos = new ObjectOutputStream(
					new FileOutputStream(objFile));
			String line;
			while( (line = bufRdr.readLine()) != null) {
				String [] cols = line.split(",");
				oos.writeUnshared(cols);
				// oos.writeObject(cols);
				++count;
				if (count % recInterval == 0)
					oos.reset();
			}
			bufRdr.close();
			oos.close();			
			
			Date finish = new Date();
			System.out.println(finish + ": finished for " + count + " records");
 			System.out.println("Execution time (ms): " + (finish.getTime() - start.getTime()));
 			System.out.println();
		}
		catch (Exception ex) {
			ex.printStackTrace();
		}
	}

}


posted at: 13:00 | path: | permanent link to this entry

Wed, 23 Dec 2009

10 Things I Hate About Yelp

Sure, the title is a blatant rip-off of the 1999 movie starring Julia Stiles, and no, I don't actually have 10 things to hate. But I do want to vent a little about Yelp's lousy communication with reviewers. Yelp is totally dependent on this user-generated content, so you'd think they'd make heroic efforts to communicate back to reviewers. In my experience this is not the case.

To explain where I'm coming from, recently I started to write reviews of little places in my hometown that I like. I looked at Yahoo local, but Yahoo just doesn't seem to have much traction, as measured by the lack of pages for many businesses in my town, and the relative dearth of reviews on the few pages that do exist. So over to Yelp, where even my local tiny coffee shop already was listed and had some comments. Fine, I signed up for an account and started generating content.

The first thing I hate is that Yelp does not notify me when a business listing that I submit is approved (not a review, a basic listing). I added an entry for a new cafe that was not yet listed, including all the gory details like the address, phone, website, hours, you name it. Eventually the man behind the curtain at Yelp approved the listing (no email notification tho), after which someone else wrote the first review. Yelp makes a small/big deal about the first review, putting a little graphic #1 on that one, so how about giving me a fair shot at that and notify me?

The second thing I hate is Yelp's utter silence about a decision to hide or expose a review. One of my reviews was shown for a while, long enough to garner a couple of useful/cool votes, but at some point the review vanished. Someone (or some mindless bot?) decided my review should no longer be shown to the general public, but again no email or other heads-up from Yelp went out to me when my review went into the bit bucket. Sure I understand they need to screen out paid reviews, spam reviews, and all manner of garbage but over at Craigslist if one of my posts gets flagged and nuked I see that action in my account. The total lack of Yelp communication isn't just a missing notification, the website is kinda weird about a defrocked review too. For this silly little review, if I'm signed in to Yelp, my review *is shown* in the business listing just fine, as if it were still posted for the world. There is no little red X or other indicator, the page just shows my review mixed in among the others. But if I'm *not* signed in, then the review is *not* shown. I don't know what their original intent of this policy might be, but it just feels like they're trying to put one over on me ("Sure we still love you, really we do."). If I hadn't visited from a different computer I would never have noticed. Gimme a break Yelp, just tell me that you don't like my review so I can delete it or fix it, don't blow sunshine up my (ahem).

OK I feel a little better, and that's enough ranting for today.



posted at: 10:42 | path: | permanent link to this entry

Wed, 16 Dec 2009

Test entry for Blosxom.
This is a little file to test the blogging software.

posted at: 10:42 | path: | permanent link to this entry


http://chris-lott.org/contact.html