Monday, March 10, 2008

Errors, Interpretations, and Restatement

Well guys, I have an embarrassing confession. Due to a coding error, all of my market stress indicators have been understated since the beginning of August 2007. Here is what the Flipper Market Share, Troubled Inventory, and Sellers In Trouble Market Share really look like:




There were actually three distinct issues with the data. One was the overly strict way I was interpreting the data set, one was an artifact of the process I used to recalculate the statistics, and one was a simple coding mistake. In the interest of due diligence, and in the hope of retaining your trust, I would like to discuss each issue openly. I'll start with the coding error.

$year_in_seconds = 31536000


When my system determines whether a house listing is a flipper or not, it compares the listing date to the last time the property was sold. If the listing took place within two years of the last sale, my system flags it as a flipper. In addition, a listing will be carried over as a flipper if it was a flipper the week before. My mistake was a failure to set the comparison variable for the length of a year in seconds, so the system didn't look back far enough in the past sales database.
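The rule above can be sketched in a few lines. This is a minimal illustration in Python, not the actual code behind the site; the function name, parameter names, and the idea that an unset year variable behaves like 0 are all assumptions made for the example.

```python
from datetime import datetime

YEAR_IN_SECONDS = 31_536_000  # 365 days, the value quoted in the post
FLIP_WINDOW_YEARS = 2         # "within two years of the last sale"


def is_flipper(listing_date, last_sale_date, was_flipper_last_week=False,
               year_in_seconds=YEAR_IN_SECONDS):
    """Return True if a listing should be flagged as a flipper (hypothetical helper)."""
    if was_flipper_last_week:
        # Carry-over rule: a listing flagged last week stays flagged.
        return True
    if last_sale_date is None:
        # No prior sale on record, so nothing to compare against.
        return False
    elapsed = (listing_date - last_sale_date).total_seconds()
    # Flag the listing if the last sale falls inside the look-back window.
    return 0 <= elapsed <= FLIP_WINDOW_YEARS * year_in_seconds


listing = datetime(2008, 3, 1)
last_sale = datetime(2007, 1, 15)

# With the year variable set, the sale 13 months earlier is inside the window.
flag_correct = is_flipper(listing, last_sale)

# ASSUMPTION: an unset year variable acts like 0, collapsing the window,
# so only carried-over listings keep their flag.
flag_buggy = is_flipper(listing, last_sale, year_in_seconds=0)
```

With the window collapsed, new flippers are never detected and only the carry-over rule keeps old ones flagged, which would match the slow decay described in this post.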

This error probably happened during a system overhaul I performed last summer, and the reductions in flipper market share happened so gradually (due to carry over) that I didn't attribute it to a coding mistake. It turns out the only flipper stat I've been reporting since August 8, 2007 is the decay rate of the flipper listings since then.

SELECT price WHERE address LIKE '$address%'

All of these issues came to light this Saturday when I was researching a different data source. I've been puzzled by a recent disconnect in inventory levels between MetroMLS and my source, so I decided to try a third source just to make sure. The inventory levels between my two sources matched OK, but I got a huge surprise when I noticed a spike in SIT levels.

It turns out I have been using overly strict criteria when comparing listing addresses to sales addresses in my database for SIT/FIT determination. As you can imagine, there are lots of ways to format an address. Unfortunately, the two sets of data I've been comparing come from disparate sources, and the address suffixes are not always included in one set. When I first set up the system I made a conscious decision to use a strict comparison structure because I was afraid of creating false positives (e.g. "123 Fake St" and "123 Fake Ave" would both compare positively against "123 Fake").

As luck would have it, the third data source was formatted almost exactly like my sales database, so it gave much higher SIT and FIT inventory levels. This led me to revisit the comparison issue and recalculate the statistics for my entire data set using a friendlier comparison operator.
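To make the trade-off concrete, here is a small Python sketch contrasting a strict comparison with a prefix-style comparison in the spirit of the LIKE pattern shown above. The function names are hypothetical and this is only an illustration of the idea, not the comparison the system actually runs.

```python
def strict_match(query_address, stored_address):
    """Match only when the two addresses are identical (ignoring case and outer spaces)."""
    return query_address.strip().lower() == stored_address.strip().lower()


def prefix_match(query_address, stored_address):
    """Match when the stored address starts with the query, like LIKE 'query%'."""
    return stored_address.strip().lower().startswith(query_address.strip().lower())


# One source omits the street suffix, the other includes it.
missed_by_strict = strict_match("123 Fake", "123 Fake St")   # suffix mismatch, no match
found_by_prefix = prefix_match("123 Fake", "123 Fake St")    # prefix matches

# The cost of the friendlier comparison: two different streets both match "123 Fake".
false_positive_risk = (prefix_match("123 Fake", "123 Fake St")
                       and prefix_match("123 Fake", "123 Fake Ave"))
```

The strict version misses real matches whenever one source drops the suffix, while the prefix version catches them at the price of the occasional false positive.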

Something Old, Something New

Lastly, an artifact of the recalculation process led to a slight increase in stress indicator levels. When SIT/FIT levels are calculated, it is done the week the listing data is acquired. Unfortunately, my sales data source is often a month behind (and sometimes two or three). As a result, any sale that happened within that lag period won't get picked up by the system at the time. However, when I reran the calculations, the data was there and the comparisons were made. Due to this, there was a very small increase in SIT/FIT inventory.
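The lag effect can be illustrated with a toy example in Python. The record layout (`sold_on`, `recorded_on`) and the helper name are assumptions for the sketch; the real sales database may track this differently.

```python
from datetime import date


def has_prior_sale(address, listing_date, sales, as_of):
    """Check for a prior sale using only records that had arrived by `as_of`."""
    visible = [s for s in sales if s["recorded_on"] <= as_of]
    return any(s["address"] == address and s["sold_on"] <= listing_date
               for s in visible)


# A sale that closed in January but reached the sales database a month later.
sales = [{
    "address": "123 Fake St",
    "sold_on": date(2008, 1, 20),
    "recorded_on": date(2008, 2, 25),
}]
listing_date = date(2008, 2, 4)

# The week the listing is acquired, the sale record has not arrived yet.
seen_at_the_time = has_prior_sale("123 Fake St", listing_date, sales, as_of=listing_date)

# On a later rerun the record is present, so the comparison succeeds.
seen_on_rerun = has_prior_sale("123 Fake St", listing_date, sales, as_of=date(2008, 3, 8))
```

The same listing goes from unmatched to matched purely because the sales record arrived in between, which is where the small recalculation increase comes from.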

Summary

Here is a chart I prepared summarizing these issues, and their effect on the data set:

                                      | Year Variable Error | Overly Strict Interpretation | Recalc Increase
% of Dataset Affected                 | 40%                 | 100%                         | 100%
Magnitude of Change on Affected Data  | 0-200% Increase     | 0-100% Increase              | 0-5% Increase


One good thing is these issues had absolutely no effect on the other stats. Inventory, price, and price level inventory are unchanged. However, the market is clearly under much more stress than I previously indicated. I will have more on the issue later in the week.

I would also like to thank everyone for bearing with me as I deal with these issues. I am a scientist by trade, and I firmly believe that the most important part of the experimental process is the airing of mistakes. My goal with this blog is to try and tell the housing bubble story by interpreting data in novel ways, and that story can't be told at all if it's not told accurately.

Even at the expense of reputation.

20 comments :

Anonymous said...

Okay on the errors, I guess. These charts changed dramatically and I guess I cannot trust what I have seen (or see). I also note that the total market number of properties is less than the sum of the flippers, rest of market, etc. This seems to be a big problem, too. I am glad you notified and have been candid but I cannot use this information for decision making at this time.
Richard

Buying Time said...

Max, you are an innovator. Continually producing new metrics as the market evolves. You get huge kudos in my book for that.

The fact that you continually check your data to make sure it makes sense, also tells me you are a good analyst.

Full disclosure only increases your credibility. Keep up the good work.

And wow...that FIT graphic is intense!

Max said...

I also note that the total market number of properties is less than the sum of the flippers, rest of market, etc.

This is true, because FITs are a subset of SITs, which is an explanation I've made in the past.

I would hope that anyone using this data for business decisions will do their own research and reach their own conclusions. That is one reason I try not to make any forward-looking statements in my posts.

smf said...

I am not surprised at all. As I mentioned in other blogs before, if I am looking at the high end correctly, I would say at least 80% of the houses are speculative.

Gwynster said...

Max, I have a question for you.

I noticed that there was an abrupt stop in new REOs coming on the market from Dec to now. Just the same old ones going on and off the pending list. Now we know that the foreclosure activity is increasing but where are they?

Also, we're seeing more and more layoffs locally. Were all those people renting, or have they just not processed those REOs yet?

I have that _hair raising at the nape of my neck_ kinda feeling and am trying to put my finger on it.

Max said...

Also, we're seeing more and more layoffs locally. Were all those people renting, or have they just not processed those REOs yet?

I still maintain that REO inventory on bank portfolios is increasing faster than they can dispose of it. This could be due to overwhelmed REO departments, or the banks wanting to preserve asset valuations by not marking-to-market with a sale listing.

There might also be push-back from Realtors not wanting to scare away potential buyers by not flagging distressed listings as such. Agent Bubble can fill you in on how hard it is dealing with REO/short-sale departments these days. Many buyers just don't want the hassle.

Max said...

I have that _hair raising at the nape of my neck_ kinda feeling and am trying to put my finger on it.

I do too. This data is so much worse than I thought it was. 50% of all listings losing money?!

BTW, there's buzz starting to build about market activity picking up around town. What's the word in Davis?

smf said...

Gwyn,

That hair raising is what I now call the 'Katrina Effect'.

If I recall correctly, when the hurricane passed, New Orleans thought that they were out of danger and breathed a sigh of relief. It was much later that the first levee broke.

Right now it seems as if after the first credit crunch, the worst of the storm has passed. Even the loan limits have been increased.

However, the levees have yet to break, and we know that it is inevitable.

You still have 50% of the houses sold (info from realtors) STILL going to investors, the high end is overbuilt to the max (and they still have not all felt the pain), and the economy still continuing its downward spiral.

As I told my wife, even the higher loan limits CANNOT help. If those who want them have to qualify fair and straight...well...most won't.

Darth Toll said...

Richard, in Max's defense, he's been saying for quite a while now that something wasn't adding up in the data and he had good reasons for saying that. So my guess is that once Max is happy with the data and there aren't unexplainable anomalies from his vantage point, then you should be able to trust the data. Personally, I'm OK with the data before and now - it paints a pretty bleak picture either way.

"BTW, there's buzz starting to build about market activity picking up around town. What's the word in Davis?"

Max, hard to fathom that with the credit markets imploding. Could be a lot of lookie-loos but they better bring a big wad of cash to the close or "no sale!"

Patient Renter said...

Blogger chopped my comment in half :(

If matching on the address is a real weak point, I'd definitely run it through a normalization algorithm. A common one is to strip all punctuation, capitalize everything, identify and standardize any street types (st., blvd., cir., etc.) and finally strip all the spaces and the city name and tack on the zip code at the end.
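A rough Python sketch of the normalization steps described above. The street-type table is a small illustrative sample, the function name is hypothetical, and the city name is handled simply by passing in only the street line and zip code.

```python
import re

# Illustrative subset of street-type spellings mapped to one standard form.
STREET_TYPES = {
    "STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "CIRCLE": "CIR",
    "DRIVE": "DR", "ROAD": "RD", "LANE": "LN", "COURT": "CT",
}


def normalize_address(street_line, zip_code):
    """Build a comparison key from a street line and zip code (hypothetical helper)."""
    # Strip punctuation and capitalize everything.
    cleaned = re.sub(r"[^\w\s]", "", street_line).upper()
    # Standardize street types token by token.
    tokens = [STREET_TYPES.get(token, token) for token in cleaned.split()]
    # Drop the spaces and tack the zip code on the end.
    return "".join(tokens) + zip_code


key_a = normalize_address("123 Fake St.", "95814")
key_b = normalize_address("123 fake street", "95814")
same_property = key_a == key_b
```

Two differently formatted versions of the same address end up with the same key, while "123 Fake St" and "123 Fake Ave" stay distinct. This does not help when one source omits the street type entirely.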

Gwynster said...

Davis,

See the thread below this. The schools are finally admitting to big-time distress. We're over 160 listings now (it was 110 over the winter) and lots of them are the large homes.

Also, the Cannery project is coming up again and since Davis needs to attract families badly to offset the school numbers, this fight is going to be brutal.

I track the homes in central Davis and it's not unusual to see a 100k reduction from last year. House down the street from me listed at 695k, sold for 635k with a large credit to the buyers for downs and closing.

The only thing that keeps me remotely looking at Davis is the cost of gas.

Sippn said...

Max, no data is perfect ... at least you work on and review yours.

Think about Zillow.... arrogant, minimal real estate experience, 50 states, thousands of municipalities and organizations to deal with and they have the gall to sell their stuff!

They probably have data reported to them 100 different ways, and I bet they just use 2 in each market that look similar and discard the rest.... at best.... maybe I'm optimistic.

Husmanen said...

Max, to be able to identify anomalies and outliers in data is a unique skill.

Even more distinctive is the ability to find the cause and rectify it.

Analysts of this caliber are few and far between. Great work and looking forward to more in the future, thanks!

Richard said...

Max,

"$year_in_seconds = 31536000"

professional curiosity, perl?

Anonymous said...

This is a huge "discovery".

Just last week I was talking to my wife (she does not have the time to look at the web as much as I do so I give her a weekly brief on real estate and the economy) and told her that one thing does not make sense to me: That according to a web site I read every week the number of flippers in trouble is going down.

We were thinking: Where did they all go.

We could not find an answer.

I now think that recovery in R/E has been pushed back a few years.

btw another good source is :

http://www.lasvegasrealtor.com/stats/statindex.htm

Ed said...

Max,

Errors and restatements are a fact of life in any serious analysis. Especially when your datasets are fuzzy, as I believe some of this information you process must be.

I congratulate you on: producing stats that are unique, providing transparency into your mistake and doing so much work to make it right.

Keep it up - don't let the nay-sayers change your approach or style.

Thanks!
Ed

Max said...

Thanks for all the kind words. I've always believed that in order for data to be meaningful, it has to be understandable, and you can't understand something unless there's complete transparency as to how the data is derived. Sometimes that means you take some heat.

If I just reported what I wanted, I would be no better than the NAR. :)

Max said...

Oh, and I'm a PHP hack. Perl is too much like a real programming language for me to use. :)

David said...

Max, even when there is an error, your data is much more accurate, complete, timely and helpful than anything I've ever received from realtors or the Sacramento Bee. Thanks for providing this free service to all of us. Keep up the great work!