Archive for Computers

Re: Cuteness

This is a response to chandrasekhar’s post, Cuteness.

There is plenty of material to criticize in Noam Scheiber’s recent TNR article, but one aspect of it annoyed me in particular. Scheiber claims that economics had a mid-life crisis in the 1980s, and that in response economists started thinking in an entirely different way…

By the ’80s, however, the data-crunchers had come down with a crisis of confidence. In one famous episode, the eminent economist H. Gregg Lewis reviewed several studies on unions. What he found was alarming: some papers reported that unions strongly increased wages; others reported exactly the opposite. The difference, in most cases, was simply the assumptions the authors had made.

Critiques like this tipped the discipline into a prolonged bout of soul-searching. The old approach had been sweeping in its ambition. But what good were ambitious goals if the best you could do was “on the one hand/on the other hand”-style equivocation or, worse, plain jibberish? “People didn’t believe the estimates being produced,” recalls David Card, then a rising star at Princeton. “They felt the evidence in economics was not very credible.” Economists had long aspired to science. Suddenly they faced a harrowing thought: What if they were no better at pinning down truth than the average critical studies major?

Yes, this all happened in the 1980s. But in the article, the time period has no relation to the change in methodology. It was the general sense of disappointment in the old ways that made people start looking for “clean ID”:

Having glimpsed this nihilistic vision, many economists ran screaming in the opposite direction. They concluded that the path to knowledge lay in solid answers to modest questions. Henceforth, the emphasis would be on “clean identification,” on sorting out what caused what.

The early practitioners of this approach–Angrist, Krueger, Card–had well-earned reputations as crafty researchers. But, by and large, all three men used their creativity to chip away at important questions. It was only in the late ’90s that the signs of overreach became apparent. To some professors at top departments, clean identification became a fetish. “Almost every student, myself included, had the terrible experience of getting up in front of the [professors] for whom identification is the Holy Grail, and getting cut to shreds when your identification strategy doesn’t pass muster,” recalls a recent Harvard Ph.D.

Actually, it’s a good thing that grad students fear getting cut to shreds if they have a bad identification strategy. When your results depend on your natural experiment being an actual natural experiment, establishing causality rests on good identification. Duh! Why should I bother trusting the conclusion of a paper with bad ID?

The problem is that there are only so many big questions that misgraded tests or arbitrary boundaries can shed light on. If you’re wedded to these techniques, eventually they lead you in obscure directions. “People think about the question less than the method,” says Berkeley professor Raj Chetty, one of the most sought-after Harvard graduates in recent years (and a notable exception to this trend). “They’re not thinking: ‘What important question should I answer?’ So you get weird papers, like sanitation facilities in Native American reservations.”

I suspect Chetty did not intend for his criticism to have as far-reaching implications for the discipline as Scheiber claims it does. But that’s beside the point. My sense is that Scheiber has completely brushed past a major cause of the change in the style of economic research over the past few decades: computers.

After all, how likely is it that most economists would suddenly come to the same realization and change their methods in the same way? Sounds like a coordination problem! The increase in computing power and the ubiquity of the PC do a lot to explain these changes without the psychoanalysis.

PCs everywhere mean data everywhere. It has become much simpler to keep records, and as a result governments, NGOs, businesses, and people record tons of information electronically. Remember the Freakonomics bit about the average member of crack gangs making less than a McDonald’s worker? Legend has it that Sudhir Venkatesh, a sociology PhD student at U Chicago, obtained the data for the paper by infiltrating a crack gang and winning over the leader. The gang ended up giving him an Excel spreadsheet file with their finances. Computers allowed the crack gang to easily keep records; these records were easily transferable and already in a format that was ideal for regression analysis.* (Correction at the end)

The processing power of computers has also increased substantially, doubling about every 18 months (that’s the popular version of Moore’s Law, which strictly speaking is about transistor counts). In a decade, then, computers become about 100 times faster! If you’re thinking that the instrumental variables regressions that economists use for this “clean ID” stuff probably require lots of mathematical calculations, taking millions of clock cycles, then we’re on the same page.
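For anyone who wants to check the "100 times" figure, here is a quick Python sketch of the arithmetic. It assumes nothing beyond the 18-month doubling period quoted above; the function name `speedup` is just something I made up for illustration.

```python
# Assumption taken from the post: speed doubles every 18 months.
DOUBLING_PERIOD_MONTHS = 18


def speedup(years, doubling_period_months=DOUBLING_PERIOD_MONTHS):
    """Return the speed multiple after `years` of steady doubling."""
    doublings = years * 12 / doubling_period_months
    return 2 ** doublings


decade = speedup(10)
print(f"Doublings in a decade: {10 * 12 / DOUBLING_PERIOD_MONTHS:.2f}")
print(f"Speedup after 10 years: roughly {decade:.0f}x")
```

Ten years is 120 months, or about 6.67 doublings, and 2 raised to that power lands just above 100.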

I’ve worked with some of the data that Scheiber derides as “cute”, and it ends up being huge: tens of thousands of observations spanning several-hundred-megabyte files. Even if you could find a hard drive big enough to hold that data back in 1985, and even if you could find enough memory to hold the matrix in temporary storage, no desktop computer from that decade could invert the matrix in a reasonable amount of time.

Yes, there may be other reasons that economics has embraced the instrumental variables regression, with its holy grail of clean identification. But let’s acknowledge the advance in technology that occurred alongside the growth of these methods and that, in my view, is at least partly responsible for the state of the discipline today.

* Oops! This is totally wrong. The gang’s books were physical books! A better example would be the paper on the parking tickets of UN diplomats. There were millions of dollars’ worth of fines, and they were all tracked digitally by the New York City government. In a time without computers, it’s hard to imagine the government keeping such meticulous records or making them so easily available.


Fact-Free Learning

A little bit of background on myself: I love computers, and I love economics. Unfortunately, the two disciplines mix pretty infrequently. That’s why I let out a little yelp when I found this paper, called “Fact-Free Learning” by Enriqueta Aragones, Itzhak Gilboa, Andrew Postlewaite, and David Schmeidler. The paper is trying to present a model of how we understand information that is already available to us. Sometimes, we figure out something new by looking at the data we have in a new light — fact-free learning!

But fact-free learning is tough. The real breakthroughs are often unexpected, and they may happen slowly. The authors argue that some familiar tools of economics and computer science, when brought together, can explain this phenomenon. Before I get into their argument, I need to explain something called complexity, because the central argument of the paper relies on it.

Computational complexity is the study of how computer algorithms scale as the size of the problem they are trying to solve increases. Usually we are looking for some sort of bound on the amount of time it will take the algorithm to finish, given a problem of size n.
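To get a feel for what "scaling" means, here is a small Python sketch that prints step counts for three made-up growth rates as n increases. The three functions are purely illustrative stand-ins for algorithms whose running time grows linearly, quadratically, and exponentially; they are not taken from the paper.

```python
def steps_linear(n):
    """Steps for a hypothetical algorithm that touches each item once."""
    return n


def steps_quadratic(n):
    """Steps for a hypothetical algorithm that compares every pair of items."""
    return n * n


def steps_exponential(n):
    """Steps for a hypothetical algorithm that tries every subset of the items."""
    return 2 ** n


# Print how each step count grows as the problem size doubles.
for n in (10, 20, 40, 80):
    print(n, steps_linear(n), steps_quadratic(n), steps_exponential(n))
```

Doubling n doubles the first count and quadruples the second, but it squares the third, which is why exponential growth gets out of hand so quickly.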

Some problems may not scale so well. Seriously, they might not scale so well. We’re not entirely sure! (Why? Wikipedia explains.) The hardest of these problems are called NP-hard, and we are pretty sure that any algorithm that solves an NP-hard problem exactly will end up taking a few lifespans-of-the-universe to complete in the worst case, even for fairly modest input sizes.

Now let’s get back to the paper. Remember how I said that it brought together tools of both computer science and economics? Well, the economic tool these fellows bring to the table is the regression. Say you have information on lots of variables, and you want to see which variables explain some phenomenon. If you pick out a few of these variables, you can regress the measure of the phenomenon on those variables to determine which variables are relevant.

But what if you can only use a few variables — say k — in the regression at once? Then you might want to run the regression with every possible combination of k variables, looking for the one that does the best job explaining the phenomenon. The authors argue that this process — finding the set of k variables that does the best job explaining a phenomenon in a regression — is a lot like fact-free learning. But there’s a catch:

Linear regression is a structured and relatively well-understood problem, and one may hope that, using clever algorithms that employ statistical analysis, the best set of k regressors can be found without actually testing all (mCk) subsets. Our main result is that this is not the case. Formally, we prove that finding whether k regressors can obtain a prespecified value of R², r, is, in the language of computer science, NP-Complete. Moreover, we show that this problem is hard (NP-Complete) for every positive value of r.

The implication is that fact-free learning is really difficult for computers. And if it’s difficult for computers, it’s probably really difficult for people too!
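The "try every combination of k variables" search described above can be sketched in a few lines of Python. This is my own brute-force illustration, not the authors' code or proof: the helper names (`solve`, `r_squared`, `best_subset`) and the toy data are invented, and the regression is plain OLS with an intercept, solved through the normal equations.

```python
from itertools import combinations
from math import comb


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 from an OLS regression of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[r] * row[c] for row in design) for c in range(p)] for r in range(p)]
    xty = [sum(row[r] * y[i] for i, row in enumerate(design)) for r in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def best_subset(variables, y, k):
    """Test every k-subset of `variables` (name -> column) and return the best one."""
    best = max(combinations(sorted(variables), k),
               key=lambda names: r_squared([variables[v] for v in names], y))
    return best, r_squared([variables[v] for v in best], y)


# Made-up toy data: y is exactly 2*a - 3*c + 1, so {a, c} should win.
variables = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [2, 1, 4, 3, 6, 5, 8, 7],
    "c": [1, 0, 0, 1, 1, 0, 1, 0],
    "d": [3, 1, 4, 1, 5, 9, 2, 6],
}
y = [0, 5, 7, 6, 8, 13, 12, 17]

best, r2 = best_subset(variables, y, 2)
print("best pair:", best, "R^2:", round(r2, 6))
print("subsets tested:", comb(len(variables), 2))
print("subsets for m=100, k=10:", comb(100, 10))
```

With four variables there are only six pairs to check, so the search is instant. The last line shows how fast the number of subsets grows once m and k get bigger, which is the brute-force cost the authors say clever algorithms cannot be expected to avoid in general.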
