Monday, August 9, 2010

Computers: Slightly Less Magic Than They Could've Been?

Our suspicions may have been confirmed: P may not have equaled NP all along.

The P=NP question has been the seminal problem in theoretical computer science since forever ago (for the computer value of forever: 40 years give or take). To simplify it past any resemblance of truth, P is basically the group of all 'easy' problems, and NP is the group of all 'easy to correct' problems. If I gave you a computer program and told you it could always win at chess, how would you go about confirming that's true? Not very easily. But if I told you there's a 3,000 mile car trip that visits every state capital in America, and tell you the roads to take, you'd just have to count up the miles to see if it's true. Proving a computer program does something is not in NP, but proving that there's a path of some length between cities is. The question, then, is if there's an easy way to confirm a solution is true, can you easily come up with the solution too? (If I give you a map, can you come up with the shortest path between all the state capitals?) Computer scientists have suspected 'no', but demonstrating this is true has been fraught with difficulty.

People try to prove this all the time, and they fail. Vinay Deolalikar apparently emailed a 100 page proof to a group of respected researchers, and the paper eventually ended up on the Internet. Then the Internet became abuzz, an article was written on slashdot, and tweets of 'Deolalikar' skyrocketed past any previous level he'd experienced. Daniel Lemire raised a good question: why? What about this proof has gotten such discussion, as opposed to countless other attempts at a proof? Which is a really interesting question in general.

News transmission requires two participants: someone to tell the news and someone to hear it. You're hearing, or rehearing, this news from me. All in all, I'll probably end up telling 15 or so individuals about this event. Each of you may pass the news on to someone new, or let this trail of the story of Deolalikar's proof attempt end.

Coincidently, just like in my last post linking auto-immune diseases, SEO and planted evidence there's a illness connection here. The study of news transmission began in the study of epidemics. Instead of giving you a disease, I'm giving you a fact. Sorry, antibiotics won't help this time. Maybe enough alcohol, consumed very quickly, will wipe the fact from your mind? On the plus side, P = NP would have been great news for robots, so Computer Science is arguably less likely to kill you now.

What was discovered in epidemics is that the crucial value for deciding if a sickness will turn into a pandemic is the average number of people a sick person passes the disease onto before dying or getting better. If they average, say, 0.9 people, then the disease will dwindle and vanish. But if the average is, say, 1.2 people, then it'll explode through the population, each day the legions of sick increasing. Same thing with zombies. Same things with news and memes.

The P=NP? proof is a particularly good example of this. The knowledge of this paper started out just in Deolalikar's head. He gave the paper to a few other researchers. If the proof was poorly done, it would have ended there. But it was exciting enough that the researchers wanted to share it with > 1 other person each. There was an initial outbreak of this knowledge, eventually reaching slashdot.

Slashdot, having a vast readership, is particularly apt at transmitting an idea. For an instant the average person told skyrocketed by its introduction. The analogy is that the virus has reached the water supply, or the zombies have entered New York City. There was a phase change in outreach.

Sometimes news sources provide stories that don't excite us, that don't have a viral spread > 1. These peter out of the public discourse. But the blog posts and tweets since the slashdot story demonstrate again that there's a viral nature to this story. And here I am telling you.

Viral spread isn't a constant value, as eventually the susceptible are all infected. And not everyone is susceptible, by genes or geographic isolation. Or in this case, interests. Not all the readers of this blog are computer science aficionados, so this is more likely to be the end of the line then the computer science-centric blogs I've read this story on. It still might have another major outbreak if it reaches a mainstream news source. Or alternatively, if other theoreticians can't find any problems with the proof the story will 'mutate', and grow even more viral. Million dollar prizes always make good stories.

Returning to Dan's question, what made the story viral? P=NP is about as pop-culture as unsolved mathematics can be. The proof is apparently plausible. Releasing the paper by email is novel and interesting. And "I am pleased to announce a proof that P is not equal to NP, which is attached in 10pt and 12pt fonts" is a great way to announce one of the pinnacle proofs of our time.

One of the interesting things about viral spread is how small the critical value is. 0.95 people per iteration is a non-story, 1.05 can spread across the world. Thus it may be that the font comment was enough to push it across, and without it we'd have had to wait for more actual confirmation of the proof before hearing about it.

I find networks to be an endlessly interesting topic, and this is a great practical example of it. It also shows a news source acting as a facilitator of a story with wide popularity rather than pushing a story because of the source's reach. It'd be an interesting metric to see what news sources respond to and create viral spread, and which just push ideas that couldn't hold up in an organic global discussion.

As cool as a P=NP world would be, best of luck to Deolalikar, I hope you've earned your way into the pantheon of computer science greats.

Saturday, August 7, 2010

It was him.

The human body is equipped with a remarkably effective defense system. Once a dangerous substance has gotten into your body, it needs to push its way past the cell walls containing Defensin, a protein designed to keep bad things out. While the invader struggles to get in, it is liable to be eaten by Granulocytes and Macrophages, or marked for attack by a passing B cell. Even once infiltration is achieved, the body will seek out the compromised cells and destroy them with Killer T and NK Cells. Infections and illnesses come, but the body almost always wins in the end.

But such a powerful defensive weapon has a cost, as the HIV virus reveals. A powerful immune system can be turned against the body, destroying what it was built to protect. Some allergies exhibit a similar phenomenon, where the immune system gets over eager in destroying foreign particles. A lack of defenses leaves a system vulnerable to attack, but excessively capable defense systems are liable to be misused by outside forces.

The lesson doesn't just apply to the body. Google is caught in a war against 'black hat' search engine optimization. Do you ever see spam in the comments of an article? They often don't expect you to click on the link to purchase cheap knock-off purses, but they do expect Google to see the link, and return the target website higher in search results. Huge swathes of the internet are made up of dummy webpages, computer generated, intended only to look legitimate so the links to a real webpage are followed by Google and considered more authoritative.

When these tricks work, Google is left redirecting its users to scams and overpriced stores. So it works hard to identify these tricks and nullify them. If Google catches you gaming its system, it will remove you from its indexes, forever cutting you off from the biggest provider of pageviews around. The threat of banishment from its indexes is a very effective deterrent.

It also creates a powerful weapon to be abused. If you can't rank your website higher, causing your competitors to get kicked out of the index is the next best thing. Performing black hat search engine optimizations on another website can trigger Google's defenses against an innocent target, bringing advantages to the perpetrator. This is making Google's job harder.

And then there's law. Allegations of police officers planting drugs on suspects has a long history. It's difficult to catch someone in the act of a crime like drug use, so possession is used as a reasonable marker of intent and criminalized. But possession can be faked by an enemy, turning the sizable defensive weapon of the criminal justice system to twisted ends. Very recently, after 8 months of being abused and ostracized, a man was cleared of possession of child pornography after an employee of his bragged about planting the photos on his computer. In this case justice was eventually served, but its certainly the case that not every framer is foolish enough to leave behind a trail or brag about his crime.

The improper triggering of defensive mechanisms is, to some extent, unavoidable. The cost of letting an attack succeed in any setting can be too great to sustain. Thus a balance must be found that will inevitably include false positives, be it the creation of allergies or the imprisonment of the innocent. Still, it's a very important lesson to remember that evidence is easy enough to manufacture. When the penalties for a crime can ruin a life, especially based on the ultimately circumstantial evidence of possession of outlawed goods, some extra care must be taken. Things aren't always what they seem.