What would happen if you hacked into a library?

Wed Jul 20, 2011 5:08pm EDT

We usually think of university libraries as a bastion of free thought, with scholarly publications that are freely shareable by all, but former Reddit staffer and digital activist Aaron Swartz has been arrested by federal prosecutors and accused of hacking into the library computer network at the Massachusetts Institute of Technology and downloading almost 5 million academic documents. If he is found guilty, Swartz could face up to 35 years in prison and a fine of up to $1 million, penalties that seem inappropriate at best for a crime that appears to have no real victims.

According to the indictment that was filed in Boston, the 24-year-old programmer, who is the co-founder of a non-profit political action group called Demand Progress and who also co-authored the RSS specification when he was still a teenager, used a laptop and a number of software tools to hack into the MIT computer system and download more than 4 million scholarly papers and journal archives. The indictment notes that when these alleged offences occurred, Swartz was a fellow at Harvard’s Center for Ethics.

The journals and documents that Swartz is alleged to have downloaded are held in the so-called JSTOR archive, which is a database of thousands of scholarly journals maintained by a non-profit organization created in 1995 to allow institutions to share these publications easily. According to a statement from JSTOR, the organization is not involved in the indictment against Swartz. Its statement says that after it noticed unauthorized access to its documents occurring at MIT late last year:

“We stopped this downloading activity, and the individual responsible, Mr. Swartz, was identified. We secured from Mr. Swartz the content that was taken, and received confirmation that the content was not and would not be used, copied, transferred, or distributed.”

Although academic institutions such as MIT pay JSTOR an annual fee for maintaining the archive, most of the documents in it are freely available to students and anyone else at an accredited university. So what harm did Swartz cause by downloading these journals and archives? That’s not clear. Demand Progress released a statement about the indictment in which executive director David Segal said that arresting the programmer for doing this was like “trying to put someone in jail for allegedly checking too many books out of the library.” The statement also quotes a librarian at Stanford University as saying Swartz’s prosecution “undermines academic inquiry and democratic principles.”

The federal prosecutor’s office, of course, seems more interested in the allegation that Swartz illegally accessed a computer network. According to the indictment, he gained unauthorized access to the MIT computer network’s main server space, where he hooked up a laptop to one of the servers and then hid it under a shelf, and he repeatedly changed his computer’s hardware address in order to get around the barriers that JSTOR and the university set up. The indictment also says that the young programmer “intended to distribute a significant portion of JSTOR’s archive of digitized journal articles through one or more file-sharing sites.”

As Jason Kottke noted in an overview of the case, Swartz has shown an interest in doing similar things in the past: in 2009, for example, he downloaded 19 million pages’ worth of federal documents from the PACER archive, which was set up by the government in an attempt to improve access to electronic files from the federal courts. Swartz and others wanted to download all of the archives (19 million pages reportedly represented about 20 percent of the total) and then upload or share them online. Swartz also helped develop the technology behind the Open Library project.

The government’s indictment of Swartz is more than a little disturbing, if only because the documents that he allegedly took were academic publications that were freely available to anyone studying at a university — in other words, not commercially or politically sensitive in any way. Even the non-profit organization in charge of this archive declined to proceed with any case against the programmer.

Assuming the federal indictment is correct, what Swartz did seems no more threatening than what Mark Zuckerberg did when he set up a script to download photos from the Harvard computer system to create the precursor to Facebook. It’s certainly nowhere near the kind of espionage that the government is alleging occurred in the case of Wikileaks and the diplomatic cables it published, or the hacking that groups such as Anonymous and Lulzsec are accused of being involved in. What could possibly be gained by going after a young programmer for trying to liberate academic research from a library?

(Note: Although Swartz describes himself as a co-founder of the link-sharing community Reddit, Alexis Ohanian noted on Twitter and on Google+ that he and Steve Huffman created the company and then acquired Swartz’s company six months later.)

Post and thumbnail photos courtesy of Flickr user Eliot Phillips and Wikimedia Commons
