Caselaw Access Project launches API and bulk data service

Project gives free access to 360 years of American court cases

The Library Innovation Lab at the Harvard Law School Library announced the launch of its Caselaw Access Project (CAP) API and bulk data service, which puts the full corpus of published U.S. case law online for anyone to access free of charge.

Between 2013 and 2018, the Library digitized over 40 million pages of U.S. court decisions, transforming them into a dataset covering almost 6.5 million individual cases. The CAP API and bulk data service put this important dataset within easy reach of researchers, members of the legal community and the general public.

The Caselaw Access Project is a project of the Library Innovation Lab at the Harvard Law School Library. Its digitization effort was completed through the partnership and support of Ravel Law, a legal research and analytics platform.

Apart from the holdings of the Library of Congress, Harvard Law School’s collection is the most comprehensive and authoritative database of American law and cases available anywhere. It contains judicial decisions from the federal government and each of the fifty states, dating back to the founding of each jurisdiction. The Harvard Law School Library, the largest academic law library in the world, has been collecting these decisions for the past two hundred years.

In an article in Fortune Magazine, Adam Ziegler, director of the Library Innovation Lab, said the Caselaw Access Project will be a treasure trove for legal scholars, especially those who use big data techniques to parse the corpus: “It’s an opportunity to reconstruct the law as a data source, and write computer programs to peruse millions of cases.”

The browsable API, available at api.case.law, offers open access to descriptive metadata for the entire corpus. The API documentation is written to be friendly to experts and beginners alike.

“Libraries were founded as an engine for the democratization of knowledge, and the digitization of Harvard Law School’s collection of U.S. case law is a tremendous step forward in making legal information open and easily accessible to the public,” said Jonathan Zittrain, the George Bemis Professor of International Law at Harvard Law School, and Vice Dean for Library and Information Resources. “The materials in the library’s collection tell a story that goes back to the founding of America, and we’re proud to preserve and share that story,” said Zittrain, who also holds appointments as Professor of Computer Science at the Harvard School of Engineering and Applied Sciences, and Professor at Harvard’s John F. Kennedy School of Government.

In a recent blog post, John Bowers, a research associate at Harvard Library Innovation Lab, described how he worked with the Caselaw Access Project API and bulk data service to uncover the story of Justice James H. Cartwright, the most prolific opinion writer on the Illinois Supreme Court.

Using the Illinois case law dataset, Bowers generated a plot tracking the number of opinions Illinois judges published per year from 1850 to the present. When he noticed that, between about 1890 and 1930, many justices were publishing upwards of 50 opinions per year, he plotted the yearly publication volume of the five Illinois judges who wrote the most opinions over the course of their careers.
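The analysis described above reduces to three counting steps: opinions per year across the court, career totals per judge, and year-by-year volume for the top judges. The following is a minimal sketch of those steps in Python, not Bowers’s actual code. It assumes each case has already been reduced to a hypothetical (author, year) record; the real CAP data uses its own schema, and the sample judge names are placeholders rather than real data.

```python
from collections import Counter, defaultdict
from typing import Iterable

# Hypothetical record shape: (opinion author, decision year).
# Real CAP records use their own schema; adapt the extraction step to it.
Opinion = tuple[str, int]


def opinions_per_year(opinions: Iterable[Opinion], start: int, end: int) -> dict[int, int]:
    """Total opinions per year from start to end inclusive (years with none map to 0)."""
    counts = Counter(year for _, year in opinions if start <= year <= end)
    return {year: counts[year] for year in range(start, end + 1)}


def top_authors(opinions: Iterable[Opinion], n: int = 5) -> list[tuple[str, int]]:
    """The n judges with the most opinions over their careers, with their totals."""
    return Counter(author for author, _ in opinions).most_common(n)


def yearly_volume(opinions: Iterable[Opinion], authors: set[str]) -> dict[str, dict[int, int]]:
    """Per-year opinion counts for each selected judge: the series one would plot."""
    series: dict[str, Counter] = defaultdict(Counter)
    for author, year in opinions:
        if author in authors:
            series[author][year] += 1
    return {author: dict(sorted(c.items())) for author, c in series.items()}


if __name__ == "__main__":
    # Placeholder records for illustration only, not real Illinois data.
    sample = [
        ("judge_a", 1900), ("judge_a", 1900), ("judge_a", 1901),
        ("judge_b", 1900), ("judge_b", 1902), ("judge_c", 1895),
    ]
    print(opinions_per_year(sample, 1899, 1902))  # {1899: 0, 1900: 3, 1901: 1, 1902: 1}
    top = top_authors(sample, n=2)
    print(top)  # [('judge_a', 3), ('judge_b', 2)]
    print(yearly_volume(sample, {name for name, _ in top}))
```

The records are passed as a list, so each function can iterate over them independently. A one-shot generator would be exhausted after the first call. The per-judge series can then be handed to any plotting library.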

According to Bowers, further searches revealed that Cartwright was “firmly in the lead as the most prolific publisher of legal opinions in the history of the state of Illinois” because of both his high yearly output and the consistency with which he produced opinions over his long career.

Bowers said: “In the hands of an interested researcher with questions to ask, a few gigabytes of digitized caselaw can speak volumes to the progress of American legal history and its millions of little stories.”

By digitizing these materials, the Harvard Law School Library aimed to provide open, wide-ranging access to American case law, making its collection broadly accessible to nonprofits, academics, practitioners, researchers, and law students: anyone with a smartphone or an Internet connection.

Since CAP launched in 2013, the Harvard Library Innovation Lab has scanned 39,796 volumes and 38.6 million pages of material covering 334 years of American case law. In January 2017, the Lab announced that it had scanned the last volume in the collection. Since then, it has converted the scanned images into machine-readable text files, extracted each case into its own file, redacted headnotes and other editorial content, and continued its quality-control work.


To learn more about the project, the data and how to use the API and bulk data service, please visit case.law.