JURIS Text Corpus from LDC

From      owner-tei-l@LISTSERV.UIC.EDU Wed Sep 30 18:47:02 1998
Date:     Wed, 30 Sep 1998 18:17:45 -0500 (CDT)
From:     LDC Office <ldc@unagi.cis.upenn.edu>
Subject:  New Corpus: JURIS (Justice Department Retrieval/Inquiry System)

Announcing a NEW CORPUS from the LDC

JURIS (Justice Department Retrieval and Inquiry System) Text Corpus

The text data contained on this two-CD-ROM set represent a release of the JURIS (Justice Department Retrieval and Inquiry System) data collection that has been made available to the Linguistic Data Consortium (LDC) by the U.S. Department of Justice. The texts date from the 1700s to the early 1990s.

There are 1664 individual text files in the corpus, 1011 on the first CD-ROM and 653 on the second. The original archive consisted of 219 files ranging in size from less than 1 MB to nearly 70 MB. To make the data more accessible for research use, we divided the larger files into pieces so that the average file size is about 2 MB when uncompressed (the largest uncompressed file is about 4.5 MB). The files were divided at document boundaries, so every file contains only whole documents.
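The division at document boundaries described above can be sketched roughly as follows. This is only an illustration, not the procedure the LDC actually used: the function name, the closing tag `</DOC>`, and the 2,000,000-byte target are assumptions chosen for the example, and the corpus's real SGML tag names may differ.

```python
def split_at_document_boundaries(text, target_bytes=2_000_000, close_tag="</DOC>"):
    """Split text into pieces of roughly target_bytes each.

    Cuts are made only immediately after close_tag (a hypothetical
    document-closing tag), so every piece holds whole documents.
    """
    pieces = []
    current = []          # documents collected for the piece being built
    current_size = 0      # size of that piece in bytes
    start = 0
    while start < len(text):
        end = text.find(close_tag, start)
        if end == -1:
            # Trailing material after the last document: keep it with the
            # final piece rather than creating a piece with no document.
            tail = text[start:]
            if current:
                current.append(tail)
            elif pieces:
                pieces[-1] += tail
            else:
                current.append(tail)
            break
        end += len(close_tag)
        doc = text[start:end]
        start = end
        current.append(doc)
        current_size += len(doc.encode("utf-8"))
        # Close the piece once it reaches the target size.
        if current_size >= target_bytes:
            pieces.append("".join(current))
            current = []
            current_size = 0
    if current:
        pieces.append("".join(current))
    return pieces


if __name__ == "__main__":
    sample = "<DOC>aaaa</DOC>\n<DOC>bbbb</DOC>\n<DOC>cc</DOC>\n"
    for piece in split_at_document_boundaries(sample, target_bytes=20):
        print(repr(piece))
```

Because a piece is closed only after a complete document has been added, a piece can exceed the target size by up to one document, which is consistent with an average near 2 MB and a larger maximum.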

There are a total of 694,667 document units in the corpus, and these can be categorized to some extent with regard to their content. The following is a partial list of categories and their descriptions drawn from JURIS documentation contained in the corpus. The terminology and organization of categories are those used in the JURIS documentation:


Published Comptroller General Decisions; Unpublished Comptroller General Decisions; Opinions of the Attorney General; Office of Legal Counsel (US Dept. of Justice); Board of Contract Appeals; ADP Protest Report (Summary of ADP Procurement Protests before the GSBCA); Federal Labor Relations Authority Case Decisions; FLRA Administrative Law Judge Decisions; Federal Service Impasses Decisions; Decisions and Reports on Rulings of the Assistant Sec. of Labor for Labor Management Relations; Federal Labor Relations Council Rulings on Requests of the Asst. Sec. of Labor for Labor Management Relations; HUD Administrative Law Decisions; Merit System Protection Board Decisions; Decisions under Immigration and Nationality Laws; Environmental Protection Agency General Counsel Opinions; Equal Opportunity Commission Decisions; Equal Employment Opportunity Commission Policy Statements; US Office of Government Ethics Decisions; HHS Department Appeals Board Decisions.


Office of the Solicitor General; Civil Division; Civil Division Trial; Environmental and Natural Resources Division; Tax Division Criminal Appellate; US Attorney's Offices; US Trustees' Offices.


U.S. Supreme Court; Federal Reporter, 2nd Series; Court of Appeals Unpublished Decisions; Federal Supplement; Federal Rules Decisions; Atlantic 2nd Reporter (DC only); Bankruptcy Reporter; Courts of Military Review; Military Justice Reporter; Court of Claims.


FOIA Update Newsletter; DOJ Guide to the FOIA Case List Publications.


Code of Federal Regulations; Unified Agenda of Federal Regulations; Defense Acquisition Regulations.


United States Treaties and Other International Agreements; Department of Defense Unpublished International Agreements.


Opinions of the Solicitor (Dept. of Interior); Ratified Treaties; Unratified Treaties; Presidential Proclamations; Executive Orders and Other Orders Pertaining to Indians.


Decisions Under Immigration and Nationality Law; Title 8 - Code of Federal Regulations; Immigration Reform and Control Act of 1986, Legislative History; Equal Access to Justice Act, Legislative History.


Public Laws; United States Code; Executive Orders; Anti-Drug Abuse Act of 1988; Section-by-Section Analysis of the Anti-Drug Abuse Act of 1988; Criminal Division Handbook on CCCA; The Organic Laws of the United States.


US Tax Court Decisions; US Board of Tax Appeals Decisions; Tax Division's Summons Enforcement Decisions; Tax Division's Tax Protester Case List; Tax Division's Criminal Tax Manual; Tax Division's Criminal Tax Indictment/Information Forms; Tax Division's Standardized Criminal Tax Jury Instructions; Tax Division's Criminal Section Newsletter; Tax Court Memorandum Decisions; IRS Cumulative Bulletin; Tax International Acts; IRS News Releases; IRS General Counsel Memoranda; IRS Actions on Decisions; IRS Technical Memoranda.


United States Attorney's Manual; United States Trustees' Manual; Federal Personnel Manual; Federal Acquisition Regulations; Federal Acquisition Circulars; Federal Travel Regulation; Federal Information Resources Management Regulation; Federal Property Management Regulations; Principles of Federal Appropriations Law; Justice Department Acquisition Regulation; Justice Property Management Regulations.


Civil Division Monographs; Civil Division Torts Branch Handbook on damages under FTCA; Criminal Division Monographs; Criminal Division Forms; Criminal Division Guidelines for Drafting Indictments; Criminal Division Narcotics; Forfeiture, Prosecution Manual; Criminal Division Directory of Services; Asset Forfeiture Manuals; Obscenity Enforcement Reporter; Environmental and Natural Resources Division Monographs; US Sentencing Commission's Guidelines Manual; Sentencing Guidelines Updates.

The text files are all formatted using a set of SGML tags to mark document boundaries, and to mark major structural features within documents. As with file organization, the markup is derived from the document structures as provided by the Justice Department.

Institutions that have membership in the LDC during the 1998 Membership Year will be able to receive this corpus in the same manner as all other text and speech corpora published by the LDC. Nonmembers may purchase JURIS for $1500.

If you would like to order a copy of this corpus, please email your request to ldc@unagi.cis.upenn.edu. If you need additional information before placing your order, or would like to inquire about membership in the LDC, please send email or call +1 (215) 898-0464.

Further information about the LDC and its available corpora can be accessed on the Linguistic Data Consortium WWW Home Page at URL:


Note: For more information on LDC, see the main database entry "Linguistic Data Consortium (LDC)."