Methodology
OCCA Precedent Analysis
AI-powered retrieval and analysis of Oklahoma Court of Criminal Appeals opinions
Abstract
OCCA Precedent Analysis is a retrieval-augmented generation (RAG) system designed to search, analyze, and synthesize opinions from the Oklahoma Court of Criminal Appeals. The system addresses a persistent challenge in Oklahoma criminal appellate practice: efficiently identifying relevant OCCA precedent across decades of published and unpublished opinions, understanding the court’s treatment of specific legal issues, and verifying that cited authorities remain good law.
The methodology combines semantic vector search over indexed OCCA opinions with keyword-based filtering, citation graph analysis, and automated treatment verification. This document describes the indexing pipeline, retrieval methodology, analysis generation process, and the verification steps applied to every AI-generated result.
Background
The Oklahoma Court of Criminal Appeals (OCCA) is the court of last resort for all criminal matters in Oklahoma. Unlike most states, where the state supreme court handles criminal appeals, Oklahoma vests exclusive criminal appellate jurisdiction in the OCCA. This means that OCCA opinions constitute the highest state-court authority on Oklahoma criminal law.
OCCA opinions are published through the Oklahoma State Courts Network (OSCN), which provides free public access to the full text of published and some unpublished opinions. However, OSCN’s search functionality is limited to basic keyword matching, making it difficult to conduct sophisticated legal research. Attorneys and pro se litigants often struggle to identify all relevant precedent, particularly when the relevant legal principle is expressed in different terminology across different opinions.
OCCA Precedent Analysis was developed to provide semantic search over OCCA opinions: the ability to find cases based on legal meaning rather than exact keyword matches. By combining semantic search with citation graph analysis and treatment verification, the system enables more comprehensive and reliable precedent research than keyword-only approaches.
Methodology
Step 1: Corpus Indexing
OCCA opinions are ingested from OSCN and processed through an indexing pipeline that prepares them for both semantic and keyword retrieval:
- Text extraction: Full-text opinions are extracted from OSCN, preserving paragraph structure, headings, and citation references.
- Chunking: Opinions are divided into semantically coherent chunks of approximately 500–1000 tokens, with overlap to preserve context across chunk boundaries. Chunking respects paragraph and section boundaries to avoid splitting logical units of analysis.
- Embedding generation: Each chunk is converted to a 1536-dimensional vector embedding using a legal-domain-optimized embedding model. The embeddings capture the semantic meaning of the legal text, enabling similarity-based retrieval.
- Metadata extraction: Case name, citation, decision date, authoring judge, opinion type (published/unpublished), and legal topic classifications are extracted and stored as filterable metadata.
- Citation graph: Citations within each opinion are extracted and indexed in a citation graph (Neo4j), enabling traversal of citation relationships between cases.
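The chunking step described above can be illustrated with a minimal Python sketch. It is not the production pipeline: the function name `chunk_paragraphs`, the whitespace-based token estimate, and the one-paragraph overlap are all assumptions made for illustration. It only shows the general idea of grouping whole paragraphs up to a token budget and repeating trailing paragraphs at the start of the next chunk.

```python
from typing import List


def chunk_paragraphs(paragraphs: List[str], max_tokens: int = 1000,
                     overlap_paragraphs: int = 1) -> List[str]:
    """Group whole paragraphs into chunks of roughly max_tokens.

    Assumptions (illustrative only): tokens are approximated by
    whitespace-separated words, and overlap is measured in paragraphs.
    A paragraph is never split, so a single oversized paragraph, or a
    carried-over paragraph plus a long new one, can exceed max_tokens.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for para in paragraphs:
        n = len(para.split())
        # Flush the current chunk if adding this paragraph would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))
            # Carry the trailing paragraphs forward as overlap.
            current = current[-overlap_paragraphs:] if overlap_paragraphs > 0 else []
            current_len = sum(len(p.split()) for p in current)
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    for chunk in chunk_paragraphs(["a b c", "d e f", "g h i"], max_tokens=6):
        print(repr(chunk))
```

Splitting only at paragraph boundaries keeps each unit of legal reasoning intact, at the cost of chunk sizes that vary around the target rather than matching it exactly.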
Step 2: Query Processing
When a user submits a research query, the system processes it through multiple retrieval strategies in parallel:
- Semantic search: The query is embedded using the same model used for corpus indexing, and the nearest-neighbor chunks are retrieved using an HNSW (Hierarchical Navigable Small World) index in pgvector. This finds cases that address the same legal concepts even when they use different terminology.
- Keyword filtering: Statutory references, case names, and legal terms of art extracted from the query are used for precise keyword matching, ensuring that specific authorities mentioned in the query are retrieved.
- Citation graph traversal: When the query references a specific case, the citation graph is traversed to identify cases that cite, distinguish, follow, or overrule the referenced authority.
- Metadata filtering: Results are filtered by jurisdiction (OCCA), date range, opinion type, and topic classification as specified by the user.
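To make the combination of semantic search and metadata filtering concrete, here is a small Python sketch. It substitutes an exact brute-force cosine-similarity scan for the approximate HNSW index, uses toy 3-dimensional vectors in place of 1536-dimensional embeddings, and uses placeholder case names. The `Chunk` fields and the `search` signature are hypothetical and do not reflect the system's actual schema.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Sequence, Tuple


@dataclass
class Chunk:
    # Hypothetical metadata fields, chosen for illustration.
    case_name: str
    decided: date
    published: bool
    embedding: Sequence[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_embedding: Sequence[float], chunks: List[Chunk], k: int = 5,
           published_only: bool = False,
           decided_after: Optional[date] = None) -> List[Tuple[float, Chunk]]:
    """Filter by metadata, then return the k most similar chunks.

    This is an exact scan; an HNSW index answers the same question
    approximately but much faster on a large corpus.
    """
    candidates = [
        c for c in chunks
        if (not published_only or c.published)
        and (decided_after is None or c.decided > decided_after)
    ]
    scored = [(cosine_similarity(query_embedding, c.embedding), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    corpus = [
        Chunk("Case A", date(2015, 1, 1), True, [1.0, 0.0, 0.0]),
        Chunk("Case B", date(2005, 1, 1), True, [0.9, 0.1, 0.0]),
        Chunk("Case C", date(2020, 1, 1), False, [0.0, 1.0, 0.0]),
    ]
    for score, chunk in search([1.0, 0.0, 0.0], corpus, k=2):
        print(f"{chunk.case_name}: {score:.3f}")
```

Filtering before scoring keeps excluded opinions out of the ranking entirely, which is one plausible way to apply user-specified date and opinion-type constraints.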
Step 3: Analysis Generation
Retrieved chunks are provided as context to the language model, which generates a structured analysis of the relevant precedent. The analysis is constrained to cite only authorities present in the retrieved context; the model is not permitted to generate citations from its parametric knowledge. This constraint reduces the risk of hallucinated citations.
The generated analysis includes: a summary of the applicable legal standard, identification of the most relevant OCCA opinions, a synthesis of how the OCCA has treated the legal issue over time, and identification of any splits or evolution in the court’s approach. Each citation in the analysis is linked to its source chunk, enabling the user to verify the AI’s interpretation against the original opinion text.
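One way to check the "cite only what was retrieved" constraint after generation is sketched below in Python. This is an assumption about how such a check could work, not a description of how the system enforces it. The regular expression covers only Oklahoma's public-domain citation form (for example, a year followed by "OK CR" and a number), the function names are hypothetical, and the citations in the example are placeholders rather than real cases.

```python
import re
from typing import Iterable, List, Set

# Matches public-domain OCCA citations such as "2099 OK CR 1" (placeholder).
OK_CR_CITATION = re.compile(r"\b\d{4}\s+OK\s+CR\s+\d+\b")


def extract_citations(text: str) -> List[str]:
    """Return OK CR citations in order of appearance, with whitespace normalized."""
    return [" ".join(match.split()) for match in OK_CR_CITATION.findall(text)]


def ungrounded_citations(analysis: str, context_chunks: Iterable[str]) -> List[str]:
    """List citations in the analysis that appear in none of the context chunks."""
    allowed: Set[str] = set()
    for chunk in context_chunks:
        allowed.update(extract_citations(chunk))
    flagged: List[str] = []
    for cite in extract_citations(analysis):
        if cite not in allowed and cite not in flagged:
            flagged.append(cite)
    return flagged


if __name__ == "__main__":
    context = ["The court applied the standard announced in 2099 OK CR 1."]
    analysis = "See 2099 OK CR 1; but compare 2099 OK CR 2."
    print(ungrounded_citations(analysis, context))
```

A check like this only confirms that a citation string was present in the retrieved text. It does not confirm that the analysis characterizes the case accurately, which is why each citation is also linked to its source chunk for the user to review.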
Step 4: Citation Verification
Every citation in the generated analysis is verified through CiteRight’s CITADEL pipeline before being presented to the user. This verification confirms that each cited case exists, is correctly cited in Bluebook format, and has not been overruled or negatively treated. Citations that fail verification are flagged with the specific failure reason.
Technical Architecture
- Vector store: Supabase pgvector with HNSW indexing (m=16, ef_construction=64) for sub-100ms nearest-neighbor retrieval over the OCCA corpus.
- Citation graph: Neo4j Aura for storing and traversing citation relationships between cases. Graph queries identify citing cases, subsequent history, and treatment patterns.
- Orchestration: LangGraph workflow with typed state management. The RAG pipeline executes as a multi-step graph: query analysis, parallel retrieval, context assembly, constrained generation, and citation verification.
- Caching: Frequently accessed opinions and citation verification results are cached in Redis (Upstash) to reduce latency for common queries.
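The citation-graph queries mentioned above (finding citing cases and treatment patterns) can be pictured with a small in-memory Python sketch. The production system stores the graph in Neo4j; this example uses a plain dictionary instead, and the `CitationEdge` fields, treatment labels, method names, and case names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class CitationEdge(NamedTuple):
    citing: str     # the later opinion
    cited: str      # the authority it refers to
    treatment: str  # assumed labels: "cites", "follows", "distinguishes", "overrules"


class CitationGraph:
    """Index edges by the cited case so incoming citations are cheap to look up."""

    def __init__(self, edges: Iterable[CitationEdge]) -> None:
        self._incoming: Dict[str, List[CitationEdge]] = defaultdict(list)
        for edge in edges:
            self._incoming[edge.cited].append(edge)

    def citing_cases(self, case: str, treatment: Optional[str] = None) -> List[str]:
        """Cases that cite `case`, optionally restricted to one treatment label."""
        return [
            e.citing for e in self._incoming.get(case, [])
            if treatment is None or e.treatment == treatment
        ]

    def is_overruled(self, case: str) -> bool:
        """True if any later case is recorded as overruling `case`."""
        return any(e.treatment == "overrules" for e in self._incoming.get(case, []))


if __name__ == "__main__":
    graph = CitationGraph([
        CitationEdge("Case B", "Case A", "follows"),
        CitationEdge("Case C", "Case A", "overrules"),
        CitationEdge("Case C", "Case B", "cites"),
    ])
    print(graph.citing_cases("Case A"))
    print(graph.is_overruled("Case A"))
```

Storing the treatment on the edge rather than on the case lets the same opinion follow one authority and overrule another, which matches how citation relationships are described in Step 2.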
Data Sources
- OSCN (Oklahoma State Courts Network): Primary source for OCCA opinions. Provides full-text access to published opinions and selected unpublished opinions. Coverage includes opinions from the modern digital era, with expanding coverage of historical decisions.
- CourtListener (Free Law Project): Supplementary source providing additional OCCA opinions, federal opinions cited by the OCCA, and cross-jurisdictional authorities. Used for citation verification and treatment analysis.
- Oklahoma Statutes (Title 21, Title 22): Oklahoma criminal statutes (Title 21 — Crimes and Punishments) and criminal procedure statutes (Title 22) are indexed for statutory cross-referencing. Statute references in OCCA opinions are linked to current statutory text.
- CITADEL Citation Pipeline: All citations are verified through CiteRight’s citation verification system for format compliance, existence confirmation, and treatment status.
Limitations and Disclaimers
- Corpus coverage: The indexed corpus includes OCCA opinions available through OSCN and CourtListener. Historical opinions predating digital publication may not be indexed. Unpublished opinions have inconsistent availability. The absence of a case from search results does not mean that the case does not exist.
- Semantic search limitations: Vector similarity search excels at finding conceptually related cases but may miss cases that use highly specific or unusual terminology. Users should supplement semantic search with keyword searches for specific statutory provisions or legal terms.
- AI-generated analysis: The analysis generated by the system is AI-assisted and must be independently verified by the user. AI-generated summaries of case holdings may not capture every nuance of the original opinion. Always read the full opinion text before relying on a case in a filing.
- Oklahoma Rule 1.17 compliance: Any AI-generated content produced by OCCA Precedent Analysis that is incorporated into an OCCA filing must be verified in accordance with Oklahoma CCA Rule 1.17. The verification responsibility rests with the filing party.
- Not legal advice: CriminalAppeal.app is not a law firm. OCCA Precedent Analysis provides research assistance, not legal advice. All research results should be independently verified by a licensed attorney. See our UPL Disclaimer.
Last updated: March 2026
See also: Readiness Checklist Methodology | CREAC-AI Framework