regex - Extracting U.S. common law case names (e.g., Smith v. Jones) systematically using off-the-shelf NLP software?
I'd like to find a way to extract case names of U.S. courts from sentences. They take a predictable pattern, although I think it may be too varied to capture with regexes, so I am thinking of using NLP to locate them.
Here are a few examples of case names (bolded) as they might be used in partial sentences:
- In United States v. George, the court held that...
- In re. Bankruptcy of Sir Walter Williams, III, a case in the Southern District of New York...
- Not only was Ashcroft v. Iqbal, 556 U.S. 662 (2009) incorrectly decided, but also...
- The court's recent decision in Burwell v. Hobby Lobby Stores, No. 13-354 (U.S. Jun 30, 2014) implicates First Amendment rights...
- The case of Trans World Airlines, Inc. v. Flight Attendants was correctly decided...
I've been experimenting with off-the-shelf packages (like TextBlob for Python), which help with things like extracting noun phrases, but I don't know how to take the next step and recognize case names as a unit.
How about:
((re\.).*?,.*?\b(?<=\s)(?=[a-z]))|(?!\r|\n|\.)((\s\m[a-z][a-z]+?\m\s).*?v\.\s.*?\b[a-z].*?[a-z]\m)(?!\s[a-z])|ex\sparte\s\b[a-z].*?[a-z](?=(\.|,|;|\s))
It's imperfect in that it doesn't capture only the bolded text (it might grab a little more, but it won't match a false positive, as it needs to find "v."). It is guaranteed to find the provided examples, plus the ex parte cases I gleaned from the wiki.
There are 3 capture groups in the regex:
1. matches v.
2. matches re.
3. matches ex parte
PS: This is generic PCRE regex pattern syntax. Most programming/scripting languages, and many of the more advanced text editors, should be able to find matches using this.
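For illustration, here is a minimal Python sketch of the same three-way idea ("v.", "In re", "Ex parte") using the standard `re` module. It is not the pattern above but a simplified stand-in. The names `WORD`, `SUFFIX`, `PARTY`, `CASE_NAME`, and `find_case_names` are made up for this example, and the lists of suffixes, connecting words, and sentence-opening words are arbitrary assumptions you would need to extend. It also assumes the text keeps its original capitalization.

```python
import re

# Assumption: a party name is built from capitalized tokens ("United", "U.S.", "III").
WORD = r"[A-Z][\w.'&-]*"
# Assumption: a comma inside a name is only allowed before one of these suffixes.
SUFFIX = r"(?:Inc|Co|Corp|Ltd|LLC|Jr|Sr|III|II|IV)\b\.?"
# A party: capitalized words, optionally joined by small connectives or ", Suffix".
PARTY = rf"{WORD}(?:\s+(?:(?:of|the|and|for)\s+)*{WORD}|,\s+{SUFFIX})*"

CASE_NAME = re.compile(
    rf"\b(?:"
    rf"(?P<in_re>In\s+re\.?\s+{PARTY})"
    rf"|(?P<ex_parte>Ex\s+parte\s+{PARTY})"
    # Skip a few common sentence openers so "In United States v. ..." starts at "United".
    rf"|(?P<versus>(?!(?:In|The|See|Not)\b){PARTY}\s+v\.\s+{PARTY})"
    rf")"
)


def find_case_names(text):
    """Return (kind, matched text) pairs in order of appearance."""
    return [(m.lastgroup, m.group(0)) for m in CASE_NAME.finditer(text)]


if __name__ == "__main__":
    samples = [
        "In United States v. George, the court held that...",
        "In re. Bankruptcy of Sir Walter Williams, III, a case...",
        "The court's recent decision in Burwell v. Hobby Lobby Stores, "
        "No. 13-354 (U.S. Jun 30, 2014) implicates First Amendment rights...",
    ]
    for sample in samples:
        print(find_case_names(sample))
```

Because every alternative is a named group, `m.lastgroup` tells you which kind of case name was found. Like the pattern above, this leans entirely on capitalization and on the literal "v.", so it will miss lowercased text and may keep a trailing period when a case name ends a sentence.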