The eDiscovery problem is simple to state: you find too few docs, too many docs, or too much garbage and not enough gold. What has NIMBLE done recently to solve these problems?
Of special note: all three of these initiatives involved a mix of scanned images with OCR and native files. In most cases the image quality was moderate to poor and the OCR was bad. Our engine overcame these problems and made use of both natives (particularly spreadsheets and other data tables) and the images with their OCR. Further, when we dealt with the unitization problem we found pages in top-to-bottom order, reverse order, and scattered order. Our solutions are robust enough to handle such diverse and unpredictable patterns.
3 Different Types of Cases with 3 Impressive Results!
Bid Rigging in the Chemical Industry: The litigation team started with 2 million docs and a long list of search terms. They ran many searches and coded 40,000 docs for relevance. Out of those 40,000 they found 200 of potential value, and the subject matter expert declared that none of them were any good. The team then used Aurora's SMART SEARCH and coded only 1,000 iterative samples to arrive at a result set of 100,000 relevant docs. They sampled through these results, reviewing only docs with a probability of YES at 90% or above, and fed their own evaluations back into the model as new samples; the result set dropped to fewer than 50,000 docs.
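The review loop above can be sketched in a few lines. This is a minimal illustration, not NIMBLE's actual API: the function names, the doc dictionaries, and the `p_yes` field are all hypothetical stand-ins for a classifier's predicted probability of relevance.

```python
# Hypothetical sketch of the SMART SEARCH review loop: keep only docs
# whose predicted probability of "YES" meets the 90% review cutoff,
# then fold reviewer judgments back in as new training samples so the
# next model iteration can tighten the result set.

def high_confidence(docs, threshold=0.90):
    """Keep docs whose predicted probability of relevance meets the cutoff."""
    return [d for d in docs if d["p_yes"] >= threshold]

def fold_back(training_set, reviewed):
    """Add reviewer decisions on sampled results as new labeled examples."""
    return training_set + [
        {"doc_id": d["doc_id"], "label": d["reviewer_label"]}
        for d in reviewed
    ]

docs = [
    {"doc_id": 1, "p_yes": 0.95},
    {"doc_id": 2, "p_yes": 0.40},
    {"doc_id": 3, "p_yes": 0.91},
]
sample = high_confidence(docs)  # only docs 1 and 3 survive the 90% cutoff
```

In the bid-rigging matter, iterating this filter-review-retrain cycle is what shrank the result set from 100,000 relevant docs to fewer than 50,000.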
State-Wide Groundwater Contamination: A state environmental agency and a companion self-insurance fund had over 20 million pages of docs in the form of technical reports, claim files, letters, memos, invoices, court orders, and the like. Over 12 million of these pages were not divided into docs (unitized). We received a quote of $500,000 to send the records offshore to have one field coded: the document boundaries. Our team modified our generic machine learning model to: (1) find the different doc types and then the boundaries between them, a process called unitization; (2) find those reports with real scientific measurements of the contaminant; (3) find the exact pages with the data tables necessary to code the concentrations; and (4) produce a state-wide inventory of contaminated sites. The entire process took less than 8 months from the issuance of a court order to the submission to the Court at year end. Our costs for the entire job were many times less than the $500,000 quote.
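At its core, unitization is boundary detection over a stream of pages. The sketch below assumes a hypothetical `is_first_page` predicate (in practice this would be a trained classifier scoring page features); everything here is illustrative, not the production model.

```python
# Illustrative unitization pass: walk the page stream and start a new
# document wherever the (hypothetical) first-page classifier fires.
def unitize(pages, is_first_page):
    """Split a flat list of pages into documents at detected boundaries."""
    docs, current = [], []
    for page in pages:
        if is_first_page(page) and current:
            docs.append(current)   # close the previous document
            current = []
        current.append(page)
    if current:
        docs.append(current)       # flush the final document
    return docs

# Toy example: page IDs ending in "1" mark a first page.
pages = ["A1", "A2", "A3", "B1", "B2"]
documents = unitize(pages, lambda p: p.endswith("1"))
```

A real pass over 12 million pages would also need to handle the reversed and scattered page orders noted above, typically by scoring candidate orderings rather than assuming a clean top-to-bottom stream.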
Ponzi Scheme: An offshore litigation arising out of the liquidation of ill-gotten gains produced a massive number of unorganized, un-coded, and non-unitized docs. The review team had several needles and several haystacks to go through. They provided us with a list of 40 doc types, some of which occurred very rarely but were very important, and they needed to know whether they had received what they requested from the defendants. We built a SMART SEARCH model for each doc type and produced an inventory of every desired doc type. We further extracted metadata so the receiving party could use our timeline application to check compliance with both the temporal and subject-matter requirements of their document requests.
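The compliance check described above reduces to building an inventory: count classified docs against the requested type list and flag any type that never appears. This is a hedged sketch with made-up type names, not the actual application.

```python
from collections import Counter

def inventory(classified_docs, requested_types):
    """Count docs per requested type; types with no hits show as 0."""
    counts = Counter(d["doc_type"] for d in classified_docs)
    return {t: counts.get(t, 0) for t in requested_types}

def missing_types(inv):
    """Requested types the production never delivered."""
    return [t for t, n in inv.items() if n == 0]

# Toy production: two invoices and one court order delivered,
# but no trust deed, even though all three types were requested.
classified = [
    {"doc_type": "invoice"},
    {"doc_type": "invoice"},
    {"doc_type": "court order"},
]
inv = inventory(classified, ["invoice", "court order", "trust deed"])
```

Joining such an inventory with extracted dates is what lets a timeline application test both the temporal and subject-matter requirements of a document request.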