Natural Language Processing

Identify paragraphs, sentences, or other language elements that convey a specific meaning. The approach is based on supervised machine learning, so you simply train it with examples. Powerful n-gram analysis ensures reliable interpretation, regardless of differences in wording.

Capture information from sentence flows

Grooper can read documents in paragraphs and sentences just like a human. This allows it to understand and accurately recommend correct values from the body of documents by considering the surrounding flow of language.


  • Language Element Recognition:
    Locate all paragraphs or sentences in a document.
  • Language Element Classification:
    Determine if a contract contains a non-solicitation clause.
  • Document Flow Detection:
    Extract ‘Monday, May 27, 2009’ across multiple lines.
  • Context-Based Data Capture:
    Determine if a date value is the Maturity Date or the Loan Date.
  • Powerful Language Parsing:
    Distinguish “SW ¼ of the NW ¼” from “SW ¼ and the NW ¼”.
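To illustrate the idea behind document flow detection, here is a minimal sketch (not Grooper's implementation) that finds a long-form date even when it wraps across lines. It assumes that collapsing line breaks into single spaces before pattern matching is enough to recover the flow; `find_flowing_dates` and `DATE_PATTERN` are hypothetical names.

```python
import re

# Hypothetical pattern for long-form dates such as "Monday, May 27, 2009".
DATE_PATTERN = re.compile(
    r"\b(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday),\s+"
    r"(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December)\s+\d{1,2},\s+\d{4}\b"
)

def find_flowing_dates(text: str) -> list[str]:
    """Return long-form dates, even when they wrap across several lines."""
    # Collapse every run of whitespace (including newlines) into one space,
    # so the pattern sees the sentence flow instead of individual lines.
    flowed = re.sub(r"\s+", " ", text)
    return DATE_PATTERN.findall(flowed)

sample = "The agreement was signed on Monday,\nMay 27,\n2009 by both parties."
print(find_flowing_dates(sample))
```

Searching line by line would miss this value, because no single line contains the whole date.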

This technique has made it possible for us to identify all the values that make up a legal description, then break the full description into the individual tracts of land contained within the lease. There was no way we could overcome this challenge without resorting to custom development, and even then, the results were not something we could trust in a production scenario.

Paragraph Detection & Analysis

Grooper's paragraph ranking engine assesses a document's structure, intelligently Groops words into paragraphs, and then compares each paragraph against training samples to find the best match. The user is then presented with a ranked list of recommendations.
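The ranking step can be sketched roughly as follows. This is an illustrative stand-in, not Grooper's engine: it assumes each paragraph is represented by word-bigram counts and scored by cosine similarity against the best-matching training sample. `ngrams`, `cosine`, and `rank_paragraphs` are hypothetical helpers.

```python
import math
from collections import Counter

def ngrams(text, n=2):
    """Word n-grams (bigrams by default) of a lower-cased paragraph."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_paragraphs(paragraphs, training_samples, n=2):
    """Score each paragraph by its best similarity to any training sample and
    return (score, paragraph) pairs, best match first."""
    sample_vectors = [ngrams(s, n) for s in training_samples]
    scored = []
    for p in paragraphs:
        vec = ngrams(p, n)
        best = max((cosine(vec, s) for s in sample_vectors), default=0.0)
        scored.append((best, p))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

training = ["Employee shall not solicit any customer of the Company for a period of one year."]
paras = [
    "This Agreement is governed by the laws of Texas.",
    "The Employee shall not solicit any customer or employee of the Company.",
]
for score, text in rank_paragraphs(paras, training):
    print(f"{score:.2f}  {text}")
```

Because the score is computed on overlapping word sequences rather than exact sentences, a clause can still rank highly when its wording differs somewhat from the training sample.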

    Paragraph Isolation

    Indentation, double spacing, bullets, key phrases, line length, and many other factors must be considered to determine where each paragraph starts and stops. Grooper provides an easy-to-configure console for tuning paragraph detection settings in each project.
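A few of these cues can be combined in a simple splitter. The sketch below is an assumption-laden simplification, not Grooper's detector: it uses only blank lines, a first-line indent of at least `indent_width` spaces, and bullet markers, and it ignores key phrases and line length. `split_paragraphs` and `BULLET` are hypothetical names.

```python
import re

# Bullet markers: "-", "*", "•", or "1." / "1)" followed by whitespace.
BULLET = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")

def split_paragraphs(lines, indent_width=4):
    """Group text lines into paragraphs using three layout cues:
    a blank line, a first-line indent, or a bullet marker starts a new paragraph."""
    paragraphs, current = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:  # blank line (double spacing) closes the paragraph
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        indented = len(line) - len(line.lstrip(" ")) >= indent_width
        starts_new = indented or BULLET.match(line) is not None
        if starts_new and current:
            paragraphs.append(" ".join(current))
            current = []
        current.append(stripped)
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs

lines = [
    "    This Agreement is made between",
    "the Lessor and the Lessee.",
    "",
    "1. The term is five years.",
    "2. Rent is due monthly.",
    "    Either party may terminate",
    "with notice.",
]
for p in split_paragraphs(lines):
    print(p)
```

Each cue alone is unreliable (wrapped bullet text or quoted blocks can fool the indent rule), which is why a real detector weighs many signals and exposes them as tunable settings.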

    Lexical Analysis

    Use the full power of Grooper's data types to collect features from within each paragraph. These features can be n-grams, entries from a lexicon, or counts of non-value features Grooped by data type, such as address, phone number, or name. The analysis spans line breaks, so multi-word features are collected accurately regardless of line wrap.
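The three kinds of features can be sketched as one feature vector per paragraph. In this hypothetical example, two plain regular expressions stand in for Grooper data types, and the lexicon is a list of lower-case phrases; `extract_features` and `DATA_TYPES` are illustrative names only.

```python
import re
from collections import Counter

# Hypothetical stand-ins for data types: simple regexes, not Grooper's.
DATA_TYPES = {
    "phone": re.compile(r"\(?\d{3}\)?[ -]?\d{3}-\d{4}"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def extract_features(paragraph_lines, lexicon, n=2):
    """Collect n-gram, lexicon, and data-type count features for one paragraph."""
    # Join wrapped lines first so multi-word features survive line breaks.
    text = " ".join(line.strip() for line in paragraph_lines)
    words = re.findall(r"[a-z0-9']+", text.lower())
    features = Counter()
    # 1) Word n-grams.
    for i in range(len(words) - n + 1):
        features["ngram:" + " ".join(words[i:i + n])] += 1
    # 2) Lexicon entries, matched on whole words.
    normalized = " ".join(words)
    for phrase in lexicon:
        hits = len(re.findall(r"\b" + re.escape(phrase.lower()) + r"\b", normalized))
        if hits:
            features["lex:" + phrase.lower()] = hits
    # 3) Non-value features: how many values of each type occur, not the values.
    for name, pattern in DATA_TYPES.items():
        count = len(pattern.findall(text))
        if count:
            features["type:" + name] = count
    return features

lines = ["The Lessee shall not sublet the", "premises. Contact (555) 123-4567."]
print(extract_features(lines, ["sublet the premises"]))
```

Note that both the bigram "the premises" and the lexicon phrase "sublet the premises" cross the original line break and are still counted.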

    Data Merge

    Once the best-matching paragraphs are recognized, they can be Grooped together into a single paragraph for further analysis or text export, even when they are not adjacent in the original document or are spread across multiple pages.
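As a rough sketch of the merge step, assume each paragraph already carries a page number, its position on the page, and a match score. The example keeps those at or above a hypothetical `threshold` and joins them in document order. `ScoredParagraph` and `merge_matches` are illustrative names, not part of Grooper.

```python
from dataclasses import dataclass

@dataclass
class ScoredParagraph:
    page: int      # page the paragraph appears on
    order: int     # position of the paragraph on that page
    score: float   # similarity to the training samples, 0..1
    text: str

def merge_matches(paragraphs, threshold=0.5):
    """Keep paragraphs scoring at or above the threshold and join them, in
    document order, into one block of text, even if they are not adjacent."""
    matches = [p for p in paragraphs if p.score >= threshold]
    matches.sort(key=lambda p: (p.page, p.order))
    return "\n\n".join(p.text for p in matches)

paragraphs = [
    ScoredParagraph(3, 0, 0.8, "Addendum: the non-solicitation period is extended to two years."),
    ScoredParagraph(1, 2, 0.9, "Employee shall not solicit customers for one year."),
    ScoredParagraph(2, 1, 0.1, "Governing law is Texas."),
]
print(merge_matches(paragraphs))
```

Sorting by (page, order) after filtering keeps the merged text in reading order, which matters when an addendum on a later page modifies a provision stated earlier.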

Lease and contract analysis is a major strength for Grooper. We have successfully built a working model that finds all of the key provisions throughout the body of the main document. Then it automatically searches for modifications to each within addendums/exhibits and brings them in-line with the main provision. This speeds up our analysis and ensures we are correctly interpreting the data.

Spatial Analysis

Pattern matching on its own is great for efficiently finding common values like dates, amounts, and phone numbers. But when multiple candidates are found on a document, how does the system know which one is the best match? The answer is spatial analysis: each candidate is ranked by analyzing the words and features near it.
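One simple way to picture this ranking, offered as a hypothetical sketch rather than Grooper's scoring method: each keyword near a candidate adds its weight divided by (1 + distance), and only keywords within a given `radius` count. The weights, coordinates, and names (`Token`, `rank_candidates`) are all assumptions for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    x: float  # center of the word's bounding box, in page units
    y: float

def rank_candidates(candidates, words, keywords, radius=200.0):
    """Score each candidate by weighted keywords within `radius` of it;
    closer keywords contribute more (weight / (1 + distance))."""
    ranked = []
    for cand in candidates:
        score = 0.0
        for w in words:
            weight = keywords.get(w.text.lower())
            if weight is None:
                continue
            d = math.hypot(w.x - cand.x, w.y - cand.y)
            if d <= radius:
                score += weight / (1.0 + d)
        ranked.append((score, cand.text))
    ranked.sort(key=lambda r: r[0], reverse=True)
    return ranked

# Two dates found by a pattern; which one is the Maturity Date?
candidates = [Token("01/15/2020", 300, 100), Token("01/15/2050", 300, 400)]
words = [Token("Loan", 200, 100), Token("Date", 240, 100),
         Token("Maturity", 200, 400), Token("Date", 240, 400)]
keywords = {"maturity": 1.0, "loan": -1.0}  # hypothetical weights
print(rank_candidates(candidates, words, keywords))
```

A negative weight for "loan" pushes the Loan Date below the Maturity Date, even though both values match the same date pattern.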

In the example above, we can easily differentiate information pertaining to the borrower from information pertaining to the co-borrower through radial spatial analysis.

Label/Value Pairs

In structured documents, most information is recorded in label/value pairs, meaning each value has a corresponding label indicating its meaning. Field labels are generally written above and/or to the left of the value they define, so Grooper can rank possible values by looking in one or more general directions to determine meaning.
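The directional lookup can be sketched with bounding boxes. This is a hypothetical simplification: a label candidate must either share the value's row and end to its left, or share its column and end above it, within `max_gap` page units. `Box` and `find_label` are illustrative names, not Grooper's API.

```python
from dataclasses import dataclass

@dataclass
class Box:
    text: str
    left: float
    top: float
    right: float
    bottom: float

def find_label(value, boxes, max_gap=150.0, directions=("left", "above")):
    """Return the text of the nearest box to the left of (same row) or directly
    above (same column) the value, searching only the requested directions."""
    best, best_gap = None, max_gap
    for b in boxes:
        if b is value:
            continue
        gap = None
        same_row = b.top < value.bottom and b.bottom > value.top   # vertical overlap
        same_col = b.left < value.right and b.right > value.left   # horizontal overlap
        if "left" in directions and same_row and b.right <= value.left:
            gap = value.left - b.right
        elif "above" in directions and same_col and b.bottom <= value.top:
            gap = value.top - b.bottom
        if gap is not None and gap <= best_gap:
            best, best_gap = b, gap
    return best.text if best is not None else None

value = Box("555-0100", 200, 100, 280, 112)
boxes = [
    value,
    Box("Phone:", 120, 100, 180, 112),  # same row, to the left
    Box("Fax:", 200, 60, 240, 72),      # same column, above
    Box("Name:", 120, 300, 170, 312),   # neither
]
print(find_label(value, boxes))                        # searches left and above
print(find_label(value, boxes, directions=("above",)))  # searches above only
```

Restricting `directions` mirrors the idea of looking in one or more general directions: the same value gets a different label depending on where the search is allowed to look.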

Radial Analysis and Geotagging

Considering not only which words are nearby, but also the direction in which each word lies relative to the candidate, can lead to more accurate identification of field values. Geotagging adds this consideration, and it allows features to be filtered by direction as a simple way to remove features that are unlikely to define a value.
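A minimal sketch of geotagging, under the assumption that "direction" means one of eight compass sectors around the candidate: each nearby word becomes a feature such as `maturity@W`, and an optional `allowed` set filters features by direction. The sector scheme and the names `compass` and `geotag` are hypothetical, not Grooper's.

```python
import math

DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]

def compass(dx, dy):
    """Map an offset to one of 8 compass sectors. Page y grows downward,
    so dy is negated to make "N" mean "above"."""
    angle = math.degrees(math.atan2(-dy, dx)) % 360
    return DIRECTIONS[int((angle + 22.5) // 45) % 8]

def geotag(candidate_xy, words, allowed=None):
    """Turn nearby words into direction-tagged features such as "maturity@W".
    If `allowed` is given, keep only features from those directions."""
    cx, cy = candidate_xy
    features = []
    for text, (x, y) in words:
        direction = compass(x - cx, y - cy)
        if allowed is None or direction in allowed:
            features.append(f"{text.lower()}@{direction}")
    return features

words = [("Maturity", (200, 400)), ("Date", (260, 400)), ("Page", (300, 700))]
print(geotag((360, 400), words))
print(geotag((360, 400), words, allowed={"W", "NW", "N"}))
```

Keeping only the west and north sectors reflects the label/value convention described above: a word below the value (here "Page") is dropped as unlikely to define it.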
