
Refining Keyword List Search Results



Hello, can anyone offer advice on how to narrow the results of a keyword list search? I am using keyword lists to identify recurring names and addresses in a large set of PDF documents. The lists do a great job of highlighting the names and addresses that I need to redact.

However, the search results also include dozens of common words like "the", "to", "wall", "and", "door", etc., which are not at all what I am looking for. With all these false hits, the redaction tool is not saving me any time.

I have tried creating a list of words to exclude, adding it as an additional keyword list, and searching with that list along with the others, but it returns no hits at all. I also tried putting the word NOT in front of each common word to exclude and searching with both the require and exclude options, to no avail.

Hoping someone out there has more experience and insight on this. Thank you!


Hello Susan,

When you use queries containing multiple words (as is typical for names and addresses), Intella finds all occurrences of each word individually. For instance, the query The Wall Street (no quotes) highlights every individual occurrence of "the", "wall" and "street" in the result documents. This is probably why you see common words like "the" highlighted in the redaction editor.
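To make the effect concrete, here is a rough Python model of word-by-word matching. It is only a sketch under simplifying assumptions (lowercase, split on anything that is not a word character) and does not reproduce Intella's actual indexing or query engine; `tokenize` and `highlighted_words` are hypothetical helpers.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a rough stand-in for a search engine's analyzer.
    return re.findall(r"\w+", text.lower())


def highlighted_words(query: str, document: str) -> list[str]:
    """Return the document words hit by an unquoted query.

    Each query word is matched on its own, so a common word such as
    "the" is highlighted everywhere it appears, not just inside the address.
    """
    query_terms = set(tokenize(query))
    return [word for word in tokenize(document) if word in query_terms]


doc = "Deliver the package to the door on Wall Street, then close the door."
print(highlighted_words("The Wall Street", doc))
# → ['the', 'the', 'wall', 'street', 'the']
```

Only one of the three hits on "the" belongs to the address; the other two are exactly the kind of false hits described in the question.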

To find exact word sequences, try using phrase queries in your keyword list: enclose every multi-word query in double quotes.

So instead of searching for:

The Wall Street
The Drive Street
...

you would search for:

"The Wall Street"
"The Drive Street"
"..."

In this case, only exact phrase matches will be highlighted in the search results.
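If the keyword list is long, quoting each entry by hand is tedious. Below is a minimal Python sketch, assuming the list is a plain-text file with one query per line (as in the examples above); the script and file names are hypothetical. It also turns a bare single-quoted entry such as 'John Smith' into a double-quoted phrase, on the assumption that single quotes on their own do not create a phrase query.

```python
import sys
from pathlib import Path


def quote_phrase(line: str) -> str:
    """Wrap a multi-word keyword-list entry in double quotes so it is searched as a phrase."""
    entry = line.strip()
    # Assumption: a bare 'single-quoted' entry should become a "double-quoted" phrase.
    if len(entry) >= 2 and entry[0] == entry[-1] == "'":
        entry = entry[1:-1].strip()
    # Leave blank lines, already-quoted phrases and single words unchanged.
    if not entry or entry.startswith('"') or " " not in entry:
        return entry
    return f'"{entry}"'


def convert(src: Path, dst: Path) -> None:
    """Read a one-query-per-line keyword list and write a phrase-quoted copy."""
    lines = src.read_text(encoding="utf-8").splitlines()
    dst.write_text("\n".join(quote_phrase(line) for line in lines) + "\n", encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical names): python quote_keywords.py names.txt names_quoted.txt
    convert(Path(sys.argv[1]), Path(sys.argv[2]))
```

The converted file still has to be imported into Intella as a keyword list before searching with it.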

Thanks very much for your reply, Alex! I did try enclosing the full names and addresses in double quotes, but Intella did not recognize them. What finally worked was putting single quotes around the names and addresses ('John Smith'), and that part is working just fine. What I cannot seem to filter out are the non-responsive words like "and", "the", "of", etc.

I finally started processing the documents in smaller batches and using the redaction window at the bottom to remove all the irrelevant words before starting my review. It's a pain, but I can eliminate most of them before I start marking text for redaction. Still, it's very repetitive to have to go through and remove them again for each batch.

Any other ideas that I might try?

Susan,

If a phrase search (with double quotes) returns no results, it means that no exact phrase matches (the same words in the same order) were found. You may need to refine the queries to cover variations of the names and addresses.

Single quotes can be used only inside double quotes, for nested phrase searches. Used on their own, they are ignored and the query behaves as if they were not there (that is, it finds the individual words of the query, as before).

You can also try removing the common words directly from the keyword list file and repeating the search (you'll need to re-import the file after editing it).
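That clean-up can be scripted so it does not have to be repeated by hand for every batch. A minimal Python sketch follows, assuming the same one-query-per-line text file; `COMMON_WORDS` is a hypothetical starting list built from the words mentioned in this thread, and the file names are placeholders. Quoted phrases are left intact, since a phrase query needs all of its words to match in order.

```python
import sys
from pathlib import Path

# Hypothetical stop-word list; extend it with the words that cause false hits.
COMMON_WORDS = {"the", "to", "and", "of"}


def strip_common_words(line: str) -> str:
    """Drop common words from an unquoted entry; leave quoted phrases unchanged."""
    entry = line.strip()
    if entry.startswith('"'):
        return entry  # a phrase only matches with all of its words, in order
    kept = [word for word in entry.split() if word.lower() not in COMMON_WORDS]
    return " ".join(kept)


def clean(src: Path, dst: Path) -> None:
    """Write a copy of the keyword list without the common words."""
    cleaned = (strip_common_words(line) for line in src.read_text(encoding="utf-8").splitlines())
    # Entries that consisted only of common words end up empty and are dropped.
    dst.write_text("\n".join(entry for entry in cleaned if entry) + "\n", encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical names): python clean_keywords.py names.txt names_clean.txt
    clean(Path(sys.argv[1]), Path(sys.argv[2]))
```

Remember to re-import the cleaned file before repeating the search.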

