Keyword testing tips


How to Test Keyword Queries


Our Support Team is often asked by users how to structure search queries using the available Intella search options. Often the user will send in a long list of queries and ask us to help them understand which ones they can use, or ask us to come up with a list of search terms based on a description of what they are hoping to find.

 

Unfortunately, we are unable to test queries for customers as part of support. We don't wish this to sound unhelpful; however, there is usually not one single query but a range of queries and methods needed to reach the final goal. Experience tells us that trying to do a search in one string is often more time consuming (working out the query) and much less effective than combining a number of searches, filters and tags. This is consistent with the approach of searches, filters and tags that we teach in our training class.

 

Secondly, without access to the data it would take Vound support hours to try to recreate the customer's scenario, with no guarantee that we could supply the answer the customer is looking for. Hence we cannot justify the time and effort of trying to find one Über query.

 

Whether you are looking for syntax advice, how to structure a query, or what steps to take to find a set of results, our suggestion is that you run the following tests. They only take a few minutes and will lead to much better results and a better understanding of Intella.

 

 

Typical Support Question

Dear Support, we need your advice on creating a search that will only find documents that contain all of the following:

 

As a single phrase

business analyst

 

anywhere that the words

business OR analyst 

 

are within 10 words distance of

responsibilities

 

We tried the query below but it failed as invalid! And we don’t understand why?

 

(("business analyst" " analyst responsibility) /6 (response*)) OR business? AND analyst

 

Suggested Test Method

 

To begin with, we need to perform a test to check the validity of our search syntax. To start the test we create four or more text files. In three of the files we ensure that a single term appears amongst other text. We add extra text to each document because if you need to find car within 10 words of bus, "car bus"~10, you will need a document with 1 to 10 words between those terms to test effectively. The extra text also makes the test more realistic, and may reveal false positives if you include similar terms in the test, e.g. car-port, bus stop. In a fourth text file we add all four terms, again amongst other text.

We find the easiest way to generate text is to Google the term and select a Wikipedia article, then copy and paste a paragraph that has the terms we want. For the term business analyst, for example, see http://en.wikipedia.org/wiki/Business_analyst

 

Once we have all four text files, we place them in one folder, index that folder in Intella, and try out the candidate queries.
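If you prefer to script the file creation, the following is a minimal Python sketch. The file names match the attachments to this post, but the sentences inside them are hypothetical placeholders (not the contents of the attached files), and the helper name write_test_files is made up for this example. Replace the text with paragraphs that suit your own terms.

```python
from pathlib import Path
import tempfile

# Hypothetical sample text; swap in paragraphs relevant to your own search terms.
SAMPLES = {
    "PART1-TERMS.txt": "The business analyst met the project team on Monday to review the plan.",
    "PART2-TERMS.txt": "Every business has a set of documented responsibilities for its directors and staff.",
    "PART3-TERMS.txt": "An analyst has responsibilities that include reporting findings to management.",
    "ALL-TERMS.txt": (
        "A business analyst works across the business, and the "
        "responsibilities of the analyst include gathering requirements."
    ),
}


def write_test_files(folder: Path) -> list[Path]:
    """Write each sample document into `folder` and return the created paths."""
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, text in SAMPLES.items():
        path = folder / name
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # A temporary folder is used here; point this at any folder you plan to index.
    target = Path(tempfile.mkdtemp()) / "keyword-test"
    for created in write_test_files(target):
        print(created)
```

The resulting folder can then be added as a source and indexed in the usual way.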

 

We can then work through the candidate queries, checking each one for valid syntax and comparing the number of results it returns.

Example:

 

Query                                                                               Total count returned

"business analyst"  "business responsibilities"  "analyst responsibilities"         0
"business analyst"  (business OR Analyst responsibilities)                          0 (fails as the OR in () is invalid)
"business analyst"  business OR Analyst responsibilities                            4
"business analyst"  "business responsibilities"~10  "analyst responsibilities"~10   1 (winner!)

 
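Before running a query, it can help to work out which test files you expect it to return, so that a surprising count stands out. The Python sketch below is only a rough approximation of the winning query's logic, not Intella's search engine: it treats a proximity match as "at most n words between the two terms, in either order", and the sample sentences and function names are hypothetical.

```python
import re


def tokens(text: str) -> list[str]:
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_phrase(text: str, phrase: str) -> bool:
    """True if the words of `phrase` appear consecutively in `text`."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def within(text: str, a: str, b: str, distance: int) -> bool:
    """True if `a` and `b` occur with at most `distance` words between them."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) - 1 <= distance for i in pos_a for j in pos_b if i != j)


def winner(text: str) -> bool:
    """Approximates: "business analyst" "business responsibilities"~10 "analyst responsibilities"~10"""
    return (
        has_phrase(text, "business analyst")
        and within(text, "business", "responsibilities", 10)
        and within(text, "analyst", "responsibilities", 10)
    )


# Hypothetical stand-ins for the four test documents.
docs = {
    "PART1": "The business analyst met the project team on Monday.",
    "PART2": "Every business has documented responsibilities for its staff.",
    "PART3": "An analyst has responsibilities that include reporting.",
    "ALL": "A business analyst has responsibilities that include gathering requirements.",
}
expected = [name for name, text in docs.items() if winner(text)]
print(expected)
```

With these sample sentences only the document containing all terms satisfies every condition, which is the outcome the winning query is meant to produce.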

Notes

Using Correct Search Language

 

Users commonly send a list of queries written in a search language other than the one used in Intella. For example:

/n, e.g. duty /5 care

This is a proximity search for a different tool and will fail in Intella. In Intella, a proximity search is written in the form shown earlier, e.g. "duty care"~5. Do use the manual to find and use the correct query syntax.
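A quick pre-check can catch some of these problems before a query is even run. The Python sketch below is a hypothetical helper (lint_query is not part of Intella) that looks for three things only: an odd number of double quotes, unbalanced parentheses, and a "/n" proximity operator borrowed from another tool. It does not validate the full query syntax, and it does not distinguish parentheses inside quoted phrases.

```python
import re


def lint_query(query: str) -> list[str]:
    """Return a list of likely problems found in a keyword query."""
    problems = []

    # Every phrase needs an opening and a closing double quote.
    if query.count('"') % 2 != 0:
        problems.append("unbalanced double quotes")

    # Track nesting depth; a negative depth means a ')' came before its '('.
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                break
    if depth != 0:
        problems.append("unbalanced parentheses")

    # "/5"-style operators are proximity syntax from a different tool.
    if re.search(r"\s/\d+\s", query):
        problems.append("'/n' proximity operator from another tool")

    return problems


failed = '(("business analyst" " analyst responsibility) /6 (response*)) OR business? AND analyst'
working = '"business analyst"  "business responsibilities"~10  "analyst responsibilities"~10'

print(lint_query(failed))
print(lint_query(working))
```

Run against the failed query from the support question, this flags the stray double quote and the /6 operator, while the winning query passes all three checks.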

 

The Words Tab Advantage

 

Another useful keyword testing method is using the Words tab to see what specific words Intella has listed in its index.

For example, if you review the four documents produced for our search above, you can see that the Words tab lists the word "BA", short for Business Analyst. This is very useful to know, as it changes the search to something like

 

BA  "business analyst"  "business responsibilities"~10  "analyst responsibilities"~10

 

Take advantage of the Words tab to help you understand how Intella has indexed specific words.

 

The Words tab is also useful for looking for numbers, and for understanding how punctuation and special characters are indexed, for example in sarbanes–oxley act.

These tests can be used to test the validity of a search, find which words you can search for or find the best single query to use specific to your evidence.

 

Attached are the four text files I used in the example above. Please feel free to test this method for yourself.  It will also help you to learn the types of searches Intella can perform. 

Please do feel free to add your tips to this thread for others to use.

 

ALL-TERMS.txt
PART1-TERMS.txt
PART2-TERMS.txt
PART3-TERMS.txt


  • 3 weeks later...
Regular text words and special characters search
 
One of the more frequent questions our Support Team receives is how to search for keywords separated by special characters.
 
Typical Support Question

Dear Support, we need your advice on creating a search that will only find documents that contain exactly this phrase:

happy-day

 

We tried the following queries, but none of them seems to produce the result we want:

happy-day - It looks like it's evaluated as happy AND day.

happy\-day - It looks like it's evaluated as happy AND day.

 

Answer
 
Note that during indexing, some special characters are filtered out and never make it into the index.
The rules for handling specific characters depend on the context in which they occur. For instance, punctuation
characters such as dots ('.') or dashes ('-') are significant within numbers, email addresses or host names, but are
ignored (i.e. interpreted as whitespace) between regular text words. In the latter case, escaping those characters
will not make them searchable.
 
This is exactly what happens with the phrase happy-day: the dash is interpreted as whitespace, so its representation
inside the index is the same as for happy day. That is why all instances of both happy day and happy-day
are returned when you use the phrase search "happy day". This is the closest you can get to what you want.
 
The two search queries you provided are both interpreted as:
happy day
which is actually the same as
happy AND day
 
In general, you can think of it as if all special characters between text words, including those in search terms, are replaced with spaces,
so it is not possible to search for special characters between regular text words.
 
This raises the question of where special characters can actually be used:
- [+-.,%$] are significant in numerals. Example search term: -100.0
- [.-@] are significant in email addresses and host names. Example search term: info@vound-software.com
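
The behaviour described above can be illustrated with a small Python sketch. This is not Intella's actual indexing code: the regular expressions and the function name index_terms are assumptions made for illustration, and host names are left out for brevity. It keeps special characters inside numerals and email addresses, and treats them as whitespace between regular text words.

```python
import re

# Illustrative patterns only; the real indexing rules are more detailed.
EMAIL = r"[a-z0-9_.+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+"
NUMBER = r"[+-]?\$?\d+(?:[.,]\d+)*%?"
WORD = r"[a-z0-9]+"

# Order matters: try the email and number patterns before plain words.
TOKEN = re.compile(rf"{EMAIL}|{NUMBER}|{WORD}")


def index_terms(text: str) -> list[str]:
    """Split text into terms, dropping special characters between words."""
    return TOKEN.findall(text.lower())


print(index_terms("happy-day"))
print(index_terms("happy day"))
print(index_terms("-100.0"))
print(index_terms("info@vound-software.com"))
```

In this sketch happy-day and happy day produce the same two terms, which is why a search cannot tell them apart, while -100.0 and the email address each survive as a single term.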