
Proximity Searches in Intella - A Better Understanding



We receive numerous support tickets from customers asking for advice on using proximity searches. The user manual provides the basic syntax, and additional information is available in other forum posts.


In most cases, customers include examples of the syntax they have used. Sometimes that syntax is very complex, and often it is incorrect.


Some customers ask us whether their syntax is correct, or why their proximity search is not working. This is something we cannot answer on an individual basis. The purpose of this document is to provide examples that help customers better understand proximity search syntax, so that they can write the correct syntax for the search they want to perform.


Note: Most of this information applies to all versions of Intella which support Proximity searching. There is a known issue with hit highlighting in versions prior to 1.9.1. We recommend that you update to version 1.9.1 if you encounter this issue.



What is a proximity search?

A proximity search finds items in which the specified words occur within a given maximum distance of each other in the item's text. For example, to find all items that have the words 'desktop' and 'application' within 10 words of each other, I would use the following:


"desktop application"~10


A proximity search differs from a phrase search in that it does not matter whether 'desktop' comes before or after 'application' in the text. For example, documents containing either of the passages below will match the proximity search above.


"You must turn on your desktop computer before you can open an application."


"I have copied the shortcut for the application onto the desktop."
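To make the order-independence concrete, here is a simplified Python model of a two-term proximity check. It assumes that "within N words" means the two words' positions in the text differ by at most N. This is an illustration only: Intella may count distance differently (for example, when the terms appear in reverse order), and `within_n_words` is a hypothetical name, not part of Intella.

```python
import re


def within_n_words(text: str, term_a: str, term_b: str, n: int) -> bool:
    """Hypothetical, simplified proximity check (not Intella's implementation).

    Returns True if term_a and term_b both occur in text and at least one
    pair of occurrences is at most n word positions apart, in either order.
    Matching is case-insensitive and on whole words only.
    """
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # abs() ignores word order, which is what separates this from a phrase search.
    return any(abs(i - j) <= n for i in positions_a for j in positions_b)


if __name__ == "__main__":
    passage_1 = ("You must turn on your desktop computer before you can "
                 "open an application.")
    passage_2 = "I have copied the shortcut for the application onto the desktop."
    print(within_n_words(passage_1, "desktop", "application", 10))  # True (7 apart)
    print(within_n_words(passage_2, "desktop", "application", 10))  # True (3 apart)
    print(within_n_words(passage_1, "desktop", "application", 5))   # False
```

Both passages match at a distance of 10 because only the gap between the words is checked, not their order.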



Using the Correct Proximity Syntax

As mentioned above, customers send us the proximity search syntax they have used. Often these are search strings such as the examples below:

  1. (Baxter Jason) ~20 (article) OR (paper) OR (presentation) OR (public) OR (report)
  2. "national OR fire OR service"~30 (truck) OR (department)

These examples have been sanitized and shortened; the original search strings contained several lines of OR statements. This made them complex, cumbersome, error-prone, and difficult to troubleshoot.


Example 1
Looking at the first example above, we can immediately see several issues that make this syntax incorrect. First, the search terms are not enclosed in double quotes. Second, the maximum distance (~20 in this case) is not at the end of the proximity search syntax, because several OR statements follow it.


The user manual shows a basic example of the syntax: "desktop application"~10. The structure is two (or more) search terms enclosed in double quotes, followed by a tilde (~) and the number of words that the terms must be within.


The proximity string can be made more useful for larger queries by adding more search terms. The additional terms need to be separated by the OR operator and grouped in parentheses. For example, the first example above could be rewritten as: "(Baxter OR Jason) (article OR paper OR presentation OR public OR report)"~20. Because the user is looking for either of two terms within 20 words of any of several other terms, we have grouped the keywords by placing them in parentheses and separating them with the OR operator, i.e. (Baxter OR Jason) and (article OR paper OR presentation OR public OR report).


Note: All of the search terms are still encased in double quotes, followed by the number of words that the terms must be within. This syntax will return any items where Baxter or Jason is within 20 words of article, paper, presentation, public or report.
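If the term groups come from a spreadsheet or another list, a small helper can assemble the corrected syntax so that every term ends up inside a single pair of double quotes with the distance at the very end. This is a hypothetical Python sketch (`or_group` and `grouped_proximity` are illustrative names, not part of Intella); it only builds the query string, which you would then paste into Intella.

```python
def or_group(terms: list[str]) -> str:
    """Join terms with OR, adding parentheses when there is more than one term."""
    if not terms:
        raise ValueError("each group needs at least one search term")
    if len(terms) == 1:
        return terms[0]
    return "(" + " OR ".join(terms) + ")"


def grouped_proximity(left: list[str], right: list[str], distance: int) -> str:
    """Build '"(a OR b) (c OR d)"~N': all terms in one pair of quotes, ~N last."""
    return f'"{or_group(left)} {or_group(right)}"~{distance}'


if __name__ == "__main__":
    print(grouped_proximity(["Baxter", "Jason"],
                            ["article", "paper", "presentation", "public", "report"],
                            20))
    # "(Baxter OR Jason) (article OR paper OR presentation OR public OR report)"~20
    print(grouped_proximity(["national", "fire", "service"], ["truck", "department"], 30))
    # "(national OR fire OR service) (truck OR department)"~30
```

With one term per group, the helper produces the basic form: `grouped_proximity(["desktop"], ["application"], 10)` returns `"desktop application"~10`.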


Example 2
Again, there are issues with the search syntax in example 2. This time double quotes are used; however, they do not enclose all of the search terms, and the ~30 is not at the end. We also see a similar pattern to example 1, with several search terms in parentheses separated by the OR operator. We see many samples like this and wonder whether this format of proximity search comes from another tool.


I read this example as follows: find all items that have national, fire, or service within 30 words of truck or department. The syntax can be rewritten as: "(national OR fire OR service) (truck OR department)"~30. Again, we use parentheses to split the search terms into two groups and make sure that all terms are enclosed in double quotes.




  • Because the double quotes need to encase all of the search terms, you cannot have a search phrase within a proximity search. A search phrase would require double quotes and you can't have nested double quotes within a proximity search. That said, you can use phrases in keyword lists (see below).
  • In the past, we have received proximity search strings containing over 40 words separated by the OR operator. As mentioned above, this format is not correct. Even with corrected syntax, 40 words in a proximity search make the search string complex, cumbersome, error-prone, and difficult to troubleshoot.
  • We have also received extremely long search syntax where all search terms contained wildcards. Such complex queries with many wildcards are known to have very poor performance, especially for hit highlighting in the Previewer window.
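Before running a long query, it can help to check it against the rules above. The following is a hypothetical Python checker: `lint_proximity` is an illustrative name, and the `max_terms=20` threshold is an arbitrary assumption (the post only says that 40+ OR terms is too many). It flags terms that are not all inside one pair of double quotes followed by ~N, nested quotes (a phrase or proximity query inside another), AND inside the quotes, too many terms, and wildcards. It checks only the patterns discussed in this thread, not Intella's full query grammar.

```python
import re

# All terms inside one pair of double quotes (no quotes inside), then ~N at the end.
PROXIMITY_RE = re.compile(r'^"([^"]+)"~(\d+)$')


def lint_proximity(query: str, max_terms: int = 20) -> list[str]:
    """Return warnings for a proximity query; an empty list means no issues found.

    max_terms is an arbitrary threshold chosen for illustration.
    """
    match = PROXIMITY_RE.match(query.strip())
    if not match:
        # Covers unquoted terms, ~N not at the end, and nested quoted queries.
        return ["terms must all be inside one pair of double quotes, "
                "followed by ~N at the very end (no nested quotes)"]
    body = match.group(1)
    warnings = []
    if re.search(r"\bAND\b", body):
        warnings.append("AND has no meaning inside a proximity search")
    # Individual terms: split on parentheses and whitespace, then drop operators.
    terms = [t for t in re.split(r"[()\s]+", body) if t and t not in ("OR", "AND")]
    if len(terms) > max_terms:
        warnings.append(f"{len(terms)} terms: consider a keyword list instead")
    if any("*" in t or "?" in t for t in terms):
        warnings.append("wildcards: searching and hit highlighting may be slow")
    return warnings


if __name__ == "__main__":
    for q in [
        '(Baxter Jason) ~20 (article) OR (paper) OR (presentation)',
        '"national OR fire OR service"~30 (truck) OR (department)',
        '"(Baxter OR Jason) (article OR paper OR presentation OR public OR report)"~20',
        '"(Baxter AND Jason) (article OR paper)"~20',
    ]:
        print(q, "->", lint_proximity(q) or "no issues found")
```

The first two (incorrect) examples fail the structural check, the corrected grouped query passes, and the AND variant is flagged.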




There are a couple of methods you can use to manage complex proximity searches that contain a large number of search terms separated by the OR operator: breaking down the search string, and using keyword lists.


Breaking down the search string
A complex search string can be broken down into several shorter proximity search strings, which are then placed in a keyword list. For example:


"Baxter article"~20
"Baxter paper"~20
"Baxter presentation"~20
"Baxter public"~20
"Baxter report"~20

Intella will be able to process the list of shorter proximity searches more efficiently than one large complex search string.


With a small amount of Excel work, you can generate all of your shortened proximity searches as a single keyword list.
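As an alternative to Excel, a few lines of Python can generate every pairwise combination. This sketch assumes the keyword list is imported as a plain text file with one query per line; `expand_proximity` is a hypothetical name. It prints the queries so that the output can be saved to a .txt file.

```python
from itertools import product


def expand_proximity(left: list[str], right: list[str], distance: int) -> list[str]:
    """Return one two-term proximity query for every (left, right) pair."""
    return [f'"{a} {b}"~{distance}' for a, b in product(left, right)]


if __name__ == "__main__":
    queries = expand_proximity(
        ["Baxter", "Jason"],
        ["article", "paper", "presentation", "public", "report"],
        20,
    )
    # Assumption: one query per line, saved as a plain text keyword list.
    print("\n".join(queries))  # 2 x 5 = 10 lines, starting with "Baxter article"~20
```

The number of queries is the product of the two group sizes, so 2 left terms and 5 right terms give 10 queries; large groups quickly produce long lists.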


Using keyword lists
The idea behind using keyword lists is to reduce the number of items that your proximity search needs to search across. Create two keyword lists: one containing the search terms from the left group of the proximity search, and a second containing the terms from the right group. For example:


Keyword list 1    Keyword list 2
Baxter            article
Jason             paper


Next, run the two keyword lists and Tag the overlapping cluster. This cluster will contain the items that have search terms from both keyword lists.
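Conceptually, the overlapping cluster is the intersection of two sets: items that hit at least one term from list 1, and items that hit at least one term from list 2. The toy Python model below illustrates only that set logic; the item IDs and texts are made up, `hits` is a hypothetical helper, and it says nothing about how Intella indexes or searches items.

```python
import re


def hits(items: dict[str, str], keywords: list[str]) -> set[str]:
    """IDs of items whose text contains at least one keyword (whole word, any case)."""
    wanted = {k.lower() for k in keywords}
    return {
        item_id
        for item_id, text in items.items()
        if wanted & set(re.findall(r"\w+", text.lower()))
    }


if __name__ == "__main__":
    # Made-up sample items for illustration.
    items = {
        "doc1": "Baxter wrote the article last week.",
        "doc2": "Jason is on leave.",
        "doc3": "The paper was rejected.",
        "doc4": "Jason reviewed the paper.",
    }
    list_1 = ["Baxter", "Jason"]
    list_2 = ["article", "paper"]
    overlap = hits(items, list_1) & hits(items, list_2)
    # Only these items need to be searched by the proximity query.
    print(sorted(overlap))  # ['doc1', 'doc4']
```

An item in the overlap is not guaranteed to match the proximity search (the terms may be far apart), which is why the proximity search still has to run over the tagged items.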


Set this Tag as an 'Include' search and run the proximity search. This provides faster searching as you are not searching over the entire dataset. However, be aware that hit highlighting can still be slow or hang Intella if the proximity search is complex and contains wildcards.


The advantage of using keyword lists is that you can use the following types of searches and operators:

  • Wildcards (article*, paper*, etc.)
  • Phrases ("national fire", "fire service", etc.)
  • Other search operators
1 year later...



Since phrases are not currently supported in proximity searches (fingers crossed that's on the way!), the idea of grouping terms is intriguing.


Your example of  

"(Baxter OR Jason) (article OR paper OR presentation OR public OR report)"~20 

only uses the OR operator.


If what I needed to find was actually any item with BOTH Baxter AND Jason within 20 words of any of the others, would an AND operator in the first group suffice?:

"(Baxter AND Jason) (article OR paper OR presentation OR public OR report)"~20 





Unfortunately, it will not work. The AND operator has no meaning within phrase and proximity searches.

2 years later...
CEG0 said:

And what if we need to search for a combined proximity? Is this example correct?

"("Baxter Jason"~10) (article OR paper)"~20


No. A proximity or phrase query cannot contain another proximity or phrase query.
