admin
Posted April 30, 2014

How to Test Keyword Queries

Our Support Team is often asked how to structure a search query (or queries) using the available Intella search options. Users often send in a long list of queries and ask us to help them understand which ones they can use, or ask us to come up with a list of search terms based on a description of what they are hoping to find.

Unfortunately, we are unable to test queries for customers as part of support. We don't wish this to sound unhelpful; however, there is usually not one single query but a range of queries and methods needed to get to the final goal. Experience tells us that trying to do a search in one string is often more time-consuming (working out the query) and much less effective than doing it with a number of searches, filters and tags. This is also consistent with the approach of searches, filters and tags that we teach in our training class.

Secondly, without access to the data it would take Vound support hours to try to recreate the customer's scenario, with no guarantee that we could supply the answer the customer is looking for. Hence we cannot justify the time and effort of trying to find one Über query.

Whether you are looking for syntax advice, how to structure a query, or what steps to take to find a set of results, our suggestion is that you run the following tests. They only take a few minutes and will lead to much better results and a better understanding of Intella.

Typical Support Question

Dear Support, we need your advice on creating a search that will only find documents that contain all of the following:

- As a single phrase: business analyst
- Anywhere that the words business OR analyst are within 10 words distance of responsibilities

We tried the query below but it failed as invalid, and we don't understand why:

(("business analyst" " analyst responsibility) /6 (response*)) OR business? AND analyst

Suggested Test Method

To begin with, we need to perform a test to check the validity of our search syntax.

To start the test we create four or more text files. In three of the files we ensure that a single term appears amongst other text. We add extra text to each document because if you need to find car within 10 words of bus ("car bus"~10), you will need a document with 1 to 10 words between those terms to test effectively. The extra text also makes the test more realistic and may reveal false positives if you include similar terms, e.g. car-port, bus stop. In the fourth text file we add all of the terms, again amongst other text.

We find the easiest way to generate text is to Google the term and select a Wikipedia article, then copy and paste a paragraph that has the terms we want. For the term business analyst, see http://en.wikipedia.org/wiki/Business_analyst

Once we have all four text files, we put them in a folder, index it, and try out the candidate queries. We can then work through the queries looking for the best search and checking each term for validity.

Example:

Query: "business analyst" "business responsibilities" "analyst responsibilities"
Result Total Count returned: 0

Query: "business analyst" (business OR Analyst responsibilities)
Result Total Count returned: 0 (fails as the OR in () is invalid)

Query: "business analyst" business OR Analyst responsibilities
Result Total Count returned: 4

Query: "business analyst" "business responsibilities"~10 "analyst responsibilities"~10
Result Total Count returned: 1 (Winner!)

Notes

Using Correct Search Language

Users commonly send a list of queries written in a search language other than the one used in Intella. For example: /n, e.g. duty /5 care. This is the proximity search syntax of a different tool and will not work in Intella. Please use the manual to find the correct query syntax.

The Words Tab Advantage

Another useful keyword testing method is to use the Words tab to see which specific words Intella has listed in its index.
For example, if you review the four documents produced for our search above, you can see that the Words tab lists the word "BA", short for Business Analyst. This is very useful to know, as it now changes the search to something like:

BA "business analyst" "business responsibilities"~10 "analyst responsibilities"~10

Take advantage of the Words tab to help you understand how Intella has indexed specific words. The Words tab is also useful for looking up numbers, and for understanding how punctuation and special characters are indexed. For example: sarbanes–oxley act

These tests can be used to check the validity of a search, to find which words you can search for, or to find the best single query for your specific evidence.

Attached are the four text files I used in the example above. Please feel free to test this method for yourself. It will also help you learn the types of searches Intella can perform.

Please do feel free to add your tips to this thread for others to use.

Attachments: ALL-TERMS.txt, PART1-TERMS.txt, PART2-TERMS.txt, PART3-TERMS.txt
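Before indexing hand-built test files, it can help to work out which files a query is expected to hit, so that the count Intella returns can be compared against a known answer. The Python sketch below is only a rough stand-in under stated assumptions: it is not Intella's indexing or query engine, the tokenisation is a simple split into letters and digits, "within N words" is taken to mean at most N words between the two terms in either order, and the file contents are hypothetical examples rather than the attached files.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively in `text`."""
    words, target = tokenize(text), tokenize(phrase)
    return any(words[i:i + len(target)] == target
               for i in range(len(words) - len(target) + 1))


def within_distance(text, first, second, max_gap):
    """True if `first` and `second` occur with at most `max_gap` words between them."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == first.lower()]
    pos_b = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(a - b) - 1 <= max_gap for a in pos_a for b in pos_b if a != b)


# Hypothetical stand-ins for the four test files (not the attached files).
test_docs = {
    "PART1-TERMS.txt": "A business analyst studies how an organisation works.",
    "PART2-TERMS.txt": "Running a business brings legal and financial responsibilities.",
    "PART3-TERMS.txt": "The analyst has several reporting responsibilities each quarter.",
    "ALL-TERMS.txt": ("A business analyst documents processes. In this business the "
                      "responsibilities of an analyst include gathering requirements."),
}


def expected_hits(docs):
    """Files expected to match: phrase AND both proximity conditions."""
    return sorted(
        name for name, text in docs.items()
        if contains_phrase(text, "business analyst")
        and within_distance(text, "business", "responsibilities", 10)
        and within_distance(text, "analyst", "responsibilities", 10)
    )


print(expected_hits(test_docs))
```

If the count returned by Intella for the real query differs from the expectation worked out this way, that points either at the query syntax or at a difference in how the words were indexed, which the Words tab can help to explain.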
Primoz
Posted May 22, 2014

Regular text words and special characters search

One of the more frequent questions our Support Team receives concerns searching for keywords separated by special characters.

Typical Support Question

Dear Support, we need your advice on creating a search that will only find documents that contain exactly this phrase:

happy-day

We tried the following queries, but neither of them seems to produce the result we want:

happy-day - It looks like it's evaluated as happy AND day.
happy\-day - It looks like it's evaluated as happy AND day.

Answer

Note that during indexing, some special characters are filtered out and never make it into the index. The rules for handling specific characters depend on the context in which they occur. For instance, punctuation characters like dots ('.') or dashes ('-') are significant within numbers, email addresses or host names, while being ignored (i.e. interpreted as whitespace) between regular text words. In the latter case, escaping those characters will not make them searchable.

Exactly the same happens with the phrase happy-day: the dash is interpreted as whitespace, so its representation inside the index is the same as for happy day. That is why all instances of both happy day and happy-day are returned when you use the phrase search "happy day". This is actually the closest you can get to what you want.

The two search queries you provided are both represented as:

happy day

which is actually the same as

happy AND day

In general, you can think of this as if all special characters between text words, in search terms as well, were replaced with spaces. It is therefore not possible to search for special characters between regular text words.

Now the question arises of where special characters can actually be used:

- [+-.,%$] are significant in numerals. Example search term: -100.0
- [.-@] in email addresses and host names. Example search term: info@vound-software.com
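The rule described above can be illustrated with a short Python sketch. This is only an approximation written for this thread, not Intella's actual tokenizer: the two regular expressions for numerals and email addresses are assumptions, and host names are left out for brevity.

```python
import re

# Assumed patterns, for illustration only.
NUMBER = re.compile(r"[+\-$]?\d[\d.,]*%?")
EMAIL = re.compile(r"[\w.-]+@[\w-]+(\.[\w-]+)+")


def index_terms(text):
    """Return the terms that would end up searchable under the rule described above."""
    terms = []
    for chunk in text.lower().split():
        if NUMBER.fullmatch(chunk) or EMAIL.fullmatch(chunk):
            # Special characters are significant here, so keep the chunk whole.
            terms.append(chunk)
        else:
            # Between regular text words, special characters act like spaces.
            terms.extend(re.findall(r"[a-z0-9]+", chunk))
    return terms


for sample in ["happy-day", r"happy\-day", "happy day",
               "-100.0", "info@vound-software.com"]:
    print(sample, "->", index_terms(sample))
```

Under these assumptions the three happy/day variants all reduce to the same two terms, which is why they cannot be told apart in a search, while the numeral and the email address keep their special characters.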