markjrouse Posted February 22, 2016

I'm currently using a subset of the Enron data to test the new features of Intella 1.9. I'm using the following keywords in a keyword list, which I have uploaded:

andersen
litigation
bankruptcy
chewco
jedi
whitewing
ljm
raptors
compensation

Under "Statistics", on the "Keywords" tab, I ran the stats and got the following:

andersen 219
litigation 2915
bankruptcy 4465
chewco 39
jedi 156
whitewing 28
ljm 23
raptors 0
compensation 2257

When I auto-tag with the same keywords, it gives me the same number of items as in the second column above. These items total 10,102. When I run the keyword search, I get only 7,420 items back in my cluster map. I don't understand why I'm getting two different figures. I have not applied any additional includes or excludes, and I am not deduplicating or hiding irrelevant items from my search.
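The 10,102 figure is the sum of the per-keyword counts listed above. The minimal Python sketch below uses only the numbers from the post to show that sum and how far it exceeds the 7,420 items in the cluster map. It assumes the cluster map count is the number of distinct items matching any keyword.

```python
# Per-keyword hit counts as reported on the Keywords statistics tab (from the post above).
keyword_hits = {
    "andersen": 219,
    "litigation": 2915,
    "bankruptcy": 4465,
    "chewco": 39,
    "jedi": 156,
    "whitewing": 28,
    "ljm": 23,
    "raptors": 0,
    "compensation": 2257,
}

# Summing per-keyword counts counts an item once for EVERY keyword it contains.
total_hits = sum(keyword_hits.values())

# Distinct items returned by the combined keyword search (assumed to be an OR of all keywords).
unique_items = 7420

# Extra counts contributed by items that matched two or more keywords.
surplus = total_hits - unique_items

print(f"sum of per-keyword hits: {total_hits}")
print(f"unique items in cluster map: {unique_items}")
print(f"extra counts from multi-keyword items: {surplus}")
```

The surplus (2,682) is the number of extra matches, not the number of items that matched several keywords: an item matching three keywords adds two to it.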
ŁukaszBachman Posted February 22, 2016

Hi Mark!

The difference comes from items that contain more than one keyword. When you use the Keyword List facet to combine multiple queries into one, you are essentially querying the database for items that contain ANY of the specified keywords, so each matching item is counted once. If you instead tag items using auto-tag and then look at the Tags facet, or use Keyword Statistics, you are analyzing hit counts for one keyword at a time, so an item containing several keywords is counted under each of them.

To show this with a concrete example, suppose your items contain:

only 'look': 5
only 'find': 4
both 'look' and 'find': 2

If you then auto-tag your set using a keyword list containing the "look" and "find" keywords, you'll see the following results:

tag 'look': 7 (5 + 2)
tag 'find': 6 (4 + 2)

The Features facet will show you 'Tagged' items: 11 (5 + 4 + 2).

If you evaluate this keyword list search from the Keyword List facet with the "Combine queries" option checked, you'll see a cluster with 11 items (because this matches "look" OR "find"). If you uncheck the "Combine queries" option, or switch to Keyword Statistics, you'll see the following counts:

'look': 7 (5 + 2)
'find': 6 (4 + 2)

... which is, of course, consistent with the item counts reported via auto-tagging.
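The 'look'/'find' example can be modelled with plain Python sets. This is a minimal sketch, not how Intella works internally: the item IDs are hypothetical, chosen only to reproduce the counts above. Per-keyword counts are the sizes of each keyword's set, while the combined query corresponds to their union.

```python
# Hypothetical item IDs reproducing the counts from the example:
# 5 items contain only 'look', 4 only 'find', 2 contain both.
only_look = {f"look-{i}" for i in range(5)}
only_find = {f"find-{i}" for i in range(4)}
both = {f"both-{i}" for i in range(2)}

# Items matching each keyword (what auto-tagging / Keyword Statistics count).
hits = {
    "look": only_look | both,
    "find": only_find | both,
}

# Per-keyword counts: an item is counted once per keyword it contains.
per_keyword = {kw: len(items) for kw, items in hits.items()}

# Combined query ("look" OR "find"): the union, so shared items are counted once.
combined = set().union(*hits.values())

# Items counted twice in the per-keyword view.
overlap = hits["look"] & hits["find"]

print(per_keyword)                                 # {'look': 7, 'find': 6}
print(sum(per_keyword.values()))                   # 13
print(len(combined))                               # 11
print(sum(per_keyword.values()) - len(overlap))    # 11
```

The last line is inclusion-exclusion for two keywords: summing the per-keyword counts and subtracting the shared items gives the same 11 as the combined query.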
markjrouse (author) Posted February 24, 2016

The keyword stats make it clear that a total of 10,102 items met my keywords. If I now report back to the reviewers that there are 10,102 items, they will expect to see 10,102 items for review, not 7,420. In Pro, if I untick the "Combine queries" option, I lose my cluster map and it becomes a set. In Connect, I can only bring up the cluster map; I don't seem to be able to untick the "Combine queries" option. As it stands, I don't know which figure I should report back to the reviewers.
Andrej Posted February 25, 2016

Mark, if you would like to know the number of items that need to be reviewed, then I would suggest evaluating the keyword list with "Combine queries" ticked. If you would like to know the number of items responsive to individual keywords, then I would suggest using the Keyword Statistics panel. If neither of these is what you need, could you please describe what kind of report you have in mind?
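A report for reviewers could show both figures side by side, plus a breakdown that reconciles them. The Python sketch below is one hypothetical way to do this outside Intella. It assumes you have, for each item, the set of keywords it matched (for example, built from an export of auto-tagged items); the function name `keyword_report` and that input format are illustrative assumptions, not an Intella feature.

```python
from collections import Counter


def keyword_report(matches):
    """Summarise keyword hits (hypothetical helper, not part of Intella).

    `matches` maps an item ID to the set of keywords that item matched.
    """
    # Only items that matched at least one keyword need review.
    responsive = {item: kws for item, kws in matches.items() if kws}

    # Hits per keyword: an item is counted under every keyword it matched.
    per_keyword = Counter(kw for kws in responsive.values() for kw in kws)

    # How many items matched exactly 1, 2, 3, ... keywords.
    by_overlap = Counter(len(kws) for kws in responsive.values())

    return {
        "items_to_review": len(responsive),
        "per_keyword_hits": dict(per_keyword),
        "items_by_keyword_count": dict(sorted(by_overlap.items())),
    }


# Hypothetical sample: one item matched two keywords and one matched none.
sample = {
    "doc1": {"bankruptcy"},
    "doc2": {"bankruptcy", "litigation"},
    "doc3": {"compensation"},
    "doc4": set(),
}

report = keyword_report(sample)
print(report["items_to_review"])                 # 3
print(sum(report["per_keyword_hits"].values()))  # 4
print(report["items_by_keyword_count"])          # {1: 2, 2: 1}
```

The total of per-keyword hits always equals the sum of k × (number of items matching exactly k keywords), here 1 × 2 + 2 × 1 = 4. That is why a per-keyword total such as 10,102 can exceed the count of unique items to review such as 7,420.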