
Posted

I'm currently using a subset of the Enron data to test the new features of Intella 1.9.

 

I'm using the following keywords in a keyword list that I have uploaded:

 

andersen
litigation
bankruptcy
chewco
jedi
whitewing
ljm
raptors
compensation

 

Under "Statistics" and the "Keywords" tab, I run the stats and get the following:

 

Keyword         Items
andersen          219
litigation      2,915
bankruptcy      4,465
chewco             39
jedi              156
whitewing          28
ljm                23
raptors             0
compensation    2,257

 

When I auto-tag with the same keywords, I get the same number of items per keyword as in the second column above. These add up to 10,102. When I run the keyword search, however, I get only 7,420 items back in my cluster map.
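Summing the per-keyword figures from the table reproduces the 10,102 total. Assuming the cluster map count is the number of unique items matching any keyword (as the support reply in this thread explains), the gap between the two figures is the number of extra hits contributed by items that match more than one keyword. A quick check using only the numbers quoted above:

```python
# Per-keyword hit counts as reported under Statistics > Keywords (from the post)
keyword_hits = {
    "andersen": 219,
    "litigation": 2915,
    "bankruptcy": 4465,
    "chewco": 39,
    "jedi": 156,
    "whitewing": 28,
    "ljm": 23,
    "raptors": 0,
    "compensation": 2257,
}

# Item count shown in the cluster map for the combined keyword list search
combined_items = 7420

total_hits = sum(keyword_hits.values())

# An item matching k keywords adds k to total_hits but only 1 to combined_items,
# so the difference is the sum of (k - 1) over all matching items.
overlap_hits = total_hits - combined_items

print(f"Sum of per-keyword hits: {total_hits}")
print(f"Unique items (combined search): {combined_items}")
print(f"Extra hits from multi-keyword items: {overlap_hits}")
```

Note that the difference (2,682) is not the number of overlapping items: an item hitting three keywords contributes 2 to it, an item hitting two keywords contributes 1.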

 

At this stage, I don't understand why I'm getting two different figures. I have not applied any additional includes or excludes, nor am I deduplicating or hiding irrelevant items from my search.

Posted

Hi Mark!

 

The difference comes from items that contain more than one keyword. When you use the Keyword List facet to combine multiple queries into one, you are essentially querying the database for items that contain ANY of the specified keywords. If you instead tag items using auto-tag and then look at the Tags facet, or use Keyword Statistics, you are analyzing hit counts one keyword at a time, so an item containing several keywords is counted once for each of them. Here is a concrete example:

 

Items containing:

  • only 'look': 5
  • only 'find': 4
  • both 'look' and 'find': 2

If you then auto-tag your set using a keyword list containing the "look" and "find" keywords, you'll see the following results:

  • tag 'look': 7 (5 + 2)
  • tag 'find': 6 (4 + 2)

The Features facet will show:

  • 'Tagged' items: 11 (5 + 4 + 2)

If you evaluate this keyword list search from the Keyword List facet with the "Combine queries" option checked, you'll see a cluster with 11 items (because this matches "look" OR "find").
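The look/find example can be modelled with plain sets: per-keyword counting (auto-tag, Keyword Statistics) measures each keyword's set on its own, while a combined query measures their union. The item IDs below are made up purely for illustration; only the group sizes (5, 4 and 2) come from the example.

```python
# Hypothetical item IDs reproducing the group sizes from the look/find example
only_look = {f"L{i}" for i in range(5)}  # 5 items containing only 'look'
only_find = {f"F{i}" for i in range(4)}  # 4 items containing only 'find'
both = {f"B{i}" for i in range(2)}       # 2 items containing both keywords

# Counting one keyword at a time, as auto-tagging or Keyword Statistics does
look_hits = only_look | both
find_hits = only_find | both

# A combined keyword list query returns items matching ANY keyword (the union)
combined = look_hits | find_hits

print("tag 'look':", len(look_hits))              # 7
print("tag 'find':", len(find_hits))              # 6
print("combined (look OR find):", len(combined))  # 11
```

The two items in `both` appear in `look_hits` and in `find_hits`, but only once in `combined`, which is exactly why the per-keyword totals exceed the cluster size.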

 

If you uncheck the "Combine queries" option or switch to Keyword Statistics instead, you'll see the following counts:

  • 'look': 7 (5 + 2)
  • 'find': 6 (4 + 2)

...which, of course, is consistent with the item counts reported by auto-tagging.
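For two keywords, the relationship between the uncombined counts and the combined cluster is the inclusion-exclusion rule: |look OR find| = |look| + |find| - |look AND find|. Adding the uncombined counts overshoots because every item in the overlap is counted once per keyword. With the numbers from the example:

```python
# Counts from the look/find example in the reply
look_count = 7  # items containing 'look' (5 only-look + 2 both)
find_count = 6  # items containing 'find' (4 only-find + 2 both)
both_count = 2  # items containing both keywords

# Adding the uncombined counts counts every 'both' item twice
summed_counts = look_count + find_count

# Inclusion-exclusion removes the double count to give the unique items
unique_items = look_count + find_count - both_count

print("Sum of per-keyword counts:", summed_counts)
print("Unique items (combined query):", unique_items)
```

With more keywords (nine in the original question) the formula gains a correction term for every combination of keywords, which is why counting unique items directly, as the combined query does, is the practical route.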

Posted

The keyword statistics clearly show a total of 10,102 items that met my keywords. If I report back to the reviewers that there are 10,102 items, they will expect to see 10,102 items for review, not 7,420. In Pro, if I untick the "Combine queries" option I lose my cluster map and it becomes a set. In Connect, I can only bring up the cluster map; I don't seem to be able to untick the "Combine queries" option.

 

As it stands, I don't know which figure I should report back to the reviewers.

Posted
Mark,

 

If you would like to know the number of items that need to be reviewed, I would suggest evaluating the keyword list with "Combine queries" ticked.

If you would like to know the number of items responsive to each individual keyword, I would suggest using the Keyword Statistics panel.
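Both figures can go into the same report. The sketch below assumes a hypothetical mapping from item ID to the set of keywords that item matched (for example, assembled from an export; no particular Intella export format is assumed). From that one structure it derives the per-keyword hit counts, the number of unique items to review, and how many items matched one, two, three or more keywords, which explains the gap between the first two numbers.

```python
from collections import Counter


def keyword_report(matches):
    """Summarise keyword hits from a mapping of item ID -> set of matched keywords.

    `matches` is a hypothetical structure; items with no matching keyword
    may be omitted or given an empty set.
    """
    per_keyword = Counter()
    for keywords in matches.values():
        per_keyword.update(keywords)  # one hit per keyword the item contains

    # Each item that matched at least one keyword is reviewed once
    unique_items = sum(1 for keywords in matches.values() if keywords)

    # How many items matched exactly 1, 2, 3, ... keywords
    by_match_count = Counter(len(k) for k in matches.values() if k)
    return per_keyword, unique_items, by_match_count


# Hypothetical sample: four items, two of which match more than one keyword
sample = {
    "item-1": {"bankruptcy"},
    "item-2": {"bankruptcy", "litigation"},
    "item-3": {"compensation"},
    "item-4": {"andersen", "litigation", "bankruptcy"},
}

per_keyword, unique_items, by_match_count = keyword_report(sample)
print("Per-keyword hits:", dict(per_keyword))
print("Sum of per-keyword hits:", sum(per_keyword.values()))
print("Items to review:", unique_items)
print("Items by number of keywords matched:", dict(by_match_count))
```

Reporting the unique-item figure as the review volume, with the per-keyword table alongside it, avoids the impression that the summed hit count is the number of documents to review.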

 

If none of the above is what you need, could you please describe what kind of report you have in mind?