Variance in un-deduplicated numbers between keyword list auto-tag results and resulting Facet>Tags


Hi all,

I'm using 2.4.2 Pro.

I used the "Keyword Lists" facet to add a txt file with about 10 search terms. Nothing too hardcore: a mixture of single words, double-quoted strings and one double-quoted string with an asterisk in the middle, i.e.


"some words"

"more* words"


I noted the raw results in a spreadsheet to give to the investigator, then went to the Tags facet, loaded each keyword tag subset and deduplicated it to get that number as well (not realising the Keywords tab also gives a deduplicated number at this stage).


What I've found is that the item counts in the initial results (also available in the Keywords tab) are different to what the Tags facet shows, but the deduplicated number is the same in both views.


When I did the initial auto-tag, I chose "Only tag the selected item" under tagging options. I get the same numbers whether I check or uncheck the "Tag all duplicates" box down the bottom of the tagging options.


Some examples:

[keyword 1] (via the Keywords tab, all options checked under "search In") Items: 22422, Deduplicated: 3759, Hits: 91674, Families: 1886, Family Items: 193709

Under the Tags facet, it shows 25413 items, 3759 deduplicated


[keyword 2] Items: 629, Deduplicated: 260, Hits: 3083, Families: 516, Family Items: 13607

Under the Tags facet, it shows 1746 items, 260 deduplicated


If I "search" all tags I get 194363 items, 31895 deduplicated. The Keywords tab shows the same deduplicated number but 193605 items.


So essentially, the un-deduplicated numbers in the Tags facet are higher than those in the Keywords auto-tag or Keywords tab search results.
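To picture how a pattern like this can arise at all, here is a minimal Python sketch. It is not how the product works internally; the `Item` class, the `counts` helper and the sample hashes are all made up for illustration. It only shows that if a tagged set holds everything in a search result plus further items that are duplicates of ones already in it, the item count goes up while the deduplicated count stays put. The last part just subtracts the figures quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: int        # hypothetical: unique per item in the case
    content_hash: str   # hypothetical: identical for duplicate items


def counts(items):
    """Return (item count, deduplicated count) for a collection of items."""
    unique_items = {i.item_id for i in items}
    unique_hashes = {i.content_hash for i in items}
    return len(unique_items), len(unique_hashes)


# Made-up keyword search result: 4 items, 2 distinct contents.
keyword_result = [
    Item(1, "aaa"), Item(2, "aaa"),
    Item(3, "bbb"), Item(4, "bbb"),
]

# Made-up tagged set: the same items plus two further duplicates.
tagged = keyword_result + [Item(5, "aaa"), Item(6, "bbb")]

keyword_counts = counts(keyword_result)
tag_counts = counts(tagged)
print(keyword_counts)  # (4, 2)
print(tag_counts)      # (6, 2)

# Item counts quoted in this post: (Keywords tab, Tags facet).
reported = {
    "keyword 1": (22422, 25413),
    "keyword 2": (629, 1746),
    "all tags": (193605, 194363),
}
extra = {name: tags - kw for name, (kw, tags) in reported.items()}
print(extra)  # {'keyword 1': 2991, 'keyword 2': 1117, 'all tags': 758}
```

Whether the extra items in the Tags facet really are duplicates is an assumption here; the matching deduplicated numbers are consistent with it but do not prove it.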


Any thoughts here - which is the "correct" number?

Are they both correct and I just don't understand the variance?

Is there a setting I need to check/change somewhere?





Hi Shaun,

We are trying to understand the issue that you are having. Note that the 'Tag all duplicates' option won't make a difference if you have run a keyword search, because the hits from the keyword search already include all duplicates. If this is not the issue you are having, could you please submit a support ticket with screenshots of what you are trying to do and the results that you are getting? You can submit a ticket on our support page here.




  • 2 weeks later...

For anyone looking at this post in the future, I installed v2.5 and re-ingested the case data and it all worked as expected.

I was unsatisfied with still not properly understanding what the issue was, so I investigated further.

While I don't understand exactly how I did it, especially as I had performed a very limited number of actions in the case before seeing the variance, I believe I inadvertently added extra items to all of the tag groups in the case.


Nope, scratch that. I remembered the "Export>Event log" feature and ran it. It clearly shows one group of actions (in 2.4.2) adding the tags and getting the wrong/higher number of results, and then me loading the case in 2.5, deleting the tags altogether, auto-tagging again and getting the "correct" number (the number that matches the Keywords tab search).

