Search the Community

Showing results for tags 'near-duplicate'.




Found 2 results

  1. Intella does paragraph-level deduplication. What we'd like to suggest here is the identification of near-duplicate items (and paragraphs). This could be done using shingles, by calculating the ratio of shared shingles between items (shingles from item A contained in item B, and vice versa). See also "Jaccard similarity."
  2. We are in a situation where we would like to identify near-duplicates of files of varying types, based on each file's content alone. Intella's Smart Search feature allows us to do this one file at a time, but not en masse. For example, we would like to compare all files in Set A to all files in Set B and identify which files are near-duplicates of one another. Has anyone successfully tackled a problem like this using Intella?
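
For readers wondering what the shingle-based approach in the first result might look like in practice, and how it could be applied to the Set A versus Set B comparison in the second, here is a minimal sketch. It is not Intella functionality or an Intella API. It assumes the text of each file has already been extracted (for example, exported from a case) and is held in memory as plain strings. The function names (`shingles`, `jaccard`, `containment`, `near_duplicates`), the word-level shingle size `k=3`, and the `0.5` threshold are all hypothetical choices for illustration.

```python
import re
from typing import Dict, List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Return the set of overlapping k-word shingles in `text` (assumed helper)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Texts shorter than k words yield one short shingle, or nothing if empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def containment(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Fraction of A's shingles that also occur in B (the 'A contained in B' ratio)."""
    return len(a & b) / len(a) if a else 0.0


def near_duplicates(
    set_a: Dict[str, str],
    set_b: Dict[str, str],
    threshold: float = 0.5,  # assumed cut-off; tune for your data
    k: int = 3,
) -> List[Tuple[str, str, float]]:
    """Compare every text in set_a with every text in set_b.

    Returns (name_a, name_b, score) for pairs whose Jaccard score is at
    least `threshold`, highest score first.
    """
    sh_a = {name: shingles(text, k) for name, text in set_a.items()}
    sh_b = {name: shingles(text, k) for name, text in set_b.items()}
    matches = []
    for name_a, sa in sh_a.items():
        for name_b, sb in sh_b.items():
            score = jaccard(sa, sb)
            if score >= threshold:
                matches.append((name_a, name_b, score))
    return sorted(matches, key=lambda m: m[2], reverse=True)


if __name__ == "__main__":
    a = {"a.txt": "the quick brown fox jumps"}
    b = {
        "b.txt": "the quick brown fox sleeps",
        "c.txt": "completely unrelated text here now",
    }
    for name_a, name_b, score in near_duplicates(a, b):
        print(f"{name_a} ~ {name_b}: {score:.2f}")
```

The all-pairs loop is quadratic in the number of files, so this sketch only suits modest set sizes. For larger collections, a common substitute is MinHash signatures with locality-sensitive hashing, which estimate the same Jaccard score without comparing every pair directly.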