
Cancelling indexing process


AdamS


Cancelling an indexing process seems to take a long time. I'm not sure whether this can be avoided, but in some situations maybe it could be sped up.

 

I started to index a case a short time ago and cancelled the process almost instantly; it had only detected a single folder and had not processed any content. So far the cancellation has taken 30 minutes and it's only at stage 8 of 10, whereas I've had entire indexing processes of 3 or 4 GB complete in a similar amount of time.

 

Edit: just as a point of interest, the entire cancellation took 1 hr 20 mins. I understand the database is a complicated beast, so this is mainly to make you aware of it in the hope that the ability to stop can be improved or, even better, that an indexing process could be paused and resumed at a later stage.


  • 1 year later...

I suspect this will be a difficult one but I can't express how much I would like this ability.

 

Several times I have been caught in the field using my laptop to run large indexing jobs of 1 TB or more, and because I have to use USB drives they take days to index. In many instances I need the laptop for other processes as well, which quite often means I have to cancel the indexing process and re-index at a later date.

 

Part of me thinks pausing shouldn't be that complicated as the software should know exactly where it's up to in the source data and it should be able to pick up where it left off, but if it was that simple you would have already implemented it.

 

In any case, a huge please for the road map and sooner rather than later if it is possible :)


Hi Adam!

 

Indeed, it's not exactly a trivial feature to add. We have such a feature on our long-term road map, but until we get to that point, maybe you would have some luck with the "Index new data" feature? You could try stopping your indexing process and, when the time is right, use that option. It should scan all sources for new evidence items and process the ones that haven't been processed yet. Depending on the nature of the source and where you stopped the original indexing task, it can take less time than fully re-indexing the entire case. Of course, it would be good to try it on some test cases and compare the outcome.


Thanks Lukasz, I suspected it wouldn't be a simple matter. I considered the 'index new data' feature but wasn't sure how closely it would look to determine whether something is 'new' or not, i.e. will it drill right down to all the files and check whether they have been indexed, or will it simply look at the parent folder name?


I ran a few tests on a very small data set (1.6GB) and got some slightly conflicting results. This is far from definitive but interesting nonetheless.

 

First Test:

Ran indexing until it hit about 60%, then stopped it; the process continued to run and finalized with 1905 items indexed.

Indexed 'new data' only and a further 1 item was indexed, bringing the total to 1906.

Deleted the data, set up a new case and indexed the data; final result 1906 items.

 

So far so good.

 

Second Test (same data set):

Ran indexing and almost immediately stopped the process; it finished with 45 items indexed.

Indexed 'new data' and, after the process finished, a total of 1901 items were indexed (5 missing/not indexed?).

Deleted the data, set up a new case and indexed; final result 1906 items.

 

With eagle-sharp hindsight I realised later that I should have created some MD5 lists to try to identify which 5 files were not indexed. However, I will repeat exactly the same test and see whether this was a one-off or whether I can reproduce the discrepancy.
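
For anyone repeating this test, here is a minimal sketch of the MD5-list idea. It assumes the indexed items' MD5 values can be obtained from the case as a plain list of hex strings (how that list is exported is not covered here), and the function names `md5_manifest` and `not_indexed` are made up for illustration. It only hashes loose files on disk, so items inside containers such as PSTs would not appear in the manifest.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to keep memory flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_manifest(root):
    """Map each file's path (relative to root) to its MD5."""
    manifest = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            manifest[os.path.relpath(full_path, root)] = md5_of_file(full_path)
    return manifest


def not_indexed(manifest, indexed_md5s):
    """Return the relative paths whose MD5 is absent from the indexed list."""
    indexed = {value.strip().lower() for value in indexed_md5s}
    return sorted(path for path, md5 in manifest.items() if md5 not in indexed)
```

Running `not_indexed(md5_manifest(source_folder), hashes_from_resumed_case)` should list the candidates for the missing items. Note that two files with identical content share one MD5, so a duplicate that was skipped would not be flagged.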


Hi Adam,

 

"Index new data" will only pick up new top-level files and folders. It would not "refresh" existing items such as PSTs, even if they were partially indexed. Therefore it's not a proper solution for the "Pause indexing" and that explains the results you had.

 

We may improve how it works in a future version.
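
The behaviour described above can be illustrated with a toy model. This is not the product's code, only a sketch of the described rule (new top-level entries are picked up, known ones are never revisited); `index_new_data` and the sample data are invented for the example.

```python
def index_new_data(source, already_seen):
    """Toy model: pick up only top-level entries that were never seen before.

    `source` maps a top-level name to the list of items it contains.
    `already_seen` maps a top-level name to the items indexed so far.
    """
    newly_indexed = {}
    for name, children in source.items():
        if name in already_seen:
            # Known top-level entry: skipped, even if only partly indexed.
            continue
        newly_indexed[name] = list(children)
    return newly_indexed


source = {
    "mail.pst": ["msg1", "msg2", "msg3"],
    "report.docx": ["report.docx"],
}
# Indexing was stopped after one message of mail.pst had been processed.
seen = {"mail.pst": ["msg1"]}

picked_up = index_new_data(source, seen)
missing = [
    item
    for name, children in source.items()
    for item in children
    if item not in seen.get(name, []) and item not in picked_up.get(name, [])
]
print(picked_up)  # {'report.docx': ['report.docx']}
print(missing)    # ['msg2', 'msg3']
```

In this model the two remaining messages in the partially indexed container are never picked up, which is the same kind of shortfall as the 5 missing items in the second test.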


  • 1 year later...

Yes, we really need a "Pause/Stop indexing of this source" button and an "Abort indexing of this source immediately" button.

 

The first should only stop at proper "borders" and complete the indexing of the "current items".

The second, I would say, should abort immediately, leaving the source being indexed in a consistent state (as it was before this round of indexing started, i.e. throw out all new data, since it hasn't been merged yet). If many sources are being indexed or re-indexed, this should only affect the current source.
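
A minimal sketch of how the two buttons could differ, assuming new work is staged separately and only merged at a boundary. The class `SourceIndexer` and its attributes are hypothetical and say nothing about how the product actually stores its data.

```python
class SourceIndexer:
    """Hypothetical indexer for one source; not the product's real design."""

    def __init__(self, items):
        self.items = list(items)
        self.committed = []   # state already merged into the case
        self.staged = []      # work from the current round, not yet merged
        self.position = 0     # next item to process

    def run(self, stop_after=None):
        """Process items one at a time; a stop is honoured only between items."""
        processed = 0
        while self.position < len(self.items):
            if stop_after is not None and processed >= stop_after:
                break  # stop request reached at an item boundary
            self.staged.append(self.items[self.position])
            self.position += 1
            processed += 1

    def pause(self):
        """Merge what finished so far; `position` lets a later run resume."""
        self.committed.extend(self.staged)
        self.staged.clear()

    def abort(self):
        """Throw away this round and rewind to the last merged state."""
        self.staged.clear()
        self.position = len(self.committed)


indexer = SourceIndexer(["a", "b", "c", "d"])
indexer.run(stop_after=2)
indexer.pause()            # committed: a, b
indexer.run(stop_after=1)
indexer.abort()            # the staged item c is discarded
indexer.run()
indexer.pause()            # resumes from c and finishes
print(indexer.committed)   # ['a', 'b', 'c', 'd']
```

Because each source would own its staged data, aborting one source leaves the others untouched.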

 

The "Pause" may take a long time, e.g. when indexing a 20GB PST file, but this can be improved, depending on how granular the re-index function is (e.g. will it stop at folders within a PST, even though the interrupted PST file still has the same MD5?). I guess introducing a partially_indexed flag for each item could be a lifesaver.
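
A rough sketch of the partially_indexed idea, assuming folder-level granularity inside a container. Only the flag name comes from the post; `ContainerRecord`, `done_folders`, `budget` and the other names are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerRecord:
    """Hypothetical per-item record; field names are illustrative only."""
    name: str
    md5: str
    partially_indexed: bool = False
    done_folders: set = field(default_factory=set)


def index_container(record, folders, budget=None):
    """Index folders not yet done; stop at a folder boundary if budget runs out."""
    processed = []
    for folder in folders:
        if folder in record.done_folders:
            continue  # finished in an earlier run, skipped on resume
        if budget is not None and len(processed) >= budget:
            record.partially_indexed = True
            return processed
        record.done_folders.add(folder)
        processed.append(folder)
    record.partially_indexed = False
    return processed


def needs_reindex(record, current_md5):
    """An unchanged MD5 is not enough: the flag also forces another pass."""
    return record.partially_indexed or record.md5 != current_md5


folders = ["Inbox", "Sent", "Drafts"]
record = ContainerRecord(name="mail.pst", md5="abc")
first = index_container(record, folders, budget=2)   # paused after two folders
flagged = needs_reindex(record, "abc")               # True, although MD5 matches
second = index_container(record, folders)            # resumes with the rest
```

Here `budget` stands in for a pause request. The point is that the flag, not the file hash, tells a later run that the container still has unprocessed folders.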


@Kalin, I agree with you, we have exactly such extensions in mind. It's just a matter of finding the right time for them in our release cycle, which is already quite tight. We can't promise when these features will be delivered, but I'll make sure that your feedback is captured in our internal systems.
