@AdamS and @jon.pearse, firstly, many thanks for responding to my query below. I'd neglected this project for a while, but I'm looking at it again and I'm almost there. I just have one question.
I have used an AI translation provider to translate the documents, and the translated files are named by their ItemID, ready for importing via the --importText CLI option. The import itself works fine; however, the content isn't what I expected when verifying it in Intella. One of the imported .txt files contains the phrase below:
"Dans les années 2000, la société pharmaceutique"
However, once this file has been imported via the --importText CLI option, the 'Imported Text' tab shows the following:
"Dans les ann es 2000, la soci t pharmaceutique"
It appears that accented characters such as 'é' aren't being imported correctly and are each being replaced by a whitespace. I'd imagine this is an encoding issue. Is there anything I can do to address this? I'm conscious that if a reviewer searches for a term containing accented characters, it may not return hits. For example, a search for 'société pharmaceutique' would return no hits: that is technically correct for the text as imported, but wrong for the reviewer, because the phrase is present in the translated file.
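In case it helps with diagnosing this, here is a rough Python sketch of how I could check whether the translated .txt files are valid UTF-8 and re-encode them if not. It rests on two assumptions I haven't confirmed: that the importer reads the files as UTF-8, and that the translation provider delivered them in a legacy encoding such as Windows-1252. The function names and the fallback encoding are my own illustration and not anything from Intella.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, fallback: str = "cp1252") -> bytes:
    """Return the data as UTF-8 bytes.

    Data that is already valid UTF-8 is returned unchanged. Anything else is
    decoded with the fallback encoding (ASSUMPTION: Windows-1252) and
    re-encoded. Bytes that are undefined in the fallback encoding raise
    UnicodeDecodeError rather than being silently replaced.
    """
    if is_valid_utf8(data):
        return data
    return data.decode(fallback).encode("utf-8")


def convert_folder(src: Path, dst: Path, fallback: str = "cp1252") -> None:
    """Write a UTF-8 copy of every .txt file in src into dst (hypothetical folders)."""
    dst.mkdir(parents=True, exist_ok=True)
    for path in src.glob("*.txt"):
        (dst / path.name).write_bytes(to_utf8(path.read_bytes(), fallback))
```

If the files turn out to be valid UTF-8 already, then the source encoding isn't the cause and the problem lies in how the text is read on import.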