This article explains why a scan may take longer than expected to finish.
When scanning large volumes of data, it is normal for such scans to run for extended periods of time.
This is because the underlying scanning engine is designed to fulfill two important requirements:
- The scan must be accurate - all files must be scanned. Nothing can be ignored.
- The scan must be gentle on system resources, resulting in minimal or no impact to production applications or users.
For these reasons, a scan may sometimes take longer than expected. The following options can help:
- If system resources are not a concern, run the scan in Normal Priority mode.
This instructs the scanning engine to compete for system resources on equal terms with other software.
- If the system being scanned contains large, complex files such as databases or mail stores, consider scanning those data store types in a separate, dedicated scan.
This allows a general scan of the target system to complete in a shorter time.
A dedicated scan can then be executed focusing only on the complex data types.
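The split described above can be sketched in a few lines. This is a hypothetical illustration, not the product's actual mechanism: the extension list and the function name are invented for the example.

```python
from pathlib import PurePath

# Hypothetical list of extensions for complex data stores (archives,
# mail stores, database files) that are expensive to traverse.
COMPLEX_EXTENSIONS = {".pst", ".ost", ".edb", ".mdf", ".zip"}

def split_scan_targets(paths):
    """Split targets so complex data stores get their own dedicated pass."""
    general, complex_stores = [], []
    for path in paths:
        if PurePath(path).suffix.lower() in COMPLEX_EXTENSIONS:
            complex_stores.append(path)
        else:
            general.append(path)
    return general, complex_stores
```

Running the general list first gives users quick coverage of ordinary files; the slower, complex-store pass can then run at an off-peak time.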
In this section, we look at the two fundamental factors that determine raw scanning performance:
- The rate at which data is fed to the scanning engine
- The rate at which the scanning engine processes the data
The limitations behind point 1 include sub-factors such as disk I/O, network I/O (for remote/proxy scans), and CPU/RAM availability.
To give a simple example, scanning data on an SSD will perform better than scanning the same data on a regular HDD.
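The data feed rate of point 1 can be observed directly by timing a sequential read of a file, which is roughly what the scanning engine's input stage does. This is a generic measurement sketch, not a tool shipped with any scanning product:

```python
import tempfile
import time

def measure_read_throughput(path, chunk_size=1 << 20):
    """Sequentially read a file in 1 MiB chunks; return throughput in MiB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return (total / (1 << 20)) / elapsed if elapsed > 0 else float("inf")
```

Comparing this figure between an SSD-backed path and an HDD-backed (or network) path shows how much of a scan's runtime is simply waiting for data to arrive.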
The next layer of limitation is point 2 - specifically, the rate at which the scanning engine decodes and parses files.
Examples include unzipping archives, parsing documents, and traversing PST files.
The more complex the file type, the longer it takes for the scanning engine to process.
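The decoding cost is easiest to see with an archive: every member must be decompressed before a single byte can be inspected. The sketch below is a simplified illustration (the `contains_sensitive` check is a stand-in for real pattern matching), not the engine's actual code path:

```python
import io
import zipfile

def contains_sensitive(data, needle=b"CONFIDENTIAL"):
    # Stand-in for real pattern matching: a single byte-string lookup.
    return needle in data

def scan_zip(zip_bytes):
    """Decompress each archive member before scanning it - the extra
    decoding layer that makes container formats slower to process."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if contains_sensitive(zf.read(name)):
                hits.append(name)
    return hits
```

A plain text file skips straight to the matching step; the archive pays for decompression first, and nested containers (a ZIP inside a PST, say) pay that cost at every layer.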
The final layer of limitation is the stage where the scanning engine pattern-matches the decoded content to determine which data is sensitive and which is not.