This article explains how Card Recon, Data Recon, and Enterprise Recon allocate and use system resources during scanning. For Enterprise Recon, this information applies to the machine being scanned.
Ground Labs has developed a high-performance pattern matching engine specifically designed to identify PCI and PII data. This engine is the foundation that allows our products to operate efficiently: it is capable of scanning in excess of one gigabyte per second on a 2.8GHz single-core CPU.
It is important to note that in real-world use, scanning will be significantly slower. The primary factor limiting scan speed is disk I/O performance: magnetic hard drives typically deliver around 50MB per second, while solid-state drives (SSDs) are considerably faster, typically exceeding 500MB per second.
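As a rough illustration of why disk speed dominates, the sketch below estimates an idealised lower bound on scan time by dividing the data size by the slower of the engine and disk rates. The rates are the figures quoted above; the function name and the 500GB example are hypothetical, 1GB is taken as 1000MB, and real scans will take longer because decoding complex files adds overhead.

```python
# Idealised scan-time estimate: throughput is capped by the slower of the
# scanning engine and the disk. Figures are the illustrative ones quoted in
# this article, not measurements of any particular system.

ENGINE_MB_PER_S = 1000  # "in excess of one gigabyte per second" (2.8GHz single core)
HDD_MB_PER_S = 50       # typical magnetic hard drive
SSD_MB_PER_S = 500      # typical SSD


def estimate_scan_seconds(data_mb: float, disk_mb_per_s: float,
                          engine_mb_per_s: float = ENGINE_MB_PER_S) -> float:
    """Return a lower-bound scan time, limited by the slower component."""
    effective_mb_per_s = min(disk_mb_per_s, engine_mb_per_s)
    return data_mb / effective_mb_per_s


if __name__ == "__main__":
    data_mb = 500_000  # hypothetical 500GB of data (1GB = 1000MB)
    for label, rate in (("HDD", HDD_MB_PER_S), ("SSD", SSD_MB_PER_S)):
        hours = estimate_scan_seconds(data_mb, rate) / 3600
        print(f"{label}: at least {hours:.1f} hours")
```

Even on an SSD the disk, not the engine, is the limiting factor in this estimate.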
Throughout a scan, the CPU is used to decode the contents of each file in order to identify genuine PCI/PII data. Our scanning engine is designed to use only a single CPU core, so that the remaining cores stay available for other applications.
By default, the scanning engine runs in low priority mode, ensuring that any other applications requiring CPU resources are given priority. On Windows machines, we set the priority of the scanning thread to THREAD_PRIORITY_LOWEST; on all other operating systems, we set the nice level to 15.
This setting can be changed in the software UI. In normal priority mode, we set the scanning thread on Windows machines to THREAD_PRIORITY_NORMAL, and on all other operating systems we set the nice level to 0. These are the operating systems' default priority values, so in this mode our software competes on an even playing field with all other processes. More information on this can be found in a separate article.
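The two priority modes can be summarised as a small lookup. This is a minimal sketch, not Ground Labs' code: the helper names `priority_setting` and `apply_low_priority` are hypothetical, while the Windows constants (THREAD_PRIORITY_LOWEST = -2, THREAD_PRIORITY_NORMAL = 0) and the `SetThreadPriority`/`GetCurrentThread` Win32 calls and Python's `os.setpriority` are real.

```python
import os
import sys

# Win32 thread priority constants (values from the Windows API headers).
THREAD_PRIORITY_LOWEST = -2
THREAD_PRIORITY_NORMAL = 0

# Nice levels described in this article for non-Windows systems.
NICE_LOW = 15
NICE_NORMAL = 0


def priority_setting(platform: str, mode: str) -> tuple[str, int]:
    """Map a platform and mode ("low" or "normal") to the setting described above."""
    if mode not in ("low", "normal"):
        raise ValueError(f"unknown mode: {mode!r}")
    if platform == "win32":
        value = THREAD_PRIORITY_LOWEST if mode == "low" else THREAD_PRIORITY_NORMAL
        return ("thread_priority", value)
    return ("nice", NICE_LOW if mode == "low" else NICE_NORMAL)


def apply_low_priority() -> None:
    """Lower the caller's scheduling priority using the low-mode setting."""
    kind, value = priority_setting(sys.platform, "low")
    if kind == "thread_priority":
        import ctypes  # Windows only
        kernel32 = ctypes.windll.kernel32
        kernel32.SetThreadPriority(kernel32.GetCurrentThread(), value)
    else:
        # Sets an absolute nice value. Unprivileged users can usually raise
        # the nice value (lower priority) but cannot lower it again.
        os.setpriority(os.PRIO_PROCESS, 0, value)
```

Because an unprivileged process generally cannot lower its nice value once raised, switching back to normal priority is easiest when the mode is chosen before the scan starts.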
We have benchmarked our scanning engine on a 2.8GHz single-core CPU and it has achieved throughput in excess of one gigabyte per second with maximum CPU utilisation.
Memory is used by the scanning engine throughout a scan to temporarily store data read from disk and, when required, to index complex file types. Our products have been designed to read files of any size without excessive memory usage. Where possible, large files are read incrementally in small chunks to minimise memory consumption.
The memory usage for a scan will typically vary between 50MB and 100MB. Compressed archives and OCR scanning may require additional memory for short periods.
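The incremental reading described above can be sketched as follows. This is an illustration under stated assumptions, not the product's engine: the chunk size, the overlap, the 13 to 16 digit candidate pattern, and the function names are all hypothetical, and the Luhn checksum stands in for whatever validation the real engine performs. A small overlap is carried between chunks so that a number split across a chunk boundary is still found, while memory stays bounded by one chunk plus the overlap.

```python
import re
from typing import BinaryIO, Iterator

CHUNK_SIZE = 64 * 1024  # hypothetical read size
OVERLAP = 32            # must exceed the longest match plus one byte of look-ahead

# Hypothetical candidate pattern: a run of exactly 13 to 16 ASCII digits.
CANDIDATE = re.compile(rb"(?<!\d)\d{13,16}(?!\d)")


def luhn_valid(digits: bytes) -> bool:
    """Standard Luhn (mod 10) checksum over ASCII digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = ch - 48  # ASCII '0' is 48
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_stream(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[tuple[int, bytes]]:
    """Yield (file offset, digits) for Luhn-valid candidates.

    At most one chunk plus a small overlap is held in memory, so memory use
    stays flat regardless of file size.
    """
    buf = b""
    base = 0  # file offset of buf[0]
    skip = 0  # matches starting before this index were handled last round
    while True:
        chunk = stream.read(chunk_size)
        at_eof = not chunk
        buf += chunk
        # Defer matches starting in the last OVERLAP bytes: they may continue
        # into data that has not been read yet.
        cut = len(buf) if at_eof else max(len(buf) - OVERLAP, 0)
        for m in CANDIDATE.finditer(buf):
            if skip <= m.start() < cut and luhn_valid(m.group()):
                yield base + m.start(), m.group()
        if at_eof:
            return
        # Keep one byte before the cut so the look-behind sees real context.
        keep_from = max(cut - 1, 0)
        buf = buf[keep_from:]
        base += keep_from
        skip = cut - keep_from
```

Usage: `scan_stream(open(path, "rb"))` yields matches lazily, so even a multi-gigabyte file is processed in constant memory.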
Disk I/O is the speed at which the scanning engine can read data from a disk when searching for stored cardholder data. It is the most important factor in determining the speed of a scan.
Whilst a scan is being performed, every file on the target file system is opened, decoded, scanned, and closed. Disk I/O metrics will rise and fall throughout a scan depending on the complexity and size of each file, so these spikes reflect the types of data being read.
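To see how read throughput varies from file to file on a given disk, a sketch like the one below opens each file under a directory, reads it sequentially, and records the observed MB/s. The function names are hypothetical and this is not a Ground Labs tool. Note that files read recently may be served from the operating system's page cache, which will overstate the true disk speed.

```python
import os
import time


def measure_read_mb_per_s(path: str, chunk_size: int = 1 << 20) -> float:
    """Read a file sequentially; return observed throughput in MB/s (1MB = 10**6 bytes)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    if total == 0:
        return 0.0
    return (total / 1e6) / elapsed if elapsed > 0 else float("inf")


def per_file_rates(root: str) -> dict[str, float]:
    """Open, read, and close every file under root, recording its read rate."""
    rates: dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                rates[path] = measure_read_mb_per_s(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
    return rates
```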
Proxy Scanning ("Agentless Scanning")
How Agentless Scanning works
Enterprise Recon supports scanning devices without a locally installed Node Agent, with some limitations. This feature is designed for situations where local installation is not feasible. An agentless scan involves the following steps:
- The Proxy Node Agent transmits the scanning engine to the scan target. The engine must be transmitted before each scan, which uses a small amount of bandwidth and disk space; at present, the engine is approximately 10MB.
- The scan commences.
- Results and status updates are transmitted from the scanning engine to the Proxy Node Agent, which forwards them on to the Master Server.
- The scan completes, and the engine waits for all results to finish transmitting to the Proxy Node Agent.
- The scanning engine self-destructs.
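The lifecycle above can be modelled as a toy, in-process simulation. This is purely illustrative and assumes nothing about the real protocol: threads stand in for the scan target and the Proxy Node Agent, a queue stands in for the network link, a temporary file stands in for the transmitted engine, and the function name is hypothetical.

```python
import os
import queue
import tempfile
import threading

DONE = object()  # sentinel: no more results will be sent


def run_agentless_scan(engine_bytes: bytes, findings: list[str]) -> tuple[list[str], str]:
    """Simulate one agentless scan; return (results received by the master, engine path)."""
    master_inbox: list[str] = []
    channel: queue.Queue = queue.Queue()

    # Step 1: the proxy "transmits" the engine to the target (a temp file here).
    fd, engine_path = tempfile.mkstemp(suffix=".engine")
    with os.fdopen(fd, "wb") as f:
        f.write(engine_bytes)

    # Steps 2-3: the engine scans and streams each result back to the proxy.
    def engine() -> None:
        for item in findings:
            channel.put(item)
        channel.put(DONE)

    # The proxy forwards every result it receives on to the master server.
    def proxy() -> None:
        while (item := channel.get()) is not DONE:
            master_inbox.append(item)

    scanner = threading.Thread(target=engine)
    forwarder = threading.Thread(target=proxy)
    scanner.start()
    forwarder.start()

    # Step 4: wait until the scan has finished and every result has been forwarded.
    scanner.join()
    forwarder.join()

    # Step 5: the engine removes itself from the target.
    os.remove(engine_path)
    return master_inbox, engine_path
```

The key ordering constraint is that the engine is removed only after the forwarder has drained every result, mirroring steps 4 and 5.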
Most system resources (CPU, RAM, and disk I/O) are used by the scanning engine on the scan target. The Proxy Node Agent requires network bandwidth to stream data between the target and the Master Server, and its resource requirements scale linearly with the number of scans it is handling. Multiple scans directed at the same target run sequentially; scans directed at different targets run in parallel.
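The scheduling rule (same target sequential, different targets parallel) can be sketched by grouping scans per target and giving each target its own worker that runs that target's scans one after another. This is a hypothetical illustration, not the Proxy Node Agent's implementation; `run_scans` is an invented name.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Scan = Callable[[], None]


def run_scans(scans: list[tuple[str, Scan]]) -> None:
    """Run scans for the same target sequentially and different targets in parallel."""
    by_target: dict[str, list[Scan]] = defaultdict(list)
    for target, scan in scans:
        by_target[target].append(scan)  # preserves submission order per target

    def run_queue(queue: list[Scan]) -> None:
        for scan in queue:
            scan()  # the next scan for this target starts only after this one ends

    # One worker per target, so targets never wait on each other.
    with ThreadPoolExecutor(max_workers=max(len(by_target), 1)) as pool:
        futures = [pool.submit(run_queue, q) for q in by_target.values()]
        for fut in futures:
            fut.result()  # re-raise any exception raised by a scan
```

With one worker per target, the number of simultaneously active scans equals the number of distinct targets, which is what drives the proxy resource estimates below.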
Estimated usage of Proxy Agent resources
Example 1: 32GB available RAM, 10 processor cores, 10 concurrent scans
If 10 concurrent scans are running through the proxy agent, we expect no RAM shortages and no CPU over-scheduling.
Example 2: 32GB available RAM, 10 processor cores, 16 concurrent scans
If 16 concurrent scans are running through the proxy agent, we expect no RAM shortages, but there may be some CPU over-scheduling, especially if the scanned content is complex (e.g. ZIP files and PDF documents). Even so, we would expect only a moderate impact on overall scanning performance.
Example 3: 32GB available RAM, 10 processor cores, 30 concurrent scans
If 30 concurrent scans are running through the proxy agent, there may be some RAM shortages, but only if the network can deliver data faster than the scanning engine can scan it. In that case the system may fall back to swap memory, which can slow scans down significantly. Note that if the system runs low on RAM, the scanning process may be halted.
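The CPU side of the three examples reduces to a simple ratio of concurrent scans to cores. The sketch below assumes each active scan keeps roughly one core busy, which is what the examples imply but is a simplification; it does not model RAM, which, as noted above, depends on whether the network outpaces the scanning engine. The function name is hypothetical.

```python
def cpu_oversubscription(concurrent_scans: int, cores: int) -> float:
    """Scans per core, assuming each active scan keeps roughly one core busy."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return concurrent_scans / cores


for scans in (10, 16, 30):
    ratio = cpu_oversubscription(scans, cores=10)
    status = "no over-scheduling" if ratio <= 1 else f"{ratio:.1f} scans competing per core"
    print(f"{scans} concurrent scans on 10 cores: {status}")
```

A ratio at or below 1.0 corresponds to Example 1; the 1.6 and 3.0 ratios of Examples 2 and 3 indicate increasing contention, with the actual slowdown depending on how complex the scanned content is.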
Content types are also a key factor in estimating how long a scan will take. As a guide, content types can be classified into the following categories:
Simple content types (Low CPU/memory required)
- MS Office documents
- PDF files (where content within is mostly plain text)
- TXT, RTF, CSV, XML, HTML files
- TAR and other uncompressed archive formats
- File formats that do not store data using lookup tables or complex indexes, and do not require intensive mathematical calculation to extract the raw data they contain
Complex content types (Higher CPU/memory required)
- ZIP, GZ, RAR, and other compressed archive formats
- Email storage
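For a quick pre-scan estimate of how much complex content a file system holds, files can be bucketed by extension. This is a hypothetical sketch: the extension sets are assumptions derived from the categories above (for example, PST, OST, and MBOX for email storage), extension matching cannot tell whether a PDF is mostly plain text, and a real scanner would inspect file contents rather than trust the extension.

```python
from pathlib import Path

# Hypothetical extension sets derived from the categories above.
SIMPLE = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".pdf",
          ".txt", ".rtf", ".csv", ".xml", ".html", ".htm", ".tar"}
COMPLEX = {".zip", ".gz", ".rar", ".pst", ".ost", ".mbox"}


def content_class(path: str) -> str:
    """Return "simple", "complex", or "unknown" based on the file extension."""
    suffix = Path(path).suffix.lower()  # "backup.tar.gz" -> ".gz"
    if suffix in COMPLEX:
        return "complex"
    if suffix in SIMPLE:
        return "simple"
    return "unknown"
```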
All information in this article is accurate and true as of the last edited date.