[ER] Reported Match Counts from Scans of Databases with BLOBs do not tally

Scanning Database Tables with BLOBs

ER2 scans database tables that contain BLOBs in two passes. The table is first scanned as text for PCI or other custom data profiles; once that pass completes, the scanning engine decodes the BLOBs and inspects their contents.

The match counts in a report for a database table scan may look like double counting, but this is not the case.

The report contains a count of matches in the text columns and rows, and then a separate count for the BLOBs in the table.

Why does ER2 count matches this way?

While this method may appear counterintuitive at first glance, it follows from the fact that ER2 treats BLOBs separately from the overall table scan. For efficiency, the engine performs database scanning in the following manner:

  • The aggregated number for the table in the report is the result of a text scan of the table. This includes BLOBs if the BLOBs themselves contain text.
  • The BLOBs are treated as separate stream scans, so they are logged differently. Grouping them with the table would be resource intensive, because the engine scans one stream after another without much knowledge of how the streams relate to each other.
  • The text scan and the BLOB scan may both report the same data if it is stored as text and then decoded as HTML/XML.

The total number of matches is therefore the sum of all these values.
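
The arithmetic can be sketched in Python. This is an illustrative model only, not ER2 code or an ER2 report format: the `TableScanResult` class, its field names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableScanResult:
    """Hypothetical model of the report entries for one table (not an ER2 structure)."""

    # Matches found by the text scan of the table.
    text_scan_matches: int
    # One match count per BLOB that was decoded and scanned as a separate stream.
    blob_stream_matches: List[int] = field(default_factory=list)

    def total_matches(self) -> int:
        # The total is the text-scan count plus every BLOB stream count.
        return self.text_scan_matches + sum(self.blob_stream_matches)


# Made-up numbers: 12 matches in the text scan, three BLOB streams.
result = TableScanResult(text_scan_matches=12, blob_stream_matches=[3, 0, 5])
print(result.total_matches())  # 20
```

Keeping the BLOB counts as a separate list mirrors the point above that the streams are logged apart from the table and only combined when the total is calculated.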

An example to illustrate:

A customer may store XML in BLOBs. That content could be flagged once in the table text scan and then again in the BLOBs themselves when ER2 decodes them as HTML/XML.

In another case, however, a binary file stored in a BLOB will be decoded and scanned, but will not be picked up in the table text scan.

The two cases will therefore produce different match counts.
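
The two cases can be imitated with a small Python sketch. Everything in it is an assumption made for illustration: the 16-digit regular expression stands in for a data profile, zlib compression stands in for any binary format whose contents are not visible as plain text, and `count_matches` and `scan_blob` are hypothetical helpers rather than ER2 functions.

```python
import re
import zlib
from typing import Tuple

# Stand-in for a data profile: a run of exactly 16 ASCII digits (illustrative only).
PATTERN = re.compile(rb"(?<!\d)\d{16}(?!\d)")


def count_matches(data: bytes) -> int:
    """Count pattern matches in a byte stream."""
    return len(PATTERN.findall(data))


def scan_blob(raw: bytes, decoded: bytes) -> Tuple[int, int]:
    """Return (text_scan_count, blob_scan_count) for one BLOB cell.

    `raw` is what a text scan of the table would see;
    `decoded` is what a separate BLOB stream scan would see.
    """
    return count_matches(raw), count_matches(decoded)


xml = b"<card>4111111111111111</card>"

# Case 1: XML stored as text. Both scans see the same content.
xml_counts = scan_blob(raw=xml, decoded=xml)

# Case 2: binary content, modelled here with zlib compression.
compressed = zlib.compress(xml)
binary_counts = scan_blob(raw=compressed, decoded=zlib.decompress(compressed))

print(xml_counts, sum(xml_counts))        # (1, 1) 2
print(binary_counts, sum(binary_counts))  # (0, 1) 1
```

The same underlying value contributes two matches to the total in the first case and one in the second, which is why totals differ between tables that store similar data in different forms.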
