14. Keyword Statistics

The Keywords tab gives detailed statistics about the keywords in a keyword list. The workflow is as simple as choosing a keyword list, specifying several calculation options, and clicking Calculate. This will produce a table showing the keyword list and several statistics for every keyword query in the list.

The statistics shown here can go beyond what can be established in the Search tab.

Keyword stats

14.1. Configuration

All controls for configuring the calculation are placed on the left side of the tab. The options are divided into three groups:

  • The keyword list to use.
  • The document fields to search in.
  • The statistics to be calculated.

At the top, the user can choose a previously uploaded keyword list or add one here. This uses the same collection of keyword lists as the Keyword Lists facet in the Search tab. Any list added in the facet can be used here and vice versa.

Although this functionality is called “keyword statistics”, the complete full-text search syntax can be used here: wildcards, Boolean operators, phrase queries, etc. are all available. Field-specific searches are also possible. When a query contains a field-specific search, it overrides the field settings chosen in the second panel.

The second panel offers the available search fields. These are the same as offered in the Search tab. By default, all fields are searched, but the user can choose to restrict searches to e.g. the document text, email headers, etc. Any combination of fields can be used.

The last panel offers five checkboxes that determine what information the table will contain:

  • The Items option adds columns indicating:

    • the number of items containing the keyword,
    • the corresponding percentage of items, and
    • the deduplicated number of items.
  • The Hits option counts the number of occurrences of the search term in the text of the matching items. For example, if a keyword matches one document that contains it 3 times and another document that contains it 5 times, this column will show 8. The hits are counted across all the selected search fields, but only on the deduplicated items.

  • The Custodians option adds a column for every custodian in the case. Each custodian column indicates how many of the matching items originate from that custodian.

  • The Families option adds two columns: “Families” and “Family items”. A family is an item set consisting of a top-level item (e.g. a mail in a PST file) and all its nested items (e.g. attachments, embedded images, archive entries). Families are detected by traversing an item’s location upwards in the hierarchy tree and finding the family root. Items with the same family root are part of the same family. Certain types of items are skipped when determining the family root, namely all folders, mail containers, disk images, load files and cellphone reports. The meaning of the two columns is as follows:

    • The Families column shows in how many families the keyword occurs. For example, if a mail and two of its attachments all contain the keyword, that counts as a single family.
    • The Family items column shows the total number of items that are contained in these families. This may (and usually will) include items that do not contain the keyword at all; they just belong to a family that has a hit in one of its other items. When you export the top-level parents of the search results rather than the results themselves (i.e. the default setting when exporting to PST), this number tells you how much of the case is effectively being exported. It can also indicate how well a certain search filters the items in a case.
  • The Saved searches checkbox enables a “Configure…” button. Clicking this button opens a dialog in which the user can select one or more saved searches stored in the case. Each saved search will then be represented by a table column, showing how many items matching the saved search also match the keyword in that row.
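
The Items, Hits and Families statistics described above can be illustrated with a small Python sketch. This is not how Intella Connect computes them internally; it is a toy model under stated assumptions: the `Item` class, the type labels in `SKIPPED_TYPES`, the `content_hash` deduplication key and the plain substring matching are all hypothetical, the family root is assumed to be the topmost ancestor that is not of a skipped type, and the percentage is assumed to be relative to all items in the case.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the item types that are skipped when
# determining the family root (folders, mail containers, etc.).
SKIPPED_TYPES = {"folder", "mail_container", "disk_image",
                 "load_file", "cellphone_report"}


@dataclass
class Item:
    item_id: int
    item_type: str                    # hypothetical type label
    parent: Optional["Item"] = None   # None at the top of the hierarchy
    text: str = ""
    content_hash: str = ""            # hypothetical deduplication key


def family_root(item: Item) -> Item:
    """Walk upwards and return the topmost ancestor (or the item itself)
    that is not of a skipped type. Assumption, not the product's code."""
    root = item
    node: Optional[Item] = item
    while node is not None:
        if node.item_type not in SKIPPED_TYPES:
            root = node
        node = node.parent
    return root


def keyword_stats(items: list[Item], keyword: str) -> dict:
    needle = keyword.lower()
    matches = [i for i in items if needle in i.text.lower()]

    # Deduplicate the matching items, keeping the first of each hash.
    unique: dict[str, Item] = {}
    for i in matches:
        unique.setdefault(i.content_hash, i)

    # Hits are counted on the deduplicated items only.
    hits = sum(i.text.lower().count(needle) for i in unique.values())

    # A family counts once, however many of its members match.
    roots = {family_root(i).item_id for i in matches}
    family_items = sum(1 for i in items
                       if family_root(i).item_id in roots)

    return {
        "items": len(matches),
        "percentage": 100.0 * len(matches) / len(items) if items else 0.0,
        "deduplicated": len(unique),
        "hits": hits,
        "families": len(roots),
        "family_items": family_items,
    }


# Toy case: one PST with three mails; mail1 has two attachments,
# and mail3 is a duplicate of mail1.
pst = Item(1, "mail_container")
mail1 = Item(2, "email", pst, "invoice attached", "h1")
att1 = Item(3, "document", mail1, "invoice invoice total", "h2")
att2 = Item(4, "image", mail1, "photo", "h3")
mail2 = Item(5, "email", pst, "hello", "h4")
mail3 = Item(6, "email", pst, "invoice attached", "h1")

stats = keyword_stats([pst, mail1, att1, att2, mail2, mail3], "invoice")
```

In this toy case the keyword matches three items, two of which are unique, in two families. The Family items count is four because the non-matching attachment `att2` belongs to a family that has a hit.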

14.2. Calculation

When the Calculate button is clicked, Intella Connect will populate the table after finishing all calculations.

The time required for the calculation depends on several factors, including the size of the keyword list, the hardware, the chosen search options, and the storage location and size of the case. Most options benefit from indices that keep the calculation fast regardless of case size, but the Hits option can slow the calculation down considerably.

The progress of the calculation will be shown in the status panel above the table.

During calculation, the Calculate button changes into a Stop button, which can be used to terminate the process manually.

When clicking Calculate again, the previous results will be discarded and the table will be populated from scratch, using the (possibly changed) configuration options.

14.3. Results

The table order is the same as the order in the keyword list.

Once calculation has completed, the table can be exported to a CSV file by clicking on the Export button in the top right corner.
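
Once exported, the CSV file can be processed with standard tools. The following Python sketch ranks keywords by item count and lists the keywords that matched nothing. The column headers (`Keyword`, `Items`, `Hits`) and the sample rows are hypothetical placeholders; the headers and values in an actual export may differ and should be checked before adapting this.

```python
import csv
import io

# Hypothetical export contents, used here so the sketch runs on its own.
# In practice the text would be read from the exported CSV file.
SAMPLE_EXPORT = """Keyword,Items,Hits
invoice,120,340
"contract AND signed",15,22
offshore*,0,0
"""


def load_rows(csv_text: str) -> list[dict]:
    """Parse the CSV text into one dictionary per keyword row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = load_rows(SAMPLE_EXPORT)

# Keywords that did not match any item.
no_match = [r["Keyword"] for r in rows if int(r["Items"]) == 0]

# Keywords ordered from most to fewest matching items.
ranked = sorted(rows, key=lambda r: int(r["Items"]), reverse=True)
```

Because the exported table follows the order of the keyword list, sorting is only needed when a different order, such as by item count, is more useful for review.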