It was hard to come up with a title to describe this post, so let’s jump straight to questions this technique helps answer:
- What important key words can be found in a free text field?
- What Qlik functions appear in reload logs or front-end expressions?
- What connections or QVD names appear in reload logs?
- What SQL functions appear in a reload log?
- What data model fields are used in front-end expressions?
For the example in this post, I am going to search for instances of Qlik scripting functions found in reload logs. Ex., What are the most common functions used? Which functions are not used at all that perhaps we could research what they do? What functions might we expect to find that we don’t see? What applications are likely to have a lot of script complexity?
Qlik is quite efficient at this, so parsing a lot of data will not be a problem. In the example I built for this post, it parsed over 120,000 log rows per second on my laptop.
Step 1: Create a mapping table with the values you want to find
To get the list of all script functions, I used Patrick‘s clever web scraping script, although for simplicity I will load the output of that with an inline load here. I am also keeping this as a resident table, so I can still see the domain of all possible functions in the final app, not just those that were found in the logs.
Next create a mapping table using this resident table. Notice that the second column of our mapping table is the same as the first column, only with a very unusual sequences of strings before and after — sequences that we’re unlikely to encounter in a typical log. (I stored them in variables because we’ll use them more than once.) This essentially just adds a prefix and suffix to instances of functions so we can easily find and parse them out later.
LET v_ObscureDelimiterStart = '!^@'; LET v_ObscureDelimiterEnd = '@^!';
Here’s a preview of the what the mapping table values look like:
Step 2: Load the data and “tag” instances of the target values using MapSubstring
MapSubstring is different from ApplyMap because can replace multiple parts of a value, independently. (ApplyMap does full matches only — the value looked up must have an exact match of the whole value in the mapping table.) This attribute of MapSubstring helps because a single row of reload log may contain many functions, all of which we will want to tag.
Here’s an example of a log row and what would result from our expression with MapSubstring above, with IsNull and FileSize being tagged with our prefix and suffix:
- Before: 20200901T082409.376-0400 0041 IsNull(FileSize(‘lib://99_System_File/10_File_Data/ADF Helper.xlsx’))
- After: 20200901T082409.376-04000041!^@ISNULL@^!!^@FILESIZE@^!‘LIB://99_SYSTEM_FILE/10_FILE_DATA/ADFHELPER.XLSX’))(
Step 3: Parse the “tagged” instances into their own table using LOAD WHILE and TextBetween
Great, we’re two-thirds of the way there, although our log data is looking strange, thanks to the weird characters added using our mapping table. Now we will reduce the data to essential: functions by log file, including how many instances of each function are in the file.
I have broken this out into three steps, using preceding loads. (The prefix and suffix we added enable us to efficiently identify and parse out the tagged function instances.)
- Count how many functions were found on each row, which will be a parameter for the next step.
- LOAD WHILE up to the maximum number of functions found on any row, using TextBetween, our predefined prefix and suffix, and IterNo() to capture the nth instance of a function
- Aggregate the count of function instances, grouped by the function name, to summarize the results for the log file
Because they are preceding loads, you have to read the code from the bottom up:
The resulting LogFunctionInstance table has a row per log per function used and the count of times it was used. This bridges the table with the list of logs we iterated through and the table with the list of all functions.
Run with this example
Here is an example application with a simple interface, to follow the entire process, from start to finish. (Some false positives for this use case could be eliminated, but it makes the code more complicated without helping demonstrate the general technique.)
Since we developed this technique, we find new problems it helps solve all the time, especially understanding script dependencies using reload logs. How will you use it?