Introduction
Hello Rockers!
Today I want to introduce you to some very interesting study concepts on how Qlik stores your information and how we can take advantage of this knowledge to decrease the size of stored QVDs, in addition to favoring content standardization.
To make you feel convinced, I will demonstrate one of the results in advance. A storage QVD with 8 columns and 20 million rows that had 333.4MB went to 205.1MB. A reduction of approximately 38.5%!
“- Why do you use dark background on Windows?”
“- Because the light attracts bugs”
How is this possible?
Depending on the database you extract the information from, some contents that are apparently numeric, have an additional text property to be represented. This “mask” is basically how the data will be displayed preferentially.
Example:
WIth mask: 123.456-789
Without mask: 123456789
To start I will comment about currency values. In this format the data is stored in its text format so that it is presented with all its characteristics, such as the “$” in the front, but it also has its numeric format so that arithmetic functions can be performed like a sum.
And just so you know, if you perform an arithmetic function in Qlik with text data, the result will always be null. Example: 1 + B = Null.
In this way, we would have two contents in the representation of currency value like $ 2.58 in the same “cell”:
$ 2.58: in text
2.58: in numeric
When we have two types of content in the same “cell”, Qlik treats it as DUAL property. Isolated numbers without a mask are NUM and texts are simply TEXT.
The second example is about dates. They have their mask content to be presented in the appropriate format (chosen) and also numerical content so that we can make calculations of the type: the date added to N days, months and years. If the field is only text, it will not be possible to calculate it.
01/01/2021: in text
44197: in number
’01/01/2021′ (text) + 3 = Null. In other hand: 44197 + 3 = 44200, which if represented in date format, results in 01/04/2021 (MM/DD/YYYY).
This seems to sound a little strange, but it is simple to understand when you see how Qlik works with few types of fields: numeric, text, null and dual (number and text at the same time). Unlike a database that has so many other variations, such as: Int, BigInt Smallint, Tinyint, Bit, Decimal, Numeric, Money, Smallmoney, Float, Real, Date, Time, Datetime, Char, Varchar, Text, Binary. ..
See some type examples in this site
If you extract a field in the “date” format from a date field, Qlik will interpret it as “dual”. If it is “numeric” or “smallint” it will be numeric.
“- And if it is a “string” it will be text, right?” Wrong!
For text fields Qlik tries to implement a kind of intelligence, thinking that the field can be calculated in the future. If it starts with numbers it will not be just Text, it will be Dual. Like a string field that is “00012345”. For Qlik it will be “00012345” and “12345” at the same time.
There are some peculiarities for this, including parameterization of the connectors, but it is not worth going into details in this article.
What happens in Qlik?
As you may already be thinking, storing a DUAL data consumes more space than a text or numeric data, because it will contain information from both properties.
Who knows, you may want to address, in addition to the data extraction itself, how we can choose which property we will keep them.
To find out the output format, we can use the Qlik Help online. Let’s take a case here to study together, which is the Num() function, one of the simplest to exemplify:
Pay attention to where it refers to the “Return data type” in the image:
This information is interesting, because while you imagine that a “numeric” function will return a numeric property, in fact it returns a DUAL. This is because there is a parameter of this function that is used to create “masks” for the data. You could force “12345” content to be presented in the application as “0000012345”, if you used it as follows:
Num(FIELD, ‘0000000000’) the FIELD
So we would have:
12345: numerical property for arithmetic calculations
0000012345: text property
If the intention is to force a numeric content only, it is not necessary to make Num(FIELD)+0 you can simply use FIELD+0 (zero to force an arithmetic function).
The exception is when the content is a string and requires conversion to text. See about Num#() here:
Now let’s look at the Text() function, does it return DUAL as well?
And to our surprise, on the Qlik website it is DUAL, at the same time that it explains that the numerical result is null. I think it is a documentation bug and I have already reported them, see:
The truth is that Text() will force text-only content. And maybe you ask if it makes any sense to force text on a piece of data that only contains numbers… It makes sense in situations where you’re sure you won’t make calculations, such as: a document number, phone number, your home number at a certain address, code for registering a company in the government, etc.
How do I force Numeric or Text on a field that was originally Dual?
This question is not so simple to answer, but let’s go in parts:
- Transform dual content into text: you can use the Text() function and easily solve your problem. You will be excluding the numerical property from that data and maintaining only the text property;
- Transform a dual content into numeric: you can force an arithmetic treatment. Remember when I mentioned that a “dual” is number and text at the same time, besides that if we use an arithmetic function in a text, the result would be null? So, all you had to do was transform the field with an arithmetic function.Example:
Dual: $ 2.12
Numeric: 2.12
Text: $ 2.12The transformation in the script: FIELD + 0 as FIELD, will result in only numeric content, as it will go through the procedure:Numeric: 2.12 + 0 = 2.12
Text: $ 2.12 + 0 = NullThis way you “remove” the text content from the field.
You can force numeric in all fields that you deem necessary, whether or not there is a future need for calculation. For example: does it make sense to store a text property for content in the transaction code that has 5 zeros in front? Probably not. You can force the field to be numeric only and remove leading zeros, with a simple transformation in the script:
If the content is dual:
TransactionID+0 as TransactionIDIf the content is text, this is because Num#() transforms a string into a number according to some optional parameters:
Num#(TransactionID)+0 as TransactionID
How to check the ownership of a field within the QVD?
I believe there can be several ways, but the one I use most is with the aid of the EasyQlik QViewer application. It makes reading QVDs files much easier, you can download a copy from the official website here.
When we open a QVD through it and click on a cell, it promptly tells us if the content is Text, Numeric or Dual.
Let us now use the QVD files I mentioned at the beginning of the article as an example to compare them. Regarding NUM versus DUAL.
Regarding text content, just as an example, as they remained the same:
Script simulations
Well, enough bulls***, how about practicing a little? Here is a snippet of script for you to copy and paste into your Qlik Sense Desktop. Remember to create a folder connection for QVD stores.
// Extraction Simulared:
Transactions_Extracted:
Load
TransLineID,
TransID,
Num(Mod(TransID, 26)+1,’000′) as Dim1,
Pick(Ceil(3 * Rand1), ‘A’,’B’,’C’) as Dim2,
Pick(Ceil(6 * Rand1), ‘a’,’b’,’c’,’d’,’e’,’f’) as Dim3,
Round(1000 * Rand() * Rand() * Rand1) as Value1,
Round( 10 * Rand() * Rand() * Rand1) as Value2,
Money(Round(Rand() * Rand1, 0.00001)) as Value3;
Load
Rand() as Rand1,
Num(IterNo(), ‘00000’) as TransLineID,
Num(RecNo(), ‘0000000000’) as TransID
AutoGenerate 10000000
While
Rand()<=0.5 or IterNo()=1;
STORE Transactions_Extracted into lib://Database/NUMs/Transactions_Extracted.qvd(qvd);
DROP Table Transactions_Extracted;
// Forcing to NUM at “Value3” and getting decimails
Transactions_NumForce:
LOAD
TransLineID+0 as TransLineID,
TransID+0 as TransID,
Dim1+0 as Dim1,
Dim2,
Dim3,
Value1,
Value2,
Value3+0 as Value3
FROM [lib://Database/NUMs/Transactions_Extracted.qvd](qvd);
STORE Transactions_NumForce into lib://Database/NUMs/Transactions_NumForce.qvd(qvd);
DROP Table Transactions_NumForce;
// Forcing NUM with 3 decimals limitations at Value3
Transactions_Optimized:
LOAD
TransLineID+0 as TransLineID,
TransID+0 as TransID,
Dim1+0 as Dim1,
Dim2,
Dim3,
Value1,
Value2,
Round(Value3, 0.001) as Value3
FROM [lib://Database/NUMs/Transactions_Extracted.qvd](qvd);
STORE Transactions_Optimized into lib://Database/NUMs/Transactions_Optimized.qvd(qvd);
DROP Table Transactions_Optimized;
// Round force to 2 decimals. Lost of values without real benefits
Transactions_RoundForce:
LOAD
TransLineID+0 as TransLineID,
TransID+0 as TransID,
Dim1+0 as Dim1,
Dim2,
Dim3,
Value1,
Value2,
Round(Value3, 0.01) as Value3
FROM [lib://Database/NUMs/Transactions_Extracted.qvd](qvd);
STORE Transactions_RoundForce into lib://Database/NUMs/Transactions_RoundForce.qvd(qvd);
DROP Table Transactions_RoundForce;
Curiosities
When you force a currency to numeric field, in some cases you will have more than two decimal places as a result. This I demonstrate in the example Transactions_NumForce. That although the size of the QVD has already decreased considerably, there is still some room for improvement;
If you force Round (FIELD, 0.01) so that the currency field has two decimal places, you may lose some values during rounding, so I don’t recommend it, even because it won’t affect the file size that much. See the comparison between Transactions_Optimized and Transactions_RoundForce;
I usually use Round () with at least three decimal places. Stay as an example in Transactions_Optimized. However, you cannot use this as a silver bullet. Each case is different, it may be necessary for your business to maintain a larger number of decimal places.
And as a final comparison, see how the file sizes look:
Some benefits of studies and techniques
– Reduction in the size of QVDs undergoing transformation for storage;
– Standardization of data;
– Encourage the development to know better the data that is working;
– Decrease cardinality, thus gaining performance in the final application.
“- Nice! What about the pumpkin?”
Its size was reduced.
Did you find this study cool? Do you believe it fits in any scenario of yours? Did you have any cool ideas with that? Comment!