Calculated Fields

Rationale

The VIZYUL™ Calculated Fields performance tuning rule fires for any data connection with 7 or more calculated fields.

Insight

Consider this, the number of calculated fields in a data connection can have a direct impact on the overall performance of the workbook.  Specifically the calculated fields used on worksheets and those that are calculated at the level of detail.

In order to better understand how calculated impact performance, we’ll walk thru a simple scenario.  My original data source contains three fields, sales, region and items (the number of items sold for that region).  The entire data set has 10 rows of data, keeping it simple.  If I create a calculated field that evaluates the value of each region and then either divides sales by items or multiplies sales time items, I’ve created what’s commonly referred to as a LOD (level of detail) calculation.  Basically LOD calculation means a field where the calculation must be performed for every row in the data set.

As you can imagine, if I drop my new calculated field on a worksheet, it’ going to take tableau longer to aggregated my new field than it would to simply aggregate sales.  For this reason, this rule simply alerts the VIZYUL™ user to an opportunity to optimize the data connection for optimal performance.

Here are a few techniques you can use to optimize calculated fields, based on your specific environment.

  1. I have access to a tableau server to publish my data sources
    1. Create Calculations in the Original Data Source – Offloading the processing of calculated fields onto the origin data source is our preferred approach.  However, we also understand that this method isn’t available to everyone.  For Excel users this means adding an additional column to your spreadsheet that contains the calculation prior to connecting to the spreadsheet with tableau desktop.  You’ll be surprised the performance gain you’ll experience taking this simple step.  For those accessing data on a database server, this usually means a custom sql query that calculates the desired fields.
    2. Optimize Calculations – We recommend, as much as possible, creating calculations that use data that tableau has already aggregated.  An example would be the difference between SUM([sales])/SUM([items]) versus sales/items.  The former causes tableau to aggregate all sales and items before performing the division operation.  The latter is performed for each row of data included in the worksheet.
    3. Use Tableau Desktop’s Optimize Feature (only available for tableau data extracts) – Performing this on a data source causes tableau desktop to generate metadata for the calculated fields you’ve created.  For measures tableau desktop internally stores things like the minimum and maximum values.  For dimensions tableau desktop internally stores the unique set of values for the dimension.  Storing this metadata increases performance by providing information that tableau uses to query the data.  See the screen shot for the location of this feature.
    4. Publish Calculated Fields to Tableau Server – Let’s say you’ve created 50 calculated fields; all of which you need for your dashboards.  If you don’t publish the data source to tableau server, each time you view one of your dashboards, tableau desktop has to calculate each of the calculated fields on the worksheets included on the dashboard.  However, if you publish the data source to tableau server, tableau is smart enough to create actual fields from your 50 calculated fields.  This has the same or better effect as if you generated the calculated fields prior to bringing the data into tableau desktop.  This will always have an immediate positive impact on the overall performance of your dashboards.
      1. PLEASE NOTE: There are a few scenarios where, even if you publish a data source to tableau server, the calculated field will NOT be turned into an actual field yielding optimal performance.  As of this writing this includes ANY calculated field that includes a parameter.  This is because it difficult to determine the actual result of a calculation based on a parameter since the parameter can change at any time.  Calculated fields that use NOW(), TODAY(), USER() or USERDOMAIN() also cannot be materialized (turn into an actual field).  The functions mentioned are specific to a given computer so it’s difficult to determine the actual value of the calculated field.
  2. I DO NOT have access to a tableau server to publish my data sources
    • See 1.1 above
    • See 1.2 above
    • See 1.3 above

 Optimize-Tableau-Data-Extracts

Action

  • Consider the recommended steps above to optimize your data source calculated fields.
  • Consider the additional resources below

Additional Resources

  • http://community.tableau.com/message/221542
  • http://onlinehelp.tableau.com/current/pro/online/windows/en-us/help.htm#extracting_optimize.html?Highlight=optimizing
  • https://boraberan.wordpress.com/2015/01/30/whats-new-in-tableau-9-0-part-2-level-of-detail-expressions/