Hyperion Essbase: BSO

Tuesday, December 31, 2013

Optimizing Report Scripts in ASO Applications

Report script is one of the powerful reporting options in ESSBASE. Report scripts are easy to write and give us flexibility to select the members and format we need.

Performance will always play a key role here. How do we get best performance from a report script? As like many other areas in essbase, it's achievable by tweaking and testing various settings.

Let's see the report script in a different angle,

Based on my understanding, essbase engine runs a report in two phases. One, fetching the data and preparing the data in the format we need, Two, writing the report to an output file. There are different options to improve both phases.

Let's start from second phase (this is where I get more performance ) , where data will be exported to a file.

I will always get dramatic performance improvement when I arrange dimensions in row command in such a way that the dimension with least number of members that you are exporting in a report comes left most and dimension with most number of members comes right most ( ascending ).

**** Here is the confusion part; I said number of members in your report script, not members in your outline. There is difference between two statements.

For example, I have an outline with 5 dimensions ,

Account - with 10000 members

Entity - with 5000 members

Product - with 500 members

Market - 50 members

Time - with 12 months

I want to export a report with

100 Accounts

5000 Entities

500 Products

50 Markets

1 Month

Then, my report looks like,

<ROW( "Time", "Market","Accounts" ,"Products","Entities")

As I said earlier, I arranged my dimensions in ascending order based on number of members in report that I am exporting, not based on number of members in outline. This trick drastically improves second phase of the report i.e writing report to a output file. Some of my reports are improved from 100KB/Sec to 5MB/Sec.

Coming to the first phase,
Cache settings on application and database plays main role.
"Pending cache setting" on application level confuses me a lot. As per Database Administrator Guide

"If the input-level data size is greater than 2 GB by some factor, the aggregate storage cache can be
increased by the square root of the factor. For example, if the input-level data size is 3 GB
(2 GB * 1.5), multiply the aggregate storage cache size of 32 MB by the square root of 1.5, and
set the aggregate cache size to the result: 39.04 MB."

But, that never worked for me for cube with input-level data more than 5 GB. I got good performance when i set very higher than the recommendation formula. For example, i got optimal performance on reports for an ASO cube if i set pending cache to 512 MB when input-level data is 10 GB and 1024 MB when input-level data is 40 GB. I do not have a specific recommendation for pending cache , but its better to test with various values and find the sweet spot.

"Buffer size" and "Sort buffer size" also helps to speed up the report scripts. Play around with different settings to find out best for you.

Building aggregations on a cube will speed up the report script if you are pulling data on parent level.

You can also try to use <LEAVES instead of <DIMBOTTOM. Try to avoid dynamic members from member selection.

Wednesday, July 31, 2013

CALCPARALLEL in Essbase - Things worth of knowing...

I am having hard time in writing introduction of this post, thought of writing "Parallelism is one of the most important...." , "Essbase BSO calculation engine can use multiple threads" etc....

But, we all know essbase can use multiple processors for calculating data and that can be enabled by SET CALCPARALLEL command. Let's see how to use it wisely.

Essbase admin guide suggests using parallel calculation to improve the performance. Yes, it is, but this is not true in all cases.

If you do not have any backward dependencies and dynamic calcs in your formula, Essbase will decide a calculation into tasks so that it can run these tasks in different threads.CALCTASKDIMS setting will specifies how many of the sparse dimensions in an outline are used to identify potential tasks that can be run in parallel. If CALCTASKDIMS is set to 3, Essbase takes last 3 sparse dimensions into consideration, and determines number of parallel tasks which can run on parallel. The number of parallel tasks is equal to (or apprx equal to) product of all stored members to be calculated in these 3 dimensions( takes only FIX'ed members in calc). Essbase will divide these tasks to run on number of threads equally based CALCPARALLEL setting.

** Essbase v11.1.2.2 is designed to determine number of CALCTASKDIMS itself. So, we don't need to worry about this at this point.

As usual , we can try best CALCPARALLEL setting with trial and error method. Let's see when to use and not use CALCPARALLEL.

For example, I have a BSO application with 13 dimensions (3 Dense+12 Sparse). I ran a calculation script with CALCPARALLEL 6. This is what i found in logs.

Maximum Number of Lock Blocks: [100] Blocks

Completion Notice Messages: [Disabled]

Calculations On Updated Blocks Only: [Disabled]

Clear Update Status After Full Calculations: [Enabled]

Calculator Cache: [Disabled].

OK/INFO - 1012678 - Calculating in parallel with [6] threads.
OK/INFO - 1012679 - Calculation task schedule [3016,71,1].

OK/INFO - 1012680 - Parallelizing using [2] task dimensions. .

OK/INFO - 1012681 - Empty tasks [2797,71,1].

OK/INFO - 1012672 - Calculator Information Message:

From above logs, Essbase automatically decided to use 2 dimensions to identify parallel tasks ( Essbase decided to use 2 dimensions because i am using 11.1.2.2. Essbase will use 1 task dimension by default in earlier versions). Because of sparsity in my cube, essbase found 2797 Empty tasks out of 3016 identified tasks. 92% of my tasks are empty in this calculation which is bad. So, in this case, using parallelism is not adding up anything for performance even though it reserved 6 processors, it's not even using 10 % of them.

But one interesting observation i made is, above calculation ran faster in serial mode rather than parallel mode. Along with the processors,Essbase is also using some other resources on server to run calc parallel mode. I used only 6 processors (out of 32 processors in the server) only for this calc, But, Essbase had hard time in managing 6 processors for the calculation where 92% of tasks are empty.

So, bottom line is "DO NOT USE PARALLEL MODE JUST BECAUSE YOU HAVE RESOURCES AVAILABLE. USE PARELL CALC BASED ON NON EMPTY TASKS"

So, when to use calcparellel?

I will recommend parallel calculation if Empty tasks are at least 40% of identified tasks. We can play around with order of dimensions in the outline and CALCTASKDIMS settings to reduce number of parallel tasks and empty tasks. We decide number processors based on the resource available and other things running on the server etc... Better start with 2 processors.

We recently upgraded from 11.1.2.0 to 11.1.2.2. A guy from Oracle development team told me that they enhanced parallelism in new version. Instead of improving the performance, it has deprived after up gradation. We have tuned parallelism in calcs which are now running better than previous ones. 11.1.2.2 is doing better job in analyzing parallel tasks than previous version. So, tune your calcparallel if calcs are running longer in 11.1.2.2 compared to previous versions.