Tuesday, December 31, 2013

Optimizing Report Scripts in ASO Applications

A report script is one of the most powerful reporting options in Essbase. Report scripts are easy to write and give us the flexibility to select exactly the members and the format we need.

Performance always plays a key role here. How do we get the best performance from a report script? As with many other areas in Essbase, it's achievable by tweaking and testing various settings.

Let's look at report scripts from a different angle.

Based on my understanding, the Essbase engine runs a report in two phases: first, fetching the data and preparing it in the required format; second, writing the report to an output file. There are options to improve both phases.

Let's start with the second phase (this is where I see the biggest gains), where the data is written to an output file.

I consistently get a dramatic performance improvement when I arrange the dimensions in the ROW command so that the dimension with the fewest members being exported comes leftmost and the dimension with the most members comes rightmost (ascending order).

**** Here is the part that causes confusion: I said the number of members in your report script, not the number of members in your outline. There is a difference between the two.

For example, I have an outline with 5 dimensions:

Account - 10,000 members
Entity - 5,000 members
Product - 500 members
Market - 50 members
Time - 12 months

I want to export a report with

100 Accounts
5000 Entities
500 Products 
50 Markets
1 Month

Then the ROW command in my report looks like this:

<ROW( "Time", "Market","Accounts" ,"Products","Entities")

As I said earlier, I arranged the dimensions in ascending order based on the number of members in the report I am exporting, not the number of members in the outline. This trick drastically improves the second phase of the report, i.e., writing the report to the output file. Some of my reports improved from 100 KB/sec to 5 MB/sec.
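
To make this concrete, here is a minimal sketch of how such an export-style report script might look. The formatting commands and the member selections are illustrative placeholders, not part of the trick itself:

{ TABDELIMIT ROWREPEAT SUPHEADING SUPMISSINGROWS }
<ROW ("Time", "Market", "Account", "Product", "Entity")
"Jan"
<CHILDREN "Market"
"Acct_0001" "Acct_0002"
<LEAVES "Product"
<LEAVES "Entity"
!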


Coming to the first phase: cache settings at the application and database levels play the main role here.
The application-level "pending cache size" setting confused me a lot. As per the Database Administrator's Guide:

"If the input-level data size is greater than 2 GB by some factor, the aggregate storage cache can be
increased by the square root of the factor. For example, if the input-level data size is 3 GB
(2 GB * 1.5), multiply the aggregate storage cache size of 32 MB by the square root of 1.5, and
set the aggregate cache size to the result: 39.04 MB."

But that never worked for me on cubes with more than 5 GB of input-level data. I got good performance only when I set the cache much higher than the recommended formula. For example, I got optimal report performance on an ASO cube by setting the pending cache to 512 MB when input-level data was 10 GB, and to 1024 MB when input-level data was 40 GB. I do not have a specific recommendation for the pending cache, but it is best to test various values and find the sweet spot.
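
For reference, here is a sketch of setting the pending cache size through MaxL (the application name is a placeholder; the pending value takes effect the next time the application starts):

alter application 'MyASOApp' set cache_size 512MB;
query application 'MyASOApp' get cache_size;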

"Buffer size" and "Sort buffer size" also helps to speed up the report scripts. Play around with different settings to find out best for you.

Building aggregations on the cube will also speed up a report script if you are pulling data at parent levels.

You can also try using <LEAVES instead of <DIMBOTTOM, and try to avoid dynamic members in your member selections.
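
For example (the dimension name is a placeholder):

instead of:   <DIMBOTTOM "Entity"
on ASO, use:  <LEAVES "Entity"

Both select level-0 members, but <LEAVES is optimized for aggregate storage outlines, so it is usually the faster choice.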


Tuesday, September 10, 2013

How does SET AGGMISSG make a big difference to your calculation?

SET AGGMISSG works with sparse dimensions. As stated in the Technical Reference, when it is set to OFF, Essbase does not aggregate #Missing values, and that can make a big difference to your calculation results. The detailed example below shows how it works.
 
If I have an outline like below,
Account (Dense)
    A1
    A2
    A3
Department (Sparse)
    D1
    D2
    D3
 
I want to load some data at Department level 0 and aggregate the dimension to get data at the parent level. Since my cube is BSO, I have to write a calculation to aggregate it. Assume I have data like below.
Case 1:
A1->D1 = 10
A1->D2 = 20
A1->D3 = 20
I will use the AGG function or CALC DIM to aggregate the Department data.
 
SET AGGMISSG OFF
Fix(Account)
AGG(Department);
ENDFIX;
 
Now the data looks like
A1->Department = 50
 
Case 2: Run the calculation again after sending #Missing to one of the intersections, like below.
A1->D1 = 10
A1->D2 = #Missing
A1->D3 = 20
 
SET AGGMISSG OFF
Fix(Account)
AGG(Department);
ENDFIX;
 
Now the data looks like
A1->Department = 30, which is correct.
Case 3:
Now clear all the data, load like below, and run the calculation.
A1->D1 = 10
A1->D2 = 20
A1->D3 = 20
 
SET AGGMISSG OFF
Fix(Account)
AGG(Department);
ENDFIX;
 
Now the data looks like
A1->Department = 50, which is correct.
Then send #Missing to all the intersections and run the calc below:
A1->D1 = #Missing
A1->D2 = #Missing
A1->D3 = #Missing
 
SET AGGMISSG OFF
Fix(Account)
AGG(Department);
ENDFIX;
 
Now the data looks like
A1->Department = 50, which is NOT correct: none of Department's children have data, yet the parent still shows 50. With AGGMISSG OFF and every child set to #Missing, Essbase skips the aggregation and leaves the previously calculated parent value in place instead of overwriting it with #Missing. The documentation describes this behavior, but not in enough detail to show exactly how it affects calculation results.
Now run the same calculation with AGGMISSG ON:
SET AGGMISSG ON
Fix(Account)
AGG(Department);
ENDFIX;
Now the data looks like below
A1->Department = #Missing, which is correct.


So, with AGGMISSG OFF, #Missing is left un-aggregated only when none of a member's children have data. The setting makes a difference only when all children of a particular member are #Missing; if at least one child has data, it doesn't matter whether it is set to OFF or ON.
So I always keep it ON to get accurate data.
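
If you want ON as the default instead of setting it in every script, my understanding is that the server-wide default can also be set in essbase.cfg (a sketch; it requires an Essbase restart to take effect):

AGGMISSG TRUE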



Wednesday, July 31, 2013

CALCPARALLEL in Essbase - Things Worth Knowing...

I had a hard time writing the introduction to this post; I thought of starting with "Parallelism is one of the most important....", "The Essbase BSO calculation engine can use multiple threads", etc....

But we all know Essbase can use multiple processors to calculate data, and that this is enabled by the SET CALCPARALLEL command. Let's see how to use it wisely.

The Essbase admin guide suggests using parallel calculation to improve performance. It often does, but this is not true in all cases.

If your formulas have no backward dependencies and no dynamic calcs, Essbase divides a calculation into tasks so that it can run those tasks on different threads. The CALCTASKDIMS setting specifies how many of the sparse dimensions in the outline are used to identify potential tasks that can run in parallel. If CALCTASKDIMS is set to 3, Essbase considers the last 3 sparse dimensions and determines the number of potential parallel tasks: approximately the product of the stored members to be calculated in those 3 dimensions (counting only the members FIX'ed in the calc). Essbase then divides these tasks evenly across the number of threads given by the CALCPARALLEL setting.
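
For example (the numbers are illustrative): if the FIX leaves 10, 20, and 30 stored members to calculate in the last 3 sparse dimensions, Essbase identifies roughly 10 * 20 * 30 = 6,000 potential tasks; with CALCPARALLEL 6, roughly 1,000 tasks are scheduled per thread.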

** Essbase v11.1.2.2 is designed to determine the number of CALCTASKDIMS itself, so we don't need to worry about this setting anymore.

As usual, we can find the best CALCPARALLEL setting by trial and error. Let's see when to use CALCPARALLEL, and when not to.

For example, I have a BSO application with 13 dimensions (3 dense + 12 sparse). I ran a calculation script with CALCPARALLEL 6, and this is what I found in the logs:

Maximum Number of Lock Blocks: [100] Blocks
Completion Notice Messages: [Disabled]
Calculations On Updated Blocks Only: [Disabled]
Clear Update Status After Full Calculations: [Enabled]
Calculator Cache: [Disabled].
OK/INFO - 1012678 - Calculating in parallel with [6] threads.
OK/INFO - 1012679 - Calculation task schedule [3016,71,1].
OK/INFO - 1012680 - Parallelizing using [2] task dimensions. .
OK/INFO - 1012681 - Empty tasks [2797,71,1].
OK/INFO - 1012672 - Calculator Information Message:

From the above log, Essbase automatically decided to use 2 dimensions to identify parallel tasks (it chose this on its own because I am using 11.1.2.2; earlier versions use 1 task dimension by default). Because of the sparsity of my cube, Essbase found 2,797 empty tasks out of the 3,016 identified tasks. 92% of the tasks in this calculation are empty, which is bad. In this case parallelism adds nothing to performance: even though the calculation reserved 6 processors, it is not using even 10% of them.

One interesting observation: the above calculation actually ran faster in serial mode than in parallel mode. Along with the processors, Essbase uses other server resources to manage a parallel calc. I used only 6 processors (out of the 32 on the server) for this calc, but Essbase still had a hard time coordinating 6 threads for a calculation where 92% of the tasks are empty.
So, the bottom line is: "DO NOT USE PARALLEL MODE JUST BECAUSE YOU HAVE RESOURCES AVAILABLE. USE PARALLEL CALC BASED ON NON-EMPTY TASKS."

So, when should you use CALCPARALLEL?

I recommend parallel calculation when non-empty tasks are at least 40% of the identified tasks. We can play with the order of dimensions in the outline and with the CALCTASKDIMS setting to reduce the number of parallel tasks and empty tasks. Decide the number of processors based on available resources and whatever else runs on the server; it is better to start with 2 processors.
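
As a starting point, a hedged sketch of the calc script header (the FIX members are placeholders; on 11.1.2.2 and later you can usually leave CALCTASKDIMS out and let Essbase decide):

SET CALCPARALLEL 2;
SET CALCTASKDIMS 2;
FIX("Actual", "FY13")
    CALC DIM("Entity", "Product");
ENDFIX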

We recently upgraded from 11.1.2.0 to 11.1.2.2. Someone on the Oracle development team told me they enhanced parallelism in the new version. Instead of improving, our performance initially degraded after the upgrade; once we re-tuned parallelism in our calcs, they now run better than before. 11.1.2.2 does a better job of analyzing parallel tasks than previous versions, so re-tune your CALCPARALLEL settings if calcs run longer in 11.1.2.2 than they did before.



Tuesday, July 30, 2013

How @XWRITE and @XREF can be used

The main intention of this post is to describe how @XREF and @XWRITE can be used for similar work.
Sometimes you need to write or copy data between cubes, whether they are in the same application or in an application on a remote server.
@XWRITE and @XREF are two calculation functions that can be used for such operations.
For example, you have two databases (cubes) called A1 and B1 with different outline structures, as below.
Cube A1's outline looks like below.

Account
    Sales
    Expenses
Period
    Q1
        Jan
        Feb
        Mar
Market
    East
    West
Scenario
    Actual
    Budget
Year
    2011
    2012
    2013
Cube B1's outline looks like below.

Account
    East_Sales
    East_Expenses
Period
    Q1
        Jan
        Feb
        Mar
Department
    Dept_101
    Dept_102
Year
    2011
    2012
    2013
 
@XREF: Suppose the East_Sales->Jan->Dept_101->2011 intersection of cube B1 has to get its data from the Sales->Jan->East->Budget->2011 intersection of cube A1. In other words, we are copying data from cube A1 to populate cube B1.
The calculation script below, written against the B1 cube, handles such cases.
FIX("Jan","Dept_101","2011")
"East_Sales" = @XREF(_A1alias_,"Sales","Budget");
ENDFIX

  • _A1alias_ is the location alias of the A1 cube, which acts as the source from which @XREF gets the data; B1 is the target to which we are copying the data (see the MaxL sketch after this list).
  • @XREF always locates the data cell by combining the member names given in the FIX statement with the members given as @XREF arguments. Here, Sales and Budget are members of the A1 cube, even though the calculation lives in the B1 cube.
  • Whenever we want to refresh an intersection by referring to the other cube, we have to run this calculation. Every run goes out to the A1 cube and searches for the intersection, and, as we know, fetching data from another cube takes time.
  • If you want to copy more than one member from the same dimension, write multiple @XREF statements.
  • We can also use this function in member formulas.
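
Both functions depend on the location alias already existing. A minimal MaxL sketch for creating one (the application names, host, user, and password are all placeholders):

create location alias _A1alias_ from AppB.B1 to AppA.A1 at "essbasehost" as admin identified by 'password';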
 
@XWRITE: This works much the same way, but the source and target are interchanged: we write the calculation script on the source cube, A1 in our case. Whenever the data is ready, run the calculation below to write the data from A1 to B1.
FIX("Jan","East","Budget","2011")
"Sales"
(
    @XWRITE("Sales",_B1alias_,"East_Sales","Dept_101");
)
ENDFIX

  • If you need this kind of operation, I suggest using @XWRITE over @XREF. @XWRITE pushes data to the target cube whenever the source cube has data, and it is faster. @XREF has to fetch data from the other cube on every request, which takes time, depends on the source cube's availability, and we may not even know whether data exists for the intersection we are seeking; if it doesn't, running the calculation makes no sense.
  • When a cube is accessed heavily and many calculations run frequently, but you still have to populate some cells by referring to another cube, @XREF will increase your calculation time. In such situations, go with @XWRITE.
  • Using @XWRITE we can also write data to the same cube via the @LOOPBACK function, as in: @XWRITE("Sales",@LOOPBACK,"Expenses");
  • We can also use @XWRITE in member formulas.
  • @XREF can be used for other purposes, as specified in the Technical Reference; it has its own advantages.

Deepa

Wednesday, June 26, 2013

Issue with selecting "Number" and "String" members together in the rule file - What is an alternative solution?

As we all know, rule files are used for loading data into ASO and BSO cubes.

This post is all about how to make a selection between two different types of members in a rule file (e.g., between a numeric member and a string member).


For example, suppose your Account dimension looks like below. Here, "100" is a numeric account and "Control Accounts" is a string account member.
Account
    100
    101
    102
    103
    200
    201
    Control Accounts

Suppose you want to select data for the 1XX accounts and "Control Accounts" while excluding the 2XX accounts from the data file.
Usually we add selection criteria in the "Select Record" window of the rule file, which looks like below.

[Screenshot: "Select Record" window with two criteria combined with OR - field < 200; field = "Control Accounts"]

Since we selected "OR" as the Boolean operator, it should select accounts less than 200 (the 100-series accounts) along with the "Control Accounts" records.

But this will not give you the expected results, because Essbase does not allow a selection on a field that mixes "String" and "Number" types. The workaround is to go to the field properties and replace the string member with a numeric value, as shown below.

[Screenshot: Field properties window replacing "Control Accounts" with the dummy value 12345]

We are just replacing all occurrences of "Control Accounts" with a dummy number (make sure it does not exist in your outline).
Go back to the "Select Record" window (see the screenshot below) and update the criteria. Here, 12345 is nothing but "Control Accounts", now with a numeric data type, so the Boolean operation between the two selections can be performed.

[Screenshot: Updated "Select Record" criteria - field < 200 OR field = 12345]

Author : Deepa

Monday, June 24, 2013

Data Copy in ASO Applications using calculation script

Data copy in a BSO application (e.g., copying data from one version to another) is straightforward: write a calculation script and use the DATACOPY command. Easy, Peasy, Japaneesy!!!
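
For instance, a minimal BSO sketch (the FIX members and version names are placeholders):

FIX("Budget", "FY13")
    DATACOPY "Working" TO "Final";
ENDFIX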

But ASO is a different story.
We usually export the required data from the ASO application and import it back with a rule file that renames the members.

This approach is slow and needs extra automation scripts.

But things changed after Oracle introduced calculation scripts for ASO applications. We can now do simple math, minor allocations, and data copies using ASO calculation scripts.

Even though it is not as straightforward as a BSO calculation script, you can achieve it with a little MDX scripting knowledge and patience (patience because it takes some time to write MDX expressions that make sure you are not "fixing" on derived, parent, or dynamic cells in the POV section).

Let’s see how we can do it.

Take ASOsamp.Sample as an example.

If we want to copy all data from "Prev Year" to "Curr Year", we create a MaxL script like the one below:

execute calculation on database ASOsamp.Sample with local script_file "/hyperion/Test_ASO_Calc.csc"
POV "Crossjoin(Descendants([MTD], Time.Levels(0)),
Crossjoin({[Original Price],[Price Paid],[Returns],[Units],[Transactions]},
Crossjoin(Descendants([Transaction Type], [Transaction Type].Levels(0)),
Crossjoin(Descendants([Payment Type], [Payment Type].Levels(0)),
Crossjoin(Descendants([Promotions], [Promotions].Levels(0)),
Crossjoin(Descendants([Age], [Age].Levels(0)),
Crossjoin(Descendants([Income Level], [Income Level].Levels(0)),
Crossjoin(Descendants([Geography], [Geography].Levels(0)),
Crossjoin(Descendants([Stores], [Stores].Levels(0)),
Descendants([Products], [Products].Levels(0)))))))))))"
SourceRegion "{[Prev Year],[Curr Year]}";



My custom calculation script, Test_ASO_Calc.csc, contains a single statement:

[Curr Year] := [Prev Year];


This process runs fast, much like DATACOPY in BSO.

Wednesday, June 19, 2013

Loading Data to ASO Cube - How "Override" Option Works.


The data load properties in a rule file such as "overwrite existing values" and "add to existing values" have no effect on data loads to ASO cubes, whereas "subtract from existing values" works as expected. Instead, we specify the load type when importing data from the load buffer in the MaxL statement (or in the "Load Data" window if you are loading from EAS). Below are some examples of how these properties work.


1) Override (default): By default, ASO uses the "override" behavior. Data is overwritten only at the intersections we load.
          
Syntax:

import database 'AppName'.'DBname' data from data_file '/Folder/Newfile.txt' using server rules_file 'Test' to load_buffer with buffer_id 1 on error write to 'Newfile_err.txt';

import database 'AppName'.'DBname' data from load_buffer with buffer_id 1;
        (or)
import database 'AppName'.'DBname' data from load_buffer with buffer_id 1 override values;


For example,

Existing data:  Jan->Sales = 10
                Feb->Sales = 20

Loading:        Feb->Sales = 30
                Mar->Sales = 40

After load:     Jan->Sales = 10
                Feb->Sales = 30
                Mar->Sales = 40

2) Override all data: This works differently. Unlike override, it replaces all existing data in the cube with the data file we load; cells we do not load are cleared.
Syntax:
 import database 'AppName'.'DBname' data from load_buffer with buffer_id 1 override all data;


For example,

Existing data:  Jan->Sales = 25
                Feb->Sales = 569

Loading:        Jan->Sales = 34

After load:     Jan->Sales = 34
                Feb->Sales = #Mi


3) Override incremental data: Unlike the override options above, override incremental data deals with slices of data. When you specify this option, the data is loaded into an incremental slice, and when you query the cube you get the sum of the primary slice and the incremental slice.

Syntax:
import database 'AppName'.'DBname' data from load_buffer with buffer_id 1 override incremental data;

For example, if we have data like below when loading for the first time:

Existing data:  Jan->Sales = 25
                Feb->Sales = 569

Loading:        Jan->Sales = 34
                Mar->Sales = 100

After load:     Jan->Sales = 59 (25 primary slice + 34 incremental slice)
                Feb->Sales = 569 (primary slice)
                Mar->Sales = 100 (incremental slice)


When we load the same file with different data the next time, it looks like below:

Existing data:  Jan->Sales = 59 (25 primary slice + 34 incremental slice)
                Feb->Sales = 569 (primary slice)
                Mar->Sales = 100 (incremental slice)

Loading:        Jan->Sales = 5

After load:     Jan->Sales = 30 (25 primary slice + 5 incremental slice)
                Feb->Sales = 569 (primary slice)
                Mar->Sales = #Mi

Looking at the two examples above, we can conclude that each load overrides all the data in the incremental slice but never changes the primary slice. Mar->Sales is set to #Mi because it was not loaded the second time.


The difference between "add values" and "override incremental data" is that "add values" always adds the loaded data to the primary slice, whereas "override incremental data" overrides the data within the incremental slice.
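
A related housekeeping note: incremental slices can later be folded back into the main database slice with MaxL; a sketch with placeholder names:

alter database 'AppName'.'DBname' merge all data;

Using "merge incremental data" instead combines all incremental slices into a single incremental slice without touching the main slice.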

Author : Deepa.