Show and share technology design bits and bobs

Bringing Technology Solutions to Small Businesses

Removing Duplicate Rows in Interactive Reporting

When asked how to deal with duplicate rows in an Interactive Reporting Results section, my response always includes questions about why duplicate records are being returned and how the data tables are related; this information is helpful to validate that the join condition is appropriate for the data relationship between the two tables.  The majority of times, the relationship is improperly defined or assumptions have been made about the data in the offending tables and the issue resolves itself with a rework of the Query section.

Download Materials

Get the solution BQY for this article:
Click to Download

With that said, sometimes the level of data being compared is not the same across the tables and duplicate records cannot be avoided.  Without opening the Pandora’s box of whether or not properly organized data warehouses should ever result in a situation where duplicate records can be queried, let’s talk about what to do when it does happen.  There are two ways to address this: the Easy, Not-Complicated way and the Not-Easy, Complicated way.  Unfortunately, the Easy, Not-Complicated solution usually doesn’t do the trick unless there are actually duplicate records in the source system. Even-though this is rarely the case, let’s cover that one first — just in case.

The Easy, Not-Complicated Solution

In the Query section, the right-click menu for the Request line offers the Properties menu item as shown below:

Request Line Right-Click Menu

Once opened, the Properties dialog allows report authors to apply query governance limits or further modify the properties of the query using the interface shown below:

Query Properties Dialog Box

As seen in the screenshot, the “Return Unique Rows” option has already been checked, because checking this box is The Easy, Non-Complicated way of tackling this problem.  When this box is checked, the SQL statement sent to the source system is modified from a Select statement to a Select Distinct statement.  The addition of the Distinct ensures that only one row of each unique combination is presented in the Results section.

This solution is so easy that it’s a serious shame when people go down the Not-Easy, Complicated path when this could have worked.  I always try this first… even when I know it won’t work, just because it’s so quick.

 The Not-Easy, Complicated Solution

If you have vetted your query, you will likely have to walk through down this path.  For clarity’s sake, the solution itself is not specifically hard, it is just that there are a lot of little things that can be easily forgotten which would make this solution fail.

To follow along with this tutorial, use the sample.oce connection that is automatically installed with the Interactive Reporting Studio Client.

For starters, we need to create a situation where duplicate records are occurring that cannot be resolved with a Select Distinct statement.  This usually happens when two level-zero tables are joined on a level-zero key and higher-level items are requested from both tables where multiple sets of matched records exist but no facts, which could be summed, are queried.  Since the sample database has keys that are laid out well, we’ll artificially create this situation by joining the Wine and Wine Sales tables using a simple equal join on Vintage from Wine and Year from Wine Sales, instead of joining on Wine Id.  Add Winery, Vintage, and Grape from the Wine table and Region from the Wine Sales table to the Request line as shown below:

Sample Query for Duplicate Results

When processed, the Results section will display duplicate rows for each item.  As seen in the below screenshot, I’ve set the number masking on Vintage to display without the comma for presentation purposes, but there is no requirement that this needs to be done.

Sample Duplicated Results Section

With duplicate rows in place in the Results section, lets get on with it.  The Not-Easy, Complicated solution requires the following components which will be discussed in-depth as we go:

  • Results Section Sort
  • Computed Item Key
  • Computed Item Using Next or Prior
  • Local Filter on the New Table Section

The Results Section Sort

The Next and Prior functions assess the row before and the row after the current row which means the rows need to be properly sorted so that all duplicate rows are grouped together.  In the context of our example, the sorts required to group our duplicate rows together are ascending sorts on Vintage, Winery, Region, and Grape.  Drag the columns from the Results section to the Sort line to apply the required sort as shown in the screenshot below:

Sample Duplicated Sorts

Computed Item Key

To prepare for the use of the Next or Prior function, an item key needs to be created using the same logic as a hash key (for you database people) so that we can be certain that we are assessing a truly unique row.   Sometimes this key is a simple as concatenating two columns in the same row; other times, such as in our example, it is really better to combine several columns to ensure a truly unique identifier for each row group.  To be super-duper certain, lets create a Computed Item key that is a concatenation of all of the items in the sort, in the order they appear. Add a Computed Item to the Results section by selecting “Add Computed Item” from the right-click menu as shown in the screenshot below:

Add a Computed Item to the Results section

With the dialog for the Computed Item opened, update the Name property to “Row Key” and add the following code to the Computed Item Script Editor to concatenate the required columns into a single key:

Vintage + Winery + Region + Grape

When the code as been added, click OK to close the Computed Item dialog box and add the key as a column in the Results section.  With the addition of your newly created Row Key, the Results section should resemble the below screenshot:

Updated Results Section with Row Key

Computed Item Using Next or Prior

The Next and Prior functions actually do what one might guess they would based on their names:  Next assesses the value in the next row for a defined column, Prior assesses the column’s value in the prior row.  To remove duplicate rows,  either of the Next or Prior functions can be used, depending on if you are a glass-half-full or glass-half-empty kind of person.   As an optimistic, forward thinker, I like the Next function.  However, since I am also a realist, I’ll show you the code for both and you can pick your own favorite, as the difference between the two is really the location of the flag for your unique row.   With the Next function, the last row of a group will be marked unique; conversely, with the Prior function, the first row of a group will be marked unique. This will make more sense if you try it both ways and see for yourself.

For both options, the process is the same, with only minor differences in the code.  To start, add another Computed Item to the screen and call it “Duplicate Flag.”  Using either Next or Prior, we will populate a value of 1 in all duplicate rows and unique rows will be unpopulated at null.   In a future step, we are going to use this flag to filter out any row that is equal to 1.

Using the Next function requires the following code to be entered in the Script Editor of the Computed Item:

if (Next ( Row_Key ) == Row_Key) {1}

This code is asking Interactive Reporting to perform the business logic equivalent of “If the value in the Row_Key column in the next row is equal to the value of the Row_Key column in this row, the value to display in the Duplicate Flag column is 1.”  When the Next function is applied on this results set, the first 16 rows have a value of 1 in the Duplicate Flag column before a unique row is flagged in the 17th row as shown in the following screenshot:

Adding a Duplicate Flag using the Next Function

Using the Prior function instead requires the following code to be entered in the Script Editor of the Computed Item:

if (Prior ( Row_Key ) == Row_Key) {1}

This code is asking Interactive Reporting to perform the business logic equivalent of “If the value in the Row_Key column in the prior row is equal to the value of the Row_Key column in this row, the value to display in the Duplicate Flag column is 1.”  When the Prior function is employed on this results set, the first row in the Duplicate Flag column is flagged as a unique row as shown in the following screenshot:

Adding a Duplicate Flag using the Prior Function

Did you see how those paths are so similar that I copied and pasted the Prior solution with only minor modifications?  Now you know how to use both. Moving forward, pick whichever you’d like, because the remainder of the solution continues on to be the same for both.

Local Filter on the New Table Section

Something that frustrated me early on that made perfect sense once explained was the fact that the Next and Prior functions are members of an elite group of functions that — when used in a Computed Item — prevent that Computed Item from being sorted or filtered.  Because of this elitist attitude, we need to add another Table Section so that the data can be limited by filtering on the Duplicate Flag Computed Item column.  Add a new Table section using the Insert menu on the Menu Bar as shown below:

Add a new Table section

Add the Vintage, Winery, Region, and Grapes columns to the newly added Table section so the data is displayed.  Drag the Duplicate Flag column from the Elements Pane to the Filter line to open the Filter dialog.  When properly configured, the filter should look like the below screenshot:

Filter rows with the value 1 in Duplicate Flag

The business logic for the above filter translates to “If the value in Duplicate Flag is not equal to 1, include that row in this table even if the value is Null.”  This ensures only unique rows (which now have a Null value in the Duplicate Flag column) are shown in this table.  Click OK to close the Filter dialog and apply the filter.

Ta-Da! We did it!

Table section with no duplicate rows

About the Author

Emily VoseWhile Emily comes from a fine arts background, she made her mark on the world as a enterprise Business Intelligence Expert. After publishing the only expert guide on the market for Oracle's Hyperion Interactive Reporting, she decided to take on a new challenge. Nodding back to her fine arts background, she combined that with the business process management sense and project management skills she picked up working for large commercial customers and government agencies. She then set out to start a full-service tech firm to help small business get the tech they need to be successful and Wagger Designs was born.View all posts by Emily Vose →

Leave a Reply

Captcha Garb (1.5)