Removing Duplicate Rows in Interactive Reporting
Download Materials
Click to Download
With that said, sometimes the level of data being compared is not the same across the tables and duplicate records cannot be avoided. Without opening the Pandora’s box of whether or not properly organized data warehouses should ever result in a situation where duplicate records can be queried, let’s talk about what to do when it does happen. There are two ways to address this: the Easy, Not-Complicated way and the Not-Easy, Complicated way. Unfortunately, the Easy, Not-Complicated solution usually doesn’t do the trick unless there are actually duplicate records in the source system. Even-though this is rarely the case, let’s cover that one first — just in case.
The Easy, Not-Complicated Solution
In the Query section, the right-click menu for the Request line offers the Properties menu item as shown below:
Once opened, the Properties dialog allows report authors to apply query governance limits or further modify the properties of the query using the interface shown below:
As seen in the screenshot, the “Return Unique Rows” option has already been checked, because checking this box is The Easy, Non-Complicated way of tackling this problem. When this box is checked, the SQL statement sent to the source system is modified from a Select statement to a Select Distinct statement. The addition of the Distinct ensures that only one row of each unique combination is presented in the Results section.
This solution is so easy that it’s a serious shame when people go down the Not-Easy, Complicated path when this could have worked. I always try this first… even when I know it won’t work, just because it’s so quick.
The Not-Easy, Complicated Solution
If you have vetted your query, you will likely have to walk through down this path. For clarity’s sake, the solution itself is not specifically hard, it is just that there are a lot of little things that can be easily forgotten which would make this solution fail.
To follow along with this tutorial, use the sample.oce connection that is automatically installed with the Interactive Reporting Studio Client.
For starters, we need to create a situation where duplicate records are occurring that cannot be resolved with a Select Distinct statement. This usually happens when two level-zero tables are joined on a level-zero key and higher-level items are requested from both tables where multiple sets of matched records exist but no facts, which could be summed, are queried. Since the sample database has keys that are laid out well, we’ll artificially create this situation by joining the Wine and Wine Sales tables using a simple equal join on Vintage from Wine and Year from Wine Sales, instead of joining on Wine Id. Add Winery, Vintage, and Grape from the Wine table and Region from the Wine Sales table to the Request line as shown below:
When processed, the Results section will display duplicate rows for each item. As seen in the below screenshot, I’ve set the number masking on Vintage to display without the comma for presentation purposes, but there is no requirement that this needs to be done.
With duplicate rows in place in the Results section, lets get on with it. The Not-Easy, Complicated solution requires the following components which will be discussed in-depth as we go:
- Results Section Sort
- Computed Item Key
- Computed Item Using Next or Prior
- Local Filter on the New Table Section
The Results Section Sort
The Next and Prior functions assess the row before and the row after the current row which means the rows need to be properly sorted so that all duplicate rows are grouped together. In the context of our example, the sorts required to group our duplicate rows together are ascending sorts on Vintage, Winery, Region, and Grape. Drag the columns from the Results section to the Sort line to apply the required sort as shown in the screenshot below:
Computed Item Key
To prepare for the use of the Next or Prior function, an item key needs to be created using the same logic as a hash key (for you database people) so that we can be certain that we are assessing a truly unique row. Sometimes this key is a simple as concatenating two columns in the same row; other times, such as in our example, it is really better to combine several columns to ensure a truly unique identifier for each row group. To be super-duper certain, lets create a Computed Item key that is a concatenation of all of the items in the sort, in the order they appear. Add a Computed Item to the Results section by selecting “Add Computed Item” from the right-click menu as shown in the screenshot below:
With the dialog for the Computed Item opened, update the Name property to “Row Key” and add the following code to the Computed Item Script Editor to concatenate the required columns into a single key:
Vintage + Winery + Region + Grape
When the code as been added, click OK to close the Computed Item dialog box and add the key as a column in the Results section. With the addition of your newly created Row Key, the Results section should resemble the below screenshot:
Computed Item Using Next or Prior
The Next and Prior functions actually do what one might guess they would based on their names: Next assesses the value in the next row for a defined column, Prior assesses the column’s value in the prior row. To remove duplicate rows, either of the Next or Prior functions can be used, depending on if you are a glass-half-full or glass-half-empty kind of person. As an optimistic, forward thinker, I like the Next function. However, since I am also a realist, I’ll show you the code for both and you can pick your own favorite, as the difference between the two is really the location of the flag for your unique row. With the Next function, the last row of a group will be marked unique; conversely, with the Prior function, the first row of a group will be marked unique. This will make more sense if you try it both ways and see for yourself.
For both options, the process is the same, with only minor differences in the code. To start, add another Computed Item to the screen and call it “Duplicate Flag.” Using either Next or Prior, we will populate a value of 1 in all duplicate rows and unique rows will be unpopulated at null. In a future step, we are going to use this flag to filter out any row that is equal to 1.
Using the Next function requires the following code to be entered in the Script Editor of the Computed Item:
if (Next ( Row_Key ) == Row_Key) {1}
This code is asking Interactive Reporting to perform the business logic equivalent of “If the value in the Row_Key column in the next row is equal to the value of the Row_Key column in this row, the value to display in the Duplicate Flag column is 1.” When the Next function is applied on this results set, the first 16 rows have a value of 1 in the Duplicate Flag column before a unique row is flagged in the 17th row as shown in the following screenshot:
Using the Prior function instead requires the following code to be entered in the Script Editor of the Computed Item:
if (Prior ( Row_Key ) == Row_Key) {1}
This code is asking Interactive Reporting to perform the business logic equivalent of “If the value in the Row_Key column in the prior row is equal to the value of the Row_Key column in this row, the value to display in the Duplicate Flag column is 1.” When the Prior function is employed on this results set, the first row in the Duplicate Flag column is flagged as a unique row as shown in the following screenshot:
Did you see how those paths are so similar that I copied and pasted the Prior solution with only minor modifications? Now you know how to use both. Moving forward, pick whichever you’d like, because the remainder of the solution continues on to be the same for both.
Local Filter on the New Table Section
Something that frustrated me early on that made perfect sense once explained was the fact that the Next and Prior functions are members of an elite group of functions that — when used in a Computed Item — prevent that Computed Item from being sorted or filtered. Because of this elitist attitude, we need to add another Table Section so that the data can be limited by filtering on the Duplicate Flag Computed Item column. Add a new Table section using the Insert menu on the Menu Bar as shown below:
Add the Vintage, Winery, Region, and Grapes columns to the newly added Table section so the data is displayed. Drag the Duplicate Flag column from the Elements Pane to the Filter line to open the Filter dialog. When properly configured, the filter should look like the below screenshot:
The business logic for the above filter translates to “If the value in Duplicate Flag is not equal to 1, include that row in this table even if the value is Null.” This ensures only unique rows (which now have a Null value in the Duplicate Flag column) are shown in this table. Click OK to close the Filter dialog and apply the filter.
Ta-Da! We did it!












