In Java development, we may encounter the complex set operations. Java alone is not powerful enough to save programmers’ efforts in implementing the computation details, which is time-consuming and poor in code reuse. In view of this, programmers usually resort to dynamic calculation script for set operation.

 

SQL is surely the first kind of script that comes into most programmers’ mind. However, to their disappointments, SQL does not support the explicit set, and is unable to represent the sets of a set, ordered set, generic set, and only the result set can be recognized as a set. Therefore, it is only the subset of the true set. Many operations on sets are hard to implement through SQL. Moreover, the computation is not limited on database, such as the data from Excel and even there is no database in the application environment. In this case, the usage of SQL database is further narrowed.

 

As the true dynamic calculation script for set operations, both R and esProc are more suitable for such computation, considering its support for generic set, ordered set, and Java- embedding. In addition, they not only empower users to retrieve data from one or more databases, Excel sheet, and txt file, but also enable users to compute step by step, and ultimately solve many complex computations in a much more convenient way.

 

Let’s check it out with this example: a sales department needs to make statistics on the outstanding salesmen who account for half of the total sales based on their sales records.

esProc scripts:

http://1.bp.blogspot.com/-977UY30WFoE/UeYBdvunbpI/AAAAAAAABAw/-NeTQ5S82BE/s1600/set-operations.png


R scripts:

01 library(RODBC)

02 odbcDataSources()

03 conn<-odbcConnect("sqlsvr")

04 originalData<-sqlQuery(conn,'select * from salesOrder')

05 odbcClose(conn)

06 nameSum<-aggregate(originalData$sales,list(originalData$name),sum)

07 names(nameSum)<-c('name','salesSum')

08 orderData<-nameSum[rev(order(nameSum$salesSum)),]

09 halfSum<-sum(orderData$salesSum)/2

10 orderData$addup<-cumsum(orderData$salesSum)

11 subset(orderData,addup<=halfSum | (addup>halfSum & c( 0, addup[- length (addup)]) <halfSum))

 

esProc supports JDBC for reporting tools and Java codes to reference directly. By comparison, embedding R in Java is more complex due to its reliance on RServer library or Perl for transition and no simple interface is available.