# 6 Data Computing Tools

Posted by raqsoft in Java Development and Database Computation on Jan 23, 2014 3:35:00 AM

Recently, I finished a project that used Excel, R Project, and es-series in combination. During the work an idea occurred to me: why not put them side by side with MATLAB, SPSS, and Stata to introduce and compare Desktop BI tools? This essay is the result.

Desktop BI refers to BI tools that run in a desktop environment and require almost no server support. A typical Desktop BI tool provides only the core BI functions and makes few demands on the technical environment. By comparison, Solution BI software cannot operate without a dedicated server. Such products are usually integrated solutions or platforms with many semi-finished components built in. Besides the core BI functions, Solution BI also provides non-core functions such as permission management, resource sharing, and collaboration between jobs of various types. Desktop BI is the most common kind of tool for data computing and analysis.

Comparison results:

Technical requirements: Excel > ES Series > SPSS > MATLAB > Stata > R

Number of statistical models: R > Stata > SPSS > MATLAB > Excel > ES Series

API capability: R > Excel > Stata > MATLAB > SPSS > ES Series

Complex computing goals: ES Series > R > MATLAB > Excel > Stata > SPSS

Graphic capability: SPSS > R > MATLAB > ES Series > Stata > Excel

Learning curve: Excel > SPSS > ES Series > MATLAB > Stata > R

Interactive computing: ES Series > Excel > R > MATLAB > Stata > SPSS

Price: R > Excel > ES Series > Stata > MATLAB > SPSS

Note: In the comparison above, a tool on the left has the advantage over the tools to its right.

**Excel**

Of all BI software, which one has the largest market share, the largest user base, and the greatest growth each year? It is neither QlikView or Spotfire, nor SAS or SAP, but Excel. Does that surprise you? BI vendors are always running Excel down, describing it as an “inferior BI toy” or even “outdated BI”. But the figures don't lie: Gartner, many real BI users, and even these vendors have to admit that Excel is the most important BI tool, not just one of the most important.

Excel is an intuitive and flexible Desktop BI tool with a low technical threshold. BI aims to solve business problems. Who understands the business best? Needless to say, the business experts, who are the core users of BI software. Most of them do not have a strong technical background, so they want the technical threshold to be as low as possible. Excel may seem to lack computing power, and it handles abstract data structures less well than SQL and other scripting languages do. However, because users work directly on data they can see, Excel offers a flexible and natural way of computing. Business experts can turn their business algorithms into something the computer can run by following their own business reasoning. This expressive ability is what makes a BI tool valuable. Other BI tools are more powerful in computing, but they are too difficult for business experts to express business algorithms in. However powerful those tools are, business experts cannot take advantage of that power.

**R Project** http://www.r-project.org/

R Project holds the largest market share among open source BI tools. In the 2012 KDnuggets survey on “Top Analytics, Data Mining, and Big Data software used”, R took first place with 30.7% of the votes. It is often used for statistical analysis, matrix computing, and plotting, mainly in bioinformatics and partly in econometrics, financial analysis, and the humanities.

R features an interactive computing environment and abundant third-party libraries. It offers an intuitive way to view and refer to the results of previous computations. With its agile and elegant syntax, R users can process data step by step and easily decompose a complex computing goal into several simple ones. Such an interactive environment is ideal for solving complex and ambiguous BI problems. R is open-source software with massive function libraries and rapidly updated algorithms. Its extension interface supports various languages, so users can integrate third-party libraries easily, which has made it popular with a great many users.

On the other hand, R has drawbacks in UI friendliness and technical requirements, which hinder its wider adoption. RStudio and similar tools can make up for some of R's weakness in UI friendliness, but R is still a far cry from business software, and the need to learn the language is inherent and cannot be removed. Moreover, many people complain about R's relatively low computing speed and the unsatisfactory accuracy of some third-party packages.

Finally, thanks to R's powerful computing capability and open source nature, big data vendors such as Teradata, SAP, Oracle, and IBM have all declared their support for R, which keeps R in the spotlight.

**ES Series** http://www.raqsoft.com

ES Series is a next-generation Desktop BI tool, and one of the most promising in breaking through the limits of traditional BI. As a spreadsheet, es-series provides more powerful computing than Excel and is well suited to business personnel without a technical background who need to perform complex data computing. It implements the homocell model and visualizes the computing procedure. es-series supports intelligent formula pasting, which dramatically reduces manual operations, and free step-by-step computing, which allows free data manipulation; it also provides all-round set computing to handle complex computing easily. Users can perform table association to compute across multiple tables without writing any formula. These are exactly the areas where traditional spreadsheets hit bottlenecks.

As a data computing script, es-series has the same complete structured-data computing ability as SQL, with much lower technical requirements. It is as capable as R in interactive computing for solving complex problems, and it offers a friendlier interface. Its syntax is more intuitive and easier to understand, so it is a computing script that business personnel can grasp easily. Thanks to its distinguishing features, es-series is superior to SQL or R in many respects. For example, es-series can reach a complex computing goal more easily through step-by-step computing in the cellset; its support for explicit sets allows intuitive data manipulation from a business perspective, which reduces difficulty and improves readability; it supports object references, which give intuitive associative access across multiple tables; and it supports sorted sets for solving tough order-related problems.
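To make the ideas of object reference and sorted sets more concrete, here is a minimal sketch in Python. It is not es-series syntax, and the `Customer` and `Order` classes, their fields, and all the values are hypothetical names and data invented for illustration. Each order holds a direct reference to its customer instead of a key to be joined on, and an order-related calculation is done on a sorted list.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    city: str


@dataclass
class Order:
    order_id: int
    amount: float
    customer: Customer  # direct object reference, not a foreign key value


# Hypothetical data, made up for this illustration.
alice = Customer("Alice", "London")
bob = Customer("Bob", "Paris")
orders = [Order(1, 250.0, alice), Order(2, 90.0, bob), Order(3, 410.0, alice)]

# Associative access: reach the customer's city through the reference,
# without writing an explicit join between two tables.
london_total = sum(o.amount for o in orders if o.customer.city == "London")

# Order-related computation: sort the set, then compare each row
# with the row just before it.
by_amount = sorted(orders, key=lambda o: o.amount)
gaps = [cur.amount - prev.amount for prev, cur in zip(by_amount, by_amount[1:])]

print(london_total)
print(gaps)
```

In SQL the first calculation would typically need a join and the second a window function or self-join; with references and ordered sets both read close to how the business question is phrased.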

However, compared with Excel, es-series tools lack model algorithms such as regression analysis, as well as the collaboration and sharing features that Excel provides.

**SPSS** http://www-01.ibm.com/software/analytics/spss/

SPSS is known for its simplicity and friendliness and holds the largest market share. It provides a completely graphical UI with menu-style command options, so users can run the most common analysis modules without any scripting. Among these fixed analysis modules, SPSS is especially good at ANOVA and multivariate statistical analysis. In this respect it even does better than SAS, a Solution BI product.

The overall charting ability of SPSS is the best among all Desktop BI software. Although the charts plotted by R are also quite fine, the interactive plotting procedure of SPSS beats R completely. Almost everything in a chart can be altered. Whether at the chart design stage or at the result stage, users can directly change colors and line patterns. They can also add marker variables to a scatter diagram, switch from 2D to 3D, delete some data, change the basic chart type (for example, from a bar chart to a line chart), and add auxiliary lines at will. Drawing with SPSS gives you a feeling of complete freedom.

However, SPSS is comparatively rigid and can only perform analysis on fixed models. It is quite hard for SPSS users to compute anything outside those models. For example: first filter the analysis result by a keyword; then rank it by another column; next retrieve the top few rows; and finally convert the values in a certain column from US dollars to pounds. For this kind of task, the computing scripts of R, esProc, and MATLAB are naturally more powerful. SPSS therefore cannot be used for free computing and complex analysis. It is foolproof software. In addition, SPSS is quite expensive. Stata is comparatively more cost-effective.
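The four steps in that example can be sketched in a general-purpose scripting language. The following is a minimal Python illustration, not the syntax of R, esProc, or MATLAB. The column names, the sample rows, the function name `top_matches_in_gbp`, and the exchange rate are all assumptions made up for the example.

```python
# Hypothetical analysis result; every value here is invented.
rows = [
    {"product": "desk lamp", "region": "north", "sales_usd": 1200.0},
    {"product": "floor lamp", "region": "south", "sales_usd": 3400.0},
    {"product": "office chair", "region": "north", "sales_usd": 2100.0},
    {"product": "lamp shade", "region": "east", "sales_usd": 800.0},
]

USD_TO_GBP = 0.6  # assumed exchange rate, for illustration only


def top_matches_in_gbp(rows, keyword, top_n, rate):
    """Filter by keyword, rank by sales, keep the top rows, convert to pounds."""
    # Step 1: filter the result by a keyword.
    matched = [r for r in rows if keyword in r["product"]]
    # Step 2: rank by another column, largest value first.
    ranked = sorted(matched, key=lambda r: r["sales_usd"], reverse=True)
    # Step 3: retrieve the top few rows.
    top = ranked[:top_n]
    # Step 4: convert the unit of one column from US dollars to pounds.
    return [
        {
            "product": r["product"],
            "region": r["region"],
            "sales_gbp": round(r["sales_usd"] * rate, 2),
        }
        for r in top
    ]


result = top_matches_in_gbp(rows, "lamp", 2, USD_TO_GBP)
for row in result:
    print(row)
```

Each step is one line that works on the result of the previous step, which is the kind of free, step-by-step computing that a fixed menu of models does not offer.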

**Stata** http://www.stata.com

Stata can be regarded as a tool that sits between SPSS and R in price, interface friendliness, flexibility, and degree of freedom. Almost all of the fixed analysis models in SPSS have corresponding features in Stata. The difference is that SPSS provides a friendly interface for entering parameters and presenting results, while Stata only provides a command line prompt for input and text output in a console. In addition, Stata's regression analysis, such as OLS, is more powerful than that of several other tools. It has similar strengths in time series analysis and panel data analysis.
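For readers unfamiliar with OLS (ordinary least squares), the following is a minimal Python sketch of the textbook closed-form fit for a single explanatory variable. It is not Stata code and does not reflect how Stata implements regression; the function name `ols_fit` and the sample data are assumptions made up for illustration.

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = ols_fit(xs, ys)
print(intercept, slope)
```

A statistics package adds what this sketch leaves out, such as multiple explanatory variables, standard errors, and diagnostic tests.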

The charting ability of Stata is also fine, at almost the same level as that of es-series and R, although it is worse than that of SPSS.

I disagree with the opinion held by some people that Stata is a commercialized R. Although Stata is more extensible than SPSS, and third-party vendors can update its algorithms in a timely manner, it is still far less extensible than R. In addition, R, es-series, and MATLAB are the most typical computing scripts, which allow free analysis in a way similar to programming. By comparison, Stata is a bit rigid and awkward. We can reach this conclusion:

R can provide all the features of Stata, while Stata may lack some features of R. SPSS cannot provide a desired feature that it does not already have; Stata can provide it, but in a very awkward and rigid way; R can provide it normally; and esProc can provide it easily.

**MATLAB** http://www.mathworks.com/products/matlab/

MATLAB is a computing language and interactive environment for numerical calculation, algorithm development, and data analysis, and it lets users plot graphics and build user interfaces themselves. MATLAB is widely applied in industrial automation system design and analysis, as well as in image processing, signal processing, communications, and financial modeling and analysis.

At first glance, MATLAB and R appear to share many similarities in UI style, syntax structure, graphic capability, and other aspects. In fact, they differ greatly. MATLAB is short for Matrix Laboratory. As the name implies, it is at its best in matrix computation. MATLAB provides more mathematical functions than R. In addition, it provides many functions based on in-depth study of specific industries or disciplines, for example industrial data analysis, financial modeling, and the neural network toolbox. MATLAB is more professional than R in these areas. Its graphical operation capability is greater than that of R but worse than that of SPSS.

By comparison, R is a more expressive language and has more powerful statistical functions than MATLAB. In other words, with R it is simpler and more flexible to convert an algorithm on paper into a language that computers can understand. In addition, the statistical module of MATLAB is neither complete nor kept up to date.

As for price, MATLAB falls between SPSS and Stata, which makes it “a little expensive”.

The conventional conclusion would be that each of these tools has its strong points, so use them in combination. But based on my years of experience, I would like to offer a piece of personal advice: the tool that lets you express your thoughts freely is the best tool for you.
