
We often need to process text file data while programming. Here is an example that illustrates how to group and summarize text file data in Java: load employee information from the text file employee.txt, group it by DEPT, and compute COUNT, the number of employees, and the total SALARY of each group.

 

The text file employee.txt has the following format:

Java group1.jpg

In Java, the task is coded as follows:

  1. Import the data from the file row by row and save each row as emp, a Map object, collecting them in sourceList, a List object.
  2. Traverse sourceList, group the records by DEPT, and save the result in group, a Map object whose values are the per-department List objects.
  3. Traverse group, then traverse each department's list and sum up SALARY.
  4. While traversing group, save the values of DEPT, COUNT and SALARY in result, a Map object, and collect the results of the different departments in resultList, a List object.
  5. Print out the data of resultList.

The code is as follows:

public static void myGroup() throws Exception {
    File file = new File("D:\\esProc\\employee.txt");
    FileInputStream fis = new FileInputStream(file);
    InputStreamReader input = new InputStreamReader(fis);
    BufferedReader br = new BufferedReader(input);
    String line = null;
    String[] info = null;
    List<Map<String, String>> sourceList = new ArrayList<Map<String, String>>();
    List<Map<String, Object>> resultList = new ArrayList<Map<String, Object>>();
    try {
        if ((line = br.readLine()) == null) return; // skip the header row; exit if the file is empty
        while ((line = br.readLine()) != null) {
            info = line.split("\t");
            Map<String, String> emp = new HashMap<String, String>();
            emp.put("EID", info[0]);
            emp.put("NAME", info[1]);
            emp.put("SURNAME", info[2]);
            emp.put("GENDER", info[3]);
            emp.put("STATE", info[4]);
            emp.put("BIRTHDAY", info[5]);
            emp.put("HIREDATE", info[6]);
            emp.put("DEPT", info[7]);
            emp.put("SALARY", info[8]);
            sourceList.add(emp);
        }
    } finally {
        br.close();
    }
    Map<String, List<Map<String, String>>> group = new HashMap<String, List<Map<String, String>>>(); // grouping object
    for (int i = 0, len = sourceList.size(); i < len; i++) { // group data by DEPT
        Map<String, String> emp = sourceList.get(i);
        if (group.containsKey(emp.get("DEPT"))) {
            group.get(emp.get("DEPT")).add(emp);
        } else {
            List<Map<String, String>> list = new ArrayList<Map<String, String>>();
            list.add(emp);
            group.put(emp.get("DEPT"), list);
        }
    }
    Set<String> key = group.keySet();
    for (Iterator<String> it = key.iterator(); it.hasNext();) { // summarize the grouped data
        String dept = it.next();
        List<Map<String, String>> list = group.get(dept);
        double salary = 0;
        for (int i = 0, len = list.size(); i < len; i++) {
            salary += Double.parseDouble(list.get(i).get("SALARY")); // parse as double to avoid float rounding
        }
        Map<String, Object> result = new HashMap<String, Object>();
        result.put("DEPT", dept);
        result.put("SALARY", salary);
        result.put("COUNT", list.size());
        resultList.add(result);
    }
    for (int i = 0, len = resultList.size(); i < len; i++) { // print out the resulting data
        System.out.println("dept=" + resultList.get(i).get("DEPT") +
                "||salary=" + resultList.get(i).get("SALARY") +
                "||count=" + resultList.get(i).get("COUNT"));
    }
}

The results after the code is executed are as follows:

dept=Sales||salary=1362500.0||count=187

dept=Finance||salary=177500.0||count=24

dept=Administration||salary=40000.0||count=4

dept=Production||salary=663000.0||count=91

dept=Marketing||salary=733500.0||count=99

 

Here the myGroup function has only one grouping field. With multiple grouping fields, nested collection classes would be needed and the code would become more complicated. Since myGroup fixes its grouping and summarizing fields, any change to them means modifying the program, so the function cannot handle flexible, dynamic grouping and summarizing. To let it handle such situations the way an SQL statement does, we would need to develop an additional program for parsing and evaluating dynamic expressions, which is a rather difficult job.
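To see how far plain Java can go, here is a minimal sketch (the class and method names are illustrative, not from the article) that parameterizes the grouping fields with Java 8 streams. It removes some of the nesting boilerplate, but it still cannot accept a whole summarizing expression such as count(~):count as text, which is the gap the dynamic-expression evaluator would have to fill:

```java
import java.util.*;
import java.util.stream.*;

public class GroupDemo {
    // Group rows (field -> value maps) by an arbitrary list of field names,
    // then compute the row count and the SALARY sum per group. Because the
    // field names are passed in, changing the grouping no longer means
    // rewriting the loop, though the aggregates themselves stay hard-coded.
    public static Map<List<String>, double[]> groupAndSum(
            List<Map<String, String>> rows, List<String> groupFields) {
        return rows.stream().collect(Collectors.groupingBy(
                r -> groupFields.stream().map(r::get).collect(Collectors.toList()),
                Collectors.collectingAndThen(Collectors.toList(), list -> new double[]{
                        list.size(),
                        list.stream()
                            .mapToDouble(m -> Double.parseDouble(m.get("SALARY")))
                            .sum()
                })));
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("DEPT", "Sales", "SALARY", "9000"),
                Map.of("DEPT", "Sales", "SALARY", "11000"),
                Map.of("DEPT", "HR", "SALARY", "7000"));
        // count and salary sum for the Sales group
        System.out.println(Arrays.toString(groupAndSum(rows, List.of("DEPT"))
                .get(List.of("Sales")))); // prints [2.0, 20000.0]
    }
}
```

Passing List.of("DEPT", "GENDER") instead would group by two fields without touching the method body, which is exactly the flexibility the fixed myGroup lacks.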

 

esProc, a programming language specially designed for processing structured (and semi-structured) data, performs dynamic grouping and summarizing easily and serves as a better assistive tool here. It integrates with Java seamlessly, enabling Java to access and process text file data as dynamically as SQL does.

 

For example, to group by DEPT and compute COUNT, the number of employees, and the total SALARY of each group, esProc can accept an external input parameter “groupBy” as the dynamic grouping and summarizing condition. See the following chart:

Java group2.jpg          

The value of “groupBy” is DEPT:dept;count(~):count,sum(SALARY):salary. esProc needs only three lines of code as follows to do this job:

Java group3.jpg

A1: Define a file object and import the data. The first row is the header, and tab is the default field separator. esProc’s IDE can display the imported data visually, as in the right part of the above chart.

 

A2: Group and summarize according to the specified fields. Here a macro is used to parse the expression dynamically, with groupBy as an input parameter. esProc first computes the expression enclosed by ${…}, then replaces ${…} with the computed result as the macro string value, and finally interprets and executes the resulting code. In this example, the code finally executed is =A1.groups(DEPT:dept;count(~):count,sum(SALARY):salary).

 

A3: Return the result set to the external program.

When the grouping fields change, there is no need to modify the program; we just change the parameter groupBy. For example, to group by the two fields DEPT and GENDER and compute COUNT, the number of employees, and the total SALARY of each group, the value of parameter groupBy can be written as DEPT:dept,GENDER:gender;count(~):count,sum(SALARY):salary. After execution, the result set in A2 is as follows:

Java group4.jpg 

Finally, call this piece of esProc code in Java through the JDBC driver provided by esProc to get the grouping and summarizing result. Save the above esProc code as the file test.dfx; the Java calling code is as follows:

// create the esProc JDBC connection
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
// call the esProc code (the stored procedure); test is the file name of the dfx
com.esproc.jdbc.InternalCStatement st;
st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call test(?)");
// set the parameter: the dynamic grouping and summarizing fields
st.setObject(1, "DEPT:dept,GENDER:gender;count(~):count,sum(SALARY):salary");
// execute the esProc stored procedure
st.execute();
// get the result set
ResultSet set = st.getResultSet();

For relatively simple esProc code like this, Java can execute the script directly, making it unnecessary to write an esProc script file (like the above test.dfx). The code is as follows:

st = (com.esproc.jdbc.InternalCStatement) con.createStatement();
ResultSet set = st.executeQuery("=file(\"D:\\\\esProc\\\\employee.txt\").import@t().groups(DEPT:dept,GENDER:gender;count(~):count,sum(SALARY):salary)");

The above Java code directly executes a single line of esProc script: it reads data from the text file, groups and summarizes it by the specified fields, and returns the result to set, a Java ResultSet object.


In the last part we mentioned that the Operation Department of the Web Company brought a new demand: adding new conditions to the way online time is computed. As the IT department was using esProc as its computation tool, such requirement changes are easy to handle. Meanwhile, the growing amount of data can be accommodated by out-of-memory computation with esProc’s file cursor functionality.

 

First, let’s review how the user behavior information is recorded in the Web Company. Data is recorded in log files, with a separate file generated each day. For example, the log file “2014-01-07.log” contains the users’ online actions on January 7, 2014. To compute the online time for each user in the week of 2014-01-05 to 2014-01-11, we need to retrieve data from 7 log files:

logtime                             userid               action

2014-01-07 09:27:56        258872799       login

2014-01-07 09:27:57        264484116       login

2014-01-07 09:27:58        264484279       login

2014-01-07 09:27:58        264548231       login

2014-01-07 09:27:58        248900695       login

2014-01-07 09:28:00        263867071       login

2014-01-07 09:28:01        264548400       login

2014-01-07 09:28:02        264549535       login

2014-01-07 09:28:02        264483234       login

2014-01-07 09:28:03        264484643       login

2014-01-07 09:28:05        308343890       login

2014-01-07 09:28:08        1210636885     post

2014-01-07 09:28:09        263786154       login

2014-01-07 09:28:12        263340514       get

2014-01-07 09:28:13        312717032       login

2014-01-07 09:28:16        263210957       login

2014-01-07 09:28:19        116285288       login

2014-01-07 09:28:22        311560888       login

2014-01-07 09:28:25        652277973       login

2014-01-07 09:28:34        310100518       login

2014-01-07 09:28:38        1513040773     login

2014-01-07 09:28:41        1326724709     logout

2014-01-07 09:28:45        191382377       login

2014-01-07 09:28:46        241719423       login

2014-01-07 09:28:46        245054760       login

2014-01-07 09:28:46        1231483493     get

2014-01-07 09:28:48        266079580       get

2014-01-07 09:28:51        1081189909     post

2014-01-07 09:28:51        312718109       login

2014-01-07 09:29:00        1060091317     login

2014-01-07 09:29:02        1917203557     login

2014-01-07 09:29:16        271415361       login

2014-01-07 09:29:18        277849970       login

 

Log files record, in chronological order, users’ operations (action), user IDs (userid) and the times when the actions took place (logtime). User operations fall into three types: login, logout, and get/post actions.

Previously, the Operation Department provided the following requirements for computing users’ online time:

  1. Login is the starting point of online time, and overnight sessions should be taken into consideration.
  2. If the time interval between any two operations is less than 3 seconds, this interval is not added to the online time.
  3. If, after login, the time interval between two operations is longer than 600 seconds, the user is considered logged out.
  4. If there is a login without a logout, the last operation time is treated as the logout time.

 

Over time, the Operation Department found some “key points” in user behavior: between login and logout, users who perform post actions are more loyal to the online application. Therefore, the Web Company plans to introduce an incentive: on top of the original rules, if a user performs a post operation, his/her online time is tripled in the computation.
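A simplified single-user sketch in Java may make these rules concrete (the Event type and method name are illustrative; the overnight handling and cursor-based batching of the real esProc script are omitted):

```java
import java.util.List;

public class OnlineTime {
    // One log record: time in epoch seconds plus the action name.
    public static class Event {
        final long t;
        final String action;
        public Event(long t, String action) { this.t = t; this.action = action; }
    }

    // Online seconds for one user's chronologically ordered events, following
    // the rules above: gaps under 3 s don't count, a gap over 600 s ends the
    // session, a missing logout uses the last operation, and a session that
    // contains a "post" is counted triple.
    public static long onlineSeconds(List<Event> events) {
        long total = 0, session = 0;
        boolean inSession = false, hasPost = false;
        Event prev = null;
        for (Event e : events) {
            if (inSession && prev != null) {
                long gap = e.t - prev.t;
                if (gap > 600) {            // rule 3: treat the user as logged out
                    total += hasPost ? session * 3 : session;
                    inSession = false;
                } else if (gap >= 3) {      // rule 2: gaps under 3 s are not added
                    session += gap;
                }
            }
            if (!inSession && e.action.equals("login")) {
                inSession = true;
                session = 0;
                hasPost = false;
            } else if (inSession && e.action.equals("post")) {
                hasPost = true;             // the new incentive rule
            } else if (inSession && e.action.equals("logout")) {
                total += hasPost ? session * 3 : session;
                inSession = false;
            }
            prev = e;
        }
        if (inSession) total += hasPost ? session * 3 : session; // rule 4
        return total;
    }
}
```

For example, a session login(t=0), get(t=100), post(t=200), logout(t=300) accumulates 300 seconds and, containing a post, is counted as 900.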

 

After receiving the task, the IT engineer considered the possibility of future adjustments to the computation, plus the need for added conditions, and decided to use an out-of-memory cursor and a for loop to implement it.

 

Analysis shows that most user behavior analysis is done for each user independently. Thus, if the log files are pre-sorted by userid, the performance of the various analytical computations improves, with reduced difficulty and shortened processing time. The pre-processing program is as follows:

user online time II 1.jpg

As we can see, pre-processing means sorting the seven days’ log files and outputting them to a binary file, eliminating the need for subsequent merging and sorting. Meanwhile, the binary file format provided by esProc also helps to raise read/write performance.
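The idea of that pre-sort can be sketched in plain Java (the line format and method name are assumptions; esProc does this with its own sort-and-export primitives over the real seven files):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PreSort {
    // Sort raw log lines of the form "logtime<TAB>userid<TAB>action" by userid
    // first and logtime second, so each user's events become contiguous and can
    // later be processed in a single pass.
    public static List<String> sortByUser(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator
                .comparing((String l) -> l.split("\t")[1])  // userid (string order is enough to make users contiguous)
                .thenComparing(l -> l.split("\t")[0]));     // logtime (ISO format sorts lexically)
        return sorted;
    }
}
```

After this step, every per-user computation becomes a scan over one contiguous run of lines instead of a search across seven files.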

After pre-processing, the code for computing online time can be written as follows:

user online time II 2.jpg

Note that:

  1. The volume of data for one user over seven days is not big, so in cell A5 we can retrieve all log data for a user into memory in one batch.
  2. In the loop that cycles over each user, the code in the red box implements the new business logic: a login session containing a post operation has its online time tripled. Unqualified records are removed in cell B9, and in B10 a serial number (lognum) is calculated for every login. Records are grouped by lognum to compute the sum of onlinetime for each group; if there is at least one post action in the current group of operations, the group’s onlinetime sum is tripled.
  3. Considering the relatively large result, each time the computation has been done for 10,000 users and the result reaches 10,000 lines, we do a batch output of the data from memory to a result file. This improves performance while avoiding memory overflow.
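Point 3 above can be sketched as a small buffered writer (the class name and batch-size parameter are illustrative, not the article's actual code):

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class BatchWriter {
    private final List<String> buffer = new ArrayList<>();
    private final Writer out;
    private final int batchSize;

    public BatchWriter(Writer out, int batchSize) {
        this.out = out;
        this.batchSize = batchSize;
    }

    // Buffer result rows and flush every batchSize rows, so the complete
    // result set never needs to be held in memory at once.
    public void add(String row) throws IOException {
        buffer.add(row);
        if (buffer.size() >= batchSize) flush();
    }

    public void flush() throws IOException {
        for (String r : buffer) out.write(r + "\n");
        out.flush();
        buffer.clear();
    }

    // A tiny self-contained demonstration against an in-memory writer.
    public static String demo() {
        try {
            StringWriter w = new StringWriter();
            BatchWriter b = new BatchWriter(w, 2);
            b.add("a");
            b.add("b"); // the second add triggers the first flush
            b.add("c");
            b.flush();  // final flush for the partial batch
            return w.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

With batchSize set to 10,000 and a FileWriter in place of the StringWriter, this is the same flush-per-batch pattern the esProc script applies.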

 

After meeting this demand, the IT engineers in the Web Company found that the single-threaded program does not take full advantage of the server’s computing power. Here comes another question: can the engineers leverage esProc’s multi-threaded parallel computing capability to make full use of the server’s quad dual-core CPUs? Is it troublesome to shift from single-threaded to multi-threaded? See “Computing the Online Time for Users with esProc (III)”.

Here I provide a more convenient way to access and process HTTP data, illustrating some use cases in esProc, an innovative data processing language.

t4.jpg

In this example, a servlet provides external queries of employee information in JSON format. The servlet accesses the employee table in the database, which stores employee information as follows:

EID  NAME      SURNAME   GENDER  STATE        BIRTHDAY    HIREDATE    DEPT       SALARY
1    Rebecca   Moore     F       California   1974-11-20  2005-03-11  R&D        7000
2    Ashley    Wilson    F       New York     1980-07-19  2008-03-16  Finance    11000
3    Rachel    Johnson   F       New Mexico   1970-12-17  2010-12-01  Sales      9000
4    Emily     Smith     F       Texas        1985-03-07  2006-08-15  HR         7000
5    Ashley    Smith     F       Texas        1975-05-13  2004-07-30  R&D        16000
6    Matthew   Johnson   M       California   1984-07-07  2005-07-07  Sales      11000
7    Alexis    Smith     F       Illinois     1972-08-16  2002-08-16  Sales      9000
8    Megan     Wilson    F       California   1979-04-19  1984-04-19  Marketing  11000
9    Victoria  Davis     F       Texas        1983-12-07  2009-12-07  HR         3000

The servlet’s doGet function receives employee id strings in JSON format, queries the corresponding employee information from the database, generates an employee information list in JSON format, and returns it. The process of reading the database and generating the employee information is omitted in the following code:

protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
    String inputString = (String) req.getParameter("input");
    // inputString's value is "[{EID:8},{EID:32},{EID:44}]"
    if (inputString == null) inputString = "";
    String outputString = "";

    {...} // code for querying the database with inputString and generating outputString is omitted here

    // the generated outputString looks like
    // "[{EID:8,NAME:\"Megan\",SURNAME:\"Wilson\",GENDER:\"F\",STATE:\..."
    resp.setContentType("text/json; charset=GBK"); // set the content type before writing the body
    resp.getOutputStream().println(outputString);
}

 

The following code can be used for esProc to access this http servlet:

HTTP1.jpg

A1: Define the input parameter to be submitted to the servlet, i.e. the employee id list in JSON format.

A2: Define the httpfile object; the URL is http://localhost:8080/demo/testServlet?input=[{EID:8},{EID:32},{EID:44}].

A3: Import the result returned by the httpfile object in A2.

A4: Parse the JSON-format information of each employee by rows and create a sequence.

A5: Compute on the table sequence in A4, combining SURNAME and NAME into FULLNAME.

A6: Export the results of A5 to a text file.

esProc JDBC is like an incomplete database JDBC driver without physical tables. It regards esProc as a database that only has stored procedures (strong computing power, weak storage mechanism). Similar to using a database JDBC driver, esProc JDBC calls an esProc program as a stored procedure. The difference is that esProc JDBC is a fully embedded computing engine: all computations are completed in the embedded package rather than by an independent server as with databases.

 

1 Loading driver

The jars needed by esProc JDBC are dm.jar, poi-3.7-20101029.jar, log4j_128.jar, icu4j_3_4_5.jar and dom4j-1.6.1.jar. These five jars can be found in \esProc\lib under esProc’s IDE installation directory. Load them when starting the Java application; for a web application they can be put in the WEB-INF/lib directory.

Note that esProc JDBC requires JDK 1.6 or higher.

 

2 Modifying configuration files config.xml and dfxConfig.xml

Prepare the file config.xml, which contains esProc’s basic configuration information, such as the registration code, addressing path, main path and data source configuration. The file can be found in the esProc\config directory under esProc’s installation directory, and its configuration can be modified before deployment (for a detailed explanation of the configuration information, see the Appendix).

Configuring authorization information

Configure as follows in file config.xml:

<regCode> license</regCode>

Here license represents the authorization code. esProc currently provides users with free distributions for integration; free authorization codes are available on the official website.

Save config.xml and dfxConfig.xml in classpath of application projects.

Note that the names of the configuration files must be config.xml and dfxConfig.xml and cannot be changed. Also, re-entry is forbidden: when configuring database connection information, esProc JDBC itself cannot be used as a data source and configured as a database connection.

 

3 Deploying esProc program

Put the pre-edited esProc script (dfx file) in classpath of the application project, or put it in the path designated by <paths/> node of dfxConfig.xml file.

 

4 Java’s calling of esProc program

We’ll look at the cases where an esProc program returns a single result set and where it returns multiple result sets.


4.1 Single result set

esProc script

JDBC integration1.jpg  

Java’s calling

public class CallSingleRS {
    public void testDataServer() {
        Connection con = null;
        com.esproc.jdbc.InternalCStatement st;
        try {
            // create a connection
            Class.forName("com.esproc.jdbc.InternalDriver");
            con = (Connection) DriverManager.getConnection("jdbc:esproc:local://");
            // call the stored procedure; the dfx file's name is singleRs
            st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call singleRs(?)");
            // set the parameter
            st.setObject(1, "5");
            // execute the stored procedure
            st.execute();
            // get the result set
            ResultSet rs = (ResultSet) st.getResultSet();
        } catch (Exception e) {
            System.out.println(e);
        } finally {
            // close the connection
            if (con != null) {
                try {
                    con.close();
                } catch (Exception e) {
                    System.out.println(e);
                }
            }
        }
    }
}


4.2 Multiple result sets

esProc script

  JDBC ingegration 2.jpg

Java’s calling

Here only the main code is provided; the rest is the same as in the previous example.

 

// call the stored procedure
st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call MutipleRs()");
// execute the stored procedure
boolean hasResult = st.execute();
// if the execution returns result sets
if (hasResult) {
    // get the multiple result sets
    ResultSet set = st.getResultSet();
    int csize = set.getMetaData().getColumnCount();
    // the multiple result sets are data rows of one column, so csize is 1
    while (set.next()) {
        Object o = set.getObject(1);
        // each read returns one table sequence; the table sequences of A2 and A3 are retrieved in turn
    }
}

 

 

Appendix Explanation of Configuration Information

config.xml

<?xml version="1.0" encoding="UTF-8"?>
<Config Version="1">
    <regCode>W2r@7-8cLMJ-GVU33-BhTIB3E</regCode>
    <charSet>ISO-8859-1</charSet>
    <!-- Configure the addressing path of dfx files, an absolute path. Multiple paths can be set, separated by semicolons. dfx files can also be put in the classpath of the application project; the classpath takes precedence over the addressing path when loading files -->
    <dfxPathList>
        <dfxPath>D:\files\dfx</dfxPath>
    </dfxPathList>
    <dateFormat>yyyy-MM-dd</dateFormat>
    <timeFormat>HH:mm:ss</timeFormat>
    <dateTimeFormat>yyyy-MM-dd HH:mm:ss</dateTimeFormat>
    <!-- Configuration method one: configure the connection pool in the application server and designate the data source name here -->
    <DBList>
        <!-- The data source name must be consistent with that in the dfx file -->
        <DB name="demo">
            <property name="url" value="jdbc:hsqldb:hsql://127.0.0.1/demo"/>
            <property name="driver" value="org.hsqldb.jdbcDriver"/>
            <property name="type" value="HSQL"/>
            <property name="user" value="sa"/>
            <property name="password" value=""/>
            <property name="batchSize" value="1000"/>
            <!-- Whether to connect automatically. If true, the db.query() function can be used directly to access the database; if false, the connection is not automatic and the connect(db) statement must be used first -->
            <property name="autoConnect" value="true"/>
            <property name="useSchema" value="false"/>
            <property name="addTilde" value="false"/>
        </DB>
    </DBList>
    <mainPath>D:\tools\raqsoft\main</mainPath>
    <tempPath>D:\tools\raqsoft\main\temp</tempPath>
    <bufSize>65536</bufSize>
    <localHost>192.168.0.86</localHost>
    <localPort>8282</localPort>
</Config>

dfxConfig.xml

<?xml version="1.0" encoding="UTF-8"?>
<dfxConfig>
    <!-- Maximum concurrency. Sets the maximum number of jobs allowed to run concurrently; jobs operating simultaneously in the same connection also count as concurrent jobs. The maximum must be within the authorized limit. -->
    <maxConcurrents>10</maxConcurrents>
    <!-- The longest wait time. If the tasks in operation exceed the maximum concurrency, the extra tasks wait; if the wait time reaches this limit, an error message is shown. -->
    <maxWaitTime>5000</maxWaitTime>
    <!-- The log configuration property file -->
    <log>logger.properties</log>
    <!-- Configure the connection pool in the application server and designate the data source name here -->
    <jndi-ds-configs>
        <!-- jndi-prefix -->
        <jndi-prefix>java:comp/env</jndi-prefix>
        <!-- The data source name must be consistent with that in the dfx files -->
        <jndi-ds-config>
            <name>olap</name>
            <dbType>ACCESS</dbType>
            <dbCharset>ISO-8859-1</dbCharset>
            <clientCharset>ISO-8859-1</clientCharset>
            <needTranContent>false</needTranContent>
            <needTranSentence>false</needTranSentence>
            <!-- Whether to connect automatically. If true, the db.query() function can be used directly to access the database; if false, the connection is not automatic and the connect(db) statement must be used first -->
            <autoConnect>true</autoConnect>
        </jndi-ds-config>
    </jndi-ds-configs>
</dfxConfig>

We often encounter situations requiring text file data processing. Here we’ll look at how to perform conditional filtering on text files in Java through an example: read employee information from the text file employee.txt and select the female employees born on or after January 1, 1981.

The text file employee.txt has the following format:

EID  NAME      SURNAME   GENDER  STATE         BIRTHDAY    HIREDATE    DEPT       SALARY
1    Rebecca   Moore     F       California    1974-11-20  2005-03-11  R&D        7000
2    Ashley    Wilson    F       New York      1980-07-19  2008-03-16  Finance    11000
3    Rachel    Johnson   F       New Mexico    1970-12-17  2010-12-01  Sales      9000
4    Emily     Smith     F       Texas         1985-03-07  2006-08-15  HR         7000
5    Ashley    Smith     F       Texas         1975-05-13  2004-07-30  R&D        16000
6    Matthew   Johnson   M       California    1984-07-07  2005-07-07  Sales      11000
7    Alexis    Smith     F       Illinois      1972-08-16  2002-08-16  Sales      9000
8    Megan     Wilson    F       California    1979-04-19  1984-04-19  Marketing  11000
9    Victoria  Davis     F       Texas         1983-12-07  2009-12-07  HR         3000
10   Ryan      Johnson   M       Pennsylvania  1976-03-12  2006-03-12  R&D        13000
11   Jacob     Moore     M       Texas         1974-12-16  2004-12-16  Sales      12000
12   Jessica   Davis     F       New York      1980-09-11  2008-09-11  Sales      7000
13   Daniel    Davis     M       Florida       1982-05-14  2010-05-14  Finance    10000


In Java, the code reads data from the file row by row, saves it in a List object, traverses the List, and saves the eligible records in a resulting List object. Lastly, it prints out the number of eligible employees. The detailed code is as follows:

public static void myFilter() throws Exception {
    File file = new File("D:\\employee.txt");
    FileInputStream fis = new FileInputStream(file);
    InputStreamReader input = new InputStreamReader(fis);
    BufferedReader br = new BufferedReader(input);
    String line = null;
    String[] info = null;
    List<Map<String, String>> sourceList = new ArrayList<Map<String, String>>();
    List<Map<String, String>> resultList = new ArrayList<Map<String, String>>();
    if ((line = br.readLine()) == null) return; // skip the first line; exit if the file is empty
    while ((line = br.readLine()) != null) { // import from the file into memory
        info = line.split("\t");
        Map<String, String> emp = new HashMap<String, String>();
        emp.put("EID", info[0]);
        emp.put("NAME", info[1]);
        emp.put("SURNAME", info[2]);
        emp.put("GENDER", info[3]);
        emp.put("STATE", info[4]);
        emp.put("BIRTHDAY", info[5]);
        sourceList.add(emp);
    }
    br.close();
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
    for (int i = 0, len = sourceList.size(); i < len; i++) { // process data by rows
        Map<String, String> emp = sourceList.get(i);
        if (emp.get("GENDER").equals("F")
                && !sdf.parse(emp.get("BIRTHDAY")).before(sdf.parse("1981-01-01"))) {
            // save the eligible records via the conditional statement
            resultList.add(emp);
        }
    }
    System.out.println("count=" + resultList.size()); // print out the number of eligible employees
}


The filtering condition of this function is fixed; if the condition changes, the conditional statement must be modified accordingly. Multiple pieces of code are needed for multiple conditions, and the program cannot handle ad hoc, dynamic conditions. Now we’ll rewrite the code to make it somewhat more universal by slightly changing the loop that traverses sourceList:

for (int i = 0, len = sourceList.size(); i < len; i++) {
    Map<String, String> emp = sourceList.get(i);
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
    boolean isRight = true;
    if (gender != null && !emp.get("GENDER").equals(gender)) { // process the gender condition
        isRight = false;
    }
    if (start != null && sdf.parse(emp.get("BIRTHDAY")).before(start)) { // process the starting condition on BIRTHDAY
        isRight = false;
    }
    if (end != null && sdf.parse(emp.get("BIRTHDAY")).after(end)) { // process the end condition on BIRTHDAY
        isRight = false;
    }
    if (isRight) resultList.add(emp); // save the eligible records in the resulting list
}

In the rewritten code, gender, start and end are input parameters of the function myFilter. The program handles the situations where the GENDER field equals the input value gender, and the BIRTHDAY field is greater than or equal to the input value start and less than or equal to the input value end. If any of the input values is null, that condition is ignored. The conditions are joined by AND.


If we want to make myFilter a truly universal function, for example joining conditions with OR or allowing computation between fields, the code becomes much more complicated and requires a program for parsing and evaluating dynamic expressions. Such a program could be as flexible and universal as database SQL, but it is really difficult to develop.
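Short of a full expression evaluator, Java's Predicate composition covers part of this ground: conditions can be assembled with and()/or() at run time, though they still cannot arrive as a text string. A sketch, with all names hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterDemo {
    // Apply any composed condition to the employee rows.
    public static List<Map<String, String>> filter(
            List<Map<String, String>> rows, Predicate<Map<String, String>> cond) {
        return rows.stream().filter(cond).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Predicate<Map<String, String>> female = r -> "F".equals(r.get("GENDER"));
        Predicate<Map<String, String>> born1981on =
                r -> r.get("BIRTHDAY").compareTo("1981-01-01") >= 0; // ISO dates compare lexically
        List<Map<String, String>> rows = List.of(
                Map.of("GENDER", "F", "BIRTHDAY", "1985-03-07"),
                Map.of("GENDER", "F", "BIRTHDAY", "1975-05-13"),
                Map.of("GENDER", "M", "BIRTHDAY", "1984-07-07"));
        // Conditions are combined at run time; and() is used here, or() works the same way.
        System.out.println(filter(rows, female.and(born1981on)).size()); // prints 1
    }
}
```

This removes the per-condition if blocks, but a condition typed in by a user at run time still needs parsing, which is where an expression-evaluating tool earns its keep.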


In view of this, we can turn to esProc for this task. esProc is a programming language designed for processing structured (and semi-structured) data. It performs the above universal query easily and integrates with Java seamlessly, so Java can access and process text file data as flexibly as SQL does. For example, to query the female employees born on or after January 1, 1981, esProc can accept an external input parameter “where” as the dynamic condition; see the following chart:

   java filter1.jpg

The value of "where" is: BIRTHDAY>=date(1981,1,1) && GENDER=="F". esProc needs only three lines of code, as follows:

java filter2.jpg

 

A1: Define a file object and import its data. By default the first row is the header and tab is the field separator. esProc's IDE can display the imported data visually, as shown on the right of the chart above.

A2: Filter according to the condition. A macro is used here to evaluate the expression dynamically. "where" is the input parameter. esProc first computes the expression enclosed in ${…}, then replaces ${…} with the computed result as the macro string value, and finally interprets and executes the resulting code. In this example, the code finally executed is =A1.select(BIRTHDAY>=date(1981,1,1) && GENDER=="F").

A3: Return the eligible result set to the external program.
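The macro substitution in A2 behaves much like plain string splicing before interpretation. As a rough Java analogy (the template marker and method names here are illustrative, not part of esProc's API):

```java
public class MacroDemo {
    // Sketch of what the ${where} macro does in A2: the parameter value is
    // spliced into the expression text before the expression is interpreted.
    static String expand(String template, String where) {
        return template.replace("${where}", where);
    }

    public static void main(String[] args) {
        String where = "BIRTHDAY>=date(1981,1,1) && GENDER==\"F\"";
        System.out.println(expand("=A1.select(${where})", where));
        // prints: =A1.select(BIRTHDAY>=date(1981,1,1) && GENDER=="F")
    }
}
```

The key point is that substitution happens first, so the final text is ordinary esProc code by the time it is executed.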


When the filtering condition changes, we just need to change the parameter "where" without rewriting the code. For example, suppose the condition is modified to query female employees born on or after January 1, 1981, or employees whose NAME+SURNAME equals "RebeccaMoore". The parameter value for where can then be: BIRTHDAY>=date(1981,1,1) && GENDER=="F" || NAME+SURNAME=="RebeccaMoore". After execution, the result set in A2 is shown in the following chart:

java filter3.jpg

Finally, call this piece of esProc code from Java to get the filtering result, using the JDBC driver provided by esProc. Save the above esProc code as the file test.dfx; the Java code that calls it is as follows:

       // create esProc JDBC connection

       Class.forName("com.esproc.jdbc.InternalDriver");

       con= DriverManager.getConnection("jdbc:esproc:local://");

       //call the esProc program (the stored procedure); test is the file name of the dfx

       st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test(?)");

       //set parameters

       st.setObject(1,"BIRTHDAY>=date(1981,1,1) && GENDER==\"F\" || NAME+SURNAME==\"RebeccaMoore\"");//the parameter is the dynamic filtering condition

       // execute esProc stored procedure

       st.execute();

       //get the result set: a set of eligible employees

       ResultSet set = st.getResultSet();


When the script is relatively simple, we may write the esProc code directly into the Java code that calls esProc JDBC. This saves us from having to write the esProc script file (test.dfx):

st=(com.esproc.jdbc.InternalCStatement)con.createStatement();

ResultSet set=st.executeQuery("=file(\"D:\\\\esProc\\\\employee.txt\").import@t().select(BIRTHDAY>=date(1981,1,1)&&GENDER==\"F\" || NAME+SURNAME==\"RebeccaMoore\")");


This piece of Java code directly executes a line of esProc script: get data from the text file, filter it according to the specified condition, and return the result to set, the ResultSet object.

As the operator of an online system, the Web Company regards the time users spend with its online application as a key analysis scenario. Specifically, online time refers to the cumulative time a user spends with the online business application over a certain period.


As the company's online application has evolved, the total number of users has grown and user behavior analysis has become more complex. Here we use the example of computing users' online time to show various computing scenarios, ranging from simple to complex, hoping it can serve as a reference for similar development projects. In fact, the following approach also applies to other categories of user behavior analysis, such as user activity levels, user churn, etc.


Let's start from the time when the application had just gone online. The Operation Department needed to know users' weekly online time with the application. For this, the engineers from the IT Department provided the following solution.


In the Web Company, user behavior information is recorded in log files, with a separate log file generated every day. For example, the following log file, "2014-01-07.log", contains users' online actions on January 7, 2014. To compute users' online time for the week of 2014-01-05 to 2014-01-11, we need to retrieve data from 7 log files:

logtime userid action

2014-01-07 09:27:56 258872799 login

2014-01-07 09:27:57 264484116 login

2014-01-07 09:27:58 264484279 login

2014-01-07 09:27:58 264548231 login

2014-01-07 09:27:58 248900695 login

2014-01-07 09:28:00 263867071 login

2014-01-07 09:28:01 264548400 login

2014-01-07 09:28:02 264549535 login

2014-01-07 09:28:02 264483234 login

2014-01-07 09:28:03 264484643 login

2014-01-07 09:28:05 308343890 login

2014-01-07 09:28:08 1210636885 post

2014-01-07 09:28:09 263786154 login

2014-01-07 09:28:12 263340514 get

2014-01-07 09:28:13 312717032 login

2014-01-07 09:28:16 263210957 login

2014-01-07 09:28:19 116285288 login

2014-01-07 09:28:22 311560888 login

2014-01-07 09:28:25 652277973 login

2014-01-07 09:28:34 310100518 login

2014-01-07 09:28:38 1513040773 login

2014-01-07 09:28:41 1326724709 logout

2014-01-07 09:28:45 191382377 login

2014-01-07 09:28:46 241719423 login

2014-01-07 09:28:46 245054760 login

2014-01-07 09:28:46 1231483493 get

2014-01-07 09:28:48 266079580 get

2014-01-07 09:28:51 1081189909 post

2014-01-07 09:28:51 312718109 login

2014-01-07 09:29:00 1060091317 login

2014-01-07 09:29:02 1917203557 login

2014-01-07 09:29:16 271415361 login

2014-01-07 09:29:18 277849970 login


Log files record, in chronological order, users' operations (action), user IDs (userid) and the times when the actions took place (logtime). User operations fall into three types: login, logout and get/post actions.

The Operation Department provided the following requirements for computing users' online time:

1. Login should be considered the starting point of online time, and overnight sessions should be taken into consideration.

2. If the time interval between two operations is less than 3 seconds, this interval should not be added to the online time.

3. If, after login, the time interval between two operations is longer than 600 seconds, the user should be considered logged out.

4. If there is only a login without a logout, the time of the last operation should be treated as the logout time.


As the online application had just been rolled out, the data volume of the log files was relatively small. To compute on the log data for 2014-01-05 to 2014-01-11, we can retrieve all the data into memory in one batch and write the result out to a file. Thus all the code here is written for in-memory computing.


The IT Department leveraged esProc to meet the above requirements.

The actual code is as follows:

 

esproc online time calculation 1.png

The ideas behind the program design are:

1. First, retrieve all log files for the week (2014-01-05 to 2014-01-11) and merge them in chronological order, sorting them by userid and logtime. Add two extra fields, onlinetime and loginflag, for subsequent calculations.

2. Onlinetime holds the interval between two operations by the same user. If the difference between the operation time of the current line and the last one is less than 3 seconds, or if the userid of the current operation does not equal that of the last one, onlinetime is set directly to 0.

3. Loginflag indicates a valid onlinetime. If onlinetime does not exceed 10 minutes (600 seconds), or the operation type is logout, loginflag is set to true; otherwise it is set to false. For a login operation, loginflag is set directly to true.

4. On the sorted table resulting from the previous steps, compute loginflag again. If loginflag was originally false, leave it false. If it was originally true, the type of the last operation determines the new value: if the last operation was login, loginflag stays true; otherwise it is set to false.

5. On the sorted table resulting from the previous steps, group the data by userid and sum the onlinetime of all records whose loginflag is true. This gives the total online time for each user.

6. Output the result of the last step to a file, onlinetime.data.
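The interval rules at the heart of the steps above can be sketched in plain Java. This is a deliberately simplified in-memory version: events are assumed already sorted by userid and logtime, and an interval between two consecutive events of the same user is counted only when it is at least 3 seconds (rule 2) and at most 600 seconds (rule 3). The class and field names are illustrative, not taken from the esProc script:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.*;

public class OnlineTime {
    static final DateTimeFormatter F = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Each event is {logtime, userid, action}; returns total online seconds per user.
    static Map<String, Long> compute(List<String[]> events) {
        Map<String, Long> seconds = new LinkedHashMap<>();
        for (int i = 1; i < events.size(); i++) {
            String[] prev = events.get(i - 1), cur = events.get(i);
            if (!prev[1].equals(cur[1])) continue;          // different user: interval set to 0
            long gap = Duration.between(
                    LocalDateTime.parse(prev[0], F),
                    LocalDateTime.parse(cur[0], F)).getSeconds();
            if (gap >= 3 && gap <= 600)                     // rules 2 and 3
                seconds.merge(cur[1], gap, Long::sum);
        }
        return seconds;
    }

    public static void main(String[] args) {
        List<String[]> events = Arrays.asList(
            new String[]{"2014-01-07 09:00:00", "1001", "login"},
            new String[]{"2014-01-07 09:00:02", "1001", "get"},    // 2 s gap: ignored
            new String[]{"2014-01-07 09:05:02", "1001", "post"},   // 300 s gap: counted
            new String[]{"2014-01-07 09:30:02", "1001", "get"},    // 1500 s gap: treated as logged out
            new String[]{"2014-01-07 09:30:10", "1001", "logout"}  // 8 s gap: counted
        );
        System.out.println(compute(events)); // {1001=308}
    }
}
```

The esProc version additionally tracks loginflag to handle the login/logout bookkeeping of rules 1 and 4; this sketch only shows the core interval accounting.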

 

The advantage of the above code lies in its step-by-step computation, which is easy to maintain and modify.

After it had run for a while, new problems emerged. On the one hand, the Operation Department said that the original way of computing online time should be adjusted, with new conditions added. On the other hand, with the increase in users, the log files have grown too big to fit into memory in one batch. How should the IT Department cope with these changes in requirements? See "Cases to carry out the statistics on online time of web applications (II)".

esProc can process json data. Here we introduce some of its applications through examples: 1. Analyzing and generating json data with esProc; 2. Interchanging data between esProc and an application program through json; 3. Reading json file data in esProc.

A  Analyzing and generating json data with esProc

Generally speaking, json is a format used by webpage js programs and Java server-side programs (such as servlets) to interchange data, while data access between Java server-side programs and databases uses the SQL result set format. esProc can act as an intermediary in data computation and data interchange between the two formats.

In this example, we use esProc to query the detailed information of a group of designated employees. Both data input and output adopt the json format. Table employee of the database demo contains all employee information:

json data 1.jpg

esProc receives an EID list in json format and returns the corresponding detailed employee information in json format. The code is as follows:

1. esProc program test.dfx receives a parameter: jsonEID.

json data 2.jpg

2. esProc completes json parsing and data processing, and generates results in json format:

json data 3.jpg

json data 5.jpg

 

A1: Connect to database demo.

A2: Retrieve data from table employee.

A3: Use the import@j function to parse the input jsonEID parameter (an EID list in json format) and generate a table sequence containing only one field, EID.

A4: Use the align function to get from the employee data the employee information designated by A3.

A5: Convert the employee information into json strings.

A6: Return the employee information in json format.

B  Interchanging json data between esProc and Java application

In the above example, the esProc program is saved as the file test.dfx to be called by the Java application. The steps for calling it are as follows:

1. Deploy esProc in Java application.

See the esProc Tutorial for details.

2. Call test.dfx in the Java application.

A code example is as follows:

public void testDataServer(){

                   Connection con = null;

                   com.esproc.jdbc.InternalCStatement st;

                   try{

                            // The user id list in json format can be transmitted from the browser side to the program and converted into a string for use. The process of receiving the json data is omitted here and the value is assigned directly

                            String jsonEid="[{EID:8},{EID:32},{EID:44}]";

                            // Create a connection

                            Class.forName("com.esproc.jdbc.InternalDriver");

                            con= DriverManager.getConnection("jdbc:esproc:local://");

                            // Call stored procedure. test is the file name of dfx

                            st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test(?)");

                            // Set parameters

                            st.setObject(1,jsonEid);

                            // Execute stored procedure

                            st.execute();

                            // Get result set

                            ResultSet set = st.getResultSet();

                            String jsonEmployee=null;

                            if (set.next()) jsonEmployee=set.getString(1);

                            // After getting detailed user information in json format, convert it into json objects and return them to the browser side. How to use jsonEmployee is omitted here

                   }

                   catch(Exception e){

                            System.out.println(e);

                   }

                   finally{

                            // Close the connection

                            if (con!=null) {

                                     try {

                                               con.close();

                                     }

                                     catch(Exception e) {

                                               System.out.println(e);

                                     }

                            }

                   }

}

C  Reading and processing json file data in esProc

The JSON file test.json contains information including class, serial number, name, subject, score, etc. The format is as follows:

[

    {

        "class": "Class one",

        "id": 1,

        "name": "Emily",

        "subject": "English",

        "score": 84

    },

    {

        "class": "Class one",

        "id": 1,

        "name": "Emily",

        "subject": "Math",

        "score": 77

    },

 

    ......

 

    {

        "class": "Class one",

        "id": 7,

        "name": "Nicholas",

        "subject": "PE",

        "score": 60

    }

]

 

It is convenient for esProc to read and compute on JSON data, after which the result is submitted to the Java application as a JDBC result set. The steps are as follows:

1. Develop the esProc script

Use the esProc editor to develop a script (fromJSON.dfx) that reads the json file, parses it and completes the computation:

 

json data 5.jpg

A1: Use read() to read the json file as a string;

A2: Use the import@j() function to parse the json string into a table sequence;

A3: Group the records by student id, and summarize the total scores in A4;

A5: Sort by total score in descending order, and return the result set through result in A7.
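The computation performed after parsing (group by student id, total the scores, sort descending) can be sketched in plain Java. The JSON parsing step is assumed already done; the records below are hard-coded samples in the same shape as test.json, and all names here are illustrative:

```java
import java.util.*;
import java.util.stream.Collectors;

public class ScoreTotals {
    record Score(String clazz, int id, String name, String subject, int score) {}

    // Group by id, sum scores, and return (id, total) pairs sorted by total descending.
    static List<Map.Entry<Integer, Integer>> totals(List<Score> rows) {
        Map<Integer, Integer> byId = new HashMap<>();
        for (Score s : rows) byId.merge(s.id(), s.score(), Integer::sum);
        return byId.entrySet().stream()
                .sorted(Map.Entry.<Integer, Integer>comparingByValue().reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Score> rows = List.of(
            new Score("Class one", 1, "Emily", "English", 84),
            new Score("Class one", 1, "Emily", "Math", 77),
            new Score("Class one", 7, "Nicholas", "PE", 60),
            new Score("Class one", 7, "Nicholas", "Math", 92));
        System.out.println(totals(rows)); // [1=161, 7=152]
    }
}
```

This mirrors the A3-A5 steps: the merge call is the grouping and summing, and the sorted stream is the descending order by total.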

 

2. The Java application calls fromJSON.dfx to present the result.

The steps are omitted here as they are almost the same as those in the previous example.