A Handy Method of Conditioned Filtering in Text Files with Java
Posted by raqsoft in Java Development and Database Computation on Aug 17, 2014 11:31:34 PM
We often need to process data stored in text files. Here we’ll use an example to look at how to perform conditional filtering on a text file with Java: read employee information from the text file employee.txt and select the female employees who were born on or after January 1, 1981.
The text file employee.txt is tab-separated, with a header row, and looks like this:
EID NAME SURNAME GENDER STATE BIRTHDAY HIREDATE DEPT SALARY
1 Rebecca Moore F California 1974-11-20 2005-03-11 R&D 7000
2 Ashley Wilson F New York 1980-07-19 2008-03-16 Finance 11000
3 Rachel Johnson F New Mexico 1970-12-17 2010-12-01 Sales 9000
4 Emily Smith F Texas 1985-03-07 2006-08-15 HR 7000
5 Ashley Smith F Texas 1975-05-13 2004-07-30 R&D 16000
6 Matthew Johnson M California 1984-07-07 2005-07-07 Sales 11000
7 Alexis Smith F Illinois 1972-08-16 2002-08-16 Sales 9000
8 Megan Wilson F California 1979-04-19 1984-04-19 Marketing 11000
9 Victoria Davis F Texas 1983-12-07 2009-12-07 HR 3000
10 Ryan Johnson M Pennsylvania 1976-03-12 2006-03-12 R&D 13000
11 Jacob Moore M Texas 1974-12-16 2004-12-16 Sales 12000
12 Jessica Davis F New York 1980-09-11 2008-09-11 Sales 7000
13 Daniel Davis M Florida 1982-05-14 2010-05-14 Finance 10000
…
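Because states such as New York contain spaces, the fields can only be separated reliably by the tab character. As a minimal sketch (the class name ParseEmployeeLine and the helper parse are hypothetical, and the tab separators in the sample line are written out explicitly), one record can be split into its nine fields like this:

```java
import java.util.Arrays;
import java.util.List;

class ParseEmployeeLine {
    // Splits one tab-separated record into its fields.
    static List<String> parse(String line) {
        return Arrays.asList(line.split("\t"));
    }

    public static void main(String[] args) {
        // Row 3 of the sample data, with the tab separators made explicit.
        String line = "3\tRachel\tJohnson\tF\tNew Mexico\t1970-12-17\t2010-12-01\tSales\t9000";
        List<String> fields = parse(line);
        System.out.println(fields.size()); // 9
        System.out.println(fields.get(4)); // New Mexico (STATE keeps its inner space)
        System.out.println(fields.get(5)); // 1970-12-17 (BIRTHDAY)
    }
}
```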
The Java approach is to read the file line by line, save each record in a List, traverse that List, and save the eligible records in a result List. Finally, it prints the number of eligible employees. The detailed code is as follows:
import java.io.BufferedReader;
import java.io.FileReader;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public static void myFilter() throws Exception {
    List<Map<String, String>> sourceList = new ArrayList<Map<String, String>>();
    List<Map<String, String>> resultList = new ArrayList<Map<String, String>>();
    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader br = new BufferedReader(new FileReader("D:\\employee.txt"))) {
        if (br.readLine() == null) return; // skip the header line; exit if the file is empty
        String line;
        while ((line = br.readLine()) != null) { // load the file into memory
            String[] info = line.split("\t");
            Map<String, String> emp = new HashMap<String, String>();
            emp.put("EID", info[0]);
            emp.put("NAME", info[1]);
            emp.put("SURNAME", info[2]);
            emp.put("GENDER", info[3]);
            emp.put("STATE", info[4]);
            emp.put("BIRTHDAY", info[5]);
            sourceList.add(emp);
        }
    }
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
    Date earliest = sdf.parse("1981-01-01"); // parse the cutoff date once, outside the loop
    for (int i = 0, len = sourceList.size(); i < len; i++) { // process the data row by row
        Map<String, String> emp = sourceList.get(i);
        // keep female employees born on or after the cutoff date
        if (emp.get("GENDER").equals("F") && !sdf.parse(emp.get("BIRTHDAY")).before(earliest)) {
            resultList.add(emp);
        }
    }
    System.out.println("count=" + resultList.size()); // print the number of eligible employees
}
The filtering condition in this function is fixed. If the condition changes, the conditional statement in the program has to be modified accordingly. Multiple pieces of code are needed if there are multiple conditions, and the program cannot handle ad hoc, dynamic conditions. Now we’ll rewrite the code to make it somewhat more general by slightly changing the loop that traverses sourceList:
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd"); // create the formatter once, outside the loop
for (int i = 0, len = sourceList.size(); i < len; i++) {
    Map<String, String> emp = (Map<String, String>) sourceList.get(i);
    Date birthday = sdf.parse(emp.get("BIRTHDAY")); // parse the birthday once per record
    boolean isRight = true;
    if (gender != null && !emp.get("GENDER").equals(gender)) { // gender condition
        isRight = false;
    }
    if (start != null && birthday.before(start)) { // lower bound on BIRTHDAY
        isRight = false;
    }
    if (end != null && birthday.after(end)) { // upper bound on BIRTHDAY
        isRight = false;
    }
    if (isRight) resultList.add(emp); // save the eligible record in the result list
}
In the rewritten code, gender, start and end are input parameters of the function myFilter. The program can handle the cases where the GENDER field equals the input value gender, and the BIRTHDAY field is greater than or equal to the input value start and less than or equal to the input value end. If any of the input values is null, the corresponding condition is ignored. The conditions are joined by AND.
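To make the behavior of the null parameters concrete, here is a self-contained sketch of the parameterized filter. It is only an illustration: the class name ParamFilter and the helper emp are hypothetical, and the records are passed in as a list instead of being read from employee.txt so that the example runs on its own.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ParamFilter {
    // Returns the records that satisfy every non-null condition (conditions are ANDed).
    static List<Map<String, String>> myFilter(List<Map<String, String>> source,
            String gender, Date start, Date end) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> emp : source) {
            Date birthday = sdf.parse(emp.get("BIRTHDAY"));
            if (gender != null && !gender.equals(emp.get("GENDER"))) continue;
            if (start != null && birthday.before(start)) continue;
            if (end != null && birthday.after(end)) continue;
            result.add(emp);
        }
        return result;
    }

    // Hypothetical helper that builds one sample record.
    static Map<String, String> emp(String name, String gender, String birthday) {
        Map<String, String> m = new HashMap<>();
        m.put("NAME", name);
        m.put("GENDER", gender);
        m.put("BIRTHDAY", birthday);
        return m;
    }

    public static void main(String[] args) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
        List<Map<String, String>> source = new ArrayList<>();
        source.add(emp("Rebecca", "F", "1974-11-20"));
        source.add(emp("Emily", "F", "1985-03-07"));
        source.add(emp("Matthew", "M", "1984-07-07"));
        // Female employees born on or after 1981-01-01, with no upper bound: only Emily.
        System.out.println(myFilter(source, "F", sdf.parse("1981-01-01"), null).size()); // 1
        // All conditions null: nothing is filtered out.
        System.out.println(myFilter(source, null, null, null).size()); // 3
    }
}
```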
If we want to make myFilter a more general function, for example one that joins conditions with OR or allows computation between fields, the code becomes more complicated, because it requires a program that parses and evaluates dynamic expressions. Such a program can be as flexible and general as database SQL, but it is really difficult to develop.
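A partial middle ground, assuming Java 8 or later, is to pass the condition in as a java.util.function.Predicate and combine conditions with its and/or methods. The sketch below is only an illustration (the class name PredicateFilter and the helpers filter and emp are hypothetical). It lets the caller join conditions with OR and compute across fields, but the condition is still compiled Java code: it does not parse an expression supplied as a string at run time, which is the hard part described above.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class PredicateFilter {
    // Keeps the records for which the supplied condition holds.
    static List<Map<String, String>> filter(List<Map<String, String>> source,
            Predicate<Map<String, String>> condition) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> emp : source) {
            if (condition.test(emp)) result.add(emp);
        }
        return result;
    }

    // Hypothetical helper that builds one sample record.
    static Map<String, String> emp(String name, String surname, String gender, String birthday) {
        Map<String, String> m = new HashMap<>();
        m.put("NAME", name);
        m.put("SURNAME", surname);
        m.put("GENDER", gender);
        m.put("BIRTHDAY", birthday);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> source = new ArrayList<>();
        source.add(emp("Rebecca", "Moore", "F", "1974-11-20"));
        source.add(emp("Ashley", "Wilson", "F", "1980-07-19"));
        source.add(emp("Emily", "Smith", "F", "1985-03-07"));
        source.add(emp("Matthew", "Johnson", "M", "1984-07-07"));

        Predicate<Map<String, String>> female = e -> "F".equals(e.get("GENDER"));
        Predicate<Map<String, String>> bornFrom1981 =
                e -> !LocalDate.parse(e.get("BIRTHDAY")).isBefore(LocalDate.of(1981, 1, 1));
        // A condition computed across two fields.
        Predicate<Map<String, String>> isRebeccaMoore =
                e -> (e.get("NAME") + e.get("SURNAME")).equals("RebeccaMoore");

        // (female AND bornFrom1981) OR isRebeccaMoore
        Predicate<Map<String, String>> condition = female.and(bornFrom1981).or(isRebeccaMoore);
        System.out.println(filter(source, condition).size()); // 2 (Rebecca Moore and Emily Smith)
    }
}
```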
In view of this, we can turn to esProc to assist with this task. esProc is a programming language designed for processing structured (and semi-structured) data. It performs the general query task above quite easily and integrates with Java seamlessly, so that Java can access and process text file data as flexibly as SQL does. For example, to query female employees who were born on or after January 1, 1981, esProc can take an external input parameter “where” as the dynamic condition, as shown in the following chart:
The value of “where” is: BIRTHDAY>=date(1981,1,1) && GENDER=="F". esProc needs only three lines of code, as follows:
A1: Define a file object and import its data. The first row is the header row, and tab is the default field separator. esProc’s IDE can display the imported data visually, as shown on the right of the above chart.
A2: Filter according to the condition. Here a macro is used to parse the expression dynamically, and “where” is the input parameter. esProc first computes the expression enclosed in ${…}, then replaces ${…} with the computed result as the macro string value, and finally interprets and executes the resulting code. In this example, the code finally executed is =A1.select(BIRTHDAY>=date(1981,1,1) && GENDER=="F").
A3: Return the eligible result set to the external program.
When the filtering condition changes, we only need to change the parameter “where”, without rewriting the code. For example, suppose the condition becomes: female employees who were born on or after January 1, 1981, or employees whose NAME+SURNAME equals “RebeccaMoore”. The value of “where” can then be written as: BIRTHDAY>=date(1981,1,1) && GENDER=="F" || NAME+SURNAME=="RebeccaMoore". After execution, the result set in A2 is shown in the following chart:
Finally, call this piece of esProc code from Java through the JDBC driver provided by esProc to get the filtering result. With the above esProc code saved as the file test.dfx, the Java code that calls it is as follows:
// create the esProc JDBC connection
Class.forName("com.esproc.jdbc.InternalDriver");
Connection con = DriverManager.getConnection("jdbc:esproc:local://");
// call the esProc program (as a stored procedure); test is the name of the dfx file
com.esproc.jdbc.InternalCStatement st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call test(?)");
// set the parameter: the dynamic filtering condition
st.setObject(1, "BIRTHDAY>=date(1981,1,1) && GENDER==\"F\" || NAME+SURNAME==\"RebeccaMoore\"");
// execute the esProc stored procedure
st.execute();
// get the result set: the eligible employees
ResultSet set = st.getResultSet();
When the script is relatively simple, we can write the esProc code directly in the Java code that calls the esProc JDBC driver. This saves us from having to write the esProc script file (test.dfx):
st = (com.esproc.jdbc.InternalCStatement) con.createStatement();
ResultSet set=st.executeQuery("=file(\"D:\\\\esProc\\\\employee.txt\").import@t().select(BIRTHDAY>=date(1981,1,1)&&GENDER==\"F\" || NAME+SURNAME==\"RebeccaMoore\")");
This piece of Java code directly executes a single line of esProc script: it gets the data from the text file, filters it according to the specified condition, and returns the result set to set, the ResultSet object.