3 Replies Latest reply on Nov 10, 2016 3:51 PM by cfang

    Large file sort

    richardmoore

      I have a situation where I need to sort a file with more than 2 million fixed format records by 4 different fields throughout the record. Do you have some suggestions as to the most efficient way to implement this in the framework?

        • 1. Re: Large file sort
          cfang

          Just some ideas; I haven't tried any of them.

           

          You can import the records into a database and query them with various orderBy conditions.  This is the most flexible option, though it can be slow.

           

          You can also try command-line tools (e.g., sort).

           

          If you also have other processing to do, you can sort the records chunk by chunk as part of that processing (perhaps with a large chunk size), then merge the sorted chunks in a final merge-sort pass.
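The chunk-then-merge idea can be sketched roughly as follows. This is only an illustration, not something taken from the framework: the class `ChunkMergeSort` and its methods are hypothetical names, the sorted runs are kept in memory for brevity (a real job with millions of records would presumably spill each run to a temporary file and read it back line by line), and the records are assumed to be long enough to cover every sort column.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch: sort fixed-width records chunk by chunk, then k-way merge the sorted runs. */
class ChunkMergeSort {

    /** Compares records on the 1-based column range that starts at 'start' and spans 'length' characters. */
    static Comparator<String> column(int start, int length, boolean descending) {
        Comparator<String> c = Comparator.comparing(
                (String rec) -> rec.substring(start - 1, start - 1 + length));
        return descending ? c.reversed() : c;
    }

    /** Reads up to chunkSize records at a time and sorts each chunk in memory. */
    static List<List<String>> sortedRuns(Iterator<String> records, int chunkSize,
                                         Comparator<String> order) {
        List<List<String>> runs = new ArrayList<>();
        while (records.hasNext()) {
            List<String> chunk = new ArrayList<>();
            while (records.hasNext() && chunk.size() < chunkSize) {
                chunk.add(records.next());
            }
            chunk.sort(order);
            runs.add(chunk); // assumption: a real job would write this run to a temp file
        }
        return runs;
    }

    /** The current first record of a run plus an iterator over the rest of that run. */
    private static final class Head {
        final String record;
        final Iterator<String> rest;

        Head(String record, Iterator<String> rest) {
            this.record = record;
            this.rest = rest;
        }
    }

    /** Merges already-sorted runs by repeatedly taking the smallest head record. */
    static List<String> merge(List<? extends Iterable<String>> runs, Comparator<String> order) {
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> order.compare(a.record, b.record));
        for (Iterable<String> run : runs) {
            Iterator<String> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head h = heap.poll();
            out.add(h.record);
            if (h.rest.hasNext()) {
                heap.add(new Head(h.rest.next(), h.rest));
            }
        }
        return out;
    }
}
```

Several sort fields can be chained with `Comparator.thenComparing`, for example `column(10, 30, true).thenComparing(column(50, 10, false))`, and the same comparator has to be used for both the chunk sort and the merge.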

          • 2. Re: Large file sort
            richardmoore

            I ended up going with the command-line sort on Linux, and then downloaded Cygwin, which includes the same sort, so we can do development on our desktops. This brought up another question on the same subject. I can have a varying number of sort fields, each with its own order and field type. How should I go about putting these in my JSL? For instance, one of my sorts needs to start in column 10 for 30 characters in descending order, the next is column 50 for 10 characters in ascending order, and the last is column 40 for 10 characters in descending order. I am not sure how to express such a complex list of variables so that I keep their order and have repeating entries that I can access in my batchlet.

            • 3. Re: Large file sort
              cfang

              So with the sort utility you can perform the sorting manually, and now you want to run this tool from a batchlet?

               

              I would pass all args as a batch property to the batchlet:

               

              <property name="sortArgs" value="-k 2 -r"/>

               

              Not sure if the sort tool can handle multiple sorting keys, like first sort by State, then by City in the same State, then by last name in the same city and state, etc.  Not sure how to specify the column length for the sorting key.  But if you already have it working on the command line, then it's just a matter of passing it to the batchlet.
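For what it's worth, POSIX sort does accept several `-k` options, and each key can be given as character positions within a field (`-k F.C,F.C`, with an `r` suffix for reverse order). Below is a minimal sketch of how one property value could carry all the keys and be converted into such arguments. The `start:length:order` spec format, the class `SortKeys`, and the method `toSortArgs` are made up for illustration. It also assumes the whole record counts as field 1; if the records contain blanks, sort would probably need `-t` with a separator character that never occurs in the data, which is worth checking against your sort version.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: turn "start:length:order" key specs into POSIX sort -k arguments. */
class SortKeys {

    /**
     * Parses a spec such as "10:30:desc,50:10:asc,40:10:desc" (a hypothetical format)
     * into sort arguments, keeping the keys in the order they were written.
     */
    static List<String> toSortArgs(String spec) {
        List<String> args = new ArrayList<>();
        for (String key : spec.split(",")) {
            String[] parts = key.trim().split(":");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected start:length:order, got '" + key + "'");
            }
            int start = Integer.parseInt(parts[0]);   // 1-based start column
            int length = Integer.parseInt(parts[1]);  // number of characters in the key
            boolean descending = parts[2].equalsIgnoreCase("desc");
            int end = start + length - 1;             // sort's end position is inclusive
            args.add("-k");
            args.add("1." + start + ",1." + end + (descending ? "r" : ""));
        }
        return args;
    }
}
```

With the three keys from the question, `toSortArgs("10:30:desc,50:10:asc,40:10:desc")` yields the arguments `-k 1.10,1.39r -k 1.50,1.59 -k 1.40,1.49r`.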

               

              If the structure gets too complicated, you can keep it one sorting key per property:

               

              <property name="sortArg1" value="..."/>

              <property name="sortArg2" value="..."/>

              <property name="sortArg3" value="..."/>

               

              You can have a rule that allows at most 5 args.  The batchlet class will have all 5 injection fields, and at runtime arg4 or arg5 may be null.
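A rough sketch of how the batchlet could combine those optional properties into one command is shown below. The class `SortCommand` and its `build` method are hypothetical helpers, not part of any batch API. In the batchlet itself, the five fields would be declared with the standard `@Inject @BatchProperty` annotations and passed in as `sortArgs`; any property not set in the JSL arrives as null and is skipped. The sketch assumes a `sort` executable is on the PATH and that no individual argument contains whitespace.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: assemble a sort command line from optional per-key batch properties. */
class SortCommand {

    /**
     * Builds the argument list for ProcessBuilder. Each entry in sortArgs may be
     * null or blank, as an injected property is when the JSL does not define it.
     */
    static List<String> build(String input, String output, String... sortArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("sort"); // assumption: sort is on the PATH (Linux or Cygwin)
        for (String arg : sortArgs) {
            if (arg == null || arg.trim().isEmpty()) {
                continue; // property not set in the JSL
            }
            for (String token : arg.trim().split("\\s+")) {
                cmd.add(token); // "-k 1.10,1.39r" becomes two separate arguments
            }
        }
        cmd.add("-o"); // write the sorted result to the output file
        cmd.add(output);
        cmd.add(input);
        return cmd;
    }
}
```

Inside the batchlet's `process()` method, the command could then be run with something like `new ProcessBuilder(SortCommand.build(in, out, sortArg1, sortArg2, sortArg3, sortArg4, sortArg5)).inheritIO().start().waitFor()`, using the exit code to decide the step's exit status. Because the properties are visited in a fixed order, the key order from the JSL is preserved.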