IBM Mainframe Forum Index
 

How can I verify that data is in sorted order without sorting?


IBM Mainframe Forums -> DFSORT/ICETOOL
PUMA

New User


Joined: 08 Aug 2006
Posts: 10
Location: FRANCE

Posted: Fri Sep 19, 2008 10:32 pm

Hi

Is there a facility in DFSORT or ICETOOL to verify that the data in my very large file is already in sorted order, so that I can avoid running a sort?
In other words, is there a function that reads all my data and verifies that it is all sorted on a given criterion?

Thanks

enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10873
Location: italy

Posted: Sun Sep 21, 2008 1:38 pm

If the procedures have been set up properly
and the programs have been properly tested ........

there is no need for such a check.


Just curious...
and if the dataset is not sorted, what are you going to do??
Another pass to sort it, maybe .....

dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

Posted: Sun Sep 21, 2008 2:40 pm

Hello,

Quote:
if the procedures have been setup in the proper way,
and the programs have been properly tested ........
there is no need for such check
Agree with Enrico.

FWIW - if this is an external file and you cannot control the content and it "should" be in sequence (but might not), go ahead and run the sort. If the file is completely in sequence, only a fraction of the usual resources are used. Sort is smart enough to recognize the data is already in sequence and processes accordingly.

Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

Posted: Sun Sep 21, 2008 9:00 pm

Puma,

You can do a one-file MERGE like the one below. MERGE will issue a message and terminate if it finds a record out of sorted order. MERGE is more efficient than SORT.

Code:

//S1 EXEC PGM=ICEMAN
//SYSOUT DD SYSOUT=*
//SORTIN01 DD DSN=...  file you want to check
//SORTOUT DD DUMMY
//SYSIN DD *
  MERGE FIELDS=(...)
/*


FIELDS for MERGE would be the same as you'd use for SORT.
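To show what the one-file MERGE check amounts to, here is a minimal Python sketch of the same idea. It is a model under stated assumptions, not DFSORT's code: records are Python bytes compared in byte (ASCII) order rather than EBCDIC, the key is a single ascending character field, equal keys count as in sequence, and the 0/16 return codes simply mirror the RC=16 reported further down in this thread. The names check_sequence and key_11_8 are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

def check_sequence(records: Iterable[bytes],
                   key: Callable[[bytes], bytes]) -> Tuple[int, Optional[int]]:
    """Return (return_code, number_of_first_out_of_order_record).

    Hypothetical model of a one-file MERGE sequence check: 0 means every key
    is >= the previous key (ascending, equal keys allowed); 16 means a record
    was found out of sequence and processing stopped at that record.
    """
    previous = None
    for number, record in enumerate(records, start=1):
        current = key(record)
        if previous is not None and current < previous:
            return 16, number  # stop at the first out-of-sequence record
        previous = current
    return 0, None

def key_11_8(record: bytes) -> bytes:
    """Hypothetical CH key in positions 11-18 (1-based) = bytes 10..17 (0-based)."""
    return record[10:18]

if __name__ == "__main__":
    good = [b" " * 10 + f"{n:08d}".encode() for n in (1, 2, 2, 3)]
    bad = [b" " * 10 + f"{n:08d}".encode() for n in (1, 3, 2)]
    print(check_sequence(good, key_11_8))  # (0, None)
    print(check_sequence(bad, key_11_8))   # (16, 3)
```

Like the MERGE, the check makes one sequential pass and needs to remember only the previous key, which is why it is cheaper than a full sort.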

Frank Yaeger

DFSORT Developer

Posted: Sun Sep 21, 2008 9:07 pm

Quote:
FWIW - if this is an external file and you cannot control the content and it "should" be in sequence (but might not), go ahead and run the sort. If the file is completely in sequence, only a fraction of the usual resources are used. Sort is smart enough to recognize the data is already in sequence and processes accordingly.


Huh? Where did you get that idea? It's not true. For a sort application, DFSORT has no way of knowing in advance if the file is already in sorted order.

CICS Guy

Senior Member


Joined: 18 Jul 2007
Posts: 2146
Location: At my coffee table

Posted: Sun Sep 21, 2008 10:24 pm

Frank Yaeger wrote:
You can do a one-file MERGE like the one below. MERGE will issue a message and terminate if it finds a record out of sorted order. MERGE is more efficient than SORT.
I was thinking along the same lines, but I thought that the merge needed two inputs (one dummy?).
Also, I couldn't figure out what return code value would be generated when the "out of sequence" error was posted.

dick scherrer

Moderator Emeritus

Posted: Sun Sep 21, 2008 10:46 pm

Hi Frank,

Quote:
For a sort application, DFSORT has no way of knowing in advance if the file is already in sorted order
No, it doesn't know in advance, but it can surely "see" this when the input is being processed.

Long ago, some sort products built sequenced strings of input records and then merged those strings to create the final output. The longer the initial strings, the faster the sort ran (fewer strings meant less manipulation). When we ran completely random sets of input, the process took far longer and used more resources than when an in-sequence file was processed.
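The effect described here, where longer presorted strings mean fewer strings to merge, can be illustrated by counting the natural ascending runs in the input. This is a simplified Python sketch, not a model of any particular sort product: real products form their initial strings in main storage (replacement selection was a classic technique), so their string lengths are not the same as natural runs. count_runs is a hypothetical helper.

```python
import random
from typing import Sequence

def count_runs(keys: Sequence[int]) -> int:
    """Count maximal ascending runs ("strings") in input order.

    Simplified illustration only: it counts natural runs, not the strings
    a real sort product would build in main storage.
    """
    if not keys:
        return 0
    runs = 1
    for prev, cur in zip(keys, keys[1:]):
        if cur < prev:  # a step down ends the current run
            runs += 1
    return runs

if __name__ == "__main__":
    n = 100_000
    in_sequence = list(range(n))
    shuffled = in_sequence[:]
    random.seed(1)
    random.shuffle(shuffled)
    print(count_runs(in_sequence))        # 1
    print(count_runs(in_sequence[::-1]))  # 100000: every record starts a new run
    print(count_runs(shuffled))           # roughly n / 2 for random input
```

In-sequence input is a single run, while random input yields about half as many runs as records, which is the "fewer strings, less manipulation" point in the post.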

Maybe with all of the improvements in technology, this went away. Maybe it was not part of DFSORT. Maybe my memory suffers from far too many systems and products. . .

We don't have DFSORT available on my current systems, so I can't run a volume test. I'd be interested in seeing the difference in run stats (if any) in sorting a file with a few million "unsorted" records and then re-sorting the sorted output.

Frank Yaeger

DFSORT Developer

Posted: Mon Sep 22, 2008 8:24 pm

Quote:
I thought that the merge needed two inputs (one dummy?).


No, it doesn't. You can do a MERGE with one file.

Quote:
I couldn't figure out what return code value would be generated when the "out of sequence" error was posted.


ICE068A 0 OUT OF SEQUENCE SORTIN01

RC=16

Frank Yaeger

DFSORT Developer

Posted: Mon Sep 22, 2008 8:36 pm

Quote:
I'd be interested in seeing the difference in run stats (if any) in sorting a file with a few million "unsorted" records and then re-sorting the sorted output.


I ran a test sorting records that were already in order vs sorting records that were in reverse order and there was no appreciable difference in the run stats.

dick scherrer

Moderator Emeritus

Posted: Mon Sep 22, 2008 9:18 pm

Hi Frank,

Quote:
sorting records that were already in order vs sorting records that were in reverse order and there was no appreciable difference in the run stats.
Yup, I'd believe that, because both are "in sequence" already.

If the same file had a 10-position sort key and was sorted by (10,1,CH,A,9,1,CH,A,8,1,CH,A, etc.), I would expect the resource usage to go up (both CPU and EXCP). As I hinted earlier, the results are more dramatic for large files.
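To make the reversed-position key concrete, here is a hypothetical Python sketch. Each field is a (position, length) pair with a 1-based position, as in SORT FIELDS, and every field is treated as an ascending character field compared in Python byte order (an assumption; DFSORT compares CH fields in EBCDIC). It shows that a file in order on the whole 10-byte key is generally not in order on the byte-by-byte reversed key. make_key, in_order, forward and reversed_bytes are illustrative names.

```python
from typing import Callable, Sequence, Tuple

Field = Tuple[int, int]  # (1-based start position, length); CH,A assumed

def make_key(fields: Sequence[Field]) -> Callable[[bytes], Tuple[bytes, ...]]:
    """Build a composite key like SORT FIELDS=(p1,l1,CH,A,p2,l2,CH,A,...)."""
    def key(record: bytes) -> Tuple[bytes, ...]:
        return tuple(record[p - 1:p - 1 + length] for p, length in fields)
    return key

def in_order(records: Sequence[bytes], key) -> bool:
    """True if the records are ascending (equal keys allowed) under `key`."""
    keys = [key(r) for r in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))

forward = make_key([(1, 10)])                                  # (1,10,CH,A)
reversed_bytes = make_key([(p, 1) for p in range(10, 0, -1)])  # (10,1,CH,A,...,1,1,CH,A)

if __name__ == "__main__":
    records = sorted(f"{n:010d}".encode() for n in range(0, 1000, 7))
    print(in_order(records, forward))         # True
    print(in_order(records, reversed_bytes))  # False
```

Under the reversed key the rightmost byte dominates, so a file presorted on the natural key looks largely unsorted to that sort.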

Frank Yaeger

DFSORT Developer

Posted: Mon Sep 22, 2008 9:39 pm

Quote:
Yup, I'd believe that, because both are "in sequence" already.


No, only one file is already in sequence. The experiment I did is like this:

Sort 1 - input records look like this (RECFM=FB,LRECL=100)

Code:

          00000001
          ...
          00300000


I used SORT FIELDS=(11,8,ZD,A) - so records are already in order.

Sort 2 - input records look like this (RECFM=FB,LRECL=100)

Code:

          00300000
          ...
          00000001


I used SORT FIELDS=(11,8,ZD,A) - so records are not already in order.
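A rough Python sketch of how the two test inputs could be laid out: 100-byte fixed-length records with an 8-digit number in positions 11-18 (1-based), preceded by 10 blanks as the sample lines suggest. LRECL=100 and the key range 00000001 to 00300000 come from the post; blank padding and ASCII rather than EBCDIC encoding are assumptions. For the unsigned digit strings used here, int() on the slice stands in for the (11,8,ZD,A) key. make_record, key_value and is_ascending are hypothetical names.

```python
from typing import List

LRECL = 100      # RECFM=FB,LRECL=100, as in the post
COUNT = 300_000  # keys 00000001 through 00300000, as in the post

def make_record(n: int) -> bytes:
    """Assumed layout: 10 blanks, 8-digit key in positions 11-18, blank padding."""
    return (b" " * 10 + f"{n:08d}".encode("ascii")).ljust(LRECL, b" ")

def key_value(record: bytes) -> int:
    """Numeric value of positions 11-18 (1-based), i.e. record[10:18]."""
    return int(record[10:18])

def is_ascending(records: List[bytes]) -> bool:
    """True if every key is >= the one before it."""
    return all(key_value(a) <= key_value(b) for a, b in zip(records, records[1:]))

sort1_input = [make_record(n) for n in range(1, COUNT + 1)]  # 00000001 ... 00300000
sort2_input = sort1_input[::-1]                              # 00300000 ... 00000001

if __name__ == "__main__":
    print(len(sort1_input[0]))        # 100
    print(is_ascending(sort1_input))  # True
    print(is_ascending(sort2_input))  # False
```

This only reproduces the shape of the data; the run statistics themselves can only come from running the actual sorts.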

I think this experiment tests one variation of your statement that
Quote:
If the file is completely in sequence, only a fraction of the usual resources are used
(although it certainly doesn't test every variation). The first file is completely in sequence. The second file is completely out of sequence. But there was no appreciable difference in the resources used, so I don't think your blanket statement above is accurate, although it certainly may be true in certain cases.

I've asked our Performance Team Lead to comment on this. He knows much more about Performance than I do, so I'll defer to him.

Dave Betten

New User


Joined: 24 Jan 2006
Posts: 26

Posted: Mon Sep 22, 2008 10:49 pm

The question of whether a sort uses fewer resources when the input is already in sequence is complex, and there's no simple answer. First, we have to define what we mean by resources.

CPU - In some cases (but not all), the sort will require less CPU time if the input is already in sequence. The degree of CPU savings will vary depending on the characteristics of the sort and the available resources. During the input phase, we gain some efficiency in the number of instructions required, since the records we read in are already in sequence. During the output phase, we may or may not be more efficient in how we merge the strings built during the input phase and write the output. Long ago when we were running with limited amounts of main storage, the process of merging those strings was less efficient so having the data already in sequence could have a big effect. But in today's environments, where we can optimize main storage for larger sorts, that merge process is more efficient. And in cases where we are able to read the entire file into memory, we're going to be very efficient whether the input is in sequence or not.
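The remark that limited main storage made merging less efficient, so presorted input could have a big effect, can be illustrated with the textbook estimate for an external merge: with r initial strings merged k at a time (the fan-in), about ceil(log_k(r)) merge passes are needed. This is a generic external-sorting estimate, not a description of DFSORT's algorithms, and the string counts and fan-ins below are made-up illustrations. merge_passes is a hypothetical helper.

```python
def merge_passes(strings: int, fan_in: int) -> int:
    """Passes needed to merge `strings` sorted strings, `fan_in` at a time.

    Generic textbook estimate, not DFSORT's algorithm. Uses integer
    arithmetic instead of log() to avoid floating-point rounding.
    """
    if strings < 1 or fan_in < 2:
        raise ValueError("need at least one string and a fan-in of at least 2")
    passes = 0
    while strings > 1:
        strings = -(-strings // fan_in)  # ceiling division: strings left after a pass
        passes += 1
    return passes

if __name__ == "__main__":
    print(merge_passes(1, 4))       # 0: a single string is already the output
    print(merge_passes(1000, 4))    # 5 with a small, storage-limited fan-in
    print(merge_passes(1000, 100))  # 2 with a larger fan-in
```

Fewer initial strings (as with presorted input) and a larger fan-in (more storage) both cut the number of passes, which is consistent with the effect shrinking as storage grew.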

Intermediate storage - Whether we're talking about central storage or DASD, we still have to store the entire file in some sort of intermediate storage. This is because we can't write any of the records out until the last one is read. For all we know, that last record could be the first one to be written out! Yes, we might gain some efficiencies and reduce our intermediate storage requirement slightly, but we still have to store the entire file somewhere. So I'd say you're going to require almost the same amount of intermediate storage whether your input is in sequence or not. And if that intermediate storage is on DASD, you're still going to do quite a bit of I/O to work datasets that you would never need to do if you ran the file through a MERGE, as was suggested earlier. This is the main reason I would disagree with the idea that a sort is going to use "a fraction of the usual resources" if the input file is already in sequence.
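The contrast between a sort, which cannot emit anything until it has read the last record, and a sequence check, which keeps only the previous key, can be shown with a small Python sketch. The wrapper counts how many records have been read: the streaming check stops at the first out-of-order record, while Python's sorted() must consume every record before returning. CountingSource and in_sequence are hypothetical names, and Python's in-memory sort stands in for a real sort's work datasets.

```python
from typing import Iterable, Iterator, List, Optional

class CountingSource:
    """Wraps an input and counts how many records have been read from it."""

    def __init__(self, records: Iterable[int]) -> None:
        self._it = iter(records)
        self.read = 0

    def __iter__(self) -> Iterator[int]:
        for record in self._it:
            self.read += 1
            yield record

def in_sequence(records: Iterable[int]) -> bool:
    """Streaming check: holds only the previous key, stops at the first break."""
    it = iter(records)
    previous: Optional[int] = next(it, None)
    for key in it:
        if key < previous:
            return False
        previous = key
    return True

if __name__ == "__main__":
    data = [5, 1] + list(range(2, 100_000))  # out of order at the 2nd record

    src = CountingSource(data)
    print(in_sequence(src), src.read)  # False 2

    src = CountingSource(data)
    result: List[int] = sorted(src)
    print(len(result), src.read)       # 100000 100000: all read before any output
```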

dick scherrer

Moderator Emeritus

Posted: Mon Sep 22, 2008 11:12 pm

Hi Dave,

Thanks for your reply.

Quote:
Long ago when we were running with limited amounts of main storage, the process of merging those strings was less efficient so having the data already in sequence could have a big effect.
Long ago and far away. . . And that would have been when we were trying these experiments.

d
All times are GMT + 6 Hours

Similar Topics
Topic | Forum | Replies
How to save SYSLOG as text data via P... | All Other Mainframe Topics | 4
Store the data for fixed length | COBOL Programming | 1
Data set Rec-Cnt and Byte-Cnt | Testing & Performance | 2
SCOPE PENDING option -check data | DB2 | 2
Check data with Exception Table | DB2 | 0