PUMA
New User
Joined: 08 Aug 2006 Posts: 10 Location: FRANCE
Hi,
Is there a facility in DFSORT or ICETOOL to verify that the data in my very large file is already in sorted order, so that I can avoid running the sort at all? In other words, is there a function that reads all my data and verifies that it is sorted on a given criteria?
Thanks
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10873 Location: italy
If the procedures have been set up in the proper way, and the programs have been properly tested, there is no need for such a check.
Just curious... if the dataset is not sorted, what are you going to do? Another pass to sort it, maybe?
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hello,
Quote:
If the procedures have been set up in the proper way, and the programs have been properly tested, there is no need for such a check.
Agree with Enrico.
FWIW - if this is an external file and you cannot control the content, and it "should" be in sequence (but might not be), go ahead and run the sort. If the file is completely in sequence, only a fraction of the usual resources are used. Sort is smart enough to recognize the data is already in sequence and processes accordingly.
Frank Yaeger
DFSORT Developer
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
Puma,
You can do a one-file MERGE like the one below. MERGE will issue a message and terminate if it finds a record out of sorted order. MERGE is more efficient than SORT.
Code:
//S1       EXEC PGM=ICEMAN
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=...  file you want to check
//SORTOUT  DD DUMMY
//SYSIN    DD *
  MERGE FIELDS=(...)
/*
FIELDS for MERGE would be the same as you'd use for SORT.
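Conceptually, the check a one-file MERGE performs amounts to a single pass comparing each record's key to the previous record's key. As a rough illustration of that logic outside of DFSORT (the field positions here are hypothetical), a minimal Python sketch:

```python
def first_out_of_sequence(records, key_start, key_len):
    """Return the 1-based record number of the first record whose key
    is lower than the previous record's key, or 0 if the data is in
    ascending sequence. Key positions are illustrative only."""
    prev_key = None
    for recno, rec in enumerate(records, start=1):
        key = rec[key_start:key_start + key_len]
        if prev_key is not None and key < prev_key:
            return recno
        prev_key = key
    return 0

# A file in sequence on a 3-byte key in position 1:
print(first_out_of_sequence(["001AAA", "002BBB", "003CCC"], 0, 3))  # 0 (in sequence)
# Record 3 is out of order:
print(first_out_of_sequence(["001AAA", "003CCC", "002BBB"], 0, 3))  # 3
```

Like the one-file MERGE, this stops at the first out-of-sequence record rather than continuing through the rest of the file.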
Frank Yaeger
DFSORT Developer
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
Quote:
FWIW - if this is an external file and you cannot control the content and it "should" be in sequence (but might not), go ahead and run the sort. If the file is completely in sequence, only a fraction of the usual resources are used. Sort is smart enough to recognize the data is already in sequence and processes accordingly.
Huh? Where did you get that idea? It's not true. For a sort application, DFSORT has no way of knowing in advance if the file is already in sorted order.
CICS Guy
Senior Member
Joined: 18 Jul 2007 Posts: 2146 Location: At my coffee table
Frank Yaeger wrote:
You can do a one-file MERGE like the one below. MERGE will issue a message and terminate if it finds a record out of sorted order. MERGE is more efficient than SORT.
I was thinking along the same lines, but I thought that the merge needed two inputs (one dummy?).
Also, I couldn't figure out what return code would be generated when the "out of sequence" error was posted.
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hi Frank,
Quote:
For a sort application, DFSORT has no way of knowing in advance if the file is already in sorted order
No, it doesn't know in advance, but it can surely "see" this while the input is being processed.
Long ago, some of the sort products wrote out sequenced strings of input records and then merged the strings to create the final output. The longer the initial strings, the faster the sort ran (fewer strings meant less manipulation). When we ran completely random sets of input, the process took far longer and used more resources than when an in-sequence file was processed.
Maybe with all of the improvements in technology, this went away. Maybe it was not part of DFSORT. Maybe my memory suffers from far too many systems and products. . .
We don't have DFSORT available on my current systems, so I can't run a volume test. I'd be interested in seeing the difference in run stats (if any) in sorting a file with a few million "unsorted" records and then re-sorting the sorted output.
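The string-based behaviour described above can be pictured by counting the maximal ascending runs ("strings") already present in the input: the fewer the runs, the less merge work remains. A toy Python sketch of that idea (not DFSORT's actual algorithm):

```python
def count_ascending_runs(keys):
    """Count maximal ascending runs in the input key sequence.
    1 run  -> input already in sequence (nothing left to merge);
    n runs -> roughly n sorted strings that must still be merged."""
    if not keys:
        return 0
    runs = 1
    for prev, cur in zip(keys, keys[1:]):
        if cur < prev:   # sequence breaks here: a new run starts
            runs += 1
    return runs

print(count_ascending_runs([1, 2, 3, 4, 5]))  # 1 (fully sorted input)
print(count_ascending_runs([5, 4, 3, 2, 1]))  # 5 (every element its own run)
print(count_ascending_runs([2, 1, 4, 3, 6]))  # 3
```

Under the old string-merging scheme Dick describes, the random input (many short runs) would cost more passes than the in-sequence input (one run).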
Frank Yaeger
DFSORT Developer
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
Quote:
I thought that the merge needed two inputs (one dummy?).
No, it doesn't. You can do a MERGE with one file.
Quote:
I couldn't figure out what return code would be generated when the "out of sequence" error was posted.
For an out-of-sequence record you get this message and return code:
Code:
ICE068A 0 OUT OF SEQUENCE SORTIN01
RC=16
Frank Yaeger
DFSORT Developer
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
Quote:
I'd be interested in seeing the difference in run stats (if any) in sorting a file with a few million "unsorted" records and then re-sorting the sorted output.
I ran a test sorting records that were already in order vs sorting records that were in reverse order, and there was no appreciable difference in the run stats.
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hi Frank,
Quote:
sorting records that were already in order vs sorting records that were in reverse order and there was no appreciable difference in the run stats.
Yup, I'd believe that, because both are "in sequence" already.
If the same file had a 10-position sort key and was sorted by (10,1,ch,a,9,1,ch,a,8,8,ch,a. . . etc), I would expect the resource usage to go up (both CPU and EXCP). As I hinted at earlier, the results are more dramatic for large files.
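The reversed-key re-sort Dick describes can be sketched in Python: building the sort key from the same bytes in reverse order turns an in-sequence file into one that is thoroughly out of order (key positions here are illustrative):

```python
def reversed_key(rec, key_start=0, key_len=10):
    """Sort key built from the key bytes taken in reverse order,
    so the last byte becomes the most significant (positions
    are illustrative only)."""
    return rec[key_start:key_start + key_len][::-1]

# A file already in ascending order on its 10-byte key:
recs = ["0000000001", "0000000002", "0000000010", "0000000100"]
# Re-sorting on the reversed key scrambles the original order:
print(sorted(recs, key=reversed_key))
```

Relative to the original key order, the reversed-key output is effectively random, which is what makes it a useful worst-ish case for a resource comparison.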
Frank Yaeger
DFSORT Developer
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
Quote:
Yup, I'd believe that, because both are "in sequence" already.
No, only one file is already in sequence. The experiment I did is like this:
Sort 1 - input records look like this (RECFM=FB,LRECL=100):
Code:
00000001
...
00300000
I used SORT FIELDS=(11,8,ZD,A) - so the records are already in order.
Sort 2 - input records look like this (RECFM=FB,LRECL=100):
Code:
00300000
...
00000001
I used SORT FIELDS=(1,3,CH,A) - so the records are not already in order.
I think this experiment tests one variation of your statement that
Quote:
If the file is completely in sequence, only a fraction of the usual resources are used
(although it certainly doesn't test every variation). The first file is completely in sequence. The second file is completely out of sequence. But there was no appreciable difference in the resources used, so I don't think your blanket statement above is accurate, although it certainly may be true in certain cases.
I've asked our Performance Team Lead to comment on this. He knows much more about performance than I do, so I'll defer to him.
Dave Betten
New User
Joined: 24 Jan 2006 Posts: 26
This notion of whether a sort uses fewer resources if the input is already in sequence is complex, and there's no simple answer. First, one has to define what we mean by resources.
CPU - in some cases (but not all) there will be less CPU time required to do the sort if the input is already in sequence. The degree of CPU savings is going to vary depending on the characteristics of the sort and the available resources. During the input phase, we gain some efficiencies in the number of instructions required, since the records we read in are already in sequence. During the output phase, we may or may not be more efficient in how we merge those strings and write the output. Long ago, when we were running with limited amounts of main storage, the process of merging those strings was less efficient, so having the data already in sequence could have a big effect. But in today's environments, where we can optimize main storage for larger sorts, that merge process is more efficient. And in cases where we are able to read the entire file into memory, we're going to be very efficient whether the input was in sequence or not.
Intermediate storage - whether we're talking central storage or DASD, we're still going to have to store the entire file in some sort of intermediate storage. This is because we can't write any of the records out until the last one is read. For all we know, that last record could be the first one to be written out! Yes, we might gain some efficiencies and reduce our intermediate storage requirement slightly, but we're still going to have to store the entire file somewhere. So I'd say you're going to require almost the same amount of intermediate storage whether your input is in sequence or not. And if that intermediate storage is on DASD, you're still going to do quite a bit of I/O to work datasets that you would never need to do if you ran the file through a MERGE, as was suggested earlier. This is the main reason I would disagree with the idea that a sort is going to use "a fraction of the usual resources" if the input file is already in sequence.
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hi Dave,
Thanks for your reply.
Quote:
Long ago when we were running with limited amounts of main storage, the process of merging those strings was less efficient so having the data already in sequence could have a big effect.
Long ago and far away. . . And that would have been when we were trying these experiments.
d