How I can verify the data is in sorted order without sorting

 
IBMMAINFRAMES.com Support Forums -> DFSORT/ICETOOL
PUMA

New User


Joined: 08 Aug 2006
Posts: 10
Location: FRANCE

Posted: Fri Sep 19, 2008 10:32 pm    Post subject: How I can verify the data is in sorted order without sorting

Hi

Is there a facility in DFSORT or ICETOOL to verify that my very large file is already in sorted order, so that I can avoid sorting it?
In other words, is there a function that reads all my data and verifies that it is sorted on a given criterion?

Thanks

enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10311
Location: italy

Posted: Sun Sep 21, 2008 1:38 pm    Post subject: Reply to: How I can verify the data is in sorted order witho

if the procedures have been setup in the proper way,
and the programs have been properly tested ........

there is no need for such check


just curious...
and if the dataset is not sorted, what are You going to do ??
another pass to sort it maybe .....
dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

Posted: Sun Sep 21, 2008 2:40 pm

Hello,

Quote:
if the procedures have been setup in the proper way,
and the programs have been properly tested ........
there is no need for such check
Agree with Enrico.

FWIW - if this is an external file and you cannot control the content and it "should" be in sequence (but might not), go ahead and run the sort. If the file is completely in sequence, only a fraction of the usual resources are used. Sort is smart enough to recognize the data is already in sequence and processes accordingly.
Frank Yaeger

DFSORT Moderator


Joined: 15 Feb 2005
Posts: 7130
Location: San Jose, CA

Posted: Sun Sep 21, 2008 9:00 pm

Puma,

You can do a one-file MERGE like the one below. MERGE will issue a message and terminate if it finds a record out of sorted order. MERGE is more efficient than SORT.

Code:

//S1 EXEC PGM=ICEMAN
//SYSOUT DD SYSOUT=*
//SORTIN01 DD DSN=...  file you want to check
//SORTOUT DD DUMMY
//SYSIN DD *
  MERGE FIELDS=(...)
/*


FIELDS for MERGE would be the same as you'd use for SORT.
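
For readers who want to see the idea behind this kind of check, here is a minimal Python sketch. It is not how DFSORT is implemented; it only illustrates a single pass that compares each key with the previous one and stops at the first record that is out of sequence. The names `first_out_of_sequence` and `key_11_8`, the sample records, and the key position (column 11, length 8, ascending) are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, Iterable, Optional


def first_out_of_sequence(
    records: Iterable[str],
    key: Callable[[str], str],
) -> Optional[int]:
    """Return the 1-based number of the first record whose key is lower
    than the previous record's key, or None if the input is in ascending
    order. Equal keys are treated as in sequence."""
    previous = None
    for number, record in enumerate(records, start=1):
        current = key(record)
        if previous is not None and current < previous:
            return number  # stop at the first out-of-sequence record
        previous = current
    return None


def key_11_8(record: str) -> str:
    # Hypothetical key: columns 11-18 (1-based), i.e. position 11, length 8.
    return record[10:18]


in_order = [" " * 10 + f"{n:08d}" for n in (1, 2, 3, 4)]
broken = [" " * 10 + f"{n:08d}" for n in (1, 3, 2, 4)]

print(first_out_of_sequence(in_order, key_11_8))  # None
print(first_out_of_sequence(broken, key_11_8))    # 3
```

The check needs only the previous key in memory, so no work storage is required and the pass can stop as soon as one record is found out of order.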
Frank Yaeger

DFSORT Moderator


Joined: 15 Feb 2005
Posts: 7130
Location: San Jose, CA

Posted: Sun Sep 21, 2008 9:07 pm

Quote:
FWIW - if this is an external file and you cannot control the content and it "should" be in sequence (but might not), go ahead and run the sort. If the file is completely in sequence, only a fraction of the usual resources are used. Sort is smart enough to recognize the data is already in sequence and processes accordingly.


Huh? Where did you get that idea? It's not true. For a sort application, DFSORT has no way of knowing in advance if the file is already in sorted order.
CICS Guy

Senior Member


Joined: 18 Jul 2007
Posts: 2150
Location: At my coffee table

Posted: Sun Sep 21, 2008 10:24 pm

Frank Yaeger wrote:
You can do a one-file MERGE like the one below. MERGE will issue a message and terminate if it finds a record out of sorted order. MERGE is more efficient than SORT.
I was thinking along the same lines, but I thought that the merge needed two inputs (one dummy?).
Also, I couldn't figure out what return code would be generated when the "out of sequence" error was posted.
dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

Posted: Sun Sep 21, 2008 10:46 pm

Hi Frank,

Quote:
For a sort application, DFSORT has no way of knowing in advance if the file is already in sorted order
No, it doesn't know in advance, but it can surely "see" this when the input is being processed.

Long ago, some of the sort products scattered sequenced strings of input records and then merged the strings to create the final output. The longer the initial strings, the faster the sort ran (fewer strings meant less manipulation). When we ran completely random sets of input, the process took far longer and used more resources than when an in-sequence file was processed.
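
The point about initial strings can be illustrated with a small Python sketch. It counts the ascending strings (runs) already present in the input, which is roughly the number of pieces a string-merging sort would have to merge. This shows the general idea only and is not the algorithm of DFSORT or any other product; `count_ascending_strings` is a hypothetical helper.

```python
from typing import Sequence


def count_ascending_strings(keys: Sequence[int]) -> int:
    """Count maximal non-descending runs in keys (0 for empty input)."""
    if not keys:
        return 0
    strings = 1
    for previous, current in zip(keys, keys[1:]):
        if current < previous:
            strings += 1  # a drop in key value starts a new string
    return strings


print(count_ascending_strings([1, 2, 3, 4, 5, 6]))  # 1
print(count_ascending_strings([6, 5, 4, 3, 2, 1]))  # 6
print(count_ascending_strings([3, 4, 1, 2, 5, 0]))  # 3
```

Under this simple count, input that is fully in sequence forms a single string, so nothing is left to merge. A real sort product may form its strings differently, so the numbers here should not be read as a prediction of actual run statistics.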

Maybe with all of the improvements in technology, this went away. Maybe it was not part of DFSORT. Maybe my memory suffers from far too many systems and products. . .

We don't have DFSORT available on my current systems, so i can't run a volume test. I'd be interested in seeing the difference in run stats (if any) in sorting a file with a few million "unsorted" records and then re-sorting the sorted output.
Frank Yaeger

DFSORT Moderator


Joined: 15 Feb 2005
Posts: 7130
Location: San Jose, CA

Posted: Mon Sep 22, 2008 8:24 pm

Quote:
I thought that the merge needed two inputs (one dummy?).


No, it doesn't. You can do a MERGE with one file.

Quote:
I couldn't figure out what return code value would generate when the "out of sequence" error was posted.


ICE068A 0 OUT OF SEQUENCE SORTIN01

RC=16
Frank Yaeger

DFSORT Moderator


Joined: 15 Feb 2005
Posts: 7130
Location: San Jose, CA

Posted: Mon Sep 22, 2008 8:36 pm

Quote:
I'd be interested in seeing the difference in run stats (if any) in sorting a file with a few million "unsorted" records and then re-sorting the sorted output.


I ran a test sorting records that were already in order vs sorting records that were in reverse order and there was no appreciable difference in the run stats.
dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

Posted: Mon Sep 22, 2008 9:18 pm

Hi Frank,

Quote:
sorting records that were already in order vs sorting records that were in reverse order and there was no appreciable difference in the run stats.
Yup, i'd believe that because both are "in sequence" already.

If the same file had a 10 position sort-key and was sorted by (10,1,ch,a,9,1,ch,a,8,8,ch,a. . . etc), i would expect the resource usage to go up (both cpu and excp). As i hinted at earlier, the results are more dramatic for large files.
Frank Yaeger

DFSORT Moderator


Joined: 15 Feb 2005
Posts: 7130
Location: San Jose, CA

Posted: Mon Sep 22, 2008 9:39 pm

Quote:
Yup, i'd believe that because both are "in sequence" already.


No, only one file is already in sequence. The experiment I did is like this:

Sort 1 - input records look like this (RECFM=FB,LRECL=100)

Code:

          00000001
          ...
          00300000


I used SORT FIELDS=(11,8,ZD,A) - so records are already in order.

Sort 2 - input records look like this (RECFM=FB,LRECL=100)

Code:

          00300000
          ...
          00000001


I used SORT FIELDS=(1,3,CH,A) - so records are not already in order.

I think this experiment tests one variation of your statement that "If the file is completely in sequence, only a fraction of the usual resources are used" (although it certainly doesn't test every variation). The first file is completely in sequence. The second file is completely out of sequence. But there was no appreciable difference in the resources used, so I don't think your blanket statement above is accurate, although it certainly may be true in certain cases.

I've asked our Performance Team Lead to comment on this. He knows much more about Performance than I do, so I'll defer to him.
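
Anyone who wants to repeat a similar comparison could build the two inputs with something like the Python sketch below. It only produces records in the layout described above (100-byte fixed-length records, 8-digit key starting in column 11, 300000 records in ascending or descending order). The helper name `make_records` is hypothetical, and getting the data into a data set on the mainframe is outside the scope of the sketch.

```python
from typing import Iterator

LRECL = 100  # fixed record length, as in the experiment (RECFM=FB,LRECL=100)


def make_records(count: int, descending: bool = False) -> Iterator[str]:
    """Yield fixed-length records with an 8-digit key in columns 11-18."""
    numbers = range(count, 0, -1) if descending else range(1, count + 1)
    for n in numbers:
        # 10 blanks, then the zero-padded key, then blanks up to LRECL
        yield (" " * 10 + f"{n:08d}").ljust(LRECL)


ascending = list(make_records(300000))
descending = list(make_records(300000, descending=True))

print(ascending[0][10:18], ascending[-1][10:18])    # 00000001 00300000
print(descending[0][10:18], descending[-1][10:18])  # 00300000 00000001
```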
Dave Betten

New User


Joined: 24 Jan 2006
Posts: 26

Posted: Mon Sep 22, 2008 10:49 pm

The question of whether a sort uses fewer resources if the input is already in sequence is complex, and there's no simple answer. First, one has to define what we mean by resources.

CPU - in some cases (but not all) there will be less cpu time required to do the sort if the input is already in sequence. The degree of cpu savings is going to vary depending on the characteristics of the sort and the available resources. During the input phase, we gain some efficiencies in the number of instructions required since the records we read in are already in sequence. During the output phase, we may or may not be more efficient in how we merge those strings and write the output. Long ago when we were running with limited amounts of main storage, the process of merging those strings was less efficient so having the data already in sequence could have a big effect. But in today's environments where we can optimize main storage for larger sorts, that merge process is more efficient. And in cases where we were able to read the entire file into memory, we're going to be very efficient whether the input was in sequence or not.

Intermediate storage - Whether we're talking central storage or DASD, we're still going to have to store the entire file in some sort of intermediate storage. This is because we can't write any of the records out until the last one is read. For all we know, that last record could be the first one to be written out! Yes, we might gain some efficiencies and reduce our intermediate storage requirement slightly, but we're still going to have to store the entire file somewhere. So I'd say you're going to require almost the same amount of intermediate storage whether your input is in sequence or not. And if that intermediate storage is on DASD, you're still going to do quite a bit of I/O to work data sets that you would never need to do if you ran the file through a MERGE as was suggested earlier. This is the main reason I would disagree with the idea that a sort is going to use "a fraction of the usual resources" if the input file is already in sequence.
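
The intermediate storage argument can be shown with a tiny Python sketch. It is only an illustration of the reasoning, not a model of DFSORT internals; `sort_with_accounting` is a hypothetical helper that sorts a stream and reports how many records had to be held before the first output record could safely be written.

```python
from typing import Iterable, List, Tuple


def sort_with_accounting(records: Iterable[int]) -> Tuple[List[int], int]:
    """Sort records; also return how many were held before the first
    output record could safely be written."""
    held: List[int] = []
    for record in records:
        # Every record must be kept: a later one may still sort lower.
        held.append(record)
    held_before_first_write = len(held)
    return sorted(held), held_before_first_write


in_sequence = list(range(1, 1001))
almost = in_sequence + [0]  # in sequence except for the very last record

output, held = sort_with_accounting(in_sequence)
print(output[0], held)  # 1 1000

output, held = sort_with_accounting(almost)
print(output[0], held)  # 0 1001
```

In both cases the whole input is held before anything is written, even though the first input was already in sequence. A sequence check, by contrast, only ever needs the previous key.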
dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

Posted: Mon Sep 22, 2008 11:12 pm    Post subject: Reply to: How I can verify the data is in sorted order witho

Hi Dave,

Thanks for your reply.

Quote:
Long ago when we were running with limited amounts of main storage, the process of merging those strings was less efficient so having the data already in sequence could have a big effect.
Long ago and far away. . . And that would have been when we were trying these experiments.

d
All times are GMT + 6 Hours

 
