The three fields after the key have this in common: they are outside the key and are not summed.
M1 is a packed decimal value to be summed.
The problem is that DFSORT chooses at random one value among all the possible ones.
To take your example:
Code:
Key1bAbM1
Key1bGXM1
Key1bbBM1
Key1CbbM1
Key1DFbM1
After the SORT/SUM, only one record will be written, but which values end up in positions 5, 6 and 7?
In position 5: it may be b(lank), or C, or D...
The rule: I want any non-blank value if one exists (here, in position 5, C or D, it doesn't matter which, but not blank).
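To make the rule concrete, here is one acceptable output for the five records above (which non-blank value gets picked for each position doesn't matter, and M1 here stands for the sum of the five packed-decimal values; this line is only illustrative, not from the original post):
Code:
Key1CAXM1
D instead of C, G or F instead of A, or B instead of X would be equally acceptable.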
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
Quote:
The problem is that DFSORT chooses at random one value among all the possible ones.
Not really. If you have EQUALS in effect, then DFSORT chooses the first record with each key. If you have NOEQUALS in effect, then DFSORT chooses one record with each key. But you want to take a different field from different records with the same key. There's no built-in function to do that.
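To illustrate (a sketch only, with the layout assumed from the example: key in 1-4, byte fields in 5-7, PD field in 8-9), a plain SORT/SUM with EQUALS keeps the non-summed bytes of the first record in each key group, blanks included:
Code:
* With EQUALS, positions 5-7 come from the FIRST record
* of each key group (blanks included)
  OPTION EQUALS
  SORT FIELDS=(1,4,CH,A)
  SUM FIELDS=(8,2,PD)
That first-record behavior is exactly why the non-blank rule needs something more than SORT/SUM.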
However, you can do it with the following DFSORT/ICETOOL job. You didn't tell me the starting position and length of your PD field as I asked, so for the example, I assumed it was in positions 8-9. You can change the job appropriately if needed.
Quote:
I don't understand why there's no "WITH(5,1)"
WITH(5,1) isn't necessary. 5,1 comes from the base record (first record). 6,1 comes from the first overlay record (second record). 7,1 comes from the second overlay record (third record). 8,2 comes from the third overlay record (fourth record).
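Put together, the SPLICE statement for this layout might look something like the following sketch (the DDnames T1, OUT and CTL5 are assumptions for illustration, not taken from the original job; ON gives the key, and with WITHEACH each WITH field is taken from a different overlay record):
Code:
* 5,1 comes from the base record, so no WITH(5,1) is needed
  SPLICE FROM(T1) TO(OUT) ON(1,4,CH) WITHEACH -
   WITH(6,1) WITH(7,1) WITH(8,2) USING(CTL5)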
For complete details on how the SPLICE operator of DFSORT's ICETOOL works, see the DFSORT documentation.
What about the performance of such a step with millions of records?
"Performance" in the context you're using it is a very vague term. Compared to what? Measured by what criteria? The only way to determine if "performance" is "acceptable" by your criteria is to run some experiments and evaluate the results for yourself.
Quote:
Are there 4 sorts running one after the other? In that case, it might be expensive...
There are actually 5 sorts here. Each SELECT does a sort. SORT does a sort. And SPLICE does a sort. Each SELECT only writes a subset of the records. The SORT and SPLICE only sort a subset of the records. Whether it's "expensive" depends on how many records are in the input file and what you consider "expensive". The reason we need all of these sorts is that you're looking for several fields in each record among a group of records. DFSORT is record oriented, not group oriented, although it can do "tricks" with groups. It might be possible to do it another way using fewer sorts (e.g. COPYs instead of SORTs), but I don't have time to figure it out for you.
Quote:
I am comparing with a COBOL program where MOVEs of non-blank values are made after the "internal" sort and before summing and writing to the output file.
You asked for a sort solution, so I assumed you didn't want to write a program. This could certainly be done faster using DFSORT and an E35 exit with the appropriate logic (probably similar to whatever you'd use for your COBOL program and perhaps faster than the COBOL program or perhaps not).
Certainly, nobody is stopping you from writing a COBOL program if that's what you want.
Note that I don't want to write a COBOL program... because one already exists in production.
But I wonder whether it would be better to do it with DFSORT in the future.
Quote:
This could certainly be done faster using DFSORT and an E35 exit with the appropriate logic (probably similar to whatever you'd use for your COBOL program and perhaps faster than the COBOL program or perhaps not).
In fact, the E35 could be a way to improve performance... I don't think anybody here at work has used one yet.
If you already have the COBOL program, then you could run it and the DFSORT solution I gave you with the same data and see what the performance looks like for yourself. I suspect the COBOL program will be "faster" in this case because it's more targeted to what it has to do whereas DFSORT is a general purpose utility. On the other hand, DFSORT with an E35 might be faster than the COBOL program because it could also be targeted to what it has to do.
Note that if you had said up-front that you already have a COBOL program and were looking for a DFSORT solution that would run faster than the COBOL program, I probably wouldn't have bothered to post my solution. I thought you didn't have a way to do what you wanted and were looking for one. Please try to be more forthcoming about what you're trying to do in the future.
Note that your solution was very interesting to me and I will certainly use it in the future. It would have been a shame if you hadn't posted it. I'm here to learn.
Now, I recognize there were two questions in my initial post:
1- How to do it with DFSORT?
2- Which solution is best with ten million records?
I have no preconceived ideas about using either DFSORT or COBOL.
The fact is that we have more than 2000 COBOL programs and a few performance problems. I wonder whether, in some cases, it would be better to replace the "internal" SORT with DFSORT.