Michaelod Warnings : 1 New User
Joined: 02 Sep 2008 Posts: 49 Location: Edinburgh
Hi,
I have a file that is split into 2 files, a 'reversal' and an 'other' file.
We compare the 2 files and eliminate any matches. We then write the clean records to a new file.
The problem is that we may have duplicate keys on both files i.e.
REVERSAL FILE
Code: |
1234567890
1234567890
1111111111
2222222222 |
OTHER FILE
Code: |
1234567890
1234567890
1111111111
2222222222
3333333333
4444444444
5555555555 |
I want the 'CLEAN' file to look like this:
Code: |
3333333333
4444444444
5555555555 |
but it actually looks like this:
Code: |
1234567890
3333333333
4444444444
5555555555 |
When we have a duplicate key it only eliminates one of the records from the CLEAN file rather than the duplicates as well.
Here is the code I have:
Code: |
JOB INPUT (FL50938O KEY(FL50938O-KEY) FL50938R KEY(FL50938R-KEY))
COMPARE-FILES
IF FL50938O
WS-OTHERS = WS-OTHERS + 1
IF NOT FL50938R
PERFORM WRITE-CLEAN
ELSE
WS-REFUNDS = WS-REFUNDS + 1
WS-MATCHED = WS-MATCHED + 1
END-IF
ELSE
IF FL50938R
PRINT UNMATCHED-REFUNDS
WS-UNMATCHED = WS-UNMATCHED + 1
WS-REFUNDS = WS-REFUNDS + 1
END-IF
END-IF
WRITE-CLEAN. PROC
FL50938C-REC = FL50938O-REC
FL50938C-FROM-DATE = WS-FROM-DATE
FL50938C-TO-DATE = WS-TO-DATE
PUT FL50938C
WS-WRITTEN = WS-WRITTEN + 1
END-PROC |
Is there a way I can eliminate the dups? |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
I assume that the sample data not being in sequence is just the way you made it for posting?
You're doing the "matched file processing" without any of the file-matching tests - consult your manual and try some of those. Search/browse this forum. Probably searching for MATCHED will be enough. |
Michaelod
Bill Woodger wrote: |
I assume that the sample data not being in sequence is just the way you made it for posting?
You're doing the "matched file processing" without any of the file-matching tests - consult your manual and try some of those. Search/browse this forum. Probably searching for MATCHED will be enough. |
Yes, it should be in sequence.... so it would be:
REVERSAL
Code: |
1111111111
1234567890
1234567890
2222222222 |
OTHER
Code: |
1111111111
1234567890
1234567890
2222222222
3333333333
4444444444
5555555555 |
Expected:
Code: |
3333333333
4444444444
5555555555 |
Actual:
Code: |
1234567890
3333333333
4444444444
5555555555 |
Bill Woodger
OK, I searched the CA forum for MATCHED for you.
Can you read through this? |
Michaelod
Bill Woodger wrote: |
OK, I searched the CA forum for MATCHED for you.
Can you read through this? |
Bill,
I've had a look but it seems to be a slightly different issue.
I can have duplicates on both files.
Here is some more example data:
REVERSAL
Code: |
1111
1111
3333
5555 |
OTHER
Code: |
1111
1111
1111
2222
3333
3333
4444
5555
5555 |
I'd expect my output to be:
Code: |
1111
2222
3333
4444
5555 |
Actual is:
Code: |
1111
2222
3333
4444 |
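What is being described here is a multiset difference: each reversal record cancels exactly one record with the same key on the other file, and any surplus records survive. Purely as an illustration of that rule (Python rather than Easytrieve, using the sample data from this post), a minimal sketch:

```python
from collections import Counter

# Sample data from the post: each reversal key should cancel
# exactly one occurrence of the same key in the "other" file.
reversal = ["1111", "1111", "3333", "5555"]
other = ["1111", "1111", "1111", "2222", "3333",
         "3333", "4444", "5555", "5555"]

# Multiset subtraction: Counter drops counts that fall to zero or below.
clean_counts = Counter(other) - Counter(reversal)

# Rebuild the clean file in key order, preserving surviving duplicates.
clean = [key for key in sorted(clean_counts)
         for _ in range(clean_counts[key])]
print(clean)  # ['1111', '2222', '3333', '4444', '5555']
```

This matches the expected output above: the three 1111 records on OTHER, less two 1111 reversals, leave one 1111 in the clean file.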
Bill Woodger
OK, let's step back a... step.
Code: |
111
111
222
111
111 |
With those as your file one and two, you will get, with your code:
Why? Because that is the way the matched file processing works.
If you want:
as your output, you can arrange for that.
The information and explanation is all in the topic I linked to. That topic has duplicates on both files and wants to know how to deal with them. If a different outcome is required, that does not matter for your purpose.
Do you have a manual for Easytrieve Plus? Get to the part about Synchronized File Processing. Look at the special conditions available.
You can then use one of those special conditions with your existing code or, better, use just those conditions rather than just testing for file-presence.
If you set up a small test program to read DD * data and DISPLAY the results in that program, you should find
Code: |
FILE1 Record 1 present, FILE2 Record 1 present
FILE1 Record 1 present, FILE2 Record 2 present
FILE1 Record 2 present, NO File 2 record present
|
Duplicates on both files is a complicated thing in Synchronized File Processing. You can code the matching yourself instead if you find the resultant code would be clearer. I'd use a "sideways match". Use File 1 as the driver. Look to File 2 from File 1. Similar type of processing to "SKIP SEQUENTIAL" except you'd do it all with GETs. |
Michaelod
I tried the following and it still doesn't seem to work:
Code: |
COMPARE-FILES
IF NOT MATCHED
IF FL50938O
WS-OTHERS = WS-OTHERS + 1
PERFORM WRITE-CLEAN
END-IF
IF FL50938R
PRINT UNMATCHED-REFUNDS
WS-UNMATCHED = WS-UNMATCHED + 1
WS-REFUNDS = WS-REFUNDS + 1
END-IF
ELSE
IF LAST-DUP FL50938R OR NOT DUPLICATE FL50938R
WS-REFUNDS = WS-REFUNDS + 1
WS-MATCHED = WS-MATCHED + 1
WS-OTHERS = WS-OTHERS + 1
END-IF
END-IF |
I also tried it with FIRST-DUP.
Any ideas? |
Bill Woodger
I'm sorry not to give you the exact answer, but it goes like this:
With duplicates on both files, the SFP is complex.
Because the SFP is complex, you have to understand it.
If I just give you working code and you don't understand how the match is working, you're stuck for documenting it, changing it, explaining it to colleagues.
My "rule-of-thumb" for SFP is "avoid duplicates on both files".
If this is not possible, I'd more likely code the manual matching. Reason being, next one along won't understand so well how to change the thing, or what exactly (and easily) is happening, if I use SFP.
If you are determined to stick with SFP for this, your experiments, the topic I linked to, and the manual will lead you to the correct code without me having to supply it (I can't test it, anyway). Or you'll recognise that it is somewhat less intuitive than is suited to the program and you'll code the manual match.
If you had been using the matching conditions already (like in the other topic) the story is slightly different, as you should be able to see, for the provision of code. To use SFP with one-to-one is easy with the conditions. With one-to-many is easy with the conditions. With many-to-many is complex, even with the conditions.
If you are new to the matching conditions, you need to get a good grip of them, which you will get if you follow through on what I suggested.
If you get nowhere, check back tomorrow. |
Michaelod
Bill Woodger wrote: |
I'm sorry not to give you the exact answer, but it goes like this:
With duplicates on both files, the SFP is complex.
Because the SFP is complex, you have to understand it.
If I just give you working code and you don't understand how the match is working, you're stuck for documenting it, changing it, explaining it to colleagues.
My "rule-of-thumb" for SFP is "avoid duplicates on both files".
If this is not possible, I'd more likely code the manual matching. Reason being, next one along won't understand so well how to change the thing, or what exactly (and easily) is happening, if I use SFP.
If you are determined to stick with SFP for this, your experiments, the topic I linked to, and the manual will lead you to the correct code without me having to supply it (I can't test it, anyway). Or you'll recognise that it is somewhat less intuitive than is suited to the program and you'll code the manual match.
If you had been using the matching conditions already (like in the other topic) the story is slightly different, as you should be able to see, for the provision of code. To use SFP with one-to-one is easy with the conditions. With one-to-many is easy with the conditions. With many-to-many is complex, even with the conditions.
If you are new to the matching conditions, you need to get a good grip of them, which you will get if you follow through on what I suggested.
If you get nowhere, check back tomorrow. |
The duplicates on both files is unavoidable. This is an existing live issue.
I'll maybe try the manual matching. |
dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
since you don't seem to have the wherewithal to manage this
'complex' logic,
why don't you presort both files, removing dups,
then you have a simple ezytrieve program? |
Michaelod
dbzTHEdinosauer wrote: |
since you don't seem to have the wherewithal to manage this
'complex' logic,
why don't you presort both files, removing dups,
then you have a simple ezytrieve program? |
I can't simply remove dups.
I have to match them first and then remove.
For example if i had file a:
Code: |
1111
1111
1111
2222
2222
2222 |
and file b:
I would expect:
Code: |
1111
1111
2222
2222 |
Your suggestion, however, would give me nothing, i.e. an empty file |
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10888 Location: italy
REVERSAL
Code: |
1111
1111
3333
5555 |
OTHER
Code: |
1111
1111
1111
2222
3333
3333
4444
5555
5555 |
I'd expect my output to be:
Code: |
1111
2222
3333
4444
5555 |
For example, will the OTHER 1111 being kept be the last one or the first one?
IMO the whole business approach is flawed ...
just curious ... is the <duplicate> check only on the key or on the whole record? |
Michaelod
enrico-sorichetti wrote: |
REVERSAL
Code: |
1111
1111
3333
5555 |
OTHER
Code: |
1111
1111
1111
2222
3333
3333
4444
5555
5555 |
I'd expect my output to be:
Code: |
1111
2222
3333
4444
5555 |
For example, will the OTHER 1111 being kept be the last one or the first one?
IMO the whole business approach is flawed ...
just curious ... is the <duplicate> check only on the key or on the whole record? |
It doesn't matter whether it's the first or the last..... as long as the correct number of reversals is eliminated.
The business approach is not flawed if you knew the context.
The use of Easytrieve under these circumstances is the flaw. |
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
Quote: |
The use of Easytrieve under these circumstances is the flaw. |
The flaw is Not Easytrieve, but rather the knowledge / creativity of those trying to use it. . .
It is rather frustrating that these software vendors write code for the masses, not necessarily as one would prefer. . .
What I understand as your objective has been solved many times using Easytrieve - it just takes a bit of work and is not necessarily automagic.
Would you know how to code this in COBOL (I surely hope so)? The same technique can be used in Easytrieve (as Bill has mentioned). |
enrico-sorichetti
Quote: |
The business approach is not flawed if you knew the context. |
whenever people talk about duplicates on a key which is a subset of the whole record
and tell that it does not make any difference which <duplicate> is kept...
then the logic is flawed (whether you like it or not)
or somebody was too lazy to describe the problem clearly
since we reply on our own time and free of charge,
the minimum that is expected from people asking for help is a proper description of the problem. |
Michaelod
enrico-sorichetti wrote: |
Quote: |
The business approach is not flawed if you knew the context. |
whenever people talk about duplicates on a key which is a subset of the whole record
and tell that it does not make any difference which <duplicate> is kept...
then the logic is flawed (whether you like it or not)
or somebody was too lazy to describe the problem clearly
since we reply on our own time and free of charge,
the minimum that is expected from people asking for help is a proper description of the problem. |
Whether or not we agree that the business approach is flawed, the problem description was clear and concise. |
dick scherrer
Hello,
Unless I have missed something, you have been told how to do what you need.
What was not clear? If you understood the code needed, implement away. If there is some doubt, post what we might clarify. |
Michaelod
dick scherrer wrote: |
Hello,
Unless I have missed something, you have been told how to do what you need.
What was not clear? If you understood the code needed, implement away. If there is some doubt, post what we might clarify. |
I'm still no further forward...... |
dick scherrer
Hello,
Quote: |
I'm still no further forward...... |
Most of the work getting help from the forum is up to the requestor. Saying "no progress" is no help to someone trying to help you.
Maybe it is not yet clear that you need to toss out the "MATCHED" code and write your own code to control the processing of the 2 files. No matter what you do, the files need to be in the same key sequence.
If you do not understand how to write this kind of code, there is a "Sticky" at the top of the COBOL part of the forum that is a tested, working example of matching/merging 2 files that are in sequence. This same logic has been used rather often in Easytrieve as well. It may be of value to get the COBOL sample running for your files and then re-code in Easytrieve - if the deliverable has to be Easytrieve. |
Michaelod
No problem. I'll have a look.
Thanks |
dbzTHEdinosauer
well, i am not an ezytrieve specialist,
but there is such a thing as JOB INPUT NULL
and a GET instruction.
you need two additional PROCs: one to GET the O's and one to GET the R's.
you don't want to have a JOB INPUT statement that reads one record from both,
because you don't always want to read a record from both. |
dbzTHEdinosauer
you (sorta) stipulated that reversals should eliminate matches on the other file.
what if you have an unmatched reversal?
Code: |
PERFORM GET R
PERFORM GET O
PROCESS:
  If R end-of-file then Perform write rest of O's as clean
  If O end-of-file then Perform write rest of R's as unmatched
  If R = O then Perform GET R, Perform GET O, goto PROCESS
  If R < O then Perform write unmatched R, Perform GET R, goto PROCESS
  If R > O then Perform write O as clean, Perform GET O, goto PROCESS |
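The steps above can be sketched as running code. This is Python rather than Easytrieve, purely to show the control flow of the balanced-line match; `get()` and the `EOF` sentinel are stand-ins for an Easytrieve GET and its end-of-file condition:

```python
EOF = object()  # sentinel standing in for the end-of-file condition

def get(reader):
    """One GET: return the next record, or EOF when the file is exhausted."""
    return next(reader, EOF)

def process(reversal_file, other_file):
    """Balanced-line match of two key-sequenced files, per the pseudocode above."""
    clean, unmatched = [], []
    r_reader, o_reader = iter(reversal_file), iter(other_file)
    r, o = get(r_reader), get(o_reader)
    while True:
        if r is EOF:                 # write rest of O's as clean
            while o is not EOF:
                clean.append(o)
                o = get(o_reader)
            return clean, unmatched
        if o is EOF:                 # write rest of R's as unmatched
            while r is not EOF:
                unmatched.append(r)
                r = get(r_reader)
            return clean, unmatched
        if r == o:                   # matched pair: one GET on each file
            r, o = get(r_reader), get(o_reader)
        elif r < o:                  # reversal with nothing left to cancel
            unmatched.append(r)
            r = get(r_reader)
        else:                        # no reversal for this record: it is clean
            clean.append(o)
            o = get(o_reader)

# Sample data from earlier in the thread
clean, unmatched = process(["1111", "1111", "3333", "5555"],
                           ["1111", "1111", "1111", "2222", "3333",
                            "3333", "4444", "5555", "5555"])
print(clean)      # ['1111', '2222', '3333', '4444', '5555']
print(unmatched)  # []
```

Because each matched pair advances both files by exactly one record, duplicates on both sides cancel one-for-one, which is what the MATCHED-based code could not do.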
Bill Woodger
Quote: |
The use of Easytrieve under these circumstances is the flaw. |
The only flaw was the code. OK, maybe the requirement. I'm with Dick on the former and enrico on the latter.
From the sample data you showed after your first post:
- The SFP was never going to do what you want
- The coder not only didn't know how to do the SFP, but neither the matching conditions
- The coder failed to test properly
- Every single tester down the line, of all stripes, failed
- Something that just would not work managed to get into production
To blame the language for the misuse of an element of the language is absurd.
Code it out. Get it going how it should have been at first. Dbz has given a start. |
Michaelod
Bill Woodger wrote: |
Quote: |
The use of Easytrieve under these circumstances is the flaw. |
The only flaw was the code. OK, maybe the requirement. I'm with Dick on the former and enrico on the latter.
From the sample data you showed after your first post:
- The SFP was never going to do what you want
- The coder not only didn't know how to do the SFP, but neither the matching conditions
- The coder failed to test properly
- Every single tester down the line, of all stripes, failed
- Something that just would not work managed to get into production
To blame the language for the misuse of an element of the language is absurd.
Code it out. Get it going how it should have been at first. Dbz has given a start. |
It may have come across wrongly, but I'm not blaming the language.
What I'm blaming is the lack of knowledge of the person who coded the Easytrieve, i.e. not understanding how to use it properly. |
dick scherrer
Hello,
Quote: |
What I'm blaming is the lack of knowledge of the person who coded the Easytrieve, i.e. not understanding how to use it properly. |
Yup, but that ship sailed long ago.
Now a re-code/re-test is in order, whether the original author is available or not.
Given that the code "as is" may not ever be made to work, suggest you use the logic from the Sticky or the hints from DBZ. |