Extract matching records from two files

murmohk1 · Posted: Wed Jan 24, 2007 1:19 pm

Hi All,

I have a flat file with n records (the count varies time to time).

My requirement is to extract all of the above records (with duplicates) from another flat file (say FILE2). FILE2 contains a few million records.

File attributes:

(1) LRECL = 400
(2) RECFM = FB
(3) Key length = 14, starting at column 16

I used the "Create files with matching and non-matching records" technique from SORTTRICK. As the number of records in FILE2 is in the millions, my job is abending with SB37.

Is there another way to achieve the same result (without using temp files or taking much space)?

Regards,
Murali

William Thompson · Posted: Wed Jan 24, 2007 2:37 pm

First off, is it working correctly (at least until it abends)? Have you tested it against a small subset of both files?
Is it B37ing on just the one output file? Does the output allocation equal the input allocation (since the maximum size could be the entire input file)?

Yes, there are other ways, but which one to use depends on the size of the key file and the sort order of the large FILE2.

murmohk1 · Posted: Wed Jan 24, 2007 3:25 pm

The JCL was working fine when I tried with 67 sample records.

Regarding the B37 abend: since we are copying the entire FILE2, I'm unable to get the required space. Moreover, FILE2 is on tape, and writing files to tape for test purposes is prohibited in my shop.

William Thompson · Posted: Wed Jan 24, 2007 3:47 pm

IQofaGerbil · Posted: Wed Jan 24, 2007 5:50 pm

What is the purpose of your test?

1 - a full system test using production sized datasets?
If yes, and you do not have enough DASD (have you tried a multi-volume DASD allocation, perhaps?) or permission to use tapes, then you would seem to be stuck.

2- to test the functionality of your process?
If yes then use a cut down version of the production file, estimate how big a file you will get away with using, then base your test on that.

murmohk1 · Posted: Thu Jan 25, 2007 10:14 am

IQofaGerbil · Posted: Thu Jan 25, 2007 4:23 pm

Which file is giving the B37?

muthuvel · Posted: Thu Jan 25, 2007 5:13 pm

Please try giving

murmohk1 · Posted: Sun Jan 28, 2007 11:20 am

Thanks all for the replies.

F1 is throwing space abend.

William Thompson · Posted: Sun Jan 28, 2007 11:30 pm

I doubt this will help, but what is the current allocation of the two input files?
What is the content of the IEC030I message?

dick scherrer · Posted: Mon Jan 29, 2007 12:04 am

Hello,

If you wrote a small COBOL program that would "match" the 2 files and write out what you need to meet your requirement, you would eliminate the need for more dasd or permission to use "work" tape(s) or data compression or some other work-around.

It would very likely run as fast or faster than the process that needs very large intermediate/transient storage. A 2-file match/merge is a single pass of the data and will run about the same speed as merely reading the files sequentially.
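To make the single-pass idea concrete, here is a minimal sketch of a two-file match/merge. It is written in Python purely for illustration (the post suggests COBOL), and it assumes both files are already sorted ascending on the key in the same collating sequence. The names `key_of` and `match_merge` are hypothetical; the key position (column 16, length 14) comes from the original question.

```python
from typing import Iterable, Iterator, Optional

KEY_START = 15  # column 16 in 1-based terms, taken from the original post
KEY_LEN = 14    # key length, taken from the original post


def key_of(record: str) -> str:
    """Return the 14-byte key that starts at column 16 of a record."""
    return record[KEY_START:KEY_START + KEY_LEN]


def match_merge(key_records: Iterable[str],
                master_records: Iterable[str]) -> Iterator[str]:
    """Yield every master record whose key appears in the key file.

    Assumption: both inputs are sorted ascending on the key.
    Each file is read once, front to back, and nothing is buffered.
    """
    keys = iter(key_records)
    current: Optional[str] = next(keys, None)
    for record in master_records:
        master_key = key_of(record)
        # Advance the key file until its key is no longer behind the master key.
        while current is not None and key_of(current) < master_key:
            current = next(keys, None)
        if current is None:
            break  # key file exhausted, no further matches are possible
        if key_of(current) == master_key:
            yield record  # duplicates in the master file are all kept
```

Because the key-file pointer only ever moves forward, the work is proportional to the combined record count of the two files, and the only output written is the set of matching records.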

IQofaGerbil · Posted: Mon Jan 29, 2007 4:37 pm

If I have got this correct,
F1 will be smaller than IN1+IN2 because it has all of the dups removed
T1 will be bigger than IN1+IN2
so if F1 fails on space then T1 will surely also fail for same reason?

You say that the main file has 'millions' of records. Do you know how many millions?

It looks to me that you need approx 7000 tracks per million records, so on a model-3 3390 disk you will squeeze in approx 7 million records.
That of course assumes that you get your hands on an empty volume (not likely!)

Unless you can view your storage pool to see what is available, it looks like you might need to calculate your storage requirements accurately and then speak to your storage management people to see if they can accommodate them.

Also consider (depending on your actual calculations) reducing your secondary space allocation request. You might be getting a B37 because the disk allocated to you does not have 1200 cyls available, when you might not actually need that much.
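The track estimate above can be reproduced with a little arithmetic. The sketch below is in Python for illustration only, and it assumes the data set uses half-track blocking on a 3390 (blocks of at most 27,998 bytes, two per track), 15 tracks per cylinder, and 3,339 cylinders on a model-3 volume. The function names are hypothetical, and a different block size would change the results.

```python
HALF_TRACK_BLOCK = 27_998  # assumption: largest block that fits twice on a 3390 track
TRACKS_PER_CYL = 15        # 3390 geometry
CYLS_3390_3 = 3_339        # cylinders on a 3390 model 3


def records_per_track(lrecl: int) -> int:
    """Fixed-length records per track with two half-track blocks."""
    return 2 * (HALF_TRACK_BLOCK // lrecl)


def tracks_needed(record_count: int, lrecl: int) -> int:
    """Tracks required for record_count records, rounded up."""
    per_track = records_per_track(lrecl)
    return -(-record_count // per_track)  # ceiling division


def cylinders_needed(record_count: int, lrecl: int) -> int:
    """Cylinders required for record_count records, rounded up."""
    return -(-tracks_needed(record_count, lrecl) // TRACKS_PER_CYL)


def records_per_volume(lrecl: int) -> int:
    """Records that fit on a completely empty 3390-3 volume."""
    return CYLS_3390_3 * TRACKS_PER_CYL * records_per_track(lrecl)
```

Under these assumptions an LRECL of 400 gives 138 records per track, which works out to a little over 7,200 tracks per million records and just under 7 million records on an empty model-3 volume, in line with the figures quoted above.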

murmohk1 · Posted: Mon Jan 29, 2007 5:00 pm

Thanks IQofaGerbil for the information.

Since IN1 is a master file (kind of), the record count increases daily. As of now it's holding close to 4 million records. As expected, I'm unable to get an empty volume.

Writing a program is ruled out as the records are stored randomly. I would need to open/close the file multiple times (which again is not a good programming technique).

Is there a way to extract the records in some other manner?

IQofaGerbil · Posted: Mon Jan 29, 2007 5:30 pm

Looks like you 'only' need approx 2000 cyls for each of T1 F1.

Can you see your storage pools to find out if there are disks with that kind of space available?
Depending on the storage management system in your shop, there might be few disks with 'big' (1200 cyls) amounts of contiguous space but lots with small/medium amounts.

Use trial and error. Why not try playing with your allocation numbers, e.g. (480,90) or (150,150)?
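When trying different SPACE values it helps to know the most a given primary/secondary pair can ever deliver. The sketch below is in Python for illustration and assumes a basic-format sequential data set limited to 16 extents on a single volume, with the primary quantity obtained in one extent and every secondary request succeeding. The function name is hypothetical, and real allocations may get less if the volume is fragmented.

```python
MAX_EXTENTS_PER_VOLUME = 16  # assumption: basic-format sequential data set


def max_cylinders_single_volume(primary: int, secondary: int) -> int:
    """Upper bound on cylinders for SPACE=(CYL,(primary,secondary)) on one volume.

    Assumes one extent for the primary quantity and that all of the
    remaining extents are granted at the full secondary quantity.
    """
    return primary + (MAX_EXTENTS_PER_VOLUME - 1) * secondary
```

Under these assumptions (480,90) tops out at 1,830 cylinders on one volume, while (150,150) can reach 2,400, so only the second could reach the roughly 2,000 cylinders estimated above without spilling onto another volume.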

dick scherrer · Posted: Mon Jan 29, 2007 8:29 pm

Hello,

Writing a program should NOT be ruled out just because the way the data is stored is not convenient for this process. If your data is "random", sort it before comparing the files. You do not need to keep the sorted data; just use it for the compare, then delete it.

Depending on just how your process works, you may have created a process that will run for many, many hours, if it ever completes with the full volume of data. If you need to open/read/close a file containing several million records and do this 60-70 thousand times, my guess is that the job will never be allowed to complete. If you multiply 65,000 by 5 million, you get 325,000,000,000 "reads".
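The difference between the two approaches is easy to quantify. This small Python sketch (illustrative only, with hypothetical function names) compares the number of record reads when the large file is re-read once per key against a single sorted pass over both files, using the round figures from the paragraph above.

```python
def reads_with_rescan(key_count: int, master_count: int) -> int:
    """Reads when the whole master file is re-read for every key."""
    return key_count * master_count


def reads_with_single_pass(key_count: int, master_count: int) -> int:
    """Reads when both sorted files are read once in a match/merge."""
    return key_count + master_count


rescan = reads_with_rescan(65_000, 5_000_000)
single = reads_with_single_pass(65_000, 5_000_000)
print(f"rescan: {rescan:,} reads, single pass: {single:,} reads")
```

The single pass does not include the cost of sorting the files first, but a sort is still a small number of passes over the data rather than tens of thousands.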

murmohk1 · Posted: Tue Jan 30, 2007 5:57 pm

Whether the file is sorted or not, I guess it occupies the same space. Since the required space was not available for my job (using the DFSORT technique), the job is failing with a space abend.

shuklas

You can use DATACLAS=DSIZE10

It can accommodate 10MB of data.

dick scherrer · Posted: Tue Jan 30, 2007 8:54 pm

Hello,

Please post your sort jcl, the control statements, and the abend info.

murmohk1 · Posted: Thu Feb 01, 2007 5:48 pm

I posted my JCL in the previous posts. Attached is the spool content (xdc).

Note: I ran the job again today to get the spool content. I changed only the SPACE parameter; everything else is as it was in the previous run.

dick scherrer · Posted: Thu Feb 01, 2007 9:05 pm

Hello,

Are you sure you posted the output that was created from the JCL you posted previously? Your posted JCL says it is STEP3. The jesysmsg.doc has no STEP3. This is where the abend occurred in that attached output.

murmohk1 · Posted: Fri Feb 02, 2007 2:14 pm

In my shop, volume allocation is done dynamically by the system (as told by the batch management people). So, I haven't used the UNIT parameter in the job.

Also, I had attached the job from JES output.

dick scherrer · Posted: Fri Feb 02, 2007 8:46 pm

Hello,

From here, I'd recommend one or more of the following:

1. Talk with the batch management people and find out how much space is available in the storage class your job dynamically uses.

2. Try a run with these JCL changes (from above): UNIT=(SYSDA,16) and SPACE=(CYL,(2500,500),RLSE), and see if that helps. If 2500 is too big, lower it, but if you cannot get 2500 initially, I suspect you will still have space issues. In this shop, our datasets often dynamically span packs (we do use the basic UNIT parameter), but when I ran into space problems, including the ",16" got around the abends.

3. Go ahead and write the COBOL code. After all, most of the folks here ARE programmers.

Good luck and keep us posted.