Is Concatenation of files essential?

swapnadeep.ganguly · Active User Joined: 21 Mar 2007 Posts: 203 Location: India

Hi,

I want to search for a particular expression in a group of 10 files. Is it essential to concatenate all the files before proceeding? The occurrence is the same in all the files. I am using sort to find the expression.

Garry Carroll · Posted: Fri Jun 27, 2008 4:26 pm

Concatenate to process in one step or else process each file separately, I'd think. Also, depends on what you mean by

enrico-sorichetti · Posted: Fri Jun 27, 2008 4:28 pm

the main issue here is that the concatenation will be seen as a single file
and you will lose track of which file the search argument was found in
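
To illustrate the point about losing track of the source file, here is a minimal Python sketch (not mainframe code) that scans each file separately and records which file every hit came from. The function name `search_files`, the file names, and the search string are hypothetical and only serve the illustration.

```python
from pathlib import Path


def search_files(paths, needle):
    """Scan each file on its own and return (file name, line number, line) per hit."""
    hits = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                if needle in line:
                    # The file name is exactly what a plain concatenation would lose.
                    hits.append((Path(path).name, line_number, line.rstrip("\n")))
    return hits
```

Called as `search_files(["file01.txt", "file02.txt"], "KEY")`, it returns one tuple per matching line, so every selected record can still be traced back to its input file.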

swapnadeep.ganguly · Active User Joined: 21 Mar 2007 Posts: 203 Location: India

So, Enrico, you suggest that I search the individual files instead of concatenating them?

enrico-sorichetti · Posted: Fri Jun 27, 2008 4:35 pm

all depends on what the result of the search should be

I would go for
single file to understand data occurrence patterns
concatenated files to extract records for further processing

Craq Giegerich · Posted: Fri Jun 27, 2008 5:05 pm

Does the search return a large number of records? Are you sorting the input file before selecting the records?

swapnadeep.ganguly · Active User Joined: 21 Mar 2007 Posts: 203 Location: India

The input files are sorted, and the output file has only a small number of records compared with the number of records in the input files.

Phrzby Phil · Posted: Fri Jun 27, 2008 5:38 pm

SAS note: SAS provides a simple way of knowing which file (by position number in the list) of a concatenation you are in:

EOV=variable
names a variable that SAS sets to 1 when the first record in a file in a series of concatenated files is read. The variable is set only after SAS encounters the next file. Like automatic variables, the EOV= variable is not written to the data set.
Tip: Reset the EOV= variable back to 0 after SAS encounters each boundary.

dick scherrer · Posted: Fri Jun 27, 2008 8:27 pm

Hello,

gcicchet · Posted: Sat Jun 28, 2008 6:38 am

Hi,
You can still sort the file, but as long as you use INCLUDE/OMIT, the sort will apply only to the selected records.

Gerry

swapnadeep.ganguly · Active User Joined: 21 Mar 2007 Posts: 203 Location: India

Actually, I was trying to merge the files using SYNCSORT, but the job abended. The SYSOUT message that I received is as follows:

swapnadeep.ganguly · Active User Joined: 21 Mar 2007 Posts: 203 Location: India

The job that I am using is as follows:

Garry Carroll · Posted: Thu Jul 03, 2008 5:08 pm

This problem is something you should take up with your storage administrators. You are failing to get sufficient space allocated via SMS.

Garry.

Robert Sample · Posted: Thu Jul 03, 2008 5:09 pm

Typical 3390 mod 3 disk packs have 3335 cylinders of space. Asking for 4200 cylinders requires a mod 9 or higher. From the output messages, the storage pool you're asking for space from either (1) has no mod 9 devices defined, or (2) has no device with 4200 cylinders available in 5 extents or less.

Cut the space allocation to 3335 cylinders or less, and possibly use more volumes to get the same amount of space.
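
The arithmetic behind that suggestion can be sketched in Python. The 4200-cylinder request and the 3335-cylinder per-volume figure are taken from the post above; the function names are hypothetical, and the sketch ignores extent limits and fragmented free space, which the storage administrators would have to confirm.

```python
import math


def volumes_needed(requested_cylinders, max_cylinders_per_volume):
    """Smallest number of volumes that can hold the request at the given per-volume cap."""
    return math.ceil(requested_cylinders / max_cylinders_per_volume)


def cylinders_per_volume(requested_cylinders, volume_count):
    """Cylinders to request on each volume when the total is spread evenly."""
    return math.ceil(requested_cylinders / volume_count)


print(volumes_needed(4200, 3335))     # 2
print(cylinders_per_volume(4200, 2))  # 2100
```

Under these assumptions, two volumes of about 2100 cylinders each cover the same total while staying well under the per-volume limit.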

dick scherrer · Posted: Thu Jul 03, 2008 9:07 pm

Hello,

Are there multiple questions in this topic?

swapnadeep.ganguly · Active User Joined: 21 Mar 2007 Posts: 203 Location: India

Hi Dick,

This is a single requirement.

The input files (30 in number) are on tape. Each input file has around 100 million records. I need to pick out 1800 particular records from these input files.

For this reason previously I was trying to concatenate the files.

Can you guide me through a good approach?

dick scherrer · Posted: Fri Jul 04, 2008 9:45 am

Hello,

Will there be 1800 total records selected or will there be 1800 records per input tape selected?

1800 5k records is not a terribly high volume. . . You still might talk with your storage management people and ask what the proper device and/or DATACLAS is for your output files. They will best know the storage configuration on your system.

How long does it take to read one of these input tapes? Without knowing much about your data and your environment, my first approach would probably be to split the task into 5 or even 10 jobs and create a smaller selected output from each. Once the separate jobs successfully complete, the selected files could be copied back to one tape file or used in place on DASD until the requirement to use the data is met, and then backed up.
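
As a rough illustration of the split, the following Python sketch divides a list of input files into a chosen number of job batches of near-equal size. The function name `split_into_jobs` and the `TAPE01`-style file names are hypothetical; the figure of 30 input files comes from earlier in the thread.

```python
def split_into_jobs(file_names, job_count):
    """Divide file_names into job_count contiguous batches of near-equal size."""
    base, extra = divmod(len(file_names), job_count)
    batches = []
    start = 0
    for job_index in range(job_count):
        # The first `extra` jobs each take one additional file.
        size = base + (1 if job_index < extra else 0)
        batches.append(file_names[start:start + size])
        start += size
    return batches


tapes = [f"TAPE{n:02d}" for n in range(1, 31)]  # 30 hypothetical input files
for job_number, batch in enumerate(split_into_jobs(tapes, 5), start=1):
    print(job_number, batch)
```

With 30 files, 5 jobs get 6 files each and 10 jobs get 3 files each; each job would then write its own small selected output.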

Again, coordinating with the storage management people should help.

swapnadeep.ganguly · Active User Joined: 21 Mar 2007 Posts: 203 Location: India

Thank you Dick for your suggestion.

Is there any method by which we can restore the tape datasets to DASD?

gcicchet · Posted: Fri Jul 04, 2008 10:46 am

Hi,
Why would you need to copy the tape datasets to disk? You would still need to read all of the data, and you would use up lots of disk space.

Gerry

swapnadeep.ganguly · Active User Joined: 21 Mar 2007 Posts: 203 Location: India

But can we sort that huge number of input records residing on tape?

The job uses a huge amount of CPU time and in the end abends with SE37.

gcicchet · Posted: Fri Jul 04, 2008 11:05 am

Hi,

It makes no difference whether you are sorting from disk or tape.

Which file is the SE37 on?

Why are you sorting the file if you are extracting data?

Extract the data and then sort it. From my understanding, you are not selecting many records.
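
The idea of extracting first and sorting afterwards can be sketched in Python as follows. This is only an illustration of the ordering of the two steps, not the sort utility's own processing; the function name `extract_then_sort`, the record layout (a 4-character key at the start of each record), and the sample data are all hypothetical.

```python
def extract_then_sort(records, wanted, key):
    """Keep only the wanted records, then sort just that small selection."""
    selected = [record for record in records if wanted(record)]
    # Only the selected records are sorted, not the whole input.
    return sorted(selected, key=key)


records = ["0003 C", "0001 A", "0002 B", "0009 Z"]
wanted_keys = {"0001", "0003"}
result = extract_then_sort(
    records,
    wanted=lambda record: record[:4] in wanted_keys,
    key=lambda record: record[:4],
)
print(result)  # ['0001 A', '0003 C']
```

When only a few thousand records survive the selection, the sort works on that small set, so the sort work space no longer scales with the size of the input.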

Gerry

swapnadeep.ganguly · Active User Joined: 21 Mar 2007 Posts: 203 Location: India

The run JCL that I am using is as follows:

swapnadeep.ganguly · Active User Joined: 21 Mar 2007 Posts: 203 Location: India

I am getting the SE37 abend on the SORTOUT file.

gcicchet · Posted: Fri Jul 04, 2008 12:01 pm

Hi,

As mentioned earlier, run it one file at a time and merge at the end.
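
The final merge step can be sketched in Python, assuming each per-file run has already produced a small selection sorted on the same key. The function name `merge_selected`, the record layout, and the sample data are hypothetical.

```python
import heapq


def merge_selected(per_file_results, key):
    """Merge several already-sorted selections into one sorted list."""
    # heapq.merge relies on each input being sorted on `key` already.
    return list(heapq.merge(*per_file_results, key=key))


file1_selected = ["0001 A", "0007 G"]  # sorted output of the first run
file2_selected = ["0003 C", "0004 D"]  # sorted output of the second run
merged = merge_selected([file1_selected, file2_selected], key=lambda r: r[:4])
print(merged)  # ['0001 A', '0003 C', '0004 D', '0007 G']
```

Because each selection is tiny compared with its input file, the merge at the end is cheap regardless of how many input files were processed.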

Output can be written to tape if space is an issue.

Can you also post the joblog output, including messages?

Gerry