Need your thoughts about BSAM compression

sudarshan.srivathsav · New User Joined: 10 Jul 2012 Posts: 24 Location: USA

Hi,

I am trying to understand how BSAM does compression. Does IBM share internal documentation on how BSAM is used to compress datasets? How much sampling is done before the actual compression starts?

I tried to PRINT the compressed striped dataset using the ADRDSSU utility, and what I understood is that compression starts after a block of data has been sampled.

I am interested in learning how IBM does this compression internally, which control blocks are involved, and so on. I looked through a lot of documentation online but did not find anything useful.

Any input would be really helpful.

Thanks,
Sudarshan

Rohit Umarjikar · Posted: Thu Aug 14, 2014 2:59 am

Did you look at these links? They may help you:

www.redbooks.ibm.com/abstracts/gg244251.html
publib.boulder.ibm.com/infocenter/ieduasst/stgv1r0/index.jsp?topic=/com.ibm.iea.zos/zos/1.0/DFSMS/V1R0-SAM-Extended-Format-100808/player.html

sudarshan.srivathsav · New User Joined: 10 Jul 2012 Posts: 24 Location: USA

Thanks Rohit.

I have another question. BSAM initially samples the data to find the compression token, but after some time it samples again. What triggers BSAM to do another sampling?

Any thoughts?

Pete Wilson · Posted: Wed Aug 20, 2014 9:30 pm

When deciding whether a file is eligible for compression, between 8K and 64K of data is sampled for generic compression, and much more for tailored compression.

Also, I believe the file's initial primary allocation has to be at least 8MB (~10 cyls) for compression services to be invoked.

Definitely read the link Rohit posted...I just did and learnt a bit more myself.

I'm not sure what you meant about the DFDSS PRINT. Any process that opens a compressed file automatically decompresses the data being read. The format of the output would depend on the parameters you feed into the utility or the type of utility reading the file. Some of the data may be 'unprintable'.

sudarshan.srivathsav · New User Joined: 10 Jul 2012 Posts: 24 Location: USA

Thanks Pete, I read the document and learned a lot from it.

The utility I used to print the dataset did not decompress it. Here is the JCL snippet I used:

PeterHolland · Posted: Wed Aug 20, 2014 9:45 pm

sudarshan.srivathsav · New User Joined: 10 Jul 2012 Posts: 24 Location: USA

Peter, I agree with you, but someone who knows about it could share a general idea, so that I can better understand why more TCB/SRB time is spent on compressing files, why the compression ratio is bad, and so on.

I did not mean to steal anything from IBM!!

Pete Wilson · Posted: Mon Sep 01, 2014 8:33 pm

Looking at your JCL, Sudarshan, I see you have a volser coded, which implies it is not an SMS-managed dataset, so it is probably not compressed. I think it's still the case that datasets have to be SMS-managed to be compressed.

If a file IS compressed, the system automatically decompresses the file when you open it and you have no control over that. Can you post the full LISTCAT ENT output for the file, to show whether it has a compression token or not?
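For anyone following along, the LISTCAT request above could be run as a small IDCAMS batch step. This is only a sketch: the JOB card values and the dataset name HLQ.COMPRESS.DATASET are placeholders, not taken from this thread, and which compression-related fields appear in the output depends on your DFSMS level.

```jcl
//LISTC    JOB (ACCT),'LISTCAT',CLASS=A,MSGCLASS=X
//* JOB card values are placeholders; use your site's standards.
//* HLQ.COMPRESS.DATASET is a hypothetical dataset name.
//STEP1    EXEC PGM=IDCAMS
//SYSPRINT DD  SYSOUT=*
//SYSIN    DD  *
  LISTCAT ENTRIES('HLQ.COMPRESS.DATASET') ALL
/*
```

ALL asks IDCAMS to list every field it holds for the catalog entry, so that is the output to post.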

sudarshan.srivathsav · New User Joined: 10 Jul 2012 Posts: 24 Location: USA

Peter,

Please see below:

Pete Wilson · Posted: Fri Sep 12, 2014 10:39 pm

OK, so it is definitely SMS-managed and compressed.

I'm not sure how you know for certain that it is not decompressing when you print it with DFDSS. Have you verified that by trying to print the file with IDCAMS or SAS or something? If you browse the file through something like File-AID, does the data appear to be printable characters?
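A minimal sketch of the IDCAMS check, assuming a hypothetical dataset name HLQ.COMPRESS.DATASET (add your own JOB card). It prints the first 20 records in character format, so you can compare them with what the DFDSS PRINT produced.

```jcl
//* HLQ.COMPRESS.DATASET is a hypothetical dataset name.
//* COUNT(20) limits the listing to the first 20 records.
//STEP1    EXEC PGM=IDCAMS
//SYSPRINT DD  SYSOUT=*
//SYSIN    DD  *
  PRINT INDATASET('HLQ.COMPRESS.DATASET') -
        CHARACTER COUNT(20)
/*
```

If the records come out readable here but not in the DFDSS output, that suggests the two utilities are reading the data at different levels.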

I doubt you will easily be able to get any details on the internal workings of the compression process. Even if it is available, this sort of thing is usually what's termed 'Licensed Material' and would be restricted to the teams who need to know about it.

To be honest, there are probably more productive ways to spend your time. Normally I'd encourage research, but in this case I wouldn't.