Masking of non-blank characters


jerryte
Active User
Joined: 29 Oct 2010
Posts: 202
Location: Toronto, ON, Canada

Posted: Thu Nov 11, 2010 2:01 am

Is there a way to change all non-blank characters within a field to a single character (such as an 'X') to achieve data masking? So
"John Smith "
would become
"XXXX XXXXX "
Skolusu
Senior Member
Joined: 07 Dec 2007
Posts: 2205
Location: San Jose

Posted: Thu Nov 11, 2010 2:40 am

Jerryte,

You can use the following DFSORT JCL to get the desired results. It will replace A-Z and 0-9 with X.

Code:

//STEP0100 EXEC PGM=SORT                                           
//SYSOUT   DD SYSOUT=*                                             
//SORTIN   DD *                                                   
JOHN SMITH                                                         
//SORTOUT  DD SYSOUT=*                                             
//SYSIN    DD *                                                   
  SORT FIELDS=COPY                                                 
  INREC FINDREP=(IN=(C'A',C'B',C'C',C'D',C'E',C'F',C'G',C'H',C'I',
                     C'J',C'K',C'L',C'M',C'N',C'O',C'P',C'Q',C'R',
                     C'S',C'T',C'U',C'V',C'W',C'X',C'Y',C'Z',C'0',
                     C'1',C'2',C'3',C'4',C'5',C'6',C'7',C'8',C'9'),
                     OUT=C'X')                                     
                                                                   
//*

jerryte

Posted: Tue Nov 16, 2010 3:25 am

I was hoping for something more elegant, i.e.:
IF character <> ' ' THEN
replace with 'X'

I could expand the list to include lowercase letters and special characters.

Has anyone tried to use an ALTSEQ statement for translating characters?

dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

Posted: Tue Nov 16, 2010 3:33 am

Hello,

You might consider IFTHEN and OVERLAY. . .

Quote:
Has anyone tried to use an ALTSEQ statement for translating characters?
Yes, there are multiple examples in the forum.

Arun Raj
Moderator
Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

Posted: Tue Nov 16, 2010 10:20 am

As Dick already said, you might need a series of IFTHEN clauses, one per position in the field, each checking for a non-blank character and replacing it with 'X', rather than ALTSEQ.
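
For illustration, here is a minimal sketch of what such a series of IFTHEN clauses might look like. It assumes a hypothetical 3-byte field in positions 11-13 (the positions are an assumption, not something stated so far in this thread). A real field would need one clause per byte, so the statement grows with the field length.

```jcl
//SYSIN    DD *
  OPTION COPY
* Assumed field: positions 11-13. Each clause tests one byte.
* HIT=NEXT lets a record continue to the next clause after a match.
  INREC IFTHEN=(WHEN=(11,1,CH,NE,C' '),OVERLAY=(11:C'X'),HIT=NEXT),
        IFTHEN=(WHEN=(12,1,CH,NE,C' '),OVERLAY=(12:C'X'),HIT=NEXT),
        IFTHEN=(WHEN=(13,1,CH,NE,C' '),OVERLAY=(13:C'X'))
/*
```

Because each test is NE C' ', any non-blank byte is masked without having to list the possible values.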

Skolusu

Posted: Tue Nov 16, 2010 11:16 pm

dick scherrer wrote:
Hello,
You might consider IFTHEN and OVERLAY. . .


Isn't that overkill if the LRECL exceeds 80? Even 80 is pushing it, since you need to validate 1 byte at a time.

ALTSEQ and FINDREP are better choices than multiple IFTHEN clauses.

dick scherrer

Posted: Wed Nov 17, 2010 12:03 am

Yup, IFTHEN/OVERLAY would possibly be overkill, but for whatever reason the FINDREP suggestion was rejected by the TS. . .

So I posted this in case the following request might be met:
Quote:
I was hoping for something more elegant, i.e.:
IF character <> ' ' THEN
replace with 'X'

d

jerryte

Posted: Wed Nov 17, 2010 2:51 am

I used the statements below to do the translation. It works.

DISCLAIMER: I might have missed a few characters in the translation, so anyone who uses the statements below should check them first.

Code:

** TRANSLATE MOST PRINTABLE CHARACTERS TO '?'                   
 ALTSEQ CODE=(4A6F,4B6F,4D6F,4E6F,4F6F,5A6F,5B6F,5C6F,5C6F,     
    5C6F,5D6F,5E6F,506F,6B6F,6C6F,6D6F,606F,616F,7A6F,7B6F,7C6F,
    7D6F,7E6F,7F6F,796F,816F,826F,836F,846F,856F,866F,876F,886F,
    896F,916F,926F,936F,946F,956F,966F,976F,986F,996F,A16F,A26F,
    A36F,A46F,A56F,A66F,A76F,A86F,A96F,B06F,C16F,C26F,C36F,C46F,
    C56F,C66F,C76F,C86F,C96F,D16F,D26F,D36F,D46F,D56F,D66F,D76F,
    D86F,D96F,E26F,E36F,E46F,E56F,E66F,E76F,E86F,E96F,F06F,F16F,
    F26F,F36F,F46F,F56F,F66F,F76F,F86F,F96F)                   
 SORT FIELDS=COPY
 OUTREC FIELDS=(11,30,TRAN=ALTSEQ)


To make the ALTSEQ list, I manually typed all the printable characters and then used a REXX edit macro to convert them into hex codes.

It would be interesting to run a performance test against a large file to see whether ALTSEQ is faster or slower than FINDREP.
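
One thing to watch with the statements above: OUTREC FIELDS=(11,30,TRAN=ALTSEQ) builds an output record that contains only the 30 translated bytes. If the rest of the record should be kept, a sketch along these lines might work, assuming fixed-length records and a field of 30 bytes starting at position 11. The ALTSEQ table is deliberately cut down to three entries for brevity and is not a usable masking table.

```jcl
* Shortened table for illustration: maps only A, B and C to '?'
 ALTSEQ CODE=(C16F,C26F,C36F)
 SORT FIELDS=COPY
* OVERLAY translates bytes 11-40 in place; other bytes are unchanged
 OUTREC OVERLAY=(11:11,30,TRAN=ALTSEQ)
```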

Skolusu

Posted: Wed Nov 17, 2010 5:12 am

jerryte,

It seems you missed a few, and you also have some repetitions (5C is repeated three times).

I have come up with a total of 97 characters. They are

1. Capital A - Z = 26
2. Small a - z = 26
3. numerics 0 - 9 = 10
4. Spl char (~`!@#$%¢&*()_-+={}¬¦|/\:;'"?><,.[] space) = 35

Here are the equivalent control cards

Code:

//SYSIN    DD *                                                   
  SORT FIELDS=COPY                                                 
  INREC FINDREP=(STARTPOS=11,ENDPOS=30,                           
   IN=(C'A',C'B',C'C',C'D',C'E',C'F',C'G',C'H',C'I',       $ A - I
       C'J',C'K',C'L',C'M',C'N',C'O',C'P',C'Q',C'R',       $ J - R
       C'S',C'T',C'U',C'V',C'W',C'X',C'Y',C'Z',            $ S - Z
                                                                   
       C'a',C'b',C'c',C'd',C'e',C'f',C'g',C'h',C'i',       $ a - i
       C'j',C'k',C'l',C'm',C'n',C'o',C'p',C'q',C'r',       $ j - r
       C's',C't',C'u',C'v',C'w',C'x',C'y',C'z',            $ s - z
                                                                   
       C'~',C'`',C'!',C'@',C'#',C'$',C'%',C'¢',C'&',       $ SPL-1
       C'*',C'(',C')',C'_',C'-',C'+',C'=',C'{',C'}',       $ SPL-2
       C'¬',C'¦',C'|',C'\',C':',C';',C'"',C'?',C'/',       $ SPL-3
       C'>',C'<',C'.',C',',C'[',C']',C' ',C'''',           $ SPL-4
                                                                   
       C'0',C'1',C'2',C'3',C'4',C'5',C'6',C'7',C'8',C'9'), $ 0 - 9
       OUT=C'X')                                                   
                                                                   
//*


IMHO it is easier to maintain and understand if the values are readable characters instead of hex codes.

Arun Raj

Posted: Wed Nov 17, 2010 12:01 pm

Kolusu,

I thought that if we use IFTHEN to check for NE C' ', there isn't a chance of any non-blank value getting missed. The other way requires knowledge of all possible non-blank values that might come in the input.

dick scherrer

Posted: Wed Nov 17, 2010 8:06 pm

Hi Arun,

Looks like a trade-off is whether to specify all of the individual values or all of the data positions. . .

d

Skolusu

Posted: Wed Nov 17, 2010 10:47 pm

Arun Raj wrote:
Kolusu,

I thought that if we use IFTHEN to check for NE C' ', there isn't a chance of any non-blank value getting missed. The other way requires knowledge of all possible non-blank values that might come in the input.


Agreed, but the overhead of validating every single byte using IFTHEN with HIT=NEXT is not justifiable compared to a simple translation of the specified characters.

jerryte

Posted: Wed Dec 15, 2010 10:39 pm

Skolusu wrote:
jerryte,

It seems you missed a few, and you also have some repetitions (5C is repeated three times).

I have come up with a total of 97 characters. They are

1. Capital A - Z = 26
2. Small a - z = 26
3. numerics 0 - 9 = 10
4. Spl char (~`!@#$%¢&*()_-+={}¬¦|/\:;'"?><,.[] space) = 35


Thanks for the list. I modified it to remove the space character, since the objective was to mask all non-blank characters.
Code:

//SYSIN    DD *                                                   
  SORT FIELDS=COPY                                                 
  INREC FINDREP=(STARTPOS=11,ENDPOS=30,                           
   IN=(C'A',C'B',C'C',C'D',C'E',C'F',C'G',C'H',C'I',       $ A - I
       C'J',C'K',C'L',C'M',C'N',C'O',C'P',C'Q',C'R',       $ J - R
       C'S',C'T',C'U',C'V',C'W',C'X',C'Y',C'Z',            $ S - Z
                                                                   
       C'a',C'b',C'c',C'd',C'e',C'f',C'g',C'h',C'i',       $ a - i
       C'j',C'k',C'l',C'm',C'n',C'o',C'p',C'q',C'r',       $ j - r
       C's',C't',C'u',C'v',C'w',C'x',C'y',C'z',            $ s - z
                                                                   
       C'~',C'`',C'!',C'@',C'#',C'$',C'%',C'¢',C'&',       $ SPL-1
       C'*',C'(',C')',C'_',C'-',C'+',C'=',C'{',C'}',       $ SPL-2
       C'¬',C'¦',C'|',C'\',C':',C';',C'"',C'?',C'/',       $ SPL-3
       C'>',C'<',C'.',C',',C'[',C']',C'''',           $ SPL-4
                                                                   
       C'0',C'1',C'2',C'3',C'4',C'5',C'6',C'7',C'8',C'9'), $ 0 - 9
       OUT=C'X')                                                   
                                                                   
//

All times are GMT + 6 Hours