I have a requirement to write a user exit (in Optim) to scramble data. The options are either COBOL or HLASM.
The target data varies from char 20 though to varchar 4000.
The current solution (using LUA running under Optim) is not viable long term, and not just because of the inefficiently of the interpretive environment.
For a simple example, here are my requirements for a generic "text" column scramble function:
Upper alpha => upper alpha
Lower alpha => lower alpha
Number => number
Other => (not scrambled)
In my solution (below), the pseudo-random element is driven by a hash on record instance id and the contents of the db2 column.
so, with input of (iid = 12345, inText = "Jonathan 1234567 .;#') , I would calculate a hash as: (iid + hash(inText)) mod 256 which becomes a pointer to an array of random bytes.
Looking at what I've coded, it strikes me - and hence my reason for dropping COBOL into this forum - that HLASM could offer a more elegant and efficient solution.
Does anyone have any suggestions or code snippets that could point me in the right direction. Potential for use of SIMD instructions or is that overkill?
here's my first cut solution in COBOL - branchless style. non-portable, obviously.
Code:
IDENTIFICATION DIVISION.
PROGRAM-ID. SCRAM.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-STATIC.
05 WS-HASH-TBL.
10 FILLER PIC X(16) VALUE
X'00000000000000010000000200000003'.
10 FILLER PIC X(16) VALUE
X'00000004000000050000000600000007'.
10 FILLER PIC X(16) VALUE
X'00000008000000090000000A0000000B'.
10 FILLER PIC X(16) VALUE
X'0000000C0000000D0000000E0000000F'.
10 FILLER PIC X(16) VALUE
X'00000010000000110000001200000013'.
10 FILLER PIC X(16) VALUE
X'00000014000000150000001600000017'.
10 FILLER PIC X(16) VALUE
X'00000018000000190000001A0000001B'.
10 FILLER PIC X(16) VALUE
X'0000001C0000001D0000001E0000001F'.
10 FILLER PIC X(16) VALUE
X'00000020000000210000002200000023'.
10 FILLER PIC X(16) VALUE
X'00000024000000250000002600000027'.
10 FILLER PIC X(16) VALUE
X'00000028000000290000002A0000002B'.
10 FILLER PIC X(16) VALUE
X'0000002C0000002D0000002E0000002F'.
10 FILLER PIC X(16) VALUE
X'00000030000000310000003200000033'.
10 FILLER PIC X(16) VALUE
X'00000034000000350000003600000037'.
10 FILLER PIC X(16) VALUE
X'00000038000000390000003A0000003B'.
10 FILLER PIC X(16) VALUE
X'0000003C0000003D0000003E0000003F'.
10 FILLER PIC X(16) VALUE
X'00000040000000410000004200000043'.
10 FILLER PIC X(16) VALUE
X'00000044000000450000004600000047'.
10 FILLER PIC X(16) VALUE
X'00000048000000490000004A0000004B'.
10 FILLER PIC X(16) VALUE
X'0000004C0000004D0000004E0000004F'.
10 FILLER PIC X(16) VALUE
X'00000050000000510000005200000053'.
10 FILLER PIC X(16) VALUE
X'00000054000000550000005600000057'.
10 FILLER PIC X(16) VALUE
X'00000058000000590000005A0000005B'.
10 FILLER PIC X(16) VALUE
X'0000005C0000005D0000005E0000005F'.
10 FILLER PIC X(16) VALUE
X'00000060000000610000006200000063'.
10 FILLER PIC X(16) VALUE
X'00000064000000650000006600000067'.
10 FILLER PIC X(16) VALUE
X'00000068000000690000006A0000006B'.
10 FILLER PIC X(16) VALUE
X'0000006C0000006D0000006E0000006F'.
10 FILLER PIC X(16) VALUE
X'00000070000000710000007200000073'.
10 FILLER PIC X(16) VALUE
X'00000074000000750000007600000077'.
10 FILLER PIC X(16) VALUE
X'00000078000000790000007A0000007B'.
10 FILLER PIC X(16) VALUE
X'0000007C0000007D0000007E0000007F'.
10 FILLER PIC X(16) VALUE
X'00000080000100810001008200010091'.
10 FILLER PIC X(16) VALUE
X'000100910001009200010092000100A2'.
10 FILLER PIC X(16) VALUE
X'000100A2000100A20000008A0000008B'.
10 FILLER PIC X(16) VALUE
X'0000008C0000008D0000008E0000008F'.
10 FILLER PIC X(16) VALUE
X'00000090000100910001009200010081'.
10 FILLER PIC X(16) VALUE
X'000100810001008200010082000100A2'.
10 FILLER PIC X(16) VALUE
X'000100A2000100A20000009A0000009B'.
10 FILLER PIC X(16) VALUE
X'0000009C0000009D0000009E0000009F'.
10 FILLER PIC X(16) VALUE
X'000000A0000000A1000100A2000100A2'.
10 FILLER PIC X(16) VALUE
X'00010081000100810001008200010082'.
10 FILLER PIC X(16) VALUE
X'000100910001009100010092000000AB'.
10 FILLER PIC X(16) VALUE
X'000000AC000000AD000000AE000000AF'.
10 FILLER PIC X(16) VALUE
X'000000B0000000B1000000B2000000B3'.
10 FILLER PIC X(16) VALUE
X'000000B4000000B5000000B6000000B7'.
10 FILLER PIC X(16) VALUE
X'000000B8000000B9000000BA000000BB'.
10 FILLER PIC X(16) VALUE
X'000000BC000000BD000000BE000000BF'.
10 FILLER PIC X(16) VALUE
X'000000C0000000C1000000C2000000C3'.
10 FILLER PIC X(16) VALUE
X'000000C4000000C5000000C6000000C7'.
10 FILLER PIC X(16) VALUE
X'000000C8000000C9000000CA000000CB'.
10 FILLER PIC X(16) VALUE
X'000000CC000000CD000000CE000000CF'.
10 FILLER PIC X(16) VALUE
X'000000D0000000D1000000D2000000D3'.
10 FILLER PIC X(16) VALUE
X'000000D4000000D5000000D6000000D7'.
10 FILLER PIC X(16) VALUE
X'000000D8000000D9000000DA000000DB'.
10 FILLER PIC X(16) VALUE
X'000000DC000000DD000000DE000000DF'.
10 FILLER PIC X(16) VALUE
X'000000E0000000E1000000E2000000E3'.
10 FILLER PIC X(16) VALUE
X'000000E4000000E5000000E6000000E7'.
10 FILLER PIC X(16) VALUE
X'000000E8000000E9000000EA000000EB'.
10 FILLER PIC X(16) VALUE
X'000000EC000000ED000000EE000000EF'.
10 FILLER PIC X(16) VALUE
X'000100F0000100F1000100F2000100F0'.
10 FILLER PIC X(16) VALUE
X'000100F1000100F2000100F0000100F1'.
10 FILLER PIC X(16) VALUE
X'000100F2000100F0000000FA000000FB'.
10 FILLER PIC X(16) VALUE
X'000000FC000000FD000000FE000000FF'.
05 WS-HASH-TBL-X REDEFINES WS-HASH-TBL OCCURS 256 TIMES
PIC 9(05) COMP-5.
05 WS-RANDOM-TBL.
10 FILLER PIC X(16) VALUE
X'009B008500B6002100E700110088008B'.
10 FILLER PIC X(16) VALUE
X'006600EC000600D9006F009400F1004E'.
10 FILLER PIC X(16) VALUE
X'008A003A002300F90015009300F5001F'.
10 FILLER PIC X(16) VALUE
X'005B00480040003800E900EB00940053'.
10 FILLER PIC X(16) VALUE
X'0076004400F700F800AC0099008200EE'.
10 FILLER PIC X(16) VALUE
X'00A700F0005C001D005000A30057001F'.
10 FILLER PIC X(16) VALUE
X'00870022005500A900FC00DA004D008E'.
10 FILLER PIC X(16) VALUE
X'00CB006000A700E9008A005600A500AB'.
10 FILLER PIC X(16) VALUE
X'00DE008100BD00D5001B005B00FA0023'.
10 FILLER PIC X(16) VALUE
X'0045002F002600830093003E00EA00B1'.
10 FILLER PIC X(16) VALUE
X'00B600B50071001B008D0084000F007C'.
10 FILLER PIC X(16) VALUE
X'0073001600270054004B0087007D001D'.
10 FILLER PIC X(16) VALUE
X'00DA001800CE008700D400A7004E0096'.
10 FILLER PIC X(16) VALUE
X'0080004A00090088006B006B00FC00A6'.
10 FILLER PIC X(16) VALUE
X'00CF00FC0028008A00CB008B00630005'.
10 FILLER PIC X(16) VALUE
X'00E2009200F8003E00AD007A0053009C'.
10 FILLER PIC X(16) VALUE
X'00CF0081008A000F00E0000800F800D0'.
10 FILLER PIC X(16) VALUE
X'005600B70020000F000900D3000A0045'.
10 FILLER PIC X(16) VALUE
X'005C00D100E300D9005500F00027004E'.
10 FILLER PIC X(16) VALUE
X'009D004D006400F6009700AF009100C4'.
10 FILLER PIC X(16) VALUE
X'00D800D700AA001100C800C4007A00CD'.
10 FILLER PIC X(16) VALUE
X'008900D7003900740073005600030008'.
10 FILLER PIC X(16) VALUE
X'007C007B00DE00A00083009200D80082'.
10 FILLER PIC X(16) VALUE
X'00AC006F00E600A4006E0059002F0004'.
10 FILLER PIC X(16) VALUE
X'004C009A00CA00DD00E30003004C0003'.
10 FILLER PIC X(16) VALUE
X'00D400CA009D00EC00A400D2006700F8'.
10 FILLER PIC X(16) VALUE
X'003A00B100B0007700F000ED00930060'.
10 FILLER PIC X(16) VALUE
X'00AB002400DE006D002C00A900B30067'.
10 FILLER PIC X(16) VALUE
X'003500EA00AA005B001800E700E60055'.
10 FILLER PIC X(16) VALUE
X'00530092001200F3001C006F00FD00DB'.
10 FILLER PIC X(16) VALUE
X'0036000500C0004C00A600AC001900A3'.
10 FILLER PIC X(16) VALUE
X'0028003A00770072008A007D00510099'.
05 WS-RANDOM-TBL-X REDEFINES WS-RANDOM-TBL OCCURS 256 TIMES
PIC 9(03) COMP-5.
01 WS-WORKING COMP-5.
05 WS-IX PIC 9(03).
05 WS-IN-FNAME-LEN PIC 9(03).
05 WS-IN-FNAME-HASH PIC 9(03).
05 WS-IID-COL-HASH PIC 9(03).
05 WS-BYTE-TO-MAP PIC 9(03).
05 WS-BYTE-MAPPED PIC 9(03).
05 WS-RAND-0-THRU-7 PIC 9(01).
05 WS-REMAP-FW PIC 9(05).
05 WS-REMAP-HL REDEFINES WS-REMAP-FW.
10 WS-REMAP-H PIC 9(03).
10 WS-REMAP-L PIC 9(03).
01 WS-IMPORT.
05 WS-IID PIC 9(16).
05 WS-IN-FNAME PIC X(20).
01 WS-EXPORT.
05 WS-OUT-FNAME PIC X(20).
PROCEDURE DIVISION.
INITIALIZE WS-EXPORT, WS-WORKING.
MOVE 'Jonathan01#####77777' TO WS-IN-FNAME.
MOVE 20 TO WS-IN-FNAME-LEN.
MOVE 12345 TO WS-IID.
* > Calculate a hash based on Record Identifier X Input String
PERFORM VARYING WS-IX FROM 1 BY 1 UNTIL WS-IX >
WS-IN-FNAME-LEN
COMPUTE WS-IN-FNAME-HASH =
FUNCTION MOD(WS-IN-FNAME-HASH + FUNCTION
ORD(WS-IN-FNAME(WS-IX:1)), 256)
END-PERFORM
COMPUTE WS-IID-COL-HASH = 1 + FUNCTION
MOD(WS-IID + WS-IN-FNAME-HASH, 256)
* > Randomise A-Z,a-z and 0-9 as per WS-HASH-TBL x WS-RANDOM-TBL
MOVE 0 TO WS-IX
PERFORM UNTIL WS-IX = WS-IN-FNAME-LEN
COMPUTE WS-IX = WS-IX + 1
COMPUTE WS-BYTE-TO-MAP = FUNCTION
ORD(WS-IN-FNAME(WS-IX:1))
MOVE WS-HASH-TBL-X(WS-BYTE-TO-MAP) TO WS-REMAP-FW
COMPUTE WS-RAND-0-THRU-7 = FUNCTION
MOD(WS-RANDOM-TBL-X(WS-IID-COL-HASH), 8)
COMPUTE WS-BYTE-MAPPED = WS-REMAP-L + WS-REMAP-H *
WS-RAND-0-THRU-7
MOVE FUNCTION CHAR(WS-BYTE-MAPPED + 1) TO
WS-OUT-FNAME(WS-IX:1)
COMPUTE WS-IID-COL-HASH = 1 + FUNCTION
MOD(WS-IID-COL-HASH + 1, 256)
END-PERFORM
STOP RUN.
When using HLASM, there are useful instructions to make characters replacement in the whole string as defined via a translate table:
TR STRING,TRANTAB
(Translate) - replaces characters in a string up to 256 bytes maximum. In case of a longer string the TR instruction needs to be used in a loop.
TRE RSTRING,RTRANTAB
(Translate Extended) - replaces characters in a string up to 2GB maximum.
Before using any of those instruction, you may need to modify a 256-bytes translation table of characters based on your own hash requirements.
This is a general approach which can be used. Coding a real example with hashing etc. may take some time.
I did think of using INSPECT..REPLACING to scramble the data, but then I'd be forced to reduce the target range to make it secure from a statistical attack, or possibly repeat that command with different parameters ...
[Much of the data is long varchar. If a repeated token like 'xyz' appears, one might guess a source value of 'the' and be well on the way to decoding the whole string]
So, yes, I could map [A-Z] => [A-M] and that would make it stronger. But then I have the problem that two distinct tokens (words) are more likely to map into a common value. I want a kind of isomorphism in my mapping function, such that (1\) the 'distinct' property is preserved, and (2\) token (word) lengths are preserved.
my solution above does not actually enforce (1\), but does a good enough job.
I should have abstracted the problem into a more general requirement. What I'm really after is a way to parallelize a set of linear functions, for example, with:
I should have abstracted the problem into a more general requirement. What I'm really after is a way to parallelize a set of linear functions, for example, with:
Joined: 01 Sep 2006 Posts: 2593 Location: Silicon Valley
Code:
R(i) = L(i) + [H(i) * X(i)]
If that is what you want to do, it seems like you just need to set up your three arrays, then load individual bytes into registers and finally to perform the required math. And do the same for every byte of your string.
Try Load Logical Character instruction to load a byte into a register:
I haven't been able to find the latest version of the POP : SA22-7832-13.
Looking though the version I have (dated Dec 2000) I couldn't find any vector operations that would enable the kind of internal parallelism that I'm looking for.
For the record, I get good performance from my existing solution, so this is perhaps something of an academic exercise.
If someone can drop a working link to SA22-7832-13 I would be very grateful !
Usually I am an opponent to any rocket-science-like methods, unless it is really needed (very rarely that happens).
Though I did not receive an answer to any of my questions, I would do it in a much more simple way.
1. Prepare several scrambled lists of possible characters translation rules.
Do it for the first group (e.g. uppercase letters)
Code:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
- split randomly into 3 parts
ABCDEF GHIJKLM NOPQRSTUVWXYZ
- reorder parts in reverse order
NOPQRSTUVWXYZ GHIJKLM ABCDEF
- concatenate 3 parts again
NOPQRSTUVWXYZGHIJKLMABCDEF - repeat as many times as needed
WXYZABCDEFJKLMNOPQRSTUVGHI
UVGHIKLMNOPQRSTWXYZABCDEFJ
ZABCDEFJNOPQRSTWXYUVGHIKLM
IKLMEFJNOPQRSTWXYUVGHZABCD
. . . . . . . etc . . . . . . .
Do the same for another group (e.g. lowercase letters)
2. Create translation tables depending on the used tool.
Based on the used translation mechanism, those tables can be pairs of matching character strings (such as used by REXX TRANSLATE() function), or several 256-bytes translation tables (such as used by Assembler TR/TRE instructions), or whatever else.
3. Select limited portions of input text to be scrambled (with the length from 1 character to, let’s say, 100 characters; optionally varying random size of each portion can be used).
The shorter is each part of the text, the better it is protected against the statistical breaking methods. Ideally it can be 1 character each, though it takes more resources to scramble large text. Randomly varying size of text part should also work fine, and faster than 1-character parts.
4. Translate each portion of text using a particular translation table. For every following portion use the next one of created translation tables, in a loop.
One part of text is scrambled using one translation table. For each following part of text just switch to the next translation table of the once prepared group of tables.
here's a slightly better solution to what I first posted. 'Better' in the sense that it illustrates where vector operations could be exploited, not in the sense of being particularly good COBOL.
When I get some time later this week I'll get some performance stats, and then compare it with another version that replaces OBF-PARA with a call to an assembler module.
MOVE 1234567 TO WS-IN-IID.
MOVE 'Jonathan123 &*.1abcdeABCDE 1234567$%^' TO WS-IN-TEXT.
MOVE 37 TO WS-IN-TEXT-LEN, WS-OUT-TEXT-LEN.
DISPLAY WS-IN-TEXT.
INSPECT WS-IN-TEXT CONVERTING
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
- "0123456789"
TO
"AAAAAAAAAAAAAAAAAAAAAAAAAAaaaaaaaaaaaaaaaaaaaaaaaaaa
- "0000000000"
PERFORM VARYING WS-IN-TEXT-IX FROM 1 BY 1 UNTIL
WS-IN-TEXT-IX > WS-IN-TEXT-LEN
COMPUTE WS-IN-TEXT-HASH =
FUNCTION MOD(WS-IN-TEXT-HASH +
FUNCTION ORD(WS-IN-TEXT(WS-IN-TEXT-IX:1)), 256)
END-PERFORM
COMPUTE WS-1-THRU-256 =
FUNCTION MOD (WS-IN-IID+WS-IN-TEXT-HASH, 256) + 1
PERFORM OBF-PARA VARYING WS-IN-TEXT-IX FROM 1 BY 16
UNTIL WS-IN-TEXT-IX > WS-IN-TEXT-LEN
DISPLAY WS-OUT-TEXT.
STOP RUN.
OBF-PARA.
PERFORM VARYING WS-IX FROM 0 BY 1 UNTIL WS-IX > 15
EVALUATE WS-IN-TEXT(WS-IN-TEXT-IX + WS-IX:1)
WHEN 'A'
WHEN 'a'
MOVE FUNCTION CHAR(
FUNCTION ORD(WS-IN-TEXT(WS-IN-TEXT-IX+WS-IX:1)) +
FUNCTION ORD(WS-ALPHA-OFFSET-X(WS-1-THRU-256)) - 1
) TO WS-OUT-TEXT(WS-IN-TEXT-IX+WS-IX:1)
WHEN '0'
MOVE FUNCTION CHAR(
FUNCTION ORD(WS-IN-TEXT(WS-IN-TEXT-IX+WS-IX:1)) +
FUNCTION ORD(WS-NUM-OFFSET-X(WS-1-THRU-256)) - 1
) TO WS-OUT-TEXT(WS-IN-TEXT-IX+WS-IX:1)
WHEN OTHER
MOVE WS-IN-TEXT(WS-IN-TEXT-IX+WS-IX:1) TO
WS-OUT-TEXT(WS-IN-TEXT-IX+WS-IX:1)
END-EVALUATE
the assembler module (replacing OBF-PARA) could do something like (very crudely):
load V0 with 16 bytes (from source string)
load V1 with 16 bytes - random nums from WS-ALPHA-OFFSET-X ( ptr)
load V2 with 16 bytes - random nums from WS-NUM-OFFSET-X (ptr)
set V3 (bytes), so that value 1 => char is A-Z;a-z
set V4 (bytes), so that value 1 => char is 0-9