dcshnier
New User
Joined: 28 Dec 2006 Posts: 27 Location: Baltimore, MD 21215
Can someone please confirm that I am using the COBOL FUNCTION RANDOM (to generate random numbers) correctly?
For special testing, I need to copy a production file and replace the 9-digit account numbers with fake numbers that look realistic. My first approach entailed using FUNCTION RANDOM with an argument as the start-up 'seed'. (And as per the documentation, I only 'planted' the seed once.) Once the seed was laid, for every input record, I then used FUNCTION RANDOM without an argument, intending to use the generated result as my replacement account number. However, my first attempt produced unrealistic-looking numbers such as '000000001', '000000033', etc. I would have expected meatier-looking numbers without all of the leading zeros.
I then stumbled across an example on the internet where they were manipulating the FUNCTION RANDOM result by multiplying it by 42 and adding 1. (Why 42 and why +1, I do not know.) However, I tried similar manipulation experiments, and after much trial and error I came across a formula that generates realistic-looking random numbers. My example below multiplies the result by 333333333 and adds back the original account number from the current input record. These numbers were arrived at purely by trial and error. Below I present my working solution, but I have no idea why my method produces the desired results. So I am asking the forum readers whether this 'manipulative' way is the prescribed way to use FUNCTION RANDOM. Here is my working example (pseudo code used in parts of this example):
    05  INPUT-RCD-CNT         PIC 9(05).
    05  WS-ORIGINAL-ACCT-NBR  PIC 9(09).
    05  WS-FAKE-ACCT-NBR      PIC 9(09).
    05  WS-SEED               PIC 9(09).

    READ INPUT-RECORD
    ADD +1 TO INPUT-RCD-CNT
    IF INPUT-RCD-CNT = 1
        MOVE WS-ORIGINAL-ACCT-NBR TO WS-SEED
        COMPUTE WS-FAKE-ACCT-NBR =
            FUNCTION RANDOM(WS-SEED)
    END-IF.
    COMPUTE WS-FAKE-ACCT-NBR =
        (FUNCTION RANDOM * 333333333) + WS-ORIGINAL-ACCT-NBR.

Phrzby Phil
Senior Member
Joined: 31 Oct 2006 Posts: 1055 Location: Richmond, Virginia
Why do your sanitized account#'s need to look real?

Bill O'Boyle
CICS Moderator
Joined: 14 Jan 2008 Posts: 2501 Location: Atlanta, Georgia, USA
If these are credit-card numbers, why not use the PCI standard and encrypt them with a private key? Then you can decrypt them as needed with the same key.
How would you decrypt them after disguising them with a RANDOM number?

dcshnier
New User
Joined: 28 Dec 2006 Posts: 27 Location: Baltimore, MD 21215
That is a valid question.
The easy answer is that I am faced with subsequent edit routines that kick out unrealistic looking account numbers such as those with excessive leading zeros, etc.
But even if I did not have those edit routines to worry about, just in principal (did I spell that correctly?) one would expect even a pseudo-random number generator to produce numbers that randomly span the allowed range, instead of the bunched-up situation that I was first getting.

Phrzby Phil
Senior Member
Joined: 31 Oct 2006 Posts: 1055 Location: Richmond, Virginia
You want "principle."
And now maybe you can answer my question: Why do your sanitized account#'s need to look real?
That is, although the random# issue is interesting academically, what in your testing requires that the sanitized numbers "look" (i.e., to human eyes?) "real"?

Bill O'Boyle
CICS Moderator
Joined: 14 Jan 2008 Posts: 2501 Location: Atlanta, Georgia, USA
If you're building a file with these disguised numbers and it's going out of house, how would the recipient be able to undisguise them upon receipt?
You could encrypt the DSN itself, keeping the numbers intact and then the recipient can decrypt the DSN without any consequence.
This is common practice.

dcshnier
New User
Joined: 28 Dec 2006 Posts: 27 Location: Baltimore, MD 21215
First, to answer Phil: thank you for your spelling correction.
My first reply, the one with the spelling mistake, addressed your question about why the numbers have to look realistic. Here was the answer:
"I am faced with subsequent edit routines that kick out unrealistic looking account numbers such as those with excessive leading zeros, etc."
In answer to Bill O'Boyle: Encrypting is a good idea.
However, the purpose here is not to produce a file that is being shipped out of house. It is simply to produce a realistic-looking test file that we can run on our testing mainframe, as-is, through the existing programs, without having to decrypt something beforehand. Most developers here do not have security clearance to view the real account numbers. So we would get someone who does have the access to create the test files for us by running them through this COBOL routine. I am sure there are alternative methods out there (e.g. milliseconds, microseconds, etc.).
But for the time being I want to stick with the COBOL FUNCTION RANDOM and try to 'milk it for all it's worth'. If this is a truly valuable function, then there must be a way to use it to produce 'spread out' random numbers; and maybe by fluke I stumbled across the way to make that happen.

Phrzby Phil
Senior Member
Joined: 31 Oct 2006 Posts: 1055 Location: Richmond, Virginia
Oh - kick out = reject. Missed that.

dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6965 Location: porcelain throne
I am not going to address the security concerns.
within your application (production) you have a routine to generate a new account number, yes/no.
why not seed that routine,
and generate account numbers that will pass the test!
i can think of check-digit routines and such that would negate the efforts of your random routine.
my suggestion is drop it, especially if real account numbers can not be random
multiplying the random generated number by 333333333 only scales that base number into the range 0 to 333333333
then adding the old account number could lead to duplicates.
the 333333333 is a constant times a variable (random number)
adding the old account number is the same as adding a variable.
the sum of 2 variables can easily equal the sum of 2 different variables.
now, the chances are 1 in 999,999,999/2, but wouldn't it be a bitch.

Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
I think you missed the statement in the manual that indicates RANDOM returns a value between zero and one (greater than or equal to zero and less than one). My code:
Code:
    05  WS-COMP    PIC 9(09)
                   VALUE 99887766.
    05  WS-RANDOM  PIC V9(09) COMP.
/
    PROCEDURE DIVISION.
    S1000-MAIN SECTION.
        COMPUTE WS-RANDOM = FUNCTION RANDOM (WS-COMP).
        DISPLAY '>' WS-RANDOM '<'.
        COMPUTE WS-RANDOM = FUNCTION RANDOM.
        DISPLAY '>' WS-RANDOM '<'.
        COMPUTE WS-RANDOM = FUNCTION RANDOM.
        DISPLAY '>' WS-RANDOM '<'.
        COMPUTE WS-RANDOM = FUNCTION RANDOM.
        DISPLAY '>' WS-RANDOM '<'.
        COMPUTE WS-RANDOM = FUNCTION RANDOM.
        DISPLAY '>' WS-RANDOM '<'.
produces these results:
Code:
    >158459707<
    >566661830<
    >682203486<
    >820312062<
    >838225258<
Whether this is a valid approach for your testing depends upon your site, and you ought to get management approval before implementing this type of code in your testing.

dcshnier
New User
Joined: 28 Dec 2006 Posts: 27 Location: Baltimore, MD 21215
Thank you, Robert!
I actually spotted that note about the result being between 0 and 1 (meaning it returns a long decimal number). But I somewhat forgot about it (or glossed over it) because of other examples where I read that the result has to be an INTEGER, and because of other coding examples, none of which used a PIC V9(09) pure decimal format. However, your example and displayed results proved me wrong.
And what is important is that I have to multiply the result by 1,000,000,000 in order to restore it to a 9-digit integer. This is perhaps why my earlier results were rendering numbers with excessive leading zeros. That also explains the example I stumbled across on the internet in which they were multiplying the result by some unexplained number.
So in summary, the key to using RANDOM is that the receiving working-storage field has to be defined as a pure decimal, with the number of decimal places equal to the number of desired INTEGER digits. And you have to multiply the received result by 1nnn..., where 'nnn...' is a string of zeros equal in length to the number of desired digits for the final INTEGER result.
In answer to the security concerns raised by others, the names are also being randomly scrambled in an absolutely illegible, irreversible way; and this, together with FUNCTION RANDOM for the account numbers, will lead to much better sanitized files than many of the other files out there, which only sanitize part of the account number (and leave the original names intact). Having said that, the proper permissions for this method have been sought.
In answer to anyone's concern about (by fluke) generating undesired duplicates, the nature of the application being tested and the contents of these files are such that if duplicates were generated, the consequences would be insignificant.
Thanks again, Robert!
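
As a minimal sketch of the scaling rule summarized above, and assuming a compiler that supports the intrinsic functions RANDOM and INTEGER (for example Enterprise COBOL or GnuCOBOL), the program below maps the zero-to-one result onto 100000000 through 999999999, so that no generated number starts with a zero. The program name, the data names, the seed value, and the loop count of five are hypothetical and are not taken from the posts.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FAKEACCT.
      * Sketch: map FUNCTION RANDOM (0 <= r < 1) onto nine-digit
      * numbers whose first digit is never zero.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical names; the seed value is arbitrary.
       01  WS-SEED             PIC 9(09) VALUE 123456789.
       01  WS-RANDOM           PIC V9(09).
       01  WS-FAKE-ACCT-NBR    PIC 9(09).
       01  WS-I                PIC 9(02).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Seed once; the value returned by this call is not used.
           COMPUTE WS-RANDOM = FUNCTION RANDOM (WS-SEED)
           PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I > 5
      * INTEGER yields 0 through 899999999; adding 100000000
      * gives 100000000 through 999999999.
               COMPUTE WS-FAKE-ACCT-NBR =
                   FUNCTION INTEGER (FUNCTION RANDOM * 900000000)
                   + 100000000
               DISPLAY WS-FAKE-ACCT-NBR
           END-PERFORM
           STOP RUN.
```

Multiplying by 900000000 and adding 100000000, rather than multiplying by 1,000,000,000, is a design choice for the edit routines mentioned earlier: a plain multiplication by 1,000,000,000 would still be expected to yield a leading zero roughly one time in ten.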

dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6965 Location: porcelain throne
an SV9(09) comp field occupies the same space as an S9(09) comp field.
no need to ever multiply a random generated number to remove the decimal.

dcshnier
New User
Joined: 28 Dec 2006 Posts: 27 Location: Baltimore, MD 21215
Hi Dick
Unless I am doing something wrong, my various attempts at getting a PIC S9(09) COMP field to render all of the digits without resorting to the multiplication by the large number are failing. (It is showing up as all zeros.)
But even if I get it to work, I am still required to produce these account numbers in PIC 9(09) and PIC 9(09) COMP-3 format for my final output file. So I would still have to move the PIC S9(09) COMP field to those final fields (i.e. perform that extra step).
At any rate, I will probably end up retaining the solution that I spelled out in the previous response, as I am very busy with many concurrent tasks around here.

dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Beats being bored. . .
d

Bill O'Boyle
CICS Moderator
Joined: 14 Jan 2008 Posts: 2501 Location: Atlanta, Georgia, USA
If your compiler doesn't support COMP-5 (Native Binary), then ensure you avoid high-order truncation by using the compile option TRUNC(BIN).
Otherwise, use COMP-5, instead of COMP, where truncation is not an issue and the TRUNC option is ignored for COMP-5.
Unsigned is the better way to go, as a signed binary fullword has a maximum of 2147483647, (2**31)-1 (X'7FFFFFFF'), whereas an unsigned binary fullword has a maximum of 4294967295, (2**32)-1 (X'FFFFFFFF').
COMP-5 was introduced with OS/390 COBOL 2.2.1.

Phrzby Phil
Senior Member
Joined: 31 Oct 2006 Posts: 1055 Location: Richmond, Virginia
For your testing, could you just disable the special "looks like an acct#" check?

dcshnier
New User
Joined: 28 Dec 2006 Posts: 27 Location: Baltimore, MD 21215
Hi Phil
Any work-around is possible; but for the time being we want the test to use the existing program code as much as possible.
But all of this is now moot. The response by Robert Sample revealed the flaw in my earlier approaches; and after adjusting for that, the RANDOM function is now producing realistic numbers without my having to resort to the extent of manipulation that I was using earlier.

Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Code:
    01  a-small-binary  comp pic sv9(9).
    01  a-big-binary    redefines a-small-binary comp pic s9(9).
This is the point dbz is making. If you just use a-big-binary for your further processing after using a-small-binary for the function you'll have your value without a multiplication in sight.
A binary with nine digits is pretty terrible for calculations. According to a reputable source, the compiler will have to convert to a double-word, call routines to do double-word maths then convert it back to a fullword. Did you try making it a packed field? Same redefines works, no calcs needed for that either.
I'm not sure why you are doing it this way. I would be surprised if there is no one in your organisation who knows how to generate test card numbers.
The "validation" of a card number should finish quite "early" in the processing, so I don't understand the impression you give of test data being bounced all over the place.
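
A minimal sketch of the packed-field variant of this REDEFINES idea, assuming a compiler that supports the RANDOM intrinsic function (for example Enterprise COBOL or GnuCOBOL). The program name, the data names, and the seed value are hypothetical. The function result is stored in a field with nine decimal places, and the same storage is then read back as a nine-digit integer, so no multiplication is needed.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REDEFRND.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical names; the seed value is arbitrary.
       01  WS-SEED             PIC 9(09) VALUE 123456789.
      * Nine decimal places, packed; receives the function result.
       01  WS-RANDOM-FRACTION  PIC V9(09) COMP-3.
      * The same five bytes viewed as a nine-digit integer.
       01  WS-RANDOM-INTEGER   REDEFINES WS-RANDOM-FRACTION
                               PIC 9(09) COMP-3.
       01  WS-ACCT-DISPLAY     PIC 9(09).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Seed once, then call again without an argument.
           COMPUTE WS-RANDOM-FRACTION = FUNCTION RANDOM (WS-SEED)
           COMPUTE WS-RANDOM-FRACTION = FUNCTION RANDOM
      * No multiplication: the digits are already in place.
           MOVE WS-RANDOM-INTEGER TO WS-ACCT-DISPLAY
           DISPLAY WS-ACCT-DISPLAY
           STOP RUN.
```

The decimal point in PIC V9(09) is only implied, so both descriptions cover identical digits; the REDEFINES changes how those digits are interpreted, not what is stored. A value below 0.1 would still show up with a leading zero under this approach.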

dcshnier
New User
Joined: 28 Dec 2006 Posts: 27 Location: Baltimore, MD 21215
Thank you, Bill, for your response.
I should mention that even though Robert Sample's example used a COMP (binary) field (PIC V9(9) COMP) in his solution, I have tested it successfully as a non-COMP field, PIC V9(9); so I am not sure how that changes things on the overall rating of my approach. The main point of his adjustment was to use a decimal (non-INTEGER) field.
The issue of unrealistic numbers being kicked out does take place early in the process, and they are not being kicked out all over the place. I apologize if I left that impression. My point was that with my earlier ('flawed') method of using the RANDOM function (before Robert Sample corrected me), 90% of my generated numbers were unrealistic and therefore would have been kicked out. That number of rejections is not acceptable. Now that I have the RANDOM function working correctly (albeit I am still multiplying by 1,000,000,000), practically all of the numbers are realistic, and the number that will get kicked out is now immaterial. When time allows, I will continue to fiddle with your suggestions, which will eliminate the need to multiply. Although, I do recall that when I was googling yesterday, at least two of the supposedly working examples that people posted on the internet were multiplying their results. Now I know why.

Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
From the way the manual is worded, as long as the result variable is numeric, it can be DISPLAY, COMP, or COMP-3 with no issues. I don't think floating point (COMP-1 or COMP-2) would work, though.

Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
It may have been me, I was reading quickly.
Please, do this instead of multiplying by 1,000,000,000.
01 A-smal-random-for-function-results PIC V9(9).
01 A-big-random-to-use-elsewhere redefines A-smal-random-for-function-results PIC 9(9).

dcshnier
New User
Joined: 28 Dec 2006 Posts: 27 Location: Baltimore, MD 21215
Thanks, Bill.
Between the many other things I am doing, I did realize that you and others meant for me to employ a REDEFINES (which I was not doing).
But I finally got down to doing that and it worked!
So the final working solution is:
    05  WS-INPUT-RCD-CNT         PIC 9(05).
    05  WS-ORIGINAL-NBR          PIC 9(09).
    05  WS-FAKE-NBR-DECIMAL      PIC V9(09).
    05  WS-FAKE-NBR-NON-DECIMAL  REDEFINES
        WS-FAKE-NBR-DECIMAL      PIC 9(09).

    READ INPUT-RECORD
    ADD +1 TO WS-INPUT-RCD-CNT
    MOVE I-ORIGINAL-NBR TO WS-ORIGINAL-NBR
    IF WS-INPUT-RCD-CNT = 1
        COMPUTE WS-FAKE-NBR-DECIMAL
            = FUNCTION RANDOM(WS-ORIGINAL-NBR)
    END-IF.
    COMPUTE WS-FAKE-NBR-DECIMAL
        = FUNCTION RANDOM.
    MOVE WS-FAKE-NBR-NON-DECIMAL TO OUTPUT-FAKE-NBR.

Bill O'Boyle
CICS Moderator
Joined: 14 Jan 2008 Posts: 2501 Location: Atlanta, Georgia, USA
Bill said
Quote:
A binary with nine digits is pretty terrible for calculations. According to a reputable source, the compiler will have to convert to a double-word, call routines to do double-word maths then convert it back to a fullword. Did you try making it a packed field? Same redefines works, no calcs needed for that either.
Bill,
Wow, I haven't looked at an Assembler expansion in quite a while. Calling a run-time routine when the number of fullword digits exceeds 8?
How barbaric!

Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Sorry, a bit of conflation. The subroutine use might occur with TRUNC(BIN). It is definitely not going to use a subroutine for the fullword-to-doubleword conversion if not BIN, and I've not checked whether this is a case where BIN would use a subroutine.
Nine digits is heavier on processing than 10-17 digits, because of the need to convert fullword to doubleword, then do the maths, then convert back to fullword.
10-17 digits just does the maths, no need to convert to/from. So, 10-17 digit binary math from Cobol will mostly be faster than 9 digit. 1-4 fastest, 10-17 second, 9 third, 18 fourth.
If using 9 digits, avoid maths anyway :-)
I once "tuned" some subscripts effectively holding addresses from 8 to 9 digits. Didn't check it, if I can find an old compiler somewhere, maybe I'll do it sometime.
I squeezed everything out of the program, a "tool" of mine which was "discovered" and then used across all departments. For our small systems, it was about 3-5 seconds of CPU, but for the larger ones, 10-30. So, the tuning was for those who didn't want to admit the benefits of using it, because of having an extra three minutes on the end of the "promotion" process.
I got it down to under one second, irrespective of system size (mainly through doing things different ways). I calculated this would save a lot of time. I sent the new docs around (SCRIPT/GML/DCF, like the manuals) and highlighted the JCL change to a time limit of one CPU second, explicitly stating the only way this would be exceeded would be if it was looping eternally.
One guy ran it and got a 322 (time limit exceeded). He thought to himself, "I'm very important, my system is very important, this took 30 seconds before, I need to change this". In mid-afternoon I noticed a job running with a familiar program name (it was called OCCULT, since general routines in our project group had to start OC and I'd already used OCTOPUS) and a squid-load of CPU against it. The guy had kept upping and re-running, till he'd got 1440 on the step and gone out to lunch :-)
The reason for the loop? It could deal with Cobol and Assembler programs. As Assembler programs can be much bigger than Cobol, I had a limit. I had asked everyone before making the change "do you have any really big Assembler programs?" "Oh, no," these particular people said, "we have some Assembler, but they're only small".
One of the "small" programs, was allocating a huge lump of storage. In fact, it wasn't really a program, it was just a means of allocating a huge lump of storage. When I asked "really big" they thought in terms of lines of code :-) My program was looping, looking at the same lump of storage for ever, just never all of it, so not finding the next program in the load module.
Of course, when I had asked everyone to "system test" the new version, those lazy lazers had just picked one of their systems, not bothered to run it on all of them. Wonderful to be so important, isn't it :-)
I thought at the time, "well, not worth changing the 8's to 9's, it'll never save the CPU time wasted today".
There is a possibility I actually slowed the thing down doing that change :-)