vidyaa
New User
Joined: 02 May 2008 Posts: 77 Location: chennai
Hi,
I need to capture specific strings from a file based on certain conditions. My file is 10000 bytes and it can have any number of lines. It looks like this:
"<s> name=abc sometext name=XXX ocr=1 <kl> sometext name=iuk sometext hjlkljk name=hju name=YYY ocr=4 sometext name=text ocr=1"
I need to capture the word that comes after name= and just before the word ocr, so my output would be:
XXX
YYY
text
Is there a way to achieve this using REXX?
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10873 Location: italy
Quote:
is there a way to achieve this using rexx.
Yes.
superk
Global Moderator
Joined: 26 Apr 2004 Posts: 4652 Location: Raleigh, NC, USA
I'm thinking that if you parsed the string, delimited by spaces, into a series of stem variables, then you could use the first occurrence of the string "ocr=" as an ending point, then move backwards through the other variables until you hit the closest "name=" string. Mark the start and end points, then loop through again for the next "ocr=" string.
Once you know all of the starting and ending points, it should be relatively easy to parse out the value for the "name=" keyword.
superk
Global Moderator
Joined: 26 Apr 2004 Posts: 4652 Location: Raleigh, NC, USA
Actually, it's a bit easier, where str is the single string to parse:
Code:
v. = ''
idx = 0
strt = 0                          /* no name= word seen yet */
/* Split str into blank-delimited words v.1 ... v.idx */
Do While str \= ''
  idx = idx + 1
  Parse Var str v.idx str
End
v.0 = idx
/* Remember the latest name= word; print its value at each ocr= word */
Do i = 1 To v.0
  If Left(v.i,5) = 'name=' Then strt = i
  If Left(v.i,4) = 'ocr=' & strt > 0 Then
    Do
      Parse Var v.strt 'name=' val
      Say val
    End
End
vidyaa
New User
Joined: 02 May 2008 Posts: 77 Location: chennai
Thanks for your valuable suggestions. I tried the above code and it worked well when the file had a single line, but I am not able to loop over several lines. I tried using
do i = 1 to Rec.0
str = rec.i
followed by the code given above, but this doesn't work for more than one line.
I think it is because we have IF LENGTH(STR) = 0 THEN LEAVE.
What would be a possible way to fix it?
vidyaa
New User
Joined: 02 May 2008 Posts: 77 Location: chennai
I tried concatenating all the lines into a single string using
do i = 1 to rec.0
str=str||rec.i
end
but I am not sure if this is the best option.
Also, when I parse the string
Name="xxx yyy" ocr=fghghg
using
PARSE VAR STR 'NAME=' VAL
I get the output XXX, but I need XXX YYY.
How can I do this?
vidyaa
New User
Joined: 02 May 2008 Posts: 77 Location: chennai
How do I say that the value is delimited by some other character, say '"'?
superk
Global Moderator
Joined: 26 Apr 2004 Posts: 4652 Location: Raleigh, NC, USA
You know, I was going to ask if there would be any reason to consider imbedded blanks or other characters inside of your target strings, but I honestly thought you would've mentioned that.
One thing you can do to assist with parsing is to translate imbedded blanks into some pre-determined character, something that you know the actual data will never contain. I usually go with an at sign (@) or tilde (~). Then your parsing remains simple, and at the end all you have to do is translate that character back into a blank.
Anyway, at this point, I'd like to see a few records of the actual REAL data the way it's really formatted. Maybe there's a better way, maybe not.
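As a rough illustration of the translate trick, here is a minimal sketch. It assumes the imbedded blanks only ever sit between a pair of double quotes and that a tilde never appears in the real data; the names mask, inquote, safe, w1 and val are made up for the example.

```rexx
/* Sketch: hide blanks inside double quotes, parse by words, restore them */
str = 'Name="xxx yyy" ocr=fghghg'
mask = '~'                            /* assumed never to occur in the data */
inquote = 0
safe = ''
Do p = 1 To Length(str)
  c = Substr(str, p, 1)
  If c == '"' Then inquote = \inquote       /* toggle at every double quote */
  If inquote & c == ' ' Then c = mask       /* hide an imbedded blank       */
  safe = safe || c
End
Parse Var safe w1 .                   /* first word, blanks still hidden    */
Parse Var w1 '="' val '"'             /* text between the double quotes     */
val = Translate(val, ' ', mask)       /* turn the mask back into blanks     */
Say val
```

Because the blanks inside the quotes are masked, the quoted value stays together as one word, so the word-by-word parsing shown earlier would still apply to it.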
Pedro
Global Moderator
Joined: 01 Sep 2006 Posts: 2547 Location: Silicon Valley
The example seems to have some tags, but it is not clear whether there will be more of them. If your logical records are delimited by start and end tags, you should parse for those in an outer loop before parsing for the name and ocr fields.
And when you parse for the name and ocr fields, I suggest using the PARSE instruction:
Code:
xmlstring = 'some stuff'
Do Until (xmlstring = '')
  parse var xmlstring 'name=' my_text 'ocr=' xmlstring
  /* process my_text */
End
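One way the "process" step might look on the first sample string is sketched below. Since several name= words can appear before one ocr=, this variant cuts at each ocr= and then keeps only the text after the last name= in that piece, using the LASTPOS built-in. The variable names before, p and found are assumptions made for the example.

```rexx
/* Sketch: cut at each 'ocr=', keep only the last 'name=' value before it */
xmlstring = '<s> name=abc sometext name=XXX ocr=1 <kl> sometext name=iuk',
            'sometext hjlkljk name=hju name=YYY ocr=4 sometext name=text ocr=1'
found = ''
Do While Pos('ocr=', xmlstring) > 0
  Parse Var xmlstring before 'ocr=' xmlstring   /* text up to the next ocr= */
  p = Lastpos('name=', before)                  /* closest name= before it  */
  If p = 0 Then Iterate                         /* ocr= without any name=   */
  my_text = Strip(Substr(before, p + 5))        /* 5 = Length('name=')      */
  Say my_text
  found = found my_text                         /* collected for checking   */
End
```

If other words can sit between the name= value and ocr=, my_text will contain them too; taking Word(my_text, 1) would be one way to keep just the value in that case.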
|
vidyaa
New User
Joined: 02 May 2008 Posts: 77 Location: chennai
|
|
|
|
The input file looks like this:
NAME="STCUSTUDAPCASHSALEDBTA">20.00</F></S><S NAME="DETAIL, DESC SECTION" OCR="1">
<F NAME="STFTRITEMAMT">.00</F><F NAME="STHDRGROUPCD"
</F><S NAME="IRCS DETAIL RECORD" OCR="1"><F NAME="STDETLITEMAMT">
Basically, I would like to take the string in the "NAME" field that comes just before
the "OCR". So my output looks like
DETAIL, DESC SECTION
IRCS DETAIL RECORD
Pedro
Global Moderator
Joined: 01 Sep 2006 Posts: 2547 Location: Silicon Valley
I think you should parse at record boundaries first, then parse for your field delimiters.
Note that, per your example, the target string ends with a double quote rather than with 'OCR'.
Code:
PARSE VAR xmlstring . '<S' one_record '>' xmlstring
PARSE VAR one_record . 'NAME="' target_string '"' .
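Put into a loop over the three sample records, the two PARSE steps might be combined as in the sketch below. It assumes the records are already in a rec. stem (for example from EXECIO), that a tag may continue from one record to the next (so the records are joined first), and that only <S ...> tags carrying OCR= are wanted. The loop variable r and the out. stem are made up for the example; note that r is deliberately different from any loop variable used inside.

```rexx
/* Sketch: join the records, then take NAME from each <S ...> tag with OCR= */
rec.1 = 'NAME="STCUSTUDAPCASHSALEDBTA">20.00</F></S><S NAME="DETAIL, DESC SECTION" OCR="1">'
rec.2 = '<F NAME="STFTRITEMAMT">.00</F><F NAME="STHDRGROUPCD"'
rec.3 = '</F><S NAME="IRCS DETAIL RECORD" OCR="1"><F NAME="STDETLITEMAMT">'
rec.0 = 3

xmlstring = ''
Do r = 1 To rec.0                     /* rec.r, not rec.0, inside the loop */
  xmlstring = xmlstring || rec.r
End

n = 0
Do While Pos('<S ', xmlstring) > 0
  Parse Var xmlstring '<S ' one_record '>' xmlstring    /* one <S ...> tag */
  If Pos('OCR=', one_record) = 0 Then Iterate           /* no OCR, skip it */
  Parse Var one_record 'NAME="' target_string '"' .
  n = n + 1
  out.n = target_string
  Say target_string
End
out.0 = n
```

The pattern '<S ' (with a trailing blank) is used so that a closing </S> tag is never mistaken for the start of a section.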