IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

Clean up DB2 string from unreadable characters


IBM Mainframe Forums -> DB2
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Wed Sep 09, 2009 4:50 pm
Reply with quote

How to clean up DB2 string from unreadable characters ?

That's easy and usable. You have to use function TRANSLATE to do it.

You have to remove all characters having hex code less than X'40' and X'FF'.

Code:
Select
REPLACE(TRANSLATE(some_str, ' ', X'0001020304050607...3839FF'), ' ','')
as some_result
from some table


You can save X'0001020304050607...3839FF' somewhere and use by name.

Lenny
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Wed Sep 09, 2009 5:07 pm
Reply with quote

Or you can use following:
Code:
REPLACE(TRANSLATE(some_str, X'41', X'0001...3839FF'), X'41', '')
Lenny
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Wed Sep 09, 2009 5:48 pm
Reply with quote

Or you can use following:

Code:
REPLACE(TRANSLATE(some_str, X'FF', X'0001...3839'), X'FF', '')


All depends on your fantasy....

Lenny
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Wed Sep 09, 2009 5:57 pm
Reply with quote

That was the common solution.
If you know what the unreadable characters consists your string (for example X'05', X'0D', X'14') you can use shorter statement, like:

Code:
REPLACE(TRANSLATE(some_str, X'FF', X'050D14'), X'FF', '')


Lenny
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Thu Sep 10, 2009 5:55 am
Reply with quote

This is the very simple query which I created to myself to remove the unreadable characters from table column, or any text, using DB2.

Unreadable characters how you know from X'00' to X'3F' and X'FF'.

You can create an UDF based on this query:

Code:
select
text "Original Text",
hex(text) hx_org_text,
translate(text, chars_translate_to, chars_to_be_remove       ) trt_text,
hex(translate(text, chars_translate_to, chars_to_be_remove ) ) hx_trt_text,
replace
(translate(text, chars_translate_to, chars_to_be_remove),
    substr(chars_translate_to, 1, 1), '') "Cleared text"   

from
(select
   'This' || x'0024' || ' text' || x'213733' || ' is unreadable' || x'383912'  as text
   from sysibm.sysdummy1 ) tx

join

(select
x'000102030405060708090a0b0c0d0e0f' ||
x'101112131415161718191a1b1c1d1e1f' ||
x'202122232425262728292a2b2c2d2e2f' ||
x'303132333435363738393a3b3c3d3e3f' as chars_to_be_remove,
repeat(x'FF', 80) chars_translate_to
from sysibm.sysdummy1 ) gt
on 1 = 1


Lenny
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Thu Sep 10, 2009 7:19 pm
Reply with quote

Hello,

Quote:
Unreadable characters how you know from X'00' to X'3F' and X'FF'.
There are many more than these. . . .

This is neither a UNIX nor Windows (ascii) environment and the character set is not sensitive to values "below spaces". For example x'59' and x'DA' are fairly useless as screen/report values. . .

I believe the time spent trying to fix junk values in database coumns would be better spent preventing them in the first place. . .
Back to top
View user's profile Send private message
Terry Heinze

JCL Moderator


Joined: 14 Jul 2008
Posts: 1249
Location: Richfield, MN, USA

PostPosted: Thu Sep 10, 2009 9:54 pm
Reply with quote

dick scherrer wrote:
I believe the time spent trying to fix junk values in databast coumns would be better spent preventing them in the first place. . .
Can I hear an "Amen" to that? icon_smile.gif
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Fri Sep 11, 2009 8:27 am
Reply with quote

dick scherrer wrote:
Hello,

Quote:
Unreadable characters how you know from X'00' to X'3F' and X'FF'.
There are many more than these. . . .

This is neither a UNIX nor Windows (ascii) environment and the character set is not sensitive to values "below spaces". For example x'59' and x'DA' are fairly useless as screen/report values. . .

I believe the time spent trying to fix junk values in database coumns would be better spent preventing them in the first place. . .


You are right as usual. icon_exclaim.gif icon_exclaim.gif icon_exclaim.gif

You can add all unreadable characters known to you to this string:
X'000102...3F' where doesn't matter what order.

They'll will convert all of them to X'FF' by Translate function , then remove by Replace function.
How I shown before.

You can't prevent all mistakes, but you have to be ready to fix them. icon_rolleyes.gif icon_surprised.gif icon_eek.gif

Thanks, Lenny
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Fri Sep 11, 2009 8:29 am
Reply with quote

Terry Heinze wrote:
dick scherrer wrote:
I believe the time spent trying to fix junk values in databast coumns would be better spent preventing them in the first place. . .
Can I hear an "Amen" to that? icon_smile.gif

Amen !
If you want....

Lenny
Back to top
View user's profile Send private message
GuyC

Senior Member


Joined: 11 Aug 2009
Posts: 1281
Location: Belgium

PostPosted: Fri Sep 11, 2009 12:45 pm
Reply with quote

Try this with a unicode table and you're in trouble.
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Fri Sep 11, 2009 4:11 pm
Reply with quote

GuyC wrote:
Try this with a unicode table and you're in trouble.

I made it for Mainframe (code EBCDIC).

For UNICODE has to be something different.

Thanks
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Sun Sep 13, 2009 1:13 am
Reply with quote

Common to this problem is the problem of selecting rows with only numeric data in some column of the table, which consists the mixed data.

For example in table1 we have column1 Varchar(100):

Column1
Quote:
'abc122'
'cdef6789'
'123654'
' 234 456'
'345.67'
'345...67'
'-787.001'
'567+675.44'
'777.888-'
'+987234570911'


So, we have to create SQL query, which will select
Column1
Quote:
'123654'
'345.67'
'-787.001'
'777.888-'
'+987234570911'

only....

How we can understand we have to select string with
'0123456789+-.', also in string could one and only one '-', '+' and '.'.

In practice is not so seldom problem, how I know.

Quote:

step1: Remove space from the string. String has not to be change
step2: Transate all characters in string '+-.123456' to space.
step3: Remove '-', '+', '.' from the string. For each remove length of the string has to become 1 character less
step4. Find position of '+' and '-'. Possition has to be first or last in the string
step5. '+' and '-' has not to be in the same string


Lenny
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Sun Sep 13, 2009 1:51 am
Reply with quote

Something like this. But it's not complete. I forget something.
Maybe you know what I forgot ?

Code:
select column1
from
(select strip( column1) from table1) ii
where
replace( column1, ' ', '') = column1
and
Translate(column1, ' ', '0123456789+-.') = ' '
and
length(replace( column1, '+', '')) >= length(column1) - 1
and
length(replace( column1, '-', '')) >= length(column1) - 1
and
length(replace( column1, '.', '')) >= length(column1) - 1
and
posstr(column1, '-')  in (0, 1, length(column1) )
and
posstr(column1, '+')  in (0, 1, length(column1) )


Lenny
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Sun Sep 13, 2009 2:50 am
Reply with quote

The complete statement:

Code:
select column1
from
(select strip( column1) from table1) ii
where
replace( column1, ' ', '') = column1
and
Translate(column1, ' ', '0123456789+-.') = ' '
and
length(replace( column1, '+', '')) >= length(column1) - 1
and
length(replace( column1, '-', '')) >= length(column1) - 1
and
length(replace( column1, '.', '')) >= length(column1) - 1
and
posstr(column1, '-')  in (0, 1, length(column1) )
and
posstr(column1, '+')  in (0, 1, length(column1) )
and not exists
(select * from sysibm.sysdummy1
  where posstr(column1, '-') > 0
      and
            posstr(column1, '+') > 0)


Lenny
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Sun Sep 13, 2009 9:35 am
Reply with quote

I have to make small but important correction to the code:

Quote:
(select strip(column1) column1 from table1) ii




Code:
select column1
from
(select strip(column1) column1 from table1) ii
where
replace( column1, ' ', '') = column1
and
Translate(column1, ' ', '0123456789+-.') = ' '
and
length(replace( column1, '+', '')) >= length(column1) - 1
and
length(replace( column1, '-', '')) >= length(column1) - 1
and
length(replace( column1, '.', '')) >= length(column1) - 1
and
posstr(column1, '-')  in (0, 1, length(column1) )
and
posstr(column1, '+')  in (0, 1, length(column1) )
and not exists
(select * from sysibm.sysdummy1
  where posstr(column1, '-') > 0
      and
            posstr(column1, '+') > 0)


Lenny
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Sun Sep 13, 2009 6:52 pm
Reply with quote

Some of my friends let me know about
Quote:
'777.888-'

which is not valid in DB2.
I corrected the query in 2nd time,

Code:
select column1
from
(select strip(column1) column1 from table1) ii
where
replace( column1, ' ', '') = column1
and
Translate(column1, ' ', '0123456789+-.') = ' '
and
length(replace( column1, '+', '')) >= length(column1) - 1
and
length(replace( column1, '-', '')) >= length(column1) - 1
and
length(replace( column1, '.', '')) >= length(column1) - 1
and
posstr(column1, '-')  in (0, 1)
and
posstr(column1, '+')  in (0, 1)
and not exists
(select * from sysibm.sysdummy1
  where
posstr(column1, '-') > 0
and
posstr(column1, '+') > 0)


Lenny
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Mon Sep 14, 2009 3:09 am
Reply with quote

Now it looks good, but not a perfect !

I suppose to change query:

1. string has to have at least one digit
2. I want accept strings: '777.888-', '- 12345', '+ 456.98'
3. Inner table has to bring as only decimal likes stings

Code:
select column1
from
(select   case when substr(strip(column1), length(strip( column1)), 1)
                                        in ('+', '-')
                    then
                    substr(strip(column1), length(strip( column1)), 1) 
                    ||
                    substr(strip(column1), 1, length(strip( column1)) - 1)
                    when substr(strip( column1), 1, 1) in ('+', '-')
                    then
                    substr(strip( column1), 1, 1)
                    || 
                    strip(substr(strip(column1), 2))
                    else    strip( column1) 
                    end AS column1
   from table1
   where Translate(column1, ' ', '0123456789+-.') = ' '
     and  column1 > Translate(column1, ' ', '0123456789')
) ii
where
replace( column1, ' ', '') = column1 
and
length(replace( column1, '+', '')) >= length(column1) - 1
and
length(replace( column1, '-', '')) >= length(column1) - 1
and
length(replace( column1, '.', '')) >= length(column1) - 1
and
posstr(column1, '-')  in (0, 1)
and
posstr(column1, '+')  in (0, 1)
and not exists
(select * from sysibm.sysdummy1
  where
posstr(column1, '-') > 0
and
posstr(column1, '+') > 0)


Lenny
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Fri Sep 25, 2009 7:18 am
Reply with quote

There are no functions exist to do it.

But you can create your own UDF, using this code:

Code:
with
Source(source_str) as
(select 'You       are   the    best DB2      programmers in       the     world !!!!'
from sysibm.sysdummy1
)
,
String_cleaner(result_string) as
(
select strip(source_str) from Source
union All
select replace(result_string, '  ', ' ') from String_cleaner 
where Length(replace(result_string, '  ', ' ')) < length(result_string)
)
,
Final_operation(result_string, source) as ( select result_string, source_str
  from String_cleaner, Source
 where length(result_string)
         = (select min(length(result_string)) from String_cleaner)
)
select source "Source String", result_string "Result String"
  from Final_operation


Result of run will be:

Quote:
You are the best DB2 programmers in the world !!!!


Lenny
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Fri Sep 25, 2009 7:46 am
Reply with quote

We can remove extra spaces in more tricky way:

Code:
with
Source(source_str) as
(select 'You     are   the    best DB2      programmers in       the     world !!!! '
from sysibm.sysdummy1
)
,
string_posn(posn, charposn, iterno) as
(
select int(1), substr(source_str, 1, 1), length(source_str) 
       from Source
union all
select posn + 1, substr(source_str, posn + 1, 1), iterno   
  from string_posn, Source
where posn + 1 <= length(source_str)
)
,
Final_operation(result_string, lastchar, K) as
(select varchar(charposn, 1000), charposn, 1
   from string_posn where posn = 1
union all
select result_string  ||
          case
          when lastchar = ' ' and charposn = ' '
          then ''
          else charposn 
          end, charposn, K + 1
from Final_operation, string_posn
where posn   = K + 1
   and K + 1 <= iterno
)
,
Get_result_string(source_str, result_string) as
( select source_str, result_string
from Final_operation, Source
where K = (select max(iterno) from string_posn)
)
select source_str, result_string
from Get_result_string


Result will be the same.

Lenny
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Sun Sep 27, 2009 2:45 am
Reply with quote

I don't know easy way to find out where position of the first digit in a string.

Maybe you know ? Let me know.

Lenny
Back to top
View user's profile Send private message
lkhiger

New User


Joined: 28 Oct 2005
Posts: 89

PostPosted: Tue Sep 29, 2009 7:41 am
Reply with quote

Another way to cleanup string from unreadable characters.

For example we have to remove from the string all characters with hexadecimal code less then X'40' (space).

We can do this in following way:

Code:
With
Source (source_str) as
(select 'Wel' || X'00070d33' || 'come ' || X'33343f' || 'to DB2 !'
from sysibm.sysdummy1
)
,
Clean_up(source_str, result_str, posn, maxposn) as
(select source_str, varchar('', length(source_str)), 0, length(source_str)
from  Source
Union All
select source_str,
result_str || case When substr(source_str, posn + 1, 1) < X'40' Then ''
                          Else substr(source_str, posn + 1, 1) End,
posn + 1,  maxposn
From Clean_up Where posn + 1 <= maxposn
)
select source_str as "Source string", result_str "Result String"
From Clean_up
Where posn = maxposn
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> DB2

 


Similar Topics
Topic Forum Replies
No new posts Replace each space in cobol string wi... COBOL Programming 2
No new posts PARSE Syntax for not fix length word ... JCL & VSAM 7
No new posts Sortjoin and Search for a String and ... DFSORT/ICETOOL 1
No new posts Substring number between 2 characters... DFSORT/ICETOOL 2
No new posts Reading dataset in Python - New Line ... All Other Mainframe Topics 22
Search our Forums:

Back to Top