|
View previous topic :: View next topic
|
| Author |
Message |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
How to clean up DB2 string from unreadable characters ?
That's easy and usable. You have to use function TRANSLATE to do it.
You have to remove all characters having hex code less than X'40' and X'FF'.
| Code: |
Select
REPLACE(TRANSLATE(some_str, ' ', X'0001020304050607...3839FF'), ' ','')
as some_result
from some table |
You can save X'0001020304050607...3839FF' somewhere and use by name.
Lenny |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
Or you can use following:
| Code: |
| REPLACE(TRANSLATE(some_str, X'41', X'0001...3839FF'), X'41', '') |
Lenny |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
Or you can use following:
| Code: |
| REPLACE(TRANSLATE(some_str, X'FF', X'0001...3839'), X'FF', '') |
All depends on your fantasy....
Lenny |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
That was the common solution.
If you know what the unreadable characters consists your string (for example X'05', X'0D', X'14') you can use shorter statement, like:
| Code: |
| REPLACE(TRANSLATE(some_str, X'FF', X'050D14'), X'FF', '') |
Lenny |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
This is the very simple query which I created to myself to remove the unreadable characters from table column, or any text, using DB2.
Unreadable characters how you know from X'00' to X'3F' and X'FF'.
You can create an UDF based on this query:
| Code: |
select
text "Original Text",
hex(text) hx_org_text,
translate(text, chars_translate_to, chars_to_be_remove ) trt_text,
hex(translate(text, chars_translate_to, chars_to_be_remove ) ) hx_trt_text,
replace
(translate(text, chars_translate_to, chars_to_be_remove),
substr(chars_translate_to, 1, 1), '') "Cleared text"
from
(select
'This' || x'0024' || ' text' || x'213733' || ' is unreadable' || x'383912' as text
from sysibm.sysdummy1 ) tx
join
(select
x'000102030405060708090a0b0c0d0e0f' ||
x'101112131415161718191a1b1c1d1e1f' ||
x'202122232425262728292a2b2c2d2e2f' ||
x'303132333435363738393a3b3c3d3e3f' as chars_to_be_remove,
repeat(x'FF', 80) chars_translate_to
from sysibm.sysdummy1 ) gt
on 1 = 1 |
Lenny |
|
| Back to top |
|
 |
dick scherrer
Moderator Emeritus

Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
|
|
|
|
Hello,
| Quote: |
Unreadable characters how you know from X'00' to X'3F' and X'FF'.
|
There are many more than these. . . .
This is neither a UNIX nor Windows (ascii) environment and the character set is not sensitive to values "below spaces". For example x'59' and x'DA' are fairly useless as screen/report values. . .
I believe the time spent trying to fix junk values in database coumns would be better spent preventing them in the first place. . . |
|
| Back to top |
|
 |
Terry Heinze
JCL Moderator
Joined: 14 Jul 2008 Posts: 1248 Location: Richfield, MN, USA
|
|
|
|
| dick scherrer wrote: |
| I believe the time spent trying to fix junk values in databast coumns would be better spent preventing them in the first place. . . |
Can I hear an "Amen" to that?  |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
| dick scherrer wrote: |
Hello,
| Quote: |
Unreadable characters how you know from X'00' to X'3F' and X'FF'.
|
There are many more than these. . . .
This is neither a UNIX nor Windows (ascii) environment and the character set is not sensitive to values "below spaces". For example x'59' and x'DA' are fairly useless as screen/report values. . .
I believe the time spent trying to fix junk values in database coumns would be better spent preventing them in the first place. . . |
You are right as usual.
You can add all unreadable characters known to you to this string:
X'000102...3F' where doesn't matter what order.
They'll will convert all of them to X'FF' by Translate function , then remove by Replace function.
How I shown before.
You can't prevent all mistakes, but you have to be ready to fix them.
Thanks, Lenny |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
| Terry Heinze wrote: |
| dick scherrer wrote: |
| I believe the time spent trying to fix junk values in databast coumns would be better spent preventing them in the first place. . . |
Can I hear an "Amen" to that?  |
Amen !
If you want....
Lenny |
|
| Back to top |
|
 |
GuyC
Senior Member
Joined: 11 Aug 2009 Posts: 1281 Location: Belgium
|
|
|
|
| Try this with a unicode table and you're in trouble. |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
| GuyC wrote: |
| Try this with a unicode table and you're in trouble. |
I made it for Mainframe (code EBCDIC).
For UNICODE has to be something different.
Thanks |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
Common to this problem is the problem of selecting rows with only numeric data in some column of the table, which consists the mixed data.
For example in table1 we have column1 Varchar(100):
Column1
| Quote: |
'abc122'
'cdef6789'
'123654'
' 234 456'
'345.67'
'345...67'
'-787.001'
'567+675.44'
'777.888-'
'+987234570911' |
So, we have to create SQL query, which will select
Column1
| Quote: |
'123654'
'345.67'
'-787.001'
'777.888-'
'+987234570911' |
only....
How we can understand we have to select string with
'0123456789+-.', also in string could one and only one '-', '+' and '.'.
In practice is not so seldom problem, how I know.
| Quote: |
step1: Remove space from the string. String has not to be change
step2: Transate all characters in string '+-.123456' to space.
step3: Remove '-', '+', '.' from the string. For each remove length of the string has to become 1 character less
step4. Find position of '+' and '-'. Possition has to be first or last in the string
step5. '+' and '-' has not to be in the same string |
Lenny |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
Something like this. But it's not complete. I forget something.
Maybe you know what I forgot ?
| Code: |
select column1
from
(select strip( column1) from table1) ii
where
replace( column1, ' ', '') = column1
and
Translate(column1, ' ', '0123456789+-.') = ' '
and
length(replace( column1, '+', '')) >= length(column1) - 1
and
length(replace( column1, '-', '')) >= length(column1) - 1
and
length(replace( column1, '.', '')) >= length(column1) - 1
and
posstr(column1, '-') in (0, 1, length(column1) )
and
posstr(column1, '+') in (0, 1, length(column1) ) |
Lenny |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
The complete statement:
| Code: |
select column1
from
(select strip( column1) from table1) ii
where
replace( column1, ' ', '') = column1
and
Translate(column1, ' ', '0123456789+-.') = ' '
and
length(replace( column1, '+', '')) >= length(column1) - 1
and
length(replace( column1, '-', '')) >= length(column1) - 1
and
length(replace( column1, '.', '')) >= length(column1) - 1
and
posstr(column1, '-') in (0, 1, length(column1) )
and
posstr(column1, '+') in (0, 1, length(column1) )
and not exists
(select * from sysibm.sysdummy1
where posstr(column1, '-') > 0
and
posstr(column1, '+') > 0) |
Lenny |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
I have to make small but important correction to the code:
| Quote: |
| (select strip(column1) column1 from table1) ii |
| Code: |
select column1
from
(select strip(column1) column1 from table1) ii
where
replace( column1, ' ', '') = column1
and
Translate(column1, ' ', '0123456789+-.') = ' '
and
length(replace( column1, '+', '')) >= length(column1) - 1
and
length(replace( column1, '-', '')) >= length(column1) - 1
and
length(replace( column1, '.', '')) >= length(column1) - 1
and
posstr(column1, '-') in (0, 1, length(column1) )
and
posstr(column1, '+') in (0, 1, length(column1) )
and not exists
(select * from sysibm.sysdummy1
where posstr(column1, '-') > 0
and
posstr(column1, '+') > 0) |
Lenny |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
Some of my friends let me know about
which is not valid in DB2.
I corrected the query in 2nd time,
| Code: |
select column1
from
(select strip(column1) column1 from table1) ii
where
replace( column1, ' ', '') = column1
and
Translate(column1, ' ', '0123456789+-.') = ' '
and
length(replace( column1, '+', '')) >= length(column1) - 1
and
length(replace( column1, '-', '')) >= length(column1) - 1
and
length(replace( column1, '.', '')) >= length(column1) - 1
and
posstr(column1, '-') in (0, 1)
and
posstr(column1, '+') in (0, 1)
and not exists
(select * from sysibm.sysdummy1
where
posstr(column1, '-') > 0
and
posstr(column1, '+') > 0) |
Lenny |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
Now it looks good, but not a perfect !
I suppose to change query:
1. string has to have at least one digit
2. I want accept strings: '777.888-', '- 12345', '+ 456.98'
3. Inner table has to bring as only decimal likes stings
| Code: |
select column1
from
(select case when substr(strip(column1), length(strip( column1)), 1)
in ('+', '-')
then
substr(strip(column1), length(strip( column1)), 1)
||
substr(strip(column1), 1, length(strip( column1)) - 1)
when substr(strip( column1), 1, 1) in ('+', '-')
then
substr(strip( column1), 1, 1)
||
strip(substr(strip(column1), 2))
else strip( column1)
end AS column1
from table1
where Translate(column1, ' ', '0123456789+-.') = ' '
and column1 > Translate(column1, ' ', '0123456789')
) ii
where
replace( column1, ' ', '') = column1
and
length(replace( column1, '+', '')) >= length(column1) - 1
and
length(replace( column1, '-', '')) >= length(column1) - 1
and
length(replace( column1, '.', '')) >= length(column1) - 1
and
posstr(column1, '-') in (0, 1)
and
posstr(column1, '+') in (0, 1)
and not exists
(select * from sysibm.sysdummy1
where
posstr(column1, '-') > 0
and
posstr(column1, '+') > 0) |
Lenny |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
There are no functions exist to do it.
But you can create your own UDF, using this code:
| Code: |
with
Source(source_str) as
(select 'You are the best DB2 programmers in the world !!!!'
from sysibm.sysdummy1
)
,
String_cleaner(result_string) as
(
select strip(source_str) from Source
union All
select replace(result_string, ' ', ' ') from String_cleaner
where Length(replace(result_string, ' ', ' ')) < length(result_string)
)
,
Final_operation(result_string, source) as ( select result_string, source_str
from String_cleaner, Source
where length(result_string)
= (select min(length(result_string)) from String_cleaner)
)
select source "Source String", result_string "Result String"
from Final_operation |
Result of run will be:
| Quote: |
| You are the best DB2 programmers in the world !!!! |
Lenny |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
We can remove extra spaces in more tricky way:
| Code: |
with
Source(source_str) as
(select 'You are the best DB2 programmers in the world !!!! '
from sysibm.sysdummy1
)
,
string_posn(posn, charposn, iterno) as
(
select int(1), substr(source_str, 1, 1), length(source_str)
from Source
union all
select posn + 1, substr(source_str, posn + 1, 1), iterno
from string_posn, Source
where posn + 1 <= length(source_str)
)
,
Final_operation(result_string, lastchar, K) as
(select varchar(charposn, 1000), charposn, 1
from string_posn where posn = 1
union all
select result_string ||
case
when lastchar = ' ' and charposn = ' '
then ''
else charposn
end, charposn, K + 1
from Final_operation, string_posn
where posn = K + 1
and K + 1 <= iterno
)
,
Get_result_string(source_str, result_string) as
( select source_str, result_string
from Final_operation, Source
where K = (select max(iterno) from string_posn)
)
select source_str, result_string
from Get_result_string |
Result will be the same.
Lenny |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
I don't know easy way to find out where position of the first digit in a string.
Maybe you know ? Let me know.
Lenny |
|
| Back to top |
|
 |
lkhiger
New User

Joined: 28 Oct 2005 Posts: 89
|
|
|
|
Another way to cleanup string from unreadable characters.
For example we have to remove from the string all characters with hexadecimal code less then X'40' (space).
We can do this in following way:
| Code: |
With
Source (source_str) as
(select 'Wel' || X'00070d33' || 'come ' || X'33343f' || 'to DB2 !'
from sysibm.sysdummy1
)
,
Clean_up(source_str, result_str, posn, maxposn) as
(select source_str, varchar('', length(source_str)), 0, length(source_str)
from Source
Union All
select source_str,
result_str || case When substr(source_str, posn + 1, 1) < X'40' Then ''
Else substr(source_str, posn + 1, 1) End,
posn + 1, maxposn
From Clean_up Where posn + 1 <= maxposn
)
select source_str as "Source string", result_str "Result String"
From Clean_up
Where posn = maxposn |
|
|
| Back to top |
|
 |
|
|
 |
All times are GMT + 6 Hours |
|