DB2/iSeries SQL clean up CR/LF, tabs etc

Developer https://www.devze.com 2023-04-09 07:34 Source: Internet
I need to find and clean up line breaks, carriage returns, tabs and "SUB" characters in a set of 400k+ string records, but this DB2 environment is taking a toll on me.

I thought I could do some searching and replacing with the REPLACE() and CHR() functions, but CHR() does not seem to be available on this system (Error: CHR in *LIBL type *N not found). Escape sequences such as \t, \r and \n don't seem to work either. The characters can be in the middle of strings or at the end of them.

DBMS = DB2
System = iSeries
Language = SQL
Encoding = Not sure, possibly EBCDIC

Any hints on what I can do with this?


I used this SQL to find x'25' and x'0D':

SELECT 
     <field>
    , LOCATE(x'0D', <field>) AS "0D" 
    , LOCATE(x'25', <field>) AS "25" 
    , length(trim(<field>)) AS "Length"
FROM <file> 
WHERE   LOCATE(x'25', <field>) > 0 
    OR  LOCATE(x'0D', <field>) > 0 

And I used this SQL to replace them:

UPDATE <file> 
SET <field> = REPLACE(REPLACE(<field>, x'0D', ' '), x'25', ' ')
WHERE   LOCATE(x'25', <field>) > 0 
    OR  LOCATE(x'0D', <field>) > 0 


If you want to clean up specific characters such as carriage return (EBCDIC x'0D') and line feed (EBCDIC x'25'), look up each character's EBCDIC code point and then use the TRANSLATE() function to replace it with a space.

If you just want to remove undisplayable characters then look for anything under x'40'.
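
A minimal sketch of that idea, not a tested script. It assumes a single-byte EBCDIC column, and it assumes TRANSLATE() pads a to-string that is shorter than the from-string with blanks. The names yourfile and yourfield are placeholders. The from-string lists every code point from x'00' to x'3F'. Mixed or DBCS data uses x'0E' and x'0F' as shift characters, so this would not be safe there.

```sql
-- Preview only: show each value next to a copy in which every byte
-- below x'40' has been replaced by a space (relies on blank padding
-- of the one-character to-string, which is an assumption to verify).
SELECT yourfield,
       TRANSLATE(yourfield, ' ',
                 X'000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F') AS cleaned
FROM yourfile
```

Running it as a SELECT first makes it possible to check the result before turning it into an UPDATE.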


Here is a sample script that replaces X'41' with X'40', something that was creating issues at our shop:

UPDATE [yourfile]
SET [yourfield] = TRANSLATE([yourfield], X'40', X'41')
WHERE [yourfield] LIKE '%' CONCAT X'41' CONCAT '%'

If you need to replace more than one character, extend the "to" and "from" hexadecimal strings to the values you need in the TRANSLATE function.
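
As a hedged illustration of extending the two hexadecimal strings, the sketch below handles the four characters from the question in one pass. The code points are assumptions for a common EBCDIC CCSID such as 37 (CR = x'0D', LF = x'25', tab = x'05', SUB = x'3F'), and yourfile and yourfield are placeholder names. Check the actual bytes in your data before running anything like this.

```sql
-- Each byte in the from-string (third argument) is replaced by the byte
-- at the same position in the to-string (second argument).
-- Here all four control characters become a space (x'40').
UPDATE yourfile
SET yourfield = TRANSLATE(yourfield, X'40404040', X'0D25053F')
WHERE LOCATE(X'0D', yourfield) > 0
   OR LOCATE(X'25', yourfield) > 0
   OR LOCATE(X'05', yourfield) > 0
   OR LOCATE(X'3F', yourfield) > 0
```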


Try TRANSLATE or REPLACE.

The brute force method involves using POSITION to find the errant character, then SUBSTR before and after it. CONCAT the two substrings (less the undesirable character) to re-form the column.
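
A rough sketch of the brute force approach for a single character (x'0D' here), with placeholder names yourfile and yourfield. It assumes SUBSTR() accepts a length of 0 when the character is in the first position. The blank appended before the second SUBSTR() keeps its start position inside the string when the character is the last one; on a VARCHAR column that leaves one extra trailing blank.

```sql
-- Removes only the FIRST x'0D' in each row; rerun until no rows are updated.
UPDATE yourfile
SET yourfield =
      SUBSTR(yourfield, 1, POSITION(X'0D' IN yourfield) - 1)
      CONCAT
      SUBSTR(yourfield CONCAT ' ', POSITION(X'0D' IN yourfield) + 1)
WHERE POSITION(X'0D' IN yourfield) > 0
```

Because it handles one occurrence per pass, TRANSLATE or REPLACE is usually the simpler choice when they are available.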

The character encoding is almost certainly one of the EBCDIC character sets. Depending on how the table got loaded in the first place, the CR may be x'0D' and the LF x'15' or x'25'. An easy way to find out is to get to a green screen and do a DSPPFM against the table. Press F10 then F11 to view the table in raw hexadecimal (over/under) format.
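
If a green screen is not at hand, a similar check can be sketched in SQL with the HEX() function. This is only an illustration with placeholder names yourfile and yourfield; it lists a few suspect rows together with their raw bytes.

```sql
-- HEX() returns the hexadecimal representation of the stored bytes,
-- so on an EBCDIC column a carriage return appears as 0D and a space as 40.
SELECT yourfield, HEX(yourfield) AS raw_hex
FROM yourfile
WHERE LOCATE(X'0D', yourfield) > 0
   OR LOCATE(X'15', yourfield) > 0
   OR LOCATE(X'25', yourfield) > 0
FETCH FIRST 20 ROWS ONLY
```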


For details on the available functions see the DB2 for i5/OS SQL Reference.


Perhaps the TRANSLATE() function will serve your needs.

    TRANSLATE( data, tochars, fromchars )

...where fromchars is the set of characters you don't want, and tochars is the corresponding characters you want them replaced with. You may have to write this out in hex format, as x'nnnnnn...' and you will need to know what character set you are working with. Using the DSPFFD command on your table should show the CCSID of your fields.
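
The CCSID can probably also be read with SQL. The sketch below assumes the QSYS2.SYSCOLUMNS catalog view is available on your system and exposes a CCSID column; YOURLIB and YOURFILE are placeholder names for the library and table.

```sql
-- List each column of the table with its data type and CCSID.
SELECT COLUMN_NAME, DATA_TYPE, CCSID
FROM QSYS2.SYSCOLUMNS
WHERE TABLE_SCHEMA = 'YOURLIB'
  AND TABLE_NAME = 'YOURFILE'
```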


We struggled a lot to remove the newline and carriage return characters from a flat file.

In the end we used the SQL below to solve the issue.

REPLACE(REPLACE(COLUMN_NAME, CHR(13), ''), CHR(10), '')

Try it out

CR = CHR(13)
LF = CHR(10) 