Help Importing CSV file with Variable Columns per Row into SQL Table using Import tool or SSIS

Developer https://www.devze.com 2023-04-06 23:32 Source: Internet

I am stuck with a CSV file of over 100,000 rows that contains product images from a provider. Here are the details of the issue; I would really appreciate some tips to help resolve it. Thanks.

The file has one row per product and the following four columns: ID, URL, HEIGHT, WIDTH. Example: 1,http://i.img.com,100,200

The problem starts when a product has multiple images. Instead of one row per image, the file has more columns in the same row.

Example: 1,http://i.img.com,100,200,//i.img.com,20,100,//i.img.com,30,50

Note that only the first image has "http://"; the remaining images start with "//".

There is no telling how many images there are per product, hence no way to tell the total or maximum number of columns per row.

How can I import this using SSIS or the SQL import wizard?

I also need to do this at regular intervals.

Thank you for your help.


I don't think that you can use any standard SSIS task or wizard to do this. You're going to have to write some custom code which parses each line. You can do this in SSIS using VB code or you can import the file into a staging table that's just a single column to hold each row and do the parsing in SQL. SSIS will probably be faster for this kind of operation.
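To make the "parse each line" idea concrete, here is a minimal sketch in Python rather than the VB-in-SSIS code mentioned above. It assumes that no field contains a comma and that every image contributes exactly three fields (URL, height, width) after the ID. The function names explode_images and normalize are made up for this illustration. URLs are passed through unchanged, so the "//" entries keep their scheme-less form.

```python
import csv
import io


def explode_images(line):
    """Split one raw line into (id, url, height, width) tuples, one per image.

    Assumes: no field contains a comma, and each image is exactly
    three fields (URL, HEIGHT, WIDTH) following the product ID.
    """
    fields = [f.strip() for f in line.strip().split(",")]
    product_id, rest = fields[0], fields[1:]
    if not rest or len(rest) % 3 != 0:
        raise ValueError(f"unexpected field count in line: {line!r}")
    rows = []
    for i in range(0, len(rest), 3):
        url, height, width = rest[i:i + 3]
        rows.append((product_id, url, int(height), int(width)))
    return rows


def normalize(src, dst):
    """Read variable-width lines from src, write fixed four-column CSV to dst."""
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(["ID", "URL", "HEIGHT", "WIDTH"])
    for line in src:
        if line.strip():  # skip blank lines
            writer.writerows(explode_images(line))


if __name__ == "__main__":
    sample = io.StringIO(
        "1,http://i.img.com,100,200,//i.img.com,20,100,//i.img.com,30,50\n"
    )
    out = io.StringIO()
    normalize(sample, out)
    print(out.getvalue())
```

The output is a regular four-column file that the import wizard or a plain SSIS flat-file source should be able to load. Since the import has to run at regular intervals, a script like this could be scheduled as a step before the load.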

Another possibility is to preprocess the file using regex or a search-and-replace command. Try to get double quotes around the image list; then you should be able to import the whole file cleanly, with the quoted part going into a single column. Catching the start of the string should be easy enough, since you can search for the "http://" prefix. Determining where the end quote goes might be more of a problem.
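As a rough sketch of that preprocessing step, the following Python snippet wraps everything after the first comma in double quotes. It relies on an assumption taken from the sample rows in the question: the ID is the first field and the image list runs to the end of the line, so the closing quote can simply go at the end of the row. The name quote_image_list is hypothetical.

```python
import re

# First field (the ID, assumed to contain no commas), then the rest of the row.
_ROW = re.compile(r'^([^,]*),(.*)$')


def quote_image_list(line):
    """Return the line with everything after the ID wrapped in double quotes."""
    body = line.rstrip("\r\n")
    match = _ROW.match(body)
    if match is None:
        return body  # no comma found: leave the line untouched
    product_id, images = match.groups()
    images = images.replace('"', '""')  # escape any embedded quotes, CSV style
    return f'{product_id},"{images}"'


if __name__ == "__main__":
    print(quote_image_list("1,http://i.img.com,100,200,//i.img.com,20,100"))
```

After this step every row has two columns, ID and a quoted image list, so the list still has to be split into individual images once it is in the database.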

A third potential solution would be to get the source to fix the data. Even if you can't get the images in separate rows (or another file with separate rows, which would be ideal), maybe you can get the double-quotes added from the source as part of the export. This would likely be less error-prone than using the search-and-replace method.

Good luck!
