开发者

Uploading multi-line records into SQL Server

开发者 https://www.devze.com 2023-04-06 10:41 出处:网络
We receive fixed length datasets from a client that look something like this: 1 SOMEFILE20110922 2 20110101ABC999

We receive fixed length datasets from a client that look something like this:

1 SOMEFILE   20110922
2 20110101ABC999  
3 JOHN         SMITH     19800201
4 5000000       1000
2 20060101DEF999  
3 JANE         KOTZE     19811001
4 200000        800
5 5200000       1800

where the number in the first position on each line indicates the type of information in the line. The types are:

1  Header record (only appears once, in the first line)  
2  Contract record  
3  Person record  
4  Amounts record  
5  Trailer record (only appears once, in the last line)

The information in 2, 3 and 4 all actually relate to one record, and I need to find a way at upload stage to combine them into one. There are no identifiers that explicitly specify which combinations of 2, 3 and 4 belong with one another, but in all cases they have been ordered in the raw data to appear directly below one another.

What I need is a preprocessing step that will take the original data and then combine the correct 2,3 and 4 lines into one record (and then output again as a txt file), like this:

20110101ABC999JOHN         SMITH     198002015000000       1000
20060101DEF999JANE         KOTZE     19811001200000        800

I have thought of bcp'ing into SQL (or even just using Access) and assigning an auto-incremented integer as PK. i.e:

  PK Type  Record 
  1  1     SOMEFILE   20110922
  2  2     20110101ABC999  
  3  3     JOHN         SMITH     19800201
  4  4     5000000       1000
  5  2     20060101DEF999  
  6  3     JANE         KOTZE     19811001
  7  4     200000        800
  8  5     5200000       1800

and then doing something like:

select 
type2.[record]+type3.[record]+type4.[record]
from

(select [record] from uploaded where [type]=2) as type2

join
(select [record] from uploaded where [type]=3) as type3
on type2.PK + 1 = type3.PK

join
(select [record] from uploaded where [type]=4) as type4
on type2.PK + 2 = type4.PK

But what I am worried about is that this is entirely dependent on SQL Server assigning the PKs in the order that the data appears in die input file; I am not sure that this would necessarily be the case.

Does anyo开发者_运维问答ne know? Or know of a better way to do this?

Thanks

Karl


Edit: added second solution

Solution 1:

You can not be sure regarding SQL Server insert order. You have to do some text file processings before importing your data in SQL Server. For example, you can use PowerShell to add a PK into file thus:

$rows = GET-CONTENT -PATH D:\BD\Samples\MyData.txt

for($i=0; $i -lt $rows.length; $i++)
{
    $row = $rows[$i]
    $temp=("00000"+[string]($i+1))
    $rows[$i]=$temp.substring($temp.length-5)+" "+$row

}

SET-CONTENT -PATH D:\BD\Samples\MyDataResults.txt $rows

Before (MyData.txt content):

1 SOMEFILE   20110922
2 20110101ABC999
3 JOHN         SMITH     19800201
4 5000000       1000
2 20060101DEF999
3 JANE         KOTZE     19811001
4 200000        800
5 5200000       1800

After PowerShell processing (MyDataResults.txt content):

00001 1 SOMEFILE   20110922
00002 2 20110101ABC999
00003 3 JOHN         SMITH     19800201
00004 4 5000000       1000
00005 2 20060101DEF999
00006 3 JANE         KOTZE     19811001
00007 4 200000        800
00008 5 5200000       1800

In both PS scripts I assume you can insert max. 99999 rows.

Solution 2:

$rows = GET-CONTENT -PATH D:\BD\Samples\MyData.txt

$rows[0]="00000 "+$row
$rows[$rows.length-1]="99999 "+$row

$groupid=0

for($i=1; $i -lt $rows.length-1; $i=$i+3)
{
    $groupid++

    $row = $rows[$i]
    $temp=("00000"+[string]$groupid)
    $rows[$i]=$temp.substring($temp.length-5)+" "+$row

    $row = $rows[$i+1]
    $temp=("00000"+[string]$groupid)
    $rows[$i+1]=$temp.substring($temp.length-5)+" "+$row

    $row = $rows[$i+2]
    $temp=("00000"+[string]$groupid)
    $rows[$i+2]=$temp.substring($temp.length-5)+" "+$row        

}

SET-CONTENT -PATH D:\BD\Samples\MyDataResults2.txt $rows

Results:

00000 4 200000        800
00001 2 20110101ABC999
00001 3 JOHN         SMITH     19800201
00001 4 5000000       1000
00002 2 20060101DEF999
00002 3 JANE         KOTZE     19811001
00002 4 200000        800
99999 4 200000        800
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号