Piping unzip in a SAS infile

Developer (开发者) https://www.devze.com 2023-04-08 20:23 Source: Internet
Suppose I do the following in SAS:

filename  tmp pipe 'unzip -c -qq ./data_xml.zip';
libname   tmp xml xmlmap=TMMap access=READONLY;

data header; set tmp.header; run;
data owners; set tmp.owners; run;

This will unzip the data_xml.zip file and use the SAS xmlmap file to generate two data sets, header and owners.

My question is, how many times will unzip run on data_xml.zip? Will the unzipping just happen once, or will it happen twice because I'm setting a data set from the tmp libname twice?


The short answer is that it will unzip the file twice, once for each DATA step.

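As I understand it, unzip -c essentially turns the data into a sequential source, because it streams from the unzip command directly through the PIPE fileref that the XML libname reads. A stream cannot be rewound, so each DATA step that reads from the tmp libref has to start the command again from the beginning.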
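If you want to confirm this on your own system rather than take my word for it, one rough check is to make the piped command leave a marker each time it starts. This is only a sketch: the log path ./unzip_calls.log is a made-up name, and it assumes a Unix shell where the pipe command is allowed to contain several commands separated by a semicolon. The marker goes to the log file, not to stdout, so the XML stream itself is unchanged.

```sas
/* Hypothetical log file: ./unzip_calls.log (delete it before each test run). */
/* The echo appends one line every time SAS launches the piped command.      */
filename tmp pipe 'echo unzip-started >> ./unzip_calls.log; unzip -c -qq ./data_xml.zip';
libname  tmp xml xmlmap=TMMap access=READONLY;

data header; set tmp.header; run;
data owners; set tmp.owners; run;

/* Write the marker lines to the SAS log; one line per launch of unzip. */
data _null_;
  infile './unzip_calls.log';
  input;
  put _infile_;
run;
```

The number of marker lines written to the SAS log is the number of times the command was launched.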

Presumably, you are streaming with -c and the PIPE because of disk space or performance concerns about landing the file on disk first. Unfortunately, I'm fairly certain that, with this setup, the only way to avoid the CPU cost of a second unzip is to land the data in a temporary file on disk first.
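Landing the file first could look roughly like the following. Treat it as a sketch under a few assumptions: the scratch path in the landed macro variable is a made-up name, the archive holds a single XML document (as the original pipe already implies), and the XCMD option is enabled so the X statement can run shell commands (the PIPE fileref needs that too).

```sas
/* Hypothetical scratch location for the expanded XML; pick any writable path. */
%let landed = ./data_xml_unzipped.xml;

/* Run unzip exactly once and redirect its output to the scratch file. */
x "unzip -c -qq ./data_xml.zip > &landed";

/* Point the fileref at the landed file instead of at a pipe. */
filename tmp "&landed";
libname  tmp xml xmlmap=TMMap access=READONLY;

/* Both steps now read the file on disk; unzip is not started again. */
data header; set tmp.header; run;
data owners; set tmp.owners; run;

/* Release the libref and fileref, then remove the scratch file. */
libname  tmp clear;
filename tmp clear;
x "rm -f &landed";
```

The trade is the one described here: one unzip instead of two, at the price of writing the expanded file once and reading it back from disk for each data set.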

However, depending on the size of the file, the CPU hit for a second unzip might not outweigh the I/O hit for having to read an expanded file from disk at least one extra time.
