
C++ - How to use a stream to parse a file?

Developer https://www.devze.com 2023-03-10 00:30 Source: Internet

I have a file and I need to loop through it, assigning an int foo, a string type, and a 64/128-bit long. How would I use a stream to parse these lines into the variables below? I want to stick with the stream syntax (ifs >> foo >> type), but in this case type would end up being the rest of the line after the 0/52, and at that point I'd just get a char* and use strtoull and such, so why use the stream in the first place? I'm hoping for readable code without horrid performance compared to char strings / strtok / strtoull.

//input file:
0ULL'04001C0180000000000000000EE317BC'
52L'04001C0180000000'
//output:
//0 ULL 0x04001C0180000000 0x000000000EE317BC
//52 L 0x04001C0180000000

  ifstream ifs("input.data");
  int foo;
  string type;
  unsigned long long ull[2];


Boost Spirit implementation

Here is the mandatory Boost Spirit (Qi) based implementation, with the output formatting done in Boost Spirit (Karma) for good measure:

#include <string>
#include <vector>
#include <iostream>
#include <fstream>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>

namespace karma=boost::spirit::karma;
namespace qi   =boost::spirit::qi;

static qi::uint_parser<unsigned long long, 16, 16, 16> hex16_p; // parse exactly 16 hex digits
static karma::uint_generator<unsigned long long, 16>   hex16_f; // format as hex (no zero padding)

int main()
{
    std::ifstream ifs("input.data");
    std::string line;
    while (std::getline(ifs, line))
    {
        std::string::iterator begin = line.begin(), end = line.end();

        int                             f0;
        std::string                     f1;
        std::vector<unsigned long long> f2;

        bool ok = qi::parse(begin, end,
                qi::int_                    // an integer
                >> *qi::alpha               // alternatively: *(qi::char_ - '\'')
                >> '\'' >> +hex16_p >> '\'' // accepts 'n x 16' hex digits
            , f0, f1, f2)
            && begin == end;                // reject lines with trailing garbage

        if (ok)
            std::cout << "Parsed: " << karma::format(
                 karma::int_ 
                 << ' ' << karma::string 
                 << ' ' << ("0x" << hex16_f) % ' '
             , f0, f1, f2) << std::endl;
        else
            std::cerr << "Parse failed: " << line << std::endl;
    }

    return 0;
}

Test run:

Parsed: 0 ULL 0x4001c0180000000 0xee317bc
Parsed: 52 L 0x4001c0180000000

See "Tweaks, samples" below for how to tweak things such as the hex output.

Benchmark

I benchmarked @Cubbi's version and the code above, as written, on 100,000 copies of the sample input you provided. Initially this gave Cubbi's version a slight advantage: 0.786s versus 0.823s.

That, of course, wasn't a fair comparison, since my code constructs the parser anew on every iteration. With the parser taken out of the loop, like so:

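typedef std::string::iterator It;

// note: qi::rule<It> declares no attribute (its signature defaults to unused_type()),
// so as written this rule does not store the parsed values in f0, f1, f2;
// give the rule an attribute signature if the values are needed
const static qi::rule<It> parser = qi::int_ >> *qi::alpha >> '\'' >> +hex16_p >> '\'';
bool ok = qi::parse(begin, end, parser, f0, f1, f2);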

Boost Spirit comes out a clear winner at only 0.093s, already about 8.5x faster, even though the Karma formatter is still constructed on every iteration.

With the output formatting commented out in both versions, Boost Spirit is more than 11x faster.

Tweaks, samples

Note how you can easily tweak things:

//  >> '\'' >> +hex16_p >> '\'' // accepts 'n x 16' hex digits
    >> '\'' >> qi::repeat(1,2)[ hex16_p ] >> '\'' // accept 16 or 32 digits

Or format the hex output just like the input:

// ("0x" << hex16_f) % ' '
karma::right_align(16, '0')[ karma::upper [ hex16_f ] ] % ""

Changed sample output:

0ULL'04001C0180000000000000000EE317BC'
Parsed: 0 ULL 04001C0180000000000000000EE317BC
52L'04001C0180000000'
Parsed: 52 L 04001C0180000000

HTH


This is a rather trivial task for a more sophisticated parser such as Boost.Spirit.

To solve this using just the standard C++ streams you would need to

  • a) treat ' as whitespace and
  • b) take an extra pass over the string "04001C0180000000000000000EE317BC" which has no separators between the values.

Borrowing Jerry Coffin's sample facet code,

#include <iostream>
#include <fstream>
#include <locale>
#include <string>
#include <vector>
#include <sstream>
#include <iomanip>

// ctype facet that classifies only '\n' and '\'' as whitespace,
// so operator>> stops at the quotes
struct tick_is_space : std::ctype<char> {
    tick_is_space() : std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask>
               rc(table_size, std::ctype_base::mask());
        rc['\n'] = std::ctype_base::space;
        rc['\''] = std::ctype_base::space;
        return &rc[0];
    }
};

int main()
{
    std::ifstream ifs("input.data");
    ifs.imbue(std::locale(std::locale(), new tick_is_space()));
    int foo;
    std::string type, ullstr;
    while( ifs >> foo >> type >> ullstr)
    {
        // split the digit string into 64-bit values, 16 hex digits at a time
        std::vector<unsigned long long> ull;
        while(ullstr.size() >= 16) // sizeof(unsigned long long)*2
        {
            std::istringstream is(ullstr.substr(0, 16));
            unsigned long long tmp;
            is >> std::hex >> tmp;
            ull.push_back(tmp);
            ullstr.erase(0, 16);
        }
        std::cout << std::dec << foo << " " << type
                  << std::hex << std::showbase << std::internal;
        // width 18 = "0x" + 16 digits; std::internal puts the zero padding after "0x"
        for(size_t p=0; p<ull.size(); ++p)
            std::cout << ' ' << std::setw(18) << std::setfill('0') << ull[p];
        std::cout << '\n';
    }
}

test: https://ideone.com/lRBTq
