开发者

C/C++/Java/C#: help parsing numbers

开发者 https://www.devze.com 2023-01-05 19:54 出处:网络
I\'ve got a real problem (it\'s not homework, you can check my profile). I need to parse data whose formatting is not under my control.

I've got a real problem (it's not homework, you can check my profile). I need to parse data whose formatting is not under my control.

The data look like this:

6,852:6,100,752

So there's first a number made of up to 9 digits, followed by a colon.

Then I know for sure that, after the colon:

  • there's at least one valid combination of numbers that add up to the number before the column
  • I know exactly how many numbers add up to the number before the colon (two in this case, but it can go as high as ten numbers)

In this case, 6开发者_如何学Python852 is 6100 + 752.

My problem: I need to find these numbers (in this example, 6100 + 752).

It is unfortunate that in the data I'm forced to parse, the separator between the numbers (the comma) is also the separator used inside the number themselves (6100 is written as 6,100).

Once again: that unfortunate formatting is not under my control and, once again, this is not homework.

I need to solve this for up to 10 numbers that need to add up.

Here's an example with three numbers adding up to 6855:

6,855:360,6,175,320

I fear that there are cases where there would be two possible different solutions. HOWEVER if I get a solution that works "in most cases" it would be enough.

How do you typically solve such a problem in a C-style bracket language?


Well, I would start with the brute force approach and then apply some heuristics to prune the search space. Just split the list on the right by commas and iterate over all possible ways to group them into n terms (where n is the number of terms in the solution). You can use the following two rules to skip over invalid possibilities.

(1) You know that any group of 1 or 2 digits must begin a term.

(2) You know that no candidate term in your comma delimited list can be greater than the total on the left. (This also tells you the maximum number of digit groups that any candidate term can have.)


Recursive implementation (pseudo code):

int total;  // The total read before the colon

// Takes the list of tokens as integers after the colon
// tokens is the set of tokens left to analyse,
// partialList is the partial list of numbers built so far
// sum is the sum of numbers in partialList
// Aggregate takes 2 ints XXX and YYY and returns XXX,YYY (= XXX*1000+YYY)
function getNumbers(tokens, sum, partialList) =
    if isEmpty(tokens)
         if sum = total return partialList
         else return null  // Got to the end with the wrong sum
    var result1 = getNumbers(tokens[1:end], sum+token[0], Add(partialList, tokens[0]))
    var result2 = getNumbers(tokens[2:end], sum+Aggregate(token[0], token[1]), Append(partialList, Aggregate(tokens[0], tokens[1])))
    if result1 <> null return result1
    if result2 <> null return result2
    return null  // No solution overall

You can do a lot better from different points of view, like tail recursion, pruning (you can have XXX,YYY only if YYY has 3 digits)... but this may work well enough for your app. Divide-and-conquer would make for a nice improvement.


I think you should try all possible ways to parse the string and calculate the sum and return a list of those results that give the correct sum. This should be only one result in most cases unless you are very unlucky.

One thing to note that reduces the number of possibilities is that there is only an ambiguity if you have aa,bbb and bbb is exactly 3 digits. If you have aa,bb there is only one way to parse it.


Reading in C++:

std::pair<int,std::vector<int> > read_numbers(std::istream& is)
{
  std::pair<int,std::vector<int> > result;
  if(!is >> result.first) throw "foo!"
  for(;;) {
    int i;
    if(!is >> i) 
      if(is.eof()) return result;
      else throw "bar!";
    result.second.push_back(i);
    char ch;
    if(is >> ch) 
      if(ch != ',') throw "foobar!";
    is >> std::ws;
  }
}

void f()
{
  std::istringstream iss("6,852:6,100,752");
  std::pair<int,std::vector<int> > foo = read_numbers(iss);
  std::vector<int> result = get_winning_combination( foo.first
                                                   , foo.second.begin()
                                                   , foo.second.end() );
  for( std::vector<int>::const_iterator i=result.begin(); i!=result.end(), ++i)
    std::cout << *i << " ";
}

The actual cracking of the numbers is left as an exercise to the reader. :)


I think your main problem is deciding how to actually parse the numbers. The rest is just rote work with strings->numbers and iteration over combinations.

For instance, in the examples you gave, you could heuristically decide that a single-digit number followed by a three-digit number is, in fact, a four-digit number. Does a heuristic such as this hold true over a larger dataset? If not, you're also likely to have to iterate over the possible input parsing combinations, which means the naive solution is going to have a big polynomic complexity (O(nx), where x is >4).

Actually checking for which numbers add up is easy to do using a recursive search.

List<int> GetSummands(int total, int numberOfElements, IEnumerable<int> values)
{
    if (numberOfElements == 0)
    {
        if (total == 0)
            return new List<int>(); // Empty list.
        else
            return null; // Indicate no solution.
    }
    else if (total < 0)
    {
        return null; // Indicate no solution.
    }
    else
    {
        for (int i = 0; i < values.Count; ++i)
        {
            List<int> summands = GetSummands(
                total - values[i], numberOfElements - 1, values.Skip(i + 1));
            if (summands != null)
            {
                // Found solution.
                summands.Add(values[i]);
                return summands;
            }
        }
    }
}
0

精彩评论

暂无评论...
验证码 换一张
取 消