开发者

List processing with intermediate state

开发者 https://www.devze.com 2023-04-06 23:02 出处:网络
I am processing a list of strings, you can think of them as lines of a book. When a line is empty it must be discarded. When it is a title, it is \"saved\" as the current title. Every \"normal\" line

I am processing a list of strings, you can think of them as lines of a book. When a line is empty it must be discarded. When it is a title, it is "saved" as the current title. Every "normal" line must generate an object with its text and the current title. In the end you have a list of lines, each with its corresponding title.

Ex.:

- Chapter 1

Lorem ipsum dolor sit amet
consectetur adipisicing elit

- Chapter 2

sed do eiusmod tempor
incididunt u

First line is a title, second line must be discarded, then two lin开发者_如何学Ces are kept as paragraphs, each with "Chapter 1" as title. And so on. You end up with a collection similar to:

{"Lorem ipsum...", "Chapter 1"},
{"consectetur...", "Chapter 1"},
{"sed do...", "Chapter 2"},
{"incididunt ...", "Chater 2"}

I know the title/paragraph model doesn't make 100% sense, but I simplified the model to illustrate the problem.

This is my iterative solution:

let parseText allLines =
    let mutable currentTitle = String.Empty
    seq {
        for line in allLines do
            match parseLine line with
            | Empty -> 0 |> ignore
            | Title caption ->
                currentTitle <- caption
            | Body text ->
                    yield new Paragraph(currentTitle, text)
    }

First issue is I have to discard empty lines, I do it with 0 |> ignore but it looks quite bad to me. What is the proper to do this (without pre-filtering the list)?

A tail-recursive version of this function is straightforward:

let rec parseText allLines currentTitle paragraphs =
    match allLines with
    | [] -> paragraphs
    | head :: tail -> 
        match head with
        | Empty -> parseText tail currentTitle paragraphs
        | Title caption -> parseText tail caption paragraphs
        | Body text -> parseText tail currentTitle (new Paragraph(currentTitle, text) :: tail)

The question(s):

  • Is there a significant difference between the two versions (style/performance/etc)?
  • Is there a better approach to solve this problem? Is it possible to do it with a single List.map?


Although not a single List.Map, here is the solution I came up with:

let parseText allLines = 
    allLines 
    |> Seq.fold (fun (currentTitle,paragraphs) line -> 
        match parseLine line with
        | Empty -> currentTitle,paragraphs
        | Title caption -> caption,paragraphs
        | Body text -> String.Empty,Paragraph(currentTitle, text)::paragraphs
        ) (String.Empty,[])
    |> snd

I'm using a fold with (currentTitle,paragraphs) as state. snd is used to extract the result (It is the second part of the state tuple).

When you do most of your processing in F#, using lists is very tempting, but other data structures, even plain sequences have their uses.

BTW, your sequence code does compile? I had to replace mutable currentTitle = String.Empty with currentTitle = ref String.Empty.


You can replace 0 |> ignore with () (unit), which is a no-op. The biggest difference between your two implementations is the first one is lazy, which might be useful for large input.

The following might also work for you (it's the simplest solution I can think of):

let parseText (lines:seq<string>) =
  lines
  |> Seq.filter (fun line -> line.Trim().Length > 0)
  |> Seq.pairwise (fun (title, body) -> Paragraph(title, body))

If not, maybe this will work:

let parseText (lines:seq<string>) =
  lines
  |> Seq.choose (fun line -> 
    match line.Trim() with
    | "" | null -> None
    | Title title -> Some title
    | Body text -> Some text)
  |> Seq.pairwise (fun (title, body) -> Paragraph(title, body))


Below is one such implementation ( although not tested, but I hope it does give you the idea )

let isNotEmpty l = match l with
                   | Empty -> false
                   | _ -> true

let parseText allLines =
    allLines |> Seq.map parseLine |> Seq.filter isNotEmpty
    |> Seq.scan (fun (c,t,b) i -> match i with
                                  | Title tl -> (0,tl,"")
                                  | Body bb -> (1,t,bb) 
                                  | _ -> (0,t,b)) (0,"","")
    |> Seq.filter (fun (c,_,_) -> c > 0)
    |> Seq.map (fun (_,t,b) -> Paragraph(t,b) )
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号