FileStream position is off after calling ReadLine() from C#

Developer https://www.devze.com 2023-01-02 00:11 Source: Internet
I'm trying to read a (small-ish) file in chunks of a few lines at a time, and I need to return to the beginning of particular chunks.

The problem is, after the very first call to

streamReader.ReadLine();

the streamReader.BaseStream.Position property is set to the end of the file! I assume some caching is done behind the scenes, but I was expecting this property to reflect the number of bytes I had consumed from the file. And yes, the file has more than one line :-)

For instance, calling ReadLine() again will (naturally) return the next line in the file, which does not start at the position previously reported by streamReader.BaseStream.Position.

How can I find the actual position where the 1st line ends, so I can return there later?

I can only think of manually doing the bookkeeping, by adding the lengths of the strings returned by ReadLine(), but even here there are a couple of caveats:

  • ReadLine() strips the new-line character(s), which may have a variable length (is it '\n'? Is it "\r\n"? Etc.)
  • I'm not sure if this would work OK with variable-length characters

...so right now it seems like my only option is to rethink how I parse the file, so I don't have to rewind.

If it helps, I open my file like this:

using (var reader = new StreamReader(
        new FileStream(
                       m_path, 
                       FileMode.Open, 
                       FileAccess.Read, 
                       FileShare.ReadWrite)))
{...}

Any suggestions?


If you need to read lines, and you need to go back to previous chunks, why not store the lines you read in a List? That should be easy enough.

You should not depend on calculating a length in bytes based on the length of the string - for the reasons you mention yourself: Multibyte characters, newline characters, etc.
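A minimal sketch of the "store the lines in a List" idea, assuming the file is small enough to keep in memory. The class name LineCache and the method GetChunk are hypothetical, introduced here only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: caches every line read so far, so that "rewinding"
// to the start of a chunk is just a matter of using an earlier list index.
public sealed class LineCache
{
    private readonly TextReader _reader;
    private readonly List<string> _lines = new List<string>();

    public LineCache(TextReader reader)
    {
        _reader = reader;
    }

    // Returns up to `count` lines starting at zero-based line index `start`.
    // Lines are pulled from the underlying reader only when they are not cached yet.
    public IReadOnlyList<string> GetChunk(int start, int count)
    {
        while (_lines.Count < start + count)
        {
            var line = _reader.ReadLine();
            if (line == null) break;   // end of input
            _lines.Add(line);
        }

        int from = Math.Min(start, _lines.Count);
        int available = Math.Max(0, Math.Min(count, _lines.Count - start));
        return _lines.GetRange(from, available);
    }
}
```

Asking for the same chunk a second time, for example GetChunk(0, 2) after GetChunk(2, 2), is served from the list, so the reader itself never needs to seek.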


I have done a similar implementation where I needed to access the n-th line in an extremely big text file fast.

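The reason streamReader.BaseStream.Position points to the end of the file is that StreamReader has a built-in buffer, as you suspected: it reads a whole block from the stream ahead of time, and a small file fits entirely in that block.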

Bookkeeping by counting the number of bytes consumed by each ReadLine() call will work for most plain text files. However, I have encountered cases where control characters (the unprintable ones) were mixed into the text file. The calculated byte count was wrong, and my program could no longer seek to the correct location.
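As a small illustration of why the string length alone is not enough for this bookkeeping, the sketch below compares a line's character count with its encoded byte count. The class and method names are hypothetical. Even with the correct encoding, the line terminator that ReadLine() stripped (1 or 2 bytes) and any byte-order mark would still have to be accounted for separately:

```csharp
using System.Text;

public static class LineByteLength
{
    // Bytes that `line` occupies in the given encoding,
    // excluding the line terminator and any byte-order mark.
    public static int Of(string line, Encoding encoding)
    {
        return encoding.GetByteCount(line);
    }
}

// Example values for the 5-character string "na\u00EFve" (i with diaeresis):
//   "na\u00EFve".Length                                -> 5 chars
//   LineByteLength.Of("na\u00EFve", Encoding.UTF8)     -> 6 bytes
//   LineByteLength.Of("na\u00EFve", Encoding.Unicode)  -> 10 bytes (UTF-16)
```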

My final solution was to implement the line reader on my own. It has worked well so far. This should give an idea of what it looks like:

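using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
{
    int ch;
    int currentLine = 1;
    long offset = 0;   // long, so files larger than 2 GB do not overflow

    while ((ch = fs.ReadByte()) >= 0)
    {
        offset++;

        // Byte 10 is '\n'. This covers both "\r\n" and bare "\n" (UNIX) line endings,
        // assuming an ASCII-compatible encoding such as UTF-8.
        if (ch == 10)
        {
            currentLine++;

            // offset is now the byte position where line currentLine starts;
            // log it together with the line number here.
        }
    }
}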

And to go back to a logged offset:

using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
    fs.Seek(yourOffset, SeekOrigin.Begin);
    TextReader tr = new StreamReader(fs);

    string line = tr.ReadLine();
}

Also note that FileStream already has a buffering mechanism built in, so reading byte by byte does not hit the disk on every call.
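Putting the two snippets above together, a self-contained sketch might look like the following. The names LineIndex, Build and ReadLineAt are hypothetical. It assumes an encoding in which the byte 10 only ever means '\n' (ASCII, UTF-8 or a single-byte code page, not UTF-16), and it does not skip a byte-order mark when indexing:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineIndex
{
    // Scans the file once and returns the byte offset at which each line starts.
    // Element 0 is always 0. If the file ends with '\n', the last element equals
    // the file length and points at an empty position.
    public static List<long> Build(string path)
    {
        var offsets = new List<long> { 0 };
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int b;
            long pos = 0;
            while ((b = fs.ReadByte()) >= 0)
            {
                pos++;
                if (b == '\n') offsets.Add(pos);   // the next line starts right after '\n'
            }
        }
        return offsets;
    }

    // Opens the file, seeks to a previously logged offset and reads one line.
    // Returns null when the offset is at the end of the file.
    public static string ReadLineAt(string path, long offset, Encoding encoding)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            fs.Seek(offset, SeekOrigin.Begin);
            using (var reader = new StreamReader(fs, encoding, false))
            {
                return reader.ReadLine();
            }
        }
    }
}
```

Because a fresh StreamReader is created after the seek, its internal buffer starts empty and there is no stale data to discard.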


StreamReader isn't designed for this kind of usage, so if this is what you need I suspect that you'll have to write your own wrapper for FileStream.


A problem with the accepted answer is that if ReadLine() throws an exception, say because the logging framework temporarily locks the file right when you call ReadLine(), that line is never "saved" into the list, because the call never returned it. If you catch the exception, you cannot simply retry ReadLine(): StreamReader's internal state and buffer are left inconsistent by the failed call, so you will get only part of a line back. And you cannot discard that broken line and seek back to its beginning, as the OP found out.

If you want the true seekable location, you need to use reflection to get at StreamReader's private variables, which let you calculate its position inside its own buffer. Granger's solution, seen here: StreamReader and seeking, should work. Or do what answers to other related questions have done: create your own StreamReader that exposes the true seekable location (see the answer at this link: Tracking the position of the line of a streamreader). Those are the only two options I've come across while dealing with StreamReader and seeking; for some reason its design rules out seeking in nearly every situation.

Edit: I used Granger's solution and it works. Just be sure to go in this order: call GetActualPosition(), then set BaseStream.Position to that position, then call DiscardBufferedData(), and finally call ReadLine(). You will then get the full line starting from that position.
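The last three steps of that sequence can be sketched as below. This is only an illustration: ReaderRewind and ReadLineFrom are hypothetical names, and the reflection-based GetActualPosition() from the linked answer is not reproduced here, so byteOffset is assumed to be a valid line start obtained by some other means:

```csharp
using System.IO;

public static class ReaderRewind
{
    // Repositions `reader` so that the next ReadLine() starts at `byteOffset`
    // in the underlying stream, then reads one line from there.
    public static string ReadLineFrom(StreamReader reader, long byteOffset)
    {
        reader.BaseStream.Position = byteOffset;  // move the underlying stream first
        reader.DiscardBufferedData();             // then drop the reader's stale buffer
        return reader.ReadLine();                 // now reads from the new position
    }
}
```

Skipping DiscardBufferedData() would let ReadLine() keep serving characters from the old buffer, regardless of where BaseStream.Position now points.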
