Many .NET developers are moving away from the traditional (and broken) System.IO.Ports.SerialPort DataReceived event handling, either to the correct and more efficient BaseStream.BeginRead / BaseStream.EndRead pair I promoted in my last post, or to the newer BaseStream.ReadAsync method introduced in .NET Framework 4.5 along with the C# async and await keywords. A common complaint among them is that BaseStream doesn't provide any ReadLine() method. They assume that each EndRead or ReadAsync call will return exactly one line, and get wrong results.
For some developers, it is enough to point out that byte-oriented streams don't preserve message boundaries, so a message can be split across multiple buffer transfers (from hardware FIFO to application). Others, whether they use serial ports, TCP sockets, or pipes, just don't know what to do next, so they ask on StackOverflow. Because there are so many questions about fixing code that handles message fragmentation badly, I wanted to share an elegant solution to the problem.
    using System;

    class LineSplitter
    {
        public event Action<byte[]> LineReceived;
        public byte Delimiter = (byte)'\n';
        byte[] leftover; // partial line carried over between blocks, or null

        public void OnIncomingBinaryBlock(object sender, byte[] buffer)
        {
            int offset = 0;
            while (true)
            {
                // IndexOf returns -1 when no delimiter remains at or after offset
                int newlineIndex = Array.IndexOf(buffer, Delimiter, offset);
                if (newlineIndex < offset)
                {
                    // keep the unterminated tail (plus any older leftover) for the next block
                    leftover = ConcatArray(leftover, buffer, offset, buffer.Length - offset);
                    return;
                }
                ++newlineIndex; // include the delimiter in the extracted line
                byte[] full_line = ConcatArray(leftover, buffer, offset, newlineIndex - offset);
                leftover = null;
                offset = newlineIndex;
                LineReceived?.Invoke(full_line); // raise an event for further processing
            }
        }

        // Returns head followed by tailCount bytes of tail starting at tailOffset; a null head counts as empty.
        static byte[] ConcatArray(byte[] head, byte[] tail, int tailOffset, int tailCount)
        {
            byte[] result;
            if (head == null)
            {
                result = new byte[tailCount];
                Array.Copy(tail, tailOffset, result, 0, tailCount);
            }
            else
            {
                result = new byte[head.Length + tailCount];
                head.CopyTo(result, 0);
                Array.Copy(tail, tailOffset, result, head.Length, tailCount);
            }
            return result;
        }
    }
The logic is quite simple. Between data blocks, the variable leftover holds an incomplete message, if any. When a new block arrives, it is separated (à la String.Split) at each occurrence of the line delimiter (here, '\n', but any other single end-of-message byte can easily be substituted). The first substring of the new data is concatenated with the leftover partial message to form a complete line. Ranges between delimiters are extracted and forwarded likewise. Any data following the last delimiter becomes the leftover saved for the next incoming block (including the old leftovers, in the case that no delimiters were found at all).
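To make the leftover handling concrete, here is a small trace of one line arriving split across blocks. It is only a sketch: SplitDemo.Split is a hypothetical stateless helper invented for this illustration (it is not the class above and makes no attempt to be allocation-efficient), and it assumes '\n' as the delimiter and keeps the delimiter on each line, as the class does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class SplitDemo
{
    // Hypothetical helper: returns the complete lines found in pending + block,
    // and reports whatever follows the last delimiter through 'leftover'.
    public static List<byte[]> Split(byte[] pending, byte[] block, byte delimiter, out byte[] leftover)
    {
        byte[] data = pending.Concat(block).ToArray();
        var lines = new List<byte[]>();
        int start = 0;
        int end;
        // Array.IndexOf returns -1 once no delimiter remains at or after 'start'.
        while ((end = Array.IndexOf(data, delimiter, start)) >= 0)
        {
            lines.Add(data.Skip(start).Take(end + 1 - start).ToArray());
            start = end + 1;
        }
        leftover = data.Skip(start).ToArray();
        return lines;
    }

    static void Main()
    {
        byte[] leftover = new byte[0];
        string[] blocks = { "Hel", "lo\nWor", "ld\n" };
        foreach (string text in blocks)
        {
            List<byte[]> lines = Split(leftover, Encoding.ASCII.GetBytes(text), (byte)'\n', out leftover);
            foreach (byte[] line in lines)
                Console.WriteLine("line: " + Encoding.ASCII.GetString(line).TrimEnd('\n'));
            Console.WriteLine("leftover: \"" + Encoding.ASCII.GetString(leftover) + "\"");
        }
    }
}
```

The first block yields no line and leaves "Hel" pending; the second completes "Hello" and leaves "Wor"; the third completes "World" and leaves nothing behind.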
There are many cases to deal with: zero, one, or multiple delimiters found; whether the last byte is a delimiter or not; and whether there was leftover data from the last call. Instead of handling every combination explicitly, introducing a helper method (ConcatArray, which treats a null head as empty) and calling it from two places greatly simplifies matters.
You may have noticed that there are no serial port or stream calls here at all. The separation of buffer processing from I/O calls is intentional, and an approach I very strongly recommend that you adopt in your own applications. The benefits are the usual ones associated with adhering to the Single Responsibility Principle: easier testing, lower coupling, more flexibility. For example, a class containing just this code can be inserted between various types of data sources (serial port, TCP stream, logfile replay) and the application code that logs, parses, and otherwise processes incoming lines.
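As one possible illustration of the I/O side of that split, the sketch below pumps blocks out of any System.IO.Stream and hands them to a callback. BlockPump and PumpAsync are hypothetical names made up for this example, and it assumes a stream that signals its end with a zero-length read (true for the MemoryStream replay used here; how a serial port stream reports closure is not covered).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

static class BlockPump
{
    // Hypothetical helper: reads until end of stream and passes each chunk,
    // trimmed to the number of bytes actually read, to onBlock.
    public static async Task PumpAsync(Stream source, Action<byte[]> onBlock, int bufferSize = 4096)
    {
        var buffer = new byte[bufferSize];
        while (true)
        {
            int count = await source.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false);
            if (count == 0)
                return; // end of stream (assumption: the source reports it this way)
            var block = new byte[count]; // copy, so the consumer never sees stale bytes
            Array.Copy(buffer, block, count);
            onBlock(block);
        }
    }

    static void Main()
    {
        // Replay from memory; the tiny buffer forces the text to arrive in fragments.
        var replay = new MemoryStream(Encoding.ASCII.GetBytes("Hello\nWorld\n"));
        var blocks = new List<byte[]>();
        PumpAsync(replay, blocks.Add, bufferSize: 5).GetAwaiter().GetResult();
        Console.WriteLine(blocks.Count + " blocks received");
    }
}
```

With the LineSplitter listing above, the callback could be something like block => lineSplitter.OnIncomingBinaryBlock(null, block), which keeps the splitter itself free of any stream calls.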
I've also intentionally NOT followed the EventHandler pattern. It had value when it was introduced, but now the C# language supports variable capture in anonymous delegates and lambda expressions, so the sender parameter is useless. As a benefit, the event is now compatible with the Add method of a System.Collections.Generic.List<byte[]>, making unit testing very easy:
    // Requires: using System.Collections.Generic; and an MSTest [TestClass] around this method.
    [TestMethod]
    public void DetectorTestTwoLinesArrivingTogether()
    {
        var result_lines = new List<byte[]>();
        var dut = new LineSplitter();
        dut.LineReceived += result_lines.Add;
        // "Hello readers!\r\nSparx is the best.\r\n" as ASCII bytes; sender is unused, so pass null
        dut.OnIncomingBinaryBlock(null, new byte[] { 0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0x72, 0x65, 0x61, 0x64, 0x65, 0x72, 0x73, 0x21, 0x0D, 0x0A, 0x53, 0x70, 0x61, 0x72, 0x78, 0x20, 0x69, 0x73, 0x20, 0x74, 0x68, 0x65, 0x20, 0x62, 0x65, 0x73, 0x74, 0x2E, 0x0D, 0x0A });
        Assert.AreEqual(2, result_lines.Count);
    }
With a little care in choosing the parameter types of the trigger methods and events, your controller code may not need to do anything more than compose objects:
    serialPortController.BinaryBlockReceived += lineSplitter.OnIncomingBinaryBlock;
    lineSplitter.LineReceived += messageParser.OnIncomingLine;
    messageParser.ValuesParsed += realtimePlot.PlotValues;
    messageParser.ValuesParsed += UpdateLabels;
    messageParser.ValuesParsed += dataLogger.WriteValues;
11 Responses
I know this is an old post, having said that, this is very nice, very handy.
Thanks for sharing.
Useful topic.
There seem to be a few problems in class LineSplitter:
Line 13: Missing comparison operator? Also missing open brace?
Lines 14, 15: Odd indent suggests something missing
Line 21: Missing one of the expressions?
    if (newlineIndex < offset)
    {
        leftover = ConcatArray(leftover, buffer, offset, buffer.Length - offset);
        return;
    }
and…
public event Action LineReceived;
It didn’t post it right the first time…
public event Action LineReceived;
Still not posting it right…
Action should have “byte[]” as its T.
Thanks for the suggestions made!
Is there memory leak here with reassigning result to a new byte[]?
This is .NET and C#. The garbage collector takes care of it so there is no memory leak.
Is this article still valid in 2021?
is it still valid in 2024?