Htykuut

Question

Based on my answer I have my implementation of linq Zip operator which operates on different length lists, and loops shortest list.

My implementation:

using System;

using System.Collections.Generic;

using System.Linq;



namespace SO

{

    internal class Program

    {

        public static void Main(string args)

        {

            List<String> listA = new List<string> {"a", "b", "c", "d", "e", "f", "g"};

            List<String> listB = new List<string> {"1", "2", "3"};



            var mix = listA.ZipNew(listB, (l1, l2) => new {l1, l2}).SelectMany(x => x);



            foreach (var m in mix)

            {

                Console.WriteLine(m);

            }

        }

    }



    public static class Impl

    {               

        public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>( 

            this IEnumerable<TFirst> first, 

            IEnumerable<TSecond> second, 

            Func<TFirst, TSecond, TResult> resultSelector) 

        { 

            using (IEnumerator<TFirst> iterator1 = first.GetEnumerator()) 

            using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())

            {

                var i1 = true;

                var i2 = true;

                var i1Shorter = false;

                var i2Shorter = false;

                var firstRun = true;





                while(true) 

                {

                    i1 = iterator1.MoveNext();

                    i2 = iterator2.MoveNext();



                    if (!i1 && (i1Shorter || firstRun))

                    {

                        iterator1.Reset();

                        i1 = iterator1.MoveNext();

                        i1Shorter = true;

                        firstRun = false;

                    }



                    if (!i2 && (i2Shorter || firstRun))

                    {

                        iterator2.Reset();

                        i2 = iterator2.MoveNext();

                        i2Shorter = true;

                        firstRun = false;

                    }



                    if (!(i1 && i2))

                    {

                        break;

                    }



                    yield return resultSelector(iterator1.Current, iterator2.Current); 

                }

            } 

        }

    }

}

And I wonder if this implementation could be improved somehow, what could be improved, for better readability or speed.

Your question would benefit from some example usage within the question itself, so that we can see how you intend the method to be consumed, and quickly verify that it is working as intended. — Nov 21 at 14:33
Never Reset an enumerator. Just get a new enumerator. Reset was a misfeature intended for COM interop scenarios. — Nov 21 at 19:45

score 8 · Answer 1 · 2018-11-23 08:33:34Z

I noticed a few things that can be improved:

Not all enumerators support Reset. Generator methods don't, for example, so calling ZipNew on the result of a ZipNew call will fail with a NotSupportedException. Obtaining a new enumerator should work, at the cost of having to replace the convenient using statements with try/finally constructions. Edit: As Eric pointed out, Reset should not be used at all. It's been abandoned.

There's no need to call Reset (or rather, to get a new enumerator) when a collection is empty. I'd probably add a special case for that.

Passing null causes either an unspecific NullReferenceException or an ArgumentNullException with parameter name source to be thrown. Throwing ArgumentNullExceptions with accurate parameter names would be more helpful. Edit: As JAD pointed out, this is trickier than it looks. You'll have to split the method into an eager non-yielding method and a lazy yielding method. A local function should be useful here.

i1 and i2 can be declared inside the while loop.

Addendum:

As Henrik's answer shows, a helper class can be useful for properly repeating enumerators without having to give up on using. I would take a slightly different approach by creating a repeatable enumerator class:

class RepeatableEnumerator<T> : IDisposable

{

    private IEnumerable<T> _enumerable;

    private IEnumerator<T> _enumerator;





    public bool IsRepeating { get; private set; }

    public T Current => _enumerator.Current;





    public RepeatableEnumerator(IEnumerable<T> enumerable)

    {

        _enumerable = enumerable;

        _enumerator = enumerable.GetEnumerator();

    }



    public void Dispose()

    {

        _enumerator.Dispose();

        _enumerator = null;

    }



    public bool MoveNext() => _enumerator.MoveNext();



    public bool Repeat()

    {

        IsRepeating = true;

        _enumerator.Dispose();

        _enumerator = _enumerable.GetEnumerator();

        return _enumerator.MoveNext();

    }

}

Which can then be used for both enumerables (and possibly in other extension methods as well):

public static IEnumerable<TResult> ZipLongest<TFirst, TSecond, TResult>(

    this IEnumerable<TFirst> first,

    IEnumerable<TSecond> second,

    Func<TFirst, TSecond, TResult> resultSelector)

{

    // Eager parameter validation:

    if (first == null) throw new ArgumentNullException(nameof(first));

    if (second == null) throw new ArgumentNullException(nameof(second));

    if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));



    // Local function for lazy zipping:

    IEnumerable<TResult> ZipLongestImpl()

    {

        using (var enum1 = new RepeatableEnumerator<TFirst>(first))

        using (var enum2 = new RepeatableEnumerator<TSecond>(second))

        {

            // Up-front check for empty collections:

            if (!enum1.MoveNext() || !enum2.MoveNext())

                yield break;



            while (true)

            {

                yield return resultSelector(enum1.Current, enum2.Current);



                var is1Empty = !enum1.MoveNext();

                var is2Empty = !enum2.MoveNext();

                if (is1Empty)

                {

                    if (enum2.IsRepeating || is2Empty || !enum1.Repeat())

                        yield break;

                }

                else if (is2Empty)

                {

                    if (enum1.IsRepeating || !enum2.Repeat())

                        yield break;

                }

            }

        }

    }

    return ZipLongestImpl();

}

At this point it would be a good idea to add some documentation...

Worth adding that if you're going to throw ArgumentNullExceptions, make sure they're thrown eagerly, not only when iteration has started. — Nov 22 at 7:52
That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function. — Nov 22 at 8:25
Ah, so that's what you meant with a local function. If you look at what System.Linq does, they put the lazy part in a separate private method. But I don't think there's much difference between those two. — Nov 23 at 7:41
Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped. — Nov 23 at 7:50
Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little. — Nov 23 at 8:31

Peter Taylor 15.4k2657 · Answer 2 · 2018-11-21 14:54:59Z

    public static class Impl

    {

        public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(

Names? The class would be more descriptive as something like LinqExtensions; the method something like ZipLooped.

            using (IEnumerator<TFirst> iterator1 = first.GetEnumerator()) 

            using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())

            {

                var i1 = true;

                var i2 = true;

                var i1Shorter = false;

                var i2Shorter = false;

                var firstRun = true;

The iterators have useful names, but what does i1 mean? And why five variables to track the state of two iterators? IMO it would be simpler as

                var firstEnded = false;

                var secondEnded = false;



                while (true) 

                {

                    if (!iterator1.MoveNext())

                    {

                        if (secondEnded) yield break;

                        firstEnded = true;

                        iterator1.Reset();

                        if (!iterator1.MoveNext()) yield break;

                    }

                    if (!iterator2.MoveNext())

                    {

                        if (firstEnded) yield break;

                        secondEnded = true;

                        iterator2.Reset();

                        if (!iterator2.MoveNext()) yield break;

                    }



                    yield return resultSelector(iterator1.Current, iterator2.Current); 

                }

and the almost repeated code might be worth pulling out as an inner method:

                var firstEnded = false;

                var secondEnded = false;



                bool advance<T>(IEnumerator<T> it, ref bool thisEnded, bool otherEnded)

                {

                    if (it.MoveNext()) return true;

                    // `it` has done a full cycle; if the other one has too, we've finished

                    if (otherEnded) return false;

                    thisEnded = true;

                    // Start again, although if `it` is empty we need to abort

                    it.Reset();

                    return it.MoveNext();

                }



                while (true)

                {

                    if (!advance(iterator1, ref firstEnded, secondEnded)) yield break;

                    if (!advance(iterator2, ref secondEnded, firstEnded)) yield break;

                    yield return resultSelector(iterator1.Current, iterator2.Current); 

                }

I notice that you've decided to yield break if either of the enumerables is empty. Would an exception be a better choice?

I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-) — Nov 21 at 16:51
@t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break) if both are empty. — Nov 21 at 17:10
@VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare this new { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump(); The result is an empty collection. — Nov 21 at 17:13
@t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as per Average). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P — Nov 21 at 17:19

score 5 · Answer 3 · 2018-11-22 08:42:23Z

If you should respect that not all Enumerators implement Reset() then it is not possible to use using statements for the two IEnumerators. But you could introduce an IEnumerator<TResult> for the zipped result and use it like this:

public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(

this IEnumerable<TFirst> first,

IEnumerable<TSecond> second,

Func<TFirst, TSecond, TResult> resultSelector)

{

  if (first == null) throw new ArgumentNullException(nameof(first));

  if (second == null) throw new ArgumentNullException(nameof(second));

  if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));



  return InnerZipNew(first, second, resultSelector);

}



private static IEnumerable<TResult> InnerZipNew<TFirst, TSecond, TResult>(

this IEnumerable<TFirst> first,

IEnumerable<TSecond> second,

Func<TFirst, TSecond, TResult> resultSelector)

{

  using (ZipEnumerator<TFirst, TSecond, TResult> zipEnumerator = new ZipEnumerator<TFirst, TSecond, TResult>(first, second, resultSelector))

  {

    while (zipEnumerator.MoveNext())

    {

      yield return zipEnumerator.Current;

    }

  }

}

As JAD writes in his comment it is necessary to catch possible invalid input as the first thing and then call a private shadow method to do the actual iteration in order to make the exceptions be thrown when the extension is called rather than when the enumeration is performed.

In this way you're back on the using track.

The ZipEnumerator it self could be something like:

  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>

  {

    IEnumerable<T> m_dataT;

    IEnumerable<S> m_dataS;

    IEnumerator<T> m_enumerT;

    IEnumerator<S> m_enumerS;

    List<IDisposable> m_disposables = new List<IDisposable>();

    Func<T, S, TResult> m_selector;

    bool m_secondReloaded = false;

    bool m_first = true;



    public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)

    {

      m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));

      m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));

      m_selector = selector ?? throw new ArgumentNullException(nameof(selector));



    }



    public TResult Current => m_selector(m_enumerT.Current, m_enumerS.Current);



    object IEnumerator.Current => Current;



    public void Dispose()

    {

      foreach (IDisposable disposable in m_disposables)

      {

        disposable.Dispose();

      }

      m_disposables.Clear();

    }



    private IEnumerator<T> GetTEnumerator()

    {

      var enumerator = m_dataT.GetEnumerator();

      m_disposables.Add(enumerator);

      return enumerator;

    }



    private IEnumerator<S> GetSEnumerator()

    {

      var enumerator = m_dataS.GetEnumerator();

      m_disposables.Add(enumerator);

      return enumerator;

    }



    public bool MoveNext()

    {

      m_enumerT = m_enumerT ?? GetTEnumerator();

      m_enumerS = m_enumerS ?? GetSEnumerator();



      if (m_first)

      {

        if (m_enumerT.MoveNext())

        {

          if (!m_enumerS.MoveNext())

          {

            m_enumerS = GetSEnumerator();

            m_secondReloaded = true;

            if (!m_enumerS.MoveNext())

              return false;

          }

          return true;

        }

        else

        {

          m_first = false;

        }

      }



      if (!m_first && !m_secondReloaded)

      {

        if (m_enumerS.MoveNext())

        {

          if (!m_enumerT.MoveNext())

          {

            m_enumerT = GetTEnumerator();

            if (!m_enumerT.MoveNext())

              return false;

          }



          return true;

        }

      }



      return false;

    }



    public void Reset()

    {

      m_secondReloaded = false;

      m_first = true;

      m_enumerT = null;

      m_enumerS = null;

      Dispose();

    }

  }

It's a little more code than other suggestions, but it encapsulates the problems with the disposal of intermediate enumerators without the necessity of a try-catch-statement. You could discuss if the disposal should be immediately when the enumerator is done or as I do collect them for disposal when the ZipEnumerator itself is disposed off?

The MoveNext() method went a little more complicated than I like, so feel free to edit or suggest improvements.

Edit

A refactored version of ZipEnumerator:

  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>

  {

    IEnumerable<T> m_dataT;

    IEnumerable<S> m_dataS;

    IEnumerator<T> m_enumeratorT;

    IEnumerator<S> m_enumeratorS;

    List<IDisposable> m_disposables = new List<IDisposable>();

    Func<T, S, TResult> m_selector;

    bool m_secondReloaded = false;

    bool m_isInitilized = false;



    public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)

    {

      m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));

      m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));

      m_selector = selector ?? throw new ArgumentNullException(nameof(selector));

    }



    public TResult Current => m_selector(m_enumeratorT.Current, m_enumeratorS.Current);

    object IEnumerator.Current => Current;



    public void Dispose()

    {

      DoDispose();

    }



    private void RegisterDisposable(IDisposable disposable)

    {

      m_disposables.Add(disposable);

      if (m_disposables.Count > 10)

      {

        DoDispose();

      }

    }



    private void DoDispose()

    {

      foreach (IDisposable disposable in m_disposables)

      {

        disposable.Dispose();

      }

      m_disposables.Clear();

    }



    private IEnumerator<T> GetTEnumerator()

    {

      var enumerator = m_dataT.GetEnumerator();

      RegisterDisposable(enumerator);

      return enumerator;

    }



    private IEnumerator<S> GetSEnumerator()

    {

      var enumerator = m_dataS.GetEnumerator();

      RegisterDisposable(enumerator);

      return enumerator;

    }



    private Func<bool> CurrentMover = null;



    private bool FirstMover()

    {

      if (m_enumeratorT.MoveNext())

      {

        if (!m_enumeratorS.MoveNext())

        {

          m_enumeratorS = GetSEnumerator();

          m_secondReloaded = true;

          if (!m_enumeratorS.MoveNext())

            return false;

        }

        return true;

      }

      else if (!m_secondReloaded)

      {

        CurrentMover = SecondMover;

        return CurrentMover();

      }



      return false;

    }



    private bool SecondMover()

    {

      if (m_enumeratorS.MoveNext())

      {

        if (!m_enumeratorT.MoveNext())

        {

          m_enumeratorT = GetTEnumerator();

          if (!m_enumeratorT.MoveNext())

            return false;

        }



        return true;

      }



      return false;

    }



    private void Initialize()

    {

      m_enumeratorT = GetTEnumerator();

      m_enumeratorS = GetSEnumerator();

      CurrentMover = FirstMover;

      m_isInitilized = true;

    }



    public bool MoveNext()

    {

      if (!m_isInitilized)

      {

        Initialize();

      }

      return CurrentMover();

    }



    public void Reset()

    {

      m_isInitilized = false;

      m_secondReloaded = false;

      CurrentMover = null;

      m_enumeratorT = null;

      m_enumeratorS = null;

      DoDispose();

    }

  }

I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-) — Nov 21 at 19:22
I think the word unconventional describes them pretty good... although ts and ss were below that level ;-] — Nov 21 at 19:34
I can explain that ;-P It's local and in a very small scope, this is allowed in my world, tt and ss on the other hand were public. — Nov 21 at 19:56
You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using the yield return lazy evaluation. Example. — Nov 22 at 7:50
@HenrikHansen I saw that. The problem is that if a method is using the yield method of returning a IEnumerable, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point the ZipEnumerator is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation. — Nov 22 at 8:09

score 8 · Answer 4 · 2018-11-23 08:33:34Z

I noticed a few things that can be improved:

Not all enumerators support Reset. Generator methods don't, for example, so calling ZipNew on the result of a ZipNew call will fail with a NotSupportedException. Obtaining a new enumerator should work, at the cost of having to replace the convenient using statements with try/finally constructions. Edit: As Eric pointed out, Reset should not be used at all. It's been abandoned.

There's no need to call Reset (or rather, to get a new enumerator) when a collection is empty. I'd probably add a special case for that.

Passing null causes either an unspecific NullReferenceException or an ArgumentNullException with parameter name source to be thrown. Throwing ArgumentNullExceptions with accurate parameter names would be more helpful. Edit: As JAD pointed out, this is trickier than it looks. You'll have to split the method into an eager non-yielding method and a lazy yielding method. A local function should be useful here.

i1 and i2 can be declared inside the while loop.

Addendum:

As Henrik's answer shows, a helper class can be useful for properly repeating enumerators without having to give up on using. I would take a slightly different approach by creating a repeatable enumerator class:

class RepeatableEnumerator<T> : IDisposable

{

    private IEnumerable<T> _enumerable;

    private IEnumerator<T> _enumerator;





    public bool IsRepeating { get; private set; }

    public T Current => _enumerator.Current;





    public RepeatableEnumerator(IEnumerable<T> enumerable)

    {

        _enumerable = enumerable;

        _enumerator = enumerable.GetEnumerator();

    }



    public void Dispose()

    {

        _enumerator.Dispose();

        _enumerator = null;

    }



    public bool MoveNext() => _enumerator.MoveNext();



    public bool Repeat()

    {

        IsRepeating = true;

        _enumerator.Dispose();

        _enumerator = _enumerable.GetEnumerator();

        return _enumerator.MoveNext();

    }

}

Which can then be used for both enumerables (and possibly in other extension methods as well):

public static IEnumerable<TResult> ZipLongest<TFirst, TSecond, TResult>(

    this IEnumerable<TFirst> first,

    IEnumerable<TSecond> second,

    Func<TFirst, TSecond, TResult> resultSelector)

{

    // Eager parameter validation:

    if (first == null) throw new ArgumentNullException(nameof(first));

    if (second == null) throw new ArgumentNullException(nameof(second));

    if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));



    // Local function for lazy zipping:

    IEnumerable<TResult> ZipLongestImpl()

    {

        using (var enum1 = new RepeatableEnumerator<TFirst>(first))

        using (var enum2 = new RepeatableEnumerator<TSecond>(second))

        {

            // Up-front check for empty collections:

            if (!enum1.MoveNext() || !enum2.MoveNext())

                yield break;



            while (true)

            {

                yield return resultSelector(enum1.Current, enum2.Current);



                var is1Empty = !enum1.MoveNext();

                var is2Empty = !enum2.MoveNext();

                if (is1Empty)

                {

                    if (enum2.IsRepeating || is2Empty || !enum1.Repeat())

                        yield break;

                }

                else if (is2Empty)

                {

                    if (enum1.IsRepeating || !enum2.Repeat())

                        yield break;

                }

            }

        }

    }

    return ZipLongestImpl();

}

At this point it would be a good idea to add some documentation...

Worth adding that if you're going to throw ArgumentNullExceptions, make sure they're thrown eagerly, not only when iteration has started. — Nov 22 at 7:52
That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function. — Nov 22 at 8:25
Ah, so that's what you meant with a local function. If you look at what System.Linq does, they put the lazy part in a separate private method. But I don't think there's much difference between those two. — Nov 23 at 7:41
Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped. — Nov 23 at 7:50
Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little. — Nov 23 at 8:31

Peter Taylor 15.4k2657 · Answer 5 · 2018-11-21 14:54:59Z

    public static class Impl

    {

        public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(

Names? The class would be more descriptive as something like LinqExtensions; the method something like ZipLooped.

            using (IEnumerator<TFirst> iterator1 = first.GetEnumerator()) 

            using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())

            {

                var i1 = true;

                var i2 = true;

                var i1Shorter = false;

                var i2Shorter = false;

                var firstRun = true;

The iterators have useful names, but what does i1 mean? And why five variables to track the state of two iterators? IMO it would be simpler as

                var firstEnded = false;

                var secondEnded = false;



                while (true) 

                {

                    if (!iterator1.MoveNext())

                    {

                        if (secondEnded) yield break;

                        firstEnded = true;

                        iterator1.Reset();

                        if (!iterator1.MoveNext()) yield break;

                    }

                    if (!iterator2.MoveNext())

                    {

                        if (firstEnded) yield break;

                        secondEnded = true;

                        iterator2.Reset();

                        if (!iterator2.MoveNext()) yield break;

                    }



                    yield return resultSelector(iterator1.Current, iterator2.Current); 

                }

and the almost repeated code might be worth pulling out as an inner method:

                var firstEnded = false;

                var secondEnded = false;



                bool advance<T>(IEnumerator<T> it, ref bool thisEnded, bool otherEnded)

                {

                    if (it.MoveNext()) return true;

                    // `it` has done a full cycle; if the other one has too, we've finished

                    if (otherEnded) return false;

                    thisEnded = true;

                    // Start again, although if `it` is empty we need to abort

                    it.Reset();

                    return it.MoveNext();

                }



                while (true)

                {

                    if (!advance(iterator1, ref firstEnded, secondEnded)) yield break;

                    if (!advance(iterator2, ref secondEnded, firstEnded)) yield break;

                    yield return resultSelector(iterator1.Current, iterator2.Current); 

                }

I notice that you've decided to yield break if either of the enumerables is empty. Would an exception be a better choice?

I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-) — Nov 21 at 16:51
@t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break) if both are empty. — Nov 21 at 17:10
@VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare this new { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump(); The result is an empty collection. — Nov 21 at 17:13
@t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as per Average). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P — Nov 21 at 17:19

score 5 · Answer 6 · 2018-11-22 08:42:23Z

If you should respect that not all Enumerators implement Reset() then it is not possible to use using statements for the two IEnumerators. But you could introduce an IEnumerator<TResult> for the zipped result and use it like this:

public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(

this IEnumerable<TFirst> first,

IEnumerable<TSecond> second,

Func<TFirst, TSecond, TResult> resultSelector)

{

  if (first == null) throw new ArgumentNullException(nameof(first));

  if (second == null) throw new ArgumentNullException(nameof(second));

  if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));



  return InnerZipNew(first, second, resultSelector);

}



private static IEnumerable<TResult> InnerZipNew<TFirst, TSecond, TResult>(

this IEnumerable<TFirst> first,

IEnumerable<TSecond> second,

Func<TFirst, TSecond, TResult> resultSelector)

{

  using (ZipEnumerator<TFirst, TSecond, TResult> zipEnumerator = new ZipEnumerator<TFirst, TSecond, TResult>(first, second, resultSelector))

  {

    while (zipEnumerator.MoveNext())

    {

      yield return zipEnumerator.Current;

    }

  }

}

As JAD writes in his comment it is necessary to catch possible invalid input as the first thing and then call a private shadow method to do the actual iteration in order to make the exceptions be thrown when the extension is called rather than when the enumeration is performed.

In this way you're back on the using track.

The ZipEnumerator it self could be something like:

  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>

  {

    IEnumerable<T> m_dataT;

    IEnumerable<S> m_dataS;

    IEnumerator<T> m_enumerT;

    IEnumerator<S> m_enumerS;

    List<IDisposable> m_disposables = new List<IDisposable>();

    Func<T, S, TResult> m_selector;

    bool m_secondReloaded = false;

    bool m_first = true;



    public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)

    {

      m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));

      m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));

      m_selector = selector ?? throw new ArgumentNullException(nameof(selector));



    }



    public TResult Current => m_selector(m_enumerT.Current, m_enumerS.Current);



    object IEnumerator.Current => Current;



    public void Dispose()

    {

      foreach (IDisposable disposable in m_disposables)

      {

        disposable.Dispose();

      }

      m_disposables.Clear();

    }



    private IEnumerator<T> GetTEnumerator()

    {

      var enumerator = m_dataT.GetEnumerator();

      m_disposables.Add(enumerator);

      return enumerator;

    }



    private IEnumerator<S> GetSEnumerator()

    {

      var enumerator = m_dataS.GetEnumerator();

      m_disposables.Add(enumerator);

      return enumerator;

    }



    public bool MoveNext()

    {

      m_enumerT = m_enumerT ?? GetTEnumerator();

      m_enumerS = m_enumerS ?? GetSEnumerator();



      if (m_first)

      {

        if (m_enumerT.MoveNext())

        {

          if (!m_enumerS.MoveNext())

          {

            m_enumerS = GetSEnumerator();

            m_secondReloaded = true;

            if (!m_enumerS.MoveNext())

              return false;

          }

          return true;

        }

        else

        {

          m_first = false;

        }

      }



      if (!m_first && !m_secondReloaded)

      {

        if (m_enumerS.MoveNext())

        {

          if (!m_enumerT.MoveNext())

          {

            m_enumerT = GetTEnumerator();

            if (!m_enumerT.MoveNext())

              return false;

          }



          return true;

        }

      }



      return false;

    }



    public void Reset()

    {

      m_secondReloaded = false;

      m_first = true;

      m_enumerT = null;

      m_enumerS = null;

      Dispose();

    }

  }

It's a little more code than other suggestions, but it encapsulates the problems with the disposal of intermediate enumerators without the necessity of a try-catch-statement. You could discuss if the disposal should be immediately when the enumerator is done or as I do collect them for disposal when the ZipEnumerator itself is disposed off?

The MoveNext() method went a little more complicated than I like, so feel free to edit or suggest improvements.

Edit

A refactored version of ZipEnumerator:

  public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>

  {

    IEnumerable<T> m_dataT;

    IEnumerable<S> m_dataS;

    IEnumerator<T> m_enumeratorT;

    IEnumerator<S> m_enumeratorS;

    List<IDisposable> m_disposables = new List<IDisposable>();

    Func<T, S, TResult> m_selector;

    bool m_secondReloaded = false;

    bool m_isInitilized = false;



    public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)

    {

      m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));

      m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));

      m_selector = selector ?? throw new ArgumentNullException(nameof(selector));

    }



    public TResult Current => m_selector(m_enumeratorT.Current, m_enumeratorS.Current);

    object IEnumerator.Current => Current;



    public void Dispose()

    {

      DoDispose();

    }



    private void RegisterDisposable(IDisposable disposable)

    {

      m_disposables.Add(disposable);

      if (m_disposables.Count > 10)

      {

        DoDispose();

      }

    }



    private void DoDispose()

    {

      foreach (IDisposable disposable in m_disposables)

      {

        disposable.Dispose();

      }

      m_disposables.Clear();

    }



    private IEnumerator<T> GetTEnumerator()

    {

      var enumerator = m_dataT.GetEnumerator();

      RegisterDisposable(enumerator);

      return enumerator;

    }



    private IEnumerator<S> GetSEnumerator()

    {

      var enumerator = m_dataS.GetEnumerator();

      RegisterDisposable(enumerator);

      return enumerator;

    }



    private Func<bool> CurrentMover = null;



    private bool FirstMover()

    {

      if (m_enumeratorT.MoveNext())

      {

        if (!m_enumeratorS.MoveNext())

        {

          m_enumeratorS = GetSEnumerator();

          m_secondReloaded = true;

          if (!m_enumeratorS.MoveNext())

            return false;

        }

        return true;

      }

      else if (!m_secondReloaded)

      {

        CurrentMover = SecondMover;

        return CurrentMover();

      }



      return false;

    }



    private bool SecondMover()

    {

      if (m_enumeratorS.MoveNext())

      {

        if (!m_enumeratorT.MoveNext())

        {

          m_enumeratorT = GetTEnumerator();

          if (!m_enumeratorT.MoveNext())

            return false;

        }



        return true;

      }



      return false;

    }



    private void Initialize()

    {

      m_enumeratorT = GetTEnumerator();

      m_enumeratorS = GetSEnumerator();

      CurrentMover = FirstMover;

      m_isInitilized = true;

    }



    public bool MoveNext()

    {

      if (!m_isInitilized)

      {

        Initialize();

      }

      return CurrentMover();

    }



    public void Reset()

    {

      m_isInitilized = false;

      m_secondReloaded = false;

      CurrentMover = null;

      m_enumeratorT = null;

      m_enumeratorS = null;

      DoDispose();

    }

  }

I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-) — Nov 21 at 19:22
I think the word unconventional describes them pretty good... although ts and ss were below that level ;-] — Nov 21 at 19:34
I can explain that ;-P It's local and in a very small scope, this is allowed in my world, tt and ss on the other hand were public. — Nov 21 at 19:56
You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using the yield return lazy evaluation. Example. — Nov 22 at 7:50
@HenrikHansen I saw that. The problem is that if a method is using the yield method of returning a IEnumerable, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point the ZipEnumerator is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation. — Nov 22 at 8:09

搜尋此網誌

Htykuut

Custom implementation of the linq Zip operator for different length lists

3 Answers
3

Your Answer

Post as a guest

3 Answers
3

3 Answers
3

Post as a guest

Popular posts from this blog

Dijon

Sphinx de Gizeh

xlwings: Save and Close

Custom implementation of the linq Zip operator for different length lists

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

3 Answers 3

3 Answers 3

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Dijon

Sphinx de Gizeh

xlwings: Save and Close

3 Answers
3

3 Answers
3

3 Answers
3