[Feature Request] Suggest Linq simplifications

There are plenty of situations where Linq queries could be simplified.

For example:

var foo = bar.Select(x => y).ToArray();
// simplified to
var foo = bar.ToArray(x => y);

var foo = bar.Where(x => y).Count();
// simplified to
var foo = bar.Count(x => y);

Then there are the naive redundancies:

var foo = bar.ToArray().ToArray();
// simplified to
var foo = bar.ToArray();

var foo = bar.ToList().ToArray();
// simplified to
var foo = bar.ToArray();
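
The rewrites above that rely only on the standard operators can be checked with a small program. This is a minimal sketch, assuming a plain int array as `bar` and an even-number test as the predicate (both made up for illustration). It leaves out `ToArray(x => y)`, which is not part of the standard `System.Linq` operators and is presumably a custom extension.

```csharp
using System;
using System.Linq;

static class Program
{
    static void Main()
    {
        // Sample data, chosen only for illustration.
        int[] bar = { 1, 2, 3, 4, 5, 6 };

        // Filter-then-count versus the predicate overload of Count().
        int viaWhere = bar.Where(x => x % 2 == 0).Count();
        int viaCount = bar.Count(x => x % 2 == 0);
        Console.WriteLine(viaWhere == viaCount); // prints True

        // The intermediate ToList() only adds an extra copy.
        int[] viaList = bar.ToList().ToArray();
        int[] direct = bar.ToArray();
        Console.WriteLine(viaList.SequenceEqual(direct)); // prints True
    }
}
```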

Then there are calls to Count() that perform poorly (I did the check, perf is poor) and should usually be replaced by the corresponding property:

var foo = mylist.Count();
// simplified to
var foo = mylist.Count;

var foo = myarray.Count();
// simplified to
var foo = myarray.Length;


It would be really nice if R# added some deeper analysis for Linq queries.

Best regards,
Joannes Vermorel
Lokad Sales Forecasting

7 comments

I'm a bit sceptical about your Count() suggestions - Count() contains code specifically to optimise for these sort of cases (if they're what I think they are).

I suspect array.Length might still win on a for() loop termination expression but, in general, Count() isn't linearly stepping through collections where it doesn't need to.
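
One way to check that Count() does not step through a collection that already knows its size is a sketch like the following. `NoEnumerate` is a hypothetical type written only for this test: it implements `ICollection<int>`, reports a fixed `Count`, and throws if anything tries to enumerate it.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection for illustration: it knows its Count
// but throws as soon as anyone asks for an enumerator.
class NoEnumerate : ICollection<int>
{
    public int Count { get { return 42; } }
    public bool IsReadOnly { get { return true; } }
    public void Add(int item) { throw new NotSupportedException(); }
    public void Clear() { throw new NotSupportedException(); }
    public bool Contains(int item) { return false; }
    public void CopyTo(int[] array, int arrayIndex) { throw new NotSupportedException(); }
    public bool Remove(int item) { throw new NotSupportedException(); }
    public IEnumerator<int> GetEnumerator() { throw new InvalidOperationException("enumerated!"); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

static class Program
{
    static void Main()
    {
        IEnumerable<int> source = new NoEnumerate();

        // Enumerable.Count() sees ICollection<int> and reads the Count
        // property, so the throwing enumerator is never requested.
        Console.WriteLine(source.Count()); // prints 42
    }
}
```

This only shows that no enumeration happens. It says nothing about the cost of the type check itself.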

How did you test the count() performance change, and how big a change was it?

0
I'm a bit sceptical about your Count() suggestions - Count() contains code specifically to optimise for these sort of cases (if they're what I think they are).


Oops, sorry, you are right on that one. It was Last() that gave us terrible performance. I would still guess that Count() is about 10x slower than .Length, but it might not be relevant enough to warrant suggesting a refactoring.

Best regards,
Joannes Vermorel

0

Jon Skeet just wrote an article on a similar subject - did you see it? http://msmvps.com/blogs/jon_skeet/archive/2010/02/10/optimisations-in-linq-to-objects.aspx

Your 'Last' case is interesting - it would be nice to know exactly what is taking the time in there, as it may have wider implications.

Alternatively it might just be an unrealistic micro-performance issue with no implications in the real world...  I can't see the code you used to repro that - did you call 'Last' several billion times?

0
I can't see the code you used to repro that - did you call 'Last' several billion times?

Sorry, I can't remember, but yes, that was the idea (although more probably a million times rather than a billion).

Also, Linq was designed for querying databases, but people like us end up primarily using Linq2objects for algorithmic purposes, and those constant factors, while negligible for DB accesses, tend to hurt badly in CPU-intensive apps.
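
The original repro was not posted, so the following is only a sketch of what such a micro-benchmark might look like: roughly a million calls to Last() on an array, compared with direct indexing. The array size, iteration count, and class name are assumptions, and the timings will vary with the runtime version and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class LastBenchmark
{
    static void Main()
    {
        int[] data = Enumerable.Range(0, 1000).ToArray(); // assumed size
        const int iterations = 1000000;                   // "a million times"
        long checksum = 0;

        // Linq-to-objects version.
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            checksum += data.Last();
        }
        sw.Stop();
        Console.WriteLine("Enumerable.Last(): {0} ms", sw.ElapsedMilliseconds);

        // Direct indexing version.
        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            checksum += data[data.Length - 1];
        }
        sw.Stop();
        Console.WriteLine("Direct index:      {0} ms", sw.ElapsedMilliseconds);

        // Print the checksum so the loops are not optimised away.
        Console.WriteLine(checksum);
    }
}
```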

0

This does sound like it would be a nice feature, but I imagine the scope would be very large, apart from removing redundant conversions such as ToList().ToArray().

0

I've just been profiling this - the time is all spent in the IList<T> cast which Enumerable.Last() does.

Interestingly enough, casting to IList rather than IList<T> roughly doubles the performance, but there's still expensive boxing (at least for an array of ints) because the non-generic IList indexer returns an object, which you have to cast to T before returning it.

More interestingly though, adding

            TSource[] arr = source as TSource[];
            if (arr != null)
            {
                int count = arr.Length;
                if (count > 0)
                {
                    return arr[count - 1];
                }
            }

before the IList<T> cast brings the performance back to roughly what you get from a direct arr[arr.Length - 1] implementation.

So it's not casting per se which is expensive, it's casting arrays to IList<T>s.

I will add a comment to your Connect bug, though obviously nothing's going to happen to .NET 4 now. Of course, this is an optimisation for arrays and probably a pessimisation for ILists (I haven't checked). There is a school of thought that arrays are an obsolete type, so I doubt anyone at MS will be implementing this. But it would be useful if you wanted to do a 'Fast Linq to objects' for specialised cases.
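
For the specialised case mentioned here, the array fast path could be wrapped in a separate extension method rather than patched into the framework. This is a minimal sketch, not the framework's implementation; the names `FastLinq` and `FastLast` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; not part of System.Linq.
static class FastLinq
{
    public static T FastLast<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException("source");

        // Fast path: index a non-empty array directly, skipping the IList<T> cast.
        T[] arr = source as T[];
        if (arr != null && arr.Length > 0)
        {
            return arr[arr.Length - 1];
        }

        // Empty arrays and every other sequence fall back to Enumerable.Last(),
        // which throws InvalidOperationException for an empty sequence.
        return source.Last();
    }
}

static class Program
{
    static void Main()
    {
        int[] numbers = { 1, 2, 3 };
        Console.WriteLine(numbers.FastLast());                   // prints 3
        Console.WriteLine(new List<int> { 4, 5 }.FastLast());    // prints 5
    }
}
```

The extra `as T[]` check costs a little on non-array sources, which matches the caveat above about this being an optimisation for arrays only.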

0

Will, thanks a lot for your follow-up on that one.

0
