LINQ and Collections
Programming often involves processing many objects of the same type. For example: iterating through an array of people to find their average height, finding the minimum number in an array, ordering an array of people by their age and so on. You already know how to do all of that with a loop. However, such loops are often repetitive. C# is a good language partly because it helps you avoid this kind of redundancy: to simplify processing multiple objects, we use LINQ.
To understand LINQ well, you need a foundational knowledge of collections: List, Dictionary and IEnumerable. Generics, delegates and extension methods are also an integral part of LINQ, so we will go through them as well.
Collections are data structures specifically designed to act as containers for items. Most collections hold items of a single type (for example, a collection of int, of string or of Person).
The 3 most common collection types are List, Dictionary and IEnumerable.
List is very similar to an array.
When we create a new List, we need to specify a type (a list of what exactly?). For example:
List of ints:
var numbers = new List<int>();
or a List of Persons:
var people = new List<Person>();
The key difference between a List and an array is that a List can grow and shrink dynamically, without having to be re-created. The key methods on a List are Add and Remove.
Add a single element (it is appended to the end):
numbers.Add(2);
Remove a single element by value (only the first matching element is removed):
numbers.Remove(2);
or by index:
numbers.RemoveAt(0);
Accessing elements of a List is no different from accessing an array: it's done using [index].
If you want to empty the list but keep the same List object, call Clear().
Lastly, List supports collection initializers. If you want to create a prepopulated list, you can write:
var numbers = new List<int>{1, 2, 3};
This creates a List of int with the elements 1, 2 and 3 in it.
When to use a List: whenever your needs are the basics of adding or removing elements in a collection.
Prefer it over an array.
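A minimal runnable sketch of the List operations above, assuming a .NET 6+ console project with top-level statements (the values are arbitrary). It makes the difference between Remove (by value) and RemoveAt (by index) visible:

```csharp
using System;
using System.Collections.Generic;

// Collection initializer: the list starts as [1, 2, 3].
var numbers = new List<int> { 1, 2, 3 };

numbers.Add(2);      // [1, 2, 3, 2]: Add appends to the end
numbers.Remove(2);   // [1, 3, 2]: Remove deletes only the FIRST matching value
numbers.RemoveAt(0); // [3, 2]: RemoveAt deletes by position

int first = numbers[0];               // indexing works like an array: 3
int countBeforeClear = numbers.Count; // a List uses Count (not Length): 2
Console.WriteLine($"{first}, {countBeforeClear}"); // prints "3, 2"

numbers.Clear(); // empties the list; the same List object can be reused
Console.WriteLine(numbers.Count); // prints 0
```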
Dictionary is similar to a list in that it grows and shrinks on demand. However, Dictionary, as the name implies, is used for lookups.
Instead of storing 1 item per entry, a Dictionary stores 2: a key and a value. Therefore, a Dictionary is like a List of key-value pairs.
For example, let's create a Dictionary of people, where the key is the person's id:
var people = new Dictionary<long, Person>();
The Person class:
class Person
{
    public long Id {get;}
    public string Name {get;}
    public DateTime Bday {get;}
    public Person(long id, string name, DateTime bday)
    {
        Id = id;
        Name = name;
        Bday = bday;
    }
}
Create 2 people:
var tom = new Person(2, "Tom", new DateTime(1997, 07, 11));
var jane = new Person(5, "Jane", new DateTime(2000, 01, 20));
Add them to the Dictionary:
people.Add(2, tom);
people.Add(5, jane);
Or create a Dictionary with the 2 people already in it:
var people = new Dictionary<long, Person>
{
    {2, tom},
    {5, jane}
};
Please note that, unlike an array or a List, a Dictionary does not assign a numeric index as we add elements to it. There is no need: we get items by their key, so the key acts as the index. Accessing an element of a Dictionary looks like this:
var person = people[2]; // returns tom
Often, we need to search for an object by its identifying property. What happens when we perform such a search in a List or an array? We write a loop. If such searches are frequent, a Dictionary will speed things up significantly, because no loop is needed: we specify a key and get the value back directly (a Dictionary is a hash table, so a lookup takes roughly constant time on average).
The main constraint of a Dictionary is that keys must be unique: calling Add with a key that already exists throws an ArgumentException.
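The sketch below shows the common ways of reading from and writing to a Dictionary. To keep it short, it assumes plain names (strings) as values instead of the Person class. The indexer, TryGetValue and Add are standard Dictionary members; note that the indexer throws KeyNotFoundException for a missing key, while TryGetValue simply returns false.

```csharp
using System;
using System.Collections.Generic;

// Keys are ids, values are names (a simplified stand-in for the Person class).
var names = new Dictionary<long, string>
{
    { 2, "Tom" },
    { 5, "Jane" }
};

// Direct lookup by key: no loop needed. A missing key here would throw KeyNotFoundException.
string tom = names[2];

// Safe lookup: TryGetValue returns false instead of throwing when the key is missing.
bool foundJane = names.TryGetValue(5, out var jane); // true, jane == "Jane"
bool foundSeven = names.TryGetValue(7, out _);       // false

// The indexer setter adds a new entry or overwrites an existing one...
names[5] = "Janet";

// ...while Add refuses duplicate keys.
bool duplicateRejected = false;
try
{
    names.Add(2, "Another Tom");
}
catch (ArgumentException)
{
    duplicateRejected = true; // key 2 already exists
}

Console.WriteLine($"{tom}, {foundJane}, {foundSeven}, {names[5]}, {duplicateRejected}");
// prints "Tom, True, False, Janet, True"
```

As a rule of thumb: use the indexer when a missing key is a bug, and TryGetValue when a missing key is a normal situation.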
What if the only thing we want to do with a collection of objects is to go through it with foreach and do something with each item? How do we make such an intention explicit?
IEnumerable is the abstraction over all collections: it is compatible with an array, a List, a Dictionary and anything else that contains many elements. In other words, all the standard collections implement IEnumerable.
For example, instead of an array, we can write:
IEnumerable<int> numbers = new int[]{1, 2, 3};
or
IEnumerable<int> numbers = new List<int>{1, 2, 3};
In both cases the variable has the same type, IEnumerable<int>, and code that uses it cannot tell the two apart. You cannot use an index to access elements through IEnumerable, since indexing is a detail of the underlying collection, but you can iterate through it:
foreach (var number in numbers)
{
    // do something with a number.
}
Prefer returning IEnumerable over a concrete collection type from a public method when the intention is for the caller to iterate through the result. That way you keep the underlying collection type encapsulated and minimize the impact of changes: switching from an array to a List later will not break the callers.
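To illustrate that any collection can be consumed through IEnumerable<T>, here is a small sketch with a hypothetical Sum helper, written as a local function in a top-level program. It accepts an array, a List, a HashSet and a Dictionary's values alike, because all it asks for is something it can iterate:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: it only needs to iterate, so it asks for IEnumerable<int>.
static int Sum(IEnumerable<int> items)
{
    var total = 0;
    foreach (var item in items)
    {
        total += item;
    }
    return total;
}

int fromArray = Sum(new[] { 1, 2, 3 });
int fromList = Sum(new List<int> { 1, 2, 3 });
int fromSet = Sum(new HashSet<int> { 1, 2, 3 });
int fromDictionary = Sum(new Dictionary<string, int> { { "a", 1 }, { "b", 2 }, { "c", 3 } }.Values);

Console.WriteLine($"{fromArray} {fromList} {fromSet} {fromDictionary}"); // prints "6 6 6 6"
```

The same idea applies in reverse to return types: a method returning IEnumerable<int> lets callers iterate without depending on which collection sits underneath.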
Another beautiful quality of IEnumerable is that it hides implementation details so well that there might be no collection underneath it at all! Elements can be produced on demand, one at a time, rather than all being held in memory. This is called lazy loading (or lazy evaluation). For example:
// Returns 1 number at a time
public IEnumerable<int> GetNumbers()
{
    // return just 1
    yield return 1;
    // don't keep 1 in memory- return just 2
    yield return 2;
    // don't keep 2- return just 3
    yield return 3;
}
In a loop like this:
foreach (var number in GetNumbers())
{
    // using 1 number at a time.
    // Each number is produced only when the loop asks for the next one.
}
You may have already asked yourself: "what is <> after a type or method?". That's a great question! It specifies a generic type argument. Just as values can be passed to a method as arguments, a type can be passed to a class or method as a type argument, specifying what kind of things it works with. Once again, List<int> is a generic list of integers. It's generic because if we want a different type, all we have to do is change what is in between <>.
We can have generic classes and generic methods.
A generic method lets the caller choose the type of its input or output. What if we wanted to print a collection with a single method? What if we wanted that method to work for every collection, regardless of the item type?
We could:
public static string ToStringMany<T>(IEnumerable<T> items)
{
    var sb = new StringBuilder(); // StringBuilder requires: using System.Text;
    foreach (var item in items)
    {
        sb.AppendLine(item.ToString());
    }
    return sb.ToString();
}
Example uses:
// Returns:
// 1
// 2
// 3
var numbers1 = ToStringMany(new int[]{1, 2, 3});
// Returns:
// Tom
// Bill
var names = ToStringMany(new List<string>{"Tom", "Bill"});
Generics on a class are very similar to generics on a method. The key difference is that a generic type parameter declared on a class can be used by every member of that class.
For example, let's make a class Cup. Naturally, we start with a question- it's a cup of something. Of what? Of Liquid.
Let's create a bunch of Liquids. First, a base class:
public abstract class Liquid
{
    public float Amount {get;}
    public Liquid(float amount)
    {
        Amount = amount;
    }
}
public class Tea : Liquid
{
    public Tea(float amount): base(amount){}
}
public class Vodka : Liquid
{
    public Vodka(float amount): base(amount){}
}
Let's make 2 cups: one that holds tea only and another that holds vodka only. Without generics, we would need 2 classes:
CupOfTea:
public class CupOfTea
{
    public float Current {get; private set;}
    private readonly float _capacity;
    public CupOfTea(float capacity, float current)
    {
        Current = current;
        _capacity = capacity;
    }
    public void Add(Tea liquid)
    {
        Current += liquid.Amount;
        // overflow is okay, it's a cup- it won't explode.
        if(Current > _capacity)
        {
            Current = _capacity;
        }
    }
    public bool IsFull()
    {
        return Math.Abs(_capacity - Current) < 0.001f;
    }
}
CupOfVodka:
public class CupOfVodka
{
    public float Current {get; private set;}
    private readonly float _capacity;
    public CupOfVodka(float capacity, float current)
    {
        Current = current;
        _capacity = capacity;
    }
    public void Add(Vodka liquid)
    {
        Current += liquid.Amount;
        // overflow is okay, it's a cup- it won't explode.
        if(Current > _capacity)
        {
            Current = _capacity;
        }
    }
    public bool IsFull()
    {
        return Math.Abs(_capacity - Current) < 0.001f;
    }
}
If you are bothered by these 2 classes, you're not alone. They are exactly the same, except for the type of liquid that goes in. With our knowledge from the previous lesson, we could accept the base Liquid type instead, which would get us close to what we need, but then nothing would stop us from mixing different types of liquids in the same cup.
This is where generics shine. When you need to work with a specific type (as part of the requirements, as a specific return type or similar), use generics. The two cup classes can be refactored into one generic class:
public class Cup<TLiquid>
{
    public float Current {get; private set;}
    private readonly float _capacity;
    public Cup(float capacity, float current)
    {
        Current = current;
        _capacity = capacity;
    }
    public void Add(TLiquid liquid)
    {
        // Does not compile!
        // Naming it TLiquid does not make it a liquid.
        // It can be anything!
        Current += liquid.Amount;
        // overflow is okay, it's a cup- it won't explode.
        if(Current > _capacity)
        {
            Current = _capacity;
        }
    }
    public bool IsFull()
    {
        return Math.Abs(_capacity - Current) < 0.001f;
    }
}
Using a generic type parameter on a class allows all methods and fields in it to use that type consistently. Despite this nice feature, we still have one problem to solve: how do we specify that the generic type is in fact a liquid?
Generic constraints answer this question. Using the where keyword, we can restrict which types are allowed as the generic type argument. In our case, we want only Liquid (or a type derived from it). Therefore, we can change the class declaration to this:
public class Cup<TLiquid> where TLiquid : Liquid
{
    // other code is the same...
}
Et voilà! It works :)
Let's test the two cups:
var cupOfTea = new Cup<Tea>(capacity: 100, current: 0);
var cupOfVodka = new Cup<Vodka>(capacity: 50, current: 0);
cupOfTea.Add(new Tea(10)); // works
cupOfVodka.Add(new Tea(50)); // does not compile- a Tea is not a Vodka, so tea can't go into a cup of vodka.
Once again, you could have done something similar with polymorphic methods that accept the base type. However, checking that the liquid types match would then only be possible at runtime, whereas with generics the check happens at compile time.
You can apply multiple constraints to a generic type parameter by separating them with commas (,), and a single class can have multiple generic type parameters. But every extra type parameter increases the complexity, so don't overdo it.
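Constraints work the same way on generic methods. Below is a small sketch with a hypothetical Largest helper (a local function in a top-level program). The where T : IComparable<T> constraint is what makes calling CompareTo on a T legal; without it, the call would not compile, for the same reason the unconstrained Cup could not read the liquid's amount.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the largest item of any comparable type.
// The constraint guarantees that every T has a CompareTo(T) method.
static T Largest<T>(IEnumerable<T> items) where T : IComparable<T>
{
    using var enumerator = items.GetEnumerator();
    if (!enumerator.MoveNext())
    {
        throw new ArgumentException("The collection is empty.", nameof(items));
    }

    T largest = enumerator.Current;
    while (enumerator.MoveNext())
    {
        if (enumerator.Current.CompareTo(largest) > 0)
        {
            largest = enumerator.Current;
        }
    }
    return largest;
}

int maxNumber = Largest(new[] { 3, 7, 2 });                   // T inferred as int
string maxName = Largest(new List<string> { "Bill", "Tom" }); // T inferred as string
DateTime latest = Largest(new[] { new DateTime(1997, 7, 11), new DateTime(2000, 1, 20) });

Console.WriteLine($"{maxNumber} {maxName} {latest.Year}"); // prints "7 Tom 2000"
```

Calling Largest with an object[] would be rejected at compile time, because object does not implement IComparable<object>.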
Dictionary is a good example of a generic class with multiple generic types:
public class Dictionary<TKey, TValue>
{
    // ...
}
Factory classes often create instances of their generic types and assign values to them, and those types are usually non-primitives. The new() constraint requires a type with a public parameterless (default) constructor, and the class constraint requires the type to be a reference type. The new() constraint must go last.
public class Factory<T> where T : class, new()
{
    //...
}
A delegate is a reference to a function (also known as a pointer to a function). It allows passing functions around as if they were normal variables, for example as arguments to other methods.
C# has the delegate keyword for declaring custom delegate types, but it is rarely needed, because premade delegates exist for the most common scenarios. All of them have generic versions.
Action is a delegate with return type void. The generic versions of Action specify the types of the arguments that the action takes.
For example, a function with return type void, without any arguments:
Action printHello = () => Console.WriteLine("Hello");
Please note the () => part. => is the lambda operator, and a lambda is a function defined on the fly; the empty parentheses mean it takes no arguments. If lambdas weren't supported, we would have to define our own method which takes no arguments and prints hello, and then assign that method to printHello.
Calling a delegate is the same as calling any other function: printHello().
Action can have up to 16 arguments. They are specified with generic parameters like this:
Action<string> print = (text) => Console.WriteLine(text);
Calling print("Hello") will print Hello. Please note the (text) => part. That's how you specify an argument to a lambda. The compiler infers the type of text (string) from the generic type argument in the declaration.
Func is a delegate with a return type. The return type is specified through a generic type argument, therefore Func, unlike Action, always has at least 1 generic type argument. The last generic type argument of a Func is its return type.
For example:
Func<int> get1 = () => 1;
Calling get1() will return 1.
Another example of a Func could be sorting. Sorting can be done with many different algorithms, but the result is always of the same type as the input. Therefore:
Func<IEnumerable<int>, IEnumerable<int>> sortNumbers = (unsorted) =>
{
    // any sorting algorithm could go here; List<int>.Sort is used for brevity
    var sorted = new List<int>(unsorted);
    sorted.Sort();
    return sorted;
};
With {} a lambda can have more than a single line. Calling sortNumbers(unsorted) will return a sorted IEnumerable of numbers.
Predicate is the final kind of premade delegate. It's like a Func that takes exactly 1 argument and returns bool. For example:
Predicate<int> alwaysTrue = (number) => true;
Calling alwaysTrue(42) will return true.
The generic type argument specifies the type of the argument that goes into the Predicate.
A good example of that is filtering:
Predicate<Person> isAdult = (person) => person.Bday.AddYears(18) <= DateTime.Today;
Calling isAdult(tom) will return true or false based on his age.
Delegates are similar to interfaces: they add a layer of abstraction, they don't contain an implementation themselves, and you can swap them out at will. They are, however, a bit more complex to read due to the nature of lambdas. For simple operations, prefer them over an interface.
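A runnable sketch tying the 3 premade delegates together, assuming a top-level program. Action and Func are called directly; Predicate<int> is passed to List<T>.FindAll and List<T>.Exists, two standard List methods that take a predicate. Reassigning sorter at the end illustrates the "swap it out like an interface" idea; all variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;

// Action<string>: takes a string, returns nothing.
Action<string> print = text => Console.WriteLine(text);

// Func<int, int>: takes an int and returns an int (the last type argument is the return type).
Func<int, int> square = number => number * number;

// Predicate<int>: takes an int and returns bool.
Predicate<int> isEven = number => number % 2 == 0;

var numbers = new List<int> { 1, 2, 3, 4 };

print($"square of 3 is {square(3)}");     // prints "square of 3 is 9"
List<int> evens = numbers.FindAll(isEven); // [2, 4]
bool anyEven = numbers.Exists(isEven);     // true

// A delegate can be swapped for another one, much like an interface implementation.
Func<IEnumerable<int>, List<int>> sorter = items =>
{
    var sorted = new List<int>(items);
    sorted.Sort(); // ascending
    return sorted;
};
List<int> ascending = sorter(new List<int> { 3, 1, 2 });

sorter = items =>
{
    var sorted = new List<int>(items);
    sorted.Sort((a, b) => b.CompareTo(a)); // descending, via a Comparison<int> lambda
    return sorted;
};
List<int> descending = sorter(new List<int> { 3, 1, 2 });
```

Code that receives sorter does not need to change when the lambda behind it does, which is the same benefit an interface would give with less ceremony.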
Many foreach loop operations are repetitive: filtering, counting, finding the first element with some property... Like most redundant things in C#, they are reduced to a minimum. LINQ (Language Integrated Query) solves this problem: it provides a set of methods that simplify such loop-based operations.
If you understand lambdas, you understand LINQ. LINQ methods usually take a lambda as an argument and then apply that lambda to every member of a collection.
Let's illustrate LINQ using this set of numbers:
int[] numbers = {1, 2, 1, 3, 1};
The most typical LINQ scenario is filtering. This is done using the Where method.
For example, find all numbers which are equal to 1:
var numbersEqualTo1 = numbers.Where(number => number == 1);
It's worth emphasizing that LINQ methods which produce a sequence almost always return IEnumerable. Settling on just 1 type for all the different operations on a collection simplifies things considerably: after all, it's easier to understand 1 type than 2 or more.
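Because Where returns an IEnumerable, the filtering lambda does not run when Where is called; it runs only as the result is iterated, the same on-demand behavior as yield return. The sketch below logs every lambda call and every loop step to make that visible; the log list exists purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] numbers = { 1, 2, 1, 3, 1 };
var log = new List<string>();

// Where only describes the filter here; the lambda has not run yet.
IEnumerable<int> ones = numbers.Where(number =>
{
    log.Add($"checking {number}");
    return number == 1;
});
int loggedBeforeLoop = log.Count; // 0: nothing has been evaluated so far

// Iterating runs the lambda one element at a time, interleaved with the loop body.
foreach (var one in ones)
{
    log.Add($"got {one}");
}

Console.WriteLine(string.Join(" | ", log));
// prints "checking 1 | got 1 | checking 2 | checking 1 | got 1 | checking 3 | checking 1 | got 1"
```

A practical consequence: iterating the same query twice runs the lambda twice. Calling ToList() or ToArray() evaluates it once and stores the results.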
There are cases when we want to get an object from a collection, but not finding one is fine too. FirstOrDefault comes to the rescue. It returns the first item in the collection for which the lambda returns true, or the default value of the item type (0 for int, null for reference types) if there is no such item.
For example, let's look for 4 in our numbers collection:
var number4 = numbers.FirstOrDefault(number => number == 4);
This will return 0, the default value for int, because 4 does not exist in the collection.
Any is used to check whether any items in a collection match some predicate.
var any4s = numbers.Any(number => number == 4);
This will return false, because there are no 4s in the collection.
Count is used to count how many items in a collection meet a given condition.
For example, to find how many 1s there are, you can write:
var count1s = numbers.Count(number => number == 1);
This returns 3.
Sometimes we want to map every member of a collection to another value or to an object. This can be done using Select.
For example, let's return every number squared:
var squaredNumbers = numbers.Select(number => number * number);
The last, yet very common, scenario is sorting. Use OrderBy when you want to sort items in a collection by some property.
For example, we can sort the numbers collection like this:
var sortedNumbersInAsc = numbers.OrderBy(n => n);
If you want to sort in reverse order, use OrderByDescending:
var sortedNumbersInDesc = numbers.OrderByDescending(n => n);
Please note that for a non-primitive, for example an object, the lambda selects the property to sort by. For example, sorting people by their birthdays:
var peopleSortedByBday = people.OrderBy(p => p.Bday);
A last note: in some lambdas we use a single letter, in others a full word. Either works, because the lambda argument is just a placeholder name. However, staying consistent works best, so choose one approach. A single letter is also fine when the full argument name would be long or when the lambda is crystal clear (simple).
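All of the LINQ methods above in one runnable sketch, assuming a top-level program. ToArray() is called to evaluate each query so the results can be inspected, and named tuples stand in for the Person class to keep the example self-contained:

```csharp
using System;
using System.Linq;

int[] numbers = { 1, 2, 1, 3, 1 };

int[] onesOnly = numbers.Where(number => number == 1).ToArray();     // [1, 1, 1]
int number4 = numbers.FirstOrDefault(number => number == 4);         // 0, the default for int
bool any4s = numbers.Any(number => number == 4);                     // false
int count1s = numbers.Count(number => number == 1);                  // 3
int[] squared = numbers.Select(number => number * number).ToArray(); // [1, 4, 1, 9, 1]
int[] ascending = numbers.OrderBy(n => n).ToArray();                 // [1, 1, 1, 2, 3]
int[] descending = numbers.OrderByDescending(n => n).ToArray();      // [3, 2, 1, 1, 1]

// Sorting objects by a property (tuples stand in for Person here).
var people = new[]
{
    (Name: "Jane", Bday: new DateTime(2000, 1, 20)),
    (Name: "Tom", Bday: new DateTime(1997, 7, 11))
};
string[] namesByBday = people.OrderBy(p => p.Bday).Select(p => p.Name).ToArray(); // ["Tom", "Jane"]

Console.WriteLine(string.Join(", ", squared)); // prints "1, 4, 1, 9, 1"
```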
An extension method is a special kind of method which allows adding behavior to an existing type and calling it as if it belonged there.
An extension method is a static method whose first parameter is decorated with the this keyword. Also, all extension methods must be in a static class.
Previously, we wrote a static method to return all elements in a collection as a string. It can be converted to an extension method:
public static class CollectionExtensions
{
    public static string ToStringMany<T>(this IEnumerable<T> items)
    {
        var sb = new StringBuilder();
        foreach (var item in items)
        {
            sb.AppendLine(item.ToString());
        }
        return sb.ToString();
    }
}
Nothing changed in this method other than the this keyword and the fact that it now lives inside a static class. The benefit: we can now convert any collection to a string and get all of its items converted, rather than the default ToString(), which only returns the name of the collection's type. Calling the extension method also becomes more direct: numbers.ToStringMany().
We managed to write an extension method and apply it to every collection. Doesn't it remind you of something? That's right, it should look awfully familiar: it's how LINQ works.
LINQ is built on top of extension methods. Every LINQ method is just an extension method on IEnumerable<T>.
Since most LINQ methods return IEnumerable<T>, you can fluently chain most of them.
For example:
numbers
    .Where(number => number % 2 == 0)
    .Select(number => number * number)
    .Print(); // let's assume this extension method exists
This code keeps only the even numbers, squares them and prints the results. Fluent, elegant, readable, concise: that's LINQ for you :)
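The chain above assumes a Print extension method. One possible version is sketched below; the name Print and its behavior (joining the items with commas, writing them on one line and returning that line) are our own assumptions, not part of LINQ. In a file with top-level statements, the static class has to come after the statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] numbers = { 1, 2, 3, 4, 5, 6 };

// Keep the even numbers, square them, then print them with our own extension method.
string printed = numbers
    .Where(number => number % 2 == 0)
    .Select(number => number * number)
    .Print(); // prints "4, 16, 36"

// Extension methods must live in a non-generic, non-nested static class.
public static class CollectionExtensions
{
    // Hypothetical Print: writes the items on one line and also returns that line.
    public static string Print<T>(this IEnumerable<T> items)
    {
        var line = string.Join(", ", items);
        Console.WriteLine(line);
        return line;
    }
}
```

Because Print is defined on IEnumerable<T>, it works at the end of any LINQ chain, for any item type.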
When should you write an extension method? Ask yourself: are you implementing business logic or something more general? Business logic often does not fit as part of an existing type. However, if it is a general extension, a feature that you always wanted a type to have, go ahead and make an extension method for it.
- Good: list.Sort(lambda), list.Print(), list.Average(lambda)
- Bad: list.GetCustomerXRatio(), person.Move(), ...
TBD
LINQ can be written in two ways. So far, all the examples used method syntax. It's worth exploring the alternative way of writing LINQ: query syntax.
Both compile to the same thing. This GitHub repository compares the two.
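For a quick side-by-side, here is the same filter-and-square operation written in both syntaxes, as a minimal top-level sketch. The compiler translates the query syntax into the same Where and Select calls, so both produce the same sequence:

```csharp
using System;
using System.Linq;

int[] numbers = { 1, 2, 1, 3, 1 };

// Method syntax: chained extension methods taking lambdas.
var methodSyntax = numbers
    .Where(number => number > 1)
    .Select(number => number * number);

// Query syntax: SQL-like keywords, translated by the compiler into Where/Select calls.
var querySyntax =
    from number in numbers
    where number > 1
    select number * number;

Console.WriteLine(string.Join(", ", methodSyntax)); // prints "4, 9"
Console.WriteLine(string.Join(", ", querySyntax));  // prints "4, 9"
```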
TBD
Refactor Homework 2 to use only LINQ and only in Extension methods.
- When should you prefer an array over a list?
- When should you use a dictionary?
- What's the purpose of IEnumerable?
- "You should always return IEnumerable on every method that returns a collection." What do you think?
- What's the difference between a polymorphic function and a generic function?
- What is a generic class?
- When to use generics?
- What's the difference between a delegate and a lambda?
- What are the 3 main delegates in C#?
- What does an interface and a delegate have in common?
- What is LINQ used for?
- Give at least 3 examples of a practical LINQ application
- How does LINQ work? What's the basis of it?
- Can you chain LINQ operations?
- What's the purpose of a yield statement and how is that related to IEnumerable?