Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NFR] Eager loading support of Phalcon\Mvc\Model #1117

Closed
hilyjiang opened this issue Aug 19, 2013 · 79 comments
Closed

[NFR] Eager loading support of Phalcon\Mvc\Model #1117

hilyjiang opened this issue Aug 19, 2013 · 79 comments

Comments

@hilyjiang
Copy link

hilyjiang commented Aug 19, 2013

Once we use model, we always define relationships between models.

In most cases, we need to implement a list view, displaying objects with its foreign key field's value.

This will cause a N+1 selects problem (http://use-the-index-luke.com/sql/join/nested-loops-join-n1-problem), slowing down the performance.

So I think it is necessary to implement the eager loading feature of Phalcon\Mvc\Model.

@joeyew
Copy link

joeyew commented Aug 28, 2013

+1

2 similar comments
@Raistlfiren
Copy link

+1

@ghost
Copy link

ghost commented Dec 4, 2013

+1

@ovr
Copy link
Contributor

ovr commented Dec 4, 2013

Your can use leftJoint at another ORM but in phalcon leftJoin relations isn`t selected to object only run in sql #1111?

@nazieb
Copy link

nazieb commented Apr 15, 2014

+1

1 similar comment
@kamote
Copy link

kamote commented Apr 20, 2014

+1

@olegrom32
Copy link

What's the status of this one? Or do I have to give up phalcon?..

@dcwq
Copy link

dcwq commented Sep 18, 2014

+1

2 similar comments
@mrthebob
Copy link

+1

@calinrada
Copy link

+1

@dexx086
Copy link

dexx086 commented Sep 22, 2014

+1

Can we expect anything to happen with it? It's quite a basic feature I think, it should be handled with high priority if I may categorize its severity.

But at least a tiny feedback would be good about it to know if we can expect it to be implented or should learn a different ORM. Because without this feature this ORM handles only separate entities, not entities with relations. Lazy loading is only a tiny help but doesn't care about data integrity and mostly not about performance. But on the other hand, Phalcon is so good, should make its ORM to be usable as well, big +1 vote for this feature.

@boedy
Copy link

boedy commented Sep 22, 2014

+1

1 similar comment
@AndreBaumeier
Copy link

+1

@boedy
Copy link

boedy commented Sep 24, 2014

Is there an alternative to solving this N+1 problem?

@AndreBaumeier
Copy link

I could think of creating stubs and populating explicitly joined data in those stubs, for other unavailable attributes I'd fall back to lazy loading/ load them once requested.

@tanagorns
Copy link

+1

5 similar comments
@giftymalik
Copy link

+1

@scarbo87
Copy link

scarbo87 commented Oct 7, 2014

+1

@titus-toia
Copy link

+1

@fatihkurt
Copy link

+1

@mapirelli
Copy link

+1

@fsieduc
Copy link

fsieduc commented Jan 19, 2015

+10

@noelmrnd
Copy link

noelmrnd commented Feb 5, 2015

+1

This is the only workaround I could find:

$rows = $this->modelsManager->createBuilder()
    ->from(array(
        'Persona'=>'Personas', // model: Personas, alias: Persona
        'Grupo'=>'Grupos' // model: Grupos, alias: Grupo
    ))
    ->where('Persona.grupo_id=Grupo.id')
    ->getQuery()
    ->execute();

foreach($rows as $row) {
    $persona = $row->Persona;
    $grupo = $row->Grupo;
}

@fsieduc
Copy link

fsieduc commented Feb 6, 2015

Many Thanx

@erickskrauch
Copy link

Is there at least some news regarding this functional? It really must have functional.

@Surt
Copy link
Contributor

Surt commented Apr 8, 2015

+1
phalcon really needs it

@erickskrauch
Copy link

2.0 is released. Please add this functionality, it is the only thing that spoils the impression from working with the framework. This problem for 2 years (in fact).

@erickskrauch
Copy link

@andresgutierrez, yes, it can also be done, but that would be correct if I do something:

$comments = Comments::query()->with(['author', 'author.skin'])->...;
foreach($comments as $comment) {
    echo $comment->text;
    echo $comment->author->nickname;
    echo $comment->author->skin->faceUrl;
}

At least in Yii2 it is applied such method than to access relations through StdClass.

@sszdh
Copy link

sszdh commented Aug 10, 2015

I think the age of using DEEP RELATIONs an also eager loadig is done!

We should think different. Redesign UXs an data manipulations...

Eager loading has so much disadvantages nowadays!

Sorry but that is my POV.

@andresgutierrez
Copy link
Contributor

@erickskrauch But, if you only need $comment->text why do you have to fetch the full Comments instance, the full Author instance? Are we writing high performance applications or are we wasting memory fetching and hydrating data that we aren't using at all?

@erickskrauch
Copy link

@andresgutierrez, I just show you example of real using eager loading in real project.

Many people are puzzled, using PhalconPHP, how to get one select for model and related data. Perhaps in PhalconPHP a different approach to it (as shown by you through PHQL), but no such examples in the documentation.

On the other hand Yii (Yii2), Laralev, etc. provide a method with() and the developers would like to have this in PhalconPHP. Otherwise would not have been much of an issue here on github.

@ghost
Copy link

ghost commented Aug 10, 2015

Eager loading in laravel and yii isn't real eager loading if you look at the database logs. I always see eloquent make two queries. One for the base model, then another for the related models via an IN(). What would really be nice in Phalcon is to have the same syntax, but that syntax be a join on a relationship. We would, of course need to specify columns, but I feel like this grants the best mix of utility and performance. We hydrate the result set as instances of the base model with the related columns attached via properties. I.e. I join do Robot->with("parts",["name"]) I would be able to reference that as a result by $robot->parts->name. That's a bad example because it's a many to one, and that's really the thought behind a one to one.

@Surt
Copy link
Contributor

Surt commented Aug 11, 2015

@andresgutierrez usually the eager loading is done through multiple queries, with the use of IN

Eager loading

$results = User::with('avatar', 'posts')->find_many();
//will use 3 querys to fetch the users and the relationships:

  SELECT * FROM user
  SELECT * FROM avatar WHERE user_id IN (....)
  SELECT * FROM posts WHERE user_id IN (....)

// so you already have all the data in place, ready to be used

  foreach($results as $result){
      echo $result->avatar->img;
      foreach($result->posts as $post){
         echo $post->title;
      }
  }

Working over the example: If your abstraction sets the indexes of the users resultset as the primary key of the row you just need to use array_keys to generate the second query. Voila, you have the first depth for each relationship with a simple array_keys

Of course, you can calculate the number of queries if I have 100 users with 100 avatars and 100 post each one. With eager loading there is only 3 queries. And there is no memory wasted, I need that data, I will show it on a Json or a web page.

The advantages are what I expect of programming: to go through already implemented abstractions so I don't need to rewrite it every time. In this case I prefer to define wich relationships I want to fetch and I want to fetch them easily, with the less queries to database and having them in memory to use them when I want.

Imagine if I want to get users with posts and from each post the tags related.... In phalcon I would need to write 3 foreachs to fetch the relationships. Not to mention deeper relations.
Lazy loading does not give me any advantages if for example I want to return a JSON from the last example. I want that data without the need of writing the foreachs, I want a clean interface to specify what I need and of course, I need it when I ask for it, I don't want to ask for it on each iteration.

@erickskrauch But, if you only need $comment->text why do you have to fetch the full Comments instance, the full Author instance? Are we writing high performance applications or are we wasting memory fetching and hydrating data that we aren't using at all?

Of course eager loading is to use it as another tool, it does not eliminate the need to think. In the case you expose maybe you don't need to use it. (but imagine that the comments entity does have any kind of domain logic or behaviour needed to get the text... then the use of eager will be an advantage.

On Web each time I need data I know exactly wich data I need, let me post a quote from here http://blog.bemycto.com/software-architecture/2015-05-17/doctrine-orm-not-suited-php/

Lazy Loading is a nonsense
In a stateless environment, Lazy Loading data is a bad practice. Nothing to say more. There may be some cases where it could be slightly more performant to use it, but there are edge cases.

Eloquent does it in a great way: http://laravel.com/docs/5.1/eloquent-relationships#eager-loading and if you read the documentation you can understand the need of any ORM to have it. Nested eager loading is where the abstraction really shines.

Of course, there is advantages on lazy loading for other purposes as background process with high memory consumption.

@andresgutierrez There is a great implementation here https://github.com/stibiumz/phalcon.eager-loading althought it still needs more developing and would be great to sync with phalcon to solve some drawbacks.

@sszdh disadvantages? I don't understand what are the disadvantages of making less database queries through an intelligent management of the database. Let me say that when I work with objects I hate to work with lower abstractions like arrays or database queries. (usually that's my problem with legazy code). The same paradigm I use when I need to work with sql or PHQL.... I don't want data, I want entities with behaviours and domain logic not arrays that I need to fix on each iteration.

P.D. sorry for the rush :p I'm really used to eager loading and I would like to see it on phalcon, because when you use it and understand it you know when to make use of it and you need it. It is just another tool and would be great to have it.

@erickskrauch
Copy link

@Surt, +1

@alrik11es
Copy link

@Surt, +1 (Clap, clap)

@valVk
Copy link

valVk commented Aug 11, 2015

CakePHP has params called "recursive" for this.

Correct me if I'm wrong.

does this code produces selects in db?

foreach ($this->Robots as $robot) {
    foreach ($robot as $parts) { // SELECT * FROM parts WHERE robot_id = $robot->id
         foreach ($parts as $part) {
               echo $part->name;
         }
    }
}

@andresgutierrez
Copy link
Contributor

IN queries

@Surt That implementation is only suitable for small/medium resultsets (maybe large depending on the values). If you fetch the parent records (in this case users), then you have to traverse this resultset to obtain the "unique" user IDs (found across few, dozens, hundred, thousands or records?). Then these IDs must to be bound to the IN operator to obtain the list. This means keep in memory the list of IDs, then create and send a large SQL statement to the database (ie. using 50 values):

SELECT `robots_parts`.`id`, `robots_parts`.`robots_id`, `robots_parts`.`parts_id` 
FROM `robots_parts` WHERE `robots_parts`.`robots_id` IN (:APR0, :APR1, :APR2, :APR3, :APR4, :APR5, :APR6, :APR7, :APR8, :APR9, :APR10, :APR11, :APR12, :APR13, :APR14, :APR15, :APR16, :APR17, :APR18, :APR19, :APR20, :APR21, :APR22, :APR23, :APR24, :APR25, :APR26, :APR27, :APR28, :APR29, :APR30, :APR31, :APR32, :APR33, :APR34, :APR35, :APR36, :APR37, :APR38, :APR39, :APR40, :APR41, :APR42, :APR43, :APR44, :APR45, :APR46, :APR47, :APR48, :APR49)

While I'm pretty sure this should be faster than individual SELECTs, some database systems start having problems with thousands of values passed in an IN operator:

A database like PostgreSQL always create a temporary table to store the values in the IN list to make the comparison using an inner join (which seems to be a good idea). A database like Oracle transform the values in OR comparisons and also it doesn't allow more than 1000 items in an IN clause (to fix it requires split the values into ORs of 1000 items each one).

This variation in behaviors in which large list of values in an IN operator is handled lead me to think
that the behavior on large resultsets could be unpredictable in terms of performance or functionality.

Caching

Regarding caching, eager loading always retrieves the values from the database as I see. There's no way to cache related records (unless we think up a way to do that) and retrieve data from the database engine instead of a cache backend (Redis, Memcached, APC, etc) can impact performance (or at least it loses the opportunity to improve it). For instance a customer has a location (id, city, country, etc) this is stored in a second table:

$robots = Customers::query()
                      ->with("location") // always hit the database
                      ->execute();
foreach ($customers as $customer) {
   // ...
}

Locations never change and they could be stored in a fastest cache backend reducing unnecessary access to relational databases. Only the parent record can be cached, if a caching policy is attached to the query it does not necessary apply to the related records.

Hydration

In eager loading, related records are fully fetched in memory, this means every request needs to allocate more memory:

$users = Users::query()
                      ->with("posts") 
                      ->execute();
foreach ($users as $user) { // fetch one user from the database + n posts
    foreach ($user->posts as $post) { 
       // ...       
    }
}

In the other hand using lazy loading we have just one user + one post at the same time in memory:

$users = Users::find();
foreach ($users as $user) { // fetch one user from the database
    foreach ($user->posts as $post) { // fetch one post from the database
       // ...
    }
}

Of course, this depends on the number of related records fetched in associated relationships. Just added this part to know if you are aware of this.

My conclusion

I can go ahead and implement this feature using IN queries however it will not be perfect and a solution to everyone's problem. As I mentioned before, IMHO, the best option is always fetch exactly what you need using PHQL or the QueryBuilder (currently available in Phalcon).

@Surt
Copy link
Contributor

Surt commented Aug 12, 2015

@valVk if I don't remember it wrong CakePHP fetch relationships with eager loading by default, unless you specify lazy loading, so, in your example it does not produce a query for each iteration. In the other hand it uses JOINS instead of IN when it's possible, but I think that it's really difficult to accomplish in terms of complexity.

@andresgutierrez

IN queries

Usually on web applications or APIs there is a need for pagination of results. If you make a query with thousands or records you probably are doing something wrong.
By the way, the transverse across the resultset to obtain the ID is the SAME transverse you do with lazy loading when you want to access to a relationship for the model. So no diference in time here. (except that you will end with just 1 query for each relationship instead of thousands.) All we are doing is to hide the foreachs as a framework implementation. Of course, at the cost of memory.

Hydration

In the other hand using lazy loading we have just one user + one post at the same time in memory

Most of the time, on web pages applications you will take those user + post and stash into a variable to use on the template, or for example, to write a JSON. So in the end your memory consumption is the same => all users with posts on a variable.
Of course you can transverse the models on the template but then probably you will end with domain logic inside templates.

On the other hand I'm really afraid when I look at the Database queries log and I see thousand of queries. I'm afraid of responsiveness, concurrency...

Actually, the https://github.com/stibiumz/phalcon.eager-loading (added to the phalcon incubator) is doing a great job, but it can be merged with phalcon to fully develop its potential. For example, if I don't remember it wrong, the new relationship with conditions on the definition does not work with the eager loading functionality)

@williamokano
Copy link

In my case I was writing a API using phalcon and using the "concept" of resources, which is a model on Phalcon.

On my approach, I had the following syntax to fetch a resource:

http://ENDPOINT/resource_name?parameters

As the resources have the same superclass (Phalcon\Model) I can write this to fetch the data dynamically using a generic code, something like that

public function handleResource($resourceName)
{
$resource = new $resourceName($someAdditionalParam);
$resource-> ...... (get the query from the parameters in url)
echo json_encode($resource->toList());
}

This is just an example and doesn't reflect actual phalcon code!

Now, my problem is: lets suppose I need to fetch the Books resource, but I only need the books that was written by George R. R. Martin, so my URL should be something like that

http://ENDPOINT/books?q=author.name=George%20R.%20.%20Martin

I know I could do something like Author::FindFirst('name=blablalba')->books, but I will have to change a lot of things on my resource handler and, in my humble opinion, looks semantically wrong to fetch Author to find the authors to only then fetch the book, since the straight forward idea is find the books from this author.

@andresgutierrez
Copy link
Contributor

@williamokano You can always use a join to fetch all the data in a single query

@nsossonko
Copy link
Contributor

+1 for this. My need for this is only for climbing the parent chain of a record. Eg:

$grandChild->$parent->$grandParent results in 3 SQL queries now:

(assuming the ids of each is 1)

  1. for the grandChild (SELECT * FROM GrandChild WHERE id = 1)
  2. for the parent (SELECT * FROM Parent WHERE id = 1)
  3. for the grandParent (SELECT * FROM GrandParent WHERE id = 1)

I have a need to check the entire chain sometimes, not just the grandParent or grandChild, but the parent as well. I would much rather do this once as follows:

SELECT grandChild.*, parent.*, grandParent.*
FROM GrandChild as grandChild
JOIN Parent as parent
  on parent.id = grandChild.parent_id
JOIN GrandParent as grandParent
  on grandParent.id = parent.grandParent_id
WHERE grandChild.id = 1

Obviously all an abstraction, but that's the general idea. If I run that now with PHQL it will return a Row instead of a ResultSet or Model with the relations set.

@mzf
Copy link

mzf commented Feb 4, 2016

+1 for this feature from box

@borisdelev
Copy link
Contributor

+1

1 similar comment
@tmihalik
Copy link
Contributor

+1

@sergeyklay
Copy link
Contributor

👎
Eager Loading is a really bad idea popularized by frameworks that do not care about performance.

@Jurigag
Copy link
Contributor

Jurigag commented Feb 19, 2016

I agree.

@Surt
Copy link
Contributor

Surt commented Feb 21, 2016

Well another eager loading issue closed. Anyway the https://github.com/stibiumz/phalcon.eager-loading works well and we can work with zephir to a lower level implementation.
Actually, on web programming, persistence read-write are the most used functionality and it's the most looked tool (routers, request, responses,caché, etc are nowadays practically standardars (even more thanks to phpfig)
About performance.... It just means memory vs processing time... Making thousand of calls to database to prevent bad use of memory .. Well, is just one side of a coin.

@ghost
Copy link

ghost commented Feb 21, 2016

You can't just dismiss this out of hand Sergey. I don't disagree that most implementations of eager loading in frameworks are memory hogging abominations because, like stibiumz's implementation, they just pull in the entire model. But I assure you that with careful consideration to performance, an imementation could be done in Phalcon that is no more memory intensive or detrimental to performance that a query builder with joins.
Eager loading isn't a bad idea, it's an idea that is implemented poorly in most if not all of its current incarnations. Phalcon already redefined the performance that a framework can deliver, so why are we so unwilling to redefine the performance of the ORM?

@josefguenther
Copy link
Contributor

You could say relational databases in general are a bad idea. However, many of us are using SQL variants with success. We use relationships. We do not get the best performance, but we are not programming in ASM either. For us (probably 90% of phalcon users), eager loading makes sense.

Just a quick note. The current implementation of ResultSet makes it VERY difficult to do eager loading manually. Because of the way it does not store all results in memory, it is impossible to add in the eager loaded data to the result set array.

I myself use the stibiumz project linked to above but it is buggy and no longer in active development...

@Jurigag
Copy link
Contributor

Jurigag commented Feb 25, 2016

Just whats the problem with joins ?

@josefguenther
Copy link
Contributor

So far I have only been able to use joins as conditionals in queries, I have not been able to retrieve data from the joined table. Am I missing something?

@Jurigag
Copy link
Contributor

Jurigag commented Feb 25, 2016

I guess you are missing modelsManager ? In my project i normally retrieve data from some table and joined table in one query(instruction).

@nsossonko
Copy link
Contributor

Make sure you select the columns of the joined tables as well, by default they are not selected.

@josefguenther
Copy link
Contributor

Aha, I see. This returns the Complex resultset... This is possible but could be a lot of work.

I do not understand why eager loading is a bad idea. Hibernate (java) allows you to eager load with a separate SELECT or with a JOIN. I think that would be an excellent model to work off of. Just because it may negatively impact performance does not mean it would not be helpful.

@Green-Cat
Copy link
Contributor

Hibernate (java) allows you to eager load with a separate SELECT

You can already do that: $model->getRelated('othermodel');

@phalcon phalcon locked and limited conversation to collaborators Feb 26, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests