Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetch model from ID when filling results #55

Open
hipsterjazzbo opened this issue Jan 26, 2017 · 7 comments
Open

Fetch model from ID when filling results #55

hipsterjazzbo opened this issue Jan 26, 2017 · 7 comments

Comments

@hipsterjazzbo
Copy link
Contributor

At the moment, if I specify only certain fields to be indexed (with $searchable for example), only those fields will be filled in search results.

EloquentFiller::fill() should re-retrieve the models from the DB fresh, and use those to populate the results.

An example of such a modification:

public function fill(Model $model, Result $result)
{
    // Get all the _id values from the elastic results
    $ids = $result->hits()->pluck('_id')->toArray();

    // Re-fetch all those models in a single query
    $hits = $model->newQueryWithoutScopes()->whereIn($model->getQualifiedKeyName(), $ids)->get();

    // Populate the results
    $result->setHits($hits);
}

The reason I haven't made this a PR is because I can see that EloquentFiller is doing a number of other things besides just spinning up Model instances. If you could enumerate what else it's doing for me I am happy to do the work. I'm using this package quite extensively :)

@sleimanx2
Copy link
Owner

sleimanx2 commented Jan 26, 2017

Having the data fetched from the relational database will help with the consistency of the data however we might want to look at the drawbacks that this approach might cause.

1- at each search request we are calling both elasticsearch and rdb which is avoidable in the majority of the scenarios.

2- there are a lot of scenarios when you want to store a relationship value with the parent object in elastic search for instance a post tags. In those scenarios we won't be able eager load the relationships so we will face n+1 problem.

3- in a read only application where the data is stored by another application only in elasticsearch. In this case we won't be able to use the package because we will try to hit the rdb !

I think having something like that could help

$result = Book::search()->match('title','foo')->get();

$resultFresh = $result->hits()->fresh()->with('tags','author')->get();

what do you think ?

@hipsterjazzbo
Copy link
Contributor Author

Those are good points. My thoughts:

  1. IMHO, selecting from the DB isn't a big deal — Laravel itself does it all the time (there's even a trait for it), and selecting by ID is nice and fast.

  2. Regarding relationships, we absolutely could handle that if I understand you correctly. Given your example, if the raw elastic results include a tags property, we could simply use lazy eager loading:

    // ... looping through elastic result properties
    if ($property instanceof Relationship)
        // Eager load relationship from DB in a single query
        $model->load($property);
    }
  3. Again, IMHO, that fact the Plastic always returns Model instances implicitly links them to the DB. I believe the correct approach would be to fetch from the DB by default, and to continue to offer raw results on demand, via the direct access to the DSL that Plastic offers.

    In other words, I don't believe that read-only application example is valid, because the application would already have to have Model classes defined which would by definition break without a DB anyway.

Again, if you agree on these points, I'm more than happy to contribute a PR :)

@sleimanx2
Copy link
Owner

@hipsterjazzbo I think we are getting closer to a solution.

The lazy-eager loading idea could solve a lot of performance issues when re-fetching the document from the DB.

On the other side the performance difference can still be significant the SerializesModels that you mentioned is used in queues which totally makes sense in that context as for search requests where you need your app to be as interactive as possible you need to reduce unnecessary IO take advantage of the speed of ES.

Moreover, since we are going to hit the database another time why not give the user the ability to add additional query filters before executing the query?

What are your concerns for the proposed API?

As as side note the models dont break unless you try to persist so read-only is okey and convenient. I know a couple of projects where they use it as read only.

Finally the most important thing is that in production most probably you will have ES clustered as opposed to MYSQL where you might have only one node. if you force retrieval from DB you are discarding all the benefits of having a cluster (Load, Speed, Max Requests Number ...).

@hipsterjazzbo
Copy link
Contributor Author

I still feel like it wouldn't be an issue, in terms of hitting db. It'd be one fast select statement.

Also, the whole reason this came up is that my models are in fact broken — I customise the data I store in elastic for performance reasons. One of the things I don't store is a team_id column, which means I can no longer load that relationship for models that come from Plastic.

@nmkr
Copy link

nmkr commented Apr 4, 2017

+1 for an option to fetch fresh values..
i have a trait and a custom method for outputting an image from storage, it gets detected as realtionship and we cannot output the right format.

@hipsterjazzbo
Copy link
Contributor Author

I've been thinking about this more.

The core issue here is that hits() returns model instances that are not real model instances. If one customises the data that is stored in the index, instead of just using the default, the models are then broken.

I propose this:

  1. hits() returns elastic documents directly, Basically just the json_decode()-ed results.
  2. A new method on the results class, models(), which does similar to my suggestion above: Fetches the models on demand from the DB, in a single query.

This keeps the advantages of the current implementation, in that is stays fast, and doesn't force models to be selected, while also 1. making models available very easily if needed, and 2. not presenting models that don't actually relate to the database like they should.

Thoughts, @sleimanx2?

@hipsterjazzbo
Copy link
Contributor Author

Further thoughts:

Doing it this way also opens up the door for future, more advanced usage. For example:

In additon to hits() and models(), we could add something like query() which would return a Illuminate\Database\Eloquent\Query like I put above:

$model->newQueryWithoutScopes()->whereIn($model->getQualifiedKeyName(), $ids);

Which could then be built on if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants