Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better caching #421

Merged
merged 19 commits into from
Feb 8, 2021
Merged

Better caching #421

merged 19 commits into from
Feb 8, 2021

Conversation

tomaspavlin
Copy link
Contributor

@tomaspavlin tomaspavlin commented Feb 3, 2021

Dřív jsme při jakémkoli editu pacientů mazali všechny konfigurace, teď zavádíme hash pacientů a ukládáme ji do db s konfigurací. Při fetchováním předpočítaných matchingů kontrolujeme tuto hash. Konfigurace nemažeme.

Testing

Několik unittestů a pak jsem to testoval v UI. Změnil jsem nastavení donora nebo recipienta -> vypočítaný matching se mi nezobrazil, změnil jsem nastavení zpět -> matching se zobrazil.

Closes #394

@tomaspavlin tomaspavlin self-assigned this Feb 3, 2021
@tomaspavlin tomaspavlin mentioned this pull request Feb 3, 2021
Copy link
Member

@kubantjan kubantjan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vypada moc fajn, male komentare :)

db.session.commit()
return get_donor_from_donor_model(DonorModel.query.get(donor_update_dto.db_id))


def get_patients_hash(txm_event: TxmEvent) -> int:
donors = tuple(txm_event.active_donors_dict.values()) \
if txm_event.active_donors_dict is not None else None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tohle se nikdy nemuze stat, viz ten post init v txm eventu. V tom txm eventu to vlastne jde prepsat tak jak jsi zminoval jinde, pres to init=false, tady to dava dobry smysl.


# pylint: disable=too-many-branches, too-many-statements
# Many branches and statements are still readable here
def _update_hash(hash_, value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bylo by fajn tady dodat proc je to potreba delat takhle. Fakt nejde naimplementovat hashovani u tech objektu primo?

koukam jeste na https://stackoverflow.com/questions/52390576/how-can-i-make-a-python-dataclass-hashable

a uvedomuju si, ze nekde v kodu mame unsafe_hash = true, tam by asi bylo lepsi dat custom fci, at je to spravne safe a podobne by se to dalo dodelat i tady ne?

Copy link
Contributor Author

@tomaspavlin tomaspavlin Feb 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proč to takhle dělám popisuji tady v Teamsech. Hashování s hash() funkcema by šlo docílit pomocí přenastavení PYTHONHASHSEED, ale nepřipadá mi to jako good practise. Takhle navíc můžeme hashovat jen to, na čem může být závislý výsledek algoritmu, tedy si myslím, že se může hodit to mít udělané takhle ručně a na jednom místě.

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok a jeste je tu teda nebezpeci, ze kdyz se zmeni libovolny objekt, ktery se tady resi, tak ze bude potreba zmenit kod tady, na coz hrozi, ze zapomeneme,

neslo by to tak, ze bys tu funkci persistent hash napsal u tech objektu primo? pak clovek kdyz meni ten objekt, rovnou tam uvidi, tu has fci a je to takovy, ze se na to spis nezapomene.

@tomaspavlin
Copy link
Contributor Author

Zapracoval jsem připomínky @kubantjan


# pylint: disable=too-many-branches, too-many-statements
# Many branches and statements are still readable here
def _update_hash(hash_, value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok a jeste je tu teda nebezpeci, ze kdyz se zmeni libovolny objekt, ktery se tady resi, tak ze bude potreba zmenit kod tady, na coz hrozi, ze zapomeneme,

neslo by to tak, ze bys tu funkci persistent hash napsal u tech objektu primo? pak clovek kdyz meni ten objekt, rovnou tam uvidi, tu has fci a je to takovy, ze se na to spis nezapomene.

Copy link
Member

@kubantjan kubantjan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Znova koukam na ten kod co se tyka hashovani a uvedomil jsem si zas jednu jinou vec: kdyz clovek zmeni hashovani, tak zmeni vlastne vsechny historicky pocitani pacientu, takze se potencialne to cely muze rozbit. Takze zmena hash by mela bejt promyslena a asi znamenat promazani historicky databaze, nebo zarizeni, ze ten vysledny hash bude furt stejny.

Ta verze, kterou jsi ted delal je dobra v tom smyslu, ze mozna budeme upravovat objekt pacienta, ale muzeme se mozna snazit ponechat ten hash bez zmeny co to jde. All in all, furt bych tu fci dal primo k tem objektum, nebo tam dal ten assert, ale asi bych nakonec tu hash neprepisoval, ptze ty pozadavky na tohle hashovani a to normalni se fakt nekdy muzou lisit a je nebezpecny, ze clovek bude predpokladat standartni chovani hash a to nenastane.

Jinak bylo by moc fajn tohle mergnout co nejdriv a nasadit v IKEM, ptze oni tam chteji mit dalsi fazi testovani.Jak se k tomu dostanes? Pokud by ses k tomu v nedeli/pondeli dopol nedostaval, tak bych skoro mergnul tu aktualni verzi a pak iteroval, at zrovna tohleto mame ready.

_update_hash(hash_, value.donor_type)
_update_hash(hash_, value.active)
if isinstance(value, Recipient):
_update_hash(hash_, value.related_donor_db_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kdyz jsi mluvil o ty performance, ze to potencialne muze bejt pomalejsi, testoval jsi, jak dlouho tohle zabere pro 100 pacientu treba?

asi by bylo fajn to vyzkouset,

pokud byc to byl netrivialni cas, asi by pomohlo dat nejdriv vsechny ty hodnoty z treba patient parameters do kupy a volat na to jeden _update_has (ale pokud je to rychly, tak to resit neni treba..)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vyzkouším

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating persistent hash of 72 patients took about 0.014 seconds

Copy link
Member

@tomaskourim tomaskourim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Par otazek a par drobnosti.

@@ -35,7 +33,6 @@


def add_donor_recipient_pair(donor_recipient_pair_dto: DonorRecipientPairDTO, txm_event_db_id: int):
remove_configs_from_txm_event(txm_event_db_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

muze se rovnou odstranit i ta funkce, uz se nepouzije

@@ -22,6 +22,7 @@ class ConfigModel(db.Model):
id = db.Column(Integer, primary_key=True, autoincrement=True, nullable=False)
txm_event_id = db.Column(db.Integer, ForeignKey('txm_event.id', ondelete='CASCADE'), unique=False, nullable=False)
parameters = db.Column(db.JSON, unique=False, nullable=False)
patients_hash = db.Column(db.BigInteger, unique=False, nullable=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fakticky se tohle prida jako posledni sloupecek. Nevim jestli neni lepsi ho mit i zde jako posledni

@@ -64,4 +64,5 @@ def post(self, txm_event_id: int) -> str:
:configuration.max_matchings_to_show_to_viewer]
calculated_matchings_dto.show_not_all_matchings_found = False

calculated_matchings_dto.calculated_matchings = calculated_matchings_dto.calculated_matchings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to je identita, ne?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mažu

recipients = tuple(txm_event.active_recipients_dict.values())

start = time.time()
hash_ = initialize_persistent_hash()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ma to oznaceni _ zde nejaky vyznam? Jmeno treba persistent_hash by mi pripadalo lespi mozna. Pripadne bych upravil vsude.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hash je build-in name. persistent_hash mi připadá možná trochu dlouhé. To třeba při použití v update_persistant_hash(), kde je označení hash_ zcela jasné. Co myslíš?

hash_digest = get_hash_digest(hash_)
end = time.time()

logger.debug(f'Creating persistent hash of {len(donors) + len(recipients)} patients took {end - start} seconds')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trva to dlouho?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Netrvá, dám pryč

update_persistent_hash(hash_, Donor)
update_persistent_hash(hash_, self.related_recipient_db_id)
update_persistent_hash(hash_, self.donor_type)
update_persistent_hash(hash_, self.active)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kdyz delat hash pro config a txm event, tak beres jenom active donory. Ale myslim, ze by to nemelo hrat roli. Potencialne duplicita, ale podle me nevadi. (Jenom ze jsem si vsimnul)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ano, je to duplicita, ale připadá mi to takhle jistější a blbuvzdornější


@dataclass

@dataclass(init=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tenhle trik bych chtel vysvetlit :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protože definuji __init__ funkci ručně, nepotřebujeme ji nechat vytvářet automaticky. Použití init místo post_init je lepší v tom, že ty dva fieldy nemusí být optional

@@ -46,9 +57,13 @@ def __post_init__(self):
hla_types = [hla_type for hla_type in self.hla_types_list if hla_type.code]
self.hla_per_groups = _split_hla_types_to_groups(hla_types)

def update_persistent_hash(self, hash_):
update_persistent_hash(hash_, HLATyping)
update_persistent_hash(hash_, self.hla_per_groups)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a hla_types_list nechceme, protoze vlastne obsahuje stejne udaje, jenom neni roztrideny?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hlavně jestli se nepletu tak vstup do solveru je hla_per_groups a ne hla_types_list

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jop tak tak

@@ -24,107 +21,9 @@
'DR15'
]

DONORS = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jako proc ne, ale presouval bych tedy vsechno nebo nic. Takhle tu par konstant zustalo, par ne. Idealne aby tahle data byla na jednom miste, podle me.

hash_1 = get_patients_persistent_hash(txm_event_1)
self.assertEqual(hash_1, 1602329590066289936)

# changing event db id, event name does not change the hash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing event db id OR event name does not change the hash

Aby to bylo patrnejsi, chvili jsem dumal

@tomaskourim tomaskourim added the waiting for the author Waiting for author to respond label Feb 7, 2021
@@ -46,9 +57,13 @@ def __post_init__(self):
hla_types = [hla_type for hla_type in self.hla_types_list if hla_type.code]
self.hla_per_groups = _split_hla_types_to_groups(hla_types)

def update_persistent_hash(self, hash_):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jeste prosim jen typ hash_ napsat

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to moc dobře nejde. Typ je hashlib._Hash a jednoduše s ním anotovat nejde. Připadá mi, že nestojí za to se to snažit rozchodit python/typeshed#2928

Copy link
Member

@kubantjan kubantjan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jen jeste chybejici typ u has_ prosim doplnit typovani.

u value asi patri proste any.

@kubantjan kubantjan dismissed tomaskourim’s stale review February 8, 2021 16:25

I checked and everything is resolved

@kubantjan kubantjan merged commit f511b36 into master Feb 8, 2021
@kubantjan kubantjan deleted the 394_better_caching branch February 8, 2021 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting for the author Waiting for author to respond
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better caching
3 participants