-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better caching #421
Better caching #421
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vypada moc fajn, male komentare :)
db.session.commit() | ||
return get_donor_from_donor_model(DonorModel.query.get(donor_update_dto.db_id)) | ||
|
||
|
||
def get_patients_hash(txm_event: TxmEvent) -> int: | ||
donors = tuple(txm_event.active_donors_dict.values()) \ | ||
if txm_event.active_donors_dict is not None else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tohle se nikdy nemuze stat, viz ten post init v txm eventu. V tom txm eventu to vlastne jde prepsat tak jak jsi zminoval jinde, pres to init=false, tady to dava dobry smysl.
|
||
# pylint: disable=too-many-branches, too-many-statements | ||
# Many branches and statements are still readable here | ||
def _update_hash(hash_, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bylo by fajn tady dodat proc je to potreba delat takhle. Fakt nejde naimplementovat hashovani u tech objektu primo?
koukam jeste na https://stackoverflow.com/questions/52390576/how-can-i-make-a-python-dataclass-hashable
a uvedomuju si, ze nekde v kodu mame unsafe_hash = true, tam by asi bylo lepsi dat custom fci, at je to spravne safe a podobne by se to dalo dodelat i tady ne?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proč to takhle dělám popisuji tady v Teamsech. Hashování s hash()
funkcema by šlo docílit pomocí přenastavení PYTHONHASHSEED
, ale nepřipadá mi to jako good practise. Takhle navíc můžeme hashovat jen to, na čem může být závislý výsledek algoritmu, tedy si myslím, že se může hodit to mít udělané takhle ručně a na jednom místě.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok a jeste je tu teda nebezpeci, ze kdyz se zmeni libovolny objekt, ktery se tady resi, tak ze bude potreba zmenit kod tady, na coz hrozi, ze zapomeneme,
neslo by to tak, ze bys tu funkci persistent hash napsal u tech objektu primo? pak clovek kdyz meni ten objekt, rovnou tam uvidi, tu has fci a je to takovy, ze se na to spis nezapomene.
Zapracoval jsem připomínky @kubantjan |
|
||
# pylint: disable=too-many-branches, too-many-statements | ||
# Many branches and statements are still readable here | ||
def _update_hash(hash_, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok a jeste je tu teda nebezpeci, ze kdyz se zmeni libovolny objekt, ktery se tady resi, tak ze bude potreba zmenit kod tady, na coz hrozi, ze zapomeneme,
neslo by to tak, ze bys tu funkci persistent hash napsal u tech objektu primo? pak clovek kdyz meni ten objekt, rovnou tam uvidi, tu has fci a je to takovy, ze se na to spis nezapomene.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Znova koukam na ten kod co se tyka hashovani a uvedomil jsem si zas jednu jinou vec: kdyz clovek zmeni hashovani, tak zmeni vlastne vsechny historicky pocitani pacientu, takze se potencialne to cely muze rozbit. Takze zmena hash by mela bejt promyslena a asi znamenat promazani historicky databaze, nebo zarizeni, ze ten vysledny hash bude furt stejny.
Ta verze, kterou jsi ted delal je dobra v tom smyslu, ze mozna budeme upravovat objekt pacienta, ale muzeme se mozna snazit ponechat ten hash bez zmeny co to jde. All in all, furt bych tu fci dal primo k tem objektum, nebo tam dal ten assert, ale asi bych nakonec tu hash neprepisoval, ptze ty pozadavky na tohle hashovani a to normalni se fakt nekdy muzou lisit a je nebezpecny, ze clovek bude predpokladat standartni chovani hash a to nenastane.
Jinak bylo by moc fajn tohle mergnout co nejdriv a nasadit v IKEM, ptze oni tam chteji mit dalsi fazi testovani.Jak se k tomu dostanes? Pokud by ses k tomu v nedeli/pondeli dopol nedostaval, tak bych skoro mergnul tu aktualni verzi a pak iteroval, at zrovna tohleto mame ready.
_update_hash(hash_, value.donor_type) | ||
_update_hash(hash_, value.active) | ||
if isinstance(value, Recipient): | ||
_update_hash(hash_, value.related_donor_db_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kdyz jsi mluvil o ty performance, ze to potencialne muze bejt pomalejsi, testoval jsi, jak dlouho tohle zabere pro 100 pacientu treba?
asi by bylo fajn to vyzkouset,
pokud byc to byl netrivialni cas, asi by pomohlo dat nejdriv vsechny ty hodnoty z treba patient parameters do kupy a volat na to jeden _update_has (ale pokud je to rychly, tak to resit neni treba..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vyzkouším
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating persistent hash of 72 patients took about 0.014 seconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Par otazek a par drobnosti.
@@ -35,7 +33,6 @@ | |||
|
|||
|
|||
def add_donor_recipient_pair(donor_recipient_pair_dto: DonorRecipientPairDTO, txm_event_db_id: int): | |||
remove_configs_from_txm_event(txm_event_db_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
muze se rovnou odstranit i ta funkce, uz se nepouzije
@@ -22,6 +22,7 @@ class ConfigModel(db.Model): | |||
id = db.Column(Integer, primary_key=True, autoincrement=True, nullable=False) | |||
txm_event_id = db.Column(db.Integer, ForeignKey('txm_event.id', ondelete='CASCADE'), unique=False, nullable=False) | |||
parameters = db.Column(db.JSON, unique=False, nullable=False) | |||
patients_hash = db.Column(db.BigInteger, unique=False, nullable=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fakticky se tohle prida jako posledni sloupecek. Nevim jestli neni lepsi ho mit i zde jako posledni
txmatching/web/api/matching_api.py
Outdated
@@ -64,4 +64,5 @@ def post(self, txm_event_id: int) -> str: | |||
:configuration.max_matchings_to_show_to_viewer] | |||
calculated_matchings_dto.show_not_all_matchings_found = False | |||
|
|||
calculated_matchings_dto.calculated_matchings = calculated_matchings_dto.calculated_matchings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to je identita, ne?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mažu
recipients = tuple(txm_event.active_recipients_dict.values()) | ||
|
||
start = time.time() | ||
hash_ = initialize_persistent_hash() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ma to oznaceni _
zde nejaky vyznam? Jmeno treba persistent_hash
by mi pripadalo lespi mozna. Pripadne bych upravil vsude.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hash
je build-in name. persistent_hash
mi připadá možná trochu dlouhé. To třeba při použití v update_persistant_hash()
, kde je označení hash_
zcela jasné. Co myslíš?
hash_digest = get_hash_digest(hash_) | ||
end = time.time() | ||
|
||
logger.debug(f'Creating persistent hash of {len(donors) + len(recipients)} patients took {end - start} seconds') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trva to dlouho?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Netrvá, dám pryč
update_persistent_hash(hash_, Donor) | ||
update_persistent_hash(hash_, self.related_recipient_db_id) | ||
update_persistent_hash(hash_, self.donor_type) | ||
update_persistent_hash(hash_, self.active) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kdyz delat hash pro config a txm event, tak beres jenom active donory. Ale myslim, ze by to nemelo hrat roli. Potencialne duplicita, ale podle me nevadi. (Jenom ze jsem si vsimnul)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ano, je to duplicita, ale připadá mi to takhle jistější a blbuvzdornější
|
||
@dataclass | ||
|
||
@dataclass(init=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tenhle trik bych chtel vysvetlit :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protože definuji __init__
funkci ručně, nepotřebujeme ji nechat vytvářet automaticky. Použití init místo post_init je lepší v tom, že ty dva fieldy nemusí být optional
@@ -46,9 +57,13 @@ def __post_init__(self): | |||
hla_types = [hla_type for hla_type in self.hla_types_list if hla_type.code] | |||
self.hla_per_groups = _split_hla_types_to_groups(hla_types) | |||
|
|||
def update_persistent_hash(self, hash_): | |||
update_persistent_hash(hash_, HLATyping) | |||
update_persistent_hash(hash_, self.hla_per_groups) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a hla_types_list
nechceme, protoze vlastne obsahuje stejne udaje, jenom neni roztrideny?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hlavně jestli se nepletu tak vstup do solveru je hla_per_groups
a ne hla_types_list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jop tak tak
@@ -24,107 +21,9 @@ | |||
'DR15' | |||
] | |||
|
|||
DONORS = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jako proc ne, ale presouval bych tedy vsechno nebo nic. Takhle tu par konstant zustalo, par ne. Idealne aby tahle data byla na jednom miste, podle me.
hash_1 = get_patients_persistent_hash(txm_event_1) | ||
self.assertEqual(hash_1, 1602329590066289936) | ||
|
||
# changing event db id, event name does not change the hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing event db id OR event name does not change the hash
Aby to bylo patrnejsi, chvili jsem dumal
txmatching/patients/hla_model.py
Outdated
@@ -46,9 +57,13 @@ def __post_init__(self): | |||
hla_types = [hla_type for hla_type in self.hla_types_list if hla_type.code] | |||
self.hla_per_groups = _split_hla_types_to_groups(hla_types) | |||
|
|||
def update_persistent_hash(self, hash_): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jeste prosim jen typ hash_ napsat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to moc dobře nejde. Typ je hashlib._Hash
a jednoduše s ním anotovat nejde. Připadá mi, že nestojí za to se to snažit rozchodit python/typeshed#2928
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jen jeste chybejici typ u has_ prosim doplnit typovani.
u value asi patri proste any.
# Conflicts: # txmatching/web/api/matching_api.py
I checked and everything is resolved
Dřív jsme při jakémkoli editu pacientů mazali všechny konfigurace, teď zavádíme hash pacientů a ukládáme ji do db s konfigurací. Při fetchováním předpočítaných matchingů kontrolujeme tuto hash. Konfigurace nemažeme.
Testing
Několik unittestů a pak jsem to testoval v UI. Změnil jsem nastavení donora nebo recipienta -> vypočítaný matching se mi nezobrazil, změnil jsem nastavení zpět -> matching se zobrazil.
Closes #394