How can I get the log during training the Yolov5 model? #3457

gganduu · 2021-11-11T08:13:54Z

The PyTorch YOLOv5 training loss returns two values, one of which is used for logging, but the Analytics Zoo loss allows only one return value. How can I get the training log in that case?

yangw1234 · 2021-11-11T11:53:18Z

Hi @gganduu I do not understand your question.

Could you provide more details like code snippets?

gganduu · 2021-11-12T00:02:52Z

Yes, we have already sent the code by email (to Wang Yang and Qiu Xin).

yangw1234 · 2021-11-12T07:05:42Z

Synced offline; here is a brief summary.

Feature request:

Automatically print the loss and other metrics for every epoch or iteration when calling Estimator.fit.

Current workaround:

Fit one epoch at a time in a loop. For example, change

est = Estimator.from_pytorch(...)
est.fit(data, epochs=num_epochs)

to

est = Estimator.from_pytorch(...)
for i in range(num_epochs):
    # fit a single epoch and print the stats it returns
    result = est.fit(data, epochs=1)
    # optionally fetch the model here, e.g. to checkpoint it
    model = est.get_model()
    print(f"epoch {i}: {result}")
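
A minimal, self-contained sketch of the loop above, for readers who want to try the pattern without a cluster. `StubEstimator` is a hypothetical stand-in for the object returned by `Estimator.from_pytorch(...)`, and the dict returned by its `fit` is an assumed shape, not the library's documented return value. With the real estimator, log whatever `fit` actually returns.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("train")


class StubEstimator:
    """Hypothetical stand-in for the estimator; not the Analytics Zoo API."""

    def __init__(self):
        self._epochs_done = 0

    def fit(self, data, epochs=1):
        # Pretend the loss halves every epoch; the return shape is assumed.
        self._epochs_done += epochs
        return {
            "epoch": self._epochs_done,
            "train_loss": 1.0 / 2 ** self._epochs_done,
        }


def fit_with_logs(est, data, num_epochs):
    """Fit one epoch at a time and log the stats returned by each call."""
    history = []
    for i in range(num_epochs):
        stats = est.fit(data, epochs=1)
        logger.info("epoch %d: %s", i, stats)
        history.append(stats)
    return history


history = fit_with_logs(StubEstimator(), data=[1, 2, 3], num_epochs=3)
```

Keeping the per-epoch results in `history` makes it possible to plot or save the loss curve afterwards, at the cost of calling `fit` once per epoch.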

jason-dai · 2021-11-12T07:44:58Z

What should be the desired behavior?

yangw1234 · 2021-11-12T08:11:31Z

I think it is a trade-off between flexibility and usability.

Our current behavior is more flexible, like PyTorch: users can checkpoint at their preferred frequency and print logs in their desired format. The downside is that they have to write some code.

If we are targeting usability, I think it also makes sense to implement a fixed set of checkpointing and logging strategies and ask users to configure the one closest to their needs.

It is a judgment call.
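
To make the "fixed set of strategies" option concrete, here is a rough sketch in plain Python. Everything in it is hypothetical: the `LogEvery` enum, the `train` function, and its `log_every` and `checkpoint_every` parameters are illustrative names, not an existing or planned Estimator API, and the loss update is a stand-in for real training.

```python
from enum import Enum


class LogEvery(Enum):
    """Hypothetical fixed set of logging strategies."""

    NEVER = "never"
    EPOCH = "epoch"
    ITERATION = "iteration"


def train(num_epochs, iters_per_epoch, log_every=LogEvery.EPOCH,
          checkpoint_every=0):
    """Toy training loop; returns the log lines and checkpointed epochs."""
    logs, checkpoints = [], []
    loss = 1.0
    for epoch in range(1, num_epochs + 1):
        for it in range(1, iters_per_epoch + 1):
            loss *= 0.9  # stand-in for a real optimisation step
            if log_every is LogEvery.ITERATION:
                logs.append(f"epoch {epoch} iter {it}: loss={loss:.4f}")
        if log_every is LogEvery.EPOCH:
            logs.append(f"epoch {epoch}: loss={loss:.4f}")
        if checkpoint_every and epoch % checkpoint_every == 0:
            checkpoints.append(epoch)  # a real loop would save the model here
    return logs, checkpoints


logs, checkpoints = train(num_epochs=4, iters_per_epoch=2,
                          log_every=LogEvery.EPOCH, checkpoint_every=2)
print("\n".join(logs))
```

The user only picks a value; the format and frequency are decided by the library, which is exactly the usability gain and the flexibility loss described above.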

jason-dai · 2021-11-12T08:24:57Z

What's the Keras behavior?

yangw1234 · 2021-11-15T01:30:46Z

Keras automatically outputs the training loss, metrics, and speed. Model checkpointing can be configured using the ModelCheckpoint callback.
Should we stick to the Keras behavior?
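
To illustrate the Keras-style split between automatic logging and opt-in checkpointing, here is a sketch in plain Python. It does not use Keras itself: `Callback`, `ProgressLogger`, `CheckpointEveryN`, and `fit` are hypothetical names that only mimic the pattern of a default logger plus user-supplied callbacks.

```python
class Callback:
    """Hypothetical hook interface, loosely modelled on Keras callbacks."""

    def on_epoch_end(self, epoch, logs):
        pass


class ProgressLogger(Callback):
    """Always installed by fit(), so the loss is printed without user code."""

    def __init__(self):
        self.lines = []

    def on_epoch_end(self, epoch, logs):
        stats = ", ".join(f"{k}={v:.4f}" for k, v in logs.items())
        line = f"Epoch {epoch}: {stats}"
        self.lines.append(line)
        print(line)


class CheckpointEveryN(Callback):
    """Opt-in checkpointing; records epochs instead of writing files."""

    def __init__(self, n):
        self.n = n
        self.saved_epochs = []

    def on_epoch_end(self, epoch, logs):
        if epoch % self.n == 0:
            self.saved_epochs.append(epoch)


def fit(num_epochs, callbacks=()):
    """Toy fit(): runs fake epochs, then notifies the logger and callbacks."""
    logger = ProgressLogger()
    loss = 1.0
    for epoch in range(1, num_epochs + 1):
        loss /= 2  # stand-in for one epoch of training
        for cb in (logger, *callbacks):
            cb.on_epoch_end(epoch, {"loss": loss})
    return logger


ckpt = CheckpointEveryN(n=2)
progress = fit(num_epochs=4, callbacks=[ckpt])
```

Logging happens by default, while checkpointing is added only when the caller passes a callback, which matches the behavior described for Keras in this comment.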

jason-dai · 2021-11-15T01:48:44Z

Either Keras or PyTorch Lightning style?

jason-dai · 2021-11-25T12:06:06Z

Do you need to backport to AZ?

yangw1234 · 2021-11-25T12:43:06Z

I'll backport it. This issue was closed automatically by GitHub.

helenlly added the user issue label Nov 12, 2021

helenlly assigned yangw1234 Nov 12, 2021

yangw1234 mentioned this issue Nov 23, 2021

add stats log after each epoch for pytorch ray and pyspark #3552

Merged

yangw1234 closed this as completed in #3552 Nov 25, 2021

jason-dai reopened this Nov 25, 2021
