Test final Monero POW cryptonight_v8 #1851
I didn't run a benchmark this time; I just started mining for a few minutes and checked the highest reported hashrate.
This is strange - I get 68.3 H/s with xmrig on this machine. It uses the same asm code, the only difference is that I used Visual Studio 2017 to compile xmr-stak and MSYS2 with GCC 8.2.0 to compile xmrig. |
xmrig showed identical performance this time. |
Again, xmrig showed a bit higher hashrate - 267.9 H/s. |
@SChernykh Thanks for your tests. A difference of a few hashes is possible, depending on which port of the test pool you are mining on. If you are lucky and find a lot of hashes, then the hash rate will go down by a few hashes. But let us wait for other results. Big thanks again for the asm code. |
I've also tested RX 560: hashrate numbers here are what was reported as highest when mining. I double checked it - performance is identical, I was mining v7 against v7 pool and v8 against v8 pool, all shares were accepted. I tried a few different configs, but I couldn't find faster settings for v7.
|
60 second benchmark on the same RX 560 with configs listed above:
I'm not sure what numbers to trust more. |
@MoneroCrusher @mobilepolice @kio3i0j9024vkoenio @Bathmat This is the final code for the next Monero PoW, it would be good if you tested it on everything you got and posted results here. Edit: everything except NVIDIA GPUs, CUDA version is not ready yet. |
CPU is i7-2600 non-K cache 8M
Also tried prefetch both ways, all three asm, in all cases the above was winner |
@Spudz76 it's better to test without anything running in background, right after reboot. Double hash asm code is not added to xmr-stak yet. |
Nvidia OpenCL causes severe lag on my computer (1080ti) and makes it unusable while mining. Previously, using CUDA with bfactor=12 and bsleep=100 caused no slowdown of the computer while doing word processing and basic videos. Is there a tweak that I'm missing here? Currently it's so laggy that it causes music to stutter, and even pressing "h" to show the hashrate isn't working properly. |
@plavirudar you could try lowering the intensity... might help |
@Bathmat I started off with intensity 896 (the recommended value) and it was hashing ~1000 H/s with ethlargement + 600 MHz mem OC (which is similar to its performance on cnh/cnv1); however, the computer was unusable. When I dropped intensity to 640, the computer was still unusable, and the hashrate fell to 800, which was lower than its performance on cnv2 CUDA. |
@Spudz76 Are you sure you're using the right CPU? I have an i7-2600 non-k and I'm getting 220 with Not sure what you meant by OS:XX, I tried mine on Ubuntu 16.04 with threads 0,1,2,3. |
OS: Kubuntu
|
@plavirudar Win7 and all sorts of garbage running in the background (my daily driver desktop box). Definitely not rebooting, let alone closing all these tabs. But I have others I can test that are only for mining (not an i7-2600 though); most of them are non-AES and Linux, so I was testing the applicable AES-capable stuff I have first. Also, this is about comparison from v7 to v8, not global competition. I ran so many passes and took the highest, which should account for background task variances. So whatever I have holding me back is doing the same thing to v8, and the delta still applies (and in this case v8 was faster). I don't mine on this box normally, but it is an extra test point (and has a GTX970 as well). I did tell Firefox to quit fiddling with the GPU for offloading, and took a few other easy avoidance measures (mostly to open more VRAM). |
Same Win7 box as above GTX970-4GB stock / driver profile max performance + P0
|
amd.txt config:
Performance is in-line with what I was expecting. Power consumption is from HWMonitor for CNv8-2. Power use for CNv7 was lower by 3-7 watts depending on the GPU, but again, this was expected. Turns out that changing Note: I did try a single thread test; however, hashrate was 7% slower than dual thread, and power consumption was the same as dual thread. |
OS: Ubuntu 16.04 Backend: CPU Only
Note that using SChernykh XMR-Stak-CPU latest code with all the same v8 changes and the optimized asm for 1x and 2x threads produces:
Miner config:
|
OS: Ubuntu 16.04 Backend: 8x Nvidia GTX 750 1GB CUDA
OpenCL
Losing almost half of the hash rate going from V7 to V8 is brutal. Below are the auto-generated GPU config files from v7 and v8 for the first two GPUs; the remaining six have the exact same settings as the second GPU. The first GPU has a display attached.
Miner config:
V7 CUDA Nvidia.txt configuration
V8 OpenCL AMD.txt Nvidia configuration
|
OS: Ubuntu 16.04 Backend: 8x Nvidia GTX 750 1GB CUDA
OpenCL
Going from V7 to V8 is now a little less brutal, but it is still a 27.6% lower hash rate than v7. Below are the GPU config files for v7 and v8. I have tweaked the v8 settings from the auto-generated ones to the best settings I could obtain through various changes and retesting. An unroll of 4 is the best; going to 8 reduces performance.
Miner config:
V7 CUDA Nvidia.txt configuration
V8 OpenCL AMD.txt Nvidia configuration
|
I can confirm these numbers. I got 73.6 H/s on CNv7 and 74.2 H/s on CNv8 with Core i5-3210M and these settings. Even though it has only 3 MB cache, second CPU thread helps a lot more when running CNv8. |
@kio3i0j9024vkoenio did you try |
Just tried "strided_index" : 0 and the results are exactly the same as with "strided_index" : 2:
cryptonight_v7 1x GTX 750 1GB: varies from 228 to 249 H/s for each card
cryptonight_v8 1x GTX 750 1GB: varies from 156 to 180 H/s for each card
EDIT: I have tried many other changes to the config file and the absolute best I can get with OpenCL is:
cryptonight_v8 1x GTX 750 1GB: varies from 158 to 181 H/s for each card
The final config is:
I hope that the CUDA version can be made available soon and I hope for better results with it. |
I have a Win7 rig with 3 Nvidia GPUs that is giving me issues with OpenCL... GPUs are one GTX970, and 2 GTX1050. If I run just 1 gpu, hashrates are about what I expect for OpenCL; however, if I try to run all 3, hashrate drops significantly and watching HWmonitor shows that GPU Utilization will only be 100% for 1 gpu at a time and it rotates between the gpus (thus causing the low hashrate). Does anyone know how to force each GPU to work simultaneously using OpenCL and Win7? Thoughts @Spudz76, @kio3i0j9024vkoenio? I've tried Googling, but my searches are coming up empty. Perhaps something in nvidia-smi? I've never really used nvidia-smi, so I'm not very familiar. EDIT: P.S. this rig works just fine on CNv7 and CUDA |
Everyone can now check the performance of the native CUDA backend. Please note that the default config for CUDA devices is completely different from the old configs. |
My 550 hashrates are good now, ~450 H/s on v8, but the Vega 56 + 64 hashrates are really low... ~1000 H/s |
I will do a Windows CUDA 10 build and try it with my 970GTX and the 411 driver on the test pool for a bit. Update: built with both CUDA patches (cudaFunction and volatileCUDA): |
what's 550's config? |
I use 2 threads, intensity 448, worksize 32, unroll 8 strided:mem/2:2 |
XMR-Stak + latest 2.8.1 xmrig-proxy working flawlessly together on pool & CNv2. Great work guys.
|
@MoneroCrusher thanks a lot. I will try it out when I get back home. |
For documentation: GTX1080 on Linux (not overclocked)
-> 536H/s before:
-> 515 H/s |
It is best to open a new issue so that we do not mix different topics.
But it looks like you must reduce your intensity because you are running out of memory.
|
Can someone please test #1898 on Windows (NVIDIA GPUs) with CUDA 8+ and/or CUDA 10 and give me feedback on whether everything is working? |
Win7 / driver 411.70 / CUDA10 / 970GTX works well @ 370H/s |
@kio3i0j9024vkoenio You've been clearing |
Even if it is in the dev branch, please open a new issue and point to the used checksum or provide the output of xmr-stak -V.
I cannot keep an overview if we discuss bugs and results in one issue.
Also provide a full log of the output, your OS, and your driver.
You have linked everything very well, but I cannot follow all the links and collect all the information.
|
Please try deleting the OpenCL cache and/or starting the miner with --noAMDCache.
|
I finally got CUDA (1320 H/s) close to the OpenCL result (1373 H/s) that I was getting before on the 8x GTX 750's. It took a lot of trial and error. This was the auto-generated config that produced only 816 H/s:
Changing "threads" : 4, "blocks" : 32 to "threads" : 32, "blocks" : 8 brought the hashes to 1169 H/s:
Finally changing "comp_mode" to false produced the best 1320 H/S:
So the V7 vs V8 numbers are as follows:
System: HP DL580 G7 with 4x Xeon E7-8837 processors and 8x Nvidia GTX 750's
OS: Ubuntu 16.04
Backend: CPU Only 4x Xeon E7-8837's
Backend: 8x Nvidia GTX 750 1GB Only CUDA
Complete System: HP DL580 G7 with 4x Xeon E7-8837 processors and 8x Nvidia GTX 750's
|
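The tuning steps reported above (816 H/s, then 1169 H/s, then 1320 H/s) can be put in relative terms with a short sketch. The hashrates are the ones quoted in the comment; the helper name `percent_gain` and the step labels are illustrative only, not part of xmr-stak.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Hashrates (H/s) quoted in the comment for 8x GTX 750 on CUDA.
steps = [
    ("auto-generated config", 816.0),
    ("threads 32, blocks 8", 1169.0),
    ("comp_mode false", 1320.0),
]

# Gain of each step over the previous one, then the overall gain.
for (prev_label, prev_rate), (label, rate) in zip(steps, steps[1:]):
    print(f"{prev_label} -> {label}: {percent_gain(prev_rate, rate):+.1f}%")
print(f"overall: {percent_gain(steps[0][1], steps[-1][1]):+.1f}%")
```

With these numbers, the threads/blocks change accounts for most of the improvement (about +43%), and the comp_mode change adds roughly another 13% on top.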
@kio3i0j9024vkoenio It looks like that with the current code, you must have |
I have found the problem using OpenCL on Nvidia. It turns out to be a brain fart on my part. During my testing both OpenCL and CUDA were running on the GPUs, and that caused the error CL_MEM_OBJECT_ALLOCATION_FAILURE when calling clEnqueueNDRangeKernel for kernel 0. This is the command line that I run now to only test OpenCL on Nvidia:
The --noNVIDIA option is needed to not run CUDA.
@psychocrypt - Maybe a check could be added to not allow CUDA and OpenCL to run on the same GPU at the same time. The reason that the older version, xmr-stak 2.4.7 0fef2cf from Sept 24th, worked was that I compiled that version without CUDA. Thanks for all the help. I will be deleting my posts above that had incorrect information.
This is the V8 config I am using for OpenCL on Nvidia GTX 750's
Backend: 8x Nvidia GTX 750 1GB Only CUDA
OpenCL
|
The issue I was having has been resolved here: #1851 (comment) This is the V8 config I am using for OpenCL on Nvidia GTX 750's
Using the `intensity * work_size * 2 < GPU memory in MB` rule, this config gives 352 * 8 * 2 = 5632, which is way higher than memory:848, and the above config works just fine.
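The arithmetic can be checked with a minimal sketch. The inequality is the rule of thumb quoted in this thread, not a documented xmr-stak limit, and the function name `fits_rule_of_thumb` is hypothetical.

```python
def fits_rule_of_thumb(intensity: int, work_size: int, memory_mb: int) -> bool:
    """Evaluate the rule quoted in the thread:
    intensity * work_size * 2 < GPU memory in MB."""
    return intensity * work_size * 2 < memory_mb


# Values from the comment: intensity 352, worksize 8, memory 848.
estimate = 352 * 8 * 2
print(estimate)                          # 5632
print(fits_rule_of_thumb(352, 8, 848))   # False
```

The rule rejects this config even though it is reported to run without problems, so the inequality is at best a conservative guideline.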
You are correct in that OpenCL is worse than CUDA.
Backend: 8x Nvidia GTX 750 1GB Only CUDA
OpenCL
|
It looks like CUDA has additional optimizations in the current version, as I can now use "threads" : 32, "blocks" : 12, whereas before I could not get that to produce good hash rates. So now V8 CUDA is producing 1694 H/s on the 8x Nvidia GTX 750's.
CUDA Nvidia.txt Config:
Backend: 8x Nvidia GTX 750 1GB Only CUDA
Complete System: HP DL580 G7 with 4x Xeon E7-8837 processors and 8x Nvidia GTX 750's
@psychocrypt - Thanks for all the hard work, as the V8 result of 91.3% vs V7 on pretty old hardware is very, very nice.
Xeon E7-8837: Introduction date | Apr 3, 2011, Microarchitecture | Westmere
It is amazing how hardware prices drop over time: the E7-8837 was $2280 at introduction, and I have been getting four of them for around $40, or $10 each.
Nvidia GTX 750: Introduction date | Feb 18, 2014 |
@kio3i0j9024vkoenio Try threads=8,12,16 and blocks=32, should be a bit faster. |
OS: Windows 10 RX Vega56
Miner config:
|
@onweer try worksize=16 or 32, unroll=8 or 16 in different combinations. Also try two threads per GPU. |
OS: Windows 10 Backend: CPU Only
Miner config: quad core with four 1x threads (one per core) |
Final results for my HP DL580 G7 with 4x E7-8837 Xeons and 8x Nvidia GTX 750's running on Ubuntu 16.04 and CUDA version 9.2.
V7 - xmr-stak 2.4.4 c0ab173
CUDA Nvidia.txt Config:
// gpu: GeForce GTX 750 architecture: 50
// gpu: GeForce GTX 750 architecture: 50
Backend: 8x Nvidia GTX 750 1GB Only CUDA
cryptonight_v7 8x GTX 750 1GB: 1896 H/s - 100%
Backend: CPU Only 4x Xeon E7-8837's
cryptonight_v7: 1637 H/s - 100%
Complete System: HP DL580 G7 with 4x Xeon E7-8837 processors and 8x Nvidia GTX 750's
cryptonight_v7 8x GTX 750 1GB: 3533 H/s - 100% |
Backend: 8x Nvidia GTX 750 1GB Only CUDA
T32:B12: gets 1713.2 H/s without any issues
T8,B32: gets 1494.4 H/s and produces this error:
T12,B32: gets 1749.5 H/s and produces this error:
T16:B32: doesn't even run:
T12,B32: For an additional 2% gain it doesn't seem worthwhile to try to tweak it to not produce errors. |
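As a rough check of the "additional 2%" figure, the threads/blocks results quoted above can be compared against the error-free T32:B12 setting. This is only a sketch; the variable names are illustrative and the hashrates are the ones reported in the comment.

```python
# Hashrates (H/s) quoted in the comment for 8x GTX 750 on CUDA.
baseline_label, baseline = "T32:B12", 1713.2
candidates = {"T8,B32": 1494.4, "T12,B32": 1749.5}

# Relative difference of each candidate against the error-free baseline.
relative = {
    label: (rate - baseline) / baseline * 100.0
    for label, rate in candidates.items()
}

for label, delta in relative.items():
    print(f"{label} vs {baseline_label}: {delta:+.1f}%")
```

This gives about +2.1% for T12,B32 and about -12.8% for T8,B32, which matches the conclusion that the small gain is not worth chasing while errors are being reported.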
If someone finds that the hash rate for cryptonight_v7 is too low with 2.5.0, please have a look at #1930 |
First test of CN/2 vs CN/1
Version: xmr-stak 2.4.7 c5f0505
Version: xmr-stak 2.5.0 9012512
Version: xmr-stak 2.5.0 9012512
Version: xmr-stak 2.5.0 9012512
amd.txt RX 550 - 2 x
Just before switching back to CN/1, I copied the report on CN/2, and noticed that "Highest" is close to what I'd expect on CN/1
|
Please, can you write how to run xmr-stak-cpu from Visual Studio 2017? |
And why not make it so that all parameters are written in config.txt? |
Monero is changing its POW in October 2018. Please test the implementation of the new algorithm against the test pool (http://killallasics.moneroworld.com/).
You can find the source code of xmr-stak in pull request #1850 or download the zipped source directly.
Please report here only the speed comparison between `cryptonight_v7` and `cryptonight_v8`. If you find any bugs, please report them in pull request #1850. Please also take the time to mine for a few minutes against the testnet pool to check that you do not get invalid results.
How to bench the system:
Please start the miner once with `./xmr-stak` to create `pools.txt` and all other config files. Change `cryptonight_v8` into `cryptonight_v7` to measure the performance of the current Monero POW. Please do not forget to remove the backend configs if you switch the algorithm, because `"strided_index" : 1` is not allowed for `cryptonight_v8`.
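The switch between the two algorithms can be scripted. The following is a minimal sketch, assuming the miner's working directory contains `pools.txt` and that the backend configs are named `cpu.txt`, `amd.txt` and `nvidia.txt` (amd.txt and Nvidia.txt are mentioned in this thread; the exact file names and the helper `switch_algorithm` are assumptions, not part of xmr-stak).

```python
from pathlib import Path

# Assumed backend config file names; adjust to what your miner generated.
BACKEND_CONFIGS = ("cpu.txt", "amd.txt", "nvidia.txt")


def switch_algorithm(workdir: Path,
                     old: str = "cryptonight_v8",
                     new: str = "cryptonight_v7") -> list:
    """Replace the algorithm name in pools.txt and delete the backend
    configs so the miner regenerates them on its next start.
    Returns the names of the files that were removed."""
    pools = workdir / "pools.txt"
    pools.write_text(pools.read_text().replace(old, new))

    removed = []
    for name in BACKEND_CONFIGS:
        path = workdir / name
        if path.exists():
            path.unlink()
            removed.append(name)
    return removed
```

Calling `switch_algorithm(Path("."))` before restarting the miner prepares a v7 run; swapping the `old` and `new` arguments switches back to v8. Deleting the backend configs discards any hand tuning, so keep a copy if you want to compare tuned settings later.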
CPU:
CUDA/AMD OpenCL:
CUDA is currently not supported. I am currently trying to get some performance out of it.
NVIDIA via OpenCL
Template for speed reporting: