Skip to content
This repository was archived by the owner on Aug 2, 2023. It is now read-only.

ReadableBuffer ctor tweaks #1227

Closed
wants to merge 9 commits into from

Conversation

benaadams
Copy link
Member

@benaadams benaadams commented Feb 18, 2017

No point in doing extra work for an out param that is never used.

out ReadCursor cursor variants of TryGetBuffer aren't used anymore but haven't removed them

Unless they were meant to be called with a param that was later used?

/cc @davidfowl

}
start.TryGetBuffer(end, out _first, out start);
_length = -1;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know if it would; but setting value after function call prevents a tail jump; as it needs to return to set the variable. So set before the call as its uneffected by the result.

var segment = _segment;
var index = _index;

// Determine if we might attempt to copy data from segment.Next before
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment isn't quite true anymore. Pipelines takes a snapshot of start/end before these methods navigate the linked list.

@@ -57,12 +57,13 @@ internal ReadableBuffer(ReadCursor start, ReadCursor end)
{
_start = start;
_end = end;
_length = -1;

if (!end.IsEnd && !end.GreaterOrEqual(start))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should get rid of this bounds check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made it unhappy, can take it out in later PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@benaadams
Copy link
Member Author

Not hot path but added an overflow avoidance and over checking params

@benaadams
Copy link
Member Author

@jkotas with Memory<byte>.Empty it is

                data = Memory<byte>.Empty;
00007FFD9C2CA78D  mov         rcx,7FFD9C272DB0h  
00007FFD9C2CA797  mov         edx,1  
00007FFD9C2CA79C  call        00007FFDFBB53BB0  
00007FFD9C2CA7A1  mov         rdx,12E10007140h  
00007FFD9C2CA7AB  mov         rdx,qword ptr [rdx]  
00007FFD9C2CA7AE  mov         ecx,dword ptr [rdx]  
00007FFD9C2CA7B0  mov         edi,dword ptr [rdx+1Ch]  
00007FFD9C2CA7B3  mov         ebx,dword ptr [rdx+24h]  
00007FFD9C2CA7B6  xor         ebp,ebp  
00007FFD9C2CA7B8  lea         rcx,[rsi]  
00007FFD9C2CA7BB  call        00007FFDFBB7C610  
00007FFD9C2CA7C0  mov         dword ptr [rsi+8],ebx  
00007FFD9C2CA7C3  mov         dword ptr [rsi+0Ch],ebp  
00007FFD9C2CA7C6  mov         dword ptr [rsi+10h],edi  
                return false;
00007FFD9C2CA7C9  xor         eax,eax  
00007FFD9C2CA7CB  add         rsp,38h  

With _emptyMemory it is

                data = _emptyMemory;
00007FFD9C2AA565  mov         rcx,7FFD9C23F698h  
00007FFD9C2AA56F  mov         edx,25h  
00007FFD9C2AA574  call        00007FFDFBB7E9A0  
00007FFD9C2AA579  mov         rax,1F4A71261D8h  
00007FFD9C2AA583  mov         rax,qword ptr [rax]  
00007FFD9C2AA586  lea         rsi,[rax+8]  
00007FFD9C2AA58A  call        00007FFDFBB7C730  
00007FFD9C2AA58F  movs        qword ptr [rdi],qword ptr [rsi]  
00007FFD9C2AA591  movs        qword ptr [rdi],qword ptr [rsi]  
                return false;
00007FFD9C2AA593  xor         eax,eax  
00007FFD9C2AA595  add         rsp,48h  

@benaadams
Copy link
Member Author

probably should turn

public static Memory<T> Empty => OwnerEmptyMemory<T>.Shared.Memory;

into a readonly static instead

@jkotas
Copy link
Member

jkotas commented Feb 19, 2017

with Memory.Empty it is
with _emptyMemory it is

There are two calls to JIT helpers in both cases. I would expect that these calls will dominate the performance. Do you see measurable perf difference between the two variants?

If there is a measurable difference, I would expect it to be caused by Memory<T> and friends being generic. Generic statics are slower than non-generic statics.

turn public static Memory<T> Empty => OwnerEmptyMemory<T>.Shared.Memory; into a readonly static instead.

I do not think it is a good idea. Properties are better - more flexible.

4 field structs will always result into somewhat bloated code, and the JIT optimization trade-offs make it even worse. Local micro-optimizations won't turn Memory<T> into a super fast type.

@benaadams
Copy link
Member Author

benaadams commented Feb 19, 2017

There are two calls to JIT helpers in both cases.

Looks like static base initialise and assign reference? Can get rid of the static base.

double

00007FFD9C2AA58F  movs        qword ptr [rdi],qword ptr [rsi]  
00007FFD9C2AA591  movs        qword ptr [rdi],qword ptr [rsi]  

is a bit weird

@mjp41
Copy link
Member

mjp41 commented Feb 19, 2017

I would guess, the 'movs' comes from a struct copy.

It will increase 'rsi' and 'rdi' both by 8, after performing the load from the address in 'rsi' and storing the result of the load to the address in 'rdi'.

@benaadams
Copy link
Member Author

benaadams commented Feb 19, 2017

@mjp41 it doesn't adjust rsi or rdi between the moves; I think you are right its struct copy. I think it maybe be coalescing two dword movs into a single qword movs; but still doing them twice

            var segment = _segment;
00007FFD9C2CA891  push        rsi  
00007FFD9C2CA892  push        rbp  
00007FFD9C2CA893  push        rbx  
00007FFD9C2CA894  sub         rsp,38h  
00007FFD9C2CA898  xor         eax,eax  
00007FFD9C2CA89A  mov         qword ptr [rsp+28h],rax  
00007FFD9C2CA89F  mov         qword ptr [rsp+30h],rax  
00007FFD9C2CA8A4  mov         rbx,r8  
00007FFD9C2CA8A7  mov         rax,qword ptr [rcx]  
00007FFD9C2CA8AA  cmp         qword ptr [rdx],rax  
00007FFD9C2CA8AD  jne         00007FFD9C2CA917  
            {
                var index = _index;
00007FFD9C2CA8AF  mov         ecx,dword ptr [rcx+8]  
00007FFD9C2CA8B2  mov         edx,dword ptr [rdx+8]  
00007FFD9C2CA8B5  sub         edx,ecx  
                if (segment != null && following > 0)
00007FFD9C2CA8B7  test        rax,rax  
00007FFD9C2CA8BA  je          00007FFD9C2CA8E7  
00007FFD9C2CA8BC  test        edx,edx  
00007FFD9C2CA8BE  jle         00007FFD9C2CA8E7  
                {
                    data = segment.ReadOnlyMemory.Slice(index, following);
00007FFD9C2CA8C0  add         rax,28h  
00007FFD9C2CA8C4  mov         r8,qword ptr [rax]  
00007FFD9C2CA8C7  mov         esi,dword ptr [rax+8]  
00007FFD9C2CA8CA  mov         edi,ecx  
00007FFD9C2CA8CC  add         edi,dword ptr [rax+0Ch]  
00007FFD9C2CA8CF  mov         ebp,edx  
00007FFD9C2CA8D1  lea         rcx,[rbx]  
00007FFD9C2CA8D4  mov         rdx,r8  
00007FFD9C2CA8D7  call        00007FFDFBB7C610  
00007FFD9C2CA8DC  mov         dword ptr [rbx+8],esi  
00007FFD9C2CA8DF  mov         dword ptr [rbx+0Ch],edi  
00007FFD9C2CA8E2  mov         dword ptr [rbx+10h],ebp  
00007FFD9C2CA8E5  jmp         00007FFD9C2CA904  
                else
                {
                    data = EmptyByteMemory.ReadOnlyEmpty;
00007FFD9C2CA8E7  mov         rax,1FF5C2F61A8h  
00007FFD9C2CA8F1  mov         rax,qword ptr [rax]  
00007FFD9C2CA8F4  lea         rsi,[rax+8]  
00007FFD9C2CA8F8  mov         rdi,rbx  
00007FFD9C2CA8FB  call        00007FFDFBB7C730  
00007FFD9C2CA900  movs        qword ptr [rdi],qword ptr [rsi]  
00007FFD9C2CA902  movs        qword ptr [rdi],qword ptr [rsi]  
00007FFD9C2CA904  cmp         dword ptr [rbx+10h],0  
00007FFD9C2CA908  setne       al  
00007FFD9C2CA90B  movzx       eax,al  
00007FFD9C2CA90E  add         rsp,38h  
00007FFD9C2CA912  pop         rbx  
00007FFD9C2CA913  pop         rbp  
00007FFD9C2CA914  pop         rsi  
00007FFD9C2CA915  pop         rdi  
00007FFD9C2CA916  ret  
            }

            return TryGetBufferMultiBlock(end, out data);
00007FFD9C2CA917  movdqu      xmm0,xmmword ptr [rdx]  
00007FFD9C2CA91B  movdqu      xmmword ptr [rsp+28h],xmm0  
00007FFD9C2CA921  lea         rdx,[rsp+28h]  
00007FFD9C2CA926  mov         r8,rbx  
00007FFD9C2CA929  call        00007FFD9C2C6F90  
00007FFD9C2CA92E  nop  
00007FFD9C2CA92F  add         rsp,38h  
00007FFD9C2CA933  pop         rbx  
00007FFD9C2CA934  pop         rbp  
00007FFD9C2CA935  pop         rsi  
00007FFD9C2CA936  pop         rdi  
00007FFD9C2CA937  ret  

@mikedn
Copy link

mikedn commented Feb 19, 2017

it doesn't adjust rsi or rdi between the moves; I think you are right its struct copy. I think it maybe be coalescing two dword movs into a single qword movs; but still doing them twice

They do adjust rsi and rdi. Those 2 instructions copy 8 byte each for a total of 16 bytes.

It's a bit bizarre to see movs in x64 code, I was under the impression that only JIT32 generates these. A SSE copy might be better.

@benaadams
Copy link
Member Author

Ah, yes, was confused by what movs did

@benaadams
Copy link
Member Author

@davidfowl
Copy link
Member

@benaadams I'm going to stomp on this change and we have conflicts. Can you build on top once I get my changes in?

@Drawaes
Copy link
Contributor

Drawaes commented Feb 23, 2017

Dam, no popcorn emoji....

@benaadams
Copy link
Member Author

Yep

@davidfowl
Copy link
Member

davidfowl commented Mar 13, 2017

@benaadams can you rebase and do-over? Not sure we need most of it anymore though.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants