Skip to content

Conversation

@srkreddy1238
Copy link
Contributor

@srkreddy1238 srkreddy1238 commented Jun 7, 2023

Motivated form the fact that textures can be allocated over a clBuffer object and the size of backing clBuffer can be computed based on hardware image pitch alignment.

This optimizes the overall memory allocation on device and helps greately the models with large memory requirements.

Improvised the graph memory planner to not differentiate buffer and texture storage tokens and reuse them across. The texture pool in OpenCL runtime is rebranded as memory pool that handles allocation for both buffer and image objects.

NDArray to DeviceAPI interface is extended with AllocDataSpaceView and FreeDataSpaceView. These new API's acommodates accessing same physical memory as clBuffer / clImage objects.

@tvm-bot
Copy link
Collaborator

tvm-bot commented Jun 7, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

1 similar comment
@tvm-bot
Copy link
Collaborator

tvm-bot commented Jun 7, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@srkreddy1238 srkreddy1238 changed the title [OpenCL][Texture] Improved texture memmory planning and runtime memor… [OpenCL][Texture] Improved texture memory planning Jun 7, 2023
@srkreddy1238 srkreddy1238 requested a review from tqchen June 7, 2023 19:41
@srkreddy1238 srkreddy1238 force-pushed the texture_enhancements branch from 35ca39f to fa689e5 Compare June 8, 2023 03:36
Copy link
Contributor

@echuraev echuraev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quickly took a look at this PR. It contains many changes, but no tests. I'd like to see unit tests for these changes.

@srkreddy1238 srkreddy1238 force-pushed the texture_enhancements branch from fa689e5 to 33e854b Compare June 8, 2023 05:35
@srkreddy1238
Copy link
Contributor Author

Quickly took a look at this PR. It contains many changes, but no tests. I'd like to see unit tests for these changes.

I agree. I am working on it.

Appreciate a review on the interface changes (DeviceAPI, NDArray) and over all design aspects mean while.

@srkreddy1238 srkreddy1238 force-pushed the texture_enhancements branch 2 times, most recently from 613d0f0 to 8351151 Compare June 9, 2023 03:12
@srkreddy1238 srkreddy1238 requested a review from echuraev June 9, 2023 20:09
@srkreddy1238 srkreddy1238 force-pushed the texture_enhancements branch 6 times, most recently from 59cca16 to ed8b70e Compare June 11, 2023 17:43
@srkreddy1238
Copy link
Contributor Author

@echuraev can you take a look now ?

Copy link
Contributor

@echuraev echuraev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. It was a public holiday. Several comments.

* \param dtype The type of elements.
* \param mem_scope The memory scope of allocated tensor.
* \return The allocated device pointer.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this set of changes. I feel that maybe we need something different that makes the memory allocator more explicit. Let me elaborate in reply

@tqchen
Copy link
Member

tqchen commented Jun 13, 2023

Thanks @srkreddy1238 . Reading the set of changes, i feel that we need something that is more lifted out from the DeviceAPI since this is getting closer to the two stage concept.

  • Allocator allocates physical Storage object (in this case backed by clBuffer)
  • The storage object then allocates the NDArray(backed by the storage object).

We already have some of similar concept in memory manager, e.g. in relax, but of course the memory manager as it is does not yet reflect the need of supporting things like creating different memory scope. This would help to move some of the complexity in the texture part to a specialized allocator that creates the Storage with related behavior.

The main rationale is to keep the overall DeviceAPI minimum without behavior of things like pooling and move some of the logic to Storage, NDArray separation, that can hopefully also enable direct support in VM backed runtimes like unity. Maybe we can discuss a bit on how to make that part more generalized that can be reused.

@srkreddy1238
Copy link
Contributor Author

Thanks @echuraev and @tqchen for a quick review.

I see MemoryManager can meet the requirements of the pool concept we have here. I will improve the PR by

  • Initializing the MemoryManager (PooledAllocator) from graph runtime (StorageSetup).
  • NDArray->Empty => PooledAllocator->Empty (Assuming Pooled Allocator works for all runtimes or I cam limit this for texture scoped graphs).
  • NDArray->CreateView => DeviceAPI->AllocDataSpaceView : This will take care of various views over NDArray

Another situation I came across multiple is about runtime::GetDataSize(container->dl_tensor)

Can we move this to DeviceAPI ? In case of texture there is device specific attributes (image row pitch) that defines the underlaying memory size and accessing these attributes outside doesn't look to be a great idea.

@srkreddy1238
Copy link
Contributor Author

@tqchen later I realized that the MemoryManager interface is part of relax (unity branch, not mainline). Do I bring bring it into relay and tailor it ? Or Is there any other ideas ?

@tqchen
Copy link
Member

tqchen commented Jun 14, 2023

One thing that I can probably do is do do a bit of refactoring and bring it to runtime folder so we can leverage it in most places, I may need a few weeks to get to this. There is also already one in relay that can be used as getting started. GetDataSize is also a logic that can be specific to the pool.

@srkreddy1238 srkreddy1238 force-pushed the texture_enhancements branch from ed8b70e to 31d7de0 Compare July 17, 2023 06:30
@srkreddy1238
Copy link
Contributor Author

@tqchen

I enhanced to reuse the existing VM runtime memory manager by graph executor at 31d7de0

Basically,

  • Promoted memory manager from runtime/vm to runtime level
  • Redirected graph executor to use Memory Manager interface for StorageSetup.
  • Corresponding changes in OpenCL Runtime to work with this MemoryManager.

Let me know your opinion on these.

Another recommendation I have is redirecting AllocWorkspace and FreeWorkspace to MemoryManager and drop workspace_pool. workspace_pool is slightly different from pooled_allocator (workspace pool looks for best fit where as pooled allocator look for exact fit) and we can amend this functionality.

@srkreddy1238 srkreddy1238 force-pushed the texture_enhancements branch from 31d7de0 to d3dee67 Compare July 17, 2023 09:58
@srkreddy1238 srkreddy1238 force-pushed the texture_enhancements branch 3 times, most recently from 788ffe0 to ae78943 Compare August 23, 2023 05:58
…y allocation

Motivated form the fact that textures can be allocated over a clBuffer object and
the size of backing clBuffer can be computed based on hardware image pitch alignment.

This optimizes the overall memory allocation on device and helps greately the models with
large memory requirements.

Improvised the graph memory planner to not differentiate buffer and texture storage
tokens and reuse them across. The texture pool in OpenCL runtime is rebranded as memory pool
that handles allocation for both buffer and image objects.

NDArray to DeviceAPI interface is extended with AllocDataSpaceView and FreeDataSpaceView.
These new API's acommodates accessing same physical memory as clBuffer / clImage objects.

* MemoryPool test cases and lint errors.

* test cases and fallback support.

* bug fix and cpp-runtime tests cases for texture views.

* various cl device info organized

* fix graph plan memory bug and correct the testcase.

* device attribute handling

* Some fallback for texture plan on devices w/o cl_khr_image2d_from_buffer

 * Memory Manager

Move the VM memory manager to the runtime level.
Use this memory manager for graph runtime.

 *  Resolve conflicts for VerifyDataType and Buffer

* review comments
@srkreddy1238 srkreddy1238 requested a review from echuraev October 9, 2023 04:32
@srkreddy1238
Copy link
Contributor Author

@tqchen @echuraev can you take another look on this ?
I have rebased and improvised based on the new memory manager interface.

@srkreddy1238 srkreddy1238 requested a review from yongwww October 9, 2023 16:43
Copy link
Contributor

@echuraev echuraev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM. Several comments.

srkreddy1238 and others added 2 commits October 10, 2023 18:07
Co-authored-by: Egor Churaev <egor.churaev@gmail.com>
@srkreddy1238 srkreddy1238 requested a review from echuraev October 11, 2023 05:13
Copy link
Contributor

@echuraev echuraev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@srkreddy1238
Copy link
Contributor Author

@tqchen & @yongwww can you take a look on this PR ?

@srkreddy1238
Copy link
Contributor Author

@tqchen can you take a look on this PR ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants