-
Notifications
You must be signed in to change notification settings - Fork 602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Feature: AEM Remote Assets #1294
New Feature: AEM Remote Assets #1294
Conversation
…complete but proves out the pattern and includes the main pieces necessary for the feature
…rs (such as for DAM Update Assets) can choose to ignore events from, so that we can work w/assets without triggering unnecessary workflows.
… resource resolver
… performed by service users
… the admin user runs subsystems like resource property observation which can then trigger unexpected binary syncs
…zed but are okay for now.
…LUTIONS/acs-aem-commons into feature/aem-remote-assets # Conflicts: # bundle/src/main/java/com/adobe/acs/commons/remoteassets/impl/RemoteAssetsConfigImpl.java # bundle/src/main/java/com/adobe/acs/commons/remoteassets/impl/RemoteAssetsNodeSyncImpl.java
…LUTIONS/acs-aem-commons into feature/aem-remote-assets # Conflicts: # bundle/src/main/java/com/adobe/acs/commons/remoteassets/impl/RemoteAssetsConfigImpl.java # bundle/src/main/java/com/adobe/acs/commons/remoteassets/impl/RemoteAssetsNodeSyncImpl.java
…th performance slightly.
…sting binaries and different temporary assets.
…feature/aem-remote-assets
…orrectly convert Double props that come over in JSON as strings.
…onfig component, refactoring.
…hitelisted service users
Updated remote asset sync of date properties to maintain timezone
…cleaner logging, handle jpg/jpeg, closed streams.
…able session saving/refreshing.
…, only create binaries if the lastModified has changed.
try { | ||
Session session = resourceResolver.adaptTo(Session.class); | ||
if (session != null) { | ||
session.logout(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea if this logout()
is necessary - my gut says it isnt but I dont know enough of the internals to be sure. The reason I'm concerned in the first place is b/c the getResourceResolver()
function uses the session to setUserData()
on the workspace observation manager so as to prevent triggering workflows.
try { | ||
resource = remoteAssetsResolver.getResource(nextPath); | ||
if (resource == null) { | ||
Node node = JcrUtil.createPath(nextPath, primaryType, remoteAssetsResolver.adaptTo(Session.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If theres a simple way to do this w/o Node I'm happy to update this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think there is... best i know is RR.create(..) which i believe only does single nodes at a time..
@HitmanInWis @kaushalmall - so i was playing around this past 2 nights... and it seems that the sync times are very high (i.e. very noticeable as a user of the system). For the scheduled sync this is transparent since its an async job (though watching the logs it takes 2 mins or so to pull my 50 assets down from my remote AEM instance -- which is remote, it's not running on my local - im using a hosted Asset Share Commons author and syncing the ASC folder). The more concerning part is the renditions (binary) sync .. when i search and get a pull page of results, it tends to hang for quite a while (pulling in 10 asset x 5 renditions = 50 requests, some large)... I also noticed that certain requests, especially when accessing AEM Assets folders, tend to hit many assets in a folder.. so browsing, to a folder that contains other folders which in turn contain assets, will invoke the thumbnail folder servlet which seems to walk many assets in that folder (even though i never clicked in). I also noticed that occasionally the following...
Am I doing something wrong here? I watched your video and you didn't have delays but it sounded like you were running both Src and Dest locally removing the network time. |
Hi David. I think all of your concerns are imperfections in the system as it stands today. However, I think we should look at it from a right perspective of "is it useful for some use cases, even if not all use cases yet?". I think its valid to say that there will be a delay when the remote AEM server is truly remote. However, that's going to largely depend on connection quality between AEM instances. Two servers on the same network, or in the AWS cloud, may perform significantly better than what you're seeing (maybe not as fast as in my demo, but much closer). Regardless of the delay, keep in mind that this is a one-time "tax" you are paying for auto-sync of just the assets you want. For example, a developer working on a client website locally would get "taxed" the first time they go to a certain page on the site, but after that page loads the first time it will load w/o any delay in the future (the assets are now in the local DAM). The same can be said about a true Assets -> Sites scenario - the first time a search is performed it will delay in retrieving binaries, but the second time the assets will already be there. Down another thought path, the core tenet of this feature is pulling only the assets that are actually in use - hence why doing the "just in time" sync. I think for assets used in a site we could add a future enhancement to pre-sync binaries, thus eliminating the delay for that particular use case. For use cases like asset search and DAM traversal, however, we'd still be doing "just in time". Lastly, good catch on the race condition where the 2nd browser sees temporary binaries that the first browser is currently sync'ing. We could definitely have the 2nd request sleep while the first thread is pulling the asset - we already have logic in there that's preventing that 2nd request from fetching the same binary that the first request is pulling, so the change would be relatively trivial (sleep 100ms in a loop until the asset is sync'd). |
Also regarding "large" assets I think a potential future enhancement is having a config that sync's assets greater than the configured size in an asynchronous rather than synchronous fashion. I think this feature has huge potential for enhancement, but I also dont want to hold off the initial push of this feature until we've covered every single scenario. As it stands right now I think this feature is already sufficient for sync'ing assets to local and non-production server instances. A TON of interest has been shown in this feature, and I'm hoping that getting in an initial cut may set the stage for other companies to contribute the one or two things they need to make the feature work for them. |
…ad of metadata Fixed bug where subasset folders were being flagged as isRemoteAsset
@davidjgonzalez another thing I thought of - please make sure all of your asset workflow launchers have |
…or the first sync to complete before returning, thus fixing an edge case where Thread 1 fetches a big binary and Thread 2 skips sync and shows the end user a temporary asset.
…lver of the returned asset was the Remote Assets resource resolver. Not only was this a security flaw, but it also caused problems in that the resource resolver was closed and could not be used by the calling code. Instead, we now simply refresh the resource from within the decorator code, using the original resource resolver.
Fixed the race condition you found @davidjgonzalez, where Thread 2 does not wait for Thread 1's binary sync to complete. With the new impl, both Thread 1 and Thread 2 will correctly see the real asset when the page finishes loading. |
@davidjgonzalez is there anything remaining on this PR other than writing unit tests? I do plan on getting around to doing tests (especially since a couple companies have reached out to me directly about this feature) but just want to make sure that otherwise things are good. |
@HitmanInWis TBH I'm still not sold on this; when I use it it makes AEM feel broken/not working properly. That being said, I cant think of a better way to do this... I spent a bit of time trying out different methods to make the content syncing more robust [1] using packages/batching of request/async requests ... and a combo of them all... but nothing felt good. I'm also leery on the JSON -> Node transformation since it makes assumptions wrt to the content; which could be ok, but IDK. I'd rather it be an exact copy (which is where packages/XML help out). @badvision / @kaushalmall - have you tried this out and reviewed the approach? Would be good to get your (experience/review-based ;)) thoughts. [1] https://github.com/davidjgonzalez/acs-aem-commons/tree/feature/remote-assets-packages-poc |
Ok. I do want to get confirmation on whether or not this is a "go" before I invest time into a bunch of unit tests, so best we work through these questions now rather than later. I can say the I've had two separate companies reach out to me specifically about the feature, and have heard from someone at Adobe they were getting all sorts of requests for it, so there's definitely a high level of interest in the feature. Even though the feature is not yet currently perfect, I believe it provides a TON of developer value if nothing else, as it can be used to sync assets from a non-prod server to their local instance to have a full working local site. Getting the feature in at its current state will then allow for future iterations to make it even better. I'm less concerned about the JSON node transformation issue because a) it seems to be working well even if not 100% perfect and b) the server that is using remote assets is not the system of record, so a 99% perfect copy of meta data is probably going to work just fine nearly 100% of the time. |
I had some time to play around as well. Not as much as @davidjgonzalez though. My setup was running two AEM instances locally, so, I didn't really see any network latency issues. My opinion is to merge this, but document all the drawbacks and implementation details that we have discussed here. That ways anyone who decides to use this feature knows what they are getting and if they see issues, hopefully there will be more PRs in the future. |
@HitmanInWis The config should clarify if URL scheme is required or not. |
Also, this doesn't synchronize even the smaller 140x140 thumbnail images? |
Also, remote assets should not be editable because that would lead to a lot of confusion. |
Just a FYI, we are looking to get someone internally assigned to banging out these tests in the (hopefully) near future. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
It's fine if this rolls off as "stale" - we can resurrect it when we finally have the time to write some tests. At this point, I'm unsure when that will be. |
* Initial commit for AEM Remote Assets - this is extremely rough and incomplete but proves out the pattern and includes the main pieces necessary for the feature * Updated AEM Remote Assets to set event-user-data that worflow launchers (such as for DAM Update Assets) can choose to ignore events from, so that we can work w/assets without triggering unnecessary workflows. * Updated Remote Assets to leverage a service user instead of the admin resource resolver * Updated Remote Assets to avoid sync'ing binaries for resource fetches performed by service users * Misc cleanups to Remote Assets * Updated Remote Assets to avoid sync'ing binaries for admin user since the admin user runs subsystems like resource property observation which can then trigger unexpected binary syncs * Updated the placeholder assets - these can probably be further optimized but are okay for now. * Added the first batch of Remote Asset binary files * [AEMIP-174] - Remote asset content sync WIP. * Merge branch 'feature/aem-remote-assets' of https://github.com/HS2-SOLUTIONS/acs-aem-commons into feature/aem-remote-assets # Conflicts: # bundle/src/main/java/com/adobe/acs/commons/remoteassets/impl/RemoteAssetsConfigImpl.java # bundle/src/main/java/com/adobe/acs/commons/remoteassets/impl/RemoteAssetsNodeSyncImpl.java * [AEMIP-174] - Remote Assets JCR Tree Sync - added temporary binaries (WIP). * [AEMIP-174] - Moved basic auth encoding to activate method to help with performance slightly. * [AEMIP-174] - Remote assets - updated sync logic to handle dates, existing binaries and different temporary assets. * [AEMIP-174] - Remote assets - minor fixes. * Cleaned up licensing * Added support for Long properties in remote assets. * [AEMIP-174] - Remote assets - felix trigger, tag syncing, fixes. * [AEMIP-174] - Remote assets - updated decorator to account for both sync paths. * Updated remote assets node sync to ignore binary properties, and to correctly convert Double props that come over in JSON as strings. * [AEMIP-174] - Remote assets - bug fixes for tags, removed checks on config component, refactoring. * Updated remote asset configs to require server/user/pass and to fix whitelisted service users * Updated remote asset configs to filter out blank sync paths Updated remote asset sync of date properties to maintain timezone * [AEMIP-174] - Remote assets - removed unneeded ParseException and code cleanup. * [AEMIP-174] - Remote assets - removed test servlets, added javadocs, cleaner logging, handle jpg/jpeg, closed streams. * [AEMIP-174] - Remote assets - added missing return type in javadoc. * [AEMIP-174] - Remote assets - corrected URI bug, implemented configurable session saving/refreshing. * [AEMIP-174] - Remote assets - added sync cronjob, added debug logging, only create binaries if the lastModified has changed. * [AEMIP-174] - Remote assets - added missing JavaDoc. * [AEMIP-174] - Remote assets - made logger static. * [AEMIP-174] - Remote assets - fixed mime type bugs. * [AEMIP-174] - Remote assets - fixed mime type bugs. * Added the second batch of Remote Asset binary files * [AEMIP-174] - Remote assets - corrected mimetype strings and removed unneeded code. * [AEMIP-174] - Remote assets - corrected bug with indd files, class name refactoring, code cleanup. * [AEMIP-174] - Remote assets - added support for more temporary binaries (files), code cleanup. * [AEMIP-174] - Remote assets - fixed binary sync bug with unknown rendition, alphabetized remote asset files condition, corrected mime types, updated logging. * [AEMIP-174] - Remote assets - refactored method names and added file extension mime types constant class. * Added the third batch of remote asset binary files * [AEMIP-174] - Remote assets - PR revisions. * [AEMIP-174] - Remote assets - changed JcrConstants to new version (old is deprecated in AEM 6.3). * [AEMIP-174] - Remote assets - added support for more temporary binaries and handle different file types with same mime type. * [AEMIP-174] - Remote assets - revisions from PR feedback. * Added entry for AEM Remote Assets * Remote assets code climate fixes * Remote assets code climate fixes * Updated setRenditionOnAsset to not suppress IOException - need to flag the asset sync as failed in the calling function. * Code refactors per PR feedback * Updated JSON parsing to Gson (away from deprecated sling JSON) Refactored a bunch of function and var names. * Updated remote assets to use HttpClient "fluent" functions per PR feedback * Fixed a bug with ignoring/removing renditions not found on remote server. * Updated everything to use resourceResolver instead of JCR interface to perform CRUD operations. Refactored unsafe class-level resourceResolvers to local instances. * Misc cleanups in docs * Fixed bug where asset node sync was putting tags on jcr:content instead of metadata Fixed bug where subasset folders were being flagged as isRemoteAsset * Updated a second request for a asset currently being sync'd to wait for the first sync to complete before returning, thus fixing an edge case where Thread 1 fetches a big binary and Thread 2 skips sync and shows the end user a temporary asset. * Fixed a bug with binary sync in remote assets where the resource resolver of the returned asset was the Remote Assets resource resolver. Not only was this a security flaw, but it also caused problems in that the resource resolver was closed and could not be used by the calling code. Instead, we now simply refresh the resource from within the decorator code, using the original resource resolver. * Update changelog for #1294 - New Remote Assets feature * Update AEM Remote Assets to OSGi annotations * Added unit tests for RemoteAssetsConfigImpl * Cleanups from updating AEM Remote Assets to OSGi annotations * Added unit tests for RemoteAssetDecorator * Added unit tests for RemoteAssetsBinarySyncImpl * Added unit tests for RemoteAssetsNodeSyncImpl * Refactored remoteassets test resources to remoteassetstest folder to avoid potential conflict w/src resources * Refactored out calls to IOUtils that were not closing the streams * Resolved an issue where doing a full maven build would cause mockito to fail to spy a Resource instance in the testSync test, with an error: Caused by: java.lang.IllegalAccessError: class org.apache.sling.jcr.resource.internal.helper.jcr.JcrNodeResource$$EnhancerByMockitoWithCGLIB$$5f8ad087 cannot access its superclass org.apache.sling.jcr.resource.internal.helper.jcr.JcrNodeResource * Update remote assets node sync to support tags at /content/cq:tags. Update remote assets config to more clearly indicate the format of server.url with an example. * Fixed issue with log file getting too long from Remote Assets tests * Cleaned up a bunch of codestyle issues. * Fixed another codeclimate issue for Remote Assets. * Refactored getRemoteAssetPlaceholder to reduce Cognitive Complexity * Renamed RemoteAssetsNodeSyncJob to RemoteAssetsNodeSyncScheduler to be more clear as to what it is. Also fixed the default cron expression, and removed the ability to configure to allow concurrent execution. * Moved /content/cq:tags folder to /content/_cq_tags to work on Windows machines. * Update remote assets service user permissions to be added by EnsureServiceUser to avoid creating an /etc/tags folder in AEM 6.4 or a /content/cq:tags folder in AEM 6.3 * Fixed issue where RemoteAssetsNodeSyncScheduler was not successfully registering * Update to CHANGELOG for remote assets * Cleanup of Remote Asset config descriptions * Reduce some logging clutter at the DEBUG level. * Reduce some logging clutter at the DEBUG level. * Ignore codeclimate test of switch case count for getRemoteAssetPlaceholder(), as refactoring will likely not make the code any more straightforward.
This is a PR for the AEM Remote Assets feature, to be presented at Adobe SUMMIT AEM Rockstar on Wednesday Mar 28.
The purpose of AEM Remote Assets is to allow an AEM server to be fully functional with assets from a remote AEM instance without sync'ing the entire set of DAM asset file binaries (the bulk of the disk space) up front. Asset file binaries are sync'd into the server the first time they are requested by a user. To the end user it appears all assets are present, but in actuality some assets may be getting fetched "just in time" from the source AEM server as the user accesses them. Once an asset file has been pulled in from the source server the asset is now a standard AEM asset. The bulk of rarely-used assets, however, remain un-synced.
I would consider this feature "beta" level. It's pretty solid in our local testing, but I encourage anyone looking to use this feature to do so with caution in non-production environments until the feature is further battle-tested. Though this feature ultimately supports use of a remote AEM Assets instance from an AEM Sites instance, which is a very important business use case, it is also extremely useful for developers working on local AEM instances of production websites, where pulling the entire DAM from production to the local server is simply not an option due to size.
I understand that unit testing will need to be added to this PR before it is accepted, but wanted to get feedback on the code before diving into that effort.
Other "nice to have" features that I can see for the future of AEM Remote Assets include: