Skip to content

Commit

Permalink
Merge remote-tracking branch 'IQSS/develop' into
Browse files Browse the repository at this point in the history
IQSS/10137-2-add_flag_to_make_reason_optional
  • Loading branch information
qqmyers committed Jun 26, 2024
2 parents 43a50f8 + 94b15e2 commit d946b06
Show file tree
Hide file tree
Showing 11 changed files with 331 additions and 110 deletions.
9 changes: 9 additions & 0 deletions doc/release-notes/10579-avoid-solr-deletes.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
A features flag called "reduce-solr-deletes" has been added to improve how datafiles are indexed. When the flag is enabled,
Dataverse wil avoid pre-emptively deleting existing solr documents for the files prior to sending updated information. This
should improve performance and will allow additional optimizations going forward.

The /api/admin/index/status and /api/admin/index/clear-orphans calls
(see https://guides.dataverse.org/en/latest/admin/solr-search-index.html#index-and-database-consistency)
will now find and remove (respectively) additional permissions related solr documents that were not being detected before.
Reducing the overall number of documents will improve solr performance and large sites may wish to periodically call the
clear-orphans API.
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/developers/performance.rst
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,7 @@ While in the past Solr performance hasn't been much of a concern, in recent year
We are tracking performance problems in `#10469 <https://github.com/IQSS/dataverse/issues/10469>`_.

In a meeting with a Solr expert on 2024-05-10 we were advised to avoid joins as much as possible. (It was acknowledged that many Solr users make use of joins because they have to, like we do, to keep some documents private.) Toward that end we have added two feature flags called ``avoid-expensive-solr-join`` and ``add-publicobject-solr-field`` as explained under :ref:`feature-flags`. It was confirmed experimentally that performing the join on all the public objects (published collections, datasets and files), i.e., the bulk of the content in the search index, was indeed very expensive, especially on a large instance the size of the IQSS prod. archive, especially under indexing load. We confirmed that it was in fact unnecessary and were able to replace it with a boolean field directly in the indexed documents, which is achieved by the two feature flags above. However, as of writing this, this mechanism should still be considered experimental.
Another flag, ``reduce-solr-deletes``, avoids deleting solr documents for files in a dataset prior to sending updates. It also eliminates several causes of orphan permission documents. This is expected to improve indexing performance to some extent and is a step towards avoiding unnecessary updates (i.e. when a doc would not change).

Datasets with Large Numbers of Files or Versions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Expand Down
3 changes: 3 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3274,6 +3274,9 @@ please find all known feature flags below. Any of these flags can be activated u
* - add-publicobject-solr-field
- Adds an extra boolean field `PublicObject_b:true` for public content (published Collections, Datasets and Files). Once reindexed with these fields, we can rely on it to remove a very expensive Solr join on all such documents in Solr queries, significantly improving overall performance (by enabling the feature flag above, `avoid-expensive-solr-join`). These two flags are separate so that an instance can reindex their holdings before enabling the optimization in searches, thus avoiding having their public objects temporarily disappear from search results while the reindexing is in progress.
- ``Off``
* - reduce-solr-deletes
- Avoids deleting and recreating solr documents for dataset files when reindexing.
- ``Off``
* - disable-return-to-author-reason
- Removes the reason field in the `Publish/Return To Author` dialog that was added as a required field in v6.2 and makes the reason an optional parameter in the :ref:`return-a-dataset` API call.
- ``Off``
Expand Down
11 changes: 10 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
package edu.harvard.iq.dataverse;

import edu.harvard.iq.dataverse.DatasetVersion.VersionState;
import edu.harvard.iq.dataverse.authorization.users.AuthenticatedUser;
import edu.harvard.iq.dataverse.dataaccess.DataAccess;
import edu.harvard.iq.dataverse.dataaccess.ImageThumbConverter;
Expand Down Expand Up @@ -383,7 +384,8 @@ public FileMetadata findMostRecentVersionFileIsIn(DataFile file) {
if (fileMetadatas == null || fileMetadatas.isEmpty()) {
return null;
} else {
return fileMetadatas.get(0);
// This assumes the order of filemetadatas is from first to most recent, which is true as of v6.3
return fileMetadatas.get(fileMetadatas.size() - 1);
}
}

Expand Down Expand Up @@ -759,6 +761,13 @@ public List<DataFile> findAll() {
return em.createQuery("select object(o) from DataFile as o order by o.id", DataFile.class).getResultList();
}

public List<VersionState> findVersionStates(Long fileId) {
Query query = em.createQuery(
"select distinct dv.versionState from DatasetVersion dv where dv.id in (select fm.datasetVersion.id from FileMetadata fm where fm.dataFile.id=:fileId)");
query.setParameter("fileId", fileId);
return query.getResultList();
}

public DataFile save(DataFile dataFile) {

if (dataFile.isMergeable()) {
Expand Down
40 changes: 33 additions & 7 deletions src/main/java/edu/harvard/iq/dataverse/GuestbookResponse.java
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@
import java.util.List;
import jakarta.persistence.*;
import jakarta.validation.constraints.Size;
import java.util.Collections;
import java.util.Comparator;

/**
*
Expand Down Expand Up @@ -178,7 +180,7 @@ public GuestbookResponse(GuestbookResponse source){
this.setSessionId(source.getSessionId());
List <CustomQuestionResponse> customQuestionResponses = new ArrayList<>();
if (!source.getCustomQuestionResponses().isEmpty()){
for (CustomQuestionResponse customQuestionResponse : source.getCustomQuestionResponses() ){
for (CustomQuestionResponse customQuestionResponse : source.getCustomQuestionResponsesSorted() ){
CustomQuestionResponse customQuestionResponseAdd = new CustomQuestionResponse();
customQuestionResponseAdd.setResponse(customQuestionResponse.getResponse());
customQuestionResponseAdd.setCustomQuestion(customQuestionResponse.getCustomQuestion());
Expand Down Expand Up @@ -254,6 +256,18 @@ public String getResponseDate() {
public List<CustomQuestionResponse> getCustomQuestionResponses() {
return customQuestionResponses;
}

public List<CustomQuestionResponse> getCustomQuestionResponsesSorted(){

Collections.sort(customQuestionResponses, (CustomQuestionResponse cqr1, CustomQuestionResponse cqr2) -> {
int a = cqr1.getCustomQuestion().getDisplayOrder();
int b = cqr2.getCustomQuestion().getDisplayOrder();
return Integer.valueOf(a).compareTo(b);
});


return customQuestionResponses;
}

public void setCustomQuestionResponses(List<CustomQuestionResponse> customQuestionResponses) {
this.customQuestionResponses = customQuestionResponses;
Expand Down Expand Up @@ -317,7 +331,11 @@ public void setSessionId(String sessionId) {
this.sessionId= sessionId;
}

public String toHtmlFormattedResponse() {
public String toHtmlFormattedResponse(){
return toHtmlFormattedResponse(null);
}

public String toHtmlFormattedResponse(AuthenticatedUser requestor) {

StringBuilder sb = new StringBuilder();

Expand All @@ -326,17 +344,25 @@ public String toHtmlFormattedResponse() {
sb.append(BundleUtil.getStringFromBundle("dataset.guestbookResponse.respondent") + "<br><ul style=\"list-style-type:none;\">\n<li>"
+ BundleUtil.getStringFromBundle("name") + ": " + getName() + "</li>\n<li>");
sb.append(" " + BundleUtil.getStringFromBundle("email") + ": " + getEmail() + "</li>\n<li>");
sb.append(
" " + BundleUtil.getStringFromBundle("institution") + ": " + wrapNullAnswer(getInstitution()) + "</li>\n<li>");
sb.append(" " + BundleUtil.getStringFromBundle("position") + ": " + wrapNullAnswer(getPosition()) + "</li></ul>\n");
sb.append(" " + BundleUtil.getStringFromBundle("institution") + ": " + wrapNullAnswer(getInstitution()) + "</li>\n<li>");
sb.append(" " + BundleUtil.getStringFromBundle("position") + ": " + wrapNullAnswer(getPosition()) + "</li>");

//Add requestor information to response to help dataset admin with request processing
if (requestor != null){
sb.append("\n<li>" + BundleUtil.getStringFromBundle("dataset.guestbookResponse.requestor.id") + ": " + requestor.getId()+ "</li>");
sb.append("\n<li>" + BundleUtil.getStringFromBundle("dataset.guestbookResponse.requestor.identifier") + ": " + requestor.getIdentifier()+ "</li></ul>\n");
} else {
sb.append("</ul>\n");
}

sb.append(BundleUtil.getStringFromBundle("dataset.guestbookResponse.guestbook.additionalQuestions")
+ ":<ul style=\"list-style-type:none;\">\n");

for (CustomQuestionResponse cqr : getCustomQuestionResponses()) {
for (CustomQuestionResponse cqr : getCustomQuestionResponsesSorted()) {
sb.append("<li>" + BundleUtil.getStringFromBundle("dataset.guestbookResponse.question") + ": "
+ cqr.getCustomQuestion().getQuestionString() + "<br>"
+ BundleUtil.getStringFromBundle("dataset.guestbookResponse.answer") + ": "
+ wrapNullAnswer(cqr.getResponse()) + "</li>\n");
+ wrapNullAnswer(cqr.getResponse()) + "</li>\n<br>");
}
sb.append("</ul>");
return sb.toString();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -456,7 +456,7 @@ public String getMessageTextBasedOnNotification(UserNotification userNotificatio
GuestbookResponse gbr = far.getGuestbookResponse();
if (gbr != null) {
messageText += MessageFormat.format(
BundleUtil.getStringFromBundle("notification.email.requestFileAccess.guestbookResponse"), gbr.toHtmlFormattedResponse());
BundleUtil.getStringFromBundle("notification.email.requestFileAccess.guestbookResponse"), gbr.toHtmlFormattedResponse(requestor));
}
return messageText;
case GRANTFILEACCESS:
Expand Down
Loading

0 comments on commit d946b06

Please sign in to comment.