Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LRS-72 Handle deduplicated attachment multiparts #81

Merged
merged 14 commits into from
May 12, 2022
Merged

LRS-72 Handle deduplicated attachment multiparts #81

merged 14 commits into from
May 12, 2022

Conversation

milt
Copy link
Member

@milt milt commented May 11, 2022

LRS-72
Currently lrs expects attachment requests to contain one attachment part for every attachment reference in statement data. The spec language is ambiguous in this regard: https://github.com/adlnet/xAPI-Spec/blob/master/xAPI-Communication.md#requirements-for-attachment-statement-batches

It does however say that LRPs should normalize/deduplicate attachments being sent, so LRS (and its impls) should handle this.

Example with multiple references + deduplicated attachments that currently fails on our system:

failing example
POST /xapi/statements HTTP/1.1
X-Experience-API-Version: 1.0.3
Content-Type: multipart/mixed; boundary=105423a5219f5a63362a375ba7a64a8f234da19c7d01e56800c3c64b26bb2fa0
Authorization: 
Host: localhost:8080
Connection: close
User-Agent: Paw/3.3.1 (Macintosh; OS X/12.3.1) GCDHTTPRequest
Content-Length: 1841


--105423a5219f5a63362a375ba7a64a8f234da19c7d01e56800c3c64b26bb2fa0
Content-Type:application/json

{
    "actor": {
        "mbox": "mailto:sample.agent@example.com",
        "name": "Sample Agent",
        "objectType": "Agent"
    },
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/answered",
        "display": {
            "en-US": "answered"
        }
    },
    "object": {
        "id": "http://www.example.com/tincan/activities/multipart",
        "objectType": "Activity",
        "definition": {
            "name": {
                "en-US": "Multi Part Activity"
            },
            "description": {
                "en-US": "Multi Part Activity Description"
            }
        }
    },
    "attachments": [
        {
            "usageType": "http://example.com/attachment-usage/test",
            "display": { "en-US": "A test attachment" },
            "description": { "en-US": "A test attachment (description)" },
            "contentType": "text/plain; charset=ascii",
            "length": 27,
            "sha2": "495395e777cd98da653df9615d09c0fd6bb2f8d4788394cd53c56a3bfdcd848a"
        },
        {
            "usageType": "http://example.com/attachment-usage/test",
            "display": { "en-US": "A test attachment" },
            "description": { "en-US": "A test attachment (description)" },
            "contentType": "text/plain; charset=ascii",
            "length": 27,
            "sha2": "495395e777cd98da653df9615d09c0fd6bb2f8d4788394cd53c56a3bfdcd848a"
        }
    ]
}
--105423a5219f5a63362a375ba7a64a8f234da19c7d01e56800c3c64b26bb2fa0
Content-Type:text/plain
Content-Transfer-Encoding:binary
X-Experience-API-Hash:495395e777cd98da653df9615d09c0fd6bb2f8d4788394cd53c56a3bfdcd848a

here is a simple attachment
--105423a5219f5a63362a375ba7a64a8f234da19c7d01e56800c3c64b26bb2fa0--

Example with multiple references and duplicated attachments that succeeds:

passing example
POST /xapi/statements HTTP/1.1
X-Experience-API-Version: 1.0.3
Content-Type: multipart/mixed; boundary=105423a5219f5a63362a375ba7a64a8f234da19c7d01e56800c3c64b26bb2fa0
Authorization: 
Host: localhost:8080
Connection: close
User-Agent: Paw/3.3.1 (Macintosh; OS X/12.3.1) GCDHTTPRequest
Content-Length: 2081


--105423a5219f5a63362a375ba7a64a8f234da19c7d01e56800c3c64b26bb2fa0
Content-Type:application/json

{
    "actor": {
        "mbox": "mailto:sample.agent@example.com",
        "name": "Sample Agent",
        "objectType": "Agent"
    },
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/answered",
        "display": {
            "en-US": "answered"
        }
    },
    "object": {
        "id": "http://www.example.com/tincan/activities/multipart",
        "objectType": "Activity",
        "definition": {
            "name": {
                "en-US": "Multi Part Activity"
            },
            "description": {
                "en-US": "Multi Part Activity Description"
            }
        }
    },
    "attachments": [
        {
            "usageType": "http://example.com/attachment-usage/test",
            "display": { "en-US": "A test attachment" },
            "description": { "en-US": "A test attachment (description)" },
            "contentType": "text/plain; charset=ascii",
            "length": 27,
            "sha2": "495395e777cd98da653df9615d09c0fd6bb2f8d4788394cd53c56a3bfdcd848a"
        },
        {
            "usageType": "http://example.com/attachment-usage/test",
            "display": { "en-US": "A test attachment" },
            "description": { "en-US": "A test attachment (description)" },
            "contentType": "text/plain; charset=ascii",
            "length": 27,
            "sha2": "495395e777cd98da653df9615d09c0fd6bb2f8d4788394cd53c56a3bfdcd848a"
        }
    ]
}
--105423a5219f5a63362a375ba7a64a8f234da19c7d01e56800c3c64b26bb2fa0
Content-Type:text/plain
Content-Transfer-Encoding:binary
X-Experience-API-Hash:495395e777cd98da653df9615d09c0fd6bb2f8d4788394cd53c56a3bfdcd848a

here is a simple attachment
--105423a5219f5a63362a375ba7a64a8f234da19c7d01e56800c3c64b26bb2fa0
Content-Type:text/plain
Content-Transfer-Encoding:binary
X-Experience-API-Hash:495395e777cd98da653df9615d09c0fd6bb2f8d4788394cd53c56a3bfdcd848a

here is a simple attachment
--105423a5219f5a63362a375ba7a64a8f234da19c7d01e56800c3c64b26bb2fa0--

Additionally, passing to the impl is an open question:

So this leads to another question, which is passing them to the impl: Efficiency would dictate that it not do what xapipe does (denormalize/duplicate by copying attachments so they are 1-1 with attachment objects) and just pass them normalized/deduplicated to the impl. This would however likely require some code changes on the impl to handle it (since they have been getting denormalized attachments since their inception)

So the options for passing to the impl are:

  1. denormalize/duplicate them (no changes to current impls, less efficient)
  2. normalize/deduplicate them (some changes to current impls, more efficient)
  3. pass them as received (pushes handing to impl) - This scares me, because a client could duplicate some but not all attachments and still not run afoul of the spec

In its current state this PR takes option 2, but that's up for discussion!

@milt milt marked this pull request as ready for review May 11, 2022 20:58
README.md Outdated Show resolved Hide resolved
Comment on lines +178 to +181
(let [sha (get headers "X-Experience-API-Hash")]
(if-let [extant (get m sha)]
m
(assoc m sha multipart))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that attachments with different data will have different SHAs, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, even if it isn't mathematically true 🙈 . Unfortunately there isn't a stable way to tie an attachment to a specific referencing object in a statement batch really, so there isn't much to be done about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case you may want to add that to the documentation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the remote, theoretical possibility of a hash collision is outside the scope of the docs. At any rate, the docs link to this PR 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh so hash collisions aren't that common with SHAs? From our past convos I got the impression that the SHA2 algorithm was busted and that collisions were the order of the day. 😆

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's like SHA-256 it's quite unlikely. I'm actually a little confused about the spec language here, it only says that:

A Learning Record Provider SHOULD use SHA-256, SHA-384, or SHA-512 to populate the "sha2" property.

And of course doesn't require the LRS to validate it, because how would it know the alg?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, yes we like to assume that impossible things will happen, it's just that in this instance the spec limits us.

@milt milt merged commit 6a61ad9 into master May 12, 2022
@milt milt deleted the LRS-72 branch May 12, 2022 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants