
Embed behavior makes .frame's results hard to work with #119

@jmandel


Executive summary

The framing algorithm's approach to "multiple embeds" makes it hard for developers to work with framed results.

Background

Developers want to frame JSON-LD payloads in ways that make them simple to work with. For example:

  • discover subjects of interest
  • loop over these subjects
  • resolve nested data with consistent paths

But in the current framing algorithm, the machinery for avoiding circularity and verbose output makes things harder for developers. This is easiest to understand with an example.

Example

I'll illustrate with MedicationLists that have Medications that have DrugCodes with titles and identifiers:
Framing Problem: example in Playground

How developers want framing to work:

jsonld.frame(raw_data, medframe, function(err, response){
    response['@graph'].forEach(function(medlist){
        medlist.hasMedications.forEach(function(med){
            console.log("Drug: " + med.drugCode.title + "::" + med.drugCode.identifier);
        });
    });
});

... but in the example above, when we reach ['@graph'][0].hasMedications[2].drugCode we find a reference, not an embed! Avoiding this takes heavily defensive programming.
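To make the failure concrete, here is a small hypothetical framed result (the `urn:` identifiers and values are made up, and the shape is assumed for illustration rather than copied from the Playground output) run through the naive loop. The loop collects lines instead of logging them so the broken entry is easy to see.

```javascript
// Hypothetical framed output (shape assumed for illustration, not the
// actual Playground result): DrugCode urn:code:A is embedded under the
// first medication, while the third medication carries only a reference.
const framedSample = {
  '@graph': [{
    '@id': 'urn:medlist:1',
    hasMedications: [
      { '@id': 'urn:med:1',
        drugCode: { '@id': 'urn:code:A', title: 'Drug A', identifier: 'A-1' } },
      { '@id': 'urn:med:2',
        drugCode: { '@id': 'urn:code:B', title: 'Drug B', identifier: 'B-2' } },
      { '@id': 'urn:med:3',
        drugCode: { '@id': 'urn:code:A' } } // bare reference, not an embed
    ]
  }]
};

// The naive loop, collecting one line per medication instead of logging.
function describeDrugs(framed) {
  const lines = [];
  framed['@graph'].forEach(function (medlist) {
    medlist.hasMedications.forEach(function (med) {
      lines.push('Drug: ' + med.drugCode.title + '::' + med.drugCode.identifier);
    });
  });
  return lines;
}

const lines = describeDrugs(framedSample);
// lines[2] is 'Drug: undefined::undefined': the reference has no title or identifier.
```

Nothing throws, which is part of the problem: the missing data only shows up as `undefined` further downstream.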

How developers need to work around the current framing behavior:

Since framed results don't reliably re-embed resources, developers need to check at each step whether an object is a reference or an embed. This means first creating a hash of known embeds, and then looking up values in this hash at every step through the framed result.

jsonld.frame(raw_data, medframe, function(err, response) {

    // identify an embed for each subject to resolve references 
    var subjects = {};
    findSubjects(subjects, response['@graph']);

    response['@graph'].forEach(function(medlist){
        medlist.hasMedications.forEach(function(med){

            // need to ensure drugCode is an embed, not a reference
            var drugCode = subjects[med.drugCode['@id']];

            console.log("Drug code: " + drugCode.title + "::" + drugCode.identifier);
        });
    });
});

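// Helpers for finding subject embeds in framed results.
// _isEmbed is only a heuristic (an object with @id plus at least one
// other key); see the limitations below.
function _isObject(x) {
    return x !== null && typeof x === 'object' && !Array.isArray(x);
}

function _isEmbed(x) {
    return _isObject(x) && '@id' in x && Object.keys(x).length > 1;
}

function findSubjects(subjects, subtree) {
    if (Array.isArray(subtree)) {
        subtree.forEach(function(elt){
            findSubjects(subjects, elt);
        });

        return;
    }

    // record the embed, keyed by its @id
    if (_isEmbed(subtree)) {
        subjects[subtree['@id']] = subtree;
    }

    // keep descending: embeds can contain further embeds
    if (_isObject(subtree)) {
        for (var k in subtree) {
            findSubjects(subjects, subtree[k]);
        }
    }
}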
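The loop above only dereferences `drugCode`. To get the "consistent paths" developers want, the lookup has to happen at every hop, not just one. A minimal sketch of a path walker that does this, assuming a `subjects` map of `@id` to embed like the one built above; `isBareReference`, `deref`, and `getPath` are hypothetical helper names, not part of jsonld.js.

```javascript
// Hypothetical helpers, not part of any library.
// A "bare reference" here means an object whose only key is @id.
function isBareReference(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value) &&
    Object.keys(value).length === 1 && '@id' in value;
}

// Swap a bare reference for its embed when one is known; otherwise
// return the value unchanged.
function deref(value, subjects) {
  if (isBareReference(value) && subjects[value['@id']] !== undefined) {
    return subjects[value['@id']];
  }
  return value;
}

// Follow a property path, dereferencing at every step.
// Returns undefined as soon as the path leaves object territory.
function getPath(node, path, subjects) {
  let current = deref(node, subjects);
  for (const key of path) {
    if (current === null || typeof current !== 'object') return undefined;
    current = deref(current[key], subjects);
  }
  return current;
}
```

With this, the inner loop body becomes `getPath(med, ['drugCode', 'title'], subjects)`, but every access in the application has to go through the helper, which is exactly the burden this issue is about.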

And the workaround isn't complete

This workaround has limitations. For instance:

  • How should we handle subjects that are supposed to be framed in different ways?
  • How do we properly implement _isEmbed?
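Both limitations can be shown in a few lines. This sketch assumes the same heuristic as above (an embed is an object with `@id` plus at least one other key) and a plain `@id`-keyed map; the data and the names `isEmbedHeuristic` and `indexById` are hypothetical.

```javascript
// Hypothetical heuristic: "an embed has @id plus at least one other key".
function isEmbedHeuristic(node) {
  return node !== null && typeof node === 'object' && !Array.isArray(node) &&
    '@id' in node && Object.keys(node).length > 1;
}

// Index embeds by @id; a later embed of the same subject overwrites an earlier one.
function indexById(nodes) {
  const subjects = {};
  nodes.forEach(function (node) {
    if (isEmbedHeuristic(node)) subjects[node['@id']] = node;
  });
  return subjects;
}

// Limitation 1: the same subject framed two ways (say, by two different
// sub-frames) collapses to one entry, and the first shape is lost.
const twoShapes = [
  { '@id': 'urn:code:A', title: 'Drug A' },
  { '@id': 'urn:code:A', identifier: 'A-1' }
];
const collapsed = indexById(twoShapes);
// collapsed['urn:code:A'] has an identifier but no title.

// Limitation 2: an embed whose frame selected nothing besides @id is
// indistinguishable from a reference, so the heuristic rejects it.
const emptyEmbed = { '@id': 'urn:code:B' };
// isEmbedHeuristic(emptyEmbed) is false.
```

Merging the shapes instead of overwriting would avoid the first problem, but only by guessing which frame the application meant to use.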

Proposal: aggressive re-embedding

I'd recommend re-embedding resources aggressively -- right up to (but not crossing) the point of creating circular references. There are some risks here, including an explosion in the framing output size for graphs rich in bidirectional links. Does anyone have ideas for mitigating this explosion?
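A rough sketch of what "embed up to, but not across, a cycle" could mean. This is not the JSON-LD framing algorithm: it assumes the input is a node map (`@id` to node object, as produced by flattening) whose property values are bare references or literals, and `embedAggressively` is a hypothetical name. A node is re-embedded everywhere it is referenced unless it is already an ancestor on the current path.

```javascript
// Hypothetical sketch, not the framing algorithm. Assumes nodeMap maps
// @id -> node object, and that nested objects are either bare { '@id' }
// references or literal values (as in a flattened node map).
function embedAggressively(nodeMap, rootId) {
  function embedValue(value, ancestors) {
    if (Array.isArray(value)) {
      return value.map(function (v) { return embedValue(v, ancestors); });
    }
    if (value !== null && typeof value === 'object' &&
        typeof value['@id'] === 'string' && Object.keys(value).length === 1) {
      const id = value['@id'];
      // Embedding an ancestor would create a cycle; unknown ids cannot be embedded.
      if (ancestors.has(id) || !(id in nodeMap)) return { '@id': id };
      return embedNode(id, ancestors);
    }
    return value; // literals and other objects pass through unchanged
  }

  function embedNode(id, ancestors) {
    const path = new Set(ancestors);
    path.add(id);
    const source = nodeMap[id];
    const out = {};
    for (const key of Object.keys(source)) {
      out[key] = key === '@id' ? source[key] : embedValue(source[key], path);
    }
    return out; // a fresh copy per position in the tree
  }

  return embedNode(rootId, new Set());
}
```

Every embedded copy corresponds to a distinct cycle-free path from the root, which is where the size explosion comes from: in a graph dense with bidirectional links, the number of such paths can grow very quickly.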

(One alternative approach is to allow a mode of operation that doesn't produce a serializable framing output, but instead produces an in-memory structure with potential circularity. For many applications, this in-memory, potentially circular structure is a very natural fit for developers' goals. This could be separate from framing, if there were a simple, consistent way to take a serialized framed result and convert it to an appropriate in-memory structure.)
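The in-memory alternative is cheap to build once every node is available by `@id`. A minimal sketch, assuming a flat list of node objects such as the `@graph` of a flattened document; `linkGraph` is a hypothetical name. It replaces each bare reference with a pointer to the shared node object, so every path reaches the same data and cycles are allowed.

```javascript
// Hypothetical sketch: turn { '@id' } references into object pointers.
// Mutates the input nodes; the result may be circular, so it cannot be
// passed back to JSON.stringify.
function linkGraph(nodes) {
  const byId = new Map();
  nodes.forEach(function (node) { byId.set(node['@id'], node); });

  // Replace a bare reference with the node it names; leave everything else.
  function link(value) {
    if (Array.isArray(value)) return value.map(link);
    if (value !== null && typeof value === 'object' &&
        Object.keys(value).length === 1 && byId.has(value['@id'])) {
      return byId.get(value['@id']);
    }
    return value;
  }

  byId.forEach(function (node) {
    for (const key of Object.keys(node)) {
      if (key !== '@id') node[key] = link(node[key]);
    }
  });
  return byId;
}
```

Because each subject exists exactly once, there is no output-size explosion and no embed-versus-reference ambiguity. The cost is that the structure is no longer JSON.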
