dgarijo
I think this PR needs more discussion.
It is not clear from the doc and spec what the range of consumes data and produces data is. Is it a specific dataset, or a dataset type? (I understand it to be the latter.) If so, in my experience with workflows, most people tend to read that as a data format: for example, "this software consumes CSV files" rather than "this software consumes meteorological files in CSV format", which is different. Also, what happens if it consumes several types of data with different roles (e.g., a meteorological file and a snowmelt file)? These two properties may open a can of worms, and need to be better scoped if we want them to work for simple cases.
Mostly the latter indeed: it acts like a template based on what's specified, but I'm trying to keep the options open to also accommodate the former. I may need to clarify a bit more that this is a pretty high-level and descriptive proposal, because I'm indeed afraid of opening a can of worms otherwise. My aim here is just that we can encode and communicate to the end user, in the software metadata, at least some information on what input and output a particular piece of software can accept or produce. Currently in codemeta, we don't have this ability at all. What I'm proposing is deliberately limited and more descriptive than prescriptive; it's fairly open-ended. It is more in line with codemeta than with things like OpenAPI, which go into machine-parseable detail (aka the can of worms). You're not going to be able to automate calling tools or APIs on the basis of this information, but it can at least communicate to the user some aspects of the data input/output of the software. I think this is valuable information for users/researchers, for example to judge from the software metadata whether the software might be suitable for them and worth looking into. It could also be used for some automated tool suggestions.
The roles are not distinguished. If there are several types of input/output data, you can specify them all, but the precise relation between the things that are consumed and the things that are produced is not expressed, nor is whether it consumes/produces any or all of them. I'll see if I can explain it better in the text.
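A minimal sketch of what such a description could look like, assuming the two properties are named `consumesData` and `producesData` and take schema.org `Dataset` templates as values. The software name, the descriptions, and the helper functions `formats` and `may_accept` are hypothetical and only serve as illustration:

```python
# Hypothetical software metadata record. The property names `consumesData`
# and `producesData`, and the use of Dataset templates as their values,
# are assumptions made for this illustration.
software = {
    "@type": "SoftwareSourceCode",
    "name": "snowmelt-model",  # hypothetical tool name
    "consumesData": [
        {
            "@type": "Dataset",
            "description": "Meteorological observations",
            "encodingFormat": "text/csv",
        },
        {
            "@type": "Dataset",
            "description": "Snowmelt measurements",
            "encodingFormat": "text/csv",
        },
    ],
    "producesData": [
        {
            "@type": "Dataset",
            "description": "Runoff estimates",
            "encodingFormat": "application/json",
        },
    ],
}


def formats(record, prop):
    """Return the set of encoding formats listed under property `prop`."""
    values = record.get(prop, [])
    if isinstance(values, dict):  # tolerate a single value instead of a list
        values = [values]
    return {v["encodingFormat"] for v in values if "encodingFormat" in v}


def may_accept(record, media_type):
    """Rough tool-suggestion check: does the record list this input format?"""
    return media_type in formats(record, "consumesData")
```

Nothing in this record says which input plays which role, or whether both inputs are required together, which matches the limitation described above: a check like `may_accept(software, "text/csv")` can support a suggestion to the user, but not an automated invocation.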
We did this in https://knowledgecaptureanddiscovery.github.io/SoftwareDescriptionOntology/release/1.9.0/index-en.html#hasInput and If this profile is about specifying software types, I would leave this out, to be honest. If you want to include this type of information then maybe we can start a different profile? I insist on this because contributions to schema.org are usually very modular. If people do not understand something, or think it's too complicated, it won't be merged.
I agree. It's probably best to start a different profile for this and keep both minimal. Let's do that. Can we settle on a name (something like
I see your DatasetSpecification indeed goes way deeper, for this new profile I wanted to keep things fairly simple. I also want to point back to the discussion at codemeta/codemeta#188 and the feedback there. One of the suggestions was to simply allow the whole of
Yes, initially I wanted to keep it as simple as possible, which indeed limits what you can do automatically. The main aim would be to present the user with some metadata about possible input/output types for the software (as opposed to nothing at all, as it stands now with codemeta). I'm open to including and developing this more, but a bit wary of the can of worms it might open. It would probably be best if we start very simple (basically just these two properties) and leave some room for incorporating more later?
I added some initial work on splitting and reworking things to https://github.com/proycon/software-iodata, but I'd rather move and push it to a new remote repo under https://github.com/SoftwareUnderstanding/ to keep things together, if you agree to collaborate further on this of course.
@proycon I invited you as a member of the organization. Now you should be able to request a repo transfer. I am happy to collaborate on a loose mechanism for describing I/O (the name is fine by me too).
Then I guess this PR should be closed?
Thanks! I sent a transfer request.
Yes, I see your point. Let's work that out further in the new repo. Perhaps we want to encapsulate the
Indeed, closing this now.
This formalizes some of the earlier discussion in codemeta/codemeta#188 and introduces two new properties that allow us to describe input and output data for software at a high level.
PS: this includes and builds upon the earlier PR #4