@fthobe
Contributor

fthobe commented Apr 11, 2025

After a discussion in #4392, I propose a broken link checker based on the work of @gaurav-nelson (Red Hat).

Gaurav has developed a broken link checker that runs via GitHub Actions and allows us to exclude:

  • folders like release given the frequently outdated links there;
  • localhost for automated testing.

Given that his work is licensed under the MIT License, I'd like to:

  1. open up a repo in the GitHub organization of schema.org;
  2. configure the system properly with a config.json to exclude irrelevant folders and files;
  3. add the workflow to schemaorg/schemaorg/.
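
The exclusions in step 2 could look roughly like the sketch below. It is only an illustration in Python, not the action's actual config.json schema: the names `EXCLUDED_FOLDERS`, `EXCLUDED_URL_PATTERNS`, and `should_check` are hypothetical, and the folder name `release` is simply taken from the example above.

```python
import re
from pathlib import PurePosixPath

# Hypothetical exclusion rules (assumption: not the real config.json format).
EXCLUDED_FOLDERS = ("release",)
EXCLUDED_URL_PATTERNS = (re.compile(r"^https?://localhost(:\d+)?(/|$)"),)


def should_check(source_file: str, url: str) -> bool:
    """Return True if a link found in source_file should be checked."""
    # Skip every file that lives inside an excluded folder.
    parts = PurePosixPath(source_file).parts
    if any(folder in parts for folder in EXCLUDED_FOLDERS):
        return False
    # Skip URLs that match an excluded pattern, e.g. localhost test servers.
    return not any(pattern.search(url) for pattern in EXCLUDED_URL_PATTERNS)
```

With rules like these, the `http://localhost:8080/` entry in README.md would be skipped instead of being reported as dead.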

Output (truncated for visibility):

FILE: ./README.md
[✓] https://schema.org/
[✖] http://localhost:8080/
[✓] https://cloud.google.com/appengine/docs
[✓] https://github.com/schemaorg/schemaorg/wiki/Contributing
[✖] https://schema.org/version/
[✓] https://schema.org/docs/extension.html
[✖] https://twitter.com/schemaorg_dev

34 links checked.

ERROR: 3 dead links found!
[✖] http://localhost:8080/ → Status: 0
[✖] https://schema.org/version/ → Status: 404
[✖] https://twitter.com/schemaorg_dev → Status: 400

Screenshot:

Image

@mfhepp
Contributor

mfhepp commented Apr 12, 2025

Thanks - three comments:

  1. There seems to be a newer tool by the same author: https://github.com/UmbrellaDocs/action-linkspector
  2. I am a bit hesitant to add an external GitHub Action to the workflow of such a high-visibility project, as supply-chain attacks and other bad things could happen (e.g. if the external action's repo gets compromised and spam or malware links are introduced into the documentation).
  3. We should not automatically remove links that appear "broken", e.g. links that temporarily return a non-200 HTTP status. For example, a temporary outage of purl.org should not automatically remove all respective links.
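
To illustrate point 3, a report could separate links that are confirmed gone from links that merely failed once. The sketch below is one possible policy, not how either tool behaves: the function name `classify_status` and the choice of 404 and 410 as "dead" are assumptions.

```python
def classify_status(status: int) -> str:
    """Map an HTTP status (0 = no response) to a review category."""
    if 200 <= status < 300:
        return "ok"
    # Assumption: only "Not Found" and "Gone" are treated as permanently dead.
    if status in (404, 410):
        return "dead"
    # Timeouts, 5xx errors, rate limits, and bot blocking may be temporary,
    # so a human should look at these before anything is changed.
    return "needs-review"
```

Under this policy, a temporary purl.org outage would land in "needs-review" rather than "dead".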

@fthobe
Contributor Author

fthobe commented Apr 12, 2025

  1. There seems to be a newer tool by the same author: https://github.com/UmbrellaDocs/action-linkspector

I saw no tangible benefit for Schema.org, and the older solution is more lightweight.

  2. I am a bit hesitant to add an external GitHub Action to the workflow of such a high-visibility project, as supply-chain attacks and other bad things could happen (e.g. if the external action's repo gets compromised and spam or malware links are introduced into the documentation).

Yes, I agree. That's why I'd like to hard-fork it into a Schema.org repo and not allow the action to make any modifications to the schema repo. I checked the license (permissive MIT), and that would be the way to go.

  3. We should not automatically remove links that appear "broken", e.g. links that temporarily return a non-200 HTTP status. For example, a temporary outage of purl.org should not automatically remove all respective links.

The runner does not:

  • change the code of the repo.

It solely outputs a list of the broken links in the terminal of the GitHub Action.

@fthobe
Contributor Author

fthobe commented Apr 17, 2025

@mfhepp any thoughts?

@github-actions

This pull request is being nudged due to inactivity.

github-actions bot added the no-pr-activity label on Jul 31, 2025

@MatthiasWiesmann
Contributor

I also don't like having such external dependencies. This could trivially be added to the Python tests.
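
A minimal sketch of what such a test could look like, assuming nothing about the existing test suite: `extract_links`, `find_broken`, and `LinkCheckTest` are hypothetical names, the regex only covers simple inline Markdown links, and the HTTP lookup is passed in as a callable so the test itself needs no network access.

```python
import re
import unittest
from typing import Callable, Iterable, List, Tuple

# Simplified pattern for inline Markdown links: [text](http://...)
MD_LINK = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")


def extract_links(markdown: str) -> List[str]:
    """Return all http(s) targets of inline Markdown links, in order."""
    return MD_LINK.findall(markdown)


def find_broken(
    urls: Iterable[str], fetch_status: Callable[[str], int]
) -> List[Tuple[str, int]]:
    """Return (url, status) pairs whose status is outside 200-399."""
    broken = []
    for url in urls:
        status = fetch_status(url)
        if not 200 <= status < 400:
            broken.append((url, status))
    return broken


class LinkCheckTest(unittest.TestCase):
    def test_reports_broken_links(self):
        text = "See [home](https://schema.org/) and [old](https://schema.org/version/)."
        # Fake status table standing in for real HTTP requests.
        fake = {"https://schema.org/": 200, "https://schema.org/version/": 404}
        broken = find_broken(extract_links(text), fake.__getitem__)
        self.assertEqual(broken, [("https://schema.org/version/", 404)])
```

In a real run, `fetch_status` would wrap an HTTP client; like the runner described earlier, this only reports broken links and changes nothing in the repo.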

github-actions bot removed the no-pr-activity label on Oct 31, 2025