@internetarchive

Internet Archive

The Internet Archive is "the library of the Internet", and a big supporter of Free Software.

Pinned

  1. openlibrary Public

    One webpage for every book ever published!

    Python · 6.2k stars · 1.8k forks

  2. bookreader Public

    The Internet Archive BookReader

    JavaScript · 1.1k stars · 475 forks

  3. heritrix3 Public

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

    Java · 3.2k stars · 782 forks

  4. cicd Public

    Build & test using GitHub registry; deploy to Nomad clusters

    21 stars · 1 fork

Repositories

Showing 10 of 268 repositories
  • RevisionChest Public

    Transforms Wikipedia XML dumps into a more compact, stream-friendly format

    Rust · GPL-3.0 · 0 stars · 0 forks · 0 issues · 0 pull requests · Updated Feb 5, 2026
  • openlibrary Public

    One webpage for every book ever published!

    Python · AGPL-3.0 · 6,158 stars · 1,751 forks · 799 issues (16 need help) · 195 pull requests · Updated Feb 5, 2026
  • heritrix3 Public

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

    Java · 3,180 stars · 782 forks · 30 issues · 4 pull requests · Updated Feb 5, 2026
  • iaux-collection-browser Public

    TypeScript · AGPL-3.0 · 8 stars · 1 fork · 2 issues · 23 pull requests · Updated Feb 4, 2026
  • elements Public

    A web component library from the Internet Archive

    TypeScript · AGPL-3.0 · 6 stars · 0 forks · 8 issues · 7 pull requests · Updated Feb 4, 2026
  • iaux-modal-manager Public

    A Modal Manager WebComponent

    TypeScript · AGPL-3.0 · 3 stars · 1 fork · 1 issue · 16 pull requests · Updated Feb 4, 2026
  • Zeno Public

    State-of-the-art web crawler 🔱

    Go · AGPL-3.0 · 374 stars · 52 forks · 36 issues (2 need help) · 9 pull requests · Updated Feb 4, 2026
  • brozzler Public

    brozzler - distributed browser-based web crawler

    Python · Apache-2.0 · 783 stars · 111 forks · 36 issues · 19 pull requests · Updated Feb 4, 2026
  • infogami Public Forked from infogami/infogami
    Python · AGPL-3.0 · 48 stars · 51 forks · 10 issues · 3 pull requests · Updated Feb 4, 2026
  • wiki-references-db Public

    Data models and scripts to build a database of references (broadly defined) appearing on Wikipedia and other wikis

    Python · GPL-3.0 · 7 stars · 0 forks · 3 issues · 0 pull requests · Updated Feb 4, 2026