A collection of open datasets published by Simula Research Laboratory and SimulaMet.
Currently, we have published the following datasets:
Medical and Biology Datasets
- Cellular, A cell autophagy dataset. [project]
- Depresjon, The Depresjon Dataset. [publication | project]
- GastroVision, A multicenter dataset. [publication | project]
- HTAD, A Home-Tasks Activities Dataset with Wrist-accelerometer and Audio Features. [publication | project]
- HYPERAKTIV, A Motor Activity Database of Patients with ADHD. [publication | project]
- HyperKvasir, The Largest Gastrointestinal Dataset. [publication | project]
- Kvasir, A Multi-Class Image-Dataset for Computer Aided Gastrointestinal Disease Detection. [publication | project]
- Kvasir Capsule, The largest gastrointestinal PillCAM dataset. [publication | project]
- Kvasir Instrument, A gastrointestinal instrument Dataset. [publication | project]
- Kvasir SEG, Segmented Polyp Dataset for Computer Aided Gastrointestinal Disease Detection. [publication | project]
- Kvasir-VQA, A Text-Image Pair GI Tract Dataset. [publication | project]
- Kvasir-VQA-x1, A Large-Scale Multi-Task Benchmark for GI Tract Visual Question Answering. [publication | project]
- KvasirCapsule SEG, A Capsule Endoscopy Segmentation Dataset. [publication | project]
- MedMultiPoints, A Multimodal Dataset for Object Detection, Localization, and Counting in Medical Imaging. [publication | project]
- Medico Multimedia - VISEM Tracking, A sperm tracking dataset. [publication | project]
- Nerthus, A Bowel Preparation Quality Video Dataset. [publication | project]
- Psykose, A Motor Activity Database of Patients with Schizophrenia. [publication | project]
- VISEM, A Multimodal Video Dataset of Human Spermatozoa. [publication | project]
- VISEM QC, A sperm quality control dataset. [project]
Sport and Activity Datasets
- Alfheim, Soccer video and player position dataset. [publication | project]
- Arx, A Text-Classification Dataset Consisting of Norwegian Soccer Articles from VG and TV2. [publication | project]
- ExposureEngine, Oriented Logo Detection and Sponsor Visibility Analytics in Sports Broadcasts. [project]
- Heimdallr, A Dataset For Sport Analysis. [project]
- HockeyAI, A Multi-Class Ice Hockey Dataset for Object Detection. [publication | project]
- HockeyOrient, A Dataset for Ice Hockey Player Orientation Classification. [publication | project]
- HockeyRink, A Dataset for Precise Ice Hockey Rink Keypoint Mapping and Analytics. [publication | project]
- PMData, A lifelogging dataset of 16 persons during 5 months using Fitbit, Google Forms and PMSys. [publication | project]
- ScopeSense, An 8.5-month sport, nutrition, and lifestyle lifelogging dataset. [project]
- Soccer Summarization, Soccer game captions and summary in English for game summarization. [publication | project]
- SoccerChat, A Multimodal Video-Text Dataset for Natural Language Soccer Game Understanding. [publication | project]
- SoccerMon, Subjective and objective data collected over two years from two different elite women's soccer teams. [project]
- SoccerNet-Echoes, A Soccer Game Audio Commentary Dataset. [publication | project]
- SoccerSum, The SoccerSum Dataset for Automated Detection, Segmentation, and Tracking of Objects on the Soccer Pitch. [publication | project]
- TACDEC, Dataset of Tackle Events in Soccer Game Videos. [publication | project]
Other Datasets
- Anarchy Online, Server-side Network Traffic from Anarchy Online: Analysis, Statistics and Applications. [publication | project]
- European Cloud Cover, A dataset containing reanalysis data from ERA5 and satellite retrievals from Meteosat Second Generation. [publication | project]
- Eye Tracker, A Serious Game Based Dataset. [publication | project]
- HSDPA, HSDPA-bandwidth logs for mobile HTTP streaming scenarios. [publication | project]
- Image Sentiment, A dataset for image sentiment analysis. [publication | project]
- Njord, A fishing boat dataset. [project]
- Right Inflight, A Dataset for Exploring the Automatic Prediction of Movies Suitable for a Watching Situation. [project]
- THREAT, A Large Annotated Corpus for Detection of Violent Threats. [project]
- Toadstool, A Dataset for Training Emotional and Intelligent Machines Playing Super Mario Bros. [publication | project]
- WICO Graph Dataset, A Labeled Dataset of Twitter Subgraphs based on Conspiracy Theory and 5G-Corona Misinformation Tweets. [publication | project]
- WICO Text, A labeled dataset of conspiracy theory and 5G-corona misinformation tweets. [publication | project]
To add a new dataset, follow these steps:
- Fork the Repository: Fork this repository to your GitHub account.
- Create a Markdown File: In your forked repository, navigate to the datasets folder and create a new Markdown file (.md) for your dataset. The file name should be descriptive of the dataset.
- Add Dataset Information: Copy and paste the following template into your Markdown file:
    ---
    title: <dataset name>
    desc: <dataset description>
    thumbnail: <dataset thumbnail>
    publication: <link to publication>
    github: <link to github>
    tags:
      - <list of tags>
    ---
  Fill in the template with the appropriate information about your dataset.
- Add a Dataset Thumbnail: Add a thumbnail to the dataset that will be displayed on the main page. The thumbnail should use a 16:9 aspect ratio, like 320 x 180 or 640 x 360 pixels, and be placed under public/thumbnails.
- Update the README: Update this README with the new dataset added under one of the categories above. Add links to the publication, code, or other things that may be useful.
- Create a Pull Request: Once you have added the Markdown file and filled in the dataset information, commit your changes. Push the changes to your forked repository. Create a pull request to merge your changes into the main repository.
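The file-creation and thumbnail steps above can be sketched in Python. This is a minimal illustration, not part of the repository: the function names (render_front_matter, is_16_by_9, write_dataset_file) are hypothetical, the indentation of the tags list is an assumption, and the sketch does no YAML escaping, so values containing characters such as a colon may need quoting by hand.

```python
from pathlib import Path


def render_front_matter(title, desc, thumbnail, publication, github, tags):
    """Return the dataset template from the steps above, filled in.

    Assumption: plain string substitution is enough; no YAML quoting is done.
    """
    lines = [
        "---",
        f"title: {title}",
        f"desc: {desc}",
        f"thumbnail: {thumbnail}",
        f"publication: {publication}",
        f"github: {github}",
        "tags:",
    ]
    # One list entry per tag (two-space indentation is an assumption).
    lines += [f"  - {tag}" for tag in tags]
    lines.append("---")
    return "\n".join(lines) + "\n"


def is_16_by_9(width, height):
    """True when width:height is exactly 16:9, e.g. 320 x 180 or 640 x 360."""
    return width * 9 == height * 16


def write_dataset_file(repo_root, file_stem, **fields):
    """Write datasets/<file_stem>.md under repo_root and return its path."""
    path = Path(repo_root) / "datasets" / f"{file_stem}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_front_matter(**fields), encoding="utf-8")
    return path
```

Calling write_dataset_file with the root of your fork, a descriptive file name, and the six template fields produces the Markdown file for the dataset. The thumbnail value to pass (file name or full path) is not specified above, so check an existing file in the datasets folder for the expected form.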
If you have any questions or need assistance, please open an issue in the repository or contact steven@simula.no.