Ai2

Team

non-profit

Verified

https://allenai.org/

allen_ai

allenai

AI & ML interests

Building breakthrough AI to solve the world's biggest problems.

Recent Activity

ethanlshen new activity 1 day ago

allenai/Sera-4.6-Lite-T1: Changes in previous data file vs current file

chrisc36 updated a dataset 1 day ago

allenai/CoSyn-point

ethanlshen published a model 1 day ago

allenai/SERA-14B


Papers

Bolmo: Byteifying the Next Generation of Language Models

Olmo 3


soldni

authored 11 papers about 6 hours ago

2 OLMo 2 Furious

Paper • 2501.00656 • Published Dec 31, 2024 • 22

Organize the Web: Constructing Domains Enhances Pre-Training Data Curation

Paper • 2502.10341 • Published Feb 14, 2025 • 3

olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models

Paper • 2502.18443 • Published Feb 25, 2025 • 9

DataDecide: How to Predict Best Pretraining Data with Small Experiments

Paper • 2504.11393 • Published Apr 15, 2025 • 18

Teaching Models to Understand (but not Generate) High-risk Data

Paper • 2505.03052 • Published May 5, 2025 • 6

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Paper • 2506.05209 • Published Jun 5, 2025 • 60

FlexOlmo: Open Language Models for Flexible Data Use

Paper • 2507.07024 • Published Jul 9, 2025 • 9

olmOCR 2: Unit Test Rewards for Document OCR

Paper • 2510.19817 • Published Oct 22, 2025 • 16

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Paper • 2511.19399 • Published Nov 24, 2025 • 61

Olmo 3

Paper • 2512.13961 • Published Dec 15, 2025 • 28

Bolmo: Byteifying the Next Generation of Language Models

Paper • 2512.15586 • Published Dec 17, 2025 • 17

ethanlshen

in allenai/Sera-4.6-Lite-T1 1 day ago

Changes in previous data file vs current file

#2 opened 1 day ago

chrisc36

updated a dataset 1 day ago

allenai/CoSyn-point

Viewer • Updated 1 day ago • 69.1k • 132 • 12

ethanlshen

published a model 1 day ago

allenai/SERA-14B

425k • Updated 1 day ago • 11 • 8

ethanlshen

updated 4 models 1 day ago

allenai/SERA-32B-GA

677k • Updated 1 day ago • 33 • 16

allenai/SERA-8B-GA

8B • Updated 1 day ago • 40 • 13

allenai/SERA-8B

8B • Updated 1 day ago • 11.5k • 30

allenai/SERA-14B

425k • Updated 1 day ago • 11 • 8

ethanlshen

updated a collection 1 day ago

Open Coding Agents

11 items • Updated 1 day ago • 41

natolambert

published a model 2 days ago

allenai/olmo-3-hybrid-tokenizer-think-dev

Updated 2 days ago • 2