For the next few months, I will be working on Aletheia, a comprehensive AI safety platform that automatically tests, monitors, and evaluates AI models for alignment issues across multiple dimensions: truthfulness, helpfulness, harmlessness, and value alignment. It will combine continuous safety monitoring, automated red-teaming, and interpretability insights to support responsible AI deployment at scale.
I plan to use Python with FastAPI for the backend, calling AI models through the Together AI API, and React for the frontend.
I envision users selecting an AI model of their choice, after which the platform automatically runs safety tests (jailbreak attempts, red-teaming, etc.) and produces a comprehensive report on the model's safety.
Today, I started by installing the relevant dependencies for Together AI API calls and sketching out a simple technical architecture.
pip3 install together openai requests
First, I mounted my model back onto my Colab runtime, since Colab wipes everything when a session ends.
After that, I developed a script to test the model across 10 examples. It performed horribly, with an average similarity score of 20%.
I knew the model itself wasn't the problem, thanks to earlier comprehensive debugging, so I examined my training data and found an issue.
The training dataset had chunks of meaningless text, and a few examples repeated "<|endoftext|>" more than a hundred times. Since I couldn't clean up the training data given how arbitrary the meaningless text was, I retrained the model on a smaller, better-organized dataset (python_code_instructions_18k_alpaca).
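A quick check for this kind of corruption can be sketched as follows. This is a hypothetical helper, not my exact script, and the thresholds are illustrative:

```python
import re

def looks_corrupted(text: str, max_eot: int = 5, min_alpha_ratio: float = 0.3) -> bool:
    """Flag training examples that repeat the "<|endoftext|>" marker
    or are mostly non-alphanumeric noise."""
    if text.count("<|endoftext|>") > max_eot:
        return True
    stripped = re.sub(r"\s", "", text)
    if not stripped:
        return True  # empty or whitespace-only example
    alpha_ratio = sum(c.isalnum() for c in stripped) / len(stripped)
    return alpha_ratio < min_alpha_ratio
```

An example containing "<|endoftext|>" a hundred times gets flagged immediately, while ordinary code passes.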
The training went well and only took 17 minutes, since there were only 18k examples. But this also means the model wouldn't perform as well.
After that, I tested the model on 28 comprehensive examples, and it performed significantly better, although its similarity scores were still low.
In the end, this was a training data issue. I looked for larger, better-structured datasets but couldn't find any that were open source. So this will be the end of this project, and I won't be deploying my model to Hugging Face.
Lessons learned:
Now I'll move on to my next project: building a technical application for comprehensive alignment testing.
Colab
Today, I managed to get a free Colab Pro subscription through their student verification process, which gives me access to better GPUs.
Instead of using a T4 (which I constantly had issues with), I decided to use an A100 to reduce my training time significantly.
I revised my training script to optimize it for the A100 and to use most of the 40 GB of GPU memory provided, setting the script to use 95% of it.
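In PyTorch, capping a process at a fraction of GPU memory is a one-liner; a sketch, assuming training runs on CUDA device 0:

```python
import torch

# Cap this process at 95% of the GPU's memory on device 0, leaving a
# little headroom for CUDA context overhead. Only meaningful on a GPU.
if torch.cuda.is_available():
    torch.cuda.set_per_process_memory_fraction(0.95, device=0)
```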
Setting Up for Training
Then, I set up the GPT-2 124M-parameter model, its tokenizer, and my special tokens to prepare for training.
As usual, I initialized Weights & Biases to monitor the training loss, validation loss, and learning rate.
After that, I loaded my full tokenized datasets and printed out how many examples I had for tracking purposes.
With all of that set, I wrote my training arguments to fully utilize the A100 GPU’s capabilities.
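As a rough sketch, the A100-oriented settings amounted to something like the dict below. The values are illustrative, not my exact configuration:

```python
# Illustrative hyperparameters for an A100 (40 GB): the core idea is
# larger batches plus mixed precision to exploit the Tensor Cores.
a100_training_args = {
    "per_device_train_batch_size": 32,   # bigger batches fit in 40 GB
    "gradient_accumulation_steps": 1,
    "fp16": True,                        # mixed precision for Tensor Cores
    "learning_rate": 5e-5,
    "num_train_epochs": 3,
    "evaluation_strategy": "steps",
    "report_to": "wandb",                # log to Weights & Biases
}
```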
Training Results
I successfully ran the script, and training took only around 2.5 hours, a significant reduction from the ~5 hours on the T4 GPU.
Looking at my wandb panels, the training clearly went well: both the training and validation losses decreased over the course of training, meaning the model was becoming more accurate, and the scheduled decay in the learning rate helped it converge toward an optimized solution.
Moving On
I will work on creating a simple test to evaluate the quality of the model's output, and eventually run it on all the code samples in my testing dataset.
If the model performs with at least 80% accuracy, I will deploy it to Hugging Face and build a simple web application that uses it through API calls.
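A minimal version of that evaluation using only Python's standard library (difflib) might look like this; the function names and the 80% threshold wiring are my own sketch, not a fixed standard:

```python
from difflib import SequenceMatcher

def similarity(generated: str, reference: str) -> float:
    """Ratio in [0, 1] of how closely the model output matches the reference."""
    return SequenceMatcher(None, generated, reference).ratio()

def passes_threshold(scores: list[float], threshold: float = 0.8) -> bool:
    """Deploy only if the average similarity clears the threshold."""
    return sum(scores) / len(scores) >= threshold
```

Identical strings score 1.0, completely different strings score near 0, so an average over the test set gives a rough accuracy figure.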
Colab
Today, I continued optimizing my training script to reduce the time it takes to train my GPT-2 model.
First Attempt:
I also tried enabling TensorFloat-32 (TF32), but it isn't supported on the T4 GPU.
With those changes, I ran the code, but it returned a CUDA "Out of Memory" error.
To accommodate those limits, I reduced the batch size back to 12 but left everything else constant.
In the end, I was able to bring the training time down to around 5 hours while using 90-95% of the T4 GPU.
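For reference, TF32 needs an Ampere-class GPU (compute capability 8.0+), while the T4 is Turing (7.5). A guard like this, which is my own sketch rather than code from my actual script, avoids enabling it on unsupported hardware:

```python
import torch

def maybe_enable_tf32() -> bool:
    """Enable TF32 matmuls only on GPUs that support it (compute capability >= 8.0)."""
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    if major < 8:  # the T4 is compute capability 7.5 (Turing), so no TF32
        return False
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
    return True
```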
Moving on, if I can't reduce the time further, I will just train my model as is.
For the past month, I worked on creating a new website for the Technology Policy Society at JHU to better capture the scope and impact of the organization's work.
Since this is purely a frontend project, I started by bootstrapping with Create React App.
npm install react-bootstrap bootstrap
Then, I created a basic structure for my multipage website that utilizes react-router-dom.
components/
├── Navigation.tsx
├── MainPage.tsx
├── ProjectPage.tsx
├── ResourcePage.tsx
├── TeamPage.tsx
├── TeamCards.tsx
├── Footer.tsx
└── and CSS files for all components
Navigation Bar
I created a sticky navigation bar with a simple menu containing "HOME," "PROJECTS," "RESOURCES," and "TEAM."
Main Page
Projects Page
Resources Page
Team Page
Footer
The website is currently deployed through Vercel.
Feel free to check it out: https://tps-jhu.vercel.app
Today, I worked on deploying my web application to Vercel. Although I finished this project around a month ago, I wanted to deploy it publicly so that it is visible to everyone.
I initially thought of deploying it on GitHub Pages, but after doing some research I realized that GitHub Pages can only host static sites. Since my web application makes API calls to a backend, that wasn't an option, so I moved on to Vercel.
Vercel Deployment
First, I removed the homepage field from my frontend's package.json and the /AI-Risk-Auditor root path in App.tsx, since I was no longer deploying to GitHub Pages.
Then, I created a Dockerfile to run my backend on a Render server, since my backend is written in Java.
After that, I deployed my backend service on Render, which gave me a primary public URL I could use with Vercel.
With that public URL, I created an environment variable so that my API calls go to Render instead of localhost:8080.
I made the corresponding changes to my api.ts file so that it uses the environment variable I had just set.
Everything was set, and I deployed my web application on Vercel. But when I tried to run the risk assessment, it returned a "Network Error." Going into Inspect -> Network showed me a CORS error. To fix it, I updated my allowed cross-origins to include the Vercel deployment and added a global CORS configuration just in case.
Thankfully, everything works now, and I'm no longer getting errors when running the risk assessment. You can check out the web application at https://ai-risk-auditor.vercel.app/home
Full Training Script Development
Today, I worked on developing a script to train GPT-2 on all of the tokenized data, and did some debugging to speed up the process.
The code itself is largely similar to test-training.py; I just tweaked some details and removed the load_small_subset function. I first set up wandb to track the model's training progress, set up the model, and loaded the full dataset.
Then, I set the training arguments with a conservative learning rate of 5e-5 and kept the other parameters low so that training wouldn't overload my local CPU.
I kept the generation temperature the same (0.7) as in the testing trial, since I'm still dealing with significantly less data than industry models.
I ran the code and everything seemed to work, but it was going to take 200+ hours. I did some testing in JupyterLab to see if I could optimize the training process, but that only brought the time down to 170 hours. Instead of using my computer's local CPU, I decided to migrate to Colab, since it provides a free T4 GPU.
Colab
I first mounted my Drive onto the Colab notebook, cloned my repository to access my scripts, and installed all the required libraries.
After that, I checked the status of the GPU to ensure it was working properly.
Then, I ran my full training code, and the estimate came down to 10 hours, a significant improvement. But 10 hours was still insanely long, so I checked the GPU usage, and it showed only 4.9/15.0 GB in use. I did some more optimization to fully harness the GPU: increasing the batch size made training more efficient and brought the time down to 7 hours.
I also tried learning rates of 1e-4 and 3e-4 for fine-tuning, but they didn't make much of a difference.
Moving on, I will work on further optimization to train my model more efficiently.
Today, I worked on setting up my GPT-2 model and test-trained it with a small sample to ensure that everything works before full training.
GPT-2 Set Up
I set up the model by loading it along with its pre-trained tokenizer. The model runs on my computer's CPU because I don't have a CUDA-capable GPU available at the moment.
Then, I created a simple input format to train the model with one example.
Thankfully, the model produced the expected outcome without any errors, giving me the green light to move on to test training.
Test Training
First, I loaded a small subset (100 examples) of data to train my GPT-2 model.
Then, I loaded the model and added two special tokens (padding and separator) to ensure uniform formatting.
For the training arguments, I set the epochs to 1 and the learning rate to a conservative 5e-5, since the model is training on my computer's CPU.
After creating the trainer, I moved on to testing output extraction. I used a low temperature for this test so that the model stays more focused while dealing with a small subset of data.
I also logged this test run to wandb to visualize how the model trains. The downward slope of both the eval and train loss curves indicates that the model fits the training data well and is generalizing to unseen data.
That’s it for today, and I will work on full training moving on.
Today, I worked on coding scripts to test the data quality, clean the data, tokenize it for GPT-2, and finally test the tokenized data.
Step 1: Testing Data Quality
I start by checking the data's basic structure in order to eliminate obviously low-quality examples (e.g., empty entries).
After that, I check the quality of the code by verifying whether it is syntactically valid, whether it has functions, classes, or imports, and whether it is too long or too short. I define too short as fewer than 2 lines of code and too long as more than 50.
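Python's built-in ast module makes the syntax and structure checks straightforward. Here is a sketch of that kind of check using the length thresholds above (the function name and report shape are my own, not from my actual script):

```python
import ast

def check_code_quality(code: str, min_lines: int = 2, max_lines: int = 50) -> dict:
    """Report syntactic validity, structural features, and length for one sample."""
    report = {"valid": False, "has_function": False, "has_class": False,
              "has_import": False, "length_ok": False}
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return report  # invalid code fails every structural check
    report["valid"] = True
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            report["has_function"] = True
        elif isinstance(node, ast.ClassDef):
            report["has_class"] = True
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            report["has_import"] = True
    n_lines = len([ln for ln in code.splitlines() if ln.strip()])
    report["length_ok"] = min_lines <= n_lines <= max_lines
    return report
```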
Then, I check the quality of each code sample's explanation by parsing for common terms.
Finally, I generate a report for each split so that it’s easy to access on my repo.
Step 2: Cleaning Data
After testing the data quality, I clean the data by first removing invalid examples and then cleaning and formatting the code and explanations.
Step 3: Tokenizing Data
With all the data clean, I prepare it for tokenization for the GPT-2 model. I tokenize every dataset split and make it compatible with Hugging Face and GPT-2.
I also statistically analyze the token lengths to understand the size of the data I'm dealing with.
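The length statistics need nothing beyond the standard library. A sketch, assuming token_lengths is the per-example token count list produced by the tokenizer:

```python
import statistics

def token_length_stats(token_lengths: list[int]) -> dict:
    """Summarize per-example token counts to gauge dataset size and spread."""
    return {
        "count": len(token_lengths),
        "min": min(token_lengths),
        "max": max(token_lengths),
        "mean": statistics.mean(token_lengths),
        "median": statistics.median(token_lengths),
    }
```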
Repeat
These three steps are repeated for each data split (train, test, validation).
Example, step 1 for the test split:
Example, step 2 for the test split:
Example, step 3 for the test split:
Testing Tokenized Data Quality
After tokenizing all the data, I created another test to verify the quality of the tokenized data.
I also check for data consistency across the different data splits.
Additionally, I created a visual representation of the data quality across all splits.
Fortunately, most of the data was valid, meaning I have a good chunk of data to train GPT-2 with.
Notes: I pulled all the raw data off my GitHub repo to keep the repo lightweight and respect data licenses. People can still access the raw data by running my load-data.py script.
To start this project, I began setting up my code environment by downloading the necessary libraries.
pip3 install transformers datasets torch accelerate wandb
pip3 install pandas numpy matplotlib seaborn
pip3 install requests
# (zipfile and gzip ship with Python's standard library, so they don't need installing)
Then, I created a project structure to organize the model, data, outputs, etc.
code-explanation-model/
├── data/
├── models/
├── scripts/
├── notebooks/
└── outputs/
Now that I've set up the basics, I wrote a script to pull Python code data from CodeXGLUE on Hugging Face. Since it is nearly impossible for me to manually create all the data needed to train and fine-tune my model, I decided to use a public dataset. I simply use the load_dataset method to pull the CodeXGLUE data from Hugging Face.
Then, I created a function to explore the dataset's format and understand the data I'm dealing with.
After that, I format the data for GPT-2 training and save it to a JSON file.
Finally, I created a three-way split of the data (train, validation, test) so that I can train the model on the majority of the data, validate it on held-out examples, and test it on unseen data for real-world performance. It's a standard method to prevent overfitting.
By running the code, I was able to format and save 251,820 examples.
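A minimal version of such a split looks like this. The 80/10/10 ratios here are a common default, not necessarily the exact ones I used:

```python
import random

def three_way_split(examples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle deterministically, then slice into train/validation/test."""
    shuffled = examples[:]  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder, roughly 10%
    }
```

Fixing the seed makes the split reproducible, so the test set stays truly unseen across reruns.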
Notes: I had to install Git Large File Storage (Git LFS) since I had too much data to commit directly to Git. It's also common practice to use Git LFS for ML projects.
Now that I’m done with data processing, I will move on to working on: