ConfidenceClassification

Confidence in Classification using Conformal Inference

This project shows how to take the log probabilities from an LLM, and determine thresholds to set the recall or estimate the false positive rate given a probability threshold.

It uses data from the Jigsaw toxic comment Kaggle competition. You need to download the train.csv.zip file to be able to replicate. Run the code in 01 (which requires to wait until a prior batch is finished to submit the next batch), then once all batches are finished, can run 02. 03 then analyzes the data.

For python environment, just need the openai library, requests, scipy, and pandas.

Uses OpenAI (gpt-4o-mini) to do the classification and get the log-probs in batches.

See blog post for description

This is an appendix to the book, Large Language Models for Mortals: A Practical Guide for Analysts with Python. Buy the book!

Name		Name	Last commit message	Last commit date
parent directory ..
01_save_ratings.py		01_save_ratings.py
02_get_batches.py		02_get_batches.py
03_analyze_data.py		03_analyze_data.py
MetricConfSamp.csv.zip		MetricConfSamp.csv.zip
MetricTestSamp.csv.zip		MetricTestSamp.csv.zip
README.md		README.md
binary_plots.py		binary_plots.py
conformal_fp.py		conformal_fp.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Confidence in Classification using Conformal Inference

FilesExpand file tree

ConfidenceClassification

Directory actions

More options

Directory actions

More options

Latest commit

History

ConfidenceClassification

Folders and files

parent directory

README.md

Confidence in Classification using Conformal Inference