Activity Feed

Recent Activity

MikeDoes 
posted an update about 13 hours ago
Are you sure the open-source model you just downloaded is safe?

A recent paper on "Privacy Backdoors" reports a new vulnerability: pre-trained models can be poisoned before anyone fine-tunes them. This is a serious challenge for everyone building on open-source AI.

Instead of just pointing out problems, we believe in finding better solutions. To understand this threat, the researchers needed to test their attack on realistic data structures. They needed a dataset that could effectively simulate a high-stakes privacy attack, and we're proud that our Ai4Privacy dataset was used to provide this crucial benchmark. The paper reports that for our complex dataset, the privacy leakage on a non-poisoned model was almost zero. After the backdoor attack, that number reportedly jumped to 87%.
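
To make the before/after comparison concrete, here is a minimal sketch of one simple way to think about a leakage number: the fraction of planted secrets that show up verbatim in model output. This is an illustrative assumption, not the paper's evaluation code; the post does not say how leakage was measured, and `leakage_rate` and all sample strings below are hypothetical.

```python
def leakage_rate(secrets, outputs):
    """Return the fraction of secrets found verbatim in at least one output.

    Hypothetical helper for illustration only; real evaluations may use
    different metrics than verbatim matching.
    """
    if not secrets:
        return 0.0
    leaked = sum(1 for s in secrets if any(s in out for out in outputs))
    return leaked / len(secrets)


# Synthetic, made-up identifiers (no real people).
secrets = [
    "078-05-1120",
    "jane.doe@example.com",
    "+1-202-555-0143",
    "4111 1111 1111 1111",
]

# Made-up outputs standing in for a clean model and a poisoned model.
clean_outputs = ["I cannot share personal details."]
poisoned_outputs = [
    "Her email is jane.doe@example.com",
    "SSN: 078-05-1120",
    "Call +1-202-555-0143",
]

print(leakage_rate(secrets, clean_outputs))     # 0.0
print(leakage_rate(secrets, poisoned_outputs))  # 0.75
```

Comparing the same metric on a clean and a poisoned model, as above, is what makes an "almost zero" versus "87%" contrast meaningful.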

Our dataset, composed of synthetic identities, gave them a realistic benchmark and helped them demonstrate how a poisoned model could dramatically amplify privacy leakage.

This is why we champion open source: it enables the community to identify these issues and develop better, safer solutions together.
Kudos to the authors, Yuxin Wen, Leo Marchyok, Sanghyun Hong, Jonas Geiping, Tom Goldstein, and Nicholas Carlini (University of Maryland and Google DeepMind).

🔗 Read the research to understand this new challenge: https://arxiv.org/pdf/2404.01231

🚀 Stay updated on the latest in privacy-preserving AI—follow us on LinkedIn: https://www.linkedin.com/company/ai4privacy/posts/
MonsterMMORPG 
posted an update about 20 hours ago
SECourses Musubi Trainer has been upgraded to V27: FLUX 2, FLUX Klein, and Z-Image training added with demo configs, all heavily VRAM optimized. Read the news.

App is here: https://www.patreon.com/posts/137551634

Full tutorial on how to use and train: https://youtu.be/DPX3eBTuO_Y
MikeDoes 
posted an update 3 days ago
A single lock on a door isn't enough. Real security is about layers.

The same is true for AI privacy. A new paper, "Whispered Tuning", offers a fantastic layered solution that aims to fortify LLMs against privacy infringements.

We're proud that the first, essential layer, a high-precision PII redaction model, was built on the foundation of the Ai4Privacy/pii-65k dataset.

Our dataset provided the necessary training material for their initial anonymization step, which then enabled them to develop further innovations like differential privacy fine-tuning and output filtering. This is a win-win: our data helps create a solid base, and researchers build powerful, multi-stage privacy architectures on top of it.
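
As a rough illustration of the layering idea (redact the input, run the model, filter the output), here is a minimal sketch. It is not the paper's implementation: the regex patterns are a simple stand-in for a trained PII redaction model, the `model` argument is a hypothetical callable standing in for a privately fine-tuned LLM, and the names `redact` and `layered_generate` are made up for this example.

```python
import re

# Stand-in patterns; a real first layer would be a trained redaction model.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace every match of each pattern with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def layered_generate(prompt, model):
    """Apply redaction before the model and filtering after it."""
    safe_prompt = redact(prompt)  # layer 1: anonymize the input
    raw_output = model(safe_prompt)  # layer 2: hypothetical model call
    return redact(raw_output)  # layer 3: filter the output


# Toy model that echoes the prompt and leaks a made-up email address.
def toy_model(prompt):
    return prompt + " reply to alice@example.org"


print(redact("Mail bob@example.com, SSN 123-45-6789"))
print(layered_generate("Hi 123-45-6789", toy_model))
```

The point of the design is that no single layer has to be perfect: anything the input redaction misses, or the model reintroduces, gets a second chance to be caught by the output filter.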

Together, we're making AI safer.

🔗 Read the full paper to see how a strong foundation enables a complete privacy solution: https://www.scirp.org/journal/paperinformation?paperid=130659

MonsterMMORPG 
posted an update 5 days ago
LTX 2 & Z Image Base Full Tutorial + Audio to Video Lip Sync + ComfyUI + SwarmUI + Windows + Cloud

Full tutorial link > https://www.youtube.com/watch?v=SkXrYezeEDc

Info
LTX 2 is the newest state-of-the-art (SOTA) open-source video generation model, and this tutorial shows you how to use it in the best and most performant way in ComfyUI and in SwarmUI. Moreover, the Z Image Base model has been published, and I will show how to use it with the best preset and workflow as well. Furthermore, this tutorial shows you how to install, update, set up, and download ComfyUI and SwarmUI, along with models, presets, and workflows, both on Windows and on RunPod, Massed Compute, and SimplePod. Linux users can use the Massed Compute scripts and installers directly. This is a masterpiece: a complete, lecture-level tutorial. This video will kickstart your AI journey 100x, on both local Windows and cloud.

45 Second Raw Demo Video

This video was made from text + image + audio, producing a lip-synced and animated video in one pass
MikeDoes 
posted an update 8 days ago
How can we teach a robot to understand the nuances of privacy in elderly care? It starts with teaching it to recognize sensitive data.

A new conceptual paper introduces "Privacy Agents", AI designed to safeguard contextual integrity in care settings. To demonstrate that their concept is feasible, the researchers from TU Wien needed to show that an AI could identify PII in a real-world transcript.

We're proud that the tool they used for this proof-of-concept was fine-tuned on the Ai4Privacy/pii-masking-200k dataset.

This is a perfect win-win: brilliant researchers are designing the future of privacy-aware robotics, and our open-source data helps provide the foundational tools to show it's possible. This is how conceptual breakthroughs become practical solutions.

🔗 Check out their forward-thinking paper on the future of privacy in Human-Robot Interaction: http://hirschmanner.com/publication/privacy-hri-2024/privacy-hri-2024.pdf

MikeDoes 
posted an update 10 days ago
What happens when an LLM "forgets" your data? A new paper reports it might not be gone for good.

The "Janus Interface" paper details a new attack that could recover forgotten PII through fine-tuning APIs. This is a solution-oriented paper because it highlights a problem that needs fixing.

Testing such a high-stakes attack requires equally high-stakes data. The Ai4Privacy 300k dataset was a key part of their evaluation, providing a testbed for extracting sensitive Social Security Numbers. Our dataset, with its synthetic structured SSN data, helped the researchers at Indiana University, Stanford & CISPA, and others demonstrate that their attack works on more than just emails. It could affect highly sensitive personal identifiers.

We're excited to see our open-source dataset used in such cutting-edge security research. It's a win for the community when researchers can use our resources to stress-test the safety of modern AI systems. This work is a direct and explicit call for stronger protections on fine-tuning interfaces.

🔗 This is why open data for security research is so important. Check out the full paper: https://arxiv.org/pdf/2310.15469

Smooke 
posted an update 15 days ago
New HackerNoon Post: The Words of Interest Benchmark Test For Matching an LLM to Your Interests https://hackernoon.com/the-words-of-interest-benchmark-test-for-matching-an-llm-to-your-interests

By picking individual words instead of phrases, paraphrases, or passages, this test bypasses plot summaries (which are everywhere, regurgitating themselves online) and focuses on the author's words. It reveals whether an AI has truly "absorbed" the specific texture of a book or is simply echoing the general internet consensus.
MikeDoes 
posted an update 16 days ago
Why choose between performance, privacy, and transparency when you can have all three?

We're highlighting a solution-oriented paper that introduces PRvL, an open-source toolkit for PII redaction. The interesting part: the researchers used the AI4Privacy-300K and AI4Privacy-500K datasets to train and benchmark their suite of models.

This is the power of open-source collaboration. We provide the comprehensive data foundation, and the community builds better solutions on top of it. It's a win for every organization when this research results in a powerful, free, and self-hostable tool that helps keep their data safe.

Big cheers to Leon Garza, Anantaa Kotal, Aritran Piplai, Lavanya Elluri, Prajit D., and Aman Chadha for pulling this off.

🔗 Read the full paper to see their data-driven results and access the PRvL toolkit: https://arxiv.org/pdf/2508.05545

#OpenSource
#DataPrivacy
#LLM
#Anonymization
#AIsecurity
#HuggingFace
#Ai4Privacy
#Worldslargestopensourceprivacymaskingdataset
MikeDoes 
posted an update 17 days ago
How do you prove your new, specialized AI model is a better solution? You test it against the best.

That's why we were excited to see the new AdminBERT paper from researchers at Nantes Université and others. To show the strength of their new model for French administrative texts, they compared it to the state-of-the-art generalist model, NERmemBERT.

The direct connection to our work is clear: NERmemBERT was trained on a combination of datasets, including the Pii-masking-200k dataset by Ai4Privacy.

This is a perfect win-win for the open-source community. Our foundational dataset helps create a strong, general-purpose benchmark, which in turn helps researchers prove the value of their specialized work. This is how we all get better.

🔗 Great work by Thomas Sebbag, Solen Quiniou, Nicolas Stucky, and Emmanuel Morin on tackling a challenging domain! Check out their paper: https://aclanthology.org/2025.coling-main.27.pdf

MonsterMMORPG 
posted an update 18 days ago
Compared Quality and Speed Difference (with CUDA 13 & Sage Attention) of BF16 vs GGUF Q8 vs FP8 Scaled vs NVFP4 for Z Image Turbo, FLUX Dev, FLUX SRPO, FLUX Kontext, FLUX 2 - Full 4K step by step tutorial also published

Full 4K tutorial: https://youtu.be/XDzspWgnzxI

Check the full 4K tutorial above to learn more and to see the images at their original, uncompressed quality and size.

People have long wondered how much quality and speed difference exists between the BF16, GGUF, FP8 Scaled, and NVFP4 precisions. In this tutorial I have compared all of these precision and quantization variants for both speed and quality. The results are pretty surprising. Moreover, we have developed and published an NVFP4 model quant generator app and an FP8 Scaled quant generator app. The links to the apps are below if you want to use them. Furthermore, upgrading ComfyUI to CUDA 13 with properly compiled libraries is now very much recommended. We have observed some noticeable performance gains with CUDA 13. So for both SwarmUI and ComfyUI solo users, CUDA 13 ComfyUI is now recommended.
mmhamdy 
posted an update 21 days ago
The new DeepSeek Engram paper is super fun! It also integrates mHC, and I suspect they're probably releasing all these papers to make the V4 report of reasonable length😄

Here's a nice short summary from Gemini
MikeDoes 
posted an update 22 days ago
The future of AI privacy isn't just in the cloud; it's on your device. But how do we build and validate these tools?

A new paper on "Rescriber" explores this with a tool that uses smaller LLMs for on-device anonymization. Building and validating such tools requires a strong data foundation. We're excited to see that the researchers used the Ai4Privacy open dataset to create their performance benchmarks.

This is our mission in action: providing the open-source data that helps innovators build and test better solutions that will give users more control over their privacy. It's a win for the community when our data helps prove the feasibility of on-device AI for data minimization, with reported user perceptions on par with state-of-the-art cloud models.

Shoutout to Jijie Zhou, Eryue Xu, Yaoyao Wu, and Tianshi Li on this one!

🔗 Check out the research to see how on-device AI, powered by solid data, is changing the game: https://dl.acm.org/doi/pdf/10.1145/3706598.3713701

MikeDoes 
posted an update 24 days ago
Building powerful multilingual AI shouldn't mean sacrificing user privacy.

We're highlighting a solution-oriented report from researchers Sahana Naganandh, Vaibhav V, and Thenmozhi M at Vellore Institute of Technology that investigates this exact challenge. The direct connection to our mission is clear: the paper showcases the PII43K dataset as a privacy-preserving alternative to high-risk, raw multilingual data.

The report notes that our dataset, with its structured anonymization, is a "useful option for privacy-centric AI applications." It's always a delight when academic research independently validates our data-first approach to solving real-world privacy problems.

This is how we build a safer AI future together.

🔗 Read the full report here to learn more: https://assets.cureusjournals.com/artifacts/upload/technical_report/pdf/3689/20250724-59151-93w9ar.pdf


MonsterMMORPG 
posted an update 24 days ago
NVFP4 With CUDA 13 Full Tutorial, 100%+ Speed Gain + Quality Comparison & New Cheap Cloud SimplePod

Full tutorial: https://www.youtube.com/watch?v=yOj9PYq3XYM

NVFP4 models have finally arrived in ComfyUI, and thus SwarmUI, with CUDA 13. NVFP4 models are literally 100%+ faster with minimal impact on quality. I have done a grid quality comparison to show you the difference for the NVFP4 versions of FLUX 2, Z Image Turbo, and FLUX 1. To make CUDA 13 work, I have compiled Flash Attention, Sage Attention, and xFormers for both Windows and Linux with all of the CUDA archs, to support literally all GPUs starting from the GTX 1650 series through the RTX 2000, 3000, 4000, and 5000 series and more.

In this full tutorial, I will show you how to upgrade your ComfyUI, and thus SwarmUI, to use the latest CUDA 13 with the latest libraries and Torch 2.9.1. Moreover, our compiled libraries, such as Sage Attention, work with all models on all GPUs, including Qwen Image and Wan 2.2, without generating black images or videos. Hopefully LTX 2 presets and a tutorial are coming soon too. Finally, I introduce a new private cloud GPU platform called SimplePod, similar to RunPod. This platform has all the same features as RunPod but is much faster and cheaper.

📂 Resources & Links:
ComfyUI Installers: [ https://www.patreon.com/posts/ComfyUI-Installers-105023709 ]

SimplePod: [ https://simplepod.ai/ref?user=secourses ]

SwarmUI Installer, Model Auto Downloader and Presets: [ https://www.patreon.com/posts/SwarmUI-Install-Download-Models-Presets-114517862 ]

How to Use SwarmUI Presets & Workflows in ComfyUI + Custom Model Paths Setup for ComfyUI & SwarmUI Tutorial: [ https://youtu.be/EqFilBM3i7s ]

SECourses Discord Channel for 24/7 Support: [ https://discord.com/invite/software-engineering-courses-secourses-772774097734074388 ]

NVIDIA NVFP4 Blog Post: [ https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/ ]
MikeDoes 
posted an update 30 days ago
We can't build more private AI if we can't measure privacy intelligence.

That's why we're highlighting the Priv-IQ benchmark, a new, solution-oriented framework for evaluating LLMs on eight key privacy competencies, from visual privacy to knowledge of privacy law. The direct connection to our work is clear: the researchers relied on samples from the Ai4Privacy dataset to build out questions for Privacy Risk Assessment and Multilingual Entity Recognition.

This is the power of open-source collaboration. We provide the data building blocks, and researchers construct powerful new evaluation tools on top of them. It's a win-win for the entire ecosystem when we can all benefit from transparent, data-driven benchmarks that help push for better, safer AI.

Kudos to Sakib Shahriar and Rozita A. Dara for this important contribution. Read the paper to see the results: https://www.proquest.com/docview/3170854914?pq-origsite=gscholar&fromopenview=true&sourcetype=Scholarly%20Journals
