A bug in Bun may have been the root cause of the Claude Code source code leak.
submitted by /u/SuccessfulBowl2564
got pulled into a meeting today. apparently we're adding an Agentic AI to the team. it will learn our environment, handle tasks autonomously, and integrate via API. it does not need onboarding, a desk, or health insurance. Great. i have one question nobody in that meeting could answer. how does it a
Simulating what the Qwen3.5 model family would look like using 1-bit technology and TurboQuant. The table below shows the results; this would be a revolution: Model | Parameters | Q4KM File (Current) | KV Cache (256K) (Current) | Hypothetical 1-bit Weights | KV Cache (256K) with TurboQuant | Hypothetical Total Me
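A quick way to sanity-check the weight-file column in a table like that (a sketch: the ~4.5 effective bits/weight for Q4_K_M and the 35B parameter count are illustrative assumptions, not numbers from the post):

```python
def weight_file_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight-file size in GiB for a given bit width."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# Hypothetical 35B-parameter model, for illustration only:
q4km = weight_file_gib(35, 4.5)    # Q4_K_M at ~4.5 effective bits/weight
one_bit = weight_file_gib(35, 1.0) # 1-bit weights
print(f"Q4_K_M ~= {q4km:.1f} GiB, 1-bit ~= {one_bit:.1f} GiB")
```

So a 4.5x reduction in weight memory falls straight out of the bit-width ratio; the KV-cache side depends entirely on what TurboQuant does to activations, which the snippet doesn't specify.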
been overseeing our AI agent deployment and the numbers are alarming. we have 400 developers using AI coding agents (mixture of copilot and cursor). based on our API billing, each developer generates roughly 50,000-80,000 tokens per day in inference requests. at our scale that's about 20-30 million
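The back-of-the-envelope token math checks out, though the upper bound comes out slightly above the quoted 30 million (figures below are from the post):

```python
devs = 400
low, high = 50_000, 80_000       # tokens per developer per day (from the post)
daily_low = devs * low           # lower-bound fleet total
daily_high = devs * high         # upper-bound fleet total
print(f"{daily_low / 1e6:.0f}M to {daily_high / 1e6:.0f}M tokens/day")
```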
Hey guys, I was wondering what recent state-of-the-art small language models are the best for general question-answering task (diverse topics including math)? Any good/bad experience with specific models? Thank you! submitted by /u/No-Mud-1902
New laws to make it easier to cancel subscriptions and get refunds
Happy to announce TQ34S: 2x faster and better quality than TQ31S, same size. Please note: on median PPL, Q3KS has a slight edge. My next model has beaten Q3KS on median but needs more tweaking. submitted by /u/Imaginary-Anywhere23
TL;DR: No LLM provider tells you what a model can do via API. So frameworks build their own registries. LiteLLM maintains a 2600+ entry model cost map, LangChain pulls from a third-party database (models.dev), and smaller projects just hardcode lists. None of this comes from the provider. A single cap
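A minimal sketch of the hardcoded-registry pattern described above (the model names, fields, and prices are invented for illustration; this is not LiteLLM's or LangChain's actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelCaps:
    context_window: int
    supports_tools: bool
    input_cost_per_mtok: float  # USD per million input tokens

# Hand-maintained registry: exactly the kind of list frameworks end up
# hardcoding because providers expose no capability API.
REGISTRY: dict[str, ModelCaps] = {
    "example/model-small": ModelCaps(32_768, True, 0.15),
    "example/model-large": ModelCaps(131_072, True, 2.50),
}

def caps(model: str) -> ModelCaps:
    try:
        return REGISTRY[model]
    except KeyError:
        raise KeyError(f"unknown model {model!r}; registry must be updated by hand")
```

The `except KeyError` branch is the whole problem in miniature: every new model release silently breaks the registry until a human edits the dict.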
I'm curious about trying something I want to test which is supposed to run 100% locally, free, and offline within my PC spec limits. Before I made this post I did a small test and it was very impressive for what it is, and it made me wonder if I can push the limits to something better with more control f
Just been playing around with PrismML's 1-bit 8B LLM and it's legit. Now the question is: can TurboQuant be used with it? Seemingly yes? (If so, then I'm really not seeing any real hurdles to agentic tasks done on-device on today's smartphones.) submitted by /u/rm-rf-rm
A small Spam Detection model specifically fine-tuned to recognize spam content from text in Italian. The following types of content are considered spam: Unsolicited commercial advertisement or non-commercial proselytizing. Fraudulent schemes, including get-rich-quick and pyramid schemes. Phishing at
tl;dr better quantization - smarter models submitted by /u/jacek2023
submitted by /u/clem59480
Any time I catch it messing up, it just lies and tries to hide its mistakes. This is the first model I've caught doing this multiple times. I've had LLMs hallucinate or be just completely wrong, but Qwen will say it did something; I call it out, then it goes and doubles down on its lie: "I did do it like
I've just released APEX (Adaptive Precision for EXpert Models): a novel MoE quantization technique that outperforms Unsloth Dynamic 2.0 on accuracy while being 2x smaller for MoE architectures. Benchmarked on Qwen3.5-35B-A3B, but the method applies to any MoE model. Half the size of Q8. Perplexity c
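The post doesn't detail how APEX assigns precision, but the general idea behind adaptive mixed-precision MoE quantization can be sketched as follows (a toy scheme, not the actual APEX algorithm: keep frequently-routed experts at higher precision and quantize the rest harder):

```python
def allocate_bits(expert_usage: list[float], high_bits: int = 8,
                  low_bits: int = 4, top_fraction: float = 0.25) -> list[int]:
    """Toy adaptive-precision scheme (NOT the real APEX method):
    the most-frequently-routed experts get high_bits, the rest low_bits."""
    n_high = max(1, int(len(expert_usage) * top_fraction))
    ranked = sorted(range(len(expert_usage)),
                    key=lambda i: expert_usage[i], reverse=True)
    keep_high = set(ranked[:n_high])
    return [high_bits if i in keep_high else low_bits
            for i in range(len(expert_usage))]

# 8 experts; routing frequencies are hypothetical calibration statistics
print(allocate_bits([0.30, 0.05, 0.20, 0.02, 0.15, 0.08, 0.10, 0.10]))
```

With most experts at 4-bit and only the hot ones at 8-bit, average bits/weight land well under Q8, which is consistent with the "half the size of Q8" claim in spirit.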
Hey, r/LocalLLaMA! I am back with a new model - no transformer but a GAN! It is called CatGen v2 and it generates 128x128px images of cats. You can find the full source code, samples and the final model here: Look at this sample after epoch 165 (trained on a single Kaggle T4 GPU): Feedback is very welcome
80% of the benefit of TQ with almost no downsides. Q8 is now ≈ F16 submitted by /u/Dany0
submitted by /u/zdy132
Tl;dr: One of Stanford's hottest AI seminar courses. We open the course to the public. Lectures start tomorrow (Thursdays), 4:30-5:50pm PDT, at Skilling Auditorium and Zoom. Talks will be recorded. Course website: Interested in Transformers, the deep learning model that has taken the world by storm?
Just noticed this one today. Not sure how they got away distilling from an Anthropic model. submitted by /u/VegetableSun9225
arcee-ai/Trinity-Large-Thinking · Hugging Face submitted by /u/TKGaming11
So I recently bought a Mac (M2 Max) with local LLM use in mind, and I did my research, and everywhere everyone was saying go for the larger RAM option or I would regret it later... So I did. Time to choose a model: "Okay, nice model, Qwen3.5 35B A3B running 8-bit quant, speedy even with full context
I’ve got SmolLM2‑360M running on a Samsung Galaxy Watch 4 Classic (about 380MB free RAM) by tweaking llama.cpp and the underlying ggml memory model. By default, the model was being loaded twice in RAM: once via the APK’s mmap page cache and again via ggml’s tensor allocations, peaking at 524MB for a
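The fix described above boils down to mapping the weight file once and treating tensors as zero-copy views into that mapping. A minimal Python illustration of the pattern (the watch port itself patches llama.cpp/ggml in C, so this is only an analogy, with a stand-in file rather than a real GGUF):

```python
import mmap, os, tempfile

# Create a stand-in "weight file" (in real use this would be the GGUF model).
fd, path = tempfile.mkstemp()
os.write(fd, bytes(range(256)) * 16)  # 4 KiB of dummy weight data
os.close(fd)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # memoryview slices are zero-copy views into the mapping: the kernel's
    # page cache holds the only copy, instead of duplicating it on the heap.
    tensor_a = memoryview(mm)[0:1024]
    a_len, a_first = len(tensor_a), tensor_a[0]
    tensor_a.release()  # views must be released before closing the mapping
    mm.close()
os.remove(path)
print(a_len, a_first)
```

Reading the same bytes with `f.read()` into ggml-owned buffers would leave one copy in the page cache and one on the heap, which is exactly the double-loading the post describes.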
I bought an RTX 5060 Ti 16GB around Christmas and had one goal: get a strong model running locally on my card without paying API fees. I have been testing local AI with open claw. I did not come into this with a quantization background. I only learned about llama, LM Studio and Ollama two months ago.
submitted by /u/PraxisOG
Gemma Gemma Gemma Gemma submitted by /u/jacek2023
Gemma 4 drops most likely tomorrow! What will it take to make it a good release for you? submitted by /u/SpecterOrigin
Hey everyone, Tim from AnythingLLM here. Yesterday I saw the PrismML Bonsai post, so I had to give it a real shot, because 14x smaller models (in size and memory) would actually be a huge game changer for local models, which is basically all I do. I personally only ran the Bonsai 8B model for my tests,
Flood of useless vibe coded projects is getting out of hand... submitted by /u/kingofjupyter
Blog post: From Chujie Zheng on 𝕏: submitted by /u/Nunki08