Multimodal + Reasoning

Analyze the systemic limitations and safety guardrails detailed in the model card. Build an open-source evaluation suite to benchmark similar open-weights models against these stated capabilities.

Signals (167)

oobabooga/textgen

Distributionally Robust Token Optimization in RLHF

Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling

CyberAgent moves faster with ChatGPT Enterprise and Codex

AIBuildAI: An AI Agent for Automatically Building AI Models

Designing synthetic datasets for the real world: Mechanism design and reasoning from first principles

Knowledge Is Not Static: Order-Aware Hypergraph RAG for Language Models

SepSeq: A Training-Free Framework for Long Numerical Sequence Processing in LLMs

Gemini 3.1 Flash TTS: the next generation of expressive AI speech

Spatial Atlas: Compute-Grounded Reasoning for Spatial-Aware Research Agent Benchmarks

Bi-Predictability: A Real-Time Signal for Monitoring LLM Interaction Integrity

AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent

ChatGPT for research

Mistake gating leads to energy and memory efficient continual learning

Bringing AI Closer to the Edge and On-Device with Gemma 4

Using custom GPTs

Cross-Lingual Transfer and Parameter-Efficient Adaptation in the Turkic Language Family: A Theoretical Framework for Low-Resource Language Models

The AI Great Leap Forward

unslothai/unsloth

ADAG: Automatically Describing Attribution Graphs

TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models

FLeX: Fourier-based Low-rank EXpansion for multilingual transfer

Contextual Earnings-22: A Speech Recognition Benchmark with Custom Vocabulary in the Wild

Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs

code-yeongyu/oh-my-openagent

Geometric Routing Enables Causal Expert Control in Mixture of Experts

The Gemini app is now on Mac

Seeing Through Experts' Eyes: A Foundational Vision Language Model Trained on Radiologists' Gaze and Reasoning

Counterfactual Peptide Editing for Causal TCR–pMHC Binding Inference

Sparse Goodness: How Selective Measurement Transforms Forward-Forward Learning

The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

Hallucination as output-boundary misclassification: a composite abstention architecture for language models

Training mRNA Language Models Across 25 Species for $165

GPT‑Rosalind for life sciences research

diegosouzapw/OmniRoute

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters

Weakly Supervised Distillation of Hallucination Signals into Transformer Representations

Think Twice Before You Write – an Entropy-based Decoding Strategy to Enhance LLM Reasoning

Claude Opus 4.7

LLM Reasoning with Process Rewards for Outcome-Guided Steps

Reward Design for Physical Reasoning in Vision-Language Models

The local LLM ecosystem doesn’t need Ollama

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Getting started with ChatGPT

Cross-Tokenizer LLM Distillation through a Byte-Level Interface

oumi-ai/oumi

Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization

Brainstorming with ChatGPT

The Illusion of Latent Generalization: Bi-directionality and the Reversal Curse

Lordog/dive-into-llms

Generalization Guarantees on Data-Driven Tuning of Gradient Descent with Langevin Updates

Amazon Bedrock adds 18 fully managed open weight models, including the new Mistral Large 3 and Ministral 3 models

Caption First, VQA Second: Knowledge Density, Not Task Format, Drives Multimodal Scaling

New serverless customization in Amazon SageMaker AI accelerates model fine-tuning

BasedHardware/omi

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

run-llama/liteparse

Reallocating $100/Month Claude Code Spend to Zed and OpenRouter

Help Without Being Asked: A Deployed Proactive Agent System for On-Call Support with Continuous Self-Improvement

LLM-HYPER: Generative CTR Modeling for Cold-Start Ad Personalization via LLM-Based Hypernetworks

Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs

Malliavin Calculus for Counterfactual Gradient Estimation in Adaptive Inverse Reinforcement Learning

LLM-Augmented Knowledge Base Construction For Root Cause Analysis

How to Build Vision AI Pipelines Using NVIDIA DeepStream Coding Agents

GRASS: Gradient-based Adaptive Layer-wise Importance Sampling for Memory-efficient Large Language Model Fine-tuning

Show HN: Gemma 4 Multimodal Fine-Tuner for Apple Silicon

Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space

Cut Checkpoint Costs with About 30 Lines of Python and NVIDIA nvCOMP

Ranked Activation Shift for Post-Hoc Out-of-Distribution Detection

Can Large Language Models Detect Methodological Flaws? Evidence from Gesture Recognition for UAV-Based Rescue Operation Based on Deep Learning

Small models also found the vulnerabilities that Mythos found

anthropics/claude-cookbooks

Robust Reasoning Benchmark

Riemann-Bench: A Benchmark for Moonshot Mathematics

Sensitivity-Positional Co-Localization in GQA Transformers

Enhancing Confidence Estimation in Telco LLMs via Twin-Pass CoT-Ensembling

z-lab/dflash

Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers