hypedar — AI trend radar for developers

arXiv · 15m ago

4.8

Redirected, Not Removed: Task-Dependent Stereotyping Reveals the Limits of LLM Alignments

arXiv:2604.02669v1 Announce Type: new Abstract: How biased is a language model? The answer depends on how you ask. A model that refuses to choose between castes for a leadership role will, in a fill-in-the-blank task, reliably associate upper castes with purity and lower castes with lack of hygiene.

low

alignment, ethics, evaluation, benchmarking, llm

arXiv · 15m ago

5.5

Too Polite to Disagree: Understanding Sycophancy Propagation in Multi-Agent Systems

arXiv:2604.02668v1 Announce Type: new Abstract: Large language models (LLMs) often exhibit sycophancy: agreement with user stance even when it conflicts with the model's opinion. While prior work has mostly studied this in single-agent settings, it remains underexplored in collaborative multi-agent

low

agents, reasoning, alignment, multi-agent, llm

arXiv · 15m ago

5.0

SocioEval: A Template-Based Framework for Evaluating Socioeconomic Status Bias in Foundation Models

arXiv:2604.02660v1 Announce Type: new Abstract: As Large Language Models (LLMs) increasingly power decision-making systems across critical domains, understanding and mitigating their biases becomes essential for responsible AI deployment. Although bias assessment frameworks have proliferated for att

low

agents, evaluation, ethics

arXiv · 15m ago

4.6

Revealing the Learning Dynamics of Long-Context Continual Pre-training

arXiv:2604.02650v1 Announce Type: new Abstract: Existing studies on Long-Context Continual Pre-training (LCCP) mainly focus on small-scale models and limited data regimes (tens of billions of tokens). We argue that directly migrating these small-scale settings to industrial-grade models risks insuff

low

training, inference

arXiv · 15m ago

3.9

Speaking of Language: Reflections on Metalanguage Research in NLP

arXiv:2604.02645v1 Announce Type: new Abstract: This work aims to shine a spotlight on the topic of metalanguage. We first define metalanguage, link it to NLP and LLMs, and then discuss our two labs' metalanguage-centered efforts. Finally, we discuss four dimensions of metalanguage and metalinguisti

low

discussion, nlp

arXiv · 15m ago

4.6

Overcoming the "Impracticality" of RAG: Proposing a Real-World Benchmark and Multi-Dimensional Diagnostic Framework

arXiv:2604.02640v1 Announce Type: new Abstract: Performance evaluation of Retrieval-Augmented Generation (RAG) systems within enterprise environments is governed by multi-dimensional and composite factors extending far beyond simple final accuracy checks. These factors include reasoning complexity,

low

rag, evaluation

arXiv · 15m ago

5.0

Train Yourself as an LLM: Exploring Effects of AI Literacy on Persuasion via Role-playing LLM Training

arXiv:2604.02637v1 Announce Type: new Abstract: As large language models (LLMs) become increasingly persuasive, there is concern that people's opinions and decisions may be influenced across various contexts at scale. Prior mitigation (e.g., AI detectors and disclaimers) largely treats people as pas

low

training, education, agents

arXiv · 15m ago

3.1

Reinforcement Learning-based Knowledge Distillation with LLM-as-a-Judge

arXiv:2604.02621v1 Announce Type: new Abstract: Reinforcement Learning (RL) has been shown to substantially improve the reasoning capability of small and large language models (LLMs), but existing approaches typically rely on verifiable rewards, hence ground truth labels. We propose an RL framework

low

rl, training, distillation

arXiv · 15m ago

4.3

An Empirical Study of Many-Shot In-Context Learning for Machine Translation of Low-Resource Languages

arXiv:2604.02596v1 Announce Type: new Abstract: In-context learning (ICL) allows large language models (LLMs) to adapt to new tasks from a few examples, making it promising for languages underrepresented in pre-training. Recent work on many-shot ICL suggests that modern LLMs can further benefit from

low

rag, inference, low-resource

arXiv · 15m ago

5.1

Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models

arXiv:2604.02560v1 Announce Type: new Abstract: Discrete diffusion language models (dLLMs) accelerate text generation by unmasking multiple tokens in parallel. However, parallel decoding introduces a distributional mismatch: it approximates the joint conditional using a fully factorized product of p

low

inference, diffusion, optimization

arXiv · 15m ago

4.6

Pragmatics Meets Culture: Culturally-adapted Artwork Description Generation and Evaluation

arXiv:2604.02557v1 Announce Type: new Abstract: Language models are known to exhibit various forms of cultural bias in decision-making tasks, yet much less is known about their degree of cultural familiarity in open-ended text generation tasks. In this paper, we introduce the task of culturally-adap

low

multimodal, fine-tuning, culture

arXiv · 15m ago

4.8

Principled and Scalable Diversity-Aware Retrieval via Cardinality-Constrained Binary Quadratic Programming

arXiv:2604.02554v1 Announce Type: new Abstract: Diversity-aware retrieval is essential for Retrieval-Augmented Generation (RAG), yet existing methods lack theoretical guarantees and face scalability issues as the number of retrieved passages $k$ increases. We propose a principled formulation of dive

low

rag, retrieval, optimization

arXiv · 15m ago

5.5

PolyJarvis: LLM Agent for Autonomous Polymer MD Simulations

arXiv:2604.02537v1 Announce Type: new Abstract: All-atom molecular dynamics (MD) simulations can predict polymer properties from molecular structure, yet their execution requires specialized expertise in force field selection, system construction, equilibration, and property extraction. We present P

low

mcp, agents

arXiv · 15m ago

5.0

Social Meaning in Large Language Models: Structure, Magnitude, and Pragmatic Prompting

arXiv:2604.02512v1 Announce Type: new Abstract: Large language models (LLMs) increasingly exhibit human-like patterns of pragmatic and social reasoning. This paper addresses two related questions: do LLMs approximate human social meaning not only qualitatively but also quantitatively, and can prompt

low

reasoning

arXiv · 15m ago

4.8

Failing to Falsify: Evaluating and Mitigating Confirmation Bias in Language Models

arXiv:2604.02485v1 Announce Type: new Abstract: Confirmation bias, the tendency to seek evidence that supports rather than challenges one's belief, hinders one's reasoning ability. We examine whether large language models (LLMs) exhibit confirmation bias by adapting the rule-discovery study from hum

low

reasoning

arXiv · 15m ago

3.3

Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets

arXiv:2604.02460v1 Announce Type: new Abstract: Recent work reports strong performance from multi-agent LLM systems (MAS), but these gains are often confounded by increased test-time computation. When computation is normalized, single-agent systems (SAS) can match or outperform MAS, yet the theoreti

low

reasoning, inference

arXiv · 15m ago

4.3

Skeleton-based Coherence Modeling in Narratives

arXiv:2604.02451v1 Announce Type: new Abstract: Modeling coherence in text is a task that has excited NLP researchers for a long time. It has applications in detecting incoherent structures and helping the author fix them. There has been recent work in using neural networks to extract a skel

low

reasoning

arXiv · 15m ago

5.0

SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy

arXiv:2604.02423v1 Announce Type: new Abstract: Large language models exhibit sycophancy: the tendency to shift outputs toward user-expressed stances, regardless of correctness or consistency. While prior work has studied this issue and its impacts, rigorous computational linguistic metrics are need

low

reasoning

arXiv · 15m ago

5.1

CIPHER: Conformer-based Inference of Phonemes from High-density EEG

arXiv:2604.02362v1 Announce Type: new Abstract: Decoding speech information from scalp EEG remains difficult due to low SNR and spatial blurring. We present CIPHER (Conformer-based Inference of Phonemes from High-density EEG Representations), a dual-pathway model using (i) ERP features and (ii) broa

low

multimodal, inference

arXiv · 15m ago

4.8

Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis

arXiv:2604.02359v1 Announce Type: new Abstract: General-purpose Large Language Models (LLMs) are becoming widely adopted by people for mental health support. Yet emerging evidence suggests there are significant risks associated with high-frequency use, particularly for individuals suffering from psy

low

rag, reasoning
