Open-Science AI Research · India

Voice AI for every human voice

CogniHuman is a registered, non-profit research foundation building open Voice AI and LLMs for underrepresented languages, regional dialects, and communities across India and the world.

Explore our research · Get involved

Apache 2.0 · CC BY-SA 4.0 · Section 8 Not-for-Profit, India

1B+ speakers of underserved Indian languages

7 research objectives across our foundation

100% open — code, weights, and datasets

∞ languages left to include in Voice AI

Active Research

What we're building

Real research contributions — not API wrappers. Every project is open, reproducible, and designed to outlast any single team or grant.

// 01 Active

BashaEval

A systematic evaluation framework for Voice AI on Indian languages. Standardized test sets, open metrics (WER, MOS, latency), and baseline results — all open-sourced. The missing benchmark for Indic speech AI.

Benchmarks · Indic Languages · Evaluation

Learn more

// 02 Active

Voice Corpus Project

Ethically sourced, professionally documented speech datasets for regional languages and dialects. Full consent protocols, CC BY-SA 4.0 licensing, and annotation standards that set the bar for Indic speech data.

Datasets · Dialect · Ethics

Learn more

// 03 In Progress

VoiceLoop-X

On-device, CPU-first voice agent research. Systematic study of latency, quantization, and efficient inference for voice pipelines that run without cloud dependency — bringing Voice AI to resource-constrained environments.

On-Device · Efficiency · Voice Agents

Learn more

Our Mission

AI as a common good

We believe Voice AI should serve everyone — not just those who speak English, not just those with cloud access, not just those whose dialects were included in training data.

We are a Section 8 Not-for-Profit registered in India, with 12A and 80G status. We exist to advance science, not to extract value from it.

About the foundation →

Publish reproducible research

Open methodology, open weights, open code. If you can't reproduce it, it's not science.

Preserve linguistic heritage

India has 22 scheduled languages and hundreds of dialects. We build for all of them.

Bridge the digital divide

Free educational tools, open datasets, and accessible infrastructure for the communities that need them most.

Set ethical standards

Consent-first data collection, explainable frameworks, and benchmarks that protect human dignity.

Research philosophy

"If it doesn't advance open science, we don't build it. Publishing reproducible methodology is not optional — it is the entire point."

01 /

Novel datasets, not fine-tuned demos

We create ethically sourced speech corpora with professional annotation standards. The dataset is the contribution.

02 /

Benchmarks that mean something

We build standardized evaluations for Indic languages — because you can't improve what you can't measure.

03 /

On-device, not cloud-dependent

Voice AI that works on a ₹8,000 Android phone matters more than one that needs a GPU in the cloud.

Latest

Updates & writing

View all →

April 2026

Announcing BashaEval: A Benchmark for Indic Voice AI

We are building the first systematic evaluation framework for Voice AI systems across Indian languages. Here's the methodology and what we've learned so far.

Research

March 2026

On-Device Voice AI: The Case for CPU-First Research

Why we believe the most impactful Voice AI research targets phones and edge devices — not cloud GPUs. The VoiceLoop-X project explained.

Blog

February 2026

CogniHuman Research Foundation: Why We Started

A short note on why a group of researchers registered a Section 8 foundation and what we believe is missing from the open Voice AI ecosystem in India.

Announcement

Open Collaboration

Research is better together

We welcome researchers, linguists, developers, and community organizers. You don't need to be an ML engineer to make a meaningful contribution.

CSR-1 registered · NITI Aayog Darpan · 80G eligible

Data Collection

Help record or transcribe speech for underrepresented dialects in your community.

Model Evaluation

Run benchmarks, document results, and help us improve our evaluation methodology.

Research & Engineering

Implement optimizations, fine-tune models, write papers, and push the state of the art.

Community Outreach

Connect us with communities, institutions, or CSR partners who share our mission.
