BashaEval
A systematic evaluation framework for Voice AI on Indian languages. Standardized test sets, open metrics (WER, MOS, latency), and baseline results — all open-sourced. The missing benchmark for Indic speech AI.
CogniHuman is a registered, non-profit research foundation building open Voice AI and LLMs for underrepresented languages, regional dialects, and communities across India and the world.
Apache 2.0 · CC BY-SA 4.0 · Section 8 Not-for-Profit, India
Active Research
Real research contributions — not API wrappers. Every project is open, reproducible, and designed to outlast any single team or grant.
Ethically sourced, professionally documented speech datasets for regional languages and dialects. Full consent protocols, CC BY-SA 4.0 licensing, and annotation standards that set the bar for Indic speech data.
On-device, CPU-first voice agent research. Systematic study of latency, quantization, and efficient inference for voice pipelines that run without cloud dependency, bringing Voice AI to resource-constrained environments.
Our Mission
We believe Voice AI should serve everyone — not just those who speak English, not just those with cloud access, not just those whose dialects were included in training data.
We are a Section 8 Not-for-Profit registered in India, with 12A and 80G status. We exist to advance science, not to extract value from it.
Publish reproducible research
Open methodology, open weights, open code. If you can't reproduce it, it's not science.
Preserve linguistic heritage
India has 22 scheduled languages and hundreds of dialects. We build for all of them.
Bridge the digital divide
Free educational tools, open datasets, and accessible infrastructure for communities that need it most.
Set ethical standards
Consent-first data collection, explainable frameworks, and benchmarks that protect human dignity.
Research philosophy
"If it doesn't advance open science, we don't build it. Publishing reproducible methodology is not optional — it is the entire point."
We create ethically sourced speech corpora with professional annotation standards. The dataset is the contribution.
We build standardized evaluations for Indic languages — because you can't improve what you can't measure.
Voice AI that works on a ₹8,000 Android phone matters more than one that needs a GPU in the cloud.
Latest
Research: We are building the first systematic evaluation framework for Voice AI systems across Indian languages. Here's the methodology and what we've learned so far.
Blog: Why we believe the most impactful Voice AI research targets phones and edge devices, not cloud GPUs. The VoiceLoop-X project explained.
Announcement: A short note on why a group of researchers registered a Section 8 foundation and what we believe is missing from the open Voice AI ecosystem in India.
Open Collaboration
We welcome researchers, linguists, developers, and community organizers. You don't need to be an ML engineer to make a meaningful contribution.
CSR-1 registered · NITI Aayog Darpan · 80G eligible
Data Collection
Help record or transcribe speech for underrepresented dialects in your community.
Model Evaluation
Run benchmarks, document results, and help us improve our evaluation methodology.
Research & Engineering
Implement optimizations, fine-tune models, write papers, and push the state of the art.
Community Outreach
Connect us with communities, institutions, or CSR partners who share our mission.
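For contributors curious about the Model Evaluation track, here is a minimal sketch of word error rate (WER), one of the metrics named above. It is an illustration only and not the foundation's actual implementation: the function name `word_error_rate` is hypothetical, and splitting on whitespace is a simplifying assumption that a real Indic benchmark would likely replace with language-aware text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace. No Unicode or
    script-specific normalization is applied in this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("मेरा नाम राम है", "मेरा नाम श्याम है"))  # 0.25
```

Because WER divides by the reference length, a hypothesis with many inserted words can score above 1.0, which is worth remembering when documenting benchmark results.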