- AI Kosha Initiative: A government-backed platform for non-personal datasets aimed at fostering AI model and tool development.
- Initial Dataset Count: Launched with 316 datasets, mainly supporting Indian language translation tools.
- IndiaAI Mission Alignment: AI Kosha is part of the ₹10,370 crore IndiaAI Mission, focusing on AI advancement.
Relevance: GS 3 (Science, Technology)
Compute Capacity & Infrastructure
- GPU Access Expansion:
- 14,000 GPUs commissioned for shared access, an increase from 10,000 earlier this year.
- More GPUs to be added quarterly to support AI model training.
Government’s AI Development Strategy
- Homegrown AI Model:
- Government accelerating efforts to develop an indigenous foundational AI model.
- Inspired by China’s DeepSeek, which reportedly built competitive models at lower cost than U.S. firms (OpenAI, Google).
- High interest from startups in leveraging India-specific AI solutions.
Dataset Categories in AI Kosha
- Translation & Linguistic Tools: Majority of datasets aimed at improving Indian language AI models.
- Other Data Sources:
- Telangana Open Data Initiative (health-related data).
- 2011 Census Data.
- Satellite Imagery from Indian satellites.
- Meteorological and Pollution Data.
Past Government Data Initiatives
- Open Governance Data Platform:
- 12,000+ datasets hosted by data.gov.in from multiple government agencies.
- Ministries and departments have designated Chief Data Officers to facilitate dataset contributions.
- 2018 Non-Personal Data Committee:
- Explored making private sector data (e.g., ride-sharing traffic data) accessible for startups & policy use.
- Faced pushback from tech industry over data-sharing concerns.
- Debate on non-personal data preceded the boom in Large Language Models (LLMs) such as ChatGPT.
Significance & Challenges
- Significance:
- Encourages AI innovation using publicly available data.
- Supports startups, academia, and government in developing AI tools.
- Strengthens AI ecosystem with better compute power and data access.
- Challenges:
- Private sector resistance to data sharing remains unresolved.
- Data quality and availability across diverse domains need continuous enhancement.
- Evaluation frameworks for foundational AI models still evolving.