Yandex releases world’s largest event dataset for advancing recommender systems – World News Network

worldnewsnetwork
Last updated: June 2, 2025 12:00 am

VMPL
New Delhi [India], June 2: Yandex has published Yambda (Yandex Music Billion-Interactions Dataset), the world’s largest currently available open dataset for recommender systems, containing nearly 5 billion anonymized user interactions with audio tracks from its music streaming platform, Yandex Music.
Yambda is intended as a general benchmark for testing new approaches and algorithms in any domain that relies on recommender systems, such as e-commerce, social networks, and short-form video platforms.
Researchers can use the dataset to develop new recommender algorithms and test them against its baseline models, which speeds up innovation. Startups with little data of their own can build and test systems on Yambda before scaling, accelerating the development of advanced technologies tailored to business needs worldwide.

Bridging the research-industry gap
The quality and scale of training data are critical to delivering relevant recommendations on platforms like streaming services, social networks, short-form video apps, and e-commerce marketplaces. However, research in recommender systems has lagged behind rapidly advancing fields like large language models, largely due to limited access to large-scale datasets. Effective recommendation models require terabytes of behavioral data, which commercial platforms possess but rarely share publicly.
Researchers are often left with small, outdated datasets that fail to capture the complexity of modern usage:
* Spotify’s Million Playlist Dataset is too small for commercial-scale recommender systems.
* The Netflix Prize dataset, with ~17,000 items and date-only timestamps, limits temporal modeling and large-scale research.
* The Criteo 1TB Click Logs dataset lacks proper documentation and identifiers, and it focuses narrowly on ad clicks.
“Recommender systems are inherently tied to sensitive data. Companies can only publish recommender system datasets publicly after exhaustive anonymization, a resource-intensive process that’s slowed open innovation,” explains Nikolai Savushkin, Head of Recommender Systems at Yandex.
This data scarcity creates a gap: models that excel in academic settings often underperform in real-world applications. Efforts to integrate recommender systems with advanced architectures are also constrained by the lack of suitable training data.
About the Yambda dataset
Yambda addresses these challenges with a massive, anonymized dataset drawn from Yandex Music, a streaming service with ~28 million monthly users. The data shows how users interact with the content Yandex Music offers; the service is known for its recommendation system, My Wave, which tailors the listening experience to each user’s tastes. To protect privacy, all user and track data is anonymized, with users and tracks represented only by numeric identifiers.
Key features of the dataset:
* 4.79 billion anonymized user interactions collected over 10 months.
* Data from 1 million users and anonymized descriptors for 9.39 million tracks.
* Includes two feedback types: implicit interactions (listens) and explicit interactions (likes, dislikes, and their removal).
* Offers audio embeddings (vector representations generated via convolutional neural networks) and anonymized information about tracks.
* Features an “is_organic” flag marking whether users discovered tracks independently or through recommendations, enabling deeper behavioral analysis.
* All events are timestamped, which supports the analysis of user behavior over time and allows models to be evaluated under conditions that closely resemble real-world use.
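The feature list above implies a simple per-event record. The minimal sketch below models one in Python, assuming hypothetical field names (`uid`, `item_id`, `timestamp`, `event_type`, `is_organic`) and hypothetical event labels; Yambda's actual column names and encodings may differ, so check the dataset card before relying on them. It separates implicit from explicit feedback, computes the share of organic listens, and extracts one user's time-ordered listening history.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout for illustration; Yambda's real schema may differ.
@dataclass(frozen=True)
class Event:
    uid: int          # anonymized numeric user ID
    item_id: int      # anonymized numeric track ID
    timestamp: int    # event time (unit assumed)
    event_type: str   # hypothetical labels, see IMPLICIT and EXPLICIT below
    is_organic: bool  # True if the user found the track without a recommendation


IMPLICIT = {"listen"}
EXPLICIT = {"like", "dislike", "unlike", "undislike"}  # "un-" labels = removal of a like/dislike


def feedback_kind(event: Event) -> str:
    """Classify an event as implicit or explicit feedback."""
    if event.event_type in IMPLICIT:
        return "implicit"
    if event.event_type in EXPLICIT:
        return "explicit"
    raise ValueError(f"unknown event type: {event.event_type!r}")


def organic_share(events) -> float:
    """Fraction of listens that users found on their own (is_organic=True)."""
    listens = [e for e in events if e.event_type == "listen"]
    if not listens:
        return 0.0
    return sum(e.is_organic for e in listens) / len(listens)


def listening_history(events, uid):
    """Track IDs one user listened to, ordered by timestamp."""
    listens = [e for e in events if e.uid == uid and e.event_type == "listen"]
    return [e.item_id for e in sorted(listens, key=lambda e: e.timestamp)]


events = [
    Event(uid=1, item_id=10, timestamp=100, event_type="listen", is_organic=True),
    Event(uid=1, item_id=11, timestamp=50, event_type="listen", is_organic=False),
    Event(uid=1, item_id=10, timestamp=120, event_type="like", is_organic=False),
    Event(uid=2, item_id=12, timestamp=80, event_type="listen", is_organic=False),
]

print(Counter(feedback_kind(e) for e in events))  # Counter({'implicit': 3, 'explicit': 1})
print(organic_share(events))                      # 0.3333333333333333
print(listening_history(events, uid=1))           # [11, 10]
```

Because every event carries a timestamp, a helper like `listening_history` can produce the ordered item sequence that sequential models consume.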
The dataset is released in Apache Parquet format, compatible with distributed processing systems such as Spark or Hadoop and analytical libraries like Pandas and Polars.
“Yambda empowers researchers to test innovative hypotheses and businesses to build smarter recommender systems. Ultimately, users benefit — finding the perfect song, product, or service effortlessly,” notes Nikolai Savushkin.
Dataset versions and evaluation
The Yambda dataset is available in three sizes (approximately 5 billion, 500 million, and 50 million events), so researchers and developers can pick the version that fits their needs and computing resources.

The dataset uses Global Temporal Split (GTS) for evaluation, a method that splits data by timestamp and preserves the order of events. Unlike Leave-One-Out, which holds out the last positive interaction in each user’s history for testing, GTS does not break temporal dependencies between the training and test sets: every test event occurs after every training event, whereas Leave-One-Out can keep one user’s recent activity in training while testing on another user’s older interaction. This makes evaluation more realistic by mimicking real-world conditions, where future data is unavailable.
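To make the difference concrete, here is a minimal sketch of both splitting strategies on toy `(uid, item_id, timestamp)` tuples. The tuple layout, the single `cutoff` timestamp, and the `leaks_future` helper are illustrative assumptions, not Yambda's evaluation code; the point is only to show that Leave-One-Out can train on events that happen after some test events, while a single global cutoff cannot.

```python
from collections import defaultdict


def global_temporal_split(events, cutoff):
    """GTS: one cutoff for all users; train strictly before it, test at or after it."""
    train = [e for e in events if e[2] < cutoff]
    test = [e for e in events if e[2] >= cutoff]
    return train, test


def leave_one_out_split(events):
    """Leave-One-Out: hold out each user's latest interaction, regardless of when it happened."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e[0]].append(e)
    train, test = [], []
    for user_events in by_user.values():
        user_events.sort(key=lambda e: e[2])  # order this user's events by timestamp
        train.extend(user_events[:-1])
        test.append(user_events[-1])
    return train, test


def leaks_future(train, test):
    """True if some training event is later than some test event."""
    if not train or not test:
        return False
    return max(e[2] for e in train) > min(e[2] for e in test)


# (uid, item_id, timestamp): user 1 stays active late, user 2 stops early.
events = [(1, "a", 1), (1, "b", 7), (1, "c", 9), (2, "d", 3), (2, "e", 4)]

gts_train, gts_test = global_temporal_split(events, cutoff=5)
loo_train, loo_test = leave_one_out_split(events)

print(leaks_future(gts_train, gts_test))  # False
print(leaks_future(loo_train, loo_test))  # True: trains on t=7, tests on t=4
```

One consequence of a global cutoff is that some users appear only before or only after it, which mirrors production, where new users arrive and old ones go quiet.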
Baseline implementations include MostPop, DecayPop, ItemKNN, iALS, BPR, SANSA, and SASRec, providing benchmarks for comparing new recommender system approaches. These baselines are evaluated using standard metrics, including:
* NDCG@k (ranking quality)
* Recall@k (retrieval effectiveness)
* Coverage@k (catalog diversity)
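The simplest baselines and the listed metrics are short enough to sketch directly. The code below is a minimal illustration, not the Yambda reference implementation: `most_pop` ranks items by raw interaction count, `decay_pop` is an assumed DecayPop variant that discounts each interaction exponentially using a hypothetical `half_life` parameter, and the metrics use binary relevance. Recall@k is normalized here by the number of relevant items; some benchmarks divide by the smaller of k and that number instead, so check the dataset's own definitions before comparing results.

```python
import math
from collections import Counter

# Training events are hypothetical (uid, item_id, timestamp) tuples.


def most_pop(train, k):
    """MostPop: the k items with the most training interactions, shown to every user."""
    counts = Counter(item for _, item, _ in train)
    return [item for item, _ in counts.most_common(k)]


def decay_pop(train, k, now, half_life):
    """Assumed DecayPop form: each interaction's weight halves every `half_life` time units."""
    scores = Counter()
    for _, item, ts in train:
        scores[item] += 0.5 ** ((now - ts) / half_life)
    return [item for item, _ in scores.most_common(k)]


def recall_at_k(recs, relevant, k):
    """Share of a user's relevant (test) items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(recs[:k]) & relevant) / len(relevant)


def ndcg_at_k(recs, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the best achievable DCG."""
    dcg = sum(1 / math.log2(i + 2) for i, item in enumerate(recs[:k]) if item in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0


def coverage_at_k(all_recs, catalog_size, k):
    """Fraction of the catalog that appears in at least one user's top-k list."""
    shown = set()
    for recs in all_recs:
        shown.update(recs[:k])
    return len(shown) / catalog_size


train = [(1, "a", 1), (2, "a", 2), (3, "a", 3), (1, "b", 9), (2, "b", 10), (3, "c", 5)]

print(most_pop(train, k=2))                        # ['a', 'b']
print(decay_pop(train, k=2, now=10, half_life=1))  # ['b', 'c']

recs = ["b", "c"]
print(recall_at_k(recs, {"c", "d"}, k=2))          # 0.5
print(round(ndcg_at_k(recs, {"c", "d"}, k=2), 3))  # 0.387
print(coverage_at_k([["a", "b"], ["b", "c"]], catalog_size=4, k=2))  # 0.75
```

In a full evaluation, per-user Recall@k and NDCG@k would typically be averaged over test users, while Coverage@k is computed once over all users' recommendation lists.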
“When industry leaders share hard-won tools and data, a rising tide lifts all boats: researchers gain real-world benchmarks, startups access resources once reserved for tech giants, and users everywhere enjoy greater personalization,” added Nikolai Savushkin.
Yambda, the world’s largest open recommender system dataset, is now available on Hugging Face.
About Yandex
Yandex is a global technology company that builds intelligent products and services powered by machine learning. The company’s goal is to help consumers and businesses better navigate the online and offline world. Since 1997, Yandex has been delivering world-class, locally relevant search and information services and has also developed market-leading on-demand transportation services, navigation products, and other mobile applications for millions of consumers across the globe.
About My Wave
My Wave, a personalized recommendation system integrated into the multi-million-user music streaming service, Yandex Music, employs deep neural models and AI algorithms to analyze over a thousand factors — including user interactions, customizable mood/language settings, and real-time music analysis of spectrograms, frequency ranges, rhythm, vocal tone, and genre. By processing listening history and track sequences, it dynamically adapts to user preferences, identifies audio similarities, and predicts musical tastes to deliver tailored suggestions.
(ADVERTORIAL DISCLAIMER: The above press release has been provided by VMPL. ANI will not be responsible in any way for the content of the same)


Disclaimer: This story is auto-generated from a syndicated feed of ANI; only the image & headline may have been reworked by News Services Division of World News Network Inc Ltd and Palghar News and Pune News and World News