In machine learning, test error typically decreases as the amount of training data used to build the model grows. These relationships are known as neural scaling laws, and they usually take the form of a power law: the test error falls off as a power of the training set size. Because of this, millions of dollars are invested in collecting ever more data. The problem with power-law scaling is that enormous amounts of additional data are required to improve performance by just a few percentage points, which is unsustainable. In this paper, the researchers develop a metric for pruning the dataset that can turn this power-law scaling into an exponential decay.
Using tools from statistical mechanics, the researchers show that test error can decay exponentially with dataset size when the dataset is appropriately pruned. Existing pruning techniques are either compute-intensive or perform poorly; the new approach instead uses a self-supervised model to estimate a pruning metric at much lower computational cost. A paper published by OpenAI in 2020 showed that model performance follows a power law with respect to the number of parameters, the size of the training dataset, and the compute used for training. Under such a law, a substantial amount of additional training data is needed to raise accuracy by even a few percentage points, whereas under an exponential decay relationship, far less additional data achieves a similar improvement.
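A quick back-of-the-envelope calculation illustrates the difference. Suppose, hypothetically, that test error follows a power law E(n) = n^(-ν) with exponent ν = 0.1 (a typical order of magnitude in scaling-law studies), versus an exponential decay E(n) = exp(-n/c). The sketch below computes how much more training data each regime needs to halve the test error; the exponent ν and constant c are illustrative assumptions, not values from the paper.

```python
import math

# Hypothetical power law: E(n) = n ** (-nu). To halve the error,
# n must grow by a multiplicative factor of 2 ** (1 / nu).
nu = 0.1  # illustrative scaling exponent (assumed, not from the paper)
power_law_factor = 2 ** (1 / nu)
print(f"Power law (nu={nu}): need {power_law_factor:.0f}x more data to halve the error")

# Hypothetical exponential decay: E(n) = exp(-n / c). Halving the error
# only requires an *additive* increase of c * ln(2) examples.
c = 1e6  # illustrative decay constant (assumed)
extra_examples = c * math.log(2)
print(f"Exponential (c={c:.0e}): need only {extra_examples:.0f} extra examples to halve the error")
```

With these assumed numbers, the power law demands roughly 1,024 times more data, while the exponential regime needs only a fixed additional chunk of examples, which is why pruning toward exponential scaling matters economically.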
The team of researchers at Meta started by developing a theoretical model of how data pruning improves performance, using statistical mechanics. The first step is to determine each example's margin, defined as the distance of the data point from the decision boundary. The margin indicates whether a training example is easy (large margin) or hard (small margin). Then comes the pruning of the dataset. The researchers determined that for small datasets it was best to keep the easy examples, but for larger datasets keeping the harder examples was better. They also found that as the initial dataset size increases, the fraction of data that must be pruned away to achieve exponential scaling also increases.
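As a rough illustration of margin-based pruning, the sketch below scores examples by the absolute value of a linear classifier's decision function (a proxy proportional to distance from the decision boundary) and keeps either the easiest or the hardest fraction. The toy data, the classifier, and the keep fractions are hypothetical stand-ins, not the paper's actual teacher-perceptron setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy two-class data: two Gaussian blobs (a stand-in for a real training set).
X = rng.normal(size=(1000, 2)) + np.repeat([[2.0, 0.0], [-2.0, 0.0]], 500, axis=0)
y = np.repeat([0, 1], 500)

# Fit a linear classifier; |decision_function| is proportional to the
# distance from the boundary: large = easy example, small = hard example.
clf = LogisticRegression().fit(X, y)
margin = np.abs(clf.decision_function(X))

def prune(margin, keep_fraction, keep_hard):
    """Return indices of examples to keep after margin-based pruning."""
    order = np.argsort(margin)            # ascending: hardest first
    n_keep = int(keep_fraction * len(margin))
    return order[:n_keep] if keep_hard else order[-n_keep:]

# Small-data regime: keep the easiest 80% of examples (illustrative fraction).
idx_small = prune(margin, keep_fraction=0.8, keep_hard=False)
# Large-data regime: keep the hardest 60% of examples (illustrative fraction).
idx_large = prune(margin, keep_fraction=0.6, keep_hard=True)
```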
Although large foundation models are trained on unlabeled datasets, the best existing metrics for dataset pruning require large amounts of compute and labeled data, making them infeasible at foundation-model scale. The Meta researchers created a self-supervised pruning metric to address this problem. The team employed k-means clustering in the embedding space of a pretrained model to calculate the measure: each data point's distance from its nearest cluster centroid serves as the pruning metric.
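The sketch below mirrors that idea under simplifying assumptions: the embeddings are a random placeholder for features from a pretrained self-supervised encoder, the number of clusters k is an illustrative choice, and examples far from their nearest centroid are treated as hard, following the paper's large-data prescription to keep harder examples.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for embeddings from a pretrained self-supervised encoder;
# in practice these would come from a real model's feature extractor.
embeddings = np.random.default_rng(0).normal(size=(10_000, 128))

k = 100  # illustrative number of clusters (an assumption, not from the paper)
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)

# Self-supervised pruning metric: distance of each embedding to its
# nearest cluster centroid. Far from the centroid = harder, less prototypical.
distances = np.linalg.norm(
    embeddings - kmeans.cluster_centers_[kmeans.labels_], axis=1
)

# Large-data regime: keep the hardest fraction of examples (illustrative value).
keep_fraction = 0.8
threshold = np.quantile(distances, 1 - keep_fraction)
kept_indices = np.flatnonzero(distances >= threshold)
```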
This article is written as a research summary by Marktechpost staff based on the research paper 'Beyond neural scaling laws: beating power law scaling via data pruning'. All credit for this research goes to the researchers on this project. Check out the paper and reference link. Please don't forget to join our ML subreddit.
Prathvik is an ML/AI research content intern at MarktechPost and a third-year undergraduate at IIT Kharagpur. He has a keen interest in machine learning and data science, and is enthusiastic about learning their applications in different fields of study.