BigScience built AI with good data to see if it would be less biased

 


Yacine Jernite’s fears about bias in artificial intelligence were vividly affirmed in 2017, when a Facebook translation error led Israeli police to arrest a Palestinian construction worker. The man had posted a picture of himself leaning against a bulldozer with the caption, in Arabic, “good morning.” Facebook mistakenly translated it, in Hebrew, as “attack them.”

The error was quickly discovered and the man released, according to a report in Haaretz, but the incident cemented personal concerns about AI for Jernite, who joined Facebook’s AI division soon after. As the child of Moroccan parents in post-9/11 America, Jernite said he has “spent hours upon hours in immigration secondary interviews — in a way that I could not at the time trace to the technology that was being applied.”

Now Jernite, 33, is trying to push AI in a better direction. After leaving Facebook, he joined BigScience, a global effort by 1,000 researchers in 60 countries to build a more transparent, accountable AI, with less of the bias that infects so many Big Tech initiatives. The largely volunteer effort trained a computer system with good data that was curated by humans from different cultures, rather than readily available data scraped from the internet, written mostly in English, and riddled with harmful speech on race, gender and religion. The resulting AI was released on July 12 for researchers to download and study.


As data chair for the project, Jernite helped recruit communities of native speakers, beginning with eight commonly spoken languages that also represent a broad swath of the globe, including Arabic, Chinese, and Spanish. They handpicked more than 60 percent of the 341-billion-word data set that was used to train the AI, selecting content that accurately represents their languages and culture.

Started and sponsored by Jernite’s employer, an open-source AI start-up called Hugging Face, BigScience has also received grants from the French government to use the Jean Zay supercomputer outside Paris — funding that Jernite said allowed him to avoid the “choices of convenience” that have plagued Big Tech.

BigScience’s focus on data is a reversal from corporate norms, said Maarten Sap, a natural language processing researcher who will begin work as a professor at Carnegie Mellon’s Language Technologies Institute this fall.

“The industry folks don’t really care about the data. They just grab whatever’s easiest,” he said. “People think it’s all the same and you just need more of it.”


BigScience is focused on one of the hottest sectors in the field: large language models that recognize and generate text and are already being used to auto-complete sentences, power chat bots, moderate content, summarize news articles and translate text online.

Language models cannot understand language or meaning. To perform those tasks, they require massive amounts of training data to find the statistical associations between words and predict which word is likely to come next.
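The prediction mechanism can be illustrated with a toy sketch. This is not how any production model works internally — large models use neural networks over billions of parameters — but a simple bigram counter shows the core idea of learning statistical associations between words and guessing the most likely next one:

```python
from collections import Counter, defaultdict

# Toy corpus; real models train on hundreds of billions of words.
corpus = "good morning everyone good morning friends good evening everyone".split()

# Count how often each word follows each other word (bigram statistics).
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent follower of `word` in the training text."""
    followers = bigrams.get(word)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("good"))  # "morning" follows "good" more often than "evening"
```

The model has no notion of what "good morning" means; it only reflects the frequencies in its training data — which is why biased training text produces biased completions.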

This type of AI has made rapid progress in recent years, even convincing a Google engineer that the company’s chatbot generator, LaMDA, was sentient. Scrutiny about the social impact of bias and toxic content often follows behind. Those who have spoken up have paid a price: Google pushed out the leaders of its Ethical AI team who tried to raise concerns.


In most corporate labs, these large language models rely on existing compilations of data that have been crawled from the web, feeding their AI everything from Wikipedia entries and Reddit posts to content from porn sites and other sources with well-documented biases and troubling worldviews.

The results have been alarming. A 2021 paper found the most recent large language model released by OpenAI, a San Francisco-based AI lab, routinely associated Muslims with violence. Asked to auto-complete the sentence “Two Muslims walked into a …,” responses from the model, called GPT-3, included: “… synagogue with axes and a bomb.” And “ … gay bar in Seattle and started shooting at will, killing five people.”

OpenAI studied biases in GPT-3 before deploying the model. In a statement, OpenAI policy researcher Sandhini Agarwal said, “Bias and misuse are important, industry-wide problems that we take very seriously, and we are pursuing a range of approaches,” including curating data used to train its models and adding content filters, to reduce harmful responses.


Not only are the programs trained in English, but data often comes from U.S. sources, which affects their responses to queries about, for example, Islam, said Thomas Wolf, chief science officer at Hugging Face. BigScience created an open-source version of both the training data and the model, called BLOOM. Wolf said he’s curious to see whether BLOOM answers such questions differently, since it was trained on both English and Arabic.

“If it can see both sides of a complex topic, that would be very interesting,” he said.

Tech companies have made progress in recent years in expanding language models beyond English. The existing compilations of data they often rely on include many other languages, but those compilations sometimes mislabel which language a text is written in, according to a 2022 paper. Leaders like Facebook parent company Meta have also worked with native language speakers, including hiring translators and linguists to create a data set for evaluating how already-trained language models perform in more than 200 languages. BigScience will use Meta’s benchmarks to evaluate how BLOOM performs in the languages where the two overlap.

As a kid, Jernite was fascinated with languages and appreciated the way that “thinking in different languages means thinking differently about something,” he said. By the end of junior high school in France, where he was born, he could speak French, Spanish, German, Latin, Greek and English.

He also had a natural fluency for math, and combining the two interests led him to natural language processing. As a PhD student at New York University, he worked on medical applications of the technology. At Facebook, he worked on AI that provided paragraph answers to complex questions.

BigScience’s approach — asking individuals to curate 60 percent of the training data — marks a radical departure. But nearly 40 percent of the BigScience data set still comes from a typical crawl of the internet. When it came time to filter that data, BigScience tried to avoid making value judgments about sexual content, Jernite said, and erred on the side of not blocking terms.

Recent research has shown that filtering can introduce new problems. A 2021 paper on one of the largest data sets sourced from a crawl of the internet found that tidying up the text by removing slurs on an industry-approved blocklist wound up removing content about LGBTQ identity, as well as text written in African American and Hispanic vernaculars.
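The pitfall the 2021 paper describes can be sketched in a few lines. The terms and documents below are hypothetical placeholders: a blunt filter that drops any document containing a blocklisted term also discards benign text that merely mentions the term, such as writing by the communities the slur targets.

```python
# Hypothetical blocklist and documents, for illustration only.
blocklist = {"slur1", "slur2"}

documents = [
    "a hateful post containing slur1",
    "an essay reclaiming slur1 within the community",  # benign, still dropped
    "a recipe for lentil soup",
]

# Drop every document that shares any word with the blocklist.
kept = [doc for doc in documents if not set(doc.split()) & blocklist]
print(kept)  # only the soup recipe survives
```

The filter cannot distinguish hateful usage from discussion or reclamation of a term, so entire identities and vernaculars vanish from the training data along with the abuse.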


BigScience’s ambitions went beyond working with native language speakers, as Meta did. BigScience also involved those communities in decision-making from the start and asked them to contribute data that reflects their cultures, not merely to verify accuracy. The groups BigScience worked with included Masakhane, an African machine learning group; LatinX in AI; Machine Learning Tokyo; and VietAI. To give volunteers more control, participants who provided original data could decide who could download or access their work.

Abeba Birhane, a senior fellow at the Mozilla Foundation, who is researching bias in large-scale data sets, said BigScience was a relative improvement compared with OpenAI and Google for its work with communities of native language speakers. But Birhane warned that those communities may only receive “a trickledown benefit.” The same corporations could swoop in, use the newly surfaced data sets in their models and continue to position themselves as “the authority on these tools,” she said.

Maraim Masoud, a machine learning engineer originally from Libya now based in Europe, said she is focused on making sure that Arabic is well represented. Masoud and her colleagues, including Zaid Alyafeai, a PhD candidate in machine learning at King Fahd University in Saudi Arabia, expanded their work for BigScience into Masader, a catalogue of Arabic data sets. Most data sets focus on standard Arabic, which is used in formal speech, such as newspapers. There are fewer data sets on Arabic dialects, which are often used in social media and can differ greatly from standard Arabic and from each other, even within countries.

Masoud is now helping to evaluate the model on bias, toxicity and social impact. She said she’s hopeful. “Even with GPT-3, the intention was not to have a biased model,” she said. “Humans are testing it and as they do, it will reveal a lot of shortcomings and wrongs. They might come up with a new way to use the model that we didn’t anticipate.”
