## OpenAI and its rivals seek a new path to smarter AI as current methods hit their limits #AI #OpenAI #TríTuệNhânTạo #CôngNghệ #DeepLearning #LargeLanguageModel
OpenAI and its rivals are searching for a new path to smarter artificial intelligence (AI) as current methods run into unexpected limits. Large language models (LLMs) such as OpenAI's GPT-3 have made remarkable strides, delivering impressive capabilities in natural language processing. Yet the road to genuinely intelligent AI is proving harder than anticipated.
Training ever-larger LLMs demands enormous compute and data. Costs climb steeply, raising questions about economic feasibility, and scaling a model up does not necessarily yield a proportional gain in performance. Researchers are observing diminishing returns: pouring additional resources into training no longer brings commensurate improvements in capability.
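The diminishing-returns effect can be illustrated with a toy power-law loss curve. The constants below are made up for illustration, not a fit to any real model:

```python
def loss(compute, a=10.0, b=0.3, irreducible=1.5):
    """Illustrative power-law scaling curve: loss falls as compute
    grows, but flattens toward an irreducible floor."""
    return a * compute ** (-b) + irreducible

# Each 10x increase in compute buys a smaller absolute drop in loss.
for c in [1e3, 1e4, 1e5, 1e6]:
    print(f"compute={c:.0e}  loss={loss(c):.3f}")
```

Each decade of extra compute shaves off roughly half the loss reduction of the previous decade, which is the shape of the plateau the researchers describe.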
Moreover, today's LLMs still have significant shortcomings. They are prone to "hallucinations" (generating false or nonsensical output) and often lack logical reasoning, deep knowledge of the real world, and the ability to adapt to new contexts.
Faced with these limits, OpenAI and its competitors are actively exploring new training methods and model architectures. Directions under study include:
* Training-algorithm optimization: finding more efficient algorithms that cut training time and cost without sacrificing quality.
* New model architectures: exploring alternatives to the transformer architecture that dominates today, potentially overcoming the weaknesses of current LLMs.
* Knowledge integration: injecting curated domain knowledge into training so models gain a deeper grasp of the real world and hallucinate less.
* Multimodal training: combining text, images, and audio so models can understand and process more diverse information.
The race toward smarter AI is intensifying. Breaking through today's limits will require both theoretical and engineering breakthroughs, and success in finding new training methods and model architectures would open a new era for artificial intelligence, with transformative applications across many areas of life. #FutureofAI #AIInnovation #TechTrends #ArtificialIntelligence
## OpenAI and rivals seek new path to smarter AI as current methods hit limitations
Artificial intelligence companies like OpenAI are seeking to overcome unexpected delays and challenges in the pursuit of ever-bigger large language models by developing training techniques that use more human-like ways for algorithms to “think”.
A dozen AI scientists, researchers and investors told Reuters they believe that these techniques, which are behind OpenAI’s recently released o1 model, could reshape the AI arms race, and have implications for the types of resources that AI companies have an insatiable demand for, from energy to types of chips.
OpenAI declined to comment for this story. After the release of the viral ChatGPT chatbot two years ago, technology companies, whose valuations have benefited greatly from the AI boom, have publicly maintained that “scaling up” current models through adding more data and computing power will consistently lead to improved AI models.
But now, some of the most prominent AI scientists are speaking out on the limitations of this “bigger is better” philosophy.
Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, told Reuters recently that results from scaling up pre-training – the phase of training an AI model that uses a vast amount of unlabeled data to understand language patterns and structures – have plateaued.
Sutskever is widely credited as an early advocate of achieving massive leaps in generative AI advancement through the use of more data and computing power in pre-training, which eventually created ChatGPT. Sutskever left OpenAI earlier this year to found SSI.
“The 2010s were the age of scaling, now we’re back in the age of wonder and discovery once again. Everyone is looking for the next thing,” Sutskever said. “Scaling the right thing matters more now than ever.”
Sutskever declined to share more details on how his team is addressing the issue, other than saying SSI is working on an alternative approach to scaling up pre-training.
Behind the scenes, researchers at major AI labs have been running into delays and disappointing outcomes in the race to release a large language model that outperforms OpenAI’s GPT-4 model, which is nearly two years old, according to three sources familiar with private matters.
The so-called "training runs" for large models can cost tens of millions of dollars, running hundreds of chips simultaneously. They are prone to hardware-induced failure given how complicated the system is, and researchers may not know the eventual performance of a model until the end of the run, which can take months.
Another problem is large language models gobble up huge amounts of data, and AI models have exhausted all the easily accessible data in the world. Power shortages have also hindered the training runs, as the process requires vast amounts of energy.
To overcome these challenges, researchers are exploring “test-time compute,” a technique that enhances existing AI models during the so-called “inference” phase, or when the model is being used. For example, instead of immediately choosing a single answer, a model could generate and evaluate multiple possibilities in real-time, ultimately choosing the best path forward.
This method allows models to dedicate more processing power to challenging tasks like math or coding problems or complex operations that demand human-like reasoning and decision-making.
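The sample-multiple-candidates-and-keep-the-best idea can be sketched with a toy "model" and verifier. Everything here is illustrative (a stand-in arithmetic task and a hand-written scorer), not OpenAI's actual method:

```python
import random

TRUE_ANSWER = 17 * 24  # 408

def toy_model(prompt, rng):
    """Stand-in for an LLM: returns a noisy guess at the answer."""
    return TRUE_ANSWER + rng.choice([-2, -1, 0, 1, 2])

def scorer(answer):
    """Stand-in verifier: candidates closer to the truth score higher.
    In practice this would be a learned reward or verifier model."""
    return -abs(answer - TRUE_ANSWER)

def best_of_n(prompt, n, rng):
    """Test-time compute: sample n candidates, keep the best-scored one.
    Spending more inference compute (larger n) raises answer quality
    without retraining or enlarging the model."""
    candidates = [toy_model(prompt, rng) for _ in range(n)]
    return max(candidates, key=scorer)

# A single sample often misses; sampling widely and letting the
# verifier choose almost always recovers the correct answer.
print(best_of_n("What is 17 * 24?", n=16, rng=random.Random(0)))
```

The design point is that the extra compute is spent at inference, after training is finished, which is what makes the technique attractive when pre-training itself has plateaued.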
“It turned out that having a bot think for just 20 seconds in a hand of poker got the same boosting performance as scaling up the model by 100,000x and training it for 100,000 times longer,” said Noam Brown, a researcher at OpenAI who worked on o1, at TED AI conference in San Francisco last month.
OpenAI has embraced this technique in its newly released model known as "o1," formerly known as Q* and Strawberry, which Reuters first reported in July. The o1 model can "think" through problems in a multi-step manner, similar to human reasoning. It also involves using data and feedback curated from PhDs and industry experts. The secret sauce of the o1 series is another round of training carried out on top of "base" models like GPT-4, and the company says it plans to apply this technique to more and bigger base models.
At the same time, researchers at other top AI labs, from Anthropic, xAI, and Google DeepMind, have also been working to develop their own versions of the technique, according to five people familiar with the efforts.
“We see a lot of low-hanging fruit that we can go pluck to make these models better very quickly,” said Kevin Weil, chief product officer at OpenAI at a tech conference in October. “By the time people do catch up, we’re going to try and be three more steps ahead.”
Google and xAI did not respond to requests for comment and Anthropic had no immediate comment.
The implications could alter the competitive landscape for AI hardware, thus far dominated by insatiable demand for Nvidia’s AI chips. Prominent venture capital investors, from Sequoia to Andreessen Horowitz, who have poured billions to fund expensive development of AI models at multiple AI labs including OpenAI and xAI, are taking notice of the transition and weighing the impact on their expensive bets.
“This shift will move us from a world of massive pre-training clusters toward inference clouds, which are distributed, cloud-based servers for inference,” Sonya Huang, a partner at Sequoia Capital, told Reuters.
Demand for Nvidia’s AI chips, which are the most cutting edge, has fueled its rise to becoming the world’s most valuable company, surpassing Apple in October. Unlike training chips, where Nvidia dominates, the chip giant could face more competition in the inference market.
Asked about the possible impact on demand for its products, Nvidia pointed to recent company presentations on the importance of the technique behind the o1 model. Its CEO Jensen Huang has talked about increasing demand for using its chips for inference.
“We’ve now discovered a second scaling law, and this is the scaling law at a time of inference…All of these factors have led to the demand for Blackwell being incredibly high,” Huang said last month at a conference in India, referring to the company’s latest AI chip.