Tech Study | Open Source, Inference, CUDA

by 아모르랑

On August 5th, OpenAI announced that it would freely share its models gpt-oss-20b and gpt-oss-120b. These two models were trained on NVIDIA H100 GPUs and run inference most efficiently on NVIDIA's CUDA platform.


Today, we will focus on the three concepts below and conclude with a summary. Busy readers are welcome to skip straight to the summary at the end.

Open Source

Inference

CUDA Platform


What does Open Source mean?

Open Source refers to the underlying blueprint of a product, the "source code" of a piece of software, being made publicly available. This means anyone can view, study, modify, or distribute the code for free. But it's not just about giving something away. Open source encourages community collaboration and peer review, and it helps decentralize technology development by allowing contributors from around the world to participate. These projects also let contributors report bugs and suggest improvements, which leads to faster innovation, better security, and ultimately a more robust product.


Slightly off on a tangent (but still relevant): why is OpenAI open-sourcing these models? Have you ever heard the famous quote, "If the product is free, you are the product"? Does using open-source software make us, the users, the product? What's the deal here?


Open-sourcing is not necessarily at odds with the company's for-profit business model. In fact, apart from gpt-oss-20b and gpt-oss-120b, OpenAI provides more advanced features through its ChatGPT Plus ($20/mo) and Pro ($200/mo) plans. The open-sourcing of these two recently released models can be viewed as a nuanced strategy to balance its commitment to AI research with its commercial goals. OpenAI is proactively responding to a rapidly developing yet highly competitive AI market. While the company is currently one of the frontrunners in the AI landscape, open-sourcing helps it prevent competitors from dominating the open-source community and the growing market around it. The open-source community also acts as a massive global research and development team, discovering new applications and often finding issues that help improve the product itself.

In summary, OpenAI's open-source strategy is not meant to compete directly with its most advanced, proprietary offerings. Instead, the company is fostering a developer ecosystem for learning and research, with the long-term goal of building a funnel toward its proprietary services.


What does Inference mean?

Inference refers to the process of using a trained machine learning model to make predictions or decisions on new, unseen data. It's like putting a model's knowledge into action. When these models are trained, they detect and learn patterns from a large dataset. Once the training phase is complete, the model is ready to be used for inference. This is when researchers and engineers feed in new input (e.g., images, text, spreadsheets) and the model generates an output (e.g., identifying objects in the images, translating the text, analyzing numbers on the spreadsheets) based on what it learned during training. In other words, inference is the "application" or "prediction" stage of the machine learning (ML) lifecycle.
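To make the train-then-infer split concrete, here is a minimal, purely illustrative sketch using a toy spam filter with a one-number "model" (a threshold). The `train`/`infer` function names and the email scenario are invented for this example; real models like gpt-oss-20b work on the same principle, just at a vastly larger scale.

```python
def train(samples):
    """'Training' phase: learn from labeled data.

    Here the 'model' is just a threshold halfway between the
    average of the two classes -- a stand-in for the expensive
    pattern-learning that real training performs.
    """
    spam = [x for x, label in samples if label == "spam"]
    ham = [x for x, label in samples if label == "ham"]
    return (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2


def infer(threshold, x):
    """'Inference' phase: apply the frozen model to new, unseen input."""
    return "spam" if x > threshold else "ham"


# Training: learn from labeled examples (here, link counts in emails).
model = train([(9, "spam"), (11, "spam"), (1, "ham"), (3, "ham")])

# Inference: the model now makes predictions on inputs it never saw.
print(infer(model, 10))  # a link-heavy email -> "spam"
print(infer(model, 2))   # a link-light email -> "ham"
```

Once training is done, the learned parameters are frozen; inference only reads them, which is why it can be deployed cheaply and repeatedly.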


So, why do we care about inference? Inference is crucial in AI because it's the whole point of creating the model in the first place: it's how we get tangible, real-world results. After a model is trained, the inference stage is when its knowledge is applied to make predictions and decisions on new, unseen inputs. This could be anything from interacting with a chatbot to receiving shopping recommendations on Instagram to predicting a stock's future price. Without inference, a trained model is just a large store of data and weights; it can do nothing useful for us, the end users.


What is the CUDA Platform?

The CUDA platform is a “parallel computing platform and programming model” developed by NVIDIA. Its primary purpose is to enable software developers to use NVIDIA GPUs for general-purpose computing (known as GPGPU), going beyond just rendering graphics.

Already too complicated? Let's break it down and review the concept of CPU vs. GPU first. A CPU, a computer's central processing unit, is very good at handling tasks sequentially, one after another. A GPU, a graphics processing unit, instead has thousands of smaller cores designed to perform a massive number of simple calculations simultaneously. With CUDA, developers write code for these GPU cores using familiar programming languages like C, C++, or Python (among other supported languages).
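CUDA's core idea is "one thread per data element": instead of one loop running on one CPU core, thousands of GPU threads each handle a single index at the same time. The sketch below emulates that mapping in plain Python, with no GPU involved; the function names (`vector_add_thread`, `launch`) are invented for illustration, and in real CUDA C++ the per-thread function would be a `__global__` kernel launched across hardware threads.

```python
def vector_add_thread(thread_id, a, b, out):
    # Each GPU thread handles exactly one index: a tiny, simple
    # calculation, repeated massively in parallel across cores.
    out[thread_id] = a[thread_id] + b[thread_id]


def launch(n_threads, kernel, *args):
    # A CPU works through these one after another; a GPU would
    # run them simultaneously, one per core/thread.
    for tid in range(n_threads):
        kernel(tid, *args)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(a)

launch(len(a), vector_add_thread, a, b, out)
print(out)  # [11.0, 22.0, 33.0, 44.0]
```

The payoff is that each per-element task is trivially independent, so adding more cores scales the work almost linearly, which is exactly what training and inference workloads exploit.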


In short, CUDA lets developers write code that runs tasks in parallel across thousands of GPU cores. It ships as a toolkit with libraries, development tools, and a runtime environment, enabling faster and more efficient processing of computationally demanding tasks. However, CUDA is proprietary and works only on NVIDIA hardware, and using it well involves manual memory management and considerable technical complexity.


Summary

Open Source: the "source code" of a product, most commonly software, made freely available to the public. A strategy OpenAI uses to contribute to AI research and build a developer community around its models, with commercial use as the long-term goal.

Inference: the process of using a trained ML model to make predictions on new, unseen data. After the training phase, these models are used in financial forecasting, chatbot interactions, and social media recommendations, to name only a few.

The CUDA Platform: a parallel computing platform and programming model developed by NVIDIA for GPGPU, used to accelerate computationally demanding tasks. It requires advanced technical knowledge to use and works only with NVIDIA GPUs.


