Artificial intelligence running locally on Windows 11 is no longer just for labs or large corporations. Today you can run advanced models directly on your PC, without relying on the cloud, taking advantage of the CPU, the GPU, and even the NPU if your machine is recent enough. And the key to achieving this easily is ONNX Runtime, integrated with Windows ML and .NET.
In the following lines we will see, calmly but without beating around the bush, how to use local AI with ONNX Runtime on Windows 11: what ONNX and its runtime are, how they fit with Windows ML, what you need to install, how to set up real-world examples in .NET/WinUI 3, how to leverage DirectML and hardware acceleration, and what practical scenarios you can cover (image classification, speech-to-text, local language models, RAG, etc.). You'll see some code, but the focus is on understanding the complete flow: load, prepare, and run ONNX models on your machine.
What is ONNX and what role does ONNX Runtime play in Windows 11?
ONNX (Open Neural Network Exchange) is an open standard format designed to describe neural network models in an interoperable way. Simply put: you can train a model in PyTorch or TensorFlow, export it to ONNX, and then use it on Windows, in the cloud, on the web, or wherever you want, without rewriting everything from scratch.
An ONNX model includes the network structure (layers, connections, operation types), the weights resulting from training, and the definition of inputs and outputs. Thanks to this, different tools and runtimes know exactly what data the model expects, how to process it, and what it returns. For example, an image classification model receives a preprocessed image and returns a probability vector with one value per class.
ONNX Runtime builds on that standard format: it is an optimized engine for running these models on multiple platforms. The runtime provides a uniform API to:
- Load ONNX models from disk or memory.
- Create inference sessions with different execution options.
- Connect execution providers (Execution Providers) for CPU, GPU and NPU.
- Feed the model with input tensors and retrieve the output tensors.
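Those four steps map almost one-to-one onto the C# API. Here is a minimal sketch; the model path and the input name "data" are placeholders, so check your model's actual metadata:

```csharp
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// 1) Load the model and create an inference session (CPU by default).
using var session = new InferenceSession(@"C:\models\model.onnx");

// 2) Build an input tensor with the shape the model expects.
var input = new DenseTensor<float>(new[] { 1, 3, 224, 224 });

// 3) Feed the model; the input name must match the model's metadata.
var inputs = new[] { NamedOnnxValue.CreateFromTensor("data", input) };
using var results = session.Run(inputs);

// 4) Retrieve the output tensor as a flat float array.
float[] scores = results[0].AsEnumerable<float>().ToArray();
```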
In Windows 11, ONNX Runtime is natively integrated through Windows ML and the Windows App SDK and DirectML ecosystem. This allows your desktop applications, whether in C#, WinUI 3, WPF, or even web applications with WebNN, to take advantage of hardware acceleration without the hassle of manufacturer-specific drivers and SDKs.
Advantages of using AI locally versus in the cloud
Working with local AI models on Windows 11 has several very clear advantages over relying exclusively on remote services such as ChatGPT, Gemini, or others:
First, data privacy and confidentiality improve dramatically, because the processing is done on your own device. When you run an ONNX model on your PC, your documents, audio, images, or medical records never leave your network, which greatly reduces the risk of leaks or misuse of information.
There is also the matter of cost and control. By not relying on continuous requests to a paid cloud API, you avoid unexpected bills and can deploy solutions that work at scale across an organization without multiplying subscription costs. You invest in the hardware once and get the most out of it with ONNX Runtime and Windows ML.
Low latency is another strength. If you're doing computer vision inference, contextual assistants, or speech recognition, each round trip to the cloud adds milliseconds. Running the model locally reduces response time and improves the feeling of fluidity, which is critical in interactive applications or on edge devices.
Last but not least is independence from the Internet connection. If your app needs to run on an airplane, in a remote factory, or in any environment with limited networking, embedding the ONNX models within the application itself (for example, inside the MSIX package) ensures that the AI will continue to function flawlessly, just like in PhotoPrism with local AI.
Windows ML, ONNX Runtime and DirectML: how it all fits together
In the modern Windows 11 ecosystem, Windows ML acts as a middleware layer that unifies the management of CPU, GPU, and NPU for AI model inference. Its mission is to coordinate the available hardware resources so that ONNX model execution is as efficient as possible, without you having to wrestle with the details of each chip.
Windows ML is tightly integrated with ONNX Runtime: it reuses its APIs, relies on its Execution Providers (EPs), and delegates model compilation and optimization to it. Microsoft also distributes and maintains both ONNX Runtime and the EPs from the various manufacturers, which greatly simplifies the packaging of your applications and reduces external dependencies.
A key component in this puzzle is DirectML, the abstraction layer that lets you run ML models by leveraging the GPU (and the NPU on compatible systems) in a unified way. ONNX Runtime offers a dedicated EP for DirectML, so you can create inference sessions that automatically use CPU, GPU, or NPU depending on what is available on the computer.
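As an illustrative sketch (assuming the Microsoft.ML.OnnxRuntime.DirectML NuGet package and a placeholder model path), selecting the DirectML EP with a CPU fallback looks roughly like this:

```csharp
using Microsoft.ML.OnnxRuntime;

var options = new SessionOptions();
try
{
    // Ask for the DirectML execution provider; device 0 is the
    // default adapter chosen by the system (GPU or NPU).
    options.AppendExecutionProvider_DML(0);
}
catch (OnnxRuntimeException)
{
    // No DirectML-capable device or package: the session will
    // simply fall back to the built-in CPU provider.
}

using var session = new InferenceSession(@"C:\models\model.onnx", options);
```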
Big manufacturers like AMD, Intel, NVIDIA, and Qualcomm collaborate with Microsoft to provide optimized EPs: AMD integrates Ryzen AI, Intel combines OpenVINO with Windows ML, NVIDIA provides TensorRT for RTX GPUs, and Qualcomm tunes the Snapdragon X NPU. Thanks to this, the same ONNX models can take advantage of very diverse hardware without you having to change your high-level code; if you need to fine-tune the system, learn how to configure performance profiles.
All of this comes packaged in the Windows App SDK (from version 1.8.1 onwards) and is officially supported on devices with Windows 11 24H2 or higher, which makes the operating system a very robust platform for local AI experiences in production.
Prerequisites for using local AI with ONNX Runtime on Windows 11
To start running ONNX models locally on Windows 11 from your .NET applications, it's advisable to review some basic environment and tool requirements:
At the system level, you will need an updated Windows 11 (ideally build 22621 or higher) and, if you're going to deploy modern apps, the Windows App SDK and WinUI 3 projects packaged as MSIX. This gives you access to the Windows ML and DirectML APIs, as well as the latest AI integration features.
Regarding the development environment, the most common option is Visual Studio 2022 or higher with the .NET desktop development workload enabled. From there you can create WinUI 3, WPF, or .NET MAUI projects, or even console apps in C# or VB.NET, that reference ONNX Runtime and Windows ML.
In terms of dependencies, several key NuGet packages come up in most scenarios: Microsoft.ML.OnnxRuntime for CPU, Microsoft.ML.OnnxRuntime.DirectML if you want to use the GPU with DirectML, Microsoft.AI.MachineLearning to work with Windows ML in WinUI 3, and libraries such as SixLabors.ImageSharp or similar for preprocessing input images.
Of course, you'll also need at least one compatible ONNX model. You can download pre-trained models from the ONNX Model Zoo (for example, ResNet, SqueezeNet, classification, object detection, or NLP models), or convert your own models from PyTorch or TensorFlow using tools such as the AI Toolkit for VS Code, tf2onnx, or onnxruntime-tools.
In terms of hardware, although ONNX Runtime is CPU-optimized and works perfectly well without a GPU, you'll notice a clear improvement when you take advantage of local accelerators such as the GPU or NPU, especially with large models (language models, Stable Diffusion, Whisper, etc.). See our essential hardware checklist.
How to install and use ONNX Runtime in .NET and WinUI 3 projects

One of the most direct ways to start using ONNX Runtime on Windows 11 is to create a desktop application in .NET. The basic pattern is the same in both C# and VB.NET: reference the package, load the model, prepare the inputs, and run the inference session.
In a traditional .NET project (for example, a console app or a classic desktop application), the typical flow is to add the Microsoft.ML.OnnxRuntime package from the Visual Studio NuGet Package Manager and then create an InferenceSession from the path to your .onnx file. That session is the one you will reuse every time you want to make a prediction.
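One way to apply that "create once, reuse everywhere" advice is a lazily initialized static session. This is only a sketch, with the model path as a placeholder:

```csharp
using System;
using Microsoft.ML.OnnxRuntime;

public static class ModelHost
{
    // Loading the .onnx file is expensive; Run() is comparatively cheap.
    // Lazy<T> guarantees thread-safe, one-time initialization.
    private static readonly Lazy<InferenceSession> _session =
        new(() => new InferenceSession(@"Assets\model.onnx"));

    public static InferenceSession Session => _session.Value;
}
```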
In the case of a WinUI 3 desktop application with the Windows App SDK, the process is similar, but you'll typically combine ONNX Runtime with other supporting libraries. For example, a typical image classification demo might include:
- Microsoft.ML.OnnxRuntime and Microsoft.ML.OnnxRuntime.Managed to run the model.
- SixLabors.ImageSharp to load, resize and normalize images.
- SharpDX.DXGI if you want to explicitly select a graphics adapter for DirectML.
Within the main window (for example, MainWindow.xaml.cs) you usually create a model initialization method that configures the SessionOptions, the execution provider (CPU or DirectML), and the path to the ONNX model. This way, your application initializes the session only once at startup and then reuses it for each inference call, maximizing performance.
Practical example: Image classification with ONNX Runtime and DirectML
A classic and very illustrative example of local AI with ONNX Runtime on Windows 11 is image classification using a model such as ResNet50 or SqueezeNet. The conceptual flow is always the same, although the code may vary depending on the framework you use.
In a WinUI 3 app, you can design a simple interface with a button to select a photo, an Image control to display the chosen picture, and a TextBlock to list the predictions. The user clicks the button, a FileOpenPicker opens, an image is selected, and in the background the program processes it and passes it to the ONNX model.
Image preprocessing usually involves loading the picture in 24-bit RGB format, resizing it to the size the model expects (for example, 224×224 pixels), cropping if necessary, and normalizing the values of each channel according to the means and standard deviations indicated in the model's documentation. With ImageSharp, this translates to resizing the image, iterating through the pixels row by row, and filling a DenseTensor with normalized values.
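With ImageSharp, that preprocessing can be sketched like this (the 224×224 size and the ImageNet-style means and standard deviations are assumptions; use the values from your model's documentation):

```csharp
using Microsoft.ML.OnnxRuntime.Tensors;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;

float[] mean = { 0.485f, 0.456f, 0.406f };
float[] std  = { 0.229f, 0.224f, 0.225f };

using var image = Image.Load<Rgb24>("photo.jpg");
image.Mutate(x => x.Resize(224, 224));

// NCHW layout: batch, channel, height, width.
var tensor = new DenseTensor<float>(new[] { 1, 3, 224, 224 });
image.ProcessPixelRows(accessor =>
{
    for (int y = 0; y < accessor.Height; y++)
    {
        var row = accessor.GetRowSpan(y);
        for (int x = 0; x < row.Length; x++)
        {
            tensor[0, 0, y, x] = (row[x].R / 255f - mean[0]) / std[0];
            tensor[0, 1, y, x] = (row[x].G / 255f - mean[1]) / std[1];
            tensor[0, 2, y, x] = (row[x].B / 255f - mean[2]) / std[2];
        }
    }
});
```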
After preparing the tensor, you create an OrtValue from the tensor's memory (without unnecessary copies), build a dictionary with the model inputs (usually a single tensor called something like "data"), and invoke the Run method on the InferenceSession. The output is usually a probability tensor per class, which you can post-process with a softmax so that the values fall in [0, 1] and sum to 1.
Finally, you sort the probabilities from highest to lowest, keep the 10 labels with the highest confidence, and display them on screen. This is usually done with an auxiliary class (for example, Prediction, with Label and Confidence properties) and a label table (LabelMap) that maps each output index to a semantic class (dog, car, keyboard, etc.).
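The softmax and top-10 step can be sketched as follows (assuming logits is the model's raw output flattened to a float array, and that a LabelMap.Labels array mapping indices to class names exists):

```csharp
using System;
using System.Linq;

// 'logits' is the raw output tensor flattened to float[].
// Softmax: subtract the max for numerical stability, exponentiate,
// then normalize so the confidences fall in [0, 1] and sum to 1.
float max = logits.Max();
float[] exp = logits.Select(v => MathF.Exp(v - max)).ToArray();
float sum = exp.Sum();

var top10 = exp
    .Select((e, i) => new Prediction(LabelMap.Labels[i], e / sum))
    .OrderByDescending(p => p.Confidence)
    .Take(10)
    .ToList();

public sealed record Prediction(string Label, float Confidence);
```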
This type of example demonstrates how ONNX Runtime, DirectML, and WinUI 3 fit together to deliver a local AI experience with a modern UI, hardware acceleration, and no need for a constant Internet connection.
Working with Windows ML and ONNX models in WinUI 3
In addition to using "pure" ONNX Runtime, you can rely on Windows ML within a WinUI 3 project to benefit from native operating system integration. This approach is very useful when you want to package your application in MSIX, use the Windows App SDK, and ensure consistent behavior on modern Windows 11 devices.
The typical process consists of adding the Microsoft.AI.MachineLearning package to the project, copying your ONNX model (for example, model_mnist.onnx or an image classification model) to a folder like Assets/ML, and configuring its properties so it is included in the compilation output as content.
From the C# code, you load the model using the Windows ML APIs, create a session, and prepare the input data in the expected format (for example, a tensor with a 28×28 grayscale image for MNIST, or a 224×224 image for a more complex vision model). The session then handles using DirectML and hardware acceleration transparently if the device supports it.
Once the inference is performed, you work with the outputs to obtain the top result (for example, the most probable digit in the case of MNIST) or a set of probabilities that you can display in the interface. All of this integrates seamlessly with the WinUI 3 XAML UI, allowing you to update visual controls with the AI results.
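A hedged sketch of that Windows ML flow for MNIST (the tensor names Input3 and Plus214_Output_0 come from the well-known MNIST sample model; verify the names for your own model, for example with Netron):

```csharp
using System;
using System.Linq;
using Windows.AI.MachineLearning;
using Windows.Storage;

// Load the ONNX model packaged under Assets/ML.
var file = await StorageFile.GetFileFromApplicationUriAsync(
    new Uri("ms-appx:///Assets/ML/model_mnist.onnx"));
var model = await LearningModel.LoadFromStorageFileAsync(file);

// Let Windows ML pick the best device (CPU, GPU, or NPU).
using var session = new LearningModelSession(
    model, new LearningModelDevice(LearningModelDeviceKind.Default));

// Bind a normalized 28×28 grayscale image and evaluate.
float[] pixels = new float[1 * 1 * 28 * 28];
var binding = new LearningModelBinding(session);
binding.Bind("Input3",
    TensorFloat.CreateFromArray(new long[] { 1, 1, 28, 28 }, pixels));

var result = await session.EvaluateAsync(binding, "inference");
var output = result.Outputs["Plus214_Output_0"] as TensorFloat;
float[] scores = output.GetAsVectorView().ToArray();
```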
Windows ML also manages the execution providers and coordinates the use of CPU, GPU, and NPU depending on the available hardware, without you having to change your models or drastically reorganize your code.
Local language models, RAGs, and other advanced examples
Local AI on Windows 11 isn't limited to computer vision. With ONNX Runtime and DirectML, you can also run language models (LLMs) converted to ONNX, speech-to-text models like Whisper, or segmentation and image generation architectures like Stable Diffusion.
Microsoft maintains a number of official examples and the AI Dev Gallery, an open application with over 25 interactive demos showing how to integrate local models into real apps: from AI-powered audio editors that index snippets based on a text query, to smart note-taking apps that use models like Phi-3 for summarization, autocomplete, and offline reasoning.
A very relevant pattern is Retrieval-Augmented Generation (RAG). In this approach, you combine a local language model with an external knowledge base. For example, a WPF app can load a PDF, slice it into chunks, and vectorize them using a local embeddings model, store those vectors, and, at query time, retrieve the most relevant fragments so that the language model can generate responses grounded in real data, without having to modify the model's weights.
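The retrieval step of that RAG flow boils down to ranking the stored chunks by similarity to the query embedding. A sketch (chunks, queryEmbedding, and question are hypothetical variables produced by your embeddings model and UI):

```csharp
using System;
using System.Linq;

// Cosine similarity between two embedding vectors of equal length.
static float Cosine(float[] a, float[] b)
{
    float dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(na) * MathF.Sqrt(nb) + 1e-9f);
}

// Keep the three most relevant fragments and prepend them to the prompt.
var topChunks = chunks
    .OrderByDescending(c => Cosine(c.Embedding, queryEmbedding))
    .Take(3)
    .Select(c => c.Text);

string prompt =
    $"Answer using only this context:\n{string.Join("\n---\n", topChunks)}\n\n" +
    $"Question: {question}";
```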
These examples demonstrate that you can build functional contextual assistants, semantic search systems, audio transcription, and special-purpose chatbots that are completely local on Windows 11, leveraging ONNX Runtime and DirectML to keep performance under control.
In addition, there are reference projects for running models such as Phi-3, Llama 3, or Mistral optimized with DirectML, both in ONNX format and directly from PyTorch, using lightweight applications (e.g., with a Gradio-style web interface) to test performance and adjust settings before integrating them into a final application.
Hardware-accelerated AI on the web with ONNX Runtime Web and WebNN
Another interesting way to use local AI with ONNX Runtime on Windows 11 is in the web environment, relying on ONNX Runtime Web and the WebNN API. In this context, heavyweight models such as Stable Diffusion, Segment Anything, or Whisper can run directly on the device's GPU or NPU via DirectML, but from a modern web application.
There are public demonstrations that show, for example, Stable Diffusion Turbo generating images from text in just a few seconds using WebNN and DirectML on machines with advanced AI capabilities. The same applies to Segment Anything, which lets you crop any object out of a user-uploaded image, or to Whisper Base, which converts speech to text locally in the browser.
In these scenarios, ONNX Runtime Web handles loading the ONNX model in the browser context, managing input and output tensors, and communicating with WebNN to route execution to DirectML on the Windows 11 device. The result is that you can deliver rich AI experiences without sending audio, images, or text to a remote server.
This combination makes Windows 11 a very powerful platform not only for native desktop applications, but also for web applications with hybrid inference, where part of the logic resides in the cloud and part is executed directly on the user's computer.
For developers already using web technologies (JavaScript, SPA frameworks, etc.), this approach is a convenient way to make the leap to local AI without having to rewrite the entire application in .NET or C#.
Good practices, safety and responsible development
When integrating AI features into Windows applications, Microsoft recommends following a series of responsible development guidelines, especially in the context of generative models and processing sensitive content. This is not just about performance, but also about minimizing potential harm or misuse.
Windows AI APIs incorporate mechanisms for text content moderation through services like Microsoft Foundry, which helps filter out potentially problematic results. Even when working with local models, it's advisable to apply additional layers of input validation and output review in critical domains (finance, health, safety, etc.).
From an application design perspective, it is recommended to validate the inputs before launching the inference (image sizes, file types, text length), to handle GPU failures or missing acceleration with a CPU fallback path, and to avoid reloading the model on each request. Keeping the inference session alive and reusing it is key to good performance.
It is also important to consider the optimization of ONNX models. Techniques such as quantization, pruning, and operator fusion can reduce model size and improve speed without significantly sacrificing accuracy. Tools like the AI Toolkit for VS Code or ONNX Runtime's own build infrastructure (including the new build APIs introduced in recent versions) are designed to facilitate this process.
Finally, don't forget to check hardware availability and developer mode on the devices where you will deploy your apps, especially if you are going to test on end-user machines, kiosks, industrial devices, or environments with strict security policies.
With all these components well aligned (ONNX as the standard format, ONNX Runtime and Windows ML as execution engines, DirectML and the EPs from AMD, Intel, NVIDIA, and Qualcomm to leverage CPU, GPU, and NPU, and tools like the AI Dev Gallery or AI Toolkit), Windows 11 becomes a very powerful platform for building local, private, fast, production-ready AI applications, from simple image classifiers to complex assistants based on language models and RAG.