<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://amirsojoodi.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://amirsojoodi.github.io/" rel="alternate" type="text/html" /><updated>2026-03-26T01:13:39-04:00</updated><id>https://amirsojoodi.github.io/feed.xml</id><title type="html">Amir’s Homepage</title><subtitle>Research Assistant at Queen&apos;s University</subtitle><author><name>AmirHossein Sojoodi</name><email>amir.sojoodi@gmail.com</email></author><entry><title type="html">PhD Dissertation LaTeX Template</title><link href="https://amirsojoodi.github.io/posts/PhD-Dissertation-LaTeX-Template/" rel="alternate" type="text/html" title="PhD Dissertation LaTeX Template" /><published>2026-01-23T00:00:00-05:00</published><updated>2026-01-23T00:00:00-05:00</updated><id>https://amirsojoodi.github.io/posts/LaTeX-PhD-Template</id><content type="html" xml:base="https://amirsojoodi.github.io/posts/PhD-Dissertation-LaTeX-Template/"><![CDATA[<p>This is an unofficial LaTeX template (Jan 2026) for PhD dissertations at Queen’s University (ECE department, PPRL group).</p>

<p>You can use this template as a starting point for your own dissertation. It includes the basic structure and formatting required by the university, as well as some additional features such as a glossary and macros for common terms.</p>

<p>View it on <a href="https://github.com/amirsojoodi/Queensu-PhD-Thesis-Template">Github</a> or <a href="https://www.overleaf.com/read/pdcwwrgtjfvy#423f0e">Overleaf</a>.</p>
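
<p>For context, a local build (a sketch; I’m assuming the main file is called <code class="language-plaintext highlighter-rouge">main.tex</code>, so adjust to the template’s actual entry point) goes roughly like this:</p>

<pre><code class="language-bash"># First pass to generate the glossary auxiliary files
latexmk -pdf main.tex
# Build the glossary entries
makeglossaries main
# Rebuild so glossary references resolve
latexmk -pdf main.tex
</code></pre>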

<ul>
  <li>The glossary is created automatically. See the sample to understand how to define new terms and use them in the text.</li>
  <li>Several macros have been defined in preamble.sty to make the text more readable. Use the same format to create your own macros.</li>
  <li>To build locally, I suggest Linux or WSL, with <code class="language-plaintext highlighter-rouge">latexmk</code> and <code class="language-plaintext highlighter-rouge">makeglossaries</code>.</li>
</ul>]]></content><author><name>AmirHossein Sojoodi</name><email>amir.sojoodi@gmail.com</email></author><category term="LaTeX" /><category term="Templates" /><category term="Tips" /><summary type="html"><![CDATA[This is an unofficial LaTeX template (Jan 2026) for PhD dissertations at Queen’s University (ECE department, PPRL group).]]></summary></entry><entry><title type="html">Setting up Gemini CLI on WSL Ubuntu 24.04</title><link href="https://amirsojoodi.github.io/posts/Setting-up-Gemini-CLI/" rel="alternate" type="text/html" title="Setting up Gemini CLI on WSL Ubuntu 24.04" /><published>2025-12-29T00:00:00-05:00</published><updated>2025-12-29T00:00:00-05:00</updated><id>https://amirsojoodi.github.io/posts/Pomodoro</id><content type="html" xml:base="https://amirsojoodi.github.io/posts/Setting-up-Gemini-CLI/"><![CDATA[<p>Over the past few years, I have been under a heavy workload of studies and work, and I couldn’t really explore the new AI tools coming out these days. Today, I was searching for a pomodoro timer app to help me focus on my work, and I thought: why not build one myself? I think it was Afshin who influenced me. Check out his game (brilliant idea <em>and</em> vibecoded!): <a href="https://c3c.arefi.info/">Connect 3 Chess</a>.</p>

<p>So I finally had both the motivation and the time to explore and experiment with a vibe-coding agent. I decided to start with Gemini CLI.</p>

<h2 id="setting-up-gemini-cli-on-wsl-ubuntu-2404">Setting up Gemini CLI on WSL Ubuntu 24.04</h2>

<p>I used this <a href="https://www.zdnet.com/article/geminis-command-line-tool-is-a-productivity-game-changer-and-its-free-how-i-use-it/">guide</a> as a reference.</p>

<h3 id="step-1-install-nodejs-and-npm">Step 1: Install Node.js and npm</h3>

<p>I used <code class="language-plaintext highlighter-rouge">nvm</code>
(see <a href="https://nodejs.org/en/download">here</a> for more details):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Download and install nvm:</span>
curl <span class="nt">-o-</span> https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash

<span class="c"># in lieu of restarting the shell</span>
<span class="se">\.</span> <span class="s2">"</span><span class="nv">$HOME</span><span class="s2">/.nvm/nvm.sh"</span>

<span class="c"># Download and install Node.js:</span>
nvm <span class="nb">install </span>24

<span class="c"># Verify the Node.js version:</span>
node <span class="nt">-v</span> <span class="c"># Should print "v24.12.0".</span>

<span class="c"># Verify npm version:</span>
npm <span class="nt">-v</span> <span class="c"># Should print "11.6.2".</span>
</code></pre></div></div>

<h3 id="step-2-install-and-run-gemini-cli">Step 2: Install and run Gemini CLI</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>npm <span class="nb">install</span> <span class="nt">-g</span> @google/gemini-cli
</code></pre></div></div>

<p>After installation, run the following command to start the Gemini CLI setup. One caution first: <strong>as far as I can tell, Gemini CLI reads the contents of whatever directory you launch it from, so make sure not to run it in a confidential or private path.</strong></p>
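
<p>To be on the safe side, I now start it from an empty scratch directory (the path below is just an example):</p>

<pre><code class="language-bash"># Create a clean sandbox directory so Gemini CLI has nothing sensitive to read
mkdir -p ~/scratch/gemini-sandbox
cd ~/scratch/gemini-sandbox
</code></pre>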

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gemini

 ███            █████████  ██████████ ██████   ██████ █████ ██████   █████ █████
░░░███         ███░░░░░███░░███░░░░░█░░██████ ██████ ░░███ ░░██████ ░░███ ░░███
  ░░░███      ███     ░░░  ░███  █ ░  ░███░█████░███  ░███  ░███░███ ░███  ░███
    ░░░███   ░███          ░██████    ░███░░███ ░███  ░███  ░███░░███░███  ░███
     ███░    ░███    █████ ░███░░█    ░███ ░░░  ░███  ░███  ░███ ░░██████  ░███
   ███░      ░░███  ░░███  ░███ ░   █ ░███      ░███  ░███  ░███  ░░█████  ░███
 ███░         ░░█████████  ██████████ █████     █████ █████ █████  ░░█████ █████
░░░            ░░░░░░░░░  ░░░░░░░░░░ ░░░░░     ░░░░░ ░░░░░ ░░░░░    ░░░░░ ░░░░░

Tips <span class="k">for </span>getting started:
1. Ask questions, edit files, or run commands.
2. Be specific <span class="k">for </span>the best results.
3. Create GEMINI.md files to customize your interactions with Gemini.
4. /help <span class="k">for </span>more information.
</code></pre></div></div>

<p>After authenticating, you can start using Gemini CLI. That’s it!</p>

<h2 id="lets-try-it-out">Let’s try it out</h2>

<p>I gave it a straightforward request.</p>

<pre><code class="language-txt">Create a desktop app for me to for the purpose of setting focused 
and DoNotDisturb timers, including breaks, work, etc. I'd like it
to have a small window, with minimal and beautiful colors. 
Whenever the timer ends, I want it to make a small bell-like 
notification. I want the focused time and break time to be 
customizable but also set as default with 3-4 presets. I don't 
want the app to stay on top to distract me. Build this app for 
running from a WSL Ubuntu terminal that launches a GUI. With Python.
</code></pre>

<p>Overall, after almost an hour, it did a pretty good job: after asking me to install several dependencies and fixing a few issues, it managed to run the app successfully. However, the bell notification did not work because of a WSL limitation: the Linux guest could not access the Windows audio hardware. Other than that, the app worked fine.
I also asked for some improvements and features, like repetitions and progress tracking, and it implemented them pretty well.</p>

<p>For my record, I had to do the following to make the app work:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt <span class="nb">install </span>python3.12-venv
pyenv virtualenv pomodoro
pyenv activate pomodoro
<span class="nb">sudo </span>apt <span class="nb">install </span>python3-tk
pip3 <span class="nb">install </span><span class="nv">playsound</span><span class="o">==</span>1.2.2
<span class="nb">sudo </span>apt-get <span class="nb">install </span>libgirepository1.0-dev libgirepository2.0-dev
<span class="nb">sudo </span>apt <span class="nb">install </span>libcairo2-dev pkg-config python3-dev
pip <span class="nb">install </span>PyGObject
</code></pre></div></div>

<p>I also installed <code class="language-plaintext highlighter-rouge">aplay</code> to test sound playback from WSL:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt <span class="nb">install </span>alsa-utils
aplay /path/to/sample.wav
</code></pre></div></div>

<p>That didn’t work either.</p>
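
<p>One workaround I have come across (a sketch I haven’t verified; the .wav path is a stock Windows sound) is to hand playback over to Windows itself through WSL interop:</p>

<pre><code class="language-bash"># Let Windows play the sound instead of the WSL guest
powershell.exe -c '(New-Object Media.SoundPlayer "C:\Windows\Media\notify.wav").PlaySync()'
</code></pre>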

<p>Then I thought: why not run it from Windows, with a clickable icon? So I asked Gemini to generate one, and it did a great job again. It created a <code class="language-plaintext highlighter-rouge">.bat</code> and a <code class="language-plaintext highlighter-rouge">.vbs</code> file to launch the app without a terminal window. I also tweaked the color scheme a bit to match my Windows dark/green theme.</p>

<p>Here is how the app looks:</p>

<p><img src="https://amirsojoodi.github.io/files/Posts/VibeCoding/2025-12-29-Pomodoro.png" alt="Pomodoro App" /></p>

<p>The code generated by Gemini CLI can be found <a href="https://pastebin.com/tA5wrw2c">here</a>.</p>

<p>See this summary of the session:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;</span> /quit
╭─────────────────────────────────────────────────────────────────────────────────╮
│                                                                                 │
│  Agent powering down. Goodbye!                                                  │
│                                                                                 │
│  Interaction Summary                                                            │
│  Session ID:                 erased                                             │
│  Tool Calls:                 9 <span class="o">(</span> ✓ 8 x 1 <span class="o">)</span>                                      │
│  Success Rate:               88.9%                                              │
│  User Agreement:             88.9% <span class="o">(</span>9 reviewed<span class="o">)</span>                                 │
│  Code Changes:               +936 <span class="nt">-50</span>                                           │
│                                                                                 │
│  Performance                                                                    │
│  Wall Time:                  9h 58m 3s                                          │
│  Agent Active:               8m 21s                                             │
│    » API Time:               5m 41s <span class="o">(</span>68.2%<span class="o">)</span>                                     │
│    » Tool Time:              2m 39s <span class="o">(</span>31.8%<span class="o">)</span>                                     │
│                                                                                 │
│                                                                                 │
│  Model Usage                 Reqs   Input Tokens   Cache Reads  Output Tokens   │
│  ────────────────────────────────────────────────────────────────────────────   │
│  gemini-2.5-flash-lite         16         58,787             0          1,734   │
│  gemini-2.5-pro                22        192,728       211,514         19,758   │
│  gemini-2.5-flash               3         37,601        11,406            562   │
│                                                                                 │
│  Savings Highlight: 222,920 <span class="o">(</span>43.5%<span class="o">)</span> of input tokens were served from the cache, │
│  reducing costs.                                                                │
╰─────────────────────────────────────────────────────────────────────────────────╯
</code></pre></div></div>

<p>That’s it for now. I’m excited to do more with these agents! Let’s see how it goes.</p>]]></content><author><name>AmirHossein Sojoodi</name><email>amir.sojoodi@gmail.com</email></author><category term="Programming" /><category term="VibeCoding" /><category term="Agents" /><category term="Ubuntu" /><category term="WSL" /><summary type="html"><![CDATA[Over the past few years, I have been under a heavy workload of studies and work, and I couldn’t really explore the new AI tools coming out these days. Today, I was searching for a pomodoro timer app to help me focus on my work, and I thought: why not build one myself? I think it was Afshin who influenced me. Check out his game (brilliant idea and vibecoded!): Connect 3 Chess.]]></summary></entry><entry><title type="html">Migrate from Ubuntu 20.04 in WSL to 24.04 LTS</title><link href="https://amirsojoodi.github.io/posts/Migrate-from-Ubuntu-20-04-to-24-04-LTS" rel="alternate" type="text/html" title="Migrate from Ubuntu 20.04 in WSL to 24.04 LTS" /><published>2025-02-12T00:00:00-05:00</published><updated>2025-02-12T00:00:00-05:00</updated><id>https://amirsojoodi.github.io/posts/WSL</id><content type="html" xml:base="https://amirsojoodi.github.io/posts/Migrate-from-Ubuntu-20-04-to-24-04-LTS"><![CDATA[<p>I have been using Ubuntu 20.04 in WSL for a while. Recently, I decided to upgrade it to the latest LTS version, 24.04. There were two options: a fresh installation, or an in-place upgrade using the <code class="language-plaintext highlighter-rouge">do-release-upgrade</code> command. Although I first went with the messy in-place upgrade, I later changed my mind and opted for a fresh installation, especially because I would have had to do two major version upgrades (20.04 -&gt; 22.04 -&gt; 24.04). Anyway, here are the steps I followed.
I am writing this post mainly for future reference, in case I need to reinstall all these packages again.</p>

<h2 id="backup">Backup</h2>

<p>First things first.</p>

<p>Either back up your entire WSL instance using:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wsl <span class="nt">--export</span> Ubuntu-20.04 ubuntu-20.04-backup.tar
</code></pre></div></div>

<p>Or just back up your home directory:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> ~
<span class="nb">tar</span> <span class="nt">-czvf</span> ubuntu-home-backup.tar.gz <span class="nv">$HOME</span>
<span class="nb">mv </span>ubuntu-home-backup.tar.gz /mnt/c/Users/YourWindowsUsername/some/dir/
</code></pre></div></div>

<h2 id="fresh-installation-of-ubuntu-2404-lts-in-wsl">Fresh Installation of Ubuntu 24.04 LTS in WSL</h2>

<p>Install Ubuntu 24.04 LTS from the Microsoft Store, then open the installed instance once to finish the installation.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat</span> /etc/lsb-release
<span class="nv">DISTRIB_ID</span><span class="o">=</span>Ubuntu
<span class="nv">DISTRIB_RELEASE</span><span class="o">=</span>24.04
<span class="nv">DISTRIB_CODENAME</span><span class="o">=</span>noble
<span class="nv">DISTRIB_DESCRIPTION</span><span class="o">=</span><span class="s2">"Ubuntu 24.04.3 LTS"</span>
</code></pre></div></div>

<p>After installation, check the installed distros in PowerShell:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wsl <span class="nt">--list</span> <span class="nt">--verbose</span>
</code></pre></div></div>

<p>Log in to the new Ubuntu 24.04 LTS instance and restore your home directory backup if you made one:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> ~
<span class="nb">mv</span> /mnt/c/Users/YourWindowsUsername/some/dir/ubuntu-home-backup.tar.gz <span class="nb">.</span>
<span class="nb">tar</span> <span class="nt">-xzvf</span> ubuntu-home-backup.tar.gz
</code></pre></div></div>

<p>Check that everything is in place, especially hidden files like .bashrc, .vimrc, and the .ssh directory.</p>

<h2 id="installing-packages">Installing Packages</h2>

<p>Now, reinstall the packages you need.</p>
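
<p>If you remembered to capture a package list on the old instance before unregistering it, you can bulk-reinstall instead of going from memory (a sketch; <code class="language-plaintext highlighter-rouge">apt-mark showmanual</code> lists the manually installed packages):</p>

<pre><code class="language-bash"># On the old instance: save the names of manually installed packages
apt-mark showmanual &gt; manual-packages.txt
# On the new instance: reinstall them in one go
sudo apt update
xargs -a manual-packages.txt sudo apt install -y
</code></pre>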

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt update <span class="o">&amp;&amp;</span> <span class="nb">sudo </span>apt upgrade <span class="nt">-y</span>
<span class="nb">sudo </span>apt <span class="nb">install </span>build-essential cmake git vim wget curl htop net-tools unzip zip <span class="nt">-y</span>
<span class="c"># some of my frequently used packages:</span>
<span class="nb">sudo </span>apt <span class="nb">install </span>ninja-build gdb valgrind <span class="nt">-y</span>
<span class="nb">sudo </span>apt <span class="nb">install </span>aspell tldr colordiff tree lolcat neofetch fastfetch <span class="nt">-y</span>
<span class="nb">sudo </span>apt <span class="nb">install </span>zenity evince <span class="nt">-y</span>
</code></pre></div></div>

<h2 id="other-configurations">Other Configurations</h2>

<h3 id="cuda">CUDA</h3>

<p>If you need CUDA:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
<span class="nb">sudo </span>dpkg <span class="nt">-i</span> cuda-keyring_1.1-1_all.deb
<span class="nb">sudo </span>apt update
<span class="nb">sudo </span>apt <span class="nb">install </span>cuda-toolkit
<span class="c"># If you need the drivers too</span>
<span class="nb">sudo </span>apt <span class="nb">install </span>cuda-drivers
</code></pre></div></div>

<h3 id="latex">LaTeX</h3>

<p>I use texlive for LaTeX documents. To install it:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt <span class="nb">install </span>texlive-latex-recommended <span class="nt">-y</span>
<span class="c"># or </span>
<span class="nb">sudo </span>apt <span class="nb">install </span>texlive-latex-extra <span class="nt">-y</span>
<span class="c"># For full installation (7 GB+)</span>
<span class="nb">sudo </span>apt <span class="nb">install </span>texlive-full <span class="nt">-y</span>
</code></pre></div></div>

<h3 id="pyenv">Pyenv</h3>

<p>Install pyenv for managing python versions:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl https://pyenv.run | bash
</code></pre></div></div>

<h3 id="llvm">LLVM</h3>

<p>For more info, visit <a href="https://apt.llvm.org/">here</a>.</p>

<p>To install LLVM from the official apt repository, first add the repository key and source:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wget <span class="nt">-O</span> - https://apt.llvm.org/llvm-snapshot.gpg.key | <span class="nb">sudo </span>apt-key add -
<span class="nb">sudo </span>apt-add-repository <span class="s2">"deb http://apt.llvm.org/noble/ llvm-toolchain-noble main"</span>
</code></pre></div></div>

<p>Add the following lines to /etc/apt/sources.list.d/llvm-toolchain-noble.list if not already added:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>deb http://apt.llvm.org/noble/ llvm-toolchain-noble main
deb-src http://apt.llvm.org/noble/ llvm-toolchain-noble main
<span class="c"># 20</span>
deb http://apt.llvm.org/noble/ llvm-toolchain-noble-20 main
deb-src http://apt.llvm.org/noble/ llvm-toolchain-noble-20 main
<span class="c"># 21</span>
deb http://apt.llvm.org/noble/ llvm-toolchain-noble-21 main
deb-src http://apt.llvm.org/noble/ llvm-toolchain-noble-21 main
</code></pre></div></div>

<p>Then install LLVM 20:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt update
<span class="nb">sudo </span>apt <span class="nb">install </span>clang-20 clang-tools-20 clang-20-doc libclang-common-20-dev <span class="se">\</span>
  libclang-20-dev libclang1-20 clang-format-20 python3-clang-20 clangd-20 clang-tidy-20 <span class="se">\</span>
  libllvm-20-ocaml-dev libllvm20 llvm-20 llvm-20-dev llvm-20-doc llvm-20-examples <span class="se">\</span>
  llvm-20-runtime libomp-20-dev lld-20
</code></pre></div></div>

<h3 id="fastfetch">fastfetch</h3>

<p>At the time of writing, fastfetch is not available in the Ubuntu 24.04 repositories, and some of its docs are inconsistent. Here is what worked for me:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt <span class="nb">install </span>git cmake build-essential libpci-dev libvulkan-dev libwayland-dev libxrandr-dev libxcb-randr0-dev libdconf-dev
git clone https://github.com/LinusDierheimer/fastfetch.git
<span class="nb">cd </span>fastfetch
<span class="c"># I changed the branch to master</span>
git checkout master
<span class="nb">mkdir </span>build <span class="o">&amp;&amp;</span> <span class="nb">cd </span>build
cmake ..
cmake <span class="nt">--build</span> <span class="nb">.</span> 
<span class="nb">sudo </span>cmake <span class="nt">--install</span> <span class="nb">.</span>
<span class="c"># Now you can run fastfetch by just typing:</span>
fastfetch
<span class="c"># or </span>
flashfetch
</code></pre></div></div>

<p>Bonus tip: to make <code class="language-plaintext highlighter-rouge">ll</code> (<code class="language-plaintext highlighter-rouge">ls -alF</code>) show the year in the timestamp:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">alias </span><span class="nv">ll</span><span class="o">=</span><span class="s1">'ls -alF --time-style=long-iso'</span>
</code></pre></div></div>

<h2 id="post-migration">Post-migration</h2>

<p>After everything is done, you can remove the old WSL instance if you made a full backup:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wsl <span class="nt">--unregister</span> Ubuntu-20.04
</code></pre></div></div>

<p>Here are some tips on removing old packages and cleaning up to make the backup smaller:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt autoremove <span class="nt">--purge</span>
<span class="nb">sudo </span>apt autoclean

<span class="c"># Check which packages are large and remove if not needed </span>
<span class="c"># This command shows the 30 largest installed packages</span>
dpkg-query <span class="nt">-Wf</span><span class="o">=</span><span class="s1">'${Installed-Size}\t${Package}\n'</span> <span class="se">\</span>
  | <span class="nb">sort</span> <span class="nt">-n</span> | <span class="nb">tail</span> <span class="nt">-n</span> 30 <span class="se">\</span>
  | <span class="nb">awk</span> <span class="s1">'{printf "%10.2f MB %s\n", $1/1024, $2}'</span>

<span class="c"># You can search for a group of packages like this:</span>
dpkg-query <span class="nt">-Wf</span><span class="o">=</span><span class="s1">'${Installed-Size}\t${Package}\n'</span> | <span class="nb">grep </span>cuda

<span class="c"># You can also add the sizes of the packages from previous command to get total size</span>
dpkg-query <span class="nt">-Wf</span><span class="o">=</span><span class="s1">'${Installed-Size}\t${Package}\n'</span> | <span class="nb">grep </span>cuda <span class="se">\</span>
  | <span class="nb">awk</span> <span class="s1">'{sum += $1} END {printf "Total size: %.2f MB\n", sum/1024}'</span>
</code></pre></div></div>

<p>To remove old log files:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>journalctl <span class="nt">--vacuum-time</span><span class="o">=</span>7d
<span class="nb">sudo truncate</span> <span class="nt">-s</span> 0 /var/log/syslog
<span class="nb">sudo truncate</span> <span class="nt">-s</span> 0 /var/log/auth.log
<span class="nb">sudo rm</span> <span class="nt">-f</span> /var/log/<span class="k">*</span>.gz
<span class="nb">sudo rm</span> <span class="nt">-f</span> /var/log/<span class="k">*</span>.[0-9]
<span class="nb">sudo rm</span> <span class="nt">-f</span> /var/log/<span class="k">*</span>.[0-9].gz
<span class="nb">sudo rm</span> <span class="nt">-f</span> /var/log/Xorg.pid-<span class="k">*</span>.log
</code></pre></div></div>

<h3 id="compact-in-wsl">Compact in WSL</h3>

<p>To compact the WSL virtual disk after removing old files and packages, first exit all WSL instances and then run (in powershell with admin rights):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wsl <span class="nt">--terminate</span> Ubuntu-20.04
<span class="c"># or </span>
wsl <span class="nt">--shutdown</span>
diskpart 
<span class="c"># in diskpart prompt:</span>
<span class="k">select </span>vdisk <span class="nv">file</span><span class="o">=</span><span class="s2">"Path</span><span class="se">\t</span><span class="s2">o</span><span class="se">\v</span><span class="s2">disk</span><span class="se">\f</span><span class="s2">ile"</span>
<span class="c"># For me it is:</span>
<span class="k">select </span>vdisk <span class="nv">file</span><span class="o">=</span><span class="s2">"C:</span><span class="se">\U</span><span class="s2">sers</span><span class="se">\a</span><span class="s2">mirs</span><span class="se">\A</span><span class="s2">ppData</span><span class="se">\L</span><span class="s2">ocal</span><span class="se">\P</span><span class="s2">ackages</span><span class="se">\C</span><span class="s2">anonicalGroupLimited.Ubuntu20.04LTS_79rhkp1fndgsc</span><span class="se">\L</span><span class="s2">ocalState</span><span class="se">\e</span><span class="s2">xt4.vhdx"</span>
compact vdisk file
</code></pre></div></div>

<p>I also had several versions of CUDA, LLVM, ROCm, HIP, and other packages installed that I no longer needed. With these steps, I was able to reduce the storage footprint from around 250 GB to about 50 GB.</p>

<p>For now, I haven’t encountered any compatibility issues with Ubuntu 24.04 LTS in WSL. Everything seems to be working fine so far. If some issues arise, I will update this post accordingly.</p>]]></content><author><name>AmirHossein Sojoodi</name><email>amir.sojoodi@gmail.com</email></author><category term="Linux" /><category term="Ubuntu" /><category term="WSL" /><category term="Windows" /><category term="Tips" /><summary type="html"><![CDATA[I have been using Ubuntu 20.04 in WSL for a while. Recently, I decided to upgrade it to the latest LTS version, 24.04. There were two options: a fresh installation, or an in-place upgrade using the do-release-upgrade command. Although I first went with the messy in-place upgrade, I later changed my mind and opted for a fresh installation, especially because I would have had to do two major version upgrades (20.04 -&gt; 22.04 -&gt; 24.04). Anyway, here are the steps I followed. I am writing this post mainly for future reference, in case I need to reinstall all these packages again.]]></summary></entry><entry><title type="html">Some Tips to Release Binaries</title><link href="https://amirsojoodi.github.io/posts/Release-Binaries" rel="alternate" type="text/html" title="Some Tips to Release Binaries" /><published>2025-01-20T00:00:00-05:00</published><updated>2025-01-20T00:00:00-05:00</updated><id>https://amirsojoodi.github.io/posts/Releases</id><content type="html" xml:base="https://amirsojoodi.github.io/posts/Release-Binaries"><![CDATA[<p>There are several things to consider when releasing a binary. Here are some tips that I found useful.</p>

<h2 id="best-practices">Best Practices</h2>

<ol>
  <li>Releases are tied to <strong>tags</strong>, not branches.</li>
  <li>Versioning - Use <a href="https://semver.org/">Semantic Versioning</a>!</li>
  <li>Security - Sign the binaries and verify the signatures.</li>
  <li>Changelog - Keep a <a href="https://keepachangelog.com/en/1.0.0/">CHANGELOG.md</a> file.</li>
  <li>License - Include a <a href="https://choosealicense.com/">LICENSE</a> file.</li>
  <li>Documentation - Include a README.md that explains how to install and use the binary.</li>
  <li>CI/CD - Automate the build, test, and release process.</li>
</ol>
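
<p>For instance, a release is typically cut by pushing an annotated tag that follows Semantic Versioning (the version number here is illustrative):</p>

<pre><code class="language-bash"># Tag the release commit and push the tag to trigger the release pipeline
git tag -a v1.2.3 -m "Release v1.2.3"
git push origin v1.2.3
</code></pre>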

<h2 id="consider-signing-the-binary">Consider Signing the Binary</h2>

<p>Signing the binary is a good practice to ensure the integrity and authenticity of the binary. You can use GPG on Linux:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># If you don't already have a GPG key, generate one:</span>
<span class="nv">$ </span>gpg <span class="nt">--full-generate-key</span>
<span class="c"># Choose RSA and a key size (e.g., 4096).</span>
<span class="c"># Set an expiration date (or none for permanent).</span>
<span class="c"># Enter your name and email (match your GitHub email).</span>
</code></pre></div></div>

<p>Then, export the public key:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gpg <span class="nt">--armor</span> <span class="nt">--export</span> YOUR_EMAIL <span class="o">&gt;</span> public.key
<span class="c"># This will create a file named public.key.</span>
</code></pre></div></div>

<p>After that, sign the binary:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gpg <span class="nt">--detach-sign</span> <span class="nt">--armor</span> app-linux-x86_64.tar.gz
<span class="c"># a file named app-linux-x86_64.tar.gz.asc will be created.</span>

gpg <span class="nt">--detach-sign</span> <span class="nt">--armor</span> app-windows-x64.zip
<span class="c"># a file named app-windows-x64.zip.asc will be created.</span>
</code></pre></div></div>

<p>Then, you can upload the binaries and their signatures to GitHub releases.</p>

<ul>
  <li>app-linux-x86_64.tar.gz</li>
  <li>app-linux-x86_64.tar.gz.asc</li>
  <li>app-windows-x64.zip</li>
  <li>app-windows-x64.zip.asc</li>
</ul>

<p>Then, users can verify the binaries using the public key:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gpg <span class="nt">--import</span> public.key
<span class="nv">$ </span>gpg <span class="nt">--verify</span> app-linux-x86_64.tar.gz.asc app-linux-x86_64.tar.gz
<span class="nv">$ </span>gpg <span class="nt">--verify</span> app-windows-x64.zip.asc app-windows-x64.zip
<span class="c"># output should be something like: "Good signature from YOUR_NAME &lt;YOUR_EMAIL&gt;"</span>
</code></pre></div></div>

<h2 id="automating-the-release-process-with-github-actions">Automating the Release Process with GitHub Actions</h2>

<p>Add a GitHub Actions workflow (<code class="language-plaintext highlighter-rouge">.github/workflows/release.yml</code>):</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">name</span><span class="pi">:</span> <span class="s">Release Binaries</span>

<span class="na">on</span><span class="pi">:</span>
  <span class="na">push</span><span class="pi">:</span>
    <span class="na">tags</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s1">'</span><span class="s">v*'</span>  <span class="c1"># Triggers on versioned tags like v1.0.0</span>

<span class="na">jobs</span><span class="pi">:</span>
  <span class="na">build</span><span class="pi">:</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">${{ matrix.os }}</span>
    <span class="na">strategy</span><span class="pi">:</span>
      <span class="na">matrix</span><span class="pi">:</span>
        <span class="na">os</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">ubuntu-latest</span><span class="pi">,</span> <span class="nv">windows-latest</span><span class="pi">]</span>

    <span class="na">steps</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Checkout code</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v3</span>

      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Set up build environment</span>
        <span class="na">run</span><span class="pi">:</span> <span class="s">echo "Setting up..."</span>

      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Build</span>
        <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
          <span class="s"># Replace with your build commands</span>
          <span class="s">echo "Building binary for ${{ matrix.os }}"</span>

      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Upload binaries</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/upload-artifact@v3</span>
        <span class="na">with</span><span class="pi">:</span>
          <span class="na">name</span><span class="pi">:</span> <span class="s">app-${{ matrix.os }}</span>
          <span class="na">path</span><span class="pi">:</span> <span class="s">path/to/binary</span>

  <span class="na">release</span><span class="pi">:</span>
    <span class="na">needs</span><span class="pi">:</span> <span class="s">build</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span>
    <span class="na">steps</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Download artifacts</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/download-artifact@v3</span>
        <span class="na">with</span><span class="pi">:</span>
          <span class="na">path</span><span class="pi">:</span> <span class="s">./artifacts</span>

      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Create GitHub Release</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">softprops/action-gh-release@v2</span>
        <span class="na">with</span><span class="pi">:</span>
          <span class="na">files</span><span class="pi">:</span> <span class="s">./artifacts/**</span>
          <span class="na">tag_name</span><span class="pi">:</span> <span class="s">${{ github.ref_name }}</span>
</code></pre></div></div>

<h3 id="automating-the-signing-in-github-actions">Automating the Signing in GitHub Actions</h3>

<p>You can automate the signing process in GitHub Actions, assuming the exported private key is stored as a repository secret. Here are the relevant steps of an example workflow:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Import GPG Key</span>
  <span class="na">env</span><span class="pi">:</span>
    <span class="c1"># Exported private key stored as a repository secret</span>
    <span class="na">GPG_PRIVATE_KEY</span><span class="pi">:</span> <span class="s">${{ secrets.GPG_PRIVATE_KEY }}</span>
  <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
    <span class="s">echo "$GPG_PRIVATE_KEY" | gpg --batch --import</span>
    <span class="s">echo "trusted-key $(gpg --list-keys --with-colons | grep pub | cut -d: -f5)" &gt;&gt; ~/.gnupg/gpg.conf</span>

<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Sign Binary</span>
  <span class="na">run</span><span class="pi">:</span> <span class="s">gpg --detach-sign --armor myapp-linux-x86_64.tar.gz</span>
</code></pre></div></div>

<h3 id="alternative-to-gpg">Alternative to GPG</h3>

<p>If you’re distributing via containers or want an alternative to GPG, you can use <a href="https://github.com/sigstore/cosign">Sigstore Cosign</a>.</p>

<h3 id="code-signing-for-windows">Code Signing for Windows</h3>

<p>I haven’t done this myself, but you can sign the Windows binaries using <a href="https://docs.microsoft.com/en-us/windows/win32/seccrypto/signtool">SignTool</a>.</p>]]></content><author><name>AmirHossein Sojoodi</name><email>amir.sojoodi@gmail.com</email></author><category term="GPG" /><category term="Linux" /><category term="Release" /><category term="Tips" /><summary type="html"><![CDATA[There are several things to consider when releasing a binary. Here are some tips that I found useful.]]></summary></entry><entry><title type="html">Utilize PTX Just-In-Time (JIT) Compilation in CUDA</title><link href="https://amirsojoodi.github.io/posts/JIT-PTX/" rel="alternate" type="text/html" title="Utilize PTX Just-In-Time (JIT) Compilation in CUDA" /><published>2024-10-22T00:00:00-04:00</published><updated>2024-10-22T00:00:00-04:00</updated><id>https://amirsojoodi.github.io/posts/JIT-PTX</id><content type="html" xml:base="https://amirsojoodi.github.io/posts/JIT-PTX/"><![CDATA[<p>In this post I’ve written about how to utilize PTX Just-In-Time (JIT) compilation in CUDA. PTX is a low-level assembly-like language that is used to represent the GPU code. The PTX code is then compiled to the machine code by the NVIDIA driver at runtime. This process is called Just-In-Time (JIT) compilation. But before I write about how to use PTX JIT compilation, I’ll provide some background on why you might want to use it.</p>

<h2 id="background-scenario">Background Scenario</h2>

<p>In this scenario, you may want to load a CUDA kernel at runtime as a CUDA module, extract a CUDA function from it, and then query information about it, such as the number of registers used per thread, the shared memory usage, and so on. Or you may want to intelligently select the number of blocks and threads to optimize SM occupancy, which is necessary for valid inter-block synchronization via cooperative groups.</p>

<p>Furthermore, if the kernel is defined in a separate <code class="language-plaintext highlighter-rouge">.cu</code> file and you are querying this information from C or C++ code, it makes more sense to compile the kernel separately to a PTX or FATBIN file and load it at runtime.</p>

<h2 id="loading-the-cuda-module">Loading the CUDA module</h2>

<p>To create a PTX file or a FATBIN file, you can use the following command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create a PTX file</span>
nvcc <span class="nt">-ptx</span> <span class="nt">-o</span> kernel.ptx kernel.cu
<span class="c"># Create a FATBIN file</span>
nvcc <span class="nt">-fatbin</span> <span class="nt">-o</span> kernel.fatbin kernel.cu
</code></pre></div></div>
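For reference, the kernel source being compiled above might look like the following (a hypothetical example; your kernel will differ). The `extern "C"` matters: it keeps the symbol name unmangled, so the function can later be looked up under the plain name `"kernel"`:

```cpp
// kernel.cu -- compiled to PTX/FATBIN with the nvcc commands above.
// extern "C" prevents C++ name mangling, so
// cuModuleGetFunction(&function, module, "kernel") can find it.
extern "C" __global__ void kernel(float *data) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  data[i] = 2.0f * data[i];
}
```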

<p>Then you can load the CUDA module at runtime with something like this (CUDA Driver API):</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;cuda.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;iostream&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
  
  <span class="n">cuInit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>

  <span class="c1">// select the first device</span>
  <span class="n">CUdevice</span> <span class="n">device</span><span class="p">;</span>
  <span class="n">cuDeviceGet</span><span class="p">(</span><span class="o">&amp;</span><span class="n">device</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>

  <span class="n">CUcontext</span> <span class="n">context</span><span class="p">;</span>
  <span class="n">cuCtxCreate</span><span class="p">(</span><span class="o">&amp;</span><span class="n">context</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">device</span><span class="p">);</span>

  <span class="n">CUmodule</span> <span class="n">module</span><span class="p">;</span>
  <span class="n">CUfunction</span> <span class="n">function</span><span class="p">;</span>
  <span class="n">CUresult</span> <span class="n">result</span><span class="p">;</span>

  <span class="n">result</span> <span class="o">=</span> <span class="n">cuModuleLoad</span><span class="p">(</span><span class="o">&amp;</span><span class="n">module</span><span class="p">,</span> <span class="s">"kernel.ptx"</span><span class="p">);</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o">!=</span> <span class="n">CUDA_SUCCESS</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o">&lt;&lt;</span> <span class="s">"Failed to load the module."</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="n">result</span> <span class="o">=</span> <span class="n">cuModuleGetFunction</span><span class="p">(</span><span class="o">&amp;</span><span class="n">function</span><span class="p">,</span> <span class="n">module</span><span class="p">,</span> <span class="s">"kernel"</span><span class="p">);</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o">!=</span> <span class="n">CUDA_SUCCESS</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o">&lt;&lt;</span> <span class="s">"Failed to get the function."</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="c1">// Do something with the function</span>
  <span class="c1">// ...</span>
  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Then build the main source with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nvcc <span class="nt">-o</span> main main.cu <span class="nt">-lcuda</span>
</code></pre></div></div>
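Once `function` is loaded, the queries mentioned in the background section look roughly like this. It is a sketch continuing `main` from the example above (Driver API calls only; error checking omitted, and it needs a GPU to run):

```cpp
// Continues main() above: `function` is the loaded CUfunction.

// Per-thread register count and static shared memory of the kernel.
int numRegs = 0, staticSmem = 0;
cuFuncGetAttribute(&numRegs, CU_FUNC_ATTRIBUTE_NUM_REGS, function);
cuFuncGetAttribute(&staticSmem, CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES, function);
std::cout << "Registers/thread: " << numRegs
          << ", static shared memory: " << staticSmem << " bytes\n";

// How many blocks of 256 threads can be resident per SM -- the number
// you need to size a grid for valid cooperative-groups grid sync.
int blocksPerSM = 0;
cuOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, function,
                                            /*blockSize=*/256,
                                            /*dynamicSMemSize=*/0);
std::cout << "Max active blocks per SM at 256 threads: " << blocksPerSM << "\n";
```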

<h3 id="thoughts-and-improvements">Thoughts and Improvements</h3>

<p>All in all, it’s a straightforward process. However, there are some issues:</p>

<ul>
  <li>The build process is a bit complicated, especially if you are using CMake.</li>
  <li>The FATBIN/PTX file must be found at the expected path at runtime, which is not always convenient.</li>
</ul>

<p>So, why not store the PTX code directly in the source file itself? This way, you can avoid the extra build step and the file management.</p>

<p>Just a reminder: <strong>the JIT process doesn’t accept actual CUDA code, only PTX code</strong>. So, you need to convert the CUDA code to PTX first. (It took me two hours to figure this out!) Here is how you can do it:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;cuda.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;iostream&gt;</span><span class="cp">
</span>
<span class="c1">// PTX code generated by:</span>
<span class="c1">// nvcc -ptx -o kernel.ptx kernel.cu</span>
<span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">kernel</span> <span class="o">=</span> <span class="s">R"(
  .version 6.5
  .target sm_70
  .address_size 64

  .visible .entry kernel(
    .param .u64 kernel_param_0
  )
  {
    // Kernel code here
  }
)"</span><span class="p">;</span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
  
  <span class="n">cuInit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>

  <span class="c1">// select the first device</span>
  <span class="n">CUdevice</span> <span class="n">device</span><span class="p">;</span>
  <span class="n">cuDeviceGet</span><span class="p">(</span><span class="o">&amp;</span><span class="n">device</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>

  <span class="n">CUcontext</span> <span class="n">context</span><span class="p">;</span>
  <span class="n">cuCtxCreate</span><span class="p">(</span><span class="o">&amp;</span><span class="n">context</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">device</span><span class="p">);</span>

  <span class="n">CUmodule</span> <span class="n">module</span><span class="p">;</span>
  <span class="n">CUfunction</span> <span class="n">function</span><span class="p">;</span>
  <span class="n">CUresult</span> <span class="n">result</span><span class="p">;</span>

  <span class="n">result</span> <span class="o">=</span> <span class="n">cuModuleLoadData</span><span class="p">(</span><span class="o">&amp;</span><span class="n">module</span><span class="p">,</span> <span class="n">kernel</span><span class="p">);</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o">!=</span> <span class="n">CUDA_SUCCESS</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o">&lt;&lt;</span> <span class="s">"Failed to load the module."</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="n">result</span> <span class="o">=</span> <span class="n">cuModuleGetFunction</span><span class="p">(</span><span class="o">&amp;</span><span class="n">function</span><span class="p">,</span> <span class="n">module</span><span class="p">,</span> <span class="s">"kernel"</span><span class="p">);</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o">!=</span> <span class="n">CUDA_SUCCESS</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o">&lt;&lt;</span> <span class="s">"Failed to get the function."</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="c1">// Do something with the function</span>
  <span class="c1">// ...</span>
  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="add-error-management">Add Error Management</h3>

<p>Sometimes the PTX code may contain errors, or the loading process may fail for some other reason. To handle this, you can pass <code class="language-plaintext highlighter-rouge">CUjit_option</code> values to get more information about the error. Here is an example:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;cuda.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;iostream&gt;</span><span class="cp">
</span>
<span class="c1">// PTX code generated by:</span>
<span class="c1">// nvcc -ptx -o kernel.ptx kernel.cu</span>
<span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">kernel</span> <span class="o">=</span> <span class="s">R"(
  .version 6.5
  .target sm_70
  .address_size 64

  .visible .entry kernel(
    .param .u64 kernel_param_0
  )
  {
    // Kernel code here
  }
)"</span><span class="p">;</span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>

  <span class="n">cuInit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>

  <span class="c1">// select the first device</span>
  <span class="n">CUdevice</span> <span class="n">device</span><span class="p">;</span>
  <span class="n">cuDeviceGet</span><span class="p">(</span><span class="o">&amp;</span><span class="n">device</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>

  <span class="n">CUcontext</span> <span class="n">context</span><span class="p">;</span>
  <span class="n">cuCtxCreate</span><span class="p">(</span><span class="o">&amp;</span><span class="n">context</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">device</span><span class="p">);</span>

  <span class="n">CUmodule</span> <span class="n">module</span><span class="p">;</span>
  <span class="n">CUfunction</span> <span class="n">function</span><span class="p">;</span>
  <span class="n">CUresult</span> <span class="n">result</span><span class="p">;</span>

  <span class="kt">int</span> <span class="n">logBufferSize</span> <span class="o">=</span> <span class="mi">1024</span><span class="p">;</span>
  <span class="kt">char</span> <span class="n">infoLogBuffer</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
  <span class="kt">char</span> <span class="n">errorLogBuffer</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
  <span class="n">CUjit_option</span> <span class="n">options</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="n">CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES</span><span class="p">,</span> <span class="n">CU_JIT_INFO_LOG_BUFFER</span><span class="p">,</span>
                            <span class="n">CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES</span><span class="p">,</span> <span class="n">CU_JIT_ERROR_LOG_BUFFER</span><span class="p">};</span>
  <span class="kt">void</span><span class="o">*</span> <span class="n">optionValues</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{(</span><span class="kt">void</span><span class="o">*</span><span class="p">)(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">logBufferSize</span><span class="p">,</span> <span class="n">infoLogBuffer</span><span class="p">,</span>
                          <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">logBufferSize</span><span class="p">,</span> <span class="n">errorLogBuffer</span><span class="p">};</span> 

  <span class="n">result</span> <span class="o">=</span> <span class="n">cuModuleLoadDataEx</span><span class="p">(</span><span class="o">&amp;</span><span class="n">module</span><span class="p">,</span> <span class="n">kernel</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="n">options</span><span class="p">,</span> <span class="n">optionValues</span><span class="p">);</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o">!=</span> <span class="n">CUDA_SUCCESS</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o">&lt;&lt;</span> <span class="s">"Failed to load the module."</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
    <span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o">&lt;&lt;</span> <span class="s">"CUDA Driver API error = "</span> <span class="o">&lt;&lt;</span> <span class="n">result</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
    <span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o">&lt;&lt;</span> <span class="s">"Info Log: "</span> <span class="o">&lt;&lt;</span> <span class="n">infoLogBuffer</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
    <span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o">&lt;&lt;</span> <span class="s">"Error Log: "</span> <span class="o">&lt;&lt;</span> <span class="n">errorLogBuffer</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="n">result</span> <span class="o">=</span> <span class="n">cuModuleGetFunction</span><span class="p">(</span><span class="o">&amp;</span><span class="n">function</span><span class="p">,</span> <span class="n">module</span><span class="p">,</span> <span class="s">"kernel"</span><span class="p">);</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o">!=</span> <span class="n">CUDA_SUCCESS</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o">&lt;&lt;</span> <span class="s">"Failed to get the function."</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="c1">// Do something with the function</span>
  <span class="c1">// ...</span>
  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>One last suggestion:
If your kernel is simple and doesn’t require any architecture-specific optimization, you can generate the PTX for an old architecture like <code class="language-plaintext highlighter-rouge">sm_50</code>, so that it can run on almost any GPU.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nvcc <span class="nt">-ptx</span> <span class="nt">-o</span> kernel.ptx <span class="nt">-arch</span><span class="o">=</span>sm_50 kernel.cu
</code></pre></div></div>

<p>P.S. Pay attention to the entry <code class="language-plaintext highlighter-rouge">.version 6.5</code> in the PTX code. If your target system’s PTX assembler is old, you’ll get a runtime error. You may want to edit that field manually, as I didn’t find a way to set it automatically.</p>]]></content><author><name>AmirHossein Sojoodi</name><email>amir.sojoodi@gmail.com</email></author><category term="Programming" /><category term="CUDA" /><category term="JIT" /><category term="PTX" /><category term="FATBIN" /><summary type="html"><![CDATA[In this post I’ve written about how to utilize PTX Just-In-Time (JIT) compilation in CUDA. PTX is a low-level assembly-like language that is used to represent the GPU code. The PTX code is then compiled to the machine code by the NVIDIA driver at runtime. This process is called Just-In-Time (JIT) compilation. But before I write about how to use PTX JIT compilation, I’ll provide some background on why you might want to use it.]]></summary></entry><entry><title type="html">Collection of my dotfiles (public version)</title><link href="https://amirsojoodi.github.io/posts/dotfiles" rel="alternate" type="text/html" title="Collection of my dotfiles (public version)" /><published>2024-10-15T00:00:00-04:00</published><updated>2024-10-15T00:00:00-04:00</updated><id>https://amirsojoodi.github.io/posts/Dotfiles</id><content type="html" xml:base="https://amirsojoodi.github.io/posts/dotfiles"><![CDATA[<p>This is a collection of my dotfiles that I use on my Linux system. I have written about them in various posts, but I thought I could put them all in one place. You can check out the GitHub repository <a href="https://github.com/amirsojoodi/dotfiles-public">here</a>, too.</p>

<h2 id="file-tree">File Tree</h2>

<p>Click on each file to see the content.</p>

<ol>
  <li><a href="https://github.com/amirsojoodi/dotfiles-public/blob/main/.bashrc">.bashrc</a></li>
  <li><a href="https://github.com/amirsojoodi/dotfiles-public/blob/main/.clang-format">.clang-format</a></li>
  <li><a href="https://github.com/amirsojoodi/dotfiles-public/blob/main/.git-prompt.sh">.git-prompt.sh</a></li>
  <li><a href="https://github.com/amirsojoodi/dotfiles-public/blob/main/.gitconfig">.gitconfig</a></li>
  <li><a href="https://github.com/amirsojoodi/dotfiles-public/blob/main/.inputrc">.inputrc</a></li>
  <li><a href="https://github.com/amirsojoodi/dotfiles-public/blob/main/.ssh/config">config</a></li>
  <li><a href="https://github.com/amirsojoodi/dotfiles-public/blob/main/.tmux.conf">.tmux.conf</a></li>
  <li><a href="https://github.com/amirsojoodi/dotfiles-public/blob/main/.vimrc">.vimrc</a></li>
  <li>Vscode <a href="https://github.com/amirsojoodi/dotfiles-public/blob/main/.vscode/sftp.json">sftp.json</a></li>
  <li>Latex <a href="https://github.com/amirsojoodi/dotfiles-public/blob/main/Latex/defaultSettings.yaml">defaultSettings.yaml</a></li>
  <li>Latex <a href="https://github.com/amirsojoodi/dotfiles-public/blob/main/Latex/indentconfig.yaml">indentconfig.yaml</a></li>
</ol>

<h2 id="some-notes">Some notes</h2>

<table>
  <thead>
    <tr>
      <th>File</th>
      <th>Location</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">.bashrc</code></td>
      <td><code class="language-plaintext highlighter-rouge">~/</code></td>
      <td>Bash configuration file</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">.clang-format</code></td>
      <td><code class="language-plaintext highlighter-rouge">/path/to/a/project/</code></td>
      <td>Configuration file for clang-format</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">.git-prompt.sh</code></td>
      <td><code class="language-plaintext highlighter-rouge">~/</code></td>
      <td>Git prompt configuration file which is used in <code class="language-plaintext highlighter-rouge">.bashrc</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">.gitconfig</code></td>
      <td><code class="language-plaintext highlighter-rouge">~/</code></td>
      <td>Git configuration file</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">.inputrc</code></td>
      <td><code class="language-plaintext highlighter-rouge">~/</code></td>
      <td>Readline configuration file</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">.ssh/config</code></td>
      <td><code class="language-plaintext highlighter-rouge">~/.ssh/</code></td>
      <td>SSH configuration file</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">.tmux.conf</code></td>
      <td><code class="language-plaintext highlighter-rouge">~/</code></td>
      <td>Tmux configuration file</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">.vimrc</code></td>
      <td><code class="language-plaintext highlighter-rouge">~/</code></td>
      <td>Vim configuration file</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">.vscode/sftp.json</code></td>
      <td><code class="language-plaintext highlighter-rouge">/usually/a/repo/.vscode/</code></td>
      <td>A configuration file for VSCode SFTP extension</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Latex/defaultSettings.yaml</code></td>
      <td><code class="language-plaintext highlighter-rouge">/the/path/in/indentconfig.yaml</code></td>
      <td>Default settings for latexindent program</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Latex/indentconfig.yaml</code></td>
      <td><code class="language-plaintext highlighter-rouge">~/</code></td>
      <td>Configuration file for latexindent program</td>
    </tr>
  </tbody>
</table>]]></content><author><name>AmirHossein Sojoodi</name><email>amir.sojoodi@gmail.com</email></author><category term="Bash" /><category term="Git" /><category term="Linux" /><category term="SSH" /><category term="Tmux" /><category term="Vim" /><category term="VSCode" /><summary type="html"><![CDATA[This is a collection of my dotfiles that I use on my Linux system. I have written about them in various posts, but I thought I can put them all in one place. You can checkout the github repository here, too.]]></summary></entry><entry><title type="html">Tmux Tips and Tricks</title><link href="https://amirsojoodi.github.io/posts/Tmux-Tips-and-Tricks" rel="alternate" type="text/html" title="Tmux Tips and Tricks" /><published>2024-09-27T00:00:00-04:00</published><updated>2024-09-27T00:00:00-04:00</updated><id>https://amirsojoodi.github.io/posts/tmux</id><content type="html" xml:base="https://amirsojoodi.github.io/posts/Tmux-Tips-and-Tricks"><![CDATA[<p>(If you are already familiar with <code class="language-plaintext highlighter-rouge">tmux</code>, and you are looking for a simple <code class="language-plaintext highlighter-rouge">.tmux.conf</code>, go to the end of this post.)</p>

<p>It’s been a long time since I wanted to start using <code class="language-plaintext highlighter-rouge">tmux</code> to see how it could be helpful, but honestly, I didn’t feel the need until just recently, when I had to run a long-running process on a remote server and didn’t want to keep my terminal open the whole time. So, I decided to give <code class="language-plaintext highlighter-rouge">tmux</code> a try, and I’m glad I did! It’s a powerful tool that can help you manage multiple terminal sessions in a single window, and more. Here are some tips and tricks that I found useful.</p>

<p>For getting started, you can check the tmux <a href="https://github.com/tmux/tmux/wiki/Getting-Started">official documentation</a>. This <a href="https://tmuxcheatsheet.com/">cheat sheet</a> is also good. For an <em>awesome</em> list of <code class="language-plaintext highlighter-rouge">tmux</code> resources, you can check <a href="https://github.com/rothgar/awesome-tmux">this</a>.</p>

<h2 id="about-tmux">About <code class="language-plaintext highlighter-rouge">tmux</code></h2>

<p><code class="language-plaintext highlighter-rouge">tmux</code> is a terminal multiplexer that allows you to run multiple terminal sessions in a single window. It’s similar to <code class="language-plaintext highlighter-rouge">screen</code>, but it has some additional features that make it more powerful and easier to use. With <code class="language-plaintext highlighter-rouge">tmux</code>, you can create multiple windows and panes, detach from a session and reattach later, and share sessions with other users.</p>

<p><code class="language-plaintext highlighter-rouge">tmux</code> handles everything through a client-server model. The server runs in the background and manages all the sessions, windows, and panes. The client is the terminal that you interact with. You can have multiple clients connected to the same server.</p>

<p>To install <code class="language-plaintext highlighter-rouge">tmux</code> on your system:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Ubuntu</span>
<span class="nb">sudo </span>apt <span class="nb">install </span>tmux
</code></pre></div></div>

<h2 id="using-tmux-basics">Using tmux basics</h2>

<p>This section covers the basic shell commands; the following sections list the shortcuts and commands you can use inside <code class="language-plaintext highlighter-rouge">tmux</code>. I have also skipped the <code class="language-plaintext highlighter-rouge">tmux commands</code> entered at the status bar (similar to <code class="language-plaintext highlighter-rouge">vim</code> commands); if you want to see those, visit the cheat sheet mentioned above. You can enter <code class="language-plaintext highlighter-rouge">tmux</code> command mode by pressing <code class="language-plaintext highlighter-rouge">Ctrl+b :</code>.</p>

<p>One useful command is <code class="language-plaintext highlighter-rouge">: set mouse on</code>, which enables the mouse in <code class="language-plaintext highlighter-rouge">tmux</code>. You can enable it for all sessions by adding <code class="language-plaintext highlighter-rouge">-g</code> to the command (<code class="language-plaintext highlighter-rouge">tmux set -g mouse on</code>). To disable the mouse, use <code class="language-plaintext highlighter-rouge">: set mouse off</code>.</p>
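<p>If you toggle the mouse often, you can bind a key for it instead; in fairly recent <code class="language-plaintext highlighter-rouge">tmux</code> versions, <code class="language-plaintext highlighter-rouge">set</code> without a value flips an on/off option (a sketch for your <code class="language-plaintext highlighter-rouge">.tmux.conf</code>):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Toggle mouse support with prefix + m
# (set without a value toggles a boolean option)
bind m set -g mouse
</code></pre></div></div>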

<p>Ok, back to the basics:</p>

<ul>
  <li>List all sessions with <code class="language-plaintext highlighter-rouge">tmux ls</code></li>
  <li>Start a new session with a name: <code class="language-plaintext highlighter-rouge">tmux new -s mySession</code></li>
  <li>Attach to a session:</li>
</ul>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tmux attach <span class="nt">-t</span> mySession
<span class="c"># or short</span>
tmux a <span class="nt">-t</span> mySession
<span class="c"># Attach to the last session</span>
tmux a
</code></pre></div></div>

<ul>
  <li>Detach from a session by running <code class="language-plaintext highlighter-rouge">tmux detach</code> or pressing <code class="language-plaintext highlighter-rouge">Ctrl+b d</code> inside the session.</li>
  <li>Kill a session with <code class="language-plaintext highlighter-rouge">tmux kill-session -t mySession</code>.</li>
</ul>

<h2 id="inside-tmux">Inside tmux</h2>

<p>All commands in <code class="language-plaintext highlighter-rouge">tmux</code> start with <code class="language-plaintext highlighter-rouge">Ctrl+b</code> followed by another key (unless the prefix key is changed in the configuration file). To see the full list of commands, press <code class="language-plaintext highlighter-rouge">Ctrl+b</code> followed by <code class="language-plaintext highlighter-rouge">?</code>. The documentation uses the format <code class="language-plaintext highlighter-rouge">C-b &lt;command&gt;</code> to represent the <code class="language-plaintext highlighter-rouge">Ctrl+b</code> key combination. Likewise, <code class="language-plaintext highlighter-rouge">M-&lt;key&gt;</code> represents the <code class="language-plaintext highlighter-rouge">Alt+&lt;key&gt;</code> combination, and <code class="language-plaintext highlighter-rouge">S-&lt;key&gt;</code> the <code class="language-plaintext highlighter-rouge">Shift+&lt;key&gt;</code> combination.</p>
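<p>To explore the bindings further, <code class="language-plaintext highlighter-rouge">list-keys</code> is handy (a sketch; you can run these from any shell, or after <code class="language-plaintext highlighter-rouge">Ctrl+b :</code> without the leading <code class="language-plaintext highlighter-rouge">tmux</code>):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># List every binding, in C-b notation
tmux list-keys
# List only the bindings reachable after the prefix key
tmux list-keys -T prefix
# List the copy-mode bindings (vi flavour)
tmux list-keys -T copy-mode-vi
</code></pre></div></div>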

<h2 id="customizing-tmux">Customizing tmux</h2>

<p>One of the beauties of <code class="language-plaintext highlighter-rouge">tmux</code> is that you can customize it very easily. You can create a <code class="language-plaintext highlighter-rouge">.tmux.conf</code> file in your home directory and add your custom configurations, from changing the default key bindings to setting the colors of the status bar. Here is a simple <code class="language-plaintext highlighter-rouge">.tmux.conf</code> file:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Set the prefix key to Ctrl+a -&gt; people usually use this key binding</span>
<span class="nb">set</span> <span class="nt">-g</span> prefix C-a
unbind C-b
<span class="nb">bind </span>C-a send-prefix

<span class="c"># Start windows and panes at 1, not 0</span>
<span class="c"># Why you ask?! because 0 is pretty far away on the keyboard</span>
<span class="nb">set</span> <span class="nt">-g</span> base-index 1
setw <span class="nt">-g</span> pane-base-index 1

<span class="c"># Enable mouse support</span>
<span class="nb">set</span> <span class="nt">-g</span> mouse on

<span class="c"># Set the status bar colors</span>
<span class="nb">set</span> <span class="nt">-g</span> status-bg black
<span class="nb">set</span> <span class="nt">-g</span> status-fg white

<span class="c"># Set the splitting commands, | for horizontal and - for vertical</span>
<span class="nb">bind</span> | split-window <span class="nt">-h</span> <span class="nt">-c</span> <span class="s2">"#{pane_current_path}"</span>
<span class="nb">bind</span> - split-window <span class="nt">-v</span> <span class="nt">-c</span> <span class="s2">"#{pane_current_path}"</span>
unbind <span class="s1">'"'</span>
unbind %

<span class="c"># open new windows in the current path</span>
<span class="nb">bind </span>c new-window <span class="nt">-c</span> <span class="s2">"#{pane_current_path}"</span>

<span class="c"># reload config file</span>
<span class="nb">bind </span>r source-file ~/.tmux.conf

<span class="c"># Use Alt-arrow keys without prefix key to switch panes</span>
<span class="nb">bind</span> <span class="nt">-n</span> M-Left <span class="k">select</span><span class="nt">-pane</span> <span class="nt">-L</span>
<span class="nb">bind</span> <span class="nt">-n</span> M-Right <span class="k">select</span><span class="nt">-pane</span> <span class="nt">-R</span>
<span class="nb">bind</span> <span class="nt">-n</span> M-Up <span class="k">select</span><span class="nt">-pane</span> <span class="nt">-U</span>
<span class="nb">bind</span> <span class="nt">-n</span> M-Down <span class="k">select</span><span class="nt">-pane</span> <span class="nt">-D</span>

<span class="c"># set default terminal mode to 256 colors</span>
<span class="nb">set</span> <span class="nt">-g</span> default-terminal <span class="s2">"xterm-256color"</span>
<span class="nb">set</span> <span class="nt">-ga</span> terminal-overrides <span class="s2">",xterm-256color:Tc"</span>

<span class="c"># Visual bell, no sounds</span>
<span class="nb">set</span> <span class="nt">-g</span> visual-bell on
<span class="nb">set</span> <span class="nt">-g</span> bell-action none
</code></pre></div></div>

<h3 id="managing-tmux-windows-and-sessions">Managing tmux windows and sessions</h3>

<p><strong>After</strong> updating the <code class="language-plaintext highlighter-rouge">.tmux.conf</code> file as above, the shortcuts in the table below apply:</p>

<table>
  <thead>
    <tr>
      <th>Command</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a w</code></td>
      <td>List all windows</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a 0-9</code></td>
      <td>Switch to window 0-9</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a c</code></td>
      <td>Create a new window</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a &amp;</code></td>
      <td>Close the current window</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a n</code></td>
      <td>Move to the next window</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a p</code></td>
      <td>Move to the previous window</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a ,</code></td>
      <td>Rename the current window</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a -</code></td>
      <td>Split the window vertically</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a |</code></td>
      <td>Split the window horizontally</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a d</code></td>
      <td>Detach from the session</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a s</code></td>
      <td>List all sessions</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a $</code></td>
      <td>Rename the current session</td>
    </tr>
  </tbody>
</table>

<h3 id="moving-and-resizing-panes">Moving and resizing panes</h3>

<p><strong>After</strong> updating the <code class="language-plaintext highlighter-rouge">.tmux.conf</code> file to the content above, the commands from the following table can be used:</p>

<table>
  <thead>
    <tr>
      <th>Command</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a o</code></td>
      <td>Move to the next pane</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a ;</code></td>
      <td>Move to the last active pane</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a q</code></td>
      <td>Show pane numbers</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a q &lt;number&gt;</code></td>
      <td>Activate the pane with the specified number</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a x</code></td>
      <td>Kill the current pane</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a z</code></td>
      <td>Zoom in/out the current pane</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a !</code></td>
      <td>Move the current pane to a new window</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a Up</code></td>
      <td>Switch to the pane above the active pane</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Alt+Up</code></td>
      <td>Switch to the pane above the active pane (another way)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a Down</code></td>
      <td>Switch to the pane below the active pane</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Alt+Down</code></td>
      <td>Switch to the pane below the active pane (another way)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a Left</code></td>
      <td>Switch to the pane on the left of the active pane</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Alt+Left</code></td>
      <td>Switch to the pane on the left of the active pane (another way)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a Right</code></td>
      <td>Switch to the pane on the right of the active pane</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Alt+Right</code></td>
      <td>Switch to the pane on the right of the active pane (another way)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a }</code></td>
      <td>Move the current pane right</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a {</code></td>
      <td>Move the current pane left</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a Ctrl+Up</code></td>
      <td>Resize the current pane up</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a Ctrl+Down</code></td>
      <td>Resize the current pane down</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a Ctrl+Left</code></td>
      <td>Resize the current pane left</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+a Ctrl+Right</code></td>
      <td>Resize the current pane right</td>
    </tr>
  </tbody>
</table>

<h2 id="automatic-startup">Automatic Startup</h2>

<p>You can start <code class="language-plaintext highlighter-rouge">tmux</code> automatically whenever you open a terminal. To do this, add the following snippet to your <code class="language-plaintext highlighter-rouge">.bashrc</code> file:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Start tmux if not already running</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$TMUX</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span>tmux attach <span class="o">||</span> tmux new
<span class="k">fi</span>
</code></pre></div></div>

<p>If you want to start <code class="language-plaintext highlighter-rouge">tmux</code> with a specific session, you can use the following command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Start tmux with a specific session</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$TMUX</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span>tmux attach <span class="nt">-t</span> mySession <span class="o">||</span> tmux new <span class="nt">-s</span> mySession
<span class="k">fi</span>
</code></pre></div></div>
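<p>A slightly more defensive variant (a sketch; the helper name <code class="language-plaintext highlighter-rouge">should_autostart_tmux</code> is my own, not a standard command) only auto-starts when <code class="language-plaintext highlighter-rouge">tmux</code> is installed, the shell is interactive, and you are not already inside a session. It also uses <code class="language-plaintext highlighter-rouge">new-session -A</code>, which attaches to the session if it exists and creates it otherwise:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Decide whether auto-starting tmux makes sense in this shell
should_autostart_tmux() {
  command -v tmux 1>/dev/null 2>/dev/null || return 1  # tmux not installed
  if [ -n "${TMUX:-}" ]; then return 1; fi             # already inside tmux
  [ -n "${PS1:-}" ] || return 1                        # not an interactive shell
  return 0
}

if should_autostart_tmux; then
  # -A attaches to "main" if it exists, otherwise creates it
  tmux new-session -A -s main
fi
</code></pre></div></div>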

<h2 id="fun-part">Fun Part</h2>

<p>Now for the fun part, I have created some aliases to make it easier to manage <code class="language-plaintext highlighter-rouge">tmux</code> and its sessions. You can add these aliases to the <code class="language-plaintext highlighter-rouge">.bashrc</code> file:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># tmux aliases</span>
<span class="nb">alias </span><span class="nv">ta</span><span class="o">=</span><span class="s1">'tmux attach -t'</span>
<span class="nb">alias </span><span class="nv">tk</span><span class="o">=</span><span class="s1">'tmux kill-session -t'</span>
<span class="nb">alias </span><span class="nv">tls</span><span class="o">=</span><span class="s1">'tmux ls'</span>
<span class="nb">alias </span><span class="nv">tn</span><span class="o">=</span><span class="s1">'tmux new -s'</span>
<span class="nb">alias </span><span class="nv">ttop</span><span class="o">=</span><span class="s1">'tmux attach -t top || tmux new -s top "top"'</span>

<span class="c"># managing ssh sessions to remote servers</span>
<span class="c"># First check if the session exists, if not create a new one and ssh to the server</span>
<span class="c"># You can then later split the window and do whatever you want with the session, then detach and reattach later</span>
<span class="nb">alias </span><span class="nv">tpprl</span><span class="o">=</span><span class="s1">'tmux attach -t pprl || tmux new -s pprl "ssh pprl"'</span>
<span class="nb">alias </span><span class="nv">tmist</span><span class="o">=</span><span class="s1">'tmux attach -t mist || tmux new -s mist "ssh mist"'</span>
<span class="nb">alias </span><span class="nv">tnarval</span><span class="o">=</span><span class="s1">'tmux attach -t narval || tmux new -s narval "ssh narval"'</span>
<span class="nb">alias </span><span class="nv">tbeluga</span><span class="o">=</span><span class="s1">'tmux attach -t beluga || tmux new -s beluga "ssh beluga"'</span>
<span class="nb">alias </span><span class="nv">tgraham</span><span class="o">=</span><span class="s1">'tmux attach -t graham || tmux new -s graham "ssh graham"'</span>

<span class="c"># List all tmux sessions and windows at login</span>
<span class="nb">echo</span> <span class="s2">"Available tmux sessions:"</span>
tmux <span class="nb">ls</span>
</code></pre></div></div>
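<p>One small caveat: the bare <code class="language-plaintext highlighter-rouge">tmux ls</code> above prints an error when no server is running yet, and fails outright if <code class="language-plaintext highlighter-rouge">tmux</code> is not installed. A guarded version (a sketch; the helper name is mine) keeps the login output clean:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># List sessions at login without spurious errors
list_tmux_sessions() {
  command -v tmux 1>/dev/null 2>/dev/null || return 0  # tmux missing: stay quiet
  echo "Available tmux sessions:"
  tmux ls 2>/dev/null || echo "  (none)"
}
list_tmux_sessions
</code></pre></div></div>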

<p><strong>Another fun use case:</strong> You can also combine tmux session management with the SSH control path (connection multiplexing) to avoid getting 2FA prompts every time you connect to a server. Check Ali’s <a href="https://alifara.codeberg.page/posts/compute-canada-2fa/">post</a> to understand the process.</p>

<h2 id="references">References</h2>

<ul>
  <li>A nice tmux conf <a href="https://github.com/hamvocke/dotfiles/blob/master/tmux/.tmux.conf">file</a></li>
  <li>Tmux official <a href="https://github.com/tmux/tmux/wiki/Getting-Started">guide</a></li>
  <li>Tmux <a href="https://tmuxcheatsheet.com/">cheat sheet</a></li>
  <li>Awesome tmux <a href="https://github.com/rothgar/awesome-tmux">resources</a></li>
  <li>Oh my <a href="https://github.com/gpakosz/.tmux">tmux</a>!</li>
</ul>]]></content><author><name>AmirHossein Sojoodi</name><email>amir.sojoodi@gmail.com</email></author><category term="Linux" /><category term="Bash" /><category term="Tmux" /><category term="Tips" /><summary type="html"><![CDATA[(If you are already familiar with tmux, and you are looking for a simple .tmux.conf, go to the end of this post.)]]></summary></entry><entry><title type="html">Setup LAMMPS</title><link href="https://amirsojoodi.github.io/posts/LAMMPS/" rel="alternate" type="text/html" title="Setup LAMMPS" /><published>2024-09-09T00:00:00-04:00</published><updated>2024-09-09T00:00:00-04:00</updated><id>https://amirsojoodi.github.io/posts/LAMMPS</id><content type="html" xml:base="https://amirsojoodi.github.io/posts/LAMMPS/"><![CDATA[<p>Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a classical molecular dynamics code that can be used to model atoms or, more generally, as a parallel particle simulator at various scales. The complete documentation of LAMMPS can be found <a href="https://docs.lammps.org/">here</a>. In this post, I will provide a guide on how to set up LAMMPS on a Linux machine. My setup is on a cluster with NVIDIA GPUs, UCX, and OpenMPI. Also, we have a built-in module system to load the necessary modules.</p>

<h2 id="prepare-the-environment">Prepare the Environment</h2>

<p>Prerequisites:</p>

<ul>
  <li>Git</li>
  <li>CMake</li>
  <li>An MPI library, like <a href="https://www.open-mpi.org/">OpenMPI</a></li>
  <li>For NVIDIA GPU support, <a href="https://developer.nvidia.com/cuda-toolkit">CUDA Toolkit</a> is needed.</li>
</ul>
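<p>Before starting the build, it can save time to verify that these tools are actually on the <code class="language-plaintext highlighter-rouge">PATH</code>. Here is a small helper for that (a sketch; the function name is mine, not part of LAMMPS):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Report which of the given tools are available on PATH
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" 1>/dev/null 2>/dev/null; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Typical prerequisites for a GPU-enabled LAMMPS build
check_tools git cmake mpicc nvcc || echo "load the missing modules first"
</code></pre></div></div>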

<p>While this step may differ in various scenarios, I have the following environment variables set:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#! /bin/bash</span>
module <span class="nt">--force</span> purge
module load cuda

<span class="c"># If the argument is "builtin" then load the builtin modules, otherwise don't load any other modules</span>
<span class="k">if</span> <span class="o">[</span> <span class="s2">"$#"</span> <span class="nt">-eq</span> 1 <span class="o">]</span> <span class="o">&amp;&amp;</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span> <span class="o">==</span> <span class="s2">"builtin"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
  </span><span class="nb">echo</span> <span class="s2">"using builtin modules"</span>
  module load ucx
  module load openmpi
  module list
  <span class="nb">echo</span> <span class="s2">"Built-in modules loaded"</span>
  <span class="k">return
fi

</span><span class="nb">echo</span> <span class="s2">"No additional modules loaded"</span>
module list

<span class="c"># If no argument is passed, set the root dir to the current directory,</span>
<span class="c"># else set it to the passed argument</span>
<span class="k">if</span> <span class="o">[</span> <span class="s2">"$#"</span> <span class="nt">-eq</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
  </span><span class="nb">export </span><span class="nv">ROOT_DIR</span><span class="o">=</span><span class="si">$(</span><span class="nb">pwd</span><span class="si">)</span>
<span class="k">else
  </span><span class="nb">export </span><span class="nv">ROOT_DIR</span><span class="o">=</span><span class="nv">$1</span>
<span class="k">fi

</span><span class="nb">export </span><span class="nv">BUILD_DIR</span><span class="o">=</span><span class="nv">$ROOT_DIR</span>/build

<span class="c">################### Some checks ###################</span>

<span class="c"># Check if LDFLAGS is bound or not</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="k">${</span><span class="nv">LDFLAGS</span><span class="p">+x</span><span class="k">}</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
  </span><span class="nb">export </span><span class="nv">LDFLAGS</span><span class="o">=</span><span class="s2">""</span>
<span class="k">fi</span>

<span class="c"># Same with LD_RUN_PATH</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="k">${</span><span class="nv">LD_RUN_PATH</span><span class="p">+x</span><span class="k">}</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
  </span><span class="nb">export </span><span class="nv">LD_RUN_PATH</span><span class="o">=</span><span class="s2">""</span>
<span class="k">fi</span>

<span class="c"># Same with CXXFLAGS</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="k">${</span><span class="nv">CXXFLAGS</span><span class="p">+x</span><span class="k">}</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
  </span><span class="nb">export </span><span class="nv">CXXFLAGS</span><span class="o">=</span><span class="s2">""</span>
<span class="k">fi</span>

<span class="c"># Same with LIBRARY_PATH</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="k">${</span><span class="nv">LIBRARY_PATH</span><span class="p">+x</span><span class="k">}</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
  </span><span class="nb">export </span><span class="nv">LIBRARY_PATH</span><span class="o">=</span><span class="s2">""</span>
<span class="k">fi</span>

<span class="c"># Same with LD_LIBRARY_PATH</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="k">${</span><span class="nv">LD_LIBRARY_PATH</span><span class="p">+x</span><span class="k">}</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
  </span><span class="nb">export </span><span class="nv">LD_LIBRARY_PATH</span><span class="o">=</span><span class="s2">""</span>
<span class="k">fi</span>

<span class="c">################### CUDA Configurations ###################</span>

<span class="c"># CUDA Configurations (mostly needed to build OpenMPI and UCX)</span>
<span class="nb">export </span><span class="nv">NVCC</span><span class="o">=</span><span class="si">$(</span>which nvcc<span class="si">)</span>
<span class="nb">export </span><span class="nv">CUDA_LIB</span><span class="o">=</span><span class="nv">$CUDA_HOME</span>/lib64/stubs

<span class="nb">export </span><span class="nv">LD_LIBRARY_PATH</span><span class="o">=</span><span class="nv">$CUDA_HOME</span>/lib64/:<span class="nv">$LD_LIBRARY_PATH</span>
<span class="nb">export </span><span class="nv">LD_LIBRARY_PATH</span><span class="o">=</span><span class="nv">$CUDA_LIB</span>:<span class="nv">$LD_LIBRARY_PATH</span>

<span class="nb">export </span><span class="nv">LIBRARY_PATH</span><span class="o">=</span><span class="nv">$CUDA_HOME</span>/lib64/:<span class="nv">$LIBRARY_PATH</span>
<span class="nb">export </span><span class="nv">LIBRARY_PATH</span><span class="o">=</span><span class="nv">$CUDA_LIB</span>:<span class="nv">$LIBRARY_PATH</span>

<span class="nb">export </span><span class="nv">LDFLAGS</span><span class="o">=</span><span class="s2">"-L</span><span class="nv">$CUDA_LIB</span><span class="s2"> -L</span><span class="nv">$CUDA_HOME</span><span class="s2">/lib64 </span><span class="nv">$LDFLAGS</span><span class="s2">"</span>
<span class="nb">export </span><span class="nv">CPATH</span><span class="o">=</span><span class="nv">$CUDA_HOME</span>/include:<span class="nv">$CPATH</span>
<span class="nb">export </span><span class="nv">LD_RUN_PATH</span><span class="o">=</span><span class="nv">$CUDA_LIB</span>:<span class="nv">$LD_RUN_PATH</span>
<span class="nb">export </span><span class="nv">CUDA_LDFLAGS</span><span class="o">=</span><span class="s2">"-lcuda -lcudart -lcudadevrt -lnvidia-ml -L</span><span class="nv">$CUDA_LIB</span><span class="s2">"</span>

<span class="nb">export </span><span class="nv">LD_LIBRARY_PATH</span><span class="o">=</span><span class="nv">$BUILD_DIR</span>/lib:<span class="nv">$LD_LIBRARY_PATH</span>
<span class="nb">export </span><span class="nv">LIBRARY_PATH</span><span class="o">=</span><span class="nv">$BUILD_DIR</span>/lib:<span class="nv">$LIBRARY_PATH</span>
<span class="nb">export </span><span class="nv">LDFLAGS</span><span class="o">=</span><span class="s2">"-L</span><span class="nv">$BUILD_DIR</span><span class="s2">/lib </span><span class="nv">$LDFLAGS</span><span class="s2">"</span>

<span class="nb">export </span><span class="nv">CPATH</span><span class="o">=</span><span class="nv">$BUILD_DIR</span>/include:<span class="nv">$CPATH</span>
<span class="nb">export </span><span class="nv">LD_RUN_PATH</span><span class="o">=</span><span class="nv">$BUILD_DIR</span>/lib:<span class="nv">$LD_RUN_PATH</span>
<span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span><span class="nv">$BUILD_DIR</span>/bin/:<span class="nv">$PATH</span>

<span class="c"># Now UCX and OpenMPI can be built</span>
</code></pre></div></div>

<p>I have skipped the UCX and OpenMPI configurations, but you can find them in my previous posts or in their official documentation.</p>
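<p>For reference, the CUDA-aware builds of these two libraries usually come down to a couple of configure flags. The sketch below uses placeholder source paths; <code class="language-plaintext highlighter-rouge">--with-cuda</code> and <code class="language-plaintext highlighter-rouge">--with-ucx</code> are the standard options, but double-check the versions and flags against your cluster’s documentation:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># UCX with CUDA support ($UCX_SRC_DIR is a placeholder)
cd $UCX_SRC_DIR
./contrib/configure-release --prefix=$BUILD_DIR --with-cuda=$CUDA_HOME
make -j 16 install

# OpenMPI on top of that UCX, also CUDA-aware ($OMPI_SRC_DIR is a placeholder)
cd $OMPI_SRC_DIR
./configure --prefix=$BUILD_DIR --with-cuda=$CUDA_HOME --with-ucx=$BUILD_DIR
make -j 16 install
</code></pre></div></div>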

<h2 id="build-lammps">Build LAMMPS</h2>

<p>The following script clones the LAMMPS repository, builds it, and runs some benchmarks. For more information about the available packages, you can check the <a href="https://docs.lammps.org/Build_package.html">LAMMPS documentation</a>.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="nb">set</span> <span class="nt">-eux</span>

<span class="nb">export </span><span class="nv">OPENMPI_DIR</span><span class="o">=</span><span class="s2">"/path/to/openmpi"</span>
<span class="nb">source</span> <span class="s2">"the_above_script.sh"</span> <span class="nv">$OPENMPI_DIR</span>

<span class="c"># Check the paths of the executables</span>
which nvcc
which mpicc
which mpirun

<span class="nb">export </span><span class="nv">LAMMPS_DIR</span><span class="o">=</span><span class="s2">"/path/to/lammps"</span>

<span class="c"># Perform a clean clone</span>
<span class="nb">rm</span> <span class="nt">-rf</span> <span class="nv">$LAMMPS_DIR</span>
git clone <span class="nt">--depth</span><span class="o">=</span>1 <span class="nt">-b</span> release https://github.com/lammps/lammps.git <span class="nv">$LAMMPS_DIR</span>
<span class="nb">cd</span> <span class="nv">$LAMMPS_DIR</span>

<span class="c"># Build LAMMPS</span>
<span class="nb">mkdir</span> <span class="nt">-p</span> <span class="nv">$LAMMPS_DIR</span>/build
<span class="nb">cd</span> <span class="nv">$LAMMPS_DIR</span>/build

cmake <span class="nt">-D</span> <span class="nv">CMAKE_BUILD_TYPE</span><span class="o">=</span>Release <span class="nt">-D</span> <span class="nv">CMAKE_INSTALL_PREFIX</span><span class="o">=</span><span class="nv">$LAMMPS_DIR</span>/build <span class="se">\</span>
  <span class="nt">-D</span> <span class="nv">PKG_KSPACE</span><span class="o">=</span>1 <span class="nt">-D</span> <span class="nv">PKG_MOLECULE</span><span class="o">=</span>1 <span class="nt">-D</span> <span class="nv">PKG_RIGID</span><span class="o">=</span>1 <span class="nt">-D</span> <span class="nv">PKG_MANYBODY</span><span class="o">=</span>1 <span class="se">\</span>
  <span class="nt">-D</span> <span class="nv">CMAKE_CXX_FLAGS</span><span class="o">=</span><span class="nt">-DCUDA_PROXY</span> <span class="nt">-D</span> <span class="nv">BUILD_MPI</span><span class="o">=</span>1 <span class="nt">-D</span> <span class="nv">PKG_GPU</span><span class="o">=</span>1 <span class="nt">-D</span> <span class="nv">GPU_API</span><span class="o">=</span>CUDA <span class="se">\</span>
  <span class="nt">-D</span> <span class="nv">CUDA_MPS_SUPPORT</span><span class="o">=</span>1 <span class="nv">$LAMMPS_DIR</span>/cmake

cmake <span class="nt">--build</span> <span class="nb">.</span> <span class="nt">--parallel</span> 32

<span class="c"># Some tests</span>
<span class="c"># mpirun -n 8 --mca pml ucx -x UCX_TLS=sm,cuda_copy,cuda_ipc --mca btl ^vader,tcp,openib \</span>
<span class="c">#   --mca coll ^hcoll ../lammps/build/lmp -sf gpu -pk gpu 4 -in ../lammps/bench/in.eam</span>
<span class="c"># mpirun -n 8 --mca pml ucx -x UCX_TLS=sm,cuda_copy,cuda_ipc --mca btl ^vader,tcp,openib \</span>
<span class="c">#   --mca coll ^hcoll ../lammps/build/lmp -sf gpu -pk gpu 1 -in ../lammps/bench/in.chain</span>
<span class="c"># mpirun -n 32 --mca pml ucx -x UCX_TLS=sm,cuda_copy,cuda_ipc --mca btl ^vader,tcp,openib \</span>
<span class="c">#   --mca coll ^hcoll ../lammps/build/lmp -sf gpu -pk gpu 4 -in ../lammps/bench/in.lj</span>
</code></pre></div></div>]]></content><author><name>AmirHossein Sojoodi</name><email>amir.sojoodi@gmail.com</email></author><category term="Programming" /><category term="MPI" /><category term="CUDA" /><category term="LAMMPS" /><summary type="html"><![CDATA[Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a classical molecular dynamics code that can be used to model atoms or, more generally, as a parallel particle simulator at various scales. The complete documentation of LAMMPS can be found here. In this post, I will provide a guide on how to set up LAMMPS on a Linux machine. My setup is on a cluster with NVIDIA GPUs, UCX, and OpenMPI. Also, we have a built-in module system to load the necessary modules.]]></summary></entry><entry><title type="html">MPS on Multi-Instance GPU</title><link href="https://amirsojoodi.github.io/posts/MPS+MIG/" rel="alternate" type="text/html" title="MPS on Multi-Instance GPU" /><published>2024-08-28T00:00:00-04:00</published><updated>2024-08-28T00:00:00-04:00</updated><id>https://amirsojoodi.github.io/posts/MPS-on-MIG</id><content type="html" xml:base="https://amirsojoodi.github.io/posts/MPS+MIG/"><![CDATA[<p>In previous posts (<a href="https://amirsojoodi.github.io/posts/Enabling-MPS/">MPS</a> and <a href="https://amirsojoodi.github.io/posts/MIG/">MIG</a>), I have explained how to enable MPS and MIG on NVIDIA GPUs. In this post, I will explain how to use both technologies at the same time. More specifically, I would like to enable MPS on all of the MIG instances. For more information, you can refer to the <a href="https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html">NVIDIA document</a>.</p>

<h2 id="enabling-mps-on-mig">Enabling MPS on MIG</h2>

<p>I assume that you have already enabled MIG on your GPU(s); if not, please refer to the previous posts. As stated in the <a href="https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html">NVIDIA document</a>, the steps for configuring MPS on MIG are as follows:</p>

<ul>
  <li>Configure the desired MIG geometry on the GPU.</li>
  <li>Setup the <code class="language-plaintext highlighter-rouge">CUDA_MPS_PIPE_DIRECTORY</code> variable to point to unique directories so that the multiple MPS servers and clients can communicate with each other using named pipes and Unix domain sockets.</li>
  <li>Launch the application by specifying the MIG device using <code class="language-plaintext highlighter-rouge">CUDA_VISIBLE_DEVICES</code>. (This step may be unnecessary if you point to the correct MPS server using <code class="language-plaintext highlighter-rouge">CUDA_MPS_PIPE_DIRECTORY</code>.)</li>
</ul>

<p>To enable MPS on MIG, I wrote a simple script that does the above steps. The script is as follows:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>

<span class="nb">set</span> <span class="nt">-eux</span>

<span class="c"># GPU_UUIDs=($(nvidia-smi -L | grep -oE "(GPU|MIG)-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*"))</span>
<span class="nv">GPU_UUIDs</span><span class="o">=(</span><span class="si">$(</span>nvidia-smi <span class="nt">-L</span> | <span class="nb">grep</span> <span class="nt">-oE</span> <span class="s2">"(MIG)-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*"</span><span class="si">)</span><span class="o">)</span>

<span class="k">for</span> <span class="o">((</span>index <span class="o">=</span> 0<span class="p">;</span> index &lt; <span class="k">${#</span><span class="nv">GPU_UUIDs</span><span class="p">[@]</span><span class="k">}</span><span class="p">;</span> index++<span class="o">))</span><span class="p">;</span> <span class="k">do
  </span><span class="nv">GPU</span><span class="o">=</span><span class="k">${</span><span class="nv">GPU_UUIDs</span><span class="p">[index]</span><span class="k">}</span>
  <span class="nb">rm</span> <span class="nt">-rf</span> /tmp/mps_<span class="k">${</span><span class="nv">GPU</span><span class="k">}</span>
  <span class="nb">rm</span> <span class="nt">-rf</span> /tmp/mps_log_<span class="k">${</span><span class="nv">GPU</span><span class="k">}</span>
  <span class="nb">mkdir</span> /tmp/mps_<span class="k">${</span><span class="nv">GPU</span><span class="k">}</span>
  <span class="nb">mkdir</span> /tmp/mps_log_<span class="k">${</span><span class="nv">GPU</span><span class="k">}</span>
  <span class="c"># Skip setting the GPU compute mode to Exclusive Process (not supported on MIG-enabled GPUs)</span>
  <span class="c"># nvidia-smi -i $index -c EXCLUSIVE_PROCESS</span>
  <span class="nb">export </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="k">${</span><span class="nv">GPU</span><span class="k">}</span>
  <span class="nb">export </span><span class="nv">CUDA_MPS_PIPE_DIRECTORY</span><span class="o">=</span>/tmp/mps_<span class="k">${</span><span class="nv">GPU</span><span class="k">}</span>
  <span class="nb">export </span><span class="nv">CUDA_MPS_LOG_DIRECTORY</span><span class="o">=</span>/tmp/mps_log_<span class="k">${</span><span class="nv">GPU</span><span class="k">}</span>
  nvidia-cuda-mps-control <span class="nt">-d</span>
<span class="k">done

</span>ps <span class="nt">-ef</span> | <span class="nb">grep </span>mps
</code></pre></div></div>

<p>In summary, the script does the following:</p>

<ul>
  <li>Gets the UUIDs of the MIG instances.</li>
  <li>Creates unique pipe and log directories for each MIG instance.</li>
  <li>Sets <code class="language-plaintext highlighter-rouge">CUDA_VISIBLE_DEVICES</code> to the MIG instance.</li>
  <li>Sets <code class="language-plaintext highlighter-rouge">CUDA_MPS_PIPE_DIRECTORY</code> and <code class="language-plaintext highlighter-rouge">CUDA_MPS_LOG_DIRECTORY</code> to those unique directories.</li>
  <li>Starts an MPS server on the specified MIG instance.</li>
  <li>Repeats these steps for all the MIG instances.</li>
  <li>Finally, lists the running MPS processes.</li>
</ul>
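<p>Since the UUID-extraction step is the one most likely to need adjusting on a different machine, below is a small sanity check of the same <code class="language-plaintext highlighter-rouge">grep</code> pattern against captured <code class="language-plaintext highlighter-rouge">nvidia-smi -L</code> output (the sample lines are borrowed from the Example section later in this post), so it can run even without a MIG-enabled GPU at hand:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash

# Captured sample of `nvidia-smi -L` output; only the MIG UUIDs should match.
sample_output='GPU 0: NVIDIA A30 (UUID: GPU-8f8bff94-112e-9541-43da-cfd453333404)
  MIG 1c.2g.12gb  Device  0: (UUID: MIG-22f0c05f-5cf2-5ea2-8297-789695656dc0)
  MIG 1c.2g.12gb  Device  1: (UUID: MIG-d74dcafc-aad0-58b8-83d8-61bcd963d2e9)
GPU 1: NVIDIA A30 (UUID: GPU-0783f1eb-ab00-d6ec-92e4-8676be77de38)'

# Same pattern as in the script above: match MIG UUIDs, not whole-GPU UUIDs.
GPU_UUIDs=($(echo "${sample_output}" | grep -oE "(MIG)-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*"))

echo "Found ${#GPU_UUIDs[@]} MIG instance(s)"
for uuid in "${GPU_UUIDs[@]}"; do
  echo "${uuid}"
done
</code></pre></div></div>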

<h2 id="disabling-mps-on-mig">Disabling MPS on MIG</h2>

<p>To disable MPS on MIG, you can use the following script:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>

<span class="nb">set</span> <span class="nt">-eux</span>

<span class="c"># GPU_UUIDs=($(nvidia-smi -L | grep -oE "(MIG|GPU)-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*"))</span>
<span class="nv">GPU_UUIDs</span><span class="o">=(</span><span class="si">$(</span>nvidia-smi <span class="nt">-L</span> | <span class="nb">grep</span> <span class="nt">-oE</span> <span class="s2">"(MIG)-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*"</span><span class="si">)</span><span class="o">)</span>

<span class="k">for</span> <span class="o">((</span>index <span class="o">=</span> 0<span class="p">;</span> index &lt; <span class="k">${#</span><span class="nv">GPU_UUIDs</span><span class="p">[@]</span><span class="k">}</span><span class="p">;</span> index++<span class="o">))</span><span class="p">;</span> <span class="k">do
  </span><span class="nv">GPU</span><span class="o">=</span><span class="k">${</span><span class="nv">GPU_UUIDs</span><span class="p">[index]</span><span class="k">}</span>
  <span class="nb">export </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="k">${</span><span class="nv">GPU</span><span class="k">}</span>
  <span class="nb">export </span><span class="nv">CUDA_MPS_PIPE_DIRECTORY</span><span class="o">=</span>/tmp/mps_<span class="k">${</span><span class="nv">GPU</span><span class="k">}</span>
  <span class="nb">export </span><span class="nv">CUDA_MPS_LOG_DIRECTORY</span><span class="o">=</span>/tmp/mps_log_<span class="k">${</span><span class="nv">GPU</span><span class="k">}</span>
  <span class="nb">echo</span> <span class="s2">"quit"</span> | nvidia-cuda-mps-control
  <span class="nb">rm</span> <span class="nt">-rf</span> /tmp/mps_log_<span class="k">${</span><span class="nv">GPU</span><span class="k">}</span>
  <span class="nb">rm</span> <span class="nt">-rf</span> /tmp/mps_<span class="k">${</span><span class="nv">GPU</span><span class="k">}</span>
  <span class="c"># Reset the GPU compute mode to Default (not supported on MIG-enabled GPUs)</span>
  <span class="c"># nvidia-smi -i $index -c DEFAULT</span>
<span class="k">done

</span>ps <span class="nt">-ef</span> | <span class="nb">grep </span>mps
</code></pre></div></div>

<p>In summary, the script does the following:</p>

<ul>
  <li>Gets the UUIDs of the MIG instances.</li>
  <li>Sets <code class="language-plaintext highlighter-rouge">CUDA_VISIBLE_DEVICES</code> to the MIG instance.</li>
  <li>Sets <code class="language-plaintext highlighter-rouge">CUDA_MPS_PIPE_DIRECTORY</code> and <code class="language-plaintext highlighter-rouge">CUDA_MPS_LOG_DIRECTORY</code> to the corresponding unique directories.</li>
  <li>Shuts down the MPS server on the specified MIG instance.</li>
  <li>Removes the directories.</li>
  <li>Repeats these steps for all the MIG instances.</li>
  <li>Finally, lists the running MPS processes, of which there should be none.</li>
</ul>

<p>Notice that the script does not destroy MIG configuration, and the GPUs will still be in MIG mode. If you want to see how to disable MIG, please refer to the <a href="https://amirsojoodi.github.io/posts/MIG/">previous post</a>.</p>

<h2 id="example">Example</h2>

<p>I have created 2 GPU instances, each with 2 compute instances, on my A30 GPU. This is how it looks:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>nvidia-smi <span class="nt">-L</span>
GPU 0: NVIDIA A30 <span class="o">(</span>UUID: GPU-8f8bff94-112e-9541-43da-cfd453333404<span class="o">)</span>
  MIG 1c.2g.12gb  Device  0: <span class="o">(</span>UUID: MIG-22f0c05f-5cf2-5ea2-8297-789695656dc0<span class="o">)</span>
  MIG 1c.2g.12gb  Device  1: <span class="o">(</span>UUID: MIG-d74dcafc-aad0-58b8-83d8-61bcd963d2e9<span class="o">)</span>
  MIG 1c.2g.12gb  Device  2: <span class="o">(</span>UUID: MIG-27be287f-e2db-5526-a2f6-0bfabcf34af9<span class="o">)</span>
  MIG 1c.2g.12gb  Device  3: <span class="o">(</span>UUID: MIG-235f71be-a125-5ce0-9fe6-0cd97ae57733<span class="o">)</span>
GPU 1: NVIDIA A30 <span class="o">(</span>UUID: GPU-0783f1eb-ab00-d6ec-92e4-8676be77de38<span class="o">)</span>
GPU 2: NVIDIA A30 <span class="o">(</span>UUID: GPU-a90c6e94-391e-0fc3-8fc5-e2ef46ec6d2d<span class="o">)</span>
GPU 3: NVIDIA A30 <span class="o">(</span>UUID: GPU-46d1eefe-dfc8-2f00-16a9-95c08e019d47<span class="o">)</span>

<span class="nv">$ </span>nvidia-smi
Wed Aug 28 17:07:36 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|<span class="o">=========================================</span>+<span class="o">======================</span>+<span class="o">======================</span>|
|   0  NVIDIA A30                     On  | 00000000:17:00.0 Off |                   On |
| N/A   28C    P0              30W / 165W |     50MiB / 24576MiB |     N/A      Default |
|                                         |                      |              Enabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A30                     On  | 00000000:65:00.0 Off |                    0 |
| N/A   28C    P0              30W / 165W |      4MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A30                     On  | 00000000:CA:00.0 Off |                    0 |
| N/A   27C    P0              31W / 165W |      4MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A30                     On  | 00000000:E3:00.0 Off |                    0 |
| N/A   28C    P0              32W / 165W |      4MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| MIG devices:                                                                          |
+------------------+--------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                   Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                     BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                |        ECC|                       |
|<span class="o">==================</span>+<span class="o">================================</span>+<span class="o">===========</span>+<span class="o">=======================</span>|
|  0    1   0   0  |              25MiB / 11968MiB  | 14      0 |  2   0    2    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+                                +-----------+-----------------------+
|  0    1   1   1  |                                | 14      0 |  2   0    2    0    0 |
|                  |                                |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  0    2   0   2  |              25MiB / 11968MiB  | 14      0 |  2   0    2    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+                                +-----------+-----------------------+
|  0    2   1   3  |                                | 14      0 |  2   0    2    0    0 |
|                  |                                |           |                       |
+------------------+--------------------------------+-----------+-----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|<span class="o">=======================================================================================</span>|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
</code></pre></div></div>

<p>Here is the output of <code class="language-plaintext highlighter-rouge">ps -ef | grep mps</code> after running the MPS-enabling script above (with sudo):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>ps <span class="nt">-ef</span> | <span class="nb">grep </span>mps
root      241318       1  0 17:08 ?        00:00:00 nvidia-cuda-mps-control <span class="nt">-d</span>
root      241326       1  0 17:08 ?        00:00:00 nvidia-cuda-mps-control <span class="nt">-d</span>
root      241334       1  0 17:08 ?        00:00:00 nvidia-cuda-mps-control <span class="nt">-d</span>
root      241342       1  0 17:08 ?        00:00:00 nvidia-cuda-mps-control <span class="nt">-d</span>

<span class="c"># And the content of tmp directory</span>
<span class="nv">$ </span><span class="nb">ls</span> <span class="nt">-l</span> /tmp/
total 0
drwxr-xr-x 2 root   root   120 Aug 28 17:08 mps_MIG-22f0c05f-5cf2-5ea2-8297-789695656dc0
drwxr-xr-x 2 root   root   120 Aug 28 17:08 mps_MIG-235f71be-a125-5ce0-9fe6-0cd97ae57733
drwxr-xr-x 2 root   root   120 Aug 28 17:08 mps_MIG-27be287f-e2db-5526-a2f6-0bfabcf34af9
drwxr-xr-x 2 root   root   120 Aug 28 17:08 mps_MIG-d74dcafc-aad0-58b8-83d8-61bcd963d2e9
drwxr-xr-x 2 root   root    80 Aug 28 17:08 mps_log_MIG-22f0c05f-5cf2-5ea2-8297-789695656dc0
drwxr-xr-x 2 root   root    80 Aug 28 17:08 mps_log_MIG-235f71be-a125-5ce0-9fe6-0cd97ae57733
drwxr-xr-x 2 root   root    80 Aug 28 17:08 mps_log_MIG-27be287f-e2db-5526-a2f6-0bfabcf34af9
drwxr-xr-x 2 root   root    80 Aug 28 17:08 mps_log_MIG-d74dcafc-aad0-58b8-83d8-61bcd963d2e9
</code></pre></div></div>

<p>Now, you can run your application on the MIG instances with MPS enabled. For example, you can use the following command to run <code class="language-plaintext highlighter-rouge">deviceQuery</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="nv">CUDA_MPS_PIPE_DIRECTORY</span><span class="o">=</span>/tmp/mps_MIG-22f0c05f-5cf2-5ea2-8297-789695656dc0 <span class="se">\</span>
  <span class="nv">CUDA_MPS_LOG_DIRECTORY</span><span class="o">=</span>/tmp/mps_log_MIG-22f0c05f-5cf2-5ea2-8297-789695656dc0 <span class="se">\</span>
  ./deviceQuery
</code></pre></div></div>

<p>After I ran <code class="language-plaintext highlighter-rouge">deviceQuery</code> on all the MIG instances with the correct PIPE and LOG directories, this is the output of <code class="language-plaintext highlighter-rouge">nvidia-smi</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>nvidia-smi
...
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|<span class="o">=======================================================================================</span>|
|    0    1    0     241923      C   nvidia-cuda-mps-server                       30MiB |
|    0    1    1     254870      C   nvidia-cuda-mps-server                       30MiB |
|    0    2    0     249932      C   nvidia-cuda-mps-server                       30MiB |
|    0    2    1     242189      C   nvidia-cuda-mps-server                       30MiB |
+---------------------------------------------------------------------------------------+
</code></pre></div></div>

<p>Notice the GI and CI IDs, and how each compute instance has its own MPS server. It is worth mentioning that MPS servers are started lazily, so if you do not run any application, the corresponding MPS server will not be started.</p>

<p>By the way, I did not see any difference in the behaviour of the application when passing <code class="language-plaintext highlighter-rouge">CUDA_VISIBLE_DEVICES</code> to the command; it seems that <code class="language-plaintext highlighter-rouge">CUDA_MPS_PIPE_DIRECTORY</code> alone is enough to select the correct MPS server. However, setting <code class="language-plaintext highlighter-rouge">CUDA_VISIBLE_DEVICES</code> is still good practice to avoid confusion, since setting it to a wrong value causes the application to return an error.</p>]]></content><author><name>AmirHossein Sojoodi</name><email>amir.sojoodi@gmail.com</email></author><category term="Programming" /><category term="CUDA" /><category term="MPS" /><category term="MIG" /><summary type="html"><![CDATA[In previous posts, (MPS and MIG), I have explained how to enable MPS and MIG on NVIDIA GPUs. In this post, I will explain how to use both technologies at the same time. In more detail, I would like to enable MPS on all of the MIG instances. For more information, you can refer to the NVIDIA document.]]></summary></entry><entry><title type="html">Wasm + WebGPU example on DCP</title><link href="https://amirsojoodi.github.io/posts/DCP-Wasm-WebGPU-Example/" rel="alternate" type="text/html" title="Wasm + WebGPU example on DCP" /><published>2024-06-20T00:00:00-04:00</published><updated>2024-06-20T00:00:00-04:00</updated><id>https://amirsojoodi.github.io/posts/WebGPU-Wasm-DCP</id><content type="html" xml:base="https://amirsojoodi.github.io/posts/DCP-Wasm-WebGPU-Example/"><![CDATA[<p>This example is a follow-up to my <a href="https://amirsojoodi.github.io/posts/Cross-Platform-WebGPU">previous post</a> on how to write a cross-platform WebGPU example. In this one, I’ll demonstrate how to deploy a matmult example written in C/C++ and WebGPU in a DCP worker using WASM.
Note that for verification purposes, I also provide a <code class="language-plaintext highlighter-rouge">dawn</code>-based native test, but this example doesn’t require building or installing dawn in order to work.</p>

<h2 id="what-is-dcp">What is DCP?</h2>

<p>DCP is a secure and powerful parallel computing platform built on web technology. For more information, take a look at the <a href="https://docs.dcp.dev/">documentation</a>.</p>

<h2 id="this-example">This example</h2>

<p>We will experiment with a matrix multiplication in C/C++ using WebGPU. The matrices are stored in 1D arrays, with the dimensions as the first and second elements and the data stored afterwards. To build and run this example, we have three options:</p>

<ol>
  <li>Build to run natively using <code class="language-plaintext highlighter-rouge">dawn</code></li>
  <li>Build to run as a web worker using <code class="language-plaintext highlighter-rouge">emscripten</code></li>
  <li>Build to run as a DCP workload using <code class="language-plaintext highlighter-rouge">emscripten</code> &lt;– we are interested in this one!</li>
</ol>
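<p>The 1D matrix layout described above can be sketched with a small helper. Note that <code class="language-plaintext highlighter-rouge">packMatrix</code> is a hypothetical function for illustration, not part of the example’s source:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cassert&gt;
#include &lt;vector&gt;

// Hypothetical helper (not in the example's source): pack a rows x cols
// matrix, given in row-major order, into the 1D layout this example uses:
// the first two elements hold the dimensions, and the data follows.
std::vector&lt;float&gt; packMatrix(int rows, int cols,
                              const std::vector&lt;float&gt; &amp;data) {
  std::vector&lt;float&gt; packed;
  packed.reserve(2 + rows * cols);
  packed.push_back(static_cast&lt;float&gt;(rows));
  packed.push_back(static_cast&lt;float&gt;(cols));
  packed.insert(packed.end(), data.begin(), data.end());
  return packed;
}

int main() {
  // A 2x3 matrix: dimensions first, then the six values in row-major order.
  std::vector&lt;float&gt; m = packMatrix(2, 3, {1, 2, 3, 4, 5, 6});
  assert(m.size() == 8);
  assert(m[0] == 2.0f &amp;&amp; m[1] == 3.0f); // dimensions
  assert(m[2] == 1.0f &amp;&amp; m[7] == 6.0f); // data, row-major
  return 0;
}
</code></pre></div></div>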

<h3 id="code-structure">Code structure</h3>

<p>By the end of the build process, the directory structure should look like this:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">.</span>
├── clean-and-build.sh
├── deployJob.js
├── node_modules
│   └── ...
├── package
│   ├── build-web
│   │   ├── wasm-webgpu-matmult.js
│   │   └── ...
│   ├── CMakeLists.txt
│   ├── package.dcp
│   └── src
│       ├── closebravo.js
│       ├── index.html
│       ├── openbravo.js
│       └── wasm-webgpu-matmult.cpp
├── package.json
├── package-lock.json
├── README.md
└── updateVersion.js
</code></pre></div></div>

<p>The source <code class="language-plaintext highlighter-rouge">./package/src/wasm-webgpu-matmult.cpp</code> is similar to the source from the previous post:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;cstring&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;iostream&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;iterator&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;vector&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;webgpu/webgpu_cpp.h&gt;</span><span class="cp">
</span>
<span class="cp">#ifdef __EMSCRIPTEN__
#include</span> <span class="cpf">&lt;emscripten.h&gt;</span><span class="cp">
#endif
</span>
<span class="n">wgpu</span><span class="o">::</span><span class="n">Instance</span> <span class="n">instance</span><span class="p">;</span>
<span class="n">wgpu</span><span class="o">::</span><span class="n">Adapter</span> <span class="n">adapter</span><span class="p">;</span>
<span class="n">wgpu</span><span class="o">::</span><span class="n">Device</span> <span class="n">device</span><span class="p">;</span>

<span class="n">wgpu</span><span class="o">::</span><span class="n">Buffer</span> <span class="n">gpuReadBuffer</span><span class="p">;</span>
<span class="kt">size_t</span> <span class="n">resultMatrixSize</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">work_done</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>

<span class="c1">// GetAdapter receives a callback function that is invoked</span>
<span class="c1">// after RequestAdapter resolves.</span>
<span class="kt">void</span> <span class="nf">GetAdapter</span><span class="p">(</span><span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">callback</span><span class="p">)())</span> <span class="p">{</span>
  <span class="n">instance</span><span class="p">.</span><span class="n">RequestAdapter</span><span class="p">(</span>
      <span class="nb">nullptr</span><span class="p">,</span>
      <span class="p">[](</span><span class="n">WGPURequestAdapterStatus</span> <span class="n">status</span><span class="p">,</span> <span class="n">WGPUAdapter</span> <span class="n">cAdapter</span><span class="p">,</span>
         <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">message</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">userdata</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">message</span><span class="p">)</span> <span class="p">{</span>
          <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"RequestAdapter message: "</span> <span class="o">&lt;&lt;</span> <span class="n">message</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="n">WGPURequestAdapterStatus_Success</span><span class="p">)</span> <span class="p">{</span>
          <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"AdapterRequest was not successful"</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
          <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
        <span class="p">}</span>
        <span class="n">adapter</span> <span class="o">=</span> <span class="n">wgpu</span><span class="o">::</span><span class="n">Adapter</span><span class="o">::</span><span class="n">Acquire</span><span class="p">(</span><span class="n">cAdapter</span><span class="p">);</span>
        <span class="c1">// (2) Cast userdata back to the callback and then call it</span>
        <span class="k">reinterpret_cast</span><span class="o">&lt;</span><span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="p">)()</span><span class="o">&gt;</span><span class="p">(</span><span class="n">userdata</span><span class="p">)();</span>
      <span class="p">},</span>
      <span class="c1">// (1) Cast the call back to void pointer and pass it in</span>
      <span class="k">reinterpret_cast</span><span class="o">&lt;</span><span class="kt">void</span> <span class="o">*&gt;</span><span class="p">(</span><span class="n">callback</span><span class="p">));</span>
<span class="p">}</span>

<span class="c1">// Similar to GetAdapter, the callback is called when RequestDevice resolves</span>
<span class="kt">void</span> <span class="n">GetDevice</span><span class="p">(</span><span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">callback</span><span class="p">)())</span> <span class="p">{</span>
  <span class="n">adapter</span><span class="p">.</span><span class="n">RequestDevice</span><span class="p">(</span>
      <span class="nb">nullptr</span><span class="p">,</span>
      <span class="p">[](</span><span class="n">WGPURequestDeviceStatus</span> <span class="n">status</span><span class="p">,</span> <span class="n">WGPUDevice</span> <span class="n">cDevice</span><span class="p">,</span>
         <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">message</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">userdata</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">message</span><span class="p">)</span> <span class="p">{</span>
          <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"RequestDevice message: "</span> <span class="o">&lt;&lt;</span> <span class="n">message</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">!=</span> <span class="n">WGPURequestDeviceStatus_Success</span><span class="p">)</span> <span class="p">{</span>
          <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"DeviceRequest was not successful"</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
          <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
        <span class="p">}</span>
        <span class="n">device</span> <span class="o">=</span> <span class="n">wgpu</span><span class="o">::</span><span class="n">Device</span><span class="o">::</span><span class="n">Acquire</span><span class="p">(</span><span class="n">cDevice</span><span class="p">);</span>
        <span class="n">device</span><span class="p">.</span><span class="n">SetUncapturedErrorCallback</span><span class="p">(</span>
            <span class="p">[](</span><span class="n">WGPUErrorType</span> <span class="n">type</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">message</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">userdata</span><span class="p">)</span> <span class="p">{</span>
              <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"Error: "</span> <span class="o">&lt;&lt;</span> <span class="n">type</span> <span class="o">&lt;&lt;</span> <span class="s">" - message: "</span> <span class="o">&lt;&lt;</span> <span class="n">message</span><span class="p">;</span>
            <span class="p">},</span>
            <span class="nb">nullptr</span><span class="p">);</span>
        <span class="c1">// (2) Cast userdata back to the callback and then call it</span>
        <span class="k">reinterpret_cast</span><span class="o">&lt;</span><span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="p">)()</span><span class="o">&gt;</span><span class="p">(</span><span class="n">userdata</span><span class="p">)();</span>
      <span class="p">},</span>
      <span class="c1">// (1) Cast the call back to void pointer and pass it in</span>
      <span class="k">reinterpret_cast</span><span class="o">&lt;</span><span class="kt">void</span> <span class="o">*&gt;</span><span class="p">(</span><span class="n">callback</span><span class="p">));</span>
<span class="p">}</span>

<span class="k">const</span> <span class="kt">char</span> <span class="n">shaderCode</span><span class="p">[]</span> <span class="o">=</span> <span class="s">R"(
    struct Matrix {
        size : vec2&lt;f32&gt;,
        numbers: array&lt;f32&gt;,
    };

    @group(0) @binding(0) var&lt;storage, read&gt; firstMatrix : Matrix;
    @group(0) @binding(1) var&lt;storage, read&gt; secondMatrix : Matrix;
    @group(0) @binding(2) var&lt;storage, read_write&gt; resultMatrix : Matrix;

    @compute @workgroup_size(8, 8)
    fn main(@builtin(global_invocation_id) global_id : vec3&lt;u32&gt;) {
        // Guard against out-of-bounds work group sizes
        if (global_id.x &gt;= u32(firstMatrix.size.x) || global_id.y &gt;= u32(secondMatrix.size.y)) {
            return;
        }

        resultMatrix.size = vec2(firstMatrix.size.x, secondMatrix.size.y);

        let resultCell = vec2(global_id.x, global_id.y);
        var result = 0.0;
        for (var i = 0u; i &lt; u32(firstMatrix.size.y); i = i + 1u) {
            let a = i + resultCell.x * u32(firstMatrix.size.y);
            let b = resultCell.y + i * u32(secondMatrix.size.y);
            result = result + firstMatrix.numbers[a] * secondMatrix.numbers[b];
        }

        let index = resultCell.y + resultCell.x * u32(secondMatrix.size.y);
        resultMatrix.numbers[index] = result;
    }
)"</span><span class="p">;</span>

<span class="c1">// This callback is called when the last mapAsync is resolved</span>
<span class="kt">void</span> <span class="n">BufferMapCallbackFunction</span><span class="p">(</span><span class="n">WGPUBufferMapAsyncStatus</span> <span class="n">status</span><span class="p">,</span>
                               <span class="kt">void</span> <span class="o">*</span><span class="n">userdata</span><span class="p">)</span> <span class="p">{</span>

  <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"In buffer async callback, status: "</span> <span class="o">&lt;&lt;</span> <span class="n">status</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>

  <span class="k">if</span> <span class="p">(</span><span class="n">status</span> <span class="o">==</span> <span class="n">WGPUBufferMapAsyncStatus_Success</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">const</span> <span class="kt">float</span> <span class="o">*</span><span class="n">resultData</span> <span class="o">=</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="k">const</span> <span class="kt">float</span> <span class="o">*&gt;</span><span class="p">(</span>
        <span class="n">gpuReadBuffer</span><span class="p">.</span><span class="n">GetConstMappedRange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">resultMatrixSize</span><span class="p">));</span>

    <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"Result Matrix: "</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">resultMatrixSize</span> <span class="o">/</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">);</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
      <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="n">resultData</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">&lt;&lt;</span> <span class="s">" "</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>

    <span class="n">gpuReadBuffer</span><span class="p">.</span><span class="n">Unmap</span><span class="p">();</span>
  <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"Failed to map result buffer"</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
  <span class="p">}</span>
  <span class="o">*</span><span class="k">reinterpret_cast</span><span class="o">&lt;</span><span class="kt">bool</span> <span class="o">*&gt;</span><span class="p">(</span><span class="n">userdata</span><span class="p">)</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="n">RunMatMult</span><span class="p">()</span> <span class="p">{</span>
  <span class="c1">// First Matrix</span>
  <span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span> <span class="n">firstMatrix</span> <span class="o">=</span> <span class="p">{</span><span class="mi">2</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">};</span>
  <span class="kt">size_t</span> <span class="n">firstMatrixSize</span> <span class="o">=</span> <span class="n">firstMatrix</span><span class="p">.</span><span class="n">size</span><span class="p">()</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">);</span>

  <span class="n">wgpu</span><span class="o">::</span><span class="n">Buffer</span> <span class="n">gpuBufferFirstMatrix</span> <span class="o">=</span>
      <span class="n">device</span><span class="p">.</span><span class="n">CreateBuffer</span><span class="p">(</span><span class="k">new</span> <span class="n">wgpu</span><span class="o">::</span><span class="n">BufferDescriptor</span><span class="p">{</span>
          <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="n">wgpu</span><span class="o">::</span><span class="n">BufferUsage</span><span class="o">::</span><span class="n">Storage</span><span class="p">,</span>
          <span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="n">firstMatrixSize</span><span class="p">,</span>
          <span class="p">.</span><span class="n">mappedAtCreation</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
      <span class="p">});</span>
  <span class="n">std</span><span class="o">::</span><span class="n">memcpy</span><span class="p">(</span><span class="n">gpuBufferFirstMatrix</span><span class="p">.</span><span class="n">GetMappedRange</span><span class="p">(),</span> <span class="n">firstMatrix</span><span class="p">.</span><span class="n">data</span><span class="p">(),</span>
              <span class="n">firstMatrixSize</span><span class="p">);</span>
  <span class="n">gpuBufferFirstMatrix</span><span class="p">.</span><span class="n">Unmap</span><span class="p">();</span>

  <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"First Matrix: "</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
  <span class="n">std</span><span class="o">::</span><span class="n">copy</span><span class="p">(</span><span class="n">firstMatrix</span><span class="p">.</span><span class="n">begin</span><span class="p">(),</span> <span class="n">firstMatrix</span><span class="p">.</span><span class="n">end</span><span class="p">(),</span>
            <span class="n">std</span><span class="o">::</span><span class="n">ostream_iterator</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">cout</span><span class="p">,</span> <span class="s">" "</span><span class="p">));</span>
  <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>

  <span class="c1">// Second Matrix</span>
  <span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span> <span class="n">secondMatrix</span> <span class="o">=</span> <span class="p">{</span><span class="mi">4</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">};</span>
  <span class="kt">size_t</span> <span class="n">secondMatrixSize</span> <span class="o">=</span> <span class="n">secondMatrix</span><span class="p">.</span><span class="n">size</span><span class="p">()</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">);</span>

  <span class="n">wgpu</span><span class="o">::</span><span class="n">Buffer</span> <span class="n">gpuBufferSecondMatrix</span> <span class="o">=</span>
      <span class="n">device</span><span class="p">.</span><span class="n">CreateBuffer</span><span class="p">(</span><span class="k">new</span> <span class="n">wgpu</span><span class="o">::</span><span class="n">BufferDescriptor</span><span class="p">{</span>
          <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="n">wgpu</span><span class="o">::</span><span class="n">BufferUsage</span><span class="o">::</span><span class="n">Storage</span><span class="p">,</span>
          <span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="n">secondMatrixSize</span><span class="p">,</span>
          <span class="p">.</span><span class="n">mappedAtCreation</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
      <span class="p">});</span>
  <span class="n">std</span><span class="o">::</span><span class="n">memcpy</span><span class="p">(</span><span class="n">gpuBufferSecondMatrix</span><span class="p">.</span><span class="n">GetMappedRange</span><span class="p">(),</span> <span class="n">secondMatrix</span><span class="p">.</span><span class="n">data</span><span class="p">(),</span>
              <span class="n">secondMatrixSize</span><span class="p">);</span>
  <span class="n">gpuBufferSecondMatrix</span><span class="p">.</span><span class="n">Unmap</span><span class="p">();</span>

  <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"Second Matrix: "</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
  <span class="n">std</span><span class="o">::</span><span class="n">copy</span><span class="p">(</span><span class="n">secondMatrix</span><span class="p">.</span><span class="n">begin</span><span class="p">(),</span> <span class="n">secondMatrix</span><span class="p">.</span><span class="n">end</span><span class="p">(),</span>
            <span class="n">std</span><span class="o">::</span><span class="n">ostream_iterator</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">cout</span><span class="p">,</span> <span class="s">" "</span><span class="p">));</span>
  <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>

  <span class="c1">// Result Matrix</span>
  <span class="n">resultMatrixSize</span> <span class="o">=</span>
      <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="mi">2</span> <span class="o">+</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">size_t</span><span class="o">&gt;</span><span class="p">(</span><span class="n">firstMatrix</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="o">*</span>
                               <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">size_t</span><span class="o">&gt;</span><span class="p">(</span><span class="n">secondMatrix</span><span class="p">[</span><span class="mi">1</span><span class="p">]));</span>

  <span class="n">wgpu</span><span class="o">::</span><span class="n">Buffer</span> <span class="n">resultMatrixBuffer</span> <span class="o">=</span>
      <span class="n">device</span><span class="p">.</span><span class="n">CreateBuffer</span><span class="p">(</span><span class="k">new</span> <span class="n">wgpu</span><span class="o">::</span><span class="n">BufferDescriptor</span><span class="p">{</span>
          <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="n">wgpu</span><span class="o">::</span><span class="n">BufferUsage</span><span class="o">::</span><span class="n">Storage</span> <span class="o">|</span> <span class="n">wgpu</span><span class="o">::</span><span class="n">BufferUsage</span><span class="o">::</span><span class="n">CopySrc</span><span class="p">,</span>
          <span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="n">resultMatrixSize</span><span class="p">,</span>
      <span class="p">});</span>

  <span class="c1">// Compute shader code</span>
  <span class="n">wgpu</span><span class="o">::</span><span class="n">ShaderModuleWGSLDescriptor</span> <span class="n">shaderModuleDesc</span> <span class="o">=</span> <span class="p">{};</span>
  <span class="n">shaderModuleDesc</span><span class="p">.</span><span class="n">code</span> <span class="o">=</span> <span class="n">shaderCode</span><span class="p">;</span>
  <span class="n">wgpu</span><span class="o">::</span><span class="n">ShaderModuleDescriptor</span> <span class="n">shaderModuleDescriptor</span><span class="p">{.</span><span class="n">nextInChain</span> <span class="o">=</span>
                                                          <span class="o">&amp;</span><span class="n">shaderModuleDesc</span><span class="p">};</span>
  <span class="n">wgpu</span><span class="o">::</span><span class="n">ShaderModule</span> <span class="n">shaderModule</span> <span class="o">=</span>
      <span class="n">device</span><span class="p">.</span><span class="n">CreateShaderModule</span><span class="p">(</span><span class="o">&amp;</span><span class="n">shaderModuleDescriptor</span><span class="p">);</span>
  <span class="c1">// Pipeline setup</span>
  <span class="n">wgpu</span><span class="o">::</span><span class="n">ComputePipelineDescriptor</span> <span class="n">pipelineDesc</span> <span class="o">=</span> <span class="p">{};</span>
  <span class="n">pipelineDesc</span><span class="p">.</span><span class="n">compute</span><span class="p">.</span><span class="n">module</span> <span class="o">=</span> <span class="n">shaderModule</span><span class="p">;</span>
  <span class="n">pipelineDesc</span><span class="p">.</span><span class="n">compute</span><span class="p">.</span><span class="n">entryPoint</span> <span class="o">=</span> <span class="s">"main"</span><span class="p">;</span>

  <span class="n">wgpu</span><span class="o">::</span><span class="n">ComputePipeline</span> <span class="n">computePipeline</span> <span class="o">=</span>
      <span class="n">device</span><span class="p">.</span><span class="n">CreateComputePipeline</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pipelineDesc</span><span class="p">);</span>

  <span class="c1">// Bind group</span>
  <span class="n">wgpu</span><span class="o">::</span><span class="n">BindGroupDescriptor</span> <span class="n">bindGroupDesc</span> <span class="o">=</span> <span class="p">{};</span>
  <span class="n">wgpu</span><span class="o">::</span><span class="n">BindGroupEntry</span> <span class="n">entries</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="p">{};</span>
  <span class="n">entries</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">binding</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
  <span class="n">entries</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">buffer</span> <span class="o">=</span> <span class="n">gpuBufferFirstMatrix</span><span class="p">;</span>
  <span class="n">entries</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">binding</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
  <span class="n">entries</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">buffer</span> <span class="o">=</span> <span class="n">gpuBufferSecondMatrix</span><span class="p">;</span>
  <span class="n">entries</span><span class="p">[</span><span class="mi">2</span><span class="p">].</span><span class="n">binding</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
  <span class="n">entries</span><span class="p">[</span><span class="mi">2</span><span class="p">].</span><span class="n">buffer</span> <span class="o">=</span> <span class="n">resultMatrixBuffer</span><span class="p">;</span>
  <span class="n">bindGroupDesc</span><span class="p">.</span><span class="n">entryCount</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
  <span class="n">bindGroupDesc</span><span class="p">.</span><span class="n">entries</span> <span class="o">=</span> <span class="n">entries</span><span class="p">;</span>
  <span class="n">bindGroupDesc</span><span class="p">.</span><span class="n">layout</span> <span class="o">=</span> <span class="n">computePipeline</span><span class="p">.</span><span class="n">GetBindGroupLayout</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>

  <span class="n">wgpu</span><span class="o">::</span><span class="n">BindGroup</span> <span class="n">bindGroup</span> <span class="o">=</span> <span class="n">device</span><span class="p">.</span><span class="n">CreateBindGroup</span><span class="p">(</span><span class="o">&amp;</span><span class="n">bindGroupDesc</span><span class="p">);</span>

  <span class="c1">// Commands submission</span>
  <span class="n">wgpu</span><span class="o">::</span><span class="n">CommandEncoder</span> <span class="n">commandEncoder</span> <span class="o">=</span> <span class="n">device</span><span class="p">.</span><span class="n">CreateCommandEncoder</span><span class="p">();</span>

  <span class="n">wgpu</span><span class="o">::</span><span class="n">ComputePassEncoder</span> <span class="n">passEncoder</span> <span class="o">=</span> <span class="n">commandEncoder</span><span class="p">.</span><span class="n">BeginComputePass</span><span class="p">();</span>
  <span class="n">passEncoder</span><span class="p">.</span><span class="n">SetPipeline</span><span class="p">(</span><span class="n">computePipeline</span><span class="p">);</span>
  <span class="n">passEncoder</span><span class="p">.</span><span class="n">SetBindGroup</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">bindGroup</span><span class="p">);</span>
  <span class="kt">uint32_t</span> <span class="n">workgroupCountX</span> <span class="o">=</span>
      <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">uint32_t</span><span class="o">&gt;</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">ceil</span><span class="p">(</span><span class="n">firstMatrix</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">/</span> <span class="mf">8.0</span><span class="n">f</span><span class="p">));</span>
  <span class="kt">uint32_t</span> <span class="n">workgroupCountY</span> <span class="o">=</span>
      <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">uint32_t</span><span class="o">&gt;</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">ceil</span><span class="p">(</span><span class="n">secondMatrix</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">/</span> <span class="mf">8.0</span><span class="n">f</span><span class="p">));</span>
  <span class="n">passEncoder</span><span class="p">.</span><span class="n">DispatchWorkgroups</span><span class="p">(</span><span class="n">workgroupCountX</span><span class="p">,</span> <span class="n">workgroupCountY</span><span class="p">);</span>
  <span class="n">passEncoder</span><span class="p">.</span><span class="n">End</span><span class="p">();</span>

  <span class="c1">// Get a GPU buffer for reading in an unmapped state</span>
  <span class="n">gpuReadBuffer</span> <span class="o">=</span> <span class="n">device</span><span class="p">.</span><span class="n">CreateBuffer</span><span class="p">(</span><span class="k">new</span> <span class="n">wgpu</span><span class="o">::</span><span class="n">BufferDescriptor</span><span class="p">{</span>
      <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="n">wgpu</span><span class="o">::</span><span class="n">BufferUsage</span><span class="o">::</span><span class="n">CopyDst</span> <span class="o">|</span> <span class="n">wgpu</span><span class="o">::</span><span class="n">BufferUsage</span><span class="o">::</span><span class="n">MapRead</span><span class="p">,</span>
      <span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="n">resultMatrixSize</span><span class="p">,</span>
  <span class="p">});</span>

  <span class="c1">// Encode commands for copying buffer to buffer</span>
  <span class="n">commandEncoder</span><span class="p">.</span><span class="n">CopyBufferToBuffer</span><span class="p">(</span><span class="n">resultMatrixBuffer</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">gpuReadBuffer</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span>
                                    <span class="n">resultMatrixSize</span><span class="p">);</span>

  <span class="c1">// Submit GPU commands</span>
  <span class="n">wgpu</span><span class="o">::</span><span class="n">CommandBuffer</span> <span class="n">commands</span> <span class="o">=</span> <span class="n">commandEncoder</span><span class="p">.</span><span class="n">Finish</span><span class="p">();</span>
  <span class="n">device</span><span class="p">.</span><span class="n">GetQueue</span><span class="p">().</span><span class="n">Submit</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">commands</span><span class="p">);</span>

  <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"Commands submitted to the GPU Queue"</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>

  <span class="n">gpuReadBuffer</span><span class="p">.</span><span class="n">MapAsync</span><span class="p">(</span><span class="n">wgpu</span><span class="o">::</span><span class="n">MapMode</span><span class="o">::</span><span class="n">Read</span><span class="p">,</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="mi">0</span><span class="p">,</span> <span class="n">resultMatrixSize</span><span class="p">,</span>
                         <span class="n">BufferMapCallbackFunction</span><span class="p">,</span>
                         <span class="k">reinterpret_cast</span><span class="o">&lt;</span><span class="kt">void</span> <span class="o">*&gt;</span><span class="p">(</span><span class="o">&amp;</span><span class="n">work_done</span><span class="p">));</span>
<span class="p">}</span>

<span class="c1">// The content of this function could be in the main()</span>
<span class="c1">// I wrote it like this to show how function export works with Emscripten.</span>
<span class="c1">// It also makes it easier to pass the necessary arguments from the JS side.</span>
<span class="k">extern</span> <span class="s">"C"</span> <span class="p">{</span>
<span class="kt">void</span> <span class="n">RunMatMultWrapper</span><span class="p">()</span> <span class="p">{</span>
  <span class="n">instance</span> <span class="o">=</span> <span class="n">wgpu</span><span class="o">::</span><span class="n">CreateInstance</span><span class="p">();</span>

  <span class="n">GetAdapter</span><span class="p">([]()</span> <span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"GPU Adapter acquired."</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
    <span class="n">GetDevice</span><span class="p">([]()</span> <span class="p">{</span>
      <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"GPU Device acquired."</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
      <span class="n">RunMatMult</span><span class="p">();</span>
    <span class="p">});</span>
  <span class="p">});</span>

  <span class="c1">// https://eliemichel.github.io/LearnWebGPU/getting-started/the-command-queue.html#device-polling</span>
<span class="cp">#ifdef __EMSCRIPTEN__
</span>  <span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="n">work_done</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">emscripten_sleep</span><span class="p">(</span><span class="mi">100</span><span class="p">);</span>
  <span class="p">}</span>
<span class="cp">#else
</span>  <span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="n">work_done</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">instance</span><span class="p">.</span><span class="n">ProcessEvents</span><span class="p">();</span>
  <span class="p">}</span>
<span class="cp">#endif
</span><span class="p">}</span>
<span class="p">}</span>

<span class="kt">int</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
  <span class="n">RunMatMultWrapper</span><span class="p">();</span>
  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
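<p>As a quick sanity check, the shader's flat-index arithmetic can be mirrored on the CPU. This is an illustrative sketch only, not part of the project; <code class="language-plaintext highlighter-rouge">MatMulReference</code> is a hypothetical helper that follows the same <code class="language-plaintext highlighter-rouge">Matrix</code> layout (two size floats followed by the row-major data):</p>

```cpp
#include <cstddef>
#include <vector>

// CPU-side sketch of the shader's indexing (hypothetical reference, not the
// project's code). Layout mirrors the WGSL Matrix struct: {rows, cols, data...}.
std::vector<float> MatMulReference(const std::vector<float> &a,
                                   const std::vector<float> &b) {
  const size_t aRows = static_cast<size_t>(a[0]);
  const size_t aCols = static_cast<size_t>(a[1]);
  const size_t bCols = static_cast<size_t>(b[1]);
  std::vector<float> out(aRows * bCols, 0.0f);
  for (size_t x = 0; x < aRows; ++x) {    // plays the role of global_id.x
    for (size_t y = 0; y < bCols; ++y) {  // plays the role of global_id.y
      float result = 0.0f;
      for (size_t i = 0; i < aCols; ++i) {
        // Same flat indices as the WGSL loop, offset by 2 for the size header
        result += a[2 + i + x * aCols] * b[2 + y + i * bCols];
      }
      out[y + x * bCols] = result;
    }
  }
  return out;
}
```

<p>With the sample matrices above (a 2&times;4 times a 4&times;2), this yields {50, 60, 114, 140}, which is what the mapped read-back buffer should print.</p>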

<h2 id="requirements">Requirements</h2>

<ul>
  <li>CMake</li>
  <li>Emscripten</li>
  <li>DCP setup</li>
</ul>

<p>You can use <a href="https://github.com/emscripten-core/emsdk">Emscripten SDK</a> to install all the required tools.
Make sure to set the environment variables (either in your shell profile or every time you want to use the WASM toolchain):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>emsdk <span class="nb">install </span>latest
emsdk activate latest
<span class="nb">source</span> <span class="s2">"/path/to/emsdk/emsdk_env.sh"</span>
</code></pre></div></div>

<p>To start your DCP journey, see the <a href="https://docs.dcp.dev/intro/getting-setup.html">DCP setup guide</a>.</p>

<h2 id="build-and-run">Build and Run</h2>

<p>The current example is tested with Emscripten <code class="language-plaintext highlighter-rouge">3.1.61</code> (for DCP) and dawn <code class="language-plaintext highlighter-rouge">chrome/6562</code> (as standalone).</p>

<h3 id="build">Build</h3>

<p>You can build the project using the <code class="language-plaintext highlighter-rouge">./clean-and-build.sh</code> script with one of these options:</p>

<ol>
  <li>DCP (to build and deploy the package)</li>
  <li>web (to test the code as a standalone web example)</li>
  <li>native (to run the binary file natively, again standalone)</li>
</ol>

<p>For <code class="language-plaintext highlighter-rouge">DCP</code> and <code class="language-plaintext highlighter-rouge">web</code> options, the script uses emscripten cmake:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>emcmake cmake <span class="nt">-B</span> build-web <span class="o">&amp;&amp;</span> cmake <span class="nt">--build</span> build-web
</code></pre></div></div>

<p>For native builds, make sure to set the correct path to the <code class="language-plaintext highlighter-rouge">dawn</code> directory in the <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code>; the script then has CMake take care of building it with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cmake <span class="nt">-B</span> build <span class="o">&amp;&amp;</span> cmake <span class="nt">--build</span> build <span class="nt">-j4</span>

<span class="c"># For debugging, you can add the following option</span>
cmake <span class="nt">-DCMAKE_BUILD_TYPE</span><span class="o">=</span>Debug <span class="nt">-B</span> build <span class="o">&amp;&amp;</span> cmake <span class="nt">--build</span> build <span class="nt">-j4</span>
</code></pre></div></div>

<p>This is the content of the <code class="language-plaintext highlighter-rouge">./clean-and-build.sh</code> script:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>

<span class="nb">set</span> <span class="nt">-eux</span>

<span class="c"># Arg could be DCP (default), web, or native</span>
<span class="c"># Default to DCP if no argument is passed</span>
<span class="nv">MODE</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">1</span><span class="k">:-</span><span class="nv">DCP</span><span class="k">}</span><span class="s2">"</span>

<span class="nv">BUILD_DIR</span><span class="o">=</span>package/build
<span class="nv">BUILD_WEB_DIR</span><span class="o">=</span>package/build-web

<span class="c"># Function to prompt the user for confirmation</span>
confirm<span class="o">()</span> <span class="o">{</span>
  <span class="nb">local dir</span><span class="o">=</span><span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span>
  <span class="nb">read</span> <span class="nt">-r</span> <span class="nt">-p</span> <span class="s2">"Are you sure you want to remove </span><span class="k">${</span><span class="nv">dir</span><span class="k">}</span><span class="s2">? [y/N] "</span> response
  <span class="k">case</span> <span class="s2">"</span><span class="nv">$response</span><span class="s2">"</span> <span class="k">in</span>
  <span class="o">[</span>yY][eE][sS] <span class="p">|</span> <span class="o">[</span>yY]<span class="p">)</span>
    <span class="nb">true</span>
    <span class="p">;;</span>
  <span class="k">*</span><span class="p">)</span>
    <span class="nb">false</span>
    <span class="p">;;</span>
  <span class="k">esac</span>
<span class="o">}</span>
<span class="c"># Set CMake options based on the MODE</span>
<span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$MODE</span><span class="s2">"</span> <span class="o">==</span> <span class="s2">"DCP"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
  </span><span class="nv">CMAKE_OPTIONS</span><span class="o">=</span><span class="s2">"-DBUILD_FOR_DCP=ON"</span>
<span class="k">elif</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$MODE</span><span class="s2">"</span> <span class="o">==</span> <span class="s2">"web"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
  </span><span class="nb">echo</span> <span class="s2">"Standalone web mode is enabled. Setting DCP to off."</span>
  <span class="nv">CMAKE_OPTIONS</span><span class="o">=</span><span class="s2">"-DBUILD_FOR_DCP=OFF"</span>
<span class="k">elif</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$MODE</span><span class="s2">"</span> <span class="o">==</span> <span class="s2">"native"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
  </span><span class="nb">echo</span> <span class="s2">"Standalone native mode is enabled. Setting DCP to off."</span>
  <span class="nv">CMAKE_OPTIONS</span><span class="o">=</span><span class="s2">"-DBUILD_FOR_DCP=OFF"</span>
<span class="k">else
  </span><span class="nb">echo</span> <span class="s2">"No valid option was passed. Options are DCP (default), web, or native. Falling back to DCP."</span>
  <span class="nv">CMAKE_OPTIONS</span><span class="o">=</span><span class="s2">"-DBUILD_FOR_DCP=ON"</span>
  <span class="nv">MODE</span><span class="o">=</span><span class="s2">"DCP"</span>
<span class="k">fi

if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$MODE</span><span class="s2">"</span> <span class="o">==</span> <span class="s2">"native"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
  if </span>confirm <span class="s2">"</span><span class="nv">$BUILD_DIR</span><span class="s2">"</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"Doing a clean build!"</span>
    <span class="nb">rm</span> <span class="nt">-rf</span> <span class="s2">"</span><span class="nv">$BUILD_DIR</span><span class="s2">"</span>
  <span class="k">fi
  </span>cmake <span class="nt">-S</span> package <span class="nt">-B</span> package/build <span class="o">&amp;&amp;</span> cmake <span class="nt">--build</span> package/build <span class="nt">-j4</span>
<span class="k">else
  if </span>confirm <span class="s2">"</span><span class="nv">$BUILD_WEB_DIR</span><span class="s2">"</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"Doing a clean build!"</span>
    <span class="nb">rm</span> <span class="nt">-rf</span> <span class="s2">"</span><span class="nv">$BUILD_WEB_DIR</span><span class="s2">"</span>
  <span class="k">fi
  </span>emcmake cmake <span class="nt">-S</span> package <span class="nt">-B</span> package/build-web <span class="nv">$CMAKE_OPTIONS</span> <span class="o">&amp;&amp;</span>
    cmake <span class="nt">--build</span> package/build-web <span class="nt">--</span> <span class="nv">VERBOSE</span><span class="o">=</span>1
<span class="k">fi</span>

<span class="c"># Run additional commands only if MODE is DCP</span>
<span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$MODE</span><span class="s2">"</span> <span class="o">==</span> <span class="s2">"DCP"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
  </span>node ./updateVersion.js
  npm i <span class="nt">-g</span> dcp-client
  publish package package/package.dcp
<span class="k">fi</span>
</code></pre></div></div>

<ul>
  <li>The <code class="language-plaintext highlighter-rouge">./package/CMakeLists.txt</code> file:</li>
</ul>

<div class="language-cmake highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cmake_minimum_required</span><span class="p">(</span>VERSION 3.13<span class="p">)</span>
<span class="nb">project</span><span class="p">(</span>wasm-webgpu-matmult LANGUAGES C CXX<span class="p">)</span>
<span class="nb">set</span><span class="p">(</span>CMAKE_CXX_STANDARD 20<span class="p">)</span>

<span class="nb">add_executable</span><span class="p">(</span>wasm-webgpu-matmult <span class="s2">"src/wasm-webgpu-matmult.cpp"</span><span class="p">)</span>

<span class="nb">if</span><span class="p">(</span>EMSCRIPTEN<span class="p">)</span>
  <span class="c1"># Create a JS file only, and not the html template file</span>
  <span class="nb">set_target_properties</span><span class="p">(</span>wasm-webgpu-matmult PROPERTIES SUFFIX <span class="s2">".js"</span><span class="p">)</span>

  <span class="nb">target_link_options</span><span class="p">(</span>wasm-webgpu-matmult PRIVATE <span class="s2">"-sSINGLE_FILE=1"</span><span class="p">)</span>

  <span class="c1"># Enable WebGPU through (webgpu/webgpu.h)</span>
  <span class="nb">target_link_options</span><span class="p">(</span>wasm-webgpu-matmult PRIVATE <span class="s2">"-sUSE_WEBGPU=1"</span><span class="p">)</span>

  <span class="c1"># Help with printing stack trace, error prevention</span>
  <span class="nb">target_link_options</span><span class="p">(</span>wasm-webgpu-matmult PRIVATE <span class="s2">"-sASSERTIONS=1"</span><span class="p">)</span>

  <span class="c1"># Enable memory growth at runtime and refrain from throwing exception</span>
  <span class="nb">target_link_options</span><span class="p">(</span>wasm-webgpu-matmult PRIVATE <span class="s2">"-sALLOW_MEMORY_GROWTH=1"</span><span class="p">)</span>

  <span class="c1"># -sWASM=0 would disable WASM module generation (everything in a JS file).</span>
  <span class="c1"># So far, passing -sWASM=0 or -sWASM=1 doesn't make any difference :-?</span>
  <span class="nb">target_link_options</span><span class="p">(</span>wasm-webgpu-matmult PRIVATE <span class="s2">"-sWASM=1"</span><span class="p">)</span>

  <span class="c1"># Whether to support async operations in the compiled code. This makes it</span>
  <span class="c1"># possible to call JS functions from synchronous-looking code in C/C++.</span>
  <span class="nb">target_link_options</span><span class="p">(</span>wasm-webgpu-matmult PRIVATE <span class="s2">"-sASYNCIFY=1"</span><span class="p">)</span>

  <span class="c1"># Enable optimization in code speed and size</span>
  <span class="nb">target_link_options</span><span class="p">(</span>wasm-webgpu-matmult PRIVATE <span class="s2">"-O3"</span><span class="p">)</span>

  <span class="nb">target_link_options</span><span class="p">(</span>
    wasm-webgpu-matmult PRIVATE
    <span class="s2">"-sEXPORTED_RUNTIME_METHODS=['ccall','cwrap','callMain']"</span>
  <span class="p">)</span>

  <span class="c1"># Symbols that are explicitly exported. These symbols are kept alive through</span>
  <span class="c1"># LLVM dead code elimination, and also made accessible outside of the</span>
  <span class="c1"># generated code even after running closure compiler (on "Module").  Native</span>
  <span class="c1"># symbols listed here require an ``_`` prefix. By default if this setting is</span>
  <span class="c1"># not specified on the command line the ``_main`` function will be implicitly</span>
  <span class="c1"># exported.  In STANDALONE_WASM mode the default export is ``__start`` (or</span>
  <span class="c1"># ``__initialize`` if --no-entry is specified). JS Library symbols can also be</span>
  <span class="c1"># added to this list (without the leading `$`). var EXPORTED_FUNCTIONS = [];</span>
  <span class="nb">target_link_options</span><span class="p">(</span>
    wasm-webgpu-matmult PRIVATE
    <span class="s2">"-sEXPORTED_FUNCTIONS=['_RunMatMultWrapper','_main']"</span>
  <span class="p">)</span>

  <span class="c1"># Whether we will run the main() function. Disable if you embed the generated</span>
  <span class="c1"># code in your own, and will call main() yourself at the right time (which you</span>
  <span class="c1"># can do with Module.callMain())</span>
  <span class="nb">if</span><span class="p">(</span>BUILD_FOR_DCP<span class="p">)</span>
    <span class="nb">target_link_options</span><span class="p">(</span>wasm-webgpu-matmult PRIVATE <span class="s2">"-sINVOKE_RUN=0"</span><span class="p">)</span>
  <span class="nb">else</span><span class="p">()</span>
    <span class="nb">target_link_options</span><span class="p">(</span>wasm-webgpu-matmult PRIVATE <span class="s2">"-sINVOKE_RUN=1"</span><span class="p">)</span>
  <span class="nb">endif</span><span class="p">()</span>

  <span class="c1"># Specify which runtime environments the JS output will be capable of running</span>
  <span class="c1"># in.  For maximum portability this can be configured to support all environments</span>
  <span class="c1"># or it can be limited to reduce overall code size.</span>
  <span class="c1"># var ENVIRONMENT = 'web,webview,worker,node';</span>
  <span class="nb">target_link_options</span><span class="p">(</span>wasm-webgpu-matmult PRIVATE <span class="s2">"-sENVIRONMENT=worker"</span><span class="p">)</span>

  <span class="c1"># If set to 0, does not build in any filesystem support. Useful if you are</span>
  <span class="c1"># just doing pure computation, but not reading files or using any streams</span>
  <span class="c1"># (including fprintf, and other stdio.h things) or anything related.</span>
  <span class="nb">target_link_options</span><span class="p">(</span>wasm-webgpu-matmult PRIVATE <span class="s2">"-sFILESYSTEM=1"</span><span class="p">)</span>

  <span class="nb">if</span><span class="p">(</span>BUILD_FOR_DCP<span class="p">)</span>
    <span class="nb">target_link_options</span><span class="p">(</span>
      wasm-webgpu-matmult PUBLIC
      <span class="s2">"--extern-pre-js=</span><span class="si">${</span><span class="nv">PROJECT_SOURCE_DIR</span><span class="si">}</span><span class="s2">/src/openbravo.js"</span>
    <span class="p">)</span>

    <span class="nb">target_link_options</span><span class="p">(</span>
      wasm-webgpu-matmult PUBLIC
      <span class="s2">"--extern-post-js=</span><span class="si">${</span><span class="nv">PROJECT_SOURCE_DIR</span><span class="si">}</span><span class="s2">/src/closebravo.js"</span>
    <span class="p">)</span>
  <span class="nb">endif</span><span class="p">()</span>
  
<span class="nb">else</span><span class="p">()</span>
  <span class="nb">set</span><span class="p">(</span>DAWN_FETCH_DEPENDENCIES ON<span class="p">)</span>
  <span class="nb">add_subdirectory</span><span class="p">(</span><span class="s2">"../../dawn"</span> <span class="s2">"build"</span> EXCLUDE_FROM_ALL<span class="p">)</span>
  <span class="nb">target_link_libraries</span><span class="p">(</span>wasm-webgpu-matmult PRIVATE webgpu_cpp webgpu_dawn<span class="p">)</span>
<span class="nb">endif</span><span class="p">()</span>
</code></pre></div></div>

<h3 id="test-in-the-browser-without-dcp">Test in the browser without DCP</h3>

<p>To test the example in the browser without DCP, simply open the file <code class="language-plaintext highlighter-rouge">index.html</code> under <code class="language-plaintext highlighter-rouge">package/src</code> in the browser.</p>

<p>Make sure to enable <code class="language-plaintext highlighter-rouge">WebGPU</code> in your browser first! For instance, on Linux with Chrome Unstable, launch the browser with the necessary flags:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>google-chrome-unstable <span class="nt">--enable-unsafe-webgpu</span> <span class="nt">--enable-features</span><span class="o">=</span>Vulkan <span class="se">\</span>
  <span class="nt">--disable-dawn-features</span><span class="o">=</span>disallow_unsafe_apis &amp;
</code></pre></div></div>

<p>This is the <code class="language-plaintext highlighter-rouge">./package/src/index.html</code> file:</p>

<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">&lt;!doctype html&gt;</span>
<span class="nt">&lt;html</span> <span class="na">lang=</span><span class="s">"en"</span><span class="nt">&gt;</span>
  <span class="nt">&lt;head&gt;</span>
    <span class="nt">&lt;meta</span> <span class="na">charset=</span><span class="s">"UTF-8"</span> <span class="nt">/&gt;</span>
    <span class="nt">&lt;title&gt;</span>WASM + WebGPU<span class="nt">&lt;/title&gt;</span>
    <span class="nt">&lt;script </span><span class="na">type=</span><span class="s">"module"</span> <span class="na">crossorigin</span> <span class="na">src=</span><span class="s">"../build-web/wasm-webgpu-matmult.js"</span><span class="nt">&gt;&lt;/script&gt;</span>
  <span class="nt">&lt;/head&gt;</span>

  <span class="nt">&lt;body&gt;</span>
    <span class="nt">&lt;pre&gt;</span>Open the console!<span class="nt">&lt;/pre&gt;</span>
  <span class="nt">&lt;/body&gt;</span>
<span class="nt">&lt;/html&gt;</span>
</code></pre></div></div>

<h3 id="test-the-binary-natively">Test the binary natively</h3>

<p>This step is optional, but if you build the standalone native binary, the output should look something like this:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>./package/build/wasm-webgpu-matmult

GPU Adapter acquired.
Warning: SetUncapturedErrorCallback is deprecated. Pass the callback <span class="k">in </span>the device descriptor instead.
GPU Device acquired.
First Matrix: 
2 4 1 2 3 4 5 6 7 8 
Second Matrix: 
4 2 1 2 3 4 5 6 7 8 
Commands submitted to the GPU Queue
Warning: Old MapAsync APIs are deprecated. If using C please pass a CallbackInfo struct that has two userdatas. Otherwise, <span class="k">if </span>using C++, please use templated helpers.
In Buffer async call back, status: 1
Result Matrix: 
2 2 50 60 114 140 
Warning: No Dawn device lost callback was set. This is probably not intended. If you really want to ignore device lost and suppress this message, <span class="nb">set </span>the callback explicitly.
</code></pre></div></div>

<h3 id="deploying-the-job-on-dcp">Deploying the job on DCP</h3>

<p>Again, the script <code class="language-plaintext highlighter-rouge">./clean-and-build.sh DCP</code> performs the necessary preparation steps. More specifically:</p>

<ol>
  <li>The code gets built and wrapped between <code class="language-plaintext highlighter-rouge">openbravo.js</code> and <code class="language-plaintext highlighter-rouge">closebravo.js</code> under the <code class="language-plaintext highlighter-rouge">package/src/</code> directory to make it a DCP-friendly module.</li>
  <li>The version number in <code class="language-plaintext highlighter-rouge">package/package.dcp</code> will be updated.</li>
  <li>The npm package <code class="language-plaintext highlighter-rouge">dcp-client</code> will be installed.</li>
  <li>The source <code class="language-plaintext highlighter-rouge">wasm-webgpu-matmult.js</code> under <code class="language-plaintext highlighter-rouge">package/build-web/</code> will be deployed.</li>
</ol>

<p>After that, the job can be deployed to the scheduler specified in the environment variable <code class="language-plaintext highlighter-rouge">DCP_SCHEDULER_LOCATION</code>. Make sure you have valid authentication keys. Also, note that the current deploy function in <code class="language-plaintext highlighter-rouge">deployJob.js</code> deploys the job to the default compute group.</p>
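<p>As a quick sketch (the URL below is a placeholder, not a real scheduler; substitute your own deployment's address), you can also set the scheduler location from Node itself before <code class="language-plaintext highlighter-rouge">dcp-client</code> initializes:</p>

```javascript
// Hypothetical sketch: pick the scheduler before dcp-client initializes.
// 'https://scheduler.example.com' is a placeholder -- replace it with the
// address of the scheduler you actually deploy to.
if (!process.env.DCP_SCHEDULER_LOCATION) {
  process.env.DCP_SCHEDULER_LOCATION = 'https://scheduler.example.com';
}
console.log(`Deploying to: ${process.env.DCP_SCHEDULER_LOCATION}`);
```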

<ul>
  <li>The <code class="language-plaintext highlighter-rouge">./updateVersion.js</code> script, which updates the version number in the <code class="language-plaintext highlighter-rouge">./package/package.dcp</code> file:</li>
</ul>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">fs</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">'</span><span class="s1">node:fs</span><span class="dl">'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">content</span> <span class="o">=</span> <span class="nx">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span>
  <span class="nx">fs</span><span class="p">.</span><span class="nx">readFileSync</span><span class="p">(</span><span class="dl">'</span><span class="s1">./package/package.dcp</span><span class="dl">'</span><span class="p">,</span> <span class="p">{</span> <span class="na">encoding</span><span class="p">:</span> <span class="dl">'</span><span class="s1">utf8</span><span class="dl">'</span> <span class="p">}),</span>
<span class="p">);</span>

<span class="kd">const</span> <span class="nx">version</span> <span class="o">=</span> <span class="nx">content</span><span class="p">.</span><span class="nx">version</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="dl">'</span><span class="s1">.</span><span class="dl">'</span><span class="p">);</span>
<span class="nx">version</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="o">+</span><span class="nx">version</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="nx">content</span><span class="p">.</span><span class="nx">version</span> <span class="o">=</span> <span class="nx">version</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="dl">'</span><span class="s1">.</span><span class="dl">'</span><span class="p">);</span>

<span class="nx">fs</span><span class="p">.</span><span class="nx">writeFileSync</span><span class="p">(</span><span class="dl">'</span><span class="s1">./package/package.dcp</span><span class="dl">'</span><span class="p">,</span> <span class="nx">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">content</span><span class="p">),</span> <span class="p">{</span>
  <span class="na">encoding</span><span class="p">:</span> <span class="dl">'</span><span class="s1">utf8</span><span class="dl">'</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div></div>
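<p>The bump itself is a plain patch-level increment; the same logic in isolation:</p>

```javascript
// Standalone sketch of the patch bump in updateVersion.js: split the
// semver string, increment the patch component numerically, re-join.
const version = '0.0.19'.split('.');
version[2] = +version[2] + 1;
console.log(version.join('.')); // → 0.0.20
```

<p>Note that <code class="language-plaintext highlighter-rouge">JSON.stringify(content)</code> writes the file back minified; pass an indent argument (e.g. <code class="language-plaintext highlighter-rouge">JSON.stringify(content, null, 2)</code>) if you want to keep it readable.</p>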

<ul>
  <li>The <code class="language-plaintext highlighter-rouge">./package/package.dcp</code> file:</li>
</ul>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"wasm-webgpu-matmult"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0.0.19"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"files"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"./build-web/wasm-webgpu-matmult.js"</span><span class="p">:</span><span class="w"> </span><span class="s2">"wasm-webgpu-matmult.js"</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<ul>
  <li>The <code class="language-plaintext highlighter-rouge">./package/src/openbravo.js</code> file:</li>
</ul>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// file name: openbravo.js</span>

<span class="c1">// This is a BravoJS module definition, generated for DCP</span>
<span class="nx">module</span><span class="p">.</span><span class="nx">declare</span><span class="p">([],</span> <span class="kd">function</span><span class="p">(</span><span class="nx">require</span><span class="p">,</span> <span class="nx">exports</span><span class="p">,</span> <span class="nx">module</span><span class="p">)</span> <span class="p">{</span>
</code></pre></div></div>

<ul>
  <li>The <code class="language-plaintext highlighter-rouge">./package/src/closebravo.js</code> file:</li>
</ul>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// file name: closebravo.js</span>

<span class="nx">exports</span><span class="p">.</span><span class="nx">Module</span> <span class="o">=</span> <span class="nx">Module</span><span class="p">;</span>
<span class="nx">exports</span><span class="p">.</span><span class="nx">ccall</span> <span class="o">=</span> <span class="nx">ccall</span><span class="p">;</span>
<span class="nx">exports</span><span class="p">.</span><span class="nx">cwrap</span> <span class="o">=</span> <span class="nx">cwrap</span><span class="p">;</span>
<span class="p">});</span>

<span class="c1">// This concludes the BravoJS module definition</span>
</code></pre></div></div>

<ul>
  <li>This is the content of the <code class="language-plaintext highlighter-rouge">./deployJob.js</code> script:</li>
</ul>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#!/usr/bin/env node
</span>
<span class="k">async</span> <span class="kd">function</span> <span class="nx">workFn</span><span class="p">(</span><span class="nx">sliceNumber</span><span class="p">,</span> <span class="nx">arg</span><span class="p">)</span> <span class="p">{</span>
  <span class="nx">progress</span><span class="p">();</span>
  <span class="kd">const</span> <span class="p">{</span> <span class="nx">Module</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">'</span><span class="s1">wasm-webgpu-matmult.js</span><span class="dl">'</span><span class="p">);</span>

  <span class="k">async</span> <span class="kd">function</span> <span class="nx">matmult</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// cwrap(function name, return type, arg types); a null return type means void</span>
    <span class="kd">const</span> <span class="nx">RunMatMultWrapper</span> <span class="o">=</span> <span class="nx">Module</span><span class="p">.</span><span class="nx">cwrap</span><span class="p">(</span><span class="dl">'</span><span class="s1">RunMatMultWrapper</span><span class="dl">'</span><span class="p">,</span> <span class="kc">null</span><span class="p">,</span> <span class="p">[],</span> <span class="p">{</span>
      <span class="na">async</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span>
    <span class="p">});</span>
    <span class="k">await</span> <span class="nx">RunMatMultWrapper</span><span class="p">();</span>
  <span class="p">}</span>

  <span class="k">return</span> <span class="k">new</span> <span class="nb">Promise</span><span class="p">((</span><span class="nx">res</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="nx">Module</span><span class="p">.</span><span class="nx">onRuntimeInitialized</span><span class="p">)</span> <span class="p">{</span>
      <span class="kd">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="nx">matmult</span><span class="p">();</span>
      <span class="nx">res</span><span class="p">(</span><span class="nx">result</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="nx">Module</span><span class="p">.</span><span class="nx">onRuntimeInitialized</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=&gt;</span> <span class="p">{</span>
        <span class="kd">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="nx">matmult</span><span class="p">();</span>
        <span class="nx">res</span><span class="p">(</span><span class="nx">result</span><span class="p">);</span>
      <span class="p">};</span>
    <span class="p">}</span>
  <span class="p">});</span>
<span class="p">}</span>

<span class="k">async</span> <span class="kd">function</span> <span class="nx">deployJob</span><span class="p">()</span> <span class="p">{</span>
  <span class="k">await</span> <span class="nx">require</span><span class="p">(</span><span class="dl">'</span><span class="s1">dcp-client</span><span class="dl">'</span><span class="p">).</span><span class="nx">init</span><span class="p">();</span>

  <span class="kd">let</span> <span class="nx">startTime</span><span class="p">;</span>

  <span class="kd">const</span> <span class="nx">compute</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">'</span><span class="s1">dcp/compute</span><span class="dl">'</span><span class="p">);</span>
  <span class="kd">const</span> <span class="nx">wallet</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">'</span><span class="s1">dcp/wallet</span><span class="dl">'</span><span class="p">);</span>

  <span class="kd">const</span> <span class="nx">job</span> <span class="o">=</span> <span class="nx">compute</span><span class="p">.</span><span class="k">for</span><span class="p">([</span><span class="mi">1</span><span class="p">],</span> <span class="nx">workFn</span><span class="p">,</span> <span class="p">[</span><span class="mi">0</span><span class="p">]);</span>

  <span class="c1">// Get the stringified message from the worker and log</span>
  <span class="nx">job</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">console</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">message</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">message</span><span class="p">));</span>

  <span class="c1">// job.requirements.discrete = true;</span>
  <span class="nx">job</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">accepted</span><span class="dl">'</span><span class="p">,</span> <span class="p">()</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">'</span><span class="s1"> - Job accepted by scheduler, waiting for results</span><span class="dl">'</span><span class="p">);</span>
    <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">` - Job has id </span><span class="p">${</span><span class="nx">job</span><span class="p">.</span><span class="nx">id</span><span class="p">}</span><span class="s2">`</span><span class="p">);</span>
    <span class="nx">startTime</span> <span class="o">=</span> <span class="nb">Date</span><span class="p">.</span><span class="nx">now</span><span class="p">();</span>
  <span class="p">});</span>

  <span class="nx">job</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">readystatechange</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">arg</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`new ready state: </span><span class="p">${</span><span class="nx">arg</span><span class="p">}</span><span class="s2">`</span><span class="p">);</span>
  <span class="p">});</span>

  <span class="nx">job</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">result</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">ev</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span>
      <span class="s2">` - Received result for slice </span><span class="p">${</span><span class="nx">ev</span><span class="p">.</span><span class="nx">sliceNumber</span><span class="p">}</span><span class="s2"> at </span><span class="p">${</span>
        <span class="nb">Math</span><span class="p">.</span><span class="nx">round</span><span class="p">((</span><span class="nb">Date</span><span class="p">.</span><span class="nx">now</span><span class="p">()</span> <span class="o">-</span> <span class="nx">startTime</span><span class="p">)</span> <span class="o">/</span> <span class="mi">100</span><span class="p">)</span> <span class="o">/</span> <span class="mi">10</span>
      <span class="p">}</span><span class="s2">s`</span><span class="p">,</span>
    <span class="p">);</span>
  <span class="p">});</span>

  <span class="nx">job</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">status</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">ev</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">'</span><span class="s1">Got status update: </span><span class="dl">'</span><span class="p">,</span> <span class="nx">ev</span><span class="p">);</span>
  <span class="p">});</span>

  <span class="nx">job</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">error</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">message</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">message</span><span class="p">));</span>

  <span class="kd">const</span> <span class="nx">ks</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">wallet</span><span class="p">.</span><span class="kd">get</span><span class="p">();</span> <span class="cm">/* usually loads ~/.dcp/default.keystore */</span>
  <span class="nx">job</span><span class="p">.</span><span class="nx">requires</span><span class="p">([</span><span class="dl">'</span><span class="s1">wasm-webgpu-matmult/wasm-webgpu-matmult.js</span><span class="dl">'</span><span class="p">]);</span>
  <span class="nx">job</span><span class="p">.</span><span class="kr">public</span><span class="p">.</span><span class="nx">name</span> <span class="o">=</span> <span class="dl">'</span><span class="s1">wasm-webgpu-matmult</span><span class="dl">'</span><span class="p">;</span>
  <span class="nx">job</span><span class="p">.</span><span class="nx">requirements</span><span class="p">.</span><span class="nx">environment</span> <span class="o">=</span> <span class="p">{</span> <span class="na">webgpu</span><span class="p">:</span> <span class="kc">true</span> <span class="p">};</span>
  <span class="nx">job</span><span class="p">.</span><span class="nx">setPaymentAccountKeystore</span><span class="p">(</span><span class="nx">ks</span><span class="p">);</span>

  <span class="kd">const</span> <span class="nx">results</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">job</span><span class="p">.</span><span class="nx">exec</span><span class="p">();</span>
  <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">'</span><span class="s1">results=</span><span class="dl">'</span><span class="p">,</span> <span class="nb">Array</span><span class="p">.</span><span class="k">from</span><span class="p">(</span><span class="nx">results</span><span class="p">));</span>
<span class="p">}</span>

<span class="nx">exports</span><span class="p">.</span><span class="nx">deployJob</span> <span class="o">=</span> <span class="nx">deployJob</span><span class="p">;</span>
<span class="nx">deployJob</span><span class="p">();</span>
</code></pre></div></div>

<p>To deploy the job, simply run:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>node deployJob.js
</code></pre></div></div>

<p>The current code has a lot of logging messages, so the output should look something like the following:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>node deployJob.js
new ready state: <span class="nb">exec
</span>new ready state: init
new ready state: preauth
new ready state: deploying
new ready state: listeners
new ready state: compute-groups
new ready state: uploading
new ready state: deployed
 - Job accepted by scheduler, waiting <span class="k">for </span>results

<span class="o">{</span>
  level: <span class="s1">'log'</span>,
  message: <span class="o">[</span> <span class="s1">'GPU Adapter acquired.'</span> <span class="o">]</span>,
  sliceNumber: 1
<span class="o">}</span>
<span class="o">{</span>
  level: <span class="s1">'log'</span>,
  message: <span class="o">[</span> <span class="s1">'GPU Device acquired.'</span> <span class="o">]</span>,
  sliceNumber: 1
<span class="o">}</span>
<span class="o">{</span>
  level: <span class="s1">'log'</span>,
  message: <span class="o">[</span> <span class="s1">'First Matrix: '</span> <span class="o">]</span>,
  sliceNumber: 1
<span class="o">}</span>
<span class="o">{</span>
  level: <span class="s1">'log'</span>,
  message: <span class="o">[</span> <span class="s1">'2 4 1 2 3 4 5 6 7 8 '</span> <span class="o">]</span>,
  sliceNumber: 1
<span class="o">}</span>
<span class="o">{</span>
  level: <span class="s1">'log'</span>,
  message: <span class="o">[</span> <span class="s1">'Second Matrix: '</span> <span class="o">]</span>,
  sliceNumber: 1
<span class="o">}</span>
<span class="o">{</span>
  level: <span class="s1">'log'</span>,
  message: <span class="o">[</span> <span class="s1">'4 2 1 2 3 4 5 6 7 8 '</span> <span class="o">]</span>,
  sliceNumber: 1
<span class="o">}</span>
<span class="o">{</span>
  level: <span class="s1">'log'</span>,
  message: <span class="o">[</span> <span class="s1">'Commands submitted to the GPU Queue'</span> <span class="o">]</span>,
  sliceNumber: 1
<span class="o">}</span>
<span class="o">{</span>
  level: <span class="s1">'log'</span>,
  message: <span class="o">[</span> <span class="s1">'In Buffer async call back, status: 0'</span> <span class="o">]</span>,
  sliceNumber: 1
<span class="o">}</span>
<span class="o">{</span>
  level: <span class="s1">'log'</span>,
  message: <span class="o">[</span> <span class="s1">'Result Matrix: '</span> <span class="o">]</span>,
  sliceNumber: 1
<span class="o">}</span>
<span class="o">{</span>
  level: <span class="s1">'log'</span>,
  message: <span class="o">[</span> <span class="s1">'2 2 50 60 114 140 '</span> <span class="o">]</span>,
  sliceNumber: 1
<span class="o">}</span>
 - Received result <span class="k">for </span>slice 1 at 5.2s
Got status update:  <span class="o">{</span>
  runStatus: <span class="s1">'finished'</span>,
  total: 1,
  distributed: 1,
  computed: 1,
<span class="o">}</span>
</code></pre></div></div>

<p>Some points:</p>

<ol>
  <li>The deployJob script deploys the one-slice job with the entry point <code class="language-plaintext highlighter-rouge">workFn</code>. When a worker picks up a job slice, it executes this function.</li>
  <li>Before we can call any function from the generated WASM module, we must wait for the runtime to be initialized. This is done as follows:</li>
</ol>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="kd">const</span> <span class="p">{</span> <span class="nx">Module</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">'</span><span class="s1">wasm-webgpu-matmult.js</span><span class="dl">'</span><span class="p">);</span>

  <span class="k">if</span> <span class="p">(</span><span class="nx">Module</span><span class="p">.</span><span class="nx">onRuntimeInitialized</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// the module is already initialized</span>
    <span class="c1">// ...</span>
  <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
    <span class="c1">// the module is not initialized, we will set a callback  </span>
    <span class="nx">Module</span><span class="p">.</span><span class="nx">onRuntimeInitialized</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=&gt;</span> <span class="p">{</span>
      <span class="c1">// ..</span>
    <span class="p">};</span>
  <span class="p">}</span>
</code></pre></div></div>
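
<p>If the work function is itself async, the same callback can be wrapped in a small Promise helper so the rest of the function can simply <code class="language-plaintext highlighter-rouge">await</code> it. Below is a minimal sketch; the helper name <code class="language-plaintext highlighter-rouge">runtimeReady</code> is hypothetical, and it assumes the runtime has not finished initializing yet (otherwise, combine it with the check above):</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Sketch: resolve a Promise once the Emscripten runtime has initialized.
// `Module` is assumed to be the object exported by the generated glue code,
// and the runtime is assumed not to have finished initializing yet.
function runtimeReady(Module) {
  return new Promise((resolve) => {
    // Chain any callback registered earlier, then resolve.
    const previous = Module.onRuntimeInitialized;
    Module.onRuntimeInitialized = () => {
      if (typeof previous === 'function') previous();
      resolve();
    };
  });
}

// Usage inside workFn:
//   await runtimeReady(Module);
//   const result = await RunMatMultWrapper();
</code></pre></div></div>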

<ol start="3">
  <li>Next, the function <code class="language-plaintext highlighter-rouge">RunMatMultWrapper</code> gets called, and since we need its return value, the function should behave synchronously. However, the C++ example <code class="language-plaintext highlighter-rouge">wasm-webgpu-matmult.cpp</code> uses callbacks everywhere (to handle device and adapter initialization, etc.). As explained <a href="https://eliemichel.github.io/LearnWebGPU/getting-started/the-command-queue.html#device-polling">here</a>, on the C/C++ side we need to wait briefly and, importantly, tick/poll the device so that it processes its pending tasks. This part of the API is not yet standardized, so we must adapt our implementation to the backend.</li>
  <li>All the options in <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code> are explained to some extent. Note that we specifically need the <code class="language-plaintext highlighter-rouge">-sASYNCIFY</code> option so that we can <code class="language-plaintext highlighter-rouge">await</code> the <code class="language-plaintext highlighter-rouge">cwrap</code>ped function from our <code class="language-plaintext highlighter-rouge">workFn</code> in JS.</li>
</ol>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">RunMatMultWrapper</span> <span class="o">=</span> <span class="nx">Module</span><span class="p">.</span><span class="nx">cwrap</span><span class="p">(</span><span class="dl">'</span><span class="s1">RunMatMultWrapper</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">null</span><span class="dl">'</span><span class="p">,</span> <span class="p">[</span><span class="dl">'</span><span class="s1">null</span><span class="dl">'</span><span class="p">],</span> <span class="p">{</span><span class="na">async</span><span class="p">:</span> <span class="kc">true</span><span class="p">});</span>
<span class="k">await</span> <span class="nx">RunMatMultWrapper</span><span class="p">();</span>
</code></pre></div></div>
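
<p>For context, the Asyncify-related Emscripten link options (as they might appear in <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code>) look roughly like the sketch below. The target name and the exact flag set are assumptions for illustration; consult the repository’s actual <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code> for the authoritative list:</p>

<div class="language-cmake highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch only: illustrative Emscripten link flags (target name assumed)
target_link_options(wasm-webgpu-matmult PRIVATE
  "-sASYNCIFY"                               # lets JS await the cwrap'ped call
  "-sUSE_WEBGPU=1"                           # enable Emscripten's WebGPU bindings
  "-sEXPORTED_FUNCTIONS=_RunMatMultWrapper"  # expose the C entry point
  "-sEXPORTED_RUNTIME_METHODS=cwrap"         # make cwrap available from JS
)
</code></pre></div></div>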

<p>There you go! This is an example of how to deploy a WebGPU program on DCP using WASM. You can find the complete source of this post <a href="https://github.com/Distributive-Network/dcp-wasm-webgpu-example">here</a>.</p>]]></content><author><name>AmirHossein Sojoodi</name><email>amir.sojoodi@gmail.com</email></author><category term="Programming" /><category term="WebGPU" /><category term="WASM" /><category term="DCP" /><category term="Emscripten" /><summary type="html"><![CDATA[This example is a follow-up to my previous post on how to write a cross-platform WebGPU example. In this one, I’ll demonstrate how to deploy a matmult example written in C/C++ and WebGPU in a DCP worker using WASM. Note that for verification purposes, I also provide a Dawn-based native test, but this example doesn’t require building or installing Dawn in order to work.]]></summary></entry></feed>