=====================================================
Particle Life on GPU: CuPy port for long-form renders
=====================================================

.. raw:: html

   <table class="provenance-header" style="border: 0; border-collapse: collapse; margin: 0 0 16px 0; width: 100%;">

.. raw:: html

   <tr style="border: 0;">

.. raw:: html

   <td style="border: 0; vertical-align: top; padding: 0 24px 0 0;">

..

   | **Source:**
     https://foxhop.net/0b6d520b-7414-11f1-9565-040140774501/particle-life-on-gpu-cupy-port-for-long-form-renders
   | **Snapshot:** 2026-06-30T04:02:31Z
   | **Generator:** Remarkbox ``dba4024``

   *This is a thread snapshot. The living document lives at the source
   URI above — it may have been edited, extended, or replied-to since.*

.. raw:: html

   </td>

.. raw:: html

   <td style="border: 0; vertical-align: top; width: 200px; text-align: right;">

.. figure:: data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAMQAAADEAQAAAADo3bPGAAAB/klEQVR42t2YQW4EQQgDUT7A/3/JDwguei6R5hK1D7uzyhx6pXYAY8xGvzwVH/pNRGRkRVTVnGR16qh+4u35zzfROXdnVkYLZzCLo/s4kan7B2g+tQGlB0dJ2wQOVqUNZ+4WyMDNJ0w4U4yJpaZEk7YhQHvqo3T9eSx80wPhJh4RHLoN4+/Xp/RqtZBoINQqR966T/oGBU60I2/05Sn/xKQUtiOeHoTSxUmVFIvOruPofuqiblV5bLpDdwqs+SvRz8I3JE3C00eBsh15Yx6AVALN9vSp6KyaDBmQN2SuDbwWm9WrSQdtP1n6Ry+4oDSqSmngtRSH0SY6oHc6MfBN8rzjdPImDWXe3edbM3fASPjg0bckou1OEYE57phzICltEFwYUiFDn6o6SwJm0VHX63mTWPMmiTRrp4PXVGWHgyIr/KvD7+jfjzhimieu6/FIdhZAr6RjHfrGJOh+jDw23jC31/DSP8eKTqUs8Wh0q/QkTxZBGnF//kStwonQxe6Ahbuet3o8CPIpqSbA+75XhXmSppalpe7HE4jATh3m3Lqr+7qD183Io287hyz7AsYj15DUziHL/rN9k4lfrLT4g8SGYnp6pziy6tlP1/RoqgZTO0z7dsEzBUWdujw4LNzYq2Lfdugb+zbt06xbtaPIs28fW6WtRMkrAw++63exX6FOrqlbHKOOAAAAAElFTkSuQmCC
   :alt: Scan for living source

   Scan for living source

.. raw:: html

   </td>

.. raw:: html

   </tr>

.. raw:: html

   </table>

.. container::

   | **Code:** ``render_seed_480_panoramic_gpu.py`` (337 lines, single
     file)
   | **Engine:** CuPy + one custom CUDA splat kernel
   | **Host:** ``ai.foxhop.net`` (RTX 4090, sm_89, 24 GB)
   | **Use:** render seed-480 particle-life to MP4, MPEG-2 (DVD-Video),
     or HEVC for BD-R

--------------

What this script renders
------------------------

Particle Life is a 2-D agent toy: N particles, K species, an asymmetric
K×K force matrix, periodic torus boundary on both axes. One seed value
fully determines the matrix, initial positions, & species assignments,
so a given seed gives the same evolving universe every time on every GPU
bit-for-bit. We picked **seed 480** because its matrix produces a
visually active universe that never stalls into a fixed point or limit
cycle for hours of simulation time.
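
The seeding can be sketched on the CPU as follows. This is a minimal
illustration, assuming the NumPy ``default_rng`` draw order that the full
source at the end of this page uses (positions, then species, then
matrix); ``init_universe`` is a hypothetical helper name, not a function
in the script.

.. code-block:: python

   import numpy as np

   def init_universe(seed, n, k, height, width):
       """Hypothetical helper: derive all random state from a single seed."""
       rng = np.random.default_rng(seed)
       size = np.array([height, width], dtype=np.float32)
       # Draw order matters: changing it changes the universe for a given seed.
       pos = rng.random((n, 2)).astype(np.float32) * size   # (y, x) positions
       typ = rng.integers(0, k, n).astype(np.int32)         # species per particle
       A = rng.uniform(-1, 1, (k, k)).astype(np.float32)    # asymmetric K x K matrix
       return pos, typ, A

   a = init_universe(480, 2500, 5, 480, 720)
   b = init_universe(480, 2500, 5, 480, 720)
   same = all(np.array_equal(x, y) for x, y in zip(a, b))
   print(same)

Calling the helper twice with the same seed yields identical arrays,
which is the property the renders rely on for their starting state.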

One CuPy file drives four output targets via a single ``MODE`` switch.
Each preset is a literal dict near the top of the script: flip the
constant, run, & get the target file. No CLI args needed.
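
A trimmed sketch of the preset pattern, with values copied from two of
the presets in the full source. ``duration_seconds`` is a hypothetical
helper added here only to show how frame count & fps give the durations
in the table below; the 20 fps default mirrors the script's
``P.get('fps', 20)``.

.. code-block:: python

   # Values copied from the full source; only two presets shown.
   _PRESETS = {
       'panoramic': dict(width=7680, height=2160, N=115000, iters=4000),
       'dvd_video': dict(width=720, height=480, N=2500, iters=216_000, fps=30),
   }

   MODE = 'dvd_video'          # flip this constant to pick a render
   PARAMS = _PRESETS[MODE]

   def duration_seconds(params, default_fps=20):
       """Hypothetical helper: frame count divided by frames per second."""
       return params['iters'] / params.get('fps', default_fps)

   print(duration_seconds(PARAMS))                  # the 2-hour DVD-Video
   print(duration_seconds(_PRESETS['panoramic']))   # the 200 s pan piece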

--------------

Output modes
------------

========= =========== =========== ======== ============================
mode      resolution  N particles duration target file
========= =========== =========== ======== ============================
panoramic 7680 × 2160 115 000     200 s    h264 .mp4 (~2 GB)
dvd_mp4   720 × 480   2 500       7.5 h    h264 .mp4 on data DVD-R
dvd_video 720 × 480   2 500       2 h      MPEG-2 .mpg → real DVD-Video
bd_4k_6h  3840 × 2160 115 000     6 h      HEVC .mp4 on BD-R
========= =========== =========== ======== ============================

``dvd_video`` is our most-tested target: ffmpeg ``-target ntsc-dvd``
produces a spec-compliant MPEG-2 Program Stream with a silent AC-3 audio
track that ``dvdauthor`` turns into a ``VIDEO_TS/`` folder.
``mkisofs -dvd-video`` packs that folder into an ISO &
``growisofs -dvd-compat -Z`` burns it to a Verbatim DVD-R blank. The disc
plays on cheap DVD players & Xbox 360, verified on both on 2026-06-29.

--------------

The four optimisations that mattered
------------------------------------

Naïve CuPy port of our CPU renderer hit OOM on our 4090 at N=22 000
(peak 19.4 GB of 24 GB). Four changes brought peak down to 2.8 GB at the
same N & raised throughput an order of magnitude:

**1. Custom CUDA splat kernel — one launch per frame.**

The 5×5 soft-particle splat was 25 ``np.add.at`` ops in our CPU code,
which became 25 sequential ``cp.add.at`` kernel launches per frame. We
replaced our entire splat with one ``cp.RawKernel`` that takes
``(pos, buf, kern, N, H, W)`` & atomically adds the 25 contributions for
every particle inside one launch. The whole 720×480 frame splat now
costs one kernel dispatch instead of 25.
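
For reference, the 25-pass scatter-add form can be sketched in NumPy.
This is an assumed reconstruction of the CPU approach described above,
not the original CPU renderer; ``splat_reference`` is a hypothetical
name, & the stamp weights are the ones from the full source.

.. code-block:: python

   import numpy as np

   KERN = np.array([
       [0.05, 0.20, 0.30, 0.20, 0.05],
       [0.20, 0.60, 0.85, 0.60, 0.20],
       [0.30, 0.85, 1.00, 0.85, 0.30],
       [0.20, 0.60, 0.85, 0.60, 0.20],
       [0.05, 0.20, 0.30, 0.20, 0.05],
   ], dtype=np.float32)

   def splat_reference(buf, pos, kern=KERN):
       """Add a 5x5 stamp per particle into buf, wrapping on both axes."""
       H, W = buf.shape
       iy = pos[:, 0].astype(np.int64) % H     # row of each particle
       ix = pos[:, 1].astype(np.int64) % W     # column of each particle
       for dy in range(-2, 3):
           for dx in range(-2, 3):
               # One scatter-add per stamp cell: 25 passes in total.
               np.add.at(buf, ((iy + dy) % H, (ix + dx) % W),
                         kern[dy + 2, dx + 2])
       return buf

   buf = np.zeros((480, 720), dtype=np.float32)
   # Two particles share the corner pixel; np.add.at accumulates both.
   pos = np.array([[0.0, 0.0], [0.0, 0.0], [100.5, 200.5]], dtype=np.float32)
   splat_reference(buf, pos)

The custom kernel does the same arithmetic, but each thread handles one
particle & issues its 25 ``atomicAdd`` calls inside a single launch.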

**2.** ``cp.fuse`` **on the force formula.**

The piecewise force ``f(rn)`` — repulsive below β, attractive between β
& 1, zero beyond — was three array passes in our pure-CuPy version.
Wrapping it in ``@cp.fuse()`` lets CuPy generate one fused elementwise
CUDA kernel that does the whole branch on one read of ``rn``,
``Aij_pairs`` and one write to the output.
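
The same piecewise formula, written in plain NumPy so it runs without a
GPU. The expression is taken from ``_force_value`` in the full source;
``force_value`` here is an illustrative CPU stand-in & does not use
``cp.fuse``.

.. code-block:: python

   import numpy as np

   def force_value(rn, a_ij, beta=0.3):
       """Piecewise force on the normalised distance rn = dist / rmax."""
       rn = np.asarray(rn, dtype=np.float64)
       # Repulsion: -1 at rn = 0, rising linearly to 0 at rn = beta.
       rep = rn / beta - 1.0
       # Attraction: a tent scaled by a_ij, peaking at rn = (1 + beta) / 2.
       att = a_ij * (1.0 - np.abs(2.0 * rn - 1.0 - beta) / (1.0 - beta))
       return np.where(rn < beta, rep, np.where(rn < 1.0, att, 0.0))

   rn = np.array([0.0, 0.15, 0.65, 1.2])
   f = force_value(rn, a_ij=0.5)
   print(f)

In the NumPy form each ``np.where`` & arithmetic step is a separate
array pass, which is the cost the fused kernel avoids.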

**3. Raw bytes straight into ffmpeg's stdin.**

Our CPU script wrote PNGs to disk & let ffmpeg read them back. At 540
000 frames × 720×480 grayscale that would have been 186 GB of raw data
on ``/tmp`` — wouldn't fit. We open ffmpeg as a subprocess with
``stdin=PIPE``, send each frame as ``frame.tobytes()``, & ffmpeg encodes
in parallel with our simulation. Disk usage stays bounded at our final
output size (~4 GB for the 2-hour DVD-Video).
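
A sketch of the streaming pattern under stated assumptions:
``rawvideo_cmd`` & ``stream_frames`` are hypothetical helpers, the ffmpeg
flags are the ones the full source passes for its h264 path, & an
in-memory ``io.BytesIO`` stands in for ffmpeg's stdin so the example runs
without ffmpeg installed.

.. code-block:: python

   import io
   import numpy as np

   def rawvideo_cmd(width, height, fps, out_path):
       """Build an ffmpeg argv that reads 8-bit grayscale frames from stdin."""
       return ['ffmpeg', '-y',
               '-f', 'rawvideo', '-pix_fmt', 'gray',
               '-s', f'{width}x{height}',
               '-framerate', str(fps),
               '-i', '-',
               '-c:v', 'libx264', '-pix_fmt', 'yuv420p',
               str(out_path)]

   def stream_frames(frames, sink):
       """Write each uint8 frame to a binary sink; return bytes written."""
       total = 0
       for frame in frames:
           total += sink.write(frame.tobytes())
       return total

   cmd = rawvideo_cmd(720, 480, 20, 'animation.mp4')
   # Real run (not executed here):
   #   ff = subprocess.Popen(cmd, stdin=subprocess.PIPE); sink = ff.stdin
   sink = io.BytesIO()
   frames = (np.zeros((480, 720), dtype=np.uint8) for _ in range(3))
   written = stream_frames(frames, sink)

   # Raw bytes that never touch the disk in the 7.5-hour render.
   raw_gb = 540_000 * 720 * 480 / 1e9
   print(written, round(raw_gb, 1))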

**4. CUDA async memory pool + periodic** ``free_all_blocks``.

``cp.cuda.set_allocator(cp.cuda.MemoryAsyncPool().malloc)`` lets the
runtime overlap allocation with kernels. ``mem_pool.free_all_blocks()``
fires every 32 frames so peak allocation doesn't climb monotonically
across our 216 000-frame render. The two together kept us under 3 GB
peak across our entire 2-hour render.

--------------

Benchmarks
----------

Wall-clock fps for seed 480: the CPU baseline & two of the four GPU
presets (GPU rows measured on the RTX 4090):

+----------------+----------------+----------------+----------------+
| mode           | resolution     | N              | wall fps       |
+================+================+================+================+
| CPU baseline   | 2880 × 1080    | 22 000         |    6 – 7 fps   |
| (panoramic)    |                |                |                |
+----------------+----------------+----------------+----------------+
| GPU panoramic  | 7680 × 2160    | 115 000        |    33 fps      |
+----------------+----------------+----------------+----------------+
| GPU dvd_video  | 720 × 480      | 2 500          | 326 fps        |
+----------------+----------------+----------------+----------------+

The 2-hour DVD-Video (216 000 frames) renders in about **11 minutes wall
time** on our 4090. The CPU-version equivalent at the same resolution &
N would have taken roughly **9 hours**.
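
The two wall-time figures can be checked from the table. This is a rough
sketch: it assumes the CPU baseline's 6 to 7 fps, which the table reports
at 2880 × 1080 & N = 22 000, also applies to the DVD workload; the
source does not state how the 9-hour estimate was obtained.

.. code-block:: python

   frames = 216_000                      # 2 hours at 30 fps

   gpu_minutes = frames / 326 / 60       # GPU dvd_video rate from the table
   cpu_hours_low = frames / 7 / 3600     # assumption: CPU sustains 7 fps
   cpu_hours_high = frames / 6 / 3600    # assumption: CPU sustains 6 fps

   print(f"GPU: {gpu_minutes:.1f} min")
   print(f"CPU: {cpu_hours_low:.1f} to {cpu_hours_high:.1f} h")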

--------------

Simulation kernel (short recap)
-------------------------------

Per particle, per frame:

1. **Spatial hash** — bin all N particles by ``floor(pos / rmax)`` into
   a grid; sort by cell index; ``cp.searchsorted`` gives a
   ``cell_start[cell_id]`` array in one pass.
2. **Candidate pairs** — for each particle, look up its 3×3 neighbour
   cells; concatenate every particle in those cells as a candidate for
   force interaction. O(N) instead of O(N²).
3. **Distance + torus wrap** — apply periodic boundary in both x & y by
   ``d -= S * round(d / S)``.
4. **Force** — fused kernel returns the piecewise force value; multiply
   by direction & species-pair coefficient ``A[typ_i, typ_j]``; scatter
   back to ``acc`` via ``cp.add.at``.
5. **Integrate** — ``vel = vel * friction + acc * dt`` then
   ``pos = (pos + vel * dt) % S``.
6. **Render** — decay buffer, splat 5×5 gaussian per particle, log
   normalise, emit one uint8 grayscale frame to ffmpeg stdin.
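
Steps 1 & 3 can be sketched in NumPy, where ``argsort`` & ``searchsorted``
behave like their CuPy counterparts. ``build_cell_ranges`` &
``torus_delta`` are hypothetical names for logic that sits inline in the
script's main loop; the four test particles are made up for illustration.

.. code-block:: python

   import numpy as np

   def build_cell_ranges(pos, height, width, cell_size):
       """Sort particles by grid cell; return per-cell [start, end) ranges."""
       grid_h = int(np.ceil(height / cell_size))
       grid_w = int(np.ceil(width / cell_size))
       cell_y = (pos[:, 0] / cell_size).astype(np.int64) % grid_h
       cell_x = (pos[:, 1] / cell_size).astype(np.int64) % grid_w
       cell_idx = cell_y * grid_w + cell_x
       order = np.argsort(cell_idx, kind='stable')
       sorted_idx = cell_idx[order]
       cells = np.arange(grid_h * grid_w)
       start = np.searchsorted(sorted_idx, cells, side='left')
       end = np.searchsorted(sorted_idx, cells, side='right')
       return order, start, end, (grid_h, grid_w)

   def torus_delta(d, size):
       """Minimum-image displacement on a torus of the given (H, W) size."""
       return d - size * np.round(d / size)

   size = np.array([480.0, 720.0])
   pos = np.array([[10.0, 10.0], [470.0, 715.0],
                   [20.0, 30.0], [200.0, 400.0]])
   order, start, end, grid = build_cell_ranges(pos, 480, 720, cell_size=88.0)

   members_of_cell0 = order[start[0]:end[0]]        # particles in cell 0
   wrapped = torus_delta(pos[1] - pos[0], size)     # short way round the torus
   print(members_of_cell0, wrapped)

Particles 0 & 1 sit in opposite corners of the frame, yet the wrapped
displacement is only (-20, -15) because the torus joins the edges.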

``friction = 0.85``, ``dt = 0.6``, ``fs = 0.8``, ``rmax = 88``,
``beta = 0.3``, ``K = 5``. Buffer decays at 0.80 between frames which
gives the visible motion trails.
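
A small worked example of the integrate step & the trail decay, using
the constants above. ``integrate`` is a hypothetical wrapper around the
two update lines from step 5; the starting position & velocity are
made up for illustration.

.. code-block:: python

   import numpy as np

   FRICTION, DT = 0.85, 0.6

   def integrate(pos, vel, acc, size, friction=FRICTION, dt=DT):
       """One damped Euler step with periodic wrap on both axes."""
       vel = vel * friction + acc * dt
       pos = (pos + vel * dt) % size
       return pos, vel

   size = np.array([480.0, 720.0])
   pos = np.array([[479.0, 1.0]])      # one pixel from two edges
   vel = np.array([[5.0, -5.0]])       # heading off both of them
   acc = np.zeros((1, 2))
   pos, vel = integrate(pos, vel, acc, size)

   # Fraction of a splat's brightness still in the buffer 10 frames later.
   trail_after_10 = 0.80 ** 10
   print(pos, vel, round(trail_after_10, 4))

With no force applied, friction alone cuts the velocity to 85 % per
frame, & the particle re-enters on the opposite edges of the torus.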

--------------

Burn pipeline (DVD-Video target)
--------------------------------

Once ``MODE = 'dvd_video'`` produces ``animation.mpg`` (~3.9 GB), four
shell steps cut a playable DVD-Video disc:

::

   # 1. author VIDEO_TS/ from the .mpg (titleset first, then the table of contents)
   cd "$OUT/480_dvd_video"
   mkdir -p dvd_root
   VIDEO_FORMAT=NTSC dvdauthor -o dvd_root \
       -t -f animation.mpg
   VIDEO_FORMAT=NTSC dvdauthor -o dvd_root -T

   # 2. stage extras alongside VIDEO_TS/
   mkdir -p dvd_root/EXTRAS
   cp -r path/to/source dvd_root/EXTRAS/

   # 3. pack ISO
   mkisofs -dvd-video -V "PARTICLE_LIFE" \
       -r -J -o disc.iso dvd_root/

   # 4. burn single-session, finalised
   growisofs -dvd-compat -Z /dev/sr0=disc.iso

``-dvd-compat`` finalises our disc on first write — single session, no
multi-session firmware quirks. Tested cheap-DVD-player & Xbox 360
playback 2026-06-29 on Verbatim MCC 03RG20 (Mitsubishi AZO) blanks.
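
The size of ``animation.mpg`` can be sanity-checked from the bitrates in
the ``dvd_video`` preset. This is a back-of-envelope sketch that ignores
Program Stream mux overhead; the 6 Mbps + 448 kbps defaults are the
figures quoted in the script's own comments, not values verified here.

.. code-block:: python

   video_bps = 4_400_000      # -b:v 4400k
   audio_bps = 192_000        # -b:a 192k
   seconds = 216_000 / 30     # 2 hours

   stream_bytes = (video_bps + audio_bps) * seconds / 8
   stream_gb = stream_bytes / 1e9           # decimal GB, as printed on the blank
   stream_gib = stream_bytes / 1024**3      # binary GiB, as most tools report

   # Same duration at the defaults the script's comments cite.
   default_gb = (6_000_000 + 448_000) * seconds / 8 / 1e9

   print(f"{stream_gb:.2f} GB = {stream_gib:.2f} GiB; defaults: {default_gb:.2f} GB")

The chosen bitrates land under the 4.7 GB capacity of a single-layer
DVD-R, while the defaults would overshoot it.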

--------------

Why one file
------------

Whole pipeline lives in 337 lines of one Python file. No build system,
no config files, no separate tokenizer/encoder/renderer split. ``cupy``
& ``ffmpeg`` are our only runtime requirements; ``dvdauthor`` /
``mkisofs`` / ``growisofs`` only enter our pipeline at burn time, not
render time. Every render mode is one literal dict near the top — to add
a 4K Blu-ray or a vertical-phone aspect, copy & edit one preset.

The renderer & our authoring shell steps together fit on one printed
page. That printed page is in our v2 disc's ``EXTRAS/py/`` directory.

--------------

Full source
-----------

Self-contained — only stdlib + ``numpy`` + ``cupy``. No project-local
imports. Runtime deps: a ``cupy-cudaXX`` wheel matching our CUDA toolkit
& an ``ffmpeg`` binary on ``$PATH``. ``dvdauthor`` / ``mkisofs`` /
``growisofs`` only enter for the DVD burn path, not for render.

.. container::
   :name: cb2

   ::

      #!/usr/bin/env python3
      """GPU port of render_seed_480_panoramic.py using CuPy — optimised.

      Streams raw uint8 frames directly into ffmpeg's stdin (no intermediate
      raw file on disk) so very long renders are bounded by GPU memory, not
      storage. At 540,000 frames × 720×480 the un-streamed raw bytes would
      have been 186 GB — would not fit on /tmp.

      Four presets in `_PRESETS` near the top: the panoramic pan-piece
      (wide aspect, short duration) and three disc-fill targets (DVD data,
      DVD-Video, BD-R; multi-hour durations).  Flip the `MODE` constant
      to select.
      """
      import os
      import subprocess
      import sys
      from pathlib import Path
      import numpy as np
      import cupy as cp


      try:
          cp.cuda.set_allocator(cp.cuda.MemoryAsyncPool().malloc)
      except Exception:
          pass


      # --- which render to do --------------------------------------------------
      # 'panoramic'  : 7680×2160 × 4000 frames  (200 sec, pan-piece, ~2 GB)
      # 'dvd_mp4'    : 720×480 × 540,000 frames (7.5 hours, data DVD-R 4.7 GB)
      # 'dvd_video'  : 720×480 × 216,000 frames (2 hours, real DVD-Video, any player)
      # 'bd_4k_6h'   : 3840×2160 × 432,000 frames (6 hours, fills BD-R 25 GB)
      MODE = 'dvd_video'


      _PRESETS = {
          'panoramic': dict(width=7680, height=2160, N=115000, iters=4000,
                            crf=23, aspect=None),
          # DVD-R MP4 fill (data disc, NOT DVD-Video).  ~1.4 Mbps avg h264.
          'dvd_mp4':   dict(width=720,  height=480,  N=2500,   iters=540_000,
                            bitrate='1400k', maxrate='2500k',
                            bufsize='5000k', aspect='16:9'),
          # Real DVD-Video — bootable in any DVD player + Xbox 360.
          # 720×480 NTSC, ~30 fps, MPEG-2 + silent AC-3 audio, DVD-PS muxer.
          # Uses ffmpeg's `-target ntsc-dvd` which fixes codec/bitrate/GOP/mux
          # to spec.  Output is a .mpg (MPEG-2 Program Stream); dvdauthor
          # turns that into the VIDEO_TS/ folder you burn to disc.
          # DVD-R single layer = 4.7 GB marketed = 4.38 GiB binary.  Target
          # final file at ~4.2 GiB so dvdauthor's IFO/BUP overhead still fits.
          # `-target ntsc-dvd` defaults to 6 Mbps video + 448 kbps audio
          # (5.8 GB total over 2 hr — too big).  Audio is SILENT so 192 kbps
          # is plenty; the savings go to the video budget.
          'dvd_video': dict(width=720, height=480, N=2500,
                            iters=216_000, fps=30,
                            dvd_target='ntsc-dvd',
                            video_bitrate='4400k',
                            audio_bitrate='192k',
                            aspect='16:9',
                            out_ext='.mpg', add_silent_audio=True),
          # BD-R fill (~25 GB) at 3840×2160, 6 hr → ~9.3 Mbps avg, HEVC.
          'bd_4k_6h':  dict(width=3840, height=2160, N=115000, iters=432_000,
                            bitrate='9000k', maxrate='15000k',
                            bufsize='30000k', aspect=None, codec='libx265'),
      }

      PARAMS = _PRESETS[MODE]


      # --- splat kernel (one CUDA launch per frame) ----------------------------
      _SPLAT_KERNEL = cp.RawKernel(r"""
      extern "C" __global__
      void splat(const float* __restrict__ pos,
                 float* __restrict__ buf,
                 const float* __restrict__ kern,
                 const int N, const int H, const int W) {
          int p = blockIdx.x * blockDim.x + threadIdx.x;
          if (p >= N) return;
          float px = pos[p * 2 + 1];
          float py = pos[p * 2];
          int ix = ((int)px % W + W) % W;
          int iy = ((int)py % H + H) % H;
          #pragma unroll
          for (int dy = -2; dy <= 2; ++dy) {
              int ky = ((iy + dy) % H + H) % H;
              #pragma unroll
              for (int dx = -2; dx <= 2; ++dx) {
                  int kx = ((ix + dx) % W + W) % W;
                  float w = kern[(dy + 2) * 5 + (dx + 2)];
                  atomicAdd(&buf[ky * W + kx], w);
              }
          }
      }
      """, 'splat')


      _KERN_HOST = np.array([
          [0.05, 0.20, 0.30, 0.20, 0.05],
          [0.20, 0.60, 0.85, 0.60, 0.20],
          [0.30, 0.85, 1.00, 0.85, 0.30],
          [0.20, 0.60, 0.85, 0.60, 0.20],
          [0.05, 0.20, 0.30, 0.20, 0.05],
      ], dtype=np.float32)
      KERN_GPU = cp.asarray(_KERN_HOST.ravel())

      _DY_OFFSETS = cp.asarray(np.array([-1, -1, -1, 0, 0, 0, 1, 1, 1],
                                        dtype=np.int32))
      _DX_OFFSETS = cp.asarray(np.array([-1, 0, 1, -1, 0, 1, -1, 0, 1],
                                        dtype=np.int32))


      def render_points(buf, pos, S):
          buf *= 0.80
          H, W = int(S[0]), int(S[1])
          threads = 256
          blocks = (pos.shape[0] + threads - 1) // threads
          _SPLAT_KERNEL((blocks,), (threads,),
                        (pos.astype(cp.float32), buf, KERN_GPU,
                         pos.shape[0], H, W))
          buf_max = float(buf.max())
          bright = cp.clip(cp.log1p(buf) / cp.log1p(buf_max + 1e-9) * 255,
                           0, 255).astype(cp.uint8)
          return cp.asnumpy(bright)


      @cp.fuse()
      def _force_value(rn, Aij_pairs, beta):
          rep = rn / beta - 1.0
          att = Aij_pairs * (1.0 - cp.abs(2.0 * rn - 1.0 - beta) / (1.0 - beta))
          return cp.where(rn < beta, rep, cp.where(rn < 1.0, att, 0.0))


      def particle_life_stream(width, height, iters, ffmpeg_stdin,
                                N=2000, K=5, seed=480):
          """Run the simulation; write each uint8 grayscale frame directly
          to the supplied open file/pipe (so encoding can happen in parallel
          with simulation and no large raw file is materialised)."""
          rng = np.random.default_rng(seed)
          S_host = np.array([height, width], dtype=np.float32)
          S = cp.asarray(S_host)

          pos = cp.asarray(rng.random((N, 2)).astype(np.float32)) * S
          vel = cp.zeros((N, 2), dtype=cp.float32)
          typ = cp.asarray(rng.integers(0, K, N).astype(np.int32))
          A = cp.asarray(rng.uniform(-1, 1, (K, K)).astype(np.float32))

          rmax = cp.float32(88.0)
          rmax_val = 88.0
          beta = cp.float32(0.3)
          friction = cp.float32(0.85)
          fs = cp.float32(0.8)
          dt = cp.float32(0.6)

          buf = cp.zeros((height, width), dtype=cp.float32)
          cell_size = rmax_val
          grid_h = int(np.ceil(height / cell_size))
          grid_w = int(np.ceil(width / cell_size))
          n_cells = grid_h * grid_w

          print(f"Rendering {iters} frames at {width}×{height}, N={N} on GPU...",
                flush=True)
          mem_pool = cp.get_default_memory_pool()

          import time
          t_start = time.time()
          last_report = t_start

          for t in range(iters):
              cell_y = (pos[:, 0] / cell_size).astype(cp.int32) % grid_h
              cell_x = (pos[:, 1] / cell_size).astype(cp.int32) % grid_w
              cell_idx = cell_y * grid_w + cell_x

              sort_order = cp.argsort(cell_idx)
              cell_idx_sorted = cell_idx[sort_order]

              all_cells = cp.arange(n_cells, dtype=cp.int32)
              cell_start_all = cp.searchsorted(cell_idx_sorted, all_cells,
                                                side='left')
              cell_end_all = cp.searchsorted(cell_idx_sorted, all_cells,
                                              side='right')
              cell_start = cp.where(cell_end_all > cell_start_all,
                                    cell_start_all, -1)
              cell_end = cell_end_all

              ny = (cell_y[:, None] + _DY_OFFSETS[None, :]) % grid_h
              nx = (cell_x[:, None] + _DX_OFFSETS[None, :]) % grid_w
              neighbour_ids = (ny * grid_w + nx).astype(cp.int32)
              flat_ne = neighbour_ids.reshape(-1)
              starts = cell_start[flat_ne]
              ends = cell_end[flat_ne]
              counts = cp.where(starts >= 0, ends - starts, 0).astype(cp.int32)

              total = int(counts.sum())

              per_particle = counts.reshape(N, 9).sum(axis=1)
              idx_i = cp.repeat(cp.arange(N, dtype=cp.int32), per_particle)

              cum_counts = cp.concatenate(
                  [cp.zeros(1, dtype=cp.int64),
                   cp.cumsum(counts.astype(cp.int64))])
              positions = cp.arange(total, dtype=cp.int64)
              pair_idx = cp.searchsorted(cum_counts[1:], positions,
                                          side='right').astype(cp.int32)
              local_offset = (positions - cum_counts[pair_idx]).astype(cp.int32)
              sort_idx = starts[pair_idx] + local_offset
              idx_j = sort_order[sort_idx].astype(cp.int32)

              keep = idx_i != idx_j
              idx_i = idx_i[keep]
              idx_j = idx_j[keep]

              d = pos[idx_j] - pos[idx_i]
              cp.subtract(d[:, 0], S[0] * cp.round(d[:, 0] / S[0]), out=d[:, 0])
              cp.subtract(d[:, 1], S[1] * cp.round(d[:, 1] / S[1]), out=d[:, 1])
              dist = cp.sqrt((d * d).sum(axis=1))

              within = dist <= rmax
              idx_i = idx_i[within]
              idx_j = idx_j[within]
              d = d[within]
              dist = dist[within]

              rn = dist / rmax
              dirv = d / (dist[:, None] + cp.float32(1e-9))

              Aij_pairs = A[typ[idx_i], typ[idx_j]]
              F_val = _force_value(rn, Aij_pairs, beta)
              force_contrib = fs * F_val[:, None] * dirv

              acc = cp.zeros((N, 2), dtype=cp.float32)
              cp.add.at(acc, idx_i, force_contrib)

              vel = vel * friction + acc * dt
              pos = (pos + vel * dt) % S

              frame_host = render_points(buf, pos, S)
              ffmpeg_stdin.write(frame_host.tobytes())

              if (t & 0x1F) == 0:
                  mem_pool.free_all_blocks()

              # Progress report every ~10 sec wall clock
              now = time.time()
              if now - last_report > 10:
                  elapsed = now - t_start
                  fps = (t + 1) / elapsed
                  eta = (iters - t - 1) / fps
                  print(f"  frame {t + 1}/{iters}  "
                        f"({(t + 1) / iters * 100:5.1f}%)  "
                        f"fps={fps:5.1f}  "
                        f"eta={eta / 60:5.1f} min", flush=True)
                  last_report = now

          print(f"Completed {iters} frames in {time.time() - t_start:.0f} sec")


      if __name__ == "__main__":
          seed = 480
          P = PARAMS
          width, height = P['width'], P['height']
          iters, N = P['iters'], P['N']
          outdir = (f"/home/fox/Downloads/py/output/even_more/particle_life/"
                    f"{seed}_{MODE}")
          Path(outdir).mkdir(parents=True, exist_ok=True)
          fps = P.get('fps', 20)
          out_ext = P.get('out_ext', '.mp4')
          mp4_path = Path(outdir) / f"animation{out_ext}"

          print(f"=== Particle Life Rendering: MODE={MODE} ===")
          print(f"Seed: {seed}   Resolution: {width}×{height}   "
                f"Duration: {iters / fps:.1f} sec ({iters} frames @ {fps} fps)")
          print(f"N: {N}   GPU: "
                f"{cp.cuda.runtime.getDeviceProperties(0)['name'].decode()}")
          print()

          # ffmpeg command construction.  Two distinct flavours: real DVD-Video
          # (-target ntsc-dvd, MPEG-2 PS, silent AC-3 audio track) vs the
          # h264/x265 .mp4 path.  Both stream video frames from stdin so no
          # giant raw file is materialised.
          if P.get('dvd_target'):
              # DVD-Video pipeline — outputs a .mpg you feed to dvdauthor.
              cmd = [
                  'ffmpeg', '-y',
                  '-f', 'rawvideo', '-pix_fmt', 'gray',
                  '-s', f'{width}x{height}',
                  '-framerate', str(fps),
                  '-i', '-',
              ]
              if P.get('add_silent_audio'):
                  cmd += ['-f', 'lavfi',
                          '-i', 'anullsrc=channel_layout=stereo:sample_rate=48000']
              cmd += ['-target', P['dvd_target']]
              # Override video bitrate that -target ntsc-dvd would default to.
              # The DVD-Video max combined bitrate is 10.08 Mbps; we deliberately
              # stay well under so we hit a specific final file size.
              if P.get('video_bitrate'):
                  cmd += ['-b:v', P['video_bitrate']]
              if P.get('audio_bitrate'):
                  cmd += ['-b:a', P['audio_bitrate']]
              if P.get('aspect'):
                  cmd += ['-aspect', P['aspect']]
              cmd += ['-shortest', str(mp4_path)]
          else:
              cmd = [
                  'ffmpeg', '-y',
                  '-f', 'rawvideo', '-pix_fmt', 'gray',
                  '-s', f'{width}x{height}',
                  '-framerate', str(fps),
                  '-i', '-',
                  '-c:v', P.get('codec', 'libx264'),
                  '-pix_fmt', 'yuv420p',
                  '-preset', 'medium',
              ]
              if P.get('bitrate'):
                  cmd += ['-b:v', P['bitrate']]
                  if P.get('maxrate'):
                      cmd += ['-maxrate', P['maxrate'], '-bufsize', P['bufsize']]
              elif P.get('crf') is not None:
                  cmd += ['-crf', str(P['crf'])]
              if P.get('aspect'):
                  cmd += ['-aspect', P['aspect']]
              cmd += [str(mp4_path)]

          print(f"ffmpeg cmd: {' '.join(cmd)}")
          print()
          ff = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                                stderr=subprocess.DEVNULL)
          try:
              particle_life_stream(width, height, iters, ff.stdin,
                                   N=N, seed=seed)
          finally:
              ff.stdin.close()
              ff.wait()

          sz = mp4_path.stat().st_size if mp4_path.exists() else 0
          print(f"\nMP4 saved: {mp4_path} ({sz / 1024**3:.2f} GB, "
                f"{sz / 1024**2:.1f} MB)")
          print(f"\n=== Complete ===")

--------------

Related pages
-------------

-  `3-GPU
   mesh <https://www.foxhop.net/053ba7da-6277-11f1-82fc-040140774501/gpu-mesh>`__
   — hardware context for the 4090 this script targets.

--------------

.. container::

   *Updated 2026-06-29.*

