> **Source:** [https://foxhop.net/0b6d520b-7414-11f1-9565-040140774501/particle-life-on-gpu-cupy-port-for-long-form-renders](https://foxhop.net/0b6d520b-7414-11f1-9565-040140774501/particle-life-on-gpu-cupy-port-for-long-form-renders) > **Snapshot:** 2026-06-30T04:05:37Z > **Generator:** Remarkbox `dba4024` > > *This is a thread snapshot. The living document lives at the source URI above — it may have been edited, extended, or replied-to since.*

![Scan for living source](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAMQAAADEAQAAAADo3bPGAAAB/klEQVR42t2YQW4EQQgDUT7A/3/JDwguei6R5hK1D7uzyhx6pXYAY8xGvzwVH/pNRGRkRVTVnGR16qh+4u35zzfROXdnVkYLZzCLo/s4kan7B2g+tQGlB0dJ2wQOVqUNZ+4WyMDNJ0w4U4yJpaZEk7YhQHvqo3T9eSx80wPhJh4RHLoN4+/Xp/RqtZBoINQqR966T/oGBU60I2/05Sn/xKQUtiOeHoTSxUmVFIvOruPofuqiblV5bLpDdwqs+SvRz8I3JE3C00eBsh15Yx6AVALN9vSp6KyaDBmQN2SuDbwWm9WrSQdtP1n6Ry+4oDSqSmngtRSH0SY6oHc6MfBN8rzjdPImDWXe3edbM3fASPjg0bckou1OEYE57phzICltEFwYUiFDn6o6SwJm0VHX63mTWPMmiTRrp4PXVGWHgyIr/KvD7+jfjzhimieu6/FIdhZAr6RjHfrGJOh+jDw23jC31/DSP8eKTqUs8Wh0q/QkTxZBGnF//kStwonQxe6Ahbuet3o8CPIpqSbA+75XhXmSppalpe7HE4jATh3m3Lqr+7qD183Io287hyz7AsYj15DUziHL/rN9k4lfrLT4g8SGYnp6pziy6tlP1/RoqgZTO0z7dsEzBUWdujw4LNzYq2Lfdugb+zbt06xbtaPIs28fW6WtRMkrAw++63exX6FOrqlbHKOOAAAAAElFTkSuQmCC)

# Particle Life on GPU CuPy port for long-form renders

**Code:** `render_seed_480_panoramic_gpu.py` (337 lines, single file)\ **Engine:** CuPy + one custom CUDA splat kernel\ **Host:** `ai.foxhop.net` (RTX 4090, sm_89, 24 GB)\ **Use:** render seed-480 particle-life to MP4, MPEG-2 (DVD-Video), or HEVC for BD-R

------------------------------------------------------------------------

------------------------------------------------------------------------ ## What this script renders Particle Life is a 2-D agent toy: N particles, K species, an asymmetric K×K force matrix, periodic torus boundary on both axes. One seed value fully determines the matrix, initial positions, & species assignments, so a given seed gives the same evolving universe every time on every GPU bit-for-bit. We picked **seed 480** because its matrix produces a visually active universe that never stalls into a fixed point or limit cycle for hours of simulation time. Single CuPy file drives four output targets via one `MODE` switch. Each preset is a sealed dict at the top of our script --- flip the constant, run, get our target file. No CLI args needed. ------------------------------------------------------------------------ ## Output modes mode resolution N particles duration target file ----------- ------------- ------------- ---------- ------------------------------ panoramic 7680 × 2160 115 000 200 s h264 .mp4 (\~2 GB) dvd_mp4 720 × 480 2 500 7.5 h h264 .mp4 on data DVD-R dvd_video 720 × 480 2 500 2 h MPEG-2 .mpg → real DVD-Video bd_4k_6h 3840 × 2160 115 000 6 h HEVC .mp4 on BD-R `dvd_video` is our most-tested target: ffmpeg `-target ntsc-dvd` produces a spec-compliant MPEG-2 Program Stream with a silent AC-3 audio track that `dvdauthor` turns into a `VIDEO_TS/` folder. That folder gets `mkisofs -dvd-video` packed into an ISO & `growisofs -dvd-compat -Z` burnt to a Verbatim DVD-R blank. Bootable on cheap DVD players & Xbox 360, verified on both 2026-06-29. ------------------------------------------------------------------------ ## The four optimisations that mattered Naïve CuPy port of our CPU renderer hit OOM on our 4090 at N=22 000 (peak 19.4 GB of 24 GB). Four changes brought peak down to 2.8 GB at the same N & raised throughput an order of magnitude: **1. Custom CUDA splat kernel --- one launch per frame.** The 5×5 soft-particle splat was 25 `np.add.at` ops in our CPU code, which became 25 sequential `cp.add.at` kernel launches per frame. We replaced our entire splat with one `cp.RawKernel` that takes `(pos, buf, kern, N, H, W)` & atomically adds the 25 contributions for every particle inside one launch. The whole 720×480 frame splat now costs one kernel dispatch instead of 25. **2. \`\`cp.fuse\`\` on the force formula.** The piecewise force `f(rn)` --- repulsive below β, attractive between β & 1, zero beyond --- was three array passes in our pure-CuPy version. Wrapping it in `@cp.fuse()` lets CuPy generate one fused elementwise CUDA kernel that does the whole branch on one read of `rn`, `Aij_pairs` and one write to the output. **3. Raw bytes straight into ffmpeg\'s stdin.** Our CPU script wrote PNGs to disk & let ffmpeg read them back. At 540 000 frames × 720×480 grayscale that would have been 186 GB of raw data on `/tmp` --- wouldn\'t fit. We open ffmpeg as a subprocess with `stdin=PIPE`, send each frame as `frame.tobytes()`, & ffmpeg encodes in parallel with our simulation. Disk usage stays bounded at our final output size (\~4 GB for the 2-hour DVD-Video). **4. CUDA async memory pool + periodic \`\`free_all_blocks\`\`.** `cp.cuda.set_allocator(cp.cuda.MemoryAsyncPool().malloc)` lets the runtime overlap allocation with kernels. `mem_pool.free_all_blocks()` fires every 32 frames so peak allocation doesn\'t climb monotonically across our 216 000-frame render. The two together kept us under 3 GB peak across our entire 2-hour render. ------------------------------------------------------------------------ ## Benchmarks Wall-clock fps on RTX 4090, seed 480, our four presets: +-----------------+-----------------+-----------------+-----------------+ | mode | resolution | N | wall fps | +=================+=================+=================+=================+ | CPU baseline | 2880 × 1080 | 22 000 | > 6 -- 7 fps | | (panoramic) | | | | +-----------------+-----------------+-----------------+-----------------+ | GPU panoramic | 7680 × 2160 | 115 000 | > 33 fps | +-----------------+-----------------+-----------------+-----------------+ | GPU dvd_video | 720 × 480 | 2 500 | 326 fps | +-----------------+-----------------+-----------------+-----------------+ The 2-hour DVD-Video (216 000 frames) renders in about **11 minutes wall time** on our 4090. The CPU-version equivalent at the same resolution & N would have taken roughly **9 hours**. ------------------------------------------------------------------------ ## Simulation kernel (short recap) Per particle, per frame: 1. **Spatial hash** --- bin all N particles by `floor(pos / rmax)` into a grid; sort by cell index; `cp.searchsorted` gives a `cell_start[cell_id]` array in one pass. 2. **Candidate pairs** --- for each particle, look up its 3×3 neighbour cells; concatenate every particle in those cells as a candidate for force interaction. O(N) instead of O(N²). 3. **Distance + torus wrap** --- apply periodic boundary in both x & y by `d -= S * round(d / S)`. 4. **Force** --- fused kernel returns the piecewise force value; multiply by direction & species-pair coefficient `A[typ_i, typ_j]`; scatter back to `acc` via `cp.add.at`. 5. **Integrate** --- `vel = vel * friction + acc * dt` then `pos = (pos + vel * dt) % S`. 6. **Render** --- decay buffer, splat 5×5 gaussian per particle, log normalise, emit one uint8 grayscale frame to ffmpeg stdin. `friction = 0.85`, `dt = 0.6`, `fs = 0.8`, `rmax = 88`, `beta = 0.3`, `K = 5`. Buffer decays at 0.80 between frames which gives the visible motion trails. ------------------------------------------------------------------------ ## Burn pipeline (DVD-Video target) Once `MODE = 'dvd_video'` produces `animation.mpg` (\~3.9 GB), four shell steps cut a bootable DVD: # 1. author VIDEO_TS/ from the .mpg cd $OUT/480_dvd_video mkdir -p dvd_root VIDEO_FORMAT=NTSC dvdauthor -o dvd_root -T VIDEO_FORMAT=NTSC dvdauthor -o dvd_root \ -f animation.mpg -t # 2. stage extras alongside VIDEO_TS/ mkdir -p dvd_root/EXTRAS cp -r path/to/source dvd_root/EXTRAS/ # 3. pack ISO mkisofs -dvd-video -V "PARTICLE_LIFE" \ -r -J -o disc.iso dvd_root/ # 4. burn single-session, finalised growisofs -dvd-compat -Z /dev/sr0=disc.iso `-dvd-compat` finalises our disc on first write --- single session, no multi-session firmware quirks. Tested cheap-DVD-player & Xbox 360 playback 2026-06-29 on Verbatim MCC 03RG20 (Mitsubishi AZO) blanks. ------------------------------------------------------------------------ ## Why one file Whole pipeline lives in 337 lines of one Python file. No build system, no config files, no separate tokenizer/encoder/renderer split. `cupy` & `ffmpeg` are our only runtime requirements; `dvdauthor` / `mkisofs` / `growisofs` only enter our pipeline at burn time, not render time. Every render mode is one literal dict near the top --- to add a 4K Blu-ray or a vertical-phone aspect, copy & edit one preset. The renderer & our authoring shell steps together fit on one printed page. That printed page is in our v2 disc\'s `EXTRAS/py/` directory. ------------------------------------------------------------------------ ## Full source Self-contained --- only stdlib + `numpy` + `cupy`. No project-local imports. Runtime deps: a `cupy-cudaXX` wheel matching our CUDA toolkit & an `ffmpeg` binary on `$PATH`. `dvdauthor` / `mkisofs` / `growisofs` only enter for the DVD burn path, not for render. ::: {#cb2} #!/usr/bin/env python3 """GPU port of render_seed_480_panoramic.py using CuPy — optimised. Streams raw uint8 frames directly into ffmpeg's stdin (no intermediate raw file on disk) so very long renders are bounded by GPU memory, not storage. At 540,000 frames × 720×480 the un-streamed raw bytes would have been 186 GB — would not fit on /tmp. Two top-level parameter sets at the bottom: one for the panoramic-pan-piece (4K-wide aspect, short duration) and one for the disc-fill target (TV native aspect, multi-hour duration). Flip the `MODE` constant to select. """ import os import subprocess import sys from pathlib import Path import numpy as np import cupy as cp try: cp.cuda.set_allocator(cp.cuda.MemoryAsyncPool().malloc) except Exception: pass # --- which render to do -------------------------------------------------- # 'panoramic' : 7680×2160 × 4000 frames (200 sec, pan-piece, ~2 GB) # 'dvd_mp4' : 720×480 × 540,000 frames (7.5 hours, data DVD-R 4.7 GB) # 'dvd_video' : 720×480 × 216,000 frames (2 hours, real DVD-Video, any player) # 'bd_4k_6h' : 3840×2160 × 432,000 frames (6 hours, fills BD-R 25 GB) MODE = 'dvd_video' _PRESETS = { 'panoramic': dict(width=7680, height=2160, N=115000, iters=4000, crf=23, aspect=None), # DVD-R MP4 fill (data disc, NOT DVD-Video). ~1.4 Mbps avg h264. 'dvd_mp4': dict(width=720, height=480, N=2500, iters=540_000, bitrate='1400k', maxrate='2500k', bufsize='5000k', aspect='16:9'), # Real DVD-Video — bootable in any DVD player + Xbox 360. # 720×480 NTSC, ~30 fps, MPEG-2 + silent AC-3 audio, DVD-PS muxer. # Uses ffmpeg's `-target ntsc-dvd` which fixes codec/bitrate/GOP/mux # to spec. Output is a .mpg (MPEG-2 Program Stream); dvdauthor # turns that into the VIDEO_TS/ folder you burn to disc. # DVD-R single layer = 4.7 GB marketed = 4.38 GiB binary. Target # final file at ~4.2 GiB so dvdauthor's IFO/BUP overhead still fits. # `-target ntsc-dvd` defaults to 6 Mbps video + 448 kbps audio # (5.8 GB total over 2 hr — too big). Audio is SILENT so 192 kbps # is plenty; the savings go to the video budget. 'dvd_video': dict(width=720, height=480, N=2500, iters=216_000, fps=30, dvd_target='ntsc-dvd', video_bitrate='4400k', audio_bitrate='192k', aspect='16:9', out_ext='.mpg', add_silent_audio=True), # BD-R fill (~25 GB) at 3840×2160, 6 hr → ~9.3 Mbps avg, HEVC. 'bd_4k_6h': dict(width=3840, height=2160, N=115000, iters=432_000, bitrate='9000k', maxrate='15000k', bufsize='30000k', aspect=None, codec='libx265'), } PARAMS = _PRESETS[MODE] # --- splat kernel (one CUDA launch per frame) ---------------------------- _SPLAT_KERNEL = cp.RawKernel(r""" extern "C" __global__ void splat(const float* __restrict__ pos, float* __restrict__ buf, const float* __restrict__ kern, const int N, const int H, const int W) { int p = blockIdx.x * blockDim.x + threadIdx.x; if (p >= N) return; float px = pos[p * 2 + 1]; float py = pos[p * 2]; int ix = ((int)px % W + W) % W; int iy = ((int)py % H + H) % H; #pragma unroll for (int dy = -2; dy <= 2; ++dy) { int ky = ((iy + dy) % H + H) % H; #pragma unroll for (int dx = -2; dx <= 2; ++dx) { int kx = ((ix + dx) % W + W) % W; float w = kern[(dy + 2) * 5 + (dx + 2)]; atomicAdd(&buf[ky * W + kx], w); } } } """, 'splat') _KERN_HOST = np.array([ [0.05, 0.20, 0.30, 0.20, 0.05], [0.20, 0.60, 0.85, 0.60, 0.20], [0.30, 0.85, 1.00, 0.85, 0.30], [0.20, 0.60, 0.85, 0.60, 0.20], [0.05, 0.20, 0.30, 0.20, 0.05], ], dtype=np.float32) KERN_GPU = cp.asarray(_KERN_HOST.ravel()) _DY_OFFSETS = cp.asarray(np.array([-1, -1, -1, 0, 0, 0, 1, 1, 1], dtype=np.int32)) _DX_OFFSETS = cp.asarray(np.array([-1, 0, 1, -1, 0, 1, -1, 0, 1], dtype=np.int32)) def render_points(buf, pos, S): buf *= 0.80 H, W = int(S[0]), int(S[1]) threads = 256 blocks = (pos.shape[0] + threads - 1) // threads _SPLAT_KERNEL((blocks,), (threads,), (pos.astype(cp.float32), buf, KERN_GPU, pos.shape[0], H, W)) buf_max = float(buf.max()) bright = cp.clip(cp.log1p(buf) / cp.log1p(buf_max + 1e-9) * 255, 0, 255).astype(cp.uint8) return cp.asnumpy(bright) @cp.fuse() def _force_value(rn, Aij_pairs, beta): rep = rn / beta - 1.0 att = Aij_pairs * (1.0 - cp.abs(2.0 * rn - 1.0 - beta) / (1.0 - beta)) return cp.where(rn < beta, rep, cp.where(rn < 1.0, att, 0.0)) def particle_life_stream(width, height, iters, ffmpeg_stdin, N=2000, K=5, seed=480): """Run the simulation; write each uint8 grayscale frame directly to the supplied open file/pipe (so encoding can happen in parallel with simulation and no large raw file is materialised).""" rng = np.random.default_rng(seed) S_host = np.array([height, width], dtype=np.float32) S = cp.asarray(S_host) pos = cp.asarray(rng.random((N, 2)).astype(np.float32)) * S vel = cp.zeros((N, 2), dtype=cp.float32) typ = cp.asarray(rng.integers(0, K, N).astype(np.int32)) A = cp.asarray(rng.uniform(-1, 1, (K, K)).astype(np.float32)) rmax = cp.float32(88.0) rmax_val = 88.0 beta = cp.float32(0.3) friction = cp.float32(0.85) fs = cp.float32(0.8) dt = cp.float32(0.6) buf = cp.zeros((height, width), dtype=cp.float32) cell_size = rmax_val grid_h = int(np.ceil(height / cell_size)) grid_w = int(np.ceil(width / cell_size)) n_cells = grid_h * grid_w print(f"Rendering {iters} frames at {width}×{height}, N={N} on GPU...", flush=True) mem_pool = cp.get_default_memory_pool() import time t_start = time.time() last_report = t_start for t in range(iters): cell_y = (pos[:, 0] / cell_size).astype(cp.int32) % grid_h cell_x = (pos[:, 1] / cell_size).astype(cp.int32) % grid_w cell_idx = cell_y * grid_w + cell_x sort_order = cp.argsort(cell_idx) cell_idx_sorted = cell_idx[sort_order] all_cells = cp.arange(n_cells, dtype=cp.int32) cell_start_all = cp.searchsorted(cell_idx_sorted, all_cells, side='left') cell_end_all = cp.searchsorted(cell_idx_sorted, all_cells, side='right') cell_start = cp.where(cell_end_all > cell_start_all, cell_start_all, -1) cell_end = cell_end_all ny = (cell_y[:, None] + _DY_OFFSETS[None, :]) % grid_h nx = (cell_x[:, None] + _DX_OFFSETS[None, :]) % grid_w neighbour_ids = (ny * grid_w + nx).astype(cp.int32) flat_ne = neighbour_ids.reshape(-1) starts = cell_start[flat_ne] ends = cell_end[flat_ne] counts = cp.where(starts >= 0, ends - starts, 0).astype(cp.int32) total = int(counts.sum()) per_particle = counts.reshape(N, 9).sum(axis=1) idx_i = cp.repeat(cp.arange(N, dtype=cp.int32), per_particle) cum_counts = cp.concatenate( [cp.zeros(1, dtype=cp.int64), cp.cumsum(counts.astype(cp.int64))]) positions = cp.arange(total, dtype=cp.int64) pair_idx = cp.searchsorted(cum_counts[1:], positions, side='right').astype(cp.int32) local_offset = (positions - cum_counts[pair_idx]).astype(cp.int32) sort_idx = starts[pair_idx] + local_offset idx_j = sort_order[sort_idx].astype(cp.int32) keep = idx_i != idx_j idx_i = idx_i[keep] idx_j = idx_j[keep] d = pos[idx_j] - pos[idx_i] cp.subtract(d[:, 0], S[0] * cp.round(d[:, 0] / S[0]), out=d[:, 0]) cp.subtract(d[:, 1], S[1] * cp.round(d[:, 1] / S[1]), out=d[:, 1]) dist = cp.sqrt((d * d).sum(axis=1)) within = dist <= rmax idx_i = idx_i[within] idx_j = idx_j[within] d = d[within] dist = dist[within] rn = dist / rmax dirv = d / (dist[:, None] + cp.float32(1e-9)) Aij_pairs = A[typ[idx_i], typ[idx_j]] F_val = _force_value(rn, Aij_pairs, beta) force_contrib = fs * F_val[:, None] * dirv acc = cp.zeros((N, 2), dtype=cp.float32) cp.add.at(acc, idx_i, force_contrib) vel = vel * friction + acc * dt pos = (pos + vel * dt) % S frame_host = render_points(buf, pos, S) ffmpeg_stdin.write(frame_host.tobytes()) if (t & 0x1F) == 0: mem_pool.free_all_blocks() # Progress report every ~10 sec wall clock now = time.time() if now - last_report > 10: elapsed = now - t_start fps = (t + 1) / elapsed eta = (iters - t - 1) / fps print(f" frame {t + 1}/{iters} " f"({(t + 1) / iters * 100:5.1f}%) " f"fps={fps:5.1f} " f"eta={eta / 60:5.1f} min", flush=True) last_report = now print(f"Completed {iters} frames in {time.time() - t_start:.0f} sec") if __name__ == "__main__": seed = 480 P = PARAMS width, height = P['width'], P['height'] iters, N = P['iters'], P['N'] outdir = (f"/home/fox/Downloads/py/output/even_more/particle_life/" f"{seed}_{MODE}") Path(outdir).mkdir(parents=True, exist_ok=True) fps = P.get('fps', 20) out_ext = P.get('out_ext', '.mp4') mp4_path = Path(outdir) / f"animation{out_ext}" print(f"=== Particle Life Rendering: MODE={MODE} ===") print(f"Seed: {seed} Resolution: {width}×{height} " f"Duration: {iters / fps:.1f} sec ({iters} frames @ {fps} fps)") print(f"N: {N} GPU: " f"{cp.cuda.runtime.getDeviceProperties(0)['name'].decode()}") print() # ffmpeg command construction. Two distinct flavours: real DVD-Video # (-target ntsc-dvd, MPEG-2 PS, silent AC-3 audio track) vs the # h264/x265 .mp4 path. Both stream video frames from stdin so no # giant raw file is materialised. if P.get('dvd_target'): # DVD-Video pipeline — outputs a .mpg you feed to dvdauthor. cmd = [ 'ffmpeg', '-y', '-f', 'rawvideo', '-pix_fmt', 'gray', '-s', f'{width}x{height}', '-framerate', str(fps), '-i', '-', ] if P.get('add_silent_audio'): cmd += ['-f', 'lavfi', '-i', 'anullsrc=channel_layout=stereo:sample_rate=48000'] cmd += ['-target', P['dvd_target']] # Override video bitrate that -target ntsc-dvd would default to. # The DVD-Video max combined bitrate is 10.08 Mbps; we deliberately # stay well under so we hit a specific final file size. if P.get('video_bitrate'): cmd += ['-b:v', P['video_bitrate']] if P.get('audio_bitrate'): cmd += ['-b:a', P['audio_bitrate']] if P.get('aspect'): cmd += ['-aspect', P['aspect']] cmd += ['-shortest', str(mp4_path)] else: cmd = [ 'ffmpeg', '-y', '-f', 'rawvideo', '-pix_fmt', 'gray', '-s', f'{width}x{height}', '-framerate', str(fps), '-i', '-', '-c:v', P.get('codec', 'libx264'), '-pix_fmt', 'yuv420p', '-preset', 'medium', ] if P.get('bitrate'): cmd += ['-b:v', P['bitrate']] if P.get('maxrate'): cmd += ['-maxrate', P['maxrate'], '-bufsize', P['bufsize']] elif P.get('crf') is not None: cmd += ['-crf', str(P['crf'])] if P.get('aspect'): cmd += ['-aspect', P['aspect']] cmd += [str(mp4_path)] print(f"ffmpeg cmd: {' '.join(cmd)}") print() ff = subprocess.Popen(cmd, stdin=subprocess.PIPE, stderr=subprocess.DEVNULL) try: particle_life_stream(width, height, iters, ff.stdin, N=N, seed=seed) finally: ff.stdin.close() ff.wait() sz = mp4_path.stat().st_size if mp4_path.exists() else 0 print(f"\nMP4 saved: {mp4_path} ({sz / 1024**3:.2f} GB, " f"{sz / 1024**2:.1f} MB)") print(f"\n=== Complete ===") ::: ------------------------------------------------------------------------ ## Related pages - [3-GPU mesh](https://www.foxhop.net/053ba7da-6277-11f1-82fc-040140774501/gpu-mesh){rel="nofollow" target="_blank"} --- hardware context for the 4090 this script targets. ------------------------------------------------------------------------

*Updated 2026-06-29.*

:\~/git/uncloseai-cli\$ > \'-preset\', \'medium\', \] if P.get(\'bitrate\'): cmd += \[\'-b:v\', > P\[\'bitrate\'\]\] if P.get(\'maxrate\'): cmd += \[\'-maxrate\', > P\[\'maxrate\'\], \'-bufsize\', P\[\'bufsize\'\]\] elif P.get(\'crf\') > is not None: cmd += \[\'-crf\', str(P\[\'crf\'\])\] if > P.get(\'aspect\'): cmd += \[\'-aspect\', P\[\'aspect\'\]\] cmd += > \[str(mp4_path)\] > > print(f\"ffmpeg cmd: {\' \'.join(cmd)}\") print() ff = > subprocess.Popen(cmd, stdin=subprocess.PIPE, > stderr=subprocess.DEVNULL) try: particle_life_stream(width, height, > iters, ff.stdin, N=N, seed=seed) finally: ff.stdin.close() ff.wait() > > sz = mp4_path.stat().st_size if mp4_path.exists() else 0 print(f\"nMP4 > saved: {mp4_path} ({sz / 1024\**3:.2f} GB, \" f\"{sz / 1024*\*2:.1f} > MB)\") print(f\"n=== Complete ===\") ------------------------------------------------------------------------ ## Related pages - [3-GPU mesh](https://www.foxhop.net/053ba7da-6277-11f1-82fc-040140774501/gpu-mesh){rel="nofollow" target="_blank"} --- hardware context for the 4090 this script targets. --- **Source:** [https://foxhop.net/0b6d520b-7414-11f1-9565-040140774501/particle-life-on-gpu-cupy-port-for-long-form-renders](https://foxhop.net/0b6d520b-7414-11f1-9565-040140774501/particle-life-on-gpu-cupy-port-for-long-form-renders) **Snapshot:** 2026-06-30T04:05:37Z **Generator:** Remarkbox `dba4024`