ProcessCache

ProcessCache is a system for automatically memoizing the work of multi-process Linux programs. It transparently caches results and determines when a cached output can be reused and when it must be recomputed, generalizing ideas from build systems to arbitrary programs such as shell scripts and data workflows.

It supports unmodified Linux binaries by tracing system calls via `ptrace` to infer program inputs. To reduce the overhead that ptrace-based tracing imposes, I designed and implemented a custom asynchronous runtime in Rust to efficiently coordinate tracing across processes.
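
The reuse-versus-recompute decision can be illustrated with a small sketch. This is not ProcessCache's actual implementation: the `Observed`, `Cache`, and `fingerprint` names are hypothetical, the inputs are passed in directly rather than inferred through `ptrace`, and the standard library's `DefaultHasher` stands in for whatever hashing a real system would use. The idea shown is only that a cached output is reused when the command and the contents of every observed input are unchanged.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical record of one execution: the command plus the contents of
/// every input file the process was observed reading.
#[derive(Hash)]
pub struct Observed<'a> {
    pub command: &'a str,
    /// (path, contents) pairs. Assumption: the caller keeps these sorted by
    /// path so that the same set of inputs always hashes the same way.
    pub inputs: Vec<(&'a str, &'a [u8])>,
}

/// Combines the command and all input contents into a single cache key.
fn fingerprint(obs: &Observed) -> u64 {
    let mut hasher = DefaultHasher::new();
    obs.hash(&mut hasher);
    hasher.finish()
}

/// Toy in-memory cache mapping fingerprints to previously produced outputs.
#[derive(Default)]
pub struct Cache {
    entries: HashMap<u64, Vec<u8>>,
    pub hits: usize,
    pub misses: usize,
}

impl Cache {
    /// Returns the cached output if the fingerprint matches a previous run;
    /// otherwise runs `compute` and stores its result.
    pub fn run<F>(&mut self, obs: &Observed, compute: F) -> Vec<u8>
    where
        F: FnOnce() -> Vec<u8>,
    {
        let key = fingerprint(obs);
        if let Some(out) = self.entries.get(&key) {
            self.hits += 1;
            return out.clone();
        }
        self.misses += 1;
        let out = compute();
        self.entries.insert(key, out.clone());
        out
    }
}
```

Calling `run` twice with the same `Observed` value executes `compute` only the first time; changing any byte of any input produces a different fingerprint and forces recomputation.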

In practice, ProcessCache enables incremental computation for existing programs, accelerating workloads by up to 65×.

I served as technical lead, driving the system design, implementation, and overall project direction. The open-source code for the project can be found here.

DetTrace

DetTrace is a userspace system for enforcing deterministic execution of Linux programs. It provides a container abstraction where execution is a pure function of the initial filesystem state, enabling reproducibility for builds, data pipelines, and distributed workloads.

We applied DetTrace to over 12,000 Debian package builds (800M+ lines of code), as well as bioinformatics and machine learning workflows, achieving reproducibility without requiring changes to hardware, OS, or applications.

My primary contribution was redesigning the scheduler to allow parallel execution in system-call-free regions while preserving determinism, reducing overhead to as low as 2% on compute-bound workloads.
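
The general idea of running compute-only regions in parallel while keeping observable effects deterministic can be sketched as follows. This is a toy illustration, not DetTrace's scheduler: `run_deterministic` is a hypothetical function, threads stand in for traced processes, and "committing" an effect is modeled as appending to a log. The point is only that the parallel phase may finish in any order, while effects are committed in a fixed order by logical id.

```rust
use std::thread;

/// Runs `work` on every input in parallel, then commits the results in a
/// fixed order (the input's index), regardless of which thread finished first.
pub fn run_deterministic(inputs: Vec<u64>, work: fn(u64) -> u64) -> Vec<(usize, u64)> {
    // Phase 1: the compute-only work runs concurrently. The order in which
    // these threads complete is up to the OS scheduler and may vary per run.
    let handles: Vec<thread::JoinHandle<u64>> = inputs
        .into_iter()
        .map(|x| thread::spawn(move || work(x)))
        .collect();

    // Phase 2: observable effects are committed strictly in logical-id order,
    // so the resulting log is identical on every run.
    let mut log = Vec::new();
    for (id, handle) in handles.into_iter().enumerate() {
        let result = handle.join().expect("worker thread panicked");
        log.push((id, result));
    }
    log
}
```

For example, `run_deterministic(vec![3, 1, 2], |x| x * x)` always yields the log in index order, even though the three threads may complete in a different order each time.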

The open-source code can be found here, and our ASPLOS paper can be found here.

VMware Research (D3log)

During my internship with the VMware Research Group, I worked on D3log, a distributed Datalog engine implemented in Rust on top of a timely dataflow model.

My main contribution was extending the system to support incremental, dynamic reconfiguration of cluster nodes, improving fault tolerance and operational flexibility in distributed environments.

The open-source code can be found here.

Publications

Reproducible Containers

Omar S. Navarro Leija, Kelly Shiptoski, Ryan Scott, Baojun Wang, Nicholas Renner, Ryan Newton, and Joseph Devietti.

ASPLOS 2020 (International Conference on Architectural Support for Programming Languages and Operating Systems)