Abstract: The performance and energy efficiency offered by heterogeneous systems are highly useful for modern C++ applications, but the technological variety demands adequate portability and programmability. Initiatives such as Intel oneAPI facilitate the exploitation of Intel CPUs and GPUs, but not NVIDIA GPUs, which are present in
systems of all kinds and are necessarily leveraged by CUDA technology. Frequently, only GPUs are used, leaving the CPU for management tasks, with the consequent loss of energy and system utilization. In this work, the CoexecutorRuntime system design and API are extended to transparently integrate backends of diverse technologies, unifying offloading mechanisms under a consistent co-execution API and scheduling runtime. Moreover, CPU-GPU co-execution of hybrid technologies is enabled to ensure performance portability. Experimental results show performance improvements for all programs studied, achieving average efficiencies of 0.91 and speedups of 1.31 over using only the GPU.