Posts

Showing posts from November, 2015

OpenCL - Part 2

CL_MEM_COPY_HOST_PTR: Copies memory from the host to the device when the buffer is created. In my case this was taking about 15 ms. Instead, I now use a zero-copy buffer, which saves that time. Creating buffers this way also makes a big difference in terms of performance.

CL_MEM_USE_HOST_PTR: In my case this is the preferred choice, as the buffer uses the memory referenced by the host pointer as its storage. From what I have read, this should also be the better option when using a GPU, as the memory will be allocated in pinned memory.

As a result, we get the following values (compared to the previous post):

1. Creating buffers: from 6 ms to 9 µs
2. Writing buffers: from 15 ms to 6 ms
3. Reading results: 6 ms (enqueueReadBuffer). This is still an issue.

That's already impressive. In the end, the total run time varies from 10 to 14 ms (compared to the previous 32 ms). Doubling the amount of data keeps the same ratio between both versions, so there's still no advantage for the parallel version. A further ...
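To put the timings quoted above in perspective, here is a minimal sketch of the arithmetic in plain C++ (no OpenCL needed). The constants are the figures from this post; the helper names `speedup` and `share` and the constant names are my own and purely illustrative.

```cpp
#include <cassert>

// Timings quoted in the post, all in milliseconds.
constexpr double kCreateBeforeMs     = 6.0;
constexpr double kCreateAfterMs      = 0.009;  // 9 microseconds
constexpr double kWriteBeforeMs      = 15.0;
constexpr double kWriteAfterMs       = 6.0;
constexpr double kReadAfterMs        = 6.0;    // enqueueReadBuffer
constexpr double kTotalBeforeMs      = 32.0;
constexpr double kTotalAfterBestMs   = 10.0;
constexpr double kTotalAfterWorstMs  = 14.0;

// How many times faster the new measurement is than the old one.
// (Hypothetical helper, not part of any OpenCL API.)
double speedup(double before_ms, double after_ms) {
    assert(after_ms > 0.0);
    return before_ms / after_ms;
}

// Fraction of a total run time that is spent in a single stage.
// (Hypothetical helper, not part of any OpenCL API.)
double share(double stage_ms, double total_ms) {
    assert(total_ms > 0.0);
    return stage_ms / total_ms;
}
```

With these figures, `speedup(kCreateBeforeMs, kCreateAfterMs)` comes out at roughly 667, `speedup(kWriteBeforeMs, kWriteAfterMs)` at 2.5, and the whole run is about 2.3 to 3.2 times faster than before. The 6 ms read-back alone accounts for roughly 43% to 60% of the new total (`share(kReadAfterMs, kTotalAfterWorstMs)` and `share(kReadAfterMs, kTotalAfterBestMs)`), which is why reading the results is the part that still stands out.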