I used to work at a hosting company, and proper use of dd was important when copying data from LVM on a Xen host; unfortunately I seem to have forgotten most of it.

Some pointers: dd has an oflag option; maybe oflag=direct is faster?

You can also use conv=sparse (in GNU dd this is a conv value, not an oflag value) and sometimes save space by creating a sparse file.
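A minimal sketch of the sparse-file idea, assuming GNU dd (where the option is spelled conv=sparse) and a filesystem that supports holes. The file names and the all-zero test file are made up for illustration; a real disk image would be sparse only where it contains runs of zero blocks.

```shell
set -eu
workdir=$(mktemp -d)   # scratch directory, hypothetical location

# Stand-in for a mostly empty image: 16 MiB of zeros.
dd if=/dev/zero of="$workdir/image.raw" bs=1M count=16 status=none

# conv=sparse makes dd seek over all-zero output blocks instead of writing them.
dd if="$workdir/image.raw" of="$workdir/image.sparse" bs=1M conv=sparse status=none

# The content (and apparent size) is unchanged...
cmp "$workdir/image.raw" "$workdir/image.sparse" && echo "identical content"

# ...but the sparse copy usually occupies fewer blocks on disk.
du -k "$workdir/image.raw" "$workdir/image.sparse"
```

How much space is actually saved depends on the filesystem and on how many zero-filled blocks the input has.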

> may be oflags=direct is faster?

oflag=direct uses direct I/O, so the copied data bypasses the buffer cache.

On Linux, search for 'O_DIRECT' in the open(2) manpage.

oflag (for output) and iflag (for input) are indeed useful. During or after a massive non-direct copy, a system running other processes that benefit from cached data may crawl: the copy evicts some of that data from the buffer cache, and it then has to be re-read from disk.

In other words, direct I/O is a good fit when the copied data will not be read by any process soon after the copy. A raw filesystem image is a good candidate.
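As a rough sketch of such a copy: the file names below are hypothetical stand-ins for a real volume or image, and the fallback branch is my own addition, because some filesystems refuse O_DIRECT (tmpfs has historically been one of them).

```shell
set -eu
workdir=$(mktemp -d)            # scratch directory, hypothetical location
src="$workdir/source.img"       # stand-in for a logical volume or raw image
dst="$workdir/copy.img"

# Create a small 4 MiB source so the example is self-contained.
dd if=/dev/zero of="$src" bs=1M count=4 status=none

# Try direct I/O on the output side; a block-aligned bs keeps O_DIRECT happy.
if dd if="$src" of="$dst" bs=1M oflag=direct status=none 2>/dev/null; then
    echo "copied with oflag=direct"
else
    # O_DIRECT was refused here: fall back to an ordinary cached copy.
    dd if="$src" of="$dst" bs=1M status=none
    echo "O_DIRECT not available, copied through the cache"
fi

cmp "$src" "$dst" && echo "copy verified"
```

For a real image, add iflag=direct as well if the source should not pollute the cache either.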

As usual, YMMV. If most of the data to be copied is already in the buffer cache, or if it will only occupy otherwise unused memory, this optimization is useless. However, in most cases (on most adequately sized, non-idle systems) O_DIRECT induces less system-wide load than cp, cat, pv, etc. when copying a large data set, provided most of it will not be read again right away.

Other tools (cp, cat, pv...) cannot easily work in O_DIRECT mode. A trick is possible: use LD_PRELOAD to substitute a local version of openat() that adds O_DIRECT. But this isn't realistic in most contexts.

$ cd ~/tmp

$ strace -e openat dd if=/etc/hosts of=useless.tmp count=1 >& nodirect

$ strace -e openat dd if=/etc/hosts of=useless.tmp iflag=direct oflag=direct count=1 >& direct

$ diff direct nodirect

5,6c5,6

< openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_DIRECT) = 3

< openat(AT_FDCWD, "useless.tmp", O_WRONLY|O_CREAT|O_TRUNC|O_DIRECT, 0666) = 3

---

> openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3

> openat(AT_FDCWD, "useless.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3

Moreover, dd has many options with no equivalent in most other readily available tools.