If you are a programmer, scrapy[0] will be a good bet. It can handle robots.txt, request throttling by ip, request throttling by domain, proxies and all other common nitty-gritties of crawling. The only drawback is handling pure javascript sites. We have to manually dig into the api or add a headless browser invocation within the scrapy handler.
Scrapy also has the ability to pause and restart crawls [1], run the crawlers distributed [2] etc. It is my goto option.
Haven't tried this[0] yet, but Scrapy should be able to handle JavaScript sites with the JavaScript rendering service Splash[1]. scrapy-splash[2] is the plugin to integrate Scrapy and Splash.
[0] https://blog.scrapinghub.com/2015/03/02/handling-javascript-...