> A user’s decision to move data to another service should not result in any loss of transparency or control over that data.
> It is worth noting that the Data Transfer Project doesn’t include any automated deletion architecture. Once a user has verified that the desired data is migrated, they would have to delete their data from their original service using that service’s deletion tool if they wanted the data deleted.
This project has copy, not move, semantics. So, contrary to the stated purpose of giving users control over their data, it actually makes it easier to spread users' data around: without a delete capability, the bias is toward multiple copies of user data.
By establishing this as an open-source effort, the project normalizes scraping data out of non-participating services' APIs, from which project partners benefit asymmetrically. In other words, API providers that do not offer export tools will nonetheless be subject to DTP adapters that exfiltrate data and feed it to the (no doubt excellent) DTP importers maintained by DTP partners. The effect is a partial vacuum, sucking data from non-participants into participants' systems.
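To make the adapter pattern concrete, here is a minimal sketch of the exporter/importer shape (class names and signatures are illustrative, not DTP's actual Java interfaces) showing why a transfer leaves a copy in both systems:

```python
from dataclasses import dataclass

@dataclass
class Photo:
    id: str
    title: str
    blob: bytes

class Exporter:
    """Reads items out of a source service (a real adapter would handle auth and pagination)."""
    def __init__(self, source_items):
        self.source_items = source_items

    def export(self):
        # Copy semantics: the source is only read, never mutated or deleted from.
        return list(self.source_items)

class Importer:
    """Writes items into a destination service."""
    def __init__(self):
        self.stored = []

    def import_items(self, items):
        self.stored.extend(items)

# After a transfer, both services hold the data; there is no delete step.
source = [Photo("1", "cat", b"...")]
exp, imp = Exporter(source), Importer()
imp.import_items(exp.export())
```

Nothing in this pipeline ever calls back into the source to remove anything, which is exactly the bias toward multiple copies described above.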
The economics of maintaining a high-volume bidirectional synchronization pipeline between DTP partners guarantee that these toy DTP adapters will not be the technology actually used to copy data between partners; a dedicated pathway will be established instead. In other words, the public open-source DTP effort could be understood as a facade: a plausible cover story for why DTP partners have cross-connected their systems.
TLDR:
- Copy semantics are counterproductive to the goal of giving users control over their data.
- The approach of using existing APIs to scrape data from non-participating vendors is a priori hostile.
- Economics dictate that the lowest-cost option for providing bidirectional synchronization between vendors involves dedicated links and specialized transport schemes that the DTP project itself does not provide equally.
There is some merit to providing abstract representations of common data formats -- look at EDI, for instance. I'd welcome someone from the project stopping by to explain away my concerns.
I wanted to provide my thinking on some of these very valid worries.
Re: Copy vs. Move: This was a conscious choice that I think has solid backing in two things: 1) In our user research for Takeout, the majority of users who use Takeout don't do it to leave Google. We suspect the same will be true for DTP: users will want to try out a new service, or use a complementary service, rather than a replacement. 2) Users should absolutely be able to delete their data once they copy it. However, we think that separating the two steps is better for the user. For instance, you want to make sure the user has a chance to verify the fidelity of the data at the destination. It would be terrible if a user ported their photos to a new provider, the new provider down-sampled them, and the originals were automatically deleted.
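The verify-before-delete step described above could look something like this (a minimal sketch; `safe_delete` and the checksum comparison are illustrative, not part of DTP or Takeout):

```python
import hashlib

def digest(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()

def safe_delete(originals: dict, ported: dict, delete) -> list:
    """Delete an original only when the ported copy is byte-identical.

    originals/ported map item id -> bytes; delete is the source service's
    deletion call. Returns the ids that were NOT deleted (fidelity mismatch)."""
    kept = []
    for item_id, blob in originals.items():
        copy = ported.get(item_id)
        if copy is not None and digest(copy) == digest(blob):
            delete(item_id)
        else:
            kept.append(item_id)  # e.g. a photo the destination down-sampled
    return kept

originals = {"a": b"full-res", "b": b"full-res-2"}
ported = {"a": b"full-res", "b": b"downsampled"}
deleted = []
kept = safe_delete(originals, ported, deleted.append)
# "a" verified and deleted; "b" kept because the ported copy differs
```

A down-sampled photo fails the checksum comparison, so the original survives, which is the failure mode that motivates keeping copy and delete as separate user actions.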
Re: Scraping: It's true that DTP can use the APIs of companies that aren't 'participating' in DTP. But we don't do it by scraping their UIs. We do it like any other app developer, by asking for an API key, which that service is free to decline to give. One of the foundational principles we cover in the white paper is that the source service maintains control over who, how, and when to give the data out via their API. So if they aren't interested in their data being used via DTP, that is absolutely their choice.
Re: Economics: As with all forward-looking statements, we'll have to wait and see how it works out. But I'll give one anecdote on why I don't think this will happen. Google Takeout (which I also work on) allows users to export their data to OneDrive, Dropbox, and Box (as well as Google Drive). One of the reasons we wanted to make DTP is that we were tired of dealing with other people's APIs, as it doesn't scale well. Google should build adapters for Google, and Microsoft should build adapters for Microsoft. So with Takeout we tried the specialized-transport method, but it was a lot of work, and we went with the DTP approach specifically to avoid having specialized transports.
DTP is still in the early phases, and I would encourage you, and everyone else, to get involved in the project (https://github.com/google/data-transfer-project) and help shape its direction.