This is actually a huge issue once you start to think about it. Huge not just because of the volumes of data involved, but also because of the perpetual problem of what to preserve: on the internet, for the first time, there is no obvious mediator of what is worth preserving.
How about the idea of e-museums, where curators would source and solicit items of interest?
Obvious candidates might be things like source code or applications.
Where it gets tricky is things like blogs, Facebook or YouTube. The most-linked blogs and the blogs with the most hits could be a metric. Curators could develop archives of the most viewed pages for each time period as some kind of cultural barometer.
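To make the "most viewed pages per time period" idea a bit more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the function name `top_pages_by_period`, the idea that hits arrive as simple (date, URL) pairs, the choice of a calendar month as the period, and the made-up example URLs. A real archive would need a much better data source and probably a better metric than raw view counts.

```python
from collections import Counter, defaultdict
from datetime import date


def top_pages_by_period(hits, n=3):
    """Return {(year, month): [(url, views), ...]} with the n most viewed URLs per month.

    `hits` is an iterable of (date, url) pairs, one pair per page view.
    This input format is a hypothetical stand-in for a real access log.
    """
    counts = defaultdict(Counter)
    for day, url in hits:
        # Bucket each view by calendar month and count views per URL.
        counts[(day.year, day.month)][url] += 1
    # Keep only the n most viewed URLs in each month, in chronological order.
    return {period: counter.most_common(n) for period, counter in sorted(counts.items())}


# Made-up sample log: one (date, url) pair per page view.
sample_hits = [
    (date(2009, 1, 3), "blog.example/a"),
    (date(2009, 1, 9), "blog.example/a"),
    (date(2009, 1, 9), "blog.example/b"),
    (date(2009, 2, 1), "video.example/c"),
]

archive = top_pages_by_period(sample_hits, n=1)
print(archive)
```

Counting inbound links instead of views would work the same way, just with (date, linked URL) pairs as input. Either way this only picks what was popular, which is not necessarily what is worth preserving.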
Although the idea is somewhat satirical, there is also the possibility that random pieces of information found on discarded machines will lead to the emergence of e-archaeology.