Many blogs, including this one, contains duplicate content.
That unfortunately clogs search engines.
Being a perfectionist that I am, I want that a search for e.g. “15minutes” leads people to individual blog posts about it and not the redundant pages.
There’s a way to tell search engines to not index parts of your site. It’s
quite simple and in five minutes I created the following
robots.txt for my site:
User-agent: *
Disallow: /page/
Disallow: /tag/
Disallow: /notes/
In my particular case, archive pages all start with /page/
or /notes/
and /tag/
is another namespace with duplicate content (shows a list of articles with a given tag).
For this technique to work the names duplicate pages have to follow a pattern, but that’s easy enough to ensure, especially if you write your own blog software, like I do.