It seems to me that you've answered your own question - the present
situation is evidently inconvenient for you, the list members, and folks on
the net as a whole. I can't think of anyone else to irritate :-)
If you think there is value in having some of the mail topics indexed,
then you might consider extracting just the subject lines, or building an
index of items you've extracted using your own search engine, and making
that available to the search engines, then blocking the robots from the
mail content itself.
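
For what it's worth, here is a rough sketch of the blocking half of that
idea, using Python's standard urllib.robotparser to check a candidate
robots.txt before you publish it. The directory names (/archive/ for the
individual messages, /index/ for the subject-line pages) and the robot name
"ExampleBot" are my own assumptions for illustration, not your actual
layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout (an assumption, not the real site):
#   /archive/  individual list messages, to be kept out of the indexes
#   /index/    subject-line index pages, left open to robots
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""


def can_fetch(path, agent="ExampleBot"):
    """Return True if a robot honouring ROBOTS_TXT may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/archive/1996/msg00042.html", "/index/subjects.html"):
        verdict = "allowed" if can_fetch(path) else "blocked"
        print(path, verdict)
```

This only tells you what a well-behaved robot would do with those rules; it
can't make a robot obey them, and it won't remove anything already indexed.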
'Course, now that the engines have indexed the entire archive, the cat's
probably out of the bag - I doubt they drop anything once indexed. I'm
looking forward to the meta-content engines, such as proposed by that guy
at Apple.
(BTW - I used to work with a concept-based full-text search tool called
Metamorph - highly recommended.)
gb
>Basically, I received a complaint that because I make the sum total of my
>list archives available on the web, it is impossible to search for useful
>information because my archive pages keep popping up. For instance, a
>search for "fvwm" (a UNIX window manager whose mailing lists I host) will
>(at some search engines) reveal a pile of hits to individual messages while
>the main page is buried. Essentially, I was accused of "polluting the
>indexes," especially with old messages which I make available.
>My reply was "too bad." I make my archives publicly available; anyone
>can search them. I also have a homebrew front end search engine based on
>Glimpse which allows things like limiting by date. Why should I care if
>some over-zealous spider went through my entire archives and added them to
>its index? It is they who aren't serving their customers well by doing
>this; my search engine works fine.
>
>Yes, I know about robot exclusion, but why should I have to?
end | Gary Bickford <garyb@fxt.com>
--+-- FXT Corp. / Informat Communications
| 541-923-3060
| fax 541-923-5537
|
|