PICList Thread
'[OT] [EE] Google free advertising ?'
2007\04\23@160457 by Peter P.

Hi all,

I have noticed that a lot of my searches turn up links to PDF files which are
really not there. These are hosted at reputable institutions, like ieee.org,
elsevier etc. I find that I waste quite some time bothering to look at the links
(wrongly believing that they contain what Google says they contain). Example:

 http://linkinghub.elsevier.com/retrieve/pii/S0041624X05000582

Yet, when accessing the link one gets an intro and an invitation to purchase. At
the same time, the Google link has no HTML backup for such pages. I am under the
impression that Google does not allow such tricks, and that one must pay to
allow such use. Am I wrong ? If I am wrong, how does one go about achieving such
excellent free advertising for one's pay products ?

thanks,
Peter P.

Note: The link above was the 1st on page 51 (!!) of the search with key:

 http://www.google.com/search?q=piezo+driver+schematic&hl=en&start=50&sa=N

The text found by Google was:

"A simplified. schematic of the driver stage is shown in Fig. 4. The dri-. ver  
discharges and charges the piezoelectric disc via. transistors MN1 and MP1, ..."

which is not at all on the page at the link above.


2007\04\23@162523 by M. Adam Davis

Very interesting.

The google crawler doesn't use cookies, so the only way they could
give Google access that they deny to ordinary users is by either 1)
checking the user agent string or 2) using the IP address (or reverse
DNS) of the crawler.
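
A rough sketch of what those two server-side checks could look like, purely as an illustration. Nothing here is known about what Elsevier actually runs: the function names, the domain suffixes, and the sample IP address in the test are assumptions. The resolver functions are injectable so the logic can be tried without touching the network.

```python
import socket

# Assumed host-name suffixes for a genuine crawler (illustrative only).
CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def ua_looks_like_googlebot(user_agent):
    """Check 1: trust the User-Agent header (trivially spoofable)."""
    return "googlebot" in user_agent.lower()


def ip_is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Check 2: reverse DNS on the IP, then a confirming forward lookup."""
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(CRAWLER_DOMAINS):
            return False
        # The name must resolve back to the same address, otherwise
        # anyone controlling their own reverse DNS could fake the name.
        return ip in forward_lookup(host)
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) mean "not verified".
        return False
```

The first check depends only on what the client claims about itself; the second depends on DNS records the client does not control, which is why it is the harder one to get around.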

If it's just the user agent string, then it's fairly easy to spoof
using, say, firefox or a proxy server:

http://johnbokma.com/mexit/2004/04/24/changinguseragent.html

That page also talks a bit about the cloaking that seems to be taking
place here.
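
For anyone who would rather script the user-agent test than change browser settings, here is a minimal sketch using only the Python standard library. The user-agent string is the one Googlebot is commonly seen sending; the function name is made up for illustration. The request is only built, not sent, so whether the server hands over the PDF is not something this shows.

```python
import urllib.request

# User-agent string commonly reported for Google's crawler.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)


def build_crawler_like_request(url):
    """Return a Request that carries the crawler's User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


req = build_crawler_like_request(
    "http://linkinghub.elsevier.com/retrieve/pii/S0041624X05000582"
)
# urllib normalizes header names to "User-agent" internally.
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would actually send it. No cookies go along unless a cookie handler is installed, which matches the crawler's behavior described above.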

If it's the IP address, then it'll be much more difficult.

-Adam


2007\04\23@163159 by M. Adam Davis

A quick test showed that it's not a simple user agent setting.  I
changed my user agent to google's crawler, disabled cookies, java, and
javascript, and it still didn't give me the pdf.

Perhaps wget would work better, if they're also looking at connection
type and such.

-Adam


2007\04\23@163307 by wouter van ooijen

> Yet, when accessing the link one gets an intro and an invitation to
> purchase.

Do you think the text as shown by google is likely to be part of the
document that can be purchased? In that case I would be interested in
that trick!

Otherwise - maybe that link refers to a page that changes very often
(and maybe was not meant to be crawled)?

Wouter van Ooijen

-- -------------------------------------------
Van Ooijen Technische Informatica: http://www.voti.nl
consultancy, development, PICmicro products
docent Hogeschool van Utrecht: http://www.voti.nl/hvu



2007\04\23@164551 by M. Adam Davis

Google indicated that when it downloaded that page, it was given a PDF
document.  It then has a few snippets of the text of that PDF as part
of the search results.

PDFs are not cached at google like web pages (or, at least, they
aren't provided inside their cache for you to grab) otherwise one
could just do a google search for
"cache:linkinghub.elsevier.com/retrieve/pii/S0041624X05000582"
and get the PDF that google saw when it last crawled it.

If one can look exactly like the google crawler, then one will get
that PDF just as google saw it (which may not be the full pdf,
either).  But that may be non-trivial depending on how cloaked the pdf
is.

-Adam


2007\04\23@174741 by Peter P.

M. Adam Davis <stienman <at> gmail.com> writes:

> Google indicated that when it downloaded that page, it was given a PDF
> document.  It then has a few snippets of the text of that PDF as part
> of the search results.
>
> PDFs are not cached at google like web pages (or, at least, they
> aren't provided inside their cache for you to grab) otherwise one
> could just do a google search for
> "cache:linkinghub.elsevier.com/retrieve/pii/S0041624X05000582"
> and get the PDF that google saw when it last crawled it.
>
> If one can look exactly like the google crawler, then one will get
> that PDF just as google saw it (which may not be the full pdf,
> either).  But that may be non-trivial depending on how cloaked the pdf
> is.

It isn't just that they are not cached, Google *knows* that it should not cache
them as there is no link to 'view as HTML'. There must be something else going
on there. And it's a something that has a strong flavor of unequal treatment to
me. Then there's Ebay pages which appear on Google a few hours after an item is
put up. Contrast this to the shortest time it took (ever) to get a fairly public
site noticed and ranked. Oh, and the Ebay pages appear very near the top of the
link listing (in the 1st page at least). There are several 'strange' things like
this out there, all connected to 'brand names'. I suspect that there are ways to
integrate Google appliances and certain service contracts which are unavailable
to normal mortals. It would be interesting to read more about this. But where ?
Maybe ranking.com etc. I don't have time for this now. In a few days ... (famous
last words). Anyway, here is a short list of sites that have this 'feature' on
Google: iop.org EJs, ieee.org (Standards etc), elsevier.com and several others.
All have in common the lack of the 'view as HTML' link in Google. So I have
taken to not visiting those links anymore, since they are wasting my time.

Peter P.


2007\04\23@181747 by Nate Duehr

On 4/23/07, Peter P. wrote:

> It isn't just that they are not cached, Google *knows* that it should not cache
> them as there is no link to 'view as HTML'. There must be something else going
> on there. And it's a something that has a strong flavor of unequal treatment to

Maybe the Googlebots know how to use bugmenot.com ?  :-) :-P

Nate

2007\04\23@192814 by Gerhard Fiedler

Peter P. wrote:

> It isn't just that they are not cached, Google *knows* that it should not
> cache them as there is no link to 'view as HTML'. There must be
> something else going on there. And it's a something that has a strong
> flavor of unequal treatment to me. Then there's Ebay pages which appear
> on Google a few hours after an item is put up. Contrast this to the
> shortest time it took (ever) to get a fairly public site noticed and
> ranked. Oh, and the Ebay pages appear very near the top of the link
> listing (in the 1st page at least). There are several 'strange' things
> like this out there, all connected to 'brand names'.

A few years ago I vaguely noticed that Google's ranking had lost a lot of its
until then quite astonishing usefulness for me, and this trend has been
confirmed over the years. This could well be related.

FWIW, the robots.txt file at http://linkinghub.elsevier.com goes like this:

 # /robots.txt file for http://linkinghub.elsevier.com/
 User-agent: *
 Disallow: /

No crawling at all...
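
The effect of those rules can be checked with Python's standard robots.txt parser. This is just a sketch: the rules are the ones quoted above, while the helper name `allowed` and the choice of "Googlebot" as the agent are mine.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted above.
ROBOTS_TXT = """\
# /robots.txt file for http://linkinghub.elsevier.com/
User-agent: *
Disallow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "http://linkinghub.elsevier.com/retrieve/pii/S0041624X05000582"
print(allowed(ROBOTS_TXT, "Googlebot", url))  # prints False
```

A well-behaved crawler that honors this file would not fetch the URL at all.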

The sciencedirect.com site allows Googlebot (among others), but that's not
the link listed in Google. It looks like elsevier.com made the content
available to Google for indexing, and probably not through the normal bot
access.

Gerhard

2007\04\23@213936 by Peter P.

Nate Duehr <nate <at> natetech.com> writes:

> Maybe the Googlebots know how to use bugmenot.com ?   :-P

Nah, they use their master's credit cards automatically (I bet you a virtual sum
that that is true in some way).

Peter P.


2007\04\24@031631 by Jim Franklin


As an aside to this, but still on the (off) topic of Google.

If you are looking for technical information, and get linked to websites
that have a question and answer section, but your answer is only available
if you subscribe to the page, go back to the google search results, and
check if the page has a "Cached" link - often this will drop you straight
into the "logged in" answer page.

:)





