Thrice as much
Vocaboly.com Forum Index -> alt.usage.english
Author Message
Thomas Koenig
Guest





Posted: Thu May 12, 2005 6:33 am    Post subject: Re: Thrice as much

Donna Richoux wrote:

Quote:
The other reasons not to trust numbers obtained with "site:" are (1) not
all UK sites have "uk" in the address;

True, but very few US/AUS/other clutter sites are hosted under the .uk
domain.

Quote:
(2) not all writing on "site:uk"
pages is written by UK speakers;

Also true, but the proportion of BE speakers on .uk sites should easily
be higher than the ratio on other domains. (As a native German speaker
and AE imposter, I'm trying my best on my .uk site to confuse these
stats, though.)

Quote:
I've used the "site" trick as a fast way to get a rough estimate, but I
really wouldn't stake anything on it, not at this point.

I hear similar objections/cautions with respect to various Google stats on
all sorts of occasions. I have yet to meet a convincing criticism that
cannot be remedied with Statistics 101.
Donna Richoux
Guest





Posted: Thu May 12, 2005 3:33 pm    Post subject: Re: Thrice as much

Thomas Koenig <fossa@gmx.li> wrote:

Quote:
Donna Richoux wrote:

The other reasons not to trust numbers obtained with "site:" are (1) not
all UK sites have "uk" in the address;

True, but very few US/AUS/other clutter sites are hosted under the .uk
domain.

(2) not all writing on "site:uk"
pages is written by UK speakers;

Also true, but the proportion of BE speakers on .uk sites should easily
be higher than the ratio on other domains. (As a native German speaker
and AE imposter, I'm trying my best on my .uk site to confuse these
stats, though.)

I've used the "site" trick as a fast way to get a rough estimate, but I
really wouldn't stake anything on it, not at this point.

I hear similar objections/cautions with respect to various Google stats on
all sorts of occasions. I have yet to meet a convincing criticism that
cannot be remedied with Statistics 101.

Well, I'm pleased to hear it, and may call on you to solve some of the
problems *I've* encountered. Unfortunately, I don't keep a record of
them, so I can't quickly haul out description and evidence.

Off the top of my head, I remember these numerical problems:

1) The minus problem -- adding a term preceded by a minus sign (to
exclude it) can actually increase the estimated number of results. (Note
that I mean the estimation, not the actual number.) Since this is not
logically possible, it can only mean that at least one of the two
estimations (with the minus term, without the minus term) was
unreliable.

2) The geographic variation problem -- someone in, say, California
running a search can get totally different estimation numbers for the
same search as someone in, say, Europe. Not just mildly different, but
WAY different, like ten or fifty or a hundred times as much.
The related erratic problem -- sometimes the person in Europe, after
a few days, would start getting numbers similar to the California ones.

3) The "cat dog" problem Mark Brader reported a couple of years ago,
where certain combinations of words yielded zero result even though
other searches showed they existed.

4) The estimation/reported hits variation -- too often, the Google
estimation figure in the top corner reports a high number, but when you
examine the list of actual hits it can find, it is piddling few. Even
accounting for the suppression of duplicates. To me that says the
estimate was wrong, thrown off by some unknown circumstance.
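
[Editorial sketch, not part of the original post.] Problems 1 and 4 both come down to simple sanity checks that anyone can run on numbers they have noted down by hand. The following is a minimal Python sketch under that assumption; the function names are hypothetical, and nothing here queries Google or reflects how its estimates are actually computed.

```python
def exclusion_is_consistent(estimate_all: int, estimate_with_minus: int) -> bool:
    """Problem 1: excluding a term can only shrink a result set, so the
    estimate for the query with the minus term must not exceed the
    estimate for the query without it."""
    return estimate_with_minus <= estimate_all


def estimate_to_listed_ratio(estimated: int, listed: int) -> float:
    """Problem 4: how many times larger the reported estimate is than the
    number of hits that could actually be listed."""
    if listed <= 0:
        raise ValueError("listed must be a positive count")
    return estimated / listed
```

A ratio far above 1, or a failed consistency check, does not say which figure is wrong, only that at least one of them cannot be taken at face value.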

People here know I like Google for various tasks, and I think the
estimation numbers do show something when used cautiously. But Google
uses some sort of formula to generate those estimates, and it has flaws.

I know the above items are sketchy and if you are serious about wanting
to investigate them, I can supply more details.

--
Best wishes -- Donna Richoux
Mark Brader
Guest





Posted: Fri May 13, 2005 3:40 am    Post subject: Re: Thrice as much

Donna Richoux writes:
Quote:
Off the top of my head, I remember these numerical problems:

1) The minus problem ...
2) The geographic variation problem ...
3) The "cat dog" problem ...
4) The estimation/reported hits variation -- too often, the Google
estimation figure in the top corner reports a high number, but when you
examine the list of actual hits it can find, it is piddling few. Even
accounting for the suppression of duplicates. To me that says the
estimate was wrong, thrown off by some unknown circumstance.

In fact, that problem is the reason why I didn't post any Google counts
in relation to this thread. On some of my phrase searches the estimated
hit counts seemed suspiciously high (in the range 50,000 to 250,000, as
I recall), so I asked for 100 hits per page and started stepping through
pages. Sure enough, Google ran out of hits well before I reached the
1,000-hit limit, and when I asked it to include suppressed duplicates,
it still did.
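
[Editorial sketch, not part of the original post.] The stepping-through procedure described above can be written as a small loop: ask for 100 hits per page and keep going until a page comes back short or the 1,000-hit limit is reached. This Python sketch assumes a hypothetical `fetch_page(start, count)` callable that returns the hits for one page; how that callable obtains results is left open, and no real search API is implied.

```python
from typing import Callable, List


def count_listed_hits(
    fetch_page: Callable[[int, int], List[str]],
    per_page: int = 100,
    cap: int = 1000,
) -> int:
    """Step through result pages and count the hits actually listed.

    Stops when a page returns fewer hits than requested (the listing has
    run out) or when the cap is reached.
    """
    total = 0
    while total < cap:
        want = min(per_page, cap - total)
        page = fetch_page(total, want)  # hypothetical page fetcher
        total += len(page)
        if len(page) < want:
            break  # ran out of hits before the cap
    return total
```

If the returned count is well below both the cap and the reported estimate, the estimate is the suspicious figure.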

There is also (5) counts for search terms with different numbers of
words sometimes seem out of whack, especially when it's a single word
versus a short, common phrase including it. I haven't seen this lately
and don't remember a specific example, but I presume it's related to
the way the information from different words in the phrase is combined.
It might very well be related to the minus problem.
--
Mark Brader | "I don't have to stay here to be insulted."
Toronto | "I realize that. You're insulted everywhere, I imagine."
msb@vex.net | -- Theodore Sturgeon

My text in this article is in the public domain.
 
All times are GMT + 1 Hour