PICList Thread
'[OT] piclist phrase popularity'
2008\04\14@152914 by Martin

I've been indexing piclist emails for the past few weeks. Here are the
top 100 most popular phrases according to my scoring algorithm. It's
"sort of" interesting. I don't know what I expected. Maybe when there
are several thousand emails in the database it will look more
interesting. A "phrase" here is defined as three unlikely words placed
near each other. Often-occurring words such as "and the I is" etc. are
not included for obvious reasons. The highest scoring phrases are mostly
gibberish like: "psychotherapists psychopaths homeopaths" and "cpi ctaf
displayitem" or "zhmezhme hansjurgen fajs".
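
The phrase definition above can be sketched roughly as follows. This is a minimal illustration, not Martin's script: the stop-word list is made up, "near each other" is assumed to mean adjacent once stop words are dropped, and the scoring step is left out because the thread does not describe it.

```python
import re
from collections import Counter

# Assumed stop-word list; the original script's list is not given in the thread.
STOPWORDS = {"and", "the", "i", "is", "a", "of", "to", "in", "it", "that"}

def count_phrases(messages):
    """Count three-word phrases across messages after dropping stop words."""
    counts = Counter()
    for text in messages:
        # Lower-case the text and ignore punctuation, keeping only word-like tokens.
        words = [w for w in re.findall(r"[a-z0-9_]+", text.lower())
                 if w not in STOPWORDS]
        # Slide a three-word window over the remaining words.
        for i in range(len(words) - 2):
            counts[" ".join(words[i:i + 3])] += 1
    return counts
```

Calling `count_phrases(bodies).most_common(100)` on a list of message bodies would give a count-ordered list similar in shape to the table below, minus the score column.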

p.s. originally I would have put this in EE 'everything engineering' but
it seems as though that's in a self image crisis as it is.


count        phrase        score

105        wikipedia org wiki        2.964
75        own certificate saying        2.9742
69        sean breheny shb7        2.9863
61        william chops westfield        2.9912
56        advertising opportunities available        2.9833
54        apptech apptech paradise        2.9777
53        content transfer encoding        2.9801
53        leading company china        2.9728
53        cause great concerns        2.9784
53        large online game        2.9675
53        community china ruining        2.9771
53        large number young        2.9703
52        tencent really amazing        2.9672
52        article sid 1634218        2.9852
52        most popular malicious        2.967
52        begin pgp signed        2.9857
50        apparently sizable virtual        2.964
49        picmicro products docent        2.9879
47        steal online gaming        2.976
46        games slashdot org        2.9642
46        either ian smith        2.9744
46        voti consultancy development        2.9839
46        hogeschool van utrecht        2.9866
44        text plain charset        2.9811
43        dollar going worth        2.964
43        same buying downloadable        2.964
41        wiki liberty_dollar interesting        2.9712
39        real exchange rate        2.964
39        real bank accounts        2.9641
39        real gold silver        2.9649
38        linden dollar virtual        2.964
38        message hash sha1        2.9676
37        been sizable economy        2.9642
35        sent monday april        2.9719
35        end pgp signature        2.9782
34        jinx joecolquitt clear        2.9874
34        virus database 269        2.9901
33        xiaofan chen xiaofanc        2.9784
33        embedded software hardware        2.9794
33        mhz surface mount        2.9833
33        hc49 base glad        2.9853
32        absolutely reason expect        2.9842
32        dollar hold against        2.9795
32        price gold 2000        2.9771
32        avg version 519        2.9799
31        content type text        2.9671
31        ascii mime version        2.9794
31        7bit content disposition        2.9837
31        question motor noise        2.9698
31        really online gaming        2.964
31        virtual currencies expect        2.9657
30        taxonomy term feed        2.9863
30        why blockbuster offering        2.9731
30        month blockbuster total        2.9912
29        specie returned its        2.9774
29        worrying trend china        2.978
29        type text plain        2.9743
29        ooijen technische informatica        2.9897
29        few projects working        2.9697
29        old warp bluepole        2.9831
28        another crackpot paper        2.9789
28        money scheme unless        2.9705
28        legal tender gold        2.9826
28        silver coin anyway        2.9691
28        because political tricksters        2.9675
28        place bob axtell        2.9678
27        wouter van ooijen        2.9805
26        bob axtell engineer        2.9671
26        harold fcc rules        2.988
26        silver currency soon        2.9648
26        internal osc clock        2.9871
26        above link gives        2.9791
26        yahoo tired spam        2.9854
26        best spam protection        2.9746
25        paper money instead        2.9643
25        agree powerpoles anything        2.9715
25        much easier rarely        2.9662
25        prefer soldering check        2.9866
25        lot different colors        2.9691
25        voltage color bnc        2.9685
25        male center conductor        2.9925
25        sounds off color        2.9683
25        again replaced number        2.9741
25        choice actually fwiw        2.9762
25        work nicely male        2.966
25        right adaptor sleeve        2.9825
25        andrew official uhf        2.9907
25        standard uhf silver        2.9724
25        aspx 306403 seems        2.9779
25        chips become unprogrammable        2.9893
25        funny group electronics        2.9791
25        john gardner goflo3        2.9889
24        rudonix doublesaver did        2.9776
24        16f88s update bought        2.9935
24        buy pickit2 glad        2.9759
24        serial port too        2.9649
24        looked anyway regards        2.9711
24        jack martin klingensmith        2.9797
24        martin nnytech net        2.9713
24        whole liberty dollar        2.9763


--
Martin Klingensmith

2008\04\14@154901 by M. Adam Davis

That's very interesting!  I wonder, though, if you could re-run it
without any quoted emails?  I suspect that if you ran it against the
email as-is, you'd get 5 of any particular phrase in a thread just due
to quoting the same email in several other emails.
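
A rough way to drop quoted material before counting is sketched below. It assumes quotes are marked with a leading ">" and introduced by an attribution line ending in "wrote:", as they are in this thread; other quoting styles would need more rules. The function name is hypothetical.

```python
def strip_quotes(body):
    """Return the message body without quoted lines or attribution lines."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        # Lines starting with ">" (any nesting depth) are quoted text.
        if stripped.startswith(">"):
            continue
        # Attribution lines such as "Martin wrote:" introduce a quote.
        if stripped.endswith("wrote:"):
            continue
        kept.append(line)
    return "\n".join(kept)
```

This throws away all quoted context, so it suits counting but not reading: a reply stripped this way may no longer make sense on its own.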

Would be fun to run some statistical analysis against the whole
piclist archive.  I'd like to get a google trends like interface so I
can see the rise and fall of each PIC type, particular problems, and
other topics...

-Adam

On 4/14/08, Martin wrote:
> I've been indexing piclist emails for the past few weeks. Here are the
> top 100 most popular phrases according to my scoring algorithm. It's
> "sort of" interesting. I don't know what I expected. Maybe when there
> are several thousand emails in the database it will look more
> interesting. A "phrase" here is defined as three unlikely words placed
> near each other. Often-occurring words such as "and the I is" etc. are
> not included for obvious reasons. The highest scoring phrases are mostly
> gibberish like: "psychotherapists psychopaths homeopaths" and "cpi ctaf
> displayitem" or "zhmezhme hansjurgen fajs".

2008\04\14@165904 by Wouter van Ooijen

Martin wrote:
> I've been indexing piclist emails for the past few weeks. Here are the
> top 100 most popular phrases according to my scoring algorithm. It's

> 61        william chops westfield        2.9912
> 29        ooijen technische informatica        2.9897

Maybe you should at least remove all sigs?

and: don't extend a phrase 'over' a '.'


--

Wouter van Ooijen

-- -------------------------------------------
Van Ooijen Technische Informatica: http://www.voti.nl
consultancy, development, PICmicro products
docent Hogeschool van Utrecht: http://www.voti.nl/hvu

2008\04\14@201305 by Apptech

> A "phrase" here is defined as three unlikely words placed
> near each other. Often-occurring words such as "and the I
> is" etc. are
> not included for obvious reasons. The highest scoring
> phrases are mostly
> gibberish like: "psychotherapists psychopaths homeopaths"
> and "cpi ctaf
> displayitem" or "zhmezhme hansjurgen fajs".

Zhar zhou ghetting aht mhe perhsonallhy?


                           Z H Fajs


2008\04\15@020024 by William \Chops\ Westfield


On Apr 14, 2008, at 1:58 PM, Wouter van Ooijen wrote:
>> 61        william chops westfield        2.9912
>> 29        ooijen technische informatica        2.9897
>
> Maybe you should at least remove all sigs?

And skip the headers?  AFAIK, 'William "Chops" Westfield' only
appears in the From: line of my headers.  (If you don't see the
quotes around "chops", then something in your email path is broken
and not RFC822 compliant.)  The only reason that nickname (last used
in college!) still appears at all is because of the difficulty some
systems have in parsing it correctly!  (We had a discussion when
RFC822 first came out.  It SHOULD work.)
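
Skipping the headers is the easy part if the indexer parses each message rather than treating it as one block of text. A minimal sketch using Python's standard `email` package is below; the function name is hypothetical, and payloads that use base64 or quoted-printable transfer encoding are returned undecoded here.

```python
from email import message_from_string

def body_text(raw):
    """Return only the text/plain body of a raw message, without headers."""
    msg = message_from_string(raw)
    if msg.is_multipart():
        # Keep only the plain-text parts of a multipart message.
        parts = [p for p in msg.walk() if p.get_content_type() == "text/plain"]
    else:
        parts = [msg]
    return "\n".join(p.get_payload() for p in parts)
```

With the headers gone, a name would only be counted where it appears in the body, for example in an attribution line or a signature.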

BillW

2008\04\15@083742 by Martin

Wouter van Ooijen wrote:
> Martin wrote:
>> I've been indexing piclist emails for the past few weeks. Here are the
>> top 100 most popular phrases according to my scoring algorithm. It's
>
>> 61        william chops westfield        2.9912
>> 29        ooijen technische informatica        2.9897
>
> Maybe you should at least remove all sigs?
>
> and: don't extend a phrase 'over' a '.'
>
>

It seems like it would be very difficult to not include sigs. My 'email
client' is a perl script and there is no standard way to denote the end
of content and the start of a signature.
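
One widely used convention does exist: many mail clients put a line holding two dashes and a space ("-- ") above the signature. It is only a convention, and not every post follows it (a sign-off with a single "-" would slip through), so the sketch below is a partial heuristic at best. It also accepts a bare "--", on the assumption that some archives strip trailing spaces. The function name is hypothetical.

```python
def strip_signature(body):
    """Cut the body at the last conventional signature delimiter, if any."""
    lines = body.splitlines()
    # Search from the bottom so that only the last delimiter counts.
    for i in range(len(lines) - 1, -1, -1):
        # Matches "-- " and also "--" with the trailing space lost.
        if lines[i].rstrip() == "--":
            return "\n".join(lines[:i])
    # No delimiter found: leave the message untouched.
    return body
```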

It seems like it would be smart to not run a phrase over delimiting words.
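
A sketch of that idea: split the text into segments at punctuation first, then build phrases only inside each segment. The choice of delimiters is an assumption, and stop-word removal is left out to keep the example short. Punctuation counts as a break only when followed by whitespace or the end of the text, so "wikipedia.org" is not split into separate segments.

```python
import re

def phrases_within_segments(text, size=3):
    """Build word phrases that never cross a punctuation boundary."""
    phrases = []
    # Break at . , ; : ! ? when followed by whitespace or end of text.
    for segment in re.split(r"[.,;:!?]+(?:\s+|$)", text.lower()):
        words = re.findall(r"[a-z0-9_]+", segment)
        # Slide a fixed-size window inside this segment only.
        for i in range(len(words) - size + 1):
            phrases.append(" ".join(words[i:i + size]))
    return phrases
```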
-
Martin

2008\04\15@084052 by Martin

Your nickname does show up in replies if the client automatically cites
your quote. My script ignores punctuation. Perhaps for the purpose of
finding phrases it should pay attention to commas, semicolons, etc.

i.e:

William "Chops" Westfield wrote:
{Quote hidden}

2008\04\15@084249 by Martin

M. Adam Davis wrote:
> That's very interesting!  I wonder, though, if you could re-run it
> without any quoted emails?  I suspect that if you ran it against the
> email as-is, you'd get 5 of any particular phrase in a thread just due
> to quoting the same email in several other emails.
>
> Would be fun to run some statistical analysis against the whole
> piclist archive.  I'd like to get a google trends like interface so I
> can see the rise and fall of each PIC type, particular problems, and
> other topics...
>
> -Adam
>

It's true that quoted information may be less relevant. At least, the
last reply would probably be relevant to the current response. For
example, the reply I'm typing right now would mean nothing if someone
couldn't read your quoted message above.

-
Martin

2008\04\17@001750 by Sean Breheny

On Mon, Apr 14, 2008 at 3:28 PM, Martin wrote:

>  count   phrase  score
>
>  105     wikipedia org wiki      2.964
>  75      own certificate saying  2.9742
>  69      sean breheny shb7       2.9863

Good work there Martin ;)

Sean
