PICList Thread
'tokenize'
2000\03\15@031046 by Soon Lee

Hi everyone
Can anyone explain what "tokenize" means?

regards
soon lee

2000\03\15@032324 by douglas.burkett

Tokenizing is usually the process an interpreter goes through to convert the
normal human-readable text commands of a programming language into an internal
machine representation.  For example:

10 for i=1 to 100
20 next

The machine may convert the keywords above (for, to, next) into single-character
representations.  This conserves space and allows a quicker lookup
of keywords during program execution.
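
A minimal sketch of that keyword substitution in Python. The token values
(0x80 and up) and the function name crunch are assumptions made for
illustration, not taken from any particular interpreter. It also assumes
keywords are separated from neighbouring letters and does not treat
quoted strings specially.

```python
import re

# Hypothetical token values; the high bit marks a byte as a token.
KEYWORDS = {"FOR": 0x80, "TO": 0x81, "NEXT": 0x82}

def crunch(line: str) -> bytes:
    """Replace each keyword in a BASIC line with its one-byte token."""
    out = bytearray()
    # Split the line into runs of letters and runs of everything else.
    for piece in re.findall(r"[A-Za-z]+|[^A-Za-z]+", line):
        token = KEYWORDS.get(piece.upper())
        if token is not None:
            out.append(token)                    # keyword: one byte
        else:
            out.extend(piece.encode("ascii"))    # anything else: kept as text
    return bytes(out)

print(crunch("10 for i=1 to 100"))   # b'10 \x80 i=1 \x81 100'
print(crunch("20 next"))             # b'20 \x82'
```

The first line shrinks from 17 characters to 14 bytes, and the executor
can recognise a keyword by testing a single bit instead of comparing
strings.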

Doug


2000\03\15@062257 by Soon Lee

Can anyone please enlighten me on how to go about doing this?

thanks



2000\03\15@063956 by Spehro Pefhany

At 07:24 PM 3/15/00 +0800, you wrote:
>Can any one please enlighten me how to go about doing this
>

It's done in the piece of software called a "parser".

There are numerous examples in any book on how to
write a compiler (the parser is the easiest part to
write). Also there are parser-generators such as
flex available.

It's best to start with a grammar in BNF and
then design your program to recognize the
grammar you have designed. It will typically
discard white-space (except in strings and
literals) and comments prior to tokenizing.
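
A rough sketch of that discard step in Python, using the standard re
module. Strictly speaking, the stage that produces tokens is usually
called a lexer or scanner, and flex generates scanners; the parser then
works on the token stream. The token classes and the use of REM as the
comment marker are assumptions for a BASIC-like language.

```python
import re

# Hypothetical token classes; order matters, earlier patterns win.
TOKEN_SPEC = [
    ("COMMENT", r"REM\b.*"),               # discarded
    ("STRING",  r'"[^"]*"'),               # whitespace inside is preserved
    ("NUMBER",  r"\d+(?:\.\d+)?"),
    ("NAME",    r"[A-Za-z][A-Za-z0-9]*"),
    ("OP",      r"[=+\-*/()<>,;]"),
    ("SPACE",   r"\s+"),                   # discarded
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)

def lex(text):
    """Return (kind, text) pairs, dropping white-space and comments."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup not in ("SPACE", "COMMENT"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(lex('PRINT "a  b" ; x1 REM note'))
```

The two spaces inside the string literal survive, while the spaces
between tokens and the trailing comment do not.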

Best regards,




=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Spehro Pefhany                                    "The Journey is the reward"
@spam@speffKILLspamspaminterlog.com
Fax:(905) 271-9838                      (small micro system devt hw/sw + mfg)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

2000\03\15@122154 by Harold M Hallikainen
On Wed, 15 Mar 2000 09:22:19 +0100 "douglas.burkett"
<KILLspamdouglas.burkettKILLspamspamUS.ARMY.MIL> writes:
> Tokenizing is usually the process an interpreter goes through to convert
> the normal human text commands for a programming language to an internal
> machine representation.  For example:
>
> 10 for i=1 to 100
> 20 next
>
> The machine may convert the keywords above (for, to, next) into
> single-character representations.  This conserves space and allows a
> quicker lookup of keywords during program execution.
>

       The Microsoft 6800 Basic interpreter that I did a WHOLE lot of work with
did interesting tokenizing of a line as soon as you hit return after
entering it. It scanned thru the line looking for key words and
substituting tokens (which could be distinguished from plain text of the
line since the token had the MSB set). The line was then placed in memory
in the proper place (based on line number), doing block moves of old
lines as necessary. The line format was something like:

LineNumHiByte
LineNumLowByte
LengthOfLine
TokenizedText
EndOfLineFlag (zero byte)
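
A sketch of packing one line into that layout, in Python. The post says
the format was "something like" this, so the details are assumptions: in
particular, LengthOfLine is taken here to count the whole record (header
and terminator included), and pack_line is a hypothetical name.

```python
def pack_line(line_number: int, tokenized: bytes) -> bytes:
    """Build one stored program line: hi, lo, length, text, zero byte.

    Assumption: the length byte counts the whole record, so adding it
    to a record's address gives the address of the next record.
    """
    length = 3 + len(tokenized) + 1
    if not 0 <= line_number <= 0xFFFF:
        raise ValueError("line number must fit in two bytes")
    if length > 0xFF:
        raise ValueError("line too long for a one-byte length")
    if 0 in tokenized:
        raise ValueError("zero byte is reserved for the end-of-line flag")
    header = bytes([line_number >> 8, line_number & 0xFF, length])
    return header + tokenized + b"\x00"

record = pack_line(300, b"\x82")   # line 300, one token byte
print(record.hex(" "))             # 01 2c 05 82 00
```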

       When searching for a line (either goto or gosub), it would start at the
beginning of the program (if the destination line number was lower than
the current line number) or at the current location (if the destination
was higher), and jump line to line using the LengthOfLine byte to
determine where the next line began. Once the line was found, it would
scan the line recursively calling appropriate routines to execute it.
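
The line search described above might look roughly like this in Python.
It assumes the same record layout (hi, lo, length, text, zero) with the
length byte covering the whole record; the function names are made up
for the sketch.

```python
def find_line(program: bytes, target: int, start: int = 0):
    """Return the offset of line `target`, or None if it is absent."""
    offset = start
    while offset < len(program):
        number = (program[offset] << 8) | program[offset + 1]
        if number == target:
            return offset
        if number > target:              # lines are stored in ascending order
            return None
        offset += program[offset + 2]    # hop to the next record
    return None

def goto(program: bytes, current_offset: int, target: int):
    """Search forward from the current line, else from the program start."""
    current = (program[current_offset] << 8) | program[current_offset + 1]
    start = current_offset if target > current else 0
    return find_line(program, target, start)

# Three tiny records: lines 10, 20 and 30, each holding one byte of text.
program = bytes([0, 10, 5, 0x41, 0,   0, 20, 5, 0x42, 0,   0, 30, 5, 0x43, 0])
print(goto(program, 5, 30))   # 10  (searched forward from line 20)
print(goto(program, 5, 10))   # 0   (restarted from the beginning)
```

Because each hop only reads three header bytes, the search cost grows
with the number of lines skipped, not with their length.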
       The neat thing about the tokens was that on discovering a token, they'd
clear the MSB and do an indexed jump right to the appropriate code to
execute the token. For example, on finding the token for sqrt, it would
go to that code, eat the open parenthesis, then call the function
evaluator AGAIN (it's already the function evaluator that called the
sqrt). This would determine what was inside the parentheses and put the
result in the FAC (floating point accumulator). The sqrt function routine
would then "eat" the closing paren and return with the result in the FAC.
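
A small Python sketch of that dispatch scheme. The token values, the
handler table and the class name are assumptions for illustration; the
real interpreter was 6800 assembly. The attribute fac stands in for the
floating point accumulator.

```python
import math

SQRT, INT = 0x80, 0x81          # hypothetical token values, MSB set

class Evaluator:
    def __init__(self, code: bytes):
        self.code = code
        self.pos = 0
        self.fac = 0.0           # stands in for the floating point accumulator

    def eat(self, char: str) -> None:
        """Consume one expected character or complain."""
        if self.code[self.pos:self.pos + 1] != char.encode("ascii"):
            raise SyntaxError(f"expected {char!r} at offset {self.pos}")
        self.pos += 1

    def evaluate(self) -> float:
        if self.pos >= len(self.code):
            raise SyntaxError("unexpected end of line")
        byte = self.code[self.pos]
        if byte & 0x80:                      # MSB set: this byte is a token
            self.pos += 1
            HANDLERS[byte & 0x7F](self)      # clear the MSB, index the table
        else:                                # otherwise expect a numeric literal
            start = self.pos
            while self.pos < len(self.code) and self.code[self.pos] in b"0123456789.":
                self.pos += 1
            if start == self.pos:
                raise SyntaxError(f"number expected at offset {start}")
            self.fac = float(self.code[start:self.pos].decode("ascii"))
        return self.fac

def do_sqrt(ev: Evaluator) -> None:
    ev.eat("(")
    ev.evaluate()                # recursive call leaves the argument in fac
    ev.fac = math.sqrt(ev.fac)
    ev.eat(")")

def do_int(ev: Evaluator) -> None:
    ev.eat("(")
    ev.evaluate()
    ev.fac = float(math.floor(ev.fac))
    ev.eat(")")

HANDLERS = [do_sqrt, do_int]     # index = token with the MSB cleared

line = bytes([INT]) + b"(" + bytes([SQRT]) + b"(10))"   # INT(SQRT(10))
print(Evaluator(line).evaluate())   # 3.0
```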
       I thought the use of tokens was very clever and resulted in pretty fast
code. The only thing that slowed it down was those line searches and
repeated evaluation of constants (though you could get rid of that by
declaring variables at the beginning of your code to hold those
constants, then you only had to put up with the variable search, which,
if I recall correctly, was a linear search, which could be sped up by
going to a binary search or something).
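
For comparison, a sketch of a variable table that is kept sorted so that
lookups can use binary search, written in Python with the standard
bisect module. This is not how the interpreter above stored variables;
it only illustrates the speed-up being speculated about.

```python
import bisect

class VariableTable:
    """Variables kept sorted by name so lookup can use binary search."""

    def __init__(self):
        self.names = []
        self.values = []

    def _index(self, name):
        # bisect_left finds the insertion point in O(log n) comparisons.
        i = bisect.bisect_left(self.names, name)
        found = i < len(self.names) and self.names[i] == name
        return i, found

    def set(self, name, value):
        i, found = self._index(name)
        if found:
            self.values[i] = value
        else:
            self.names.insert(i, name)
            self.values.insert(i, value)

    def get(self, name):
        i, found = self._index(name)
        if not found:
            raise KeyError(name)
        return self.values[i]

table = VariableTable()
table.set("PI", 3.14159)     # a constant parsed once, then reused by name
table.set("A", 1)
table.set("Z", 26)
print(table.names)           # ['A', 'PI', 'Z']
print(table.get("PI"))       # 3.14159
```

The trade-off is that inserting a new variable now has to shift later
entries, which is the same kind of block move the interpreter already
did when inserting program lines.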
       I spent about 10 years mucking around in that code developing and
maintaining a product based on it. Lotsa fun!

Harold




FCC Rules Online at http://hallikainen.com/FccRules
Lighting control for theatre and television at http://www.dovesystems.com


2000\03\15@130819 by William Chops Westfield

   [building a basic interpreter]
   Can any one please enlighten me how to go about doing this

Start reading up on compiler design, either in general, or look for
texts on specific languages.  (FORTH is sort of nice for understanding
interpreters, since there's pretty much a 1:1 correspondence between the
"high level language" and the "interpreter codes.")

Most compiler courses and books will "start out" with compiling to a
"machine independent pseudo-machine-language."  For a tokenizing
interpreter, stop there and write machine code to execute the MIPML.
(for example.)
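
To make the "write code to execute the MIPML" step concrete, here is a
tiny stack machine sketched in Python. The opcode set is invented for
the example; a real design would pick opcodes to suit the source
language and the target processor.

```python
# Hypothetical opcodes for a tiny stack machine.
PUSH, ADD, MUL, PRINT, HALT = range(5)

def run(code):
    """Execute a list of opcodes and operands; return what was printed."""
    stack, output, pc = [], [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:
            stack.append(code[pc])       # operand follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == PRINT:
            output.append(stack.pop())
        elif op == HALT:
            return output
        else:
            raise ValueError(f"bad opcode {op} at {pc - 1}")

# What a compiler front end might emit for PRINT (2 + 3) * 4
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT, HALT]
print(run(program))   # [20]
```

On a microcontroller the loop body would be a jump table in machine
code, but the structure (fetch, decode, execute) is the same.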

BillW

2000\03\15@135444 by Eisermann, Phil [Ridg/CO]

>     [building a basic interpreter]
>     Can any one please enlighten me how to go about doing this
>
> Start reading up on compiler design, either in general, or look for
> texts on specific languages.  (FORTH is sort of nice for understanding
> interpreters, since there's pretty much a 1:1 correspondence between the
> "high level language" and the "interpreter codes.")
>
Like Bill says, start reading. While it's not really difficult to write a
simple (e.g. basic, un-optimized) compiler (if you have a decent
grasp on programming in the first place, that is), it is a big
(non-trivial) task.

There are numerous ways to approach 'tokenizing'. One of the
first things the person who asked needs to do is learn how
to do 'pattern matching'.  A common textbook approach for
this is using state machines. The individual words are matched
to some generic pattern like "text", "number" (integers or
floating point), "operators" (e.g. assignment statements,
mathematical operators), and so on. The generic patterns
are then further analyzed and put into tokens. For example,
is the "text" a keyword like 'if' or 'while', is it a variable, or is
it a comment? Then the tokens are turned into assembly
language.
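
A hand-written state machine along those lines, sketched in Python. The
three states, the keyword list and the operator set are assumptions
chosen to keep the example short; comments and string literals are left
out.

```python
KEYWORDS = {"if", "while", "for", "to", "next"}   # illustrative subset

def scan(source: str):
    """Classify characters with a three-state machine: start, text, number."""
    tokens = []
    state = "start"
    lexeme = ""
    for ch in source + "\n":          # trailing newline flushes the last lexeme
        if state == "text":
            if ch.isalnum():
                lexeme += ch
                continue
            # Generic "text" is refined into keyword or variable here.
            kind = "keyword" if lexeme.lower() in KEYWORDS else "variable"
            tokens.append((kind, lexeme))
            state = "start"
        elif state == "number":
            if ch.isdigit() or (ch == "." and "." not in lexeme):
                lexeme += ch
                continue
            tokens.append(("number", lexeme))
            state = "start"
        # In the start state: decide what the current character begins.
        if ch.isalpha():
            state, lexeme = "text", ch
        elif ch.isdigit():
            state, lexeme = "number", ch
        elif ch in "=+-*/<>()":
            tokens.append(("operator", ch))
        elif not ch.isspace():
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens

print(scan("for i=1 to 100"))
```

For the sample line this yields keyword for, variable i, operator =,
number 1, keyword to and number 100, which is the two-step matching
described above: generic pattern first, specific token second.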

I had to do this for a class once upon a time. I remember that
we spent the entire semester writing a compiler. Started with
having to write an assembler, then a tokenizer, then a parser,
and finally a compiler. The compiler, of course, required all of
the previous programs. As stated previously, it's not a trivial
task that can be explained in a few emails...


Phil Eisermann
H:(440) 284-3787 (RemoveMEmazerTakeThisOuTspamix.netcom.com)
O:(440) 329-4680 (spamBeGonepeisermaspamBeGonespamridgid.com)

2000\03\15@141948 by Andrew Kunz

>I had to do this for a class once upon a time. I remember that
>we spent the entire semester writing a compiler. Started with
>having to write an assembler, then a tokenizer, then a parser,
>and finally a compiler. The compiler, of course, required all of
>the previous programs. As stated previously, it's not a trivial
>task that can be explained in a few emails...

As I recall, the assembler came immediately before the compiler.  The assembler
was actually a special case of the compiler, in that it treated the EOLN as a
statement delimiter.  Our compiler ignored the EOLN and voila - free format
language.
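
One way to picture that switch, sketched in Python. Everything here is
an assumption made for illustration: the flag name, the use of ";" as
the statement terminator in the free-format case, and the sample source
text. It is not taken from the compiler described above.

```python
def statements(source: str, eoln_is_delimiter: bool):
    """Group whitespace-separated words into statements.

    With eoln_is_delimiter=True every line ends a statement (assembler
    style). With False, only ";" ends a statement (free format).
    """
    result, current = [], []
    for line in source.splitlines():
        for word in line.split():
            if word == ";":
                if current:
                    result.append(current)
                    current = []
            else:
                current.append(word)
        if eoln_is_delimiter and current:
            result.append(current)
            current = []
    if current:
        result.append(current)
    return result

print(statements("LD A,1\nADD A,2", eoln_is_delimiter=True))
print(statements("x = 1 ;\ny =\n 2 ;", eoln_is_delimiter=False))
```

The same front end serves both languages; only the treatment of the end
of line changes.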

Just yesterday I found copies of the compiler I did on a CD of the old VAX
tapes.  It writes Z-80 assembly code.

Andy

2000\03\15@143648 by WF

It's a term used in the development of COMPILERS.


2000\03\15@154516 by paulb

Harold M Hallikainen wrote:

>  The Microsoft 6800 Basic interpreter that I did a WHOLE lot of work
> with did interesting tokenizing of a line as soon as you hit return
> after entering it.

 I suspect it was the same interpreter which I spent a few weeks
translating into 6809 back in 1980 or 1981, for no particular reason
other than "because it was there".  I have an EPROM set with the object,
but the only source code I have now is yellowing fan-fold.

 I certainly wish *I* had a CD of the VAX tapes...
--
 Cheers,
       Paul B.

2000\03\15@160802 by Harold M Hallikainen


       I've still got the source code (on this machine). I have an Avocet
assembler that runs under CP/M, and run it under a CP/M emulator (again,
on this machine). I had to do a Y2K fix to my product a few years ago, so
I got to dig into the code once again. I've still got the license, signed
by Bill Gates, partner (Microsoft wasn't a corporation yet). They
originally wanted to give me source on 9 track tape, but I convinced them
to send it to me on 8 inch floppy.


Harold



2000\03\16@024543 by William Chops Westfield

I'll have to admit that I learned more in my college compiler class that
I still use today, even though I was really annoyed with it at the time.
(I was upset because the class ended up essentially skipping "code
generation", and I considered that VERY important at the time.  However,
the stuff about parsing and tokenizing and BNF, and so on has been useful
over and over again...)

BillW

2000\03\16@071812 by Andrew Kunz

For our class, we were supposed to do both an interpreter and a compiler, and
both an FP and an INT-only version.  My interpreter did the FP, and wrote object
code for a virtual machine that was actually decoded by the VAX in a Pascal
program.  The int-only version generated the Z-80 opcodes.  I didn't have a
floating-point library to work with, nor time to make one.

I was the only one in my class who actually finished all the assignments.

But you're right - the parsing side was a lot more useful in the long run than
the codegen, although it has certainly helped in understanding what's happening
in compilers I use sometimes.

Andy

2000\03\16@113014 by Rob R

I have been doing this for about 4 months now.  And no, it's not an easy
task, not even close.  I have been programming 12 hours a day sometimes.
I am only in my first year of college, no job ;) so I basically have
plenty of time.

Now I'm not a professional programmer and never had any formal training
in it, but I am 90% done with a full, if not better, Basic Stamp 1 clone,
with the interpreter and compiler written by my own hands.  I wrote the
compiler in Visual Basic only because it's so much simpler and faster to
do than C++ would be.  Although I have no formal training in programming,
I've been doing it since I was about 11, so I guess I know what I'm doing.

But writing a compiler and interpreter is the biggest pain in the a** ; )
Especially trying to make things nice and not allow the user to make
errors: checking for syntax errors, parsing sentences, checking variables
(are they declared? if not, show an error), blablabla.  I'm going to
explode soon, but hopefully when my product comes out I'll make some cash
off it.

Sorry if I am just babbling : )

Rob Rivera.



2000\03\16@115718 by rottosen

William Chops Westfield wrote:

>     [building a basic interpreter]
>     Can any one please enlighten me how to go about doing this
>
> Start reading up on compiler design, either in general, or look for
> texts on specific languages.  (FORTH is sort of nice for understanding
> interpreters, since there's pretty much a 1:1 correspondence between the
> "high level language" and the "interpreter codes.")
>
> Most compiler courses and books will "start out" with compiling to a
> "machine independent pseudo-machine-language."  For a tokenizing
> interpreter, stop there and write machine code to execute the MIPML.
> (for example.)
>
> BillW

For *complete* information on a compiler of this type see
http://www.idcomm.com/personal/lorenblaney/
You will find lots of great stuff including ALL of the sources!

-- Rich


2000\03\29@162857 by Quitt, Walter

BNF?

Please excuse the Brain fade.

-W


2000\03\29@181353 by Spehro Pefhany

At 01:26 PM 3/29/00 -0800, you wrote:
>BNF?

Backus-Naur Form

A metasyntax used to define grammars. There are various extensions such
as EBNF.

It looks something like this (part of definition of Turbo Pascal):

base-type::= simple-type
block::= declaration-part statement-part
case-element::= case-list: statement
case-label::= constant
case-label-list::= case-label {, case-label}
case-list::= case-list-element {, case-list-element}
case-list-element::= constant | constant .. constant

::=  means "is defined as"
| means "or"
{} enclosed items may be repeated zero or more times
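
To show how such rules map onto code, here is a recursive-descent sketch
in Python for two of the productions above, case-list and
case-list-element. It assumes a constant is an unsigned integer, which
is narrower than the real Pascal definition, and the function names are
invented for the example.

```python
import re

TOKEN = re.compile(r"\s*(\d+|\.\.|,)")

def tokenize(text):
    """Split the input into constants, '..' and ',' tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_case_list(text):
    """case-list ::= case-list-element {, case-list-element}"""
    tokens = tokenize(text)
    pos = 0

    def constant():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("constant expected")
        value = int(tokens[pos])
        pos += 1
        return value

    def element():
        # case-list-element ::= constant | constant .. constant
        nonlocal pos
        low = constant()
        if pos < len(tokens) and tokens[pos] == "..":
            pos += 1
            return (low, constant())
        return (low, low)

    elements = [element()]
    while pos < len(tokens) and tokens[pos] == ",":   # the {...} repetition
        pos += 1
        elements.append(element())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return elements

print(parse_case_list("1, 3..5, 9"))   # [(1, 1), (3, 5), (9, 9)]
```

Each grammar rule becomes one function, "|" becomes an if, and "{}"
becomes a while loop.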

I don't know the people it was named after, but I'm pretty sure it wasn't
the fellow that played Thurston Howell III on _Gilligan's Island_. ;-)

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Spehro Pefhany --"it's the network..."            "The Journey is the reward"
speffSTOPspamspamspam_OUTinterlog.com             Info for manufacturers: http://www.trexon.com
Embedded software/hardware/analog  Info for designers:  http://www.speff.com
Contributions invited->The AVR-gcc FAQ is at: http://www.bluecollarlinux.com
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

2000\03\29@182032 by jamesnewton

If you really want to screw up your mind, take a look at my long term, won't
go anywhere project for a BNF type Meta Language (called metal) at:
http://techref.massmind.org/language/meta-l

---
James Newton spamBeGonejamesnewtonSTOPspamspamEraseMEgeocities.com 1-619-652-0593
http://techref.massmind.org NEW! FINALLY A REAL NAME!
Members can add private/public comments/pages ($0 TANSTAAFL web hosting)



2000\03\29@183722 by David VanHorn

>I don't know the people it was named after, but I'm pretty sure it wasn't
>the fellow that played Thurston Howell III on _Gilligan's Island_. ;-)

Hedy Lamarr invented spread spectrum modulation in WWII.
