PICList Thread
'[PIC] Code challenge: Double quote escape'
2007\12\24@035917 by James Newton


I was thinking about code that runs LCDs, e.g. a serial-to-parallel LCD
converter like the "serial backpack" or others. They use a weird character
to indicate to the unit that they are sending a string to actually be
displayed rather than a command. It's something like:

\0x00Press "start"\0x00

So I wonder why not use a quote? Of course, now you have to escape the quote
character if you actually want to use it in your string. I've noticed that
everyone always uses the \ as the escape character for stuff like that.

"Press \"start\""

But in VBScript, they use a doubled quote to escape a quote.

"Press ""start"""

...which seems easier for the user. Perhaps it is not, this is not what I
want to debate in this post. If one does decide to use the double quote
thing, what does the code look like for that? I've done a search and came up
empty and so I started to try to write some generic C code to do it.

I'm amazed: Either my mind is really going or that is a VERY tough problem
to solve... Apparently the \ is used because the code is so much easier. One
problem is that you must have a two character window on the string and at
the same time, you have to save the character after an ending single quote
in case it is a command following the string. Here is what I came up with
(untested, it's late so it probably doesn't even work as written)

What can you do with it?

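c = getc();
while (true) {
 if (c=='\'') {         // opening quote: a string follows
  c = 0;                // c holds a pending quote, if any
  do {
   temp = getc();
   if (c=='\'') {       // previous character was a quote
    if (temp=='\'') {   // doubled quote
     putDST('\'');      // print one literal quote
     c = 0;             // don't record it
     continue;          // and continue
     }
    break;              // lone quote: the string has ended
    }
   if (temp=='\'') {    // new single quote. Need to know what comes next
    c = temp;           // record it
    continue;           // and continue
    }
   putDST(temp);        // ordinary character inside the string
   } while (true);
  c = temp;             // keep the character after the closing quote
  }
 else {                 // it's a command
  c = getc();           // (command handling goes here)
  }
 }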

James Newton: PICList webmaster/Admin
spam_OUTjamesnewtonTakeThisOuTspampiclist.com  1-619-652-0593 phone
http://www.piclist.com/member/JMN-EFP-786
PIC/PICList FAQ: http://www.piclist.com


2007\12\24@051855 by John Chung

Here is my solution, though not my best effort. (Tested under Linux; it is
Christmas and I am bored :P) I used the following test inputs:
'''hi'''
'hi'
hi


<beginning of file>


#include <stdio.h>
#define true 1

int main(int argc,char **argv){
       int c = getchar();
       int temp;
       
       if ( '\'' == c ){ //detects the beginning of the string.
               c = getchar();
               int prevchar = 0;
               while('\n' != c && EOF != c){ // stop at end of line or end of input
                       
                       if(prevchar == '\'' && c == '\''){
                               putchar('\'');
                       }
                       else{
                               if(c != '\'' )
                                       putchar(c);
                       }
                       
                       if(!(prevchar == '\'' && c == '\''))
                               prevchar = c;                        
                       else
                               prevchar = 0;
                       
                       c = getchar();
               }
               
   }
   else { // it's a command
               printf("\nIt is a command!!!\n");
       }

       putchar('\n');
       
       return 0;        
}

<end>





2007\12\24@095354 by Gerhard Fiedler

James Newton wrote:

> So I wonder why not use a quote? Of course, now you have to escape the quote
> character if you actually want to use it in your string. I've noticed that
> everyone always uses the \ as the escape character for stuff like that.
>
> "Press \"start\""
>
> But in vbscript, they use a double quote to escape a quote.
>
> "Press ""start"""
>
> ...which seems easier for the user. Perhaps it is not, this is not what I
> want to debate in this post. If one does decide to use the double quote
> thing, what does the code look like for that? I've done a search and came up
> empty and so I started to try to write some generic C code to do it.
>
> I'm amazed: Either my mind is really going or that is a VERY tough problem
> to solve...

IMO it's not /very/ tough, but it's tougher than a single escape character
that /has/ to be followed by a second character. And if you want to be able
to express non-printing characters, you need a "proper" escape character
anyway. So given that you already have such an escape character, why
introduce another special sequence?

The idea of the escape character is that it is a character that does /not/
appear by itself in any function; its only function is to "escape" the
following character. The double quote doesn't fit this description.
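
To make the comparison concrete, here is a minimal sketch (not code from this thread) of decoding with a dedicated escape character. The helper name `unescape_backslash` is hypothetical, and it only decodes the contents of a string, without handling the surrounding delimiters. The point is that a single flag is enough, because the escape character never stands for anything else:

```c
#include <stddef.h>

/* Hypothetical helper: copy src to dst, treating '\\' as an escape that
 * makes the following character literal. dst must have room for
 * strlen(src) + 1 bytes. Returns the number of characters written,
 * not counting the terminating NUL. */
size_t unescape_backslash(const char *src, char *dst)
{
    size_t n = 0;
    int escaped = 0;              /* 1 if the previous char was the escape */

    for (; *src; ++src) {
        if (escaped) {
            dst[n++] = *src;      /* escaped char is always taken literally */
            escaped = 0;
        } else if (*src == '\\') {
            escaped = 1;          /* remember the escape, emit nothing yet */
        } else {
            dst[n++] = *src;      /* ordinary character */
        }
    }
    dst[n] = '\0';                /* a trailing lone escape is silently dropped */
    return n;
}
```

Feeding it `Press \"start\"` should yield `Press "start"`. No lookahead is needed: by the time a character arrives, the flag already says how to treat it.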

Gerhard

2007\12\24@101142 by David VanHorn

> The idea of the escape character is that it is a character that does /not/
> appear by itself in any function; its only function is to "escape" the
> following character. The double quote doesn't fit this description.
>
There's a whole range of control chars in ASCII that people seem to be
forgetting about.
Very useful.
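
As a rough illustration of that idea (an assumption about how it might be used, not something specified in this thread), the ASCII control characters STX (0x02) and ETX (0x03) could frame the display text, so no printable character needs escaping at all. The function name `extract_framed` is hypothetical:

```c
#include <stddef.h>

#define CH_STX 0x02   /* ASCII Start of Text */
#define CH_ETX 0x03   /* ASCII End of Text */

/* Hypothetical helper: copy the text framed by STX ... ETX from src to dst.
 * Bytes outside a frame are ignored here; a real display unit would
 * presumably treat them as commands. dst must have room for
 * strlen(src) + 1 bytes. Returns the number of characters copied. */
size_t extract_framed(const char *src, char *dst)
{
    size_t n = 0;
    int in_text = 0;              /* 1 while between STX and ETX */

    for (; *src; ++src) {
        if (*src == CH_STX)
            in_text = 1;
        else if (*src == CH_ETX)
            in_text = 0;
        else if (in_text)
            dst[n++] = *src;      /* quotes need no special treatment */
    }
    dst[n] = '\0';
    return n;
}
```

The trade-off is on the sending side: control characters are awkward to type in most editors and string literals, which is part of why a printable delimiter is attractive in the first place.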

2007\12\24@173451 by James Newton

Yeah, again, I'm not debating that the slash version is better or not. It
most likely is better and for that exact reason: you need all the other
escapes as well so why not use it for the quote? But my point is that IF you
decide to try to implement the double quote escape, you find it isn't as
easy as it would appear at first glance. If you don't think it is "very"
hard, try it and see if you can come up with better code than John.

--
James.


2007\12\25@103200 by Peter P.

My turn to try:

--snip--
/*
* escaped string interpreter
*
* gcc version
*
* plp 2007
*
* Convention:
*
* Input is a string in ASCIIZ supplied as argv[1]
* Escape characters escape themselves: CHAR_ESC CHAR_ESC -> CHAR_ESC
*
* Example usage:
*
* ./escape 'abc \"\e "ABC\"def\"GHI\t"'
*
* Note that by convention an input consisting of a single escape char
* '\' is illegal. The program does not detect this.
*
* compile with:
*
* gcc -Wall -g -o escape escape.c
*
*/

#include <stdio.h>
#include <errno.h>
#include <err.h>

#define VERSION "0.0"
#define BANNER "escape.c " VERSION "\n"

#define CHAR_ESC    '\\'
#define CHAR_DQUOTE '"'
#define CHAR_CR     '\r'
#define CHAR_NL     '\n'
#define CHAR_TAB    '\t'

enum _interp_states { I_IDLE = 0, I_STRING, I_OTHER };

int main( int argc, char *argv[] )
{
 int Is = I_IDLE;
 char *p;

 printf( BANNER );
 if(argc != 2)
   errx( 1, "usage: escape \"string\"" ); // errx: no errno text for a usage error

 // process argv[1]
 p = argv[1];

 if(!*(p+1)) { // special, length == 1
   
   printf( "short input '%c'\n", *p);  

 } else { // length >= 2
   
   while(*p++) { // *(p-1) is the current char; *p is the lookahead (may be NUL)
     
     // esc is always esc, other escs depend on context
     // this can be changed by moving this code inside the state machine
     if(*(p-1) == CHAR_ESC) {
       switch(*p) {
         case CHAR_ESC:
           printf( "%c", CHAR_ESC );
           ++p;
           continue; // at while()
         default:
           ; // handled in state machine below
       }
     }

     // state machine, defines context (in string etc)
     switch( Is ) {
 
       case I_IDLE:
         if(*(p-1) == CHAR_ESC) {
           
           switch(*p) {
             case CHAR_DQUOTE:
               printf( "ESC-QUOTE\n" );
               ++p;
               break;
             default:
               printf( "ESC-OTHER: '%c'\n", *p );
               ++p;
               break;
           }
         
         } else {
           
           if(*(p-1) == CHAR_DQUOTE) { // string start quote
             Is = I_STRING;
             printf( "\nString: '" );
           } else {
             printf( "%c", *(p-1)); // ordinary char outside string
           }

         }

         break;

       case I_STRING:
           
         if(*(p-1) == CHAR_ESC) {
           
           switch(*p) { // inside string, escape various things
             case CHAR_DQUOTE:
               printf( "%c", CHAR_DQUOTE );
               break;
             case CHAR_CR:
               printf( "\r" );
               break;
             case CHAR_NL:
               printf( "\n" );
               break;
             case CHAR_TAB:
               printf( "\t" );
               break;

             default: // unknown escape, print error and cont.
               printf( "ESC-UNKNOWN: '%c'\n", *p );
               break;
           }
           ++p; // eat both escape and escaped
         
         } else {
         
           if(*(p-1) == CHAR_DQUOTE) { // string closing quote
             Is = I_IDLE;
             printf( "'\n" );
           } else {
             printf( "%c", *(p-1)); // ordinary char inside string
           }
         
         }

         break;
     }  
   }
 }

 printf( "\n\n*** Success. End.\n" );
 return( 0 );
}

// editor settings: set tabs to 2 spaces
// vim:ts=2:sw=2
--snap--

Peter 'state machines always win' P.


2007\12\26@080548 by Gerhard Fiedler

James Newton wrote:

> If you don't think it is "very" hard, try it and see if you can come up
> with better code than John.

I don't see what one has to do with the other.

I don't think it's very hard, I think that it's harder than handling a
traditional C style escape character, I think that I probably can't come up
with a version that's substantially better than either John's or Peter's
versions, and I don't think that either is "very" hard.

I don't see a contradiction here.

Gerhard

2007\12\26@211938 by Hector Martin

Gerhard Fiedler wrote:
> I don't think it's very hard, I think that it's harder than handling a
> traditional C style escape character, I think that I probably can't come up
> with a version that's substantially better than either John's or Peter's
> versions, and I don't think that either is "very" hard.

Agreed. Sounds annoying to code, not hard to code. I've done this
sort of thing before and it's hardly a challenge, but it makes my brain
scream "inefficient!" :)


--
Hector Martin (.....hectorKILLspamspam.....marcansoft.com)
Public Key: http://www.marcansoft.com/marcan.asc

2007\12\26@235551 by William \Chops\ Westfield


On Dec 26, 2007, at 6:19 PM, Hector Martin wrote:

> but it makes my brain scream "inefficient!"

I dunno.  It seems to me like by the time you add
real code around it (aggregating strings and commands,
for instance), you ought to wind up with pretty equivalent
code, even if the C is uglier.  Doesn't it amount to one
state of a three-state state machine having ONE additional
decision?  (hmm.  I guess not exactly...)

QuoteChar:
    state_command:
        if (c == STRINGSTART) state = state_string;
        else process_cmd(c);
        break;
    state_string:
        if (c == STRINGSTART) state = state_command;
        else if (c == QUOTE) state = state_quote;
        else process_string_char(c);
        break;
    state_quote:
        process_string_char(c);
        state = state_string;
        break;


DoubleQuote:
    state_command:
        if (c == STRINGSTART) state = state_string;
        else process_cmd(c);
        break;
    state_string:
        if (c == STRINGSTART) state = state_maybequote;
        else process_string_char(c);
        break;
    state_maybequote:
        if (c == STRINGSTART) {  // double quote!
            state = state_string;
            process_string_char(c);
        } else {  // single-quote means end of string
            state = state_command;
            process_cmd(c);
        }
        break;

In C it ends up looking messy because the maybequote state duplicates
code from the other states, or would use gotos.  Hopefully a good
compiler would generate similar code to the gotos that would be there anyway.
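
For reference, the DoubleQuote outline above can be written as compilable C roughly as follows. This is only a sketch: the `struct parser`, its fixed 64-byte buffers, and the function `parser_feed` are assumptions made so the example is self-contained; `process_cmd` and `process_string_char` simply collect characters instead of driving a display.

```c
#include <stddef.h>

#define STRINGSTART '"'

enum state { STATE_COMMAND = 0, STATE_STRING, STATE_MAYBEQUOTE };

/* Hypothetical container; zero-initialise it before use. */
struct parser {
    enum state state;
    char text[64]; size_t ntext;   /* decoded string characters */
    char cmds[64]; size_t ncmds;   /* characters seen outside strings */
};

static void process_string_char(struct parser *p, char c)
{
    if (p->ntext < sizeof p->text - 1) {   /* keep room for the NUL */
        p->text[p->ntext++] = c;
        p->text[p->ntext] = '\0';
    }
}

static void process_cmd(struct parser *p, char c)
{
    if (p->ncmds < sizeof p->cmds - 1) {
        p->cmds[p->ncmds++] = c;
        p->cmds[p->ncmds] = '\0';
    }
}

/* Feed one input character through the three-state machine. */
void parser_feed(struct parser *p, char c)
{
    switch (p->state) {
    case STATE_COMMAND:
        if (c == STRINGSTART) p->state = STATE_STRING;
        else process_cmd(p, c);
        break;
    case STATE_STRING:
        if (c == STRINGSTART) p->state = STATE_MAYBEQUOTE;
        else process_string_char(p, c);
        break;
    case STATE_MAYBEQUOTE:
        if (c == STRINGSTART) {            /* doubled quote: literal quote */
            p->state = STATE_STRING;
            process_string_char(p, c);
        } else {                           /* lone quote ended the string */
            p->state = STATE_COMMAND;
            process_cmd(p, c);
        }
        break;
    }
}
```

Feeding `"Press ""start"""X` one character at a time should leave `Press "start"` in `text` and `X` in `cmds`. A string that ends exactly at the end of input leaves the machine in `STATE_MAYBEQUOTE`, which the caller would have to treat as a closed string.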

Or so it seems to me.  Has anyone run the real code that has been
posted for each case through assorted compilers to see what sort of
code is actually generated?

BillW



2007\12\27@003923 by Forrest W Christian

From doing similar stuff before, it's not really harder or less hard,
just different.

The main difference between using a hard escape character such as a '\'
and a character which may or may not be an escape is that you have to be
able to deal with all situations where it is *not* an escape.

This really becomes a state machine..... the state machine looks
something like this, in pseudo code:

state=0;
For each character, do:

in state 0: //Not in a quoted string yet.
   if char='"' then state=1; //got a quote, switch to quoted mode
   else, do whatever you do with characters not in outer quotes.

in state 1: // In quoted string, and last char was not a quote.
   if char='"' then state=2; //got a quote.
   else
      do whatever you do with a character in quotes

in state 2: // In quoted string, and last char was a quote.
   if char='"' then   //another quote
      do whatever you do with a character in quotes (the char is '"' in this case)
      state=1;  //done with ""
   else (another character)
      // Quote followed by another character is an end quote.
      do whatever you do with characters not in outer quotes
      state=0;

Initially I thought you might have to handle the last '"' differently,
but you don't...   since you don't do anything with a single quote, and
a double quote will already be handled.

Note that in this code things like:

""hello"" are considered as the null string inside quotes, followed by
hello, followed by the null string inside quotes.

But """hello""" will work properly...  That is, the first quote is
always assumed to be the starting quote.

-forrest



2007\12\27@004543 by Forrest W Christian

Was going to add this to the last message, but hit send too fast...

The rules are then very simple (as shown in the previous message)...

If you get a quote, but you aren't in quotes yet, then it is the opening
quote.

Once you are "in quotes" the rule is that if you get a quote, do nothing
until you receive another character or reach the end of the string/file
(or timeout).   If you receive another character and it is a quote, then
interpret it as a literal quote, otherwise the quote is closed, and the
character/event you received was outside the quotes.
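
These rules, including the end-of-input case, might be sketched in C like this. It is an illustration under assumptions, not tested production code: the names `dq_step` and `enum dq_result` are made up here, and end of input is signalled by passing `EOF`.

```c
#include <stdio.h>   /* for EOF */

/* What the caller should do with the character just fed in. */
enum dq_result {
    DQ_NOTHING,   /* delimiter, pending quote, or end of input: emit nothing */
    DQ_TEXT,      /* ch is a character of the quoted string */
    DQ_OUTSIDE    /* ch lies outside any quoted string */
};

/* *state: 0 = outside quotes, 1 = inside quotes,
 *         2 = inside quotes and the previous character was a quote.
 * Pass EOF as ch at end of input (or on timeout). */
enum dq_result dq_step(int *state, int ch)
{
    switch (*state) {
    case 0:
        if (ch == '"') { *state = 1; return DQ_NOTHING; }
        return ch == EOF ? DQ_NOTHING : DQ_OUTSIDE;
    case 1:
        if (ch == '"') { *state = 2; return DQ_NOTHING; }  /* wait and see */
        /* EOF here means the string was never closed; state stays 1 */
        return ch == EOF ? DQ_NOTHING : DQ_TEXT;
    default:  /* state 2 */
        if (ch == '"') { *state = 1; return DQ_TEXT; }     /* "" -> literal " */
        *state = 0;                       /* the pending quote closed the string */
        return ch == EOF ? DQ_NOTHING : DQ_OUTSIDE;
    }
}
```

With this sketch, `""hello""` should produce an empty string, then `hello` outside quotes, then another empty string, while `"""hello"""` followed by `EOF` should produce the text `"hello"` and end in state 0. A final state of 1 tells the caller that a string was left open.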

-forrest

2007\12\27@051043 by Peter P.

William "Chops" Westfield <westfw <at> mac.com> writes:
> for instance), you ought to wind up with pretty equivalent
> code, even if the C is uglier.  Doesn't it amount to one
> state of a three-state state machine having ONE additional
> decision?  (hmm.  I guess not exactly...)

I forgot to mention that my implementation is a form of generalized dichar
interpreter. By moving the first decision (the one that handles CHAR_ESC
CHAR_ESC) into the switch(Is) and setting CHAR_ESC '"' it should work like the
VB version, i.e. "" inside quotes will do as expected. Btw, are the original
trigraph sequences from C supported in new compilers? (shudder). The
replacements for <>{} for old teletypes? I think that the first and the last
time I saw those they were in K&R 1st edition.

Peter P.


2007\12\27@154024 by Cristóvão Dalla Costa

On Dec 24, 2007 6:59 AM, James Newton <EraseMEjamesnewtonspam_OUTspamTakeThisOuTmassmind.org> wrote:

>
>
> I'm amazed: Either my mind is really going or that is a VERY tough problem
> to solve... Apparently the \ is used because the code is so much easier.
> One
> problem is that you must have a two character window on the string and at
> the same time, you have to save the character after an ending single quote
> in case it is a command following the string. Here is what I came up with
> (untested, it's late so it probably doesn't even work as written)


I think the thing is that compilers use properly defined grammars and
lexical scanners so that escaping characters becomes a trivial thing, using
a state machine, which is actually C code auto generated by a compiler
compiler (bison, yacc, etc) from grammar definition files. Doing it "by
hand" you're bound to encounter many special cases that'll make life
difficult.

2007\12\28@055155 by Peter P.

> Doesn't it amount to one
> state of a three-state state machine having ONE additional
> decision?  (hmm.  I guess not exactly...)

I think that the key to understanding this is to analyze an extreme case,
like """abc""" . Normally this would be interpreted as "" followed by "abc"""
following normal string conventions. So the 'extra case' is the one that says
that """ outside a string is NOT the empty string followed by a string start,
but a string start followed by a quote inside it (in other words, "" outside a
string is not special). The end case (""" at end of string) is not so bad but
can also be interpreted using standard rules as abc" followed by "", which it
must NOT be. Completing this, one has to match a regexp of the type /[^"]"("")*/
and /("")*"[^"]/ for the string start and end respectively. I suspect that
whoever invented this scheme never had to deal with a stream, he always had the
data in a buffer to work with. Either that, or the VISION came from ABOVE and
the implementor feared for his job.

I don't know why this issue reminds me of a programming language called
Brainf*ck and things like '640k of ram should be enough for anybody'. The code I
posted can be modified to handle the needed exceptions at the top of the loop,
by moving the ESC ESC -> ESC rule in the while loop into the switch(Is) part
where Is == I_STRING . This removes the special nature of "" outside a string
and the rest works as before. I hope.

So actually there is no extra case imho. It's just ibfuscated obfuscation.

Peter P.


2007\12\28@063321 by Jan-Erik Soderholm

Peter P. wrote:

> like """abc""" .

I read that as :

1 First " : start of a string.
2 Then "" : a quote inside a string.
3 abc : regular text.
4 "" : same as (2).
5 " : end of string.

That is, it's the string "abc" (*including* the quotes!)
expressed as a string constant.
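
Going the other way makes the same point. A minimal sketch of the encoder (the name `quote_doubled` and its buffer-size convention are assumptions for illustration) wraps the text in quotes and doubles every quote inside it:

```c
#include <stddef.h>

/* Hypothetical helper: write src to dst as a quoted constant in which every
 * embedded quote is doubled. Returns the length of the result, or 0 if dst
 * (dstsize bytes) is too small. A valid result is always at least 2 long. */
size_t quote_doubled(const char *src, char *dst, size_t dstsize)
{
    size_t n = 0;

    if (dstsize < 3)                      /* room for "" plus the NUL */
        return 0;
    dst[n++] = '"';                       /* opening quote */
    for (; *src; ++src) {
        size_t need = (*src == '"') ? 2 : 1;
        if (n + need + 2 > dstsize)       /* keep room for closing quote + NUL */
            return 0;
        if (*src == '"')
            dst[n++] = '"';               /* double the embedded quote */
        dst[n++] = *src;
    }
    dst[n++] = '"';                       /* closing quote */
    dst[n] = '\0';
    return n;
}
```

Encoding the five characters `"abc"` this way should give the nine characters `"""abc"""`, matching the breakdown above.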

Jan-Erik.

2007\12\28@135121 by Peter P.

Jan-Erik Soderholm <jan-erik.soderholm <at> telia.com> writes:
> That is, it's the string "abc" (*including* the quotes!)
> expressed as a string constant.

Correct. I was just pointing out that the 'special' meaning of "" outside a
string must be removed for this to work right. Other than that, it's just an
ordinary dichar interpreter, no magic sauce is needed to make it work.

Peter P.

