Searching \ for '[PIC:] Sorting through millions of numbers' in subject line. ()
Make payments with PayPal - it's fast, free and secure! Help us get a faster server
FAQ page: www.piclist.com/techref/microchip/devices.htm?key=pic
Search entire site for: 'Sorting through millions of numbers'.

Exact match. Not showing close matches.
PICList Thread
'[PIC:] Sorting through millions of numbers'
2004\04\21@121343 by Kresho

flavicon
face
G'day list.

I have a situation as follows:

A machine will have a mag swipe card reader. The cards are encoded with any
number between 0 and 10,000,000 representing an account number. When a card
is swiped i must store the card number, time and date - 9 byte chunks of
data (24bit card number and 6bytes of date/time). When the card is swiped
again i must recall the same information and can then delete it.

Storing heaps of data is no problem - MMC or similar

However, i see an issue in catering for all these cards. Let's say that a
new person swipes his card every 10 seconds. That's 8640 swipes in a day. So
i must store 8640 chunks of data somewhere. The same person may not swipe
his card again for a number of days or weeks. Suddenly, there are thousands
(millions?) of chunks of stored data.

I can see 2 options:

1) Allocate a linearly determined memory address for each particular card.
eg: card #0 data at address h'00', card #1 data at address h'09', card #XYZ
at address 9*XYZ, etc. I would need 10,000,000 x 9 bytes or approx 90mb of
memory. That's do-able with a 128mb MMC, but rather unelegant.

2) I can store the 9byte chunks as i receive them in a free slot in memory.
When the card is swiped again i go through the stored data, from the start
of memory, till i find a match based on the card number, and then i can
access the time/date for that card and delete it leaving a free slot again.
However, if that card is the 1 millionth entry in this memory, it will take
considerable time to find that card number. Reading a million RAM locations
wouldn't be that bad (seconds) but accessing an MMC takes a bit of time and
i can see the time blowing out of all proportions.

Does anyone have any ideas how this could be done in any way other than
option 1 above? My biggest reason for not liking this option is that there
is talk of upscaling the number of possible card numbers to 1x10^12, which
suddenly makes any MMC (or other memory for that matter) look useless.

Rgds,

Kresho Sprem

--
http://www.piclist.com hint: PICList Posts must start with ONE topic:
[PIC]:,[SX]:,[AVR]: ->uP ONLY! [EE]:,[OT]: ->Other [BUY]:,[AD]: ->Ads

2004\04\21@123837 by Ake Hedman

flavicon
face
You can use a B-tree ( http://www.bluerwhite.org/btree/ ) or a double
linked list for your storage.

/Ake

-----Ursprungligt meddelande-----
Från: pic microcontroller discussion list
[spam_OUTPICLISTTakeThisOuTspamMITVMA.MIT.EDU]För Kresho
Skickat: den 21 april 2004 18:14
Till: .....PICLISTKILLspamspam@spam@MITVMA.MIT.EDU
Ämne: [PIC:] Sorting through millions of numbers


G'day list.

I have a situation as follows:

A machine will have a mag swipe card reader. The cards are encoded with
any
number between 0 and 10,000,000 representing an account number. When a
card
is swiped i must store the card number, time and date - 9 byte chunks of
data (24bit card number and 6bytes of date/time). When the card is
swiped
again i must recall the same information and can then delete it.

Storing heaps of data is no problem - MMC or similar

However, i see an issue in catering for all these cards. Let's say that
a
new person swipes his card every 10 seconds. That's 8640 swipes in a
day. So
i must store 8640 chunks of data somewhere. The same person may not
swipe
his card again for a number of days or weeks. Suddenly, there are
thousands
(millions?) of chunks of stored data.

I can see 2 options:

1) Allocate a linearly determined memory address for each particular
card.
eg: card #0 data at address h'00', card #1 data at address h'09', card
#XYZ
at address 9*XYZ, etc. I would need 10,000,000 x 9 bytes or approx 90mb
of
memory. That's do-able with a 128mb MMC, but rather unelegant.

2) I can store the 9byte chunks as i receive them in a free slot in
memory.
When the card is swiped again i go through the stored data, from the
start
of memory, till i find a match based on the card number, and then i can
access the time/date for that card and delete it leaving a free slot
again.
However, if that card is the 1 millionth entry in this memory, it will
take
considerable time to find that card number. Reading a million RAM
locations
wouldn't be that bad (seconds) but accessing an MMC takes a bit of time
and
i can see the time blowing out of all proportions.

Does anyone have any ideas how this could be done in any way other than
option 1 above? My biggest reason for not liking this option is that
there
is talk of upscaling the number of possible card numbers to 1x10^12,
which
suddenly makes any MMC (or other memory for that matter) look useless.

Rgds,

Kresho Sprem

--
http://www.piclist.com hint: PICList Posts must start with ONE topic:
[PIC]:,[SX]:,[AVR]: ->uP ONLY! [EE]:,[OT]: ->Other [BUY]:,[AD]: ->Ads

--
http://www.piclist.com hint: PICList Posts must start with ONE topic:
[PIC]:,[SX]:,[AVR]: ->uP ONLY! [EE]:,[OT]: ->Other [BUY]:,[AD]: ->Ads

2004\04\21@124252 by M. Adam Davis

flavicon
face
This is within the field of Hashing if you expect that you won't ever
need to store all 10,000,000 entries at once.  If you don't have a good
upper limit of the maximum entries you need to store, then hashing isn't
a good solution.

Hashing is taking the input, performing some operation on it (usually
taking a larger number and making it smaller) and then outputting the
result.  The resulting number, which is smaller, tells you where you can
find that data.  So, for instance, if you have 10 million entries, and
you expect that you'll only ever store 100 at a time, then the operation
can be as simple as [card number] mod 100.

This inevitably leads to collisions, where two cards end up with the
same hash number.  This is the interesting area, and there are two
well-known ways of overcoming this.

1) Look at the spot the hash points to.  If it's empty or the number
matches, great use it.  If it's full and the number doesn't match then
simply go along the hash array linearly until you find an empty spot or
the original card number and store it there.   The search process is
considerably shortened by this method, even thoug you are still using a
linear search.  In this case the hash array has to be as large as the
data you'll need to store - ie, you can run out of space if the array is
too small.  Make sure the program accounts for this and either
occasionally deletes old entries, or produces and error message when
approaching a full array.  You could also refactor the entire array and
create a larger hashing function, but it'd be difficult to use the card
reader during this time, and testing is more difficult.

2) Each entry in the hash array points to a linked list of data
structures.  So hash the card number, go to that spot in the array and
you'll find the biginning pointer to a linked list of entries.  Follow
the list until you find the card number, or go to the end and add a new
entry, allocating a new chunk of data in the storage medium.  This
overcomes the problem of running out of array space since you can fill
out the entire 10,000,000 entry space if the storage card is large enough.

However, 64MB cards are cheap, and just because a solution isn't 'cool'
doesn't mean it's a bad solution.  If I were in your shoes, I'd simply
map a single location to each card (don't need the card number, it's
mapped into the location) and store the 6 byte date/time into the
appropiate spot.  It's easier to code, test, and the single fastest
solution.

But if you want an 'elegant' solution, then you can play with hashing.
Note that many memory cards have 512 byte blocks.  You can factor this
into the hash algorithm, and place all the cards for a certian hash into
one block.  This means that you won't have to read more than one data
block to look up any one card number.

Good luck!

-Adam

Kresho wrote:

{Quote hidden}

--
http://www.piclist.com hint: PICList Posts must start with ONE topic:
[PIC]:,[SX]:,[AVR]: ->uP ONLY! [EE]:,[OT]: ->Other [BUY]:,[AD]: ->Ads

2004\04\21@124500 by Bruce Partridge

flavicon
face
How about a binary tree implemented as a linked list.  Or 10 binary trees
based on the first digit of the number.

Of course if the numbers are issued in either sequential order or as a count
of some physical property, the group starting with 1,2,3, and 4 will be more
heavily used than the group starting 5,6,7,8,9

Bruce Partridge
http://www.rebreather.ca

> {Original Message removed}

2004\04\21@130544 by Jan-Erik Soderholm

face picon face
Kresho wrote :

> A machine will have a mag swipe card reader. The cards are
> encoded with any number between 0 and 10,000,000 representing
> an account  number.

An "account number" as in "bank account number" ? Or is it
some kind us "user account number". Ir simply just a way
to identify each individual card without any other meaning ?

> When a card is swiped i must store the card
> number, time and date - 9  byte chunks of data (24bit card number
> and 6bytes of date/time). When the card is swipedagain i must recall
> the same information and can then delete it.

You didn't mentioned "PIC" one single time, but I'd guess that
that's what you'd like to use, right ? :-)

What is the importance of this application ?
Checking people passing in to and out of some
nuclear power plant ? Or something "simpler" ?
The actual solution whould probably be *highly*
dependent on this parameter.

What happens if you suddenly lose all data ?
Is that allowed at all ? That's also a very important
design parameter !!


Anyway...

I'd look at some solution where the PIC and card reader
is an I/O device to some computer system running a
decent database to keep track of the persons passing.
Could be UNIX, VMS, whatever.

The PIC doesn't realy store anything, it just passes the
card-number and timestamp over to the main system.
Could use a simple serial (RS232) line.
The main system could be whatever needed to cater for
the security and safety, like having shadowed disks and
so on. Maybe runing an OS that can e clustered to get
true 24*7 uptime, if needed (?).

It would also be rather easy to expand the solution to
multiple registration stations. It doesn't matter if the
person passes "in" and "out" (or whatever they actualy
are doing !) using defferent readers.

> My biggest reason for not liking this option [PIC/MCC]
> is that there is talk of upscaling the number of possible
> card numbers to 1x10^12,...

Which definitly tells me that a "real" computer solution probably
is what's needed...

Jan-Erik.

--
http://www.piclist.com hint: PICList Posts must start with ONE topic:
[PIC]:,[SX]:,[AVR]: ->uP ONLY! [EE]:,[OT]: ->Other [BUY]:,[AD]: ->Ads

2004\04\21@132835 by John Ferrell

face picon face
Study up on "Sparse Matrix" in whatever Computer Science books you have
available. Most of the really good work in this area happened when storage
was precious, before the mid 1980's.
Depending on your volume you may get to more than one machine. Assuming your
numbers are base 10, your search algorithm should consider that. IF these
numbers are some kind of self check code, you may be able to exclude a great
number of them by simply using a validation routine.

The telephone company used to use a similar system to time long distance
phone calls in the 1960's using a wide Paper tape as storage media. However,
the tape was copied to mag tape on an IBM 7074 system for processing.
John Ferrell
http://DixieNC.US

{Original Message removed}

2004\04\21@134559 by Bob Ammerman

picon face
Think 'hash table'

Bob Ammerman
RAm Systems

----- Original Message -----
From: "Kresho" <piclistspamKILLspamADVANCEDTECHNIKA.COM>
To: <.....PICLISTKILLspamspam.....MITVMA.MIT.EDU>
Sent: Wednesday, April 21, 2004 12:14 PM
Subject: [PIC:] Sorting through millions of numbers


> G'day list.
>
> I have a situation as follows:
>
> A machine will have a mag swipe card reader. The cards are encoded with
any
> number between 0 and 10,000,000 representing an account number. When a
card
> is swiped i must store the card number, time and date - 9 byte chunks of
> data (24bit card number and 6bytes of date/time). When the card is swiped
> again i must recall the same information and can then delete it.
>
> Storing heaps of data is no problem - MMC or similar
>
> However, i see an issue in catering for all these cards. Let's say that a
> new person swipes his card every 10 seconds. That's 8640 swipes in a day.
So
> i must store 8640 chunks of data somewhere. The same person may not swipe
> his card again for a number of days or weeks. Suddenly, there are
thousands
> (millions?) of chunks of stored data.
>
> I can see 2 options:
>
> 1) Allocate a linearly determined memory address for each particular card.
> eg: card #0 data at address h'00', card #1 data at address h'09', card
#XYZ
> at address 9*XYZ, etc. I would need 10,000,000 x 9 bytes or approx 90mb of
> memory. That's do-able with a 128mb MMC, but rather unelegant.
>
> 2) I can store the 9byte chunks as i receive them in a free slot in
memory.
> When the card is swiped again i go through the stored data, from the start
> of memory, till i find a match based on the card number, and then i can
> access the time/date for that card and delete it leaving a free slot
again.
> However, if that card is the 1 millionth entry in this memory, it will
take
> considerable time to find that card number. Reading a million RAM
locations
> wouldn't be that bad (seconds) but accessing an MMC takes a bit of time
and
{Quote hidden}

--
http://www.piclist.com hint: PICList Posts must start with ONE topic:
[PIC]:,[SX]:,[AVR]: ->uP ONLY! [EE]:,[OT]: ->Other [BUY]:,[AD]: ->Ads

2004\04\21@140225 by M. Adam Davis

flavicon
face
Having thought about it a little further, I suspect your project will
benefit from a practical analysis.

First of all, define how much data you need to store and how long you
have to store it before you can clear it.  I doubt you need to store
information that's over a year old, for instance.  Chances are good that
this type of project requires the data to be offloaded on a fiarly
regular basis, or moderately old data is useless (ie, a simple building
security system may only need data going back 2 weeks).

Even if you had a person standing at the reader with all 10,000,000
cards swiping a new card once every second until they exhausted all 10
million cards it would take 4 months to go through all those cards.

If you have more than 1,000 people a day swiping their card then data
storage and lookup is not the worst problem you'll run into before the
project is done.

So try and come up with the maximum number of uses per day, then the
maximum days between offloading (or maximum needed backlog, etc) and
that will tell you how many memory slots you'll need.  It won't matter
how many cards exist, only how many cards will be swiped at that
location between clean ups and off loads.

If you need to keep data indefinitely, then you must allocate space for
each possible card number, and if you're doing that then there's /no/
reason to use a special storage mechanism.  If you must accept 1
trillion card numbers then I'd strongly suggest looking into compression
and using a real computer system (terabyte raid array) because nothing
else at this point in time is going to hold the information you want for
that many cards 'forever'.

-Adam

Kresho wrote:

{Quote hidden}

--
http://www.piclist.com hint: PICList Posts must start with ONE topic:
[PIC]:,[SX]:,[AVR]: ->uP ONLY! [EE]:,[OT]: ->Other [BUY]:,[AD]: ->Ads

2004\04\21@184106 by Russell McMahon

face
flavicon
face
> A machine will have a mag swipe card reader. The cards are encoded with
any
> number between 0 and 10,000,000 representing an account number. When a
card
> is swiped i must store the card number, time and date - 9 byte chunks of
> data (24bit card number and 6bytes of date/time). When the card is swiped
> again i must recall the same information and can then delete it.

Answers so far sound good to me - hashing, B Tree, brute force memory
locations and more.
I assume this is for  something comparable to a car wash system - you buy a
"ticket" which is recorded in the system and it is removed once utilised.

I'd like to address another issue. You need to be really really really
confident of your data integrity here. With such a large number of potential
cards , if it is possible for one card to be read wrongly and thereby return
a valid code from another card, you are in deep trouble. Depending on what
the use of the card is this could be just customer annoyance (car washes) or
substantial expense (trying to find out where Elvis is inside the secure
area when he's already left the building.) You may not have control over the
encoding, but if you do, you want as much data integrity as you can get.
10,000,000 potential numbers is  binary 24 bits. You want to add at least a
32 bit CRC to that, and possibly forward error correcting.


       RM

--
http://www.piclist.com hint: PICList Posts must start with ONE topic:
[PIC]:,[SX]:,[AVR]: ->uP ONLY! [EE]:,[OT]: ->Other [BUY]:,[AD]: ->Ads

2004\04\21@205631 by Kresho

flavicon
face
Thanks for all the replies.

> I assume this is for  something comparable to a car wash system -
> you buy a
> "ticket" which is recorded in the system and it is removed once utilised.

Good guess but not quite. What it's for is a snack promotion. People will
receive a card encoded with a number. This number is stored with the
time/date as previously specified. Then the card holder swipes their card on
a vending machine to get their free snack. There will obviously be a limited
time for them to redeem said snack, and that's why the date is required.

> You didn't mentioned "PIC" one single time, but I'd guess that
> that's what you'd like to use, right ? :-)

Yes, or something similar. I'm dabbling with some hitachi H8s at the moment
which might be a better fit for this application.

The b-tree and hashing schemes seem like a possibility. However, I think
Adam's response with his practical analysis is probably fitting here. If the
limited time for redemption is kept short enough I can just go through the
data every now and then and delete expired entries and keep the list lean.

If i come up with another solution i'll keep you posted.

Rgds,

Kresho Sprem

--
http://www.piclist.com hint: PICList Posts must start with ONE topic:
[PIC]:,[SX]:,[AVR]: ->uP ONLY! [EE]:,[OT]: ->Other [BUY]:,[AD]: ->Ads

2004\04\21@222317 by M. Adam Davis

flavicon
face
If it's a snack promotion that must be used in a specified amount of
time, I'd go about it a completely different way:

Encode the card number and the date it has to be used by on the card.
You might want to encrypt this information so it can't be easily duplicated.

Reserve one bit of memory for each possible card number on the memory
card.  If the card has been used, the bit will be set to one.  If the
card is too late (date) then don't accept it.

Here's the trick: You only need a few thousand numbers.

The cards expire on a regular basis.
The date is stored on the card
The machine only has to store information about used cards during their
valid period.

Let's say that your cards are valid for 4 weeks.  Let's then say that
you'll never give out more than 2048 cards each week.  You encode (and
encrypt, using a symmetric algorithm such as TEA which has a PIC
implementation already) both the date and a number (0-2047) on the card.

You only need 1K of memory to hold all the information.  This can be
held on the PIC FLASH.  Furthermore, you'll never have to maintain it -
it'll automatically remove entries as needed.

You use one bit per number, and a block of 2048 bits for each week that
the card expires in.

When someone remits their card, look at the date.  If it's expired,
reject or eat the card and display "EXPIRED".  If it isn't expired, look
into the block that's holding that particular weeks expiration
information and find the bit that goes with that number.  If the bit is
set, reject or eat the card "USED".  If it isn't set then set it and
dispense the snack.

Once a week, at midnight (or 5 minutes later to be nice) purge one
week's worth of data (2048 bits) and start using that block for
expirations that happen 4 weeks in the future.

Please keep in mind that there are such things as leap days and, if
applicable to your region, daylight saving time.  These can be dealt
with in software, but it can be tricky to test completely.  You don't
want, for instance, to purge two weeks overnight because the time changed.

This can all be implemented in a single, low end flash PIC.  The hand's
off aspect makes it especially nice.  No need to track more than 8
thousand cards at once, and no need to store more than a bit each.  Of
course the numbers change depending on your application, but the idea is
the same.

I hope this helps.

-Adam

Kresho wrote:

{Quote hidden}

--
http://www.piclist.com hint: PICList Posts must start with ONE topic:
[PIC]:,[SX]:,[AVR]: ->uP ONLY! [EE]:,[OT]: ->Other [BUY]:,[AD]: ->Ads

2004\04\21@225848 by Kresho

flavicon
face
G'day Adam.

> Encode the card number and the date it has to be used by on the card.
> You might want to encrypt this information so it can't be easily
> duplicated.

Can't do. The cards already exist. They are membership cards encoded with a
7 digit number and some other useless (to me) info only. That's why my idea
was to go into a setup mode and swipe the card once to set a token for a
snack, then let the user redeem it by swiping their card at a later time.
The offer may be repeated in the future so it should be a repeatable
process.

Come to think of it, i've found another good reason to use a linear address
space: write endurance. If each card has it's own data address then the
endurance only becomes an issue if a particular user(s) uses their card
endurance times - that's not going to happen if the endurance is 1million
writes.

Nice idea though and very clever approach!

Rgds,

Kresho Sprem

> {Original Message removed}

2004\04\21@232832 by Bob Ammerman

picon face
Ok, lets go back to the basics of the problem:

How many total cards will you system have to handle? I expect this is far
less than the possible 10 million.

Are you concerned about forged cards?

Do only certain members get a snack token? How are they determined?

Are the cards read-only or do they some read-write capability you can use?

How many redemption stations will be needed?

How many transactions per day do you anticipate?

Bob Ammerman
RAm Systems




----- Original Message -----
From: "Kresho" <EraseMEpiclistspam_OUTspamTakeThisOuTADVANCEDTECHNIKA.COM>
To: <PICLISTspamspam_OUTMITVMA.MIT.EDU>
Sent: Wednesday, April 21, 2004 11:00 PM
Subject: Re: [PIC:] Sorting through millions of numbers


> G'day Adam.
>
> > Encode the card number and the date it has to be used by on the card.
> > You might want to encrypt this information so it can't be easily
> > duplicated.
>
> Can't do. The cards already exist. They are membership cards encoded with
a
> 7 digit number and some other useless (to me) info only. That's why my
idea
> was to go into a setup mode and swipe the card once to set a token for a
> snack, then let the user redeem it by swiping their card at a later time.
> The offer may be repeated in the future so it should be a repeatable
> process.
>
> Come to think of it, i've found another good reason to use a linear
address
{Quote hidden}

> > {Original Message removed}

2004\04\22@001255 by Kresho

flavicon
face
> How many total cards will you system have to handle? I expect this is far
> less than the possible 10 million.

There are up to 10million different card numbers but i can't ever see this
many cards being used. So, for an example, lets say that one person uses the
card every 10 seconds (even that is unlikey) = 8640 cards in a day. Lets
also say that the redemption time is 14 days. Thats 14x8640 = 120960 cards
that could pass through in that time.

> Are you concerned about forged cards?

Nope!

> Do only certain members get a snack token? How are they determined?

That's up to management. All i told them is they have to swipe the card
(during a setup or recharge mode) to activate a token.

> Are the cards read-only or do they some read-write capability you can use?

Read only.

> How many redemption stations will be needed?

Not sure as yet, but it may be more than 1. If it's more than 1 then i would
have a single "database board" with all the card usage info and seperate
boards at each machine. The machines would send the card number to the
database board and get the token via some sort of connection yet to be
determined, probably a simple multidrop rs485 thing.

> How many transactions per day do you anticipate?

Who knows? I'm taking 1 per 10 seconds as a worst case senario.

Rgds,

Kresho Sprem

{Quote hidden}

--
http://www.piclist.com hint: The list server can filter out subtopics
(like ads or off topics) for you. See http://www.piclist.com/#topics

2004\04\22@023349 by Bob Axtell

face picon face
Looks like the car wash enable situation all right.

I would save the 24 bits (3 bytes) + 2-byte "minutes-per-day" + 2-byte
"day-of-year" field.  That's 7 bytes with a coupla bits unused for flags
 such as "discard after 24hrs" or "discard after 30 days", etc.

I'd organize the field so that the DOY is first, followed by MOD. That
way, discards more easily performed than with random searches.

--Bob


Russell McMahon wrote:
{Quote hidden}

--

 Replies: NOTE-Script, EXE,BAT and COM
    files will be rejected by server
             --------------
               Bob Axtell
       PIC Hardware & Firmware Dev
         http://beam.to/baxtell
             1-520-219-2363

--
http://www.piclist.com hint: The list server can filter out subtopics
(like ads or off topics) for you. See http://www.piclist.com/#topics

2004\04\22@054925 by michael brown

picon face
"Bob Ammerman" wrote:


> Think 'hash table'

I was thinking b-tree.

michael brown

--
http://www.piclist.com hint: The list server can filter out subtopics
(like ads or off topics) for you. See http://www.piclist.com/#topics

2004\04\22@095344 by Bob Ammerman

picon face
I would suggest a system composed of the following parts:

1: The MASTER station. This would contain a card reader used to set a token
on a card. It would also serve as the database repository, and would drive
an RS485 or similar link to the SLAVE stations to poll for card insertions.
This device would need some form of mass storage, so it could be anything
from a PIC with a bunch of external flash memory to a full-blown PC. Again,
I would use some form of hashtable to store the data, designing it to level
wear on the flash (if flash was used).

2: One or more SLAVE stations, each associated with one snack dispenser.
These would be a simple processor that reads the card, responds to polls and
commands from the MASTER, sends a dispense command to the dispenser, and
lights LEDs or an LCD to indicate things like: 'RETRY CARD',  'NO SNACK
PROGRAMMED', 'EXPIRED',  'DISPENSIING', etc


----- Original Message -----
From: "Kresho" <RemoveMEpiclistTakeThisOuTspamADVANCEDTECHNIKA.COM>
To: <spamBeGonePICLISTspamBeGonespamMITVMA.MIT.EDU>
Sent: Thursday, April 22, 2004 12:02 AM
Subject: Re: [PIC:] Sorting through millions of numbers


> > How many total cards will you system have to handle? I expect this is
far
> > less than the possible 10 million.
>
> There are up to 10million different card numbers but i can't ever see this
> many cards being used. So, for an example, lets say that one person uses
the
{Quote hidden}

use?
>
> Read only.
>
> > How many redemption stations will be needed?
>
> Not sure as yet, but it may be more than 1. If it's more than 1 then i
would
{Quote hidden}

> > {Original Message removed}

2004\04\22@110804 by M. Adam Davis

flavicon
face
While it's fairly acedemic, I'm interested in your thoughts one why
you'd choose a btree over a hash table.

If, as I am assuming, these are ID cards that are dispensed
sequentually, then at any given time the btree will be unevenly
weighted.  This could lead to very deep, sparse trees that could require
more storage space than a simple linear array.

Of course, this may be a limitation to my understanding of a btree -
which is why I'm asking.  I've only implemented very simple trees or
very complex self-balancing trees.  The self balancing method I used,
however, would not be well suited to flash storage since more than one
element in the tree might change for each newly inserted element, and
the simple tree did not attempt to maintain any balance since there were
no memory bounds in that project.

-Adam

michael brown wrote:

{Quote hidden}

--
http://www.piclist.com hint: The list server can filter out subtopics
(like ads or off topics) for you. See http://www.piclist.com/#topics

2004\04\22@143728 by Ben Hencke

picon face
> > > Think 'hash table'
> >
> > I was thinking b-tree.
> >
> > michael brown

I would really reccomend that you stay away from hashing at all. Never
hash a serial number that already fits into a reasonable ammount of
space. Hashing is really only useful when trying to take a large key of
data (ie a string, file, whole record, etc) and make it into a new key
that can be easily sorted/indexed/etc. A card # that is 1 to 10 milion
already fits nicely into 24 bits. Hashing this small ammount of data
will only complicate things.

A b-tree structure would be optimal for speed, but with the small
ammount of records you will be dealing with, a keyed linked list (aka
dictionary) will be far easier to code on a PIC.

Start by breaking your data storage (flash, battery backed sram, hdd,
etc) into blocks/records that are big enough to hold the data you need.
That way you can index each record. In this case the card#, ptr to next
record in list, date activated, data, and anything else you plan to
store.

When you insert a record (by scanning the card) traverse the list until
you find the 2 records that your new record will need to insert itself
between. That should be as simple as a less than check on the card #s.
You can even speed that up by comparing it byte by byte (msb first).
Then allocate a new record. You can keep a counter and incrementally
scan the memory for free/expired records to average the wear on the
flash. Make the new record point to the next in list and make the
previous in list point to the new record.

When you need to redeam a card, just run through the list. When you
find it, make the previous record point to the record after the
redeamed record and clear the redeamed record. double check the date
only if you want minute by minute expiration.

Every midnight scan the list and delete (as described above) records
that expire.

Each time you insert or remove a record you will end up altering 3
records. If you have 8640 activations and 8640 redemptions/expirations
a day thats 51,600 distributed flash/eeprom record writes per day. If
your flash was just big enoug to hold 120960 records, thats average of
about half a write per day. The larger a flash you use, the more
distributed the wear will be. If it is a heavily used vending machine
in a long term installation you might want to use battery backed sram.
Maxim has good lithium backed srams with an integrated RTC that lasts
10 years without power in a nice dip package. They are pricy for sram,
but since you share the data source it might be worth it in terms of
design complexity and service calls.


Hope that helps,
 Ben Hencke









__________________________________
Do you Yahoo!?
Yahoo! Photos: High-quality 4x6 digital prints for 25"
http://photos.yahoo.com/ph/print_splash

--
http://www.piclist.com hint: The list server can filter out subtopics
(like ads or off topics) for you. See http://www.piclist.com/#topics

2004\04\22@150019 by Wouter van Ooijen

face picon face
> Of course, this may be a limitation to my understanding of a btree -
> which is why I'm asking.  I've only implemented very simple trees or
> very complex self-balancing trees.  The self balancing method I used,
> however, would not be well suited to flash storage since more than one
> element in the tree might change for each newly inserted element, and
> the simple tree did not attempt to maintain any balance since
> there were
> no memory bounds in that project.

I implemented a b+ tree once (in my first job fresh from university).
IIRC the algorithm certainly allows for an insert to force updates of
lots of nodes, but there is a statistical average that is much lower.
But sequential inserting might defeat this (the statistics are based on
randomness). IIRC I did something to overcome exactly this.

But as said, when sorted traversal is not needed a hash table is the way
to go.

Wouter van Ooijen

-- -------------------------------------------
Van Ooijen Technische Informatica: http://www.voti.nl
consultancy, development, PICmicro products

--
http://www.piclist.com hint: The list server can filter out subtopics
(like ads or off topics) for you. See http://www.piclist.com/#topics

2004\04\22@153341 by Jan-Erik Soderholm

face picon face
Ben Hencke wrote :

> I would really reccomend that you stay away from hashing at all. Never
> hash a serial number that already fits into a reasonable ammount of
> space. Hashing is really only useful when trying to take a
> large key of data (ie a string, file, whole record, etc) and make it into a
> new key that can be easily sorted/indexed/etc.

A hash index isn't ment to keep records "sorted" at all.
A hash index should be used when you need optimum performance
in retreiving a single record from a table. A properly designed
hash table/index should be able to retreive a specific fully
qualified record in one single I/O. A b-tree needs as many I/O's
as there are currently "levels" in the tree.

On the other hand, if you need range retreivals (using part of the key),
a B-tree is far better and a hash index sucks...

This applies to databases in general, maybe not to the case at hand.

So there is no "best" here, it, as usual, depends.
And the size of the key has not much to do with this decision.

> A card # that is 1 to 10 milion
> already fits nicely into 24 bits. Hashing this small ammount of data
> will only complicate things.

Maybe, but a hash it gives the fastest retreival of the record.

Jan-Erik

--
http://www.piclist.com hint: The list server can filter out subtopics
(like ads or off topics) for you. See http://www.piclist.com/#topics

2004\04\22@212021 by Jake Anderson

flavicon
face
it sounds like you will have your information loaded every once in a while
from a PC
why not have your pc upload a sorted list then binary search the mmc.

{Original Message removed}

2004\04\23@014012 by Ben Hencke

picon face
--- Jan-Erik Soderholm <TakeThisOuTjan-erik.soderholmEraseMEspamspam_OUTTELIA.COM> wrote:
> Ben Hencke wrote :
>
> > I would really reccomend that you stay away from hashing at all.
> Never
> > hash a serial number that already fits into a reasonable ammount of
> > space. Hashing is really only useful when trying to take a
> > large key of data (ie a string, file, whole record, etc) and make
> it into a
> > new key that can be easily sorted/indexed/etc.
>
> A hash index isn't ment to keep records "sorted" at all.

A "hash" can be used for all sorts of things including as keys in a
sorted array when it is impractical to sort the original key itself.

A hash table is sorted with the exception of the collisions. You use a
hashing function to reduce the data into a smaller key or index and
store the data (or pointers to the data) in a simple table or array.
Each element is sorted by its index/hash by virtue of being in an
array. You can access any record in one i/o because with its hash value
you know where in the array it is.

See
ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/hash_tables.html
It has a nice animation too :)

IIRC the max simultaneous number of records required was 100,000 or so.
A minimum hash table that could handle the required number of records
would need a 17bit hash value. Thats still something like a 70% overlap
with a perfect hash function.

Most hash tables are an array of pointers to records, but for
optimization sake, I would group the whole record into the array
instead.

The OP won't be able to use a flat or rehash collision handling method
for a hash table because every deletion would require a rehash of the
table. You can optimize the rehash a little with some flags, but it
will still be a nightmare wrapped in a bad dream. So linked lists or
some other method would have to be used for each record that had a
collision further adding to the layers of complexity and storage
requirements.


{Quote hidden}

Agreed :-)

> And the size of the key has not much to do with this decision.
>

I disagree, the size of the key has a lot to do with picking the
algorithm to use. If the key was 8 bits, no one would be arguing about
what method to use ;-) If the key was 10k of data (maybe a data file or
sound sample), you would want to hash that down to something
reasonable.

> > A card # that is 1 to 10 milion
> > already fits nicely into 24 bits. Hashing this small ammount of
> data
> > will only complicate things.
>
> Maybe, but a hash it gives the fastest retreival of the record.
>

A flat array with an element for each of the 10 million cards would
give the fastest access as even hash tables need to deal with
collisions. A hash table that was sized to fit would have horrible
performance when it approaches capacity. You would need significant
excess table size to get decent performance while running at capacity.

I don't think that fast access times are required. The OP is using this
for a vending machine so taking a second to look it up is probably
acceptable.

It all really depends on the application. I favor the sorted linked
list because it is simple to implement and very storage efficient. In
this application it is also easy to speed up with a few tweaks. I
wouldn't use it for anything that required near real-time lookup.

ttyl,
 Ben Hencke




__________________________________
Do you Yahoo!?
Yahoo! Photos: High-quality 4x6 digital prints for 25"
http://photos.yahoo.com/ph/print_splash

--
http://www.piclist.com hint: The PICList is archived three different
ways.  See http://www.piclist.com/#archives for details.

2004\04\26@214432 by Kresho

flavicon
face
G'day list.

Thanks for everyone's ideas. The info on hashing and b-tree's has been
enlightening!

I have finally decided on a scheme which has taken on some of the points
raised in the thread but uses a practical approach which i think will work,
is scalable in some sort of way and should be easy to implement.

There are up to 10million different cards, but it is nigh impossible that
that many cards would be used within any realistic span of time (months).

Using a 128MB MMC:

I'm going to dedicate 40million bytes of memory to a list of 10million 32bit
links for each card. This will allow the fastest possible time to access a
link as it's just a flat, linear array.

That leaves 88MB (or thereabouts anyway) for data. This data space will be
divided into appropriately sized data chunks. For the purpose of testing
this idea, i'm assuming each chunk of data is 64 bytes: this can hold the
card number (4 bytes) and up to 60 bytes of other info like the time a token
is set, the time it is retrieved, etc. This also allows for multiple tokens
or other info if that need ever arises. 88MB / 64bytes = 1.375 million
entries at any given time. I can live with that.

When a card is presented, the link table is looked up. If it's empty, a new
link is created to the next location in the data space and that data space
is written with appropriate data. If it's already got a valid link, that
link is updated to the next location in the data space and the old data
(from the old link) + new additions / modifications are copied to the data
space. When data space is full it just rolls over and starts again.

So, this should work quite quickly, the worst case being reading an old
link, creating a new link, reading the old data and copying it to the new
data area. It should also provide even wear of the data space so that write
endurance becomes a non-issue. And it should need no other maintenance.

Does that sound like a plausible idea?

Rgds,

Kresho.

--
http://www.piclist.com hint: PICList Posts must start with ONE topic:
[PIC]:,[SX]:,[AVR]: ->uP ONLY! [EE]:,[OT]: ->Other [BUY]:,[AD]: ->Ads

2004\04\27@032643 by hael Rigby-Jones

picon face
{Quote hidden}

How do you know where the next 64byte chunk of free space is located? If you
have an index pointer, it's going to get a lot of writes.

Regards

Mike




=======================================================================
This e-mail is intended for the person it is addressed to only. The
information contained in it may be confidential and/or protected by
law. If you are not the intended recipient of this message, you must
not make any use of this information, or copy or show it to any
person. Please contact us immediately to tell us that you have
received this e-mail, and return the original to us. Any use,
forwarding, printing or copying of this message is strictly prohibited.
No part of this message can be considered a request for goods or
services.
=======================================================================
Any questions about Bookham's E-Mail service should be directed to
EraseMEpostmasterspambookham.com.

--
http://www.piclist.com hint: The list server can filter out subtopics
(like ads or off topics) for you. See http://www.piclist.com/#topics

2004\04\27@041011 by Alan B. Pearce

face picon face
>How do you know where the next 64byte chunk of free space is located?
>If you have an index pointer, it's going to get a lot of writes.

The trick here is to make use of a sneaky feature of MMC cards. they come
formatted as a FAT file system, or can be easily formatted as such by any
USB reader that handles them. You can then use the FAT block number as part
of the key pointer to find a particular field you want to reference. It does
involve putting FAT file handling code into the PIC, but that has been done
before.

--
http://www.piclist.com hint: The list server can filter out subtopics
(like ads or off topics) for you. See http://www.piclist.com/#topics

More... (looser matching)
- Last day of these posts
- In 2004 , 2005 only
- Today
- New search...