Data storage

From: Mikee 28 Oct 2010 14:35
To: ALL 1 of 12
I'm making an application that does the following:

1) Downloads a list of processes and memory offsets
2) Monitors for those processes. If it finds any, it'll start monitoring those memory offsets and sending the data over to a node.js web server, which'll in turn store the data in a mysql database.

It's for monitoring statistics in MMORPGs. For example, the initial packet from the web server might be something like this (in JSON for simplicity):

javascript code:
 
[
  {
    Process: 'lotro',
    Memory : {
      PvP_Ranking : {
        Interval: 30000,
        Offsets: [ 0x4592, 0x254, 0xCE ]
      },
      PvP_Rating : {
        Interval: 30000,
        Offsets: [ 0x412C, 0x92, 0xF2 ]
      }
    }
  },
  {
    Process: 'wow',
    Memory : {
      PvP_Ranking : {
        Interval: 30000,
        Offsets: [ 0x4412, 0x524, 0xCF ]
      },
      PvP_Rating : {
        Interval: 30000,
        Offsets: [ 0x41C, 0x192, 0xF2B, 0x11F ]
      }
    }
  }
]
 


Every [interval], it'll send that data back up to the node.js web server.
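
So every poll the client would POST something back roughly like this - the exact shape isn't nailed down yet, just to illustrate:

javascript code:

{
  UserId: 1,
  Character: 'Mikee',
  Process: 'lotro',
  Key: 'PvP_Ranking',
  Value: 1000,
  TimeStamp: 23123113
}
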

The website will then let you browse your stats over time and generate graphs and make predictions and whatnot.

I'm trying to work out how to best store the data in the database.

My initial thought was just to have a single table like this:

UserId    | ProcessName     | Character_Name  | TimeStamp  | Key           | Value
------------------------------------------------------------------------------------
1         | lotro           | Mikee           | 23123113   | PVP_Ranking   | 1000
2         | wow             | Dave            | 23123114   | XP            | 12334
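
On the node side the insert would then just be something like this (rough sketch - assuming the mysql module for node; the table name and connection details are just placeholders):

javascript code:

// rough sketch - "stats" table name and credentials are placeholders
var mysql = require('mysql');
var db = mysql.createConnection({ user: 'stats', password: 'secret', database: 'stats' });

function storeStat(userId, processName, characterName, key, value) {
  db.query(
    'INSERT INTO stats (UserId, ProcessName, Character_Name, TimeStamp, `Key`, Value) ' +
    'VALUES (?, ?, ?, UNIX_TIMESTAMP(), ?, ?)',
    [userId, processName, characterName, key, value]
  );
}
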


But I'm worried that this table could get very big very quickly.

In theory you'll have maybe 2-3 stats monitored per game. I'm expecting around 40 people using it, each probably playing about 25 hours per week, with update times of around 15 minutes per stat.

3 stats * 40 people * 25 hours * 4 per hour = 12,000 rows per week

This table could probably become a bit of a beast very quickly.

Would there be better ways to store this? A table for each game, maybe? The stats might update at different intervals, though...

Hmmmmmm!... ?
From: af (CAER) 28 Oct 2010 16:06
To: Mikee 2 of 12
One option might be to gradually reduce the resolution of older data - i.e. for data that's a few days old, take, say, 10 consecutive entries and turn them into 9 entries with the values averaged over them. Repeat this until you get down to whatever minimum resolution you want - maybe one averaged value for a whole day once you get to really old stuff. You could have this run as a batch job at a certain time of day or whatever.
From: Matt 28 Oct 2010 16:09
To: Mikee 3 of 12
processes
----------
process_id primary key, auto increment,
name,
version

process_offsets
---------------
process_offset_id primary key, auto increment,
process_id,
name,
offset

characters
----------
character_id primary key, auto increment,
user_id,
name

character_stats
---------------
stat_id primary key, auto increment,
user_id,
character_id,
process_id,
process_offset_id,
recorded_at,
value

That's what I'd do for starters, I think.

The reason for the process and process_offsets tables would be to restrict users to monitoring only the applications you support. Don't want them to be able to monitor any process and use your website to record the data. That would be bad with a capital "oh fuck there is a lawyer knocking at my door"!

Version string on processes is for different versions of the process exe. I'm betting that when patches get released the offsets will very likely change or new ones will get added.
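
Then on the server you'd only record a reading if it maps onto a known process/offset pair - untested, and assuming the mysql module for node:

javascript code:

// untested sketch: check an incoming (process, stat) pair against the
// processes / process_offsets tables before recording anything
function isSupported(db, processName, statName, callback) {
  db.query(
    'SELECT 1 FROM processes p ' +
    'JOIN process_offsets o ON o.process_id = p.process_id ' +
    'WHERE p.name = ? AND o.name = ? LIMIT 1',
    [processName, statName],
    function (err, rows) {
      callback(err, !err && rows.length > 0);
    }
  );
}
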
From: af (CAER) 28 Oct 2010 16:14
To: af (CAER) 4 of 12
To elaborate, say you had this data set of 5 consecutive readings from a few days ago:

3.00 | 6.10 | 4.50 | 3.20 | 7.80

Then you'd reduce this down to

4.55 | 5.30 | 3.85 | 5.50

Note that my maths isn't great and I largely skived off in statistics classes when I did maths A level about 12 years ago, but you get the idea.
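
In code it's just an average over each adjacent pair - rough sketch:

javascript code:

// rough sketch: reduce n readings to n-1 by averaging each adjacent pair
function reduceResolution(values) {
  var reduced = [];
  for (var i = 0; i < values.length - 1; i++) {
    reduced.push((values[i] + values[i + 1]) / 2);
  }
  return reduced;
}

// reduceResolution([3.00, 6.10, 4.50, 3.20, 7.80]) -> [4.55, 5.30, 3.85, 5.50]
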
From: Dan (HERMAND) 28 Oct 2010 16:52
To: Matt 5 of 12
quote:
The reason for the process and process_offsets tables would be to restrict users to monitoring only the applications you support. Don't want them to be able to monitor any process and use your website to record the data. That would be bad with a capital "oh fuck there is a lawyer knocking at my door"!



What? Why?
From: af (CAER) 28 Oct 2010 16:55
To: Dan (HERMAND) 6 of 12
At a guess, they may inadvertently (or deliberately) record something illegal, which could get them and the website's owner/host into trouble.
From: Dan (HERMAND) 28 Oct 2010 17:05
To: af (CAER) 7 of 12

To be honest, I misread and didn't see that the intention was for this to be a hosted service.

That said, I'm still not sure there's a legal issue there - in the same way that Hotmail doesn't have its doors kicked in when paedos use it.

From: Mikee 28 Oct 2010 18:23
To: Matt 8 of 12

Hmm yeah, version is a good point.

Ok, thanks everyone.

Now, anyone know ASM? Having the hardest time trying to track down some of these damn offsets :(

From: THERE IS NO GOD BUT (RENDLE) 2 Nov 2010 10:37
To: Mikee 9 of 12
Does it have to be MySQL? It'd be easier with Mongo or Couch for this sort of thing.
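
e.g. with Mongo you could just push each reading straight into a collection and not worry about the schema up front - roughly (shell syntax):

javascript code:

// rough idea: one document per reading, nothing to migrate when you add a new stat
db.stats.insert({
  user: 1,
  character: 'Mikee',
  process: 'lotro',
  stat: 'PvP_Ranking',
  value: 1000,
  at: new Date()
});
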
From: 99% of gargoyles look like (MR_BASTARD) 2 Nov 2010 11:42
To: THERE IS NO GOD BUT (RENDLE) 10 of 12
The trouble with using Mongo is first you have to defeat Ming the Merciless.
From: patch 2 Nov 2010 12:12
To: 99% of gargoyles look like (MR_BASTARD) 11 of 12
Candygram. Sorts Mongo right out.
From: Mikee 5 Nov 2010 10:35
To: THERE IS NO GOD BUT (RENDLE) 12 of 12
Hmm, yeah, Mongo could work well actually. Doctrine should be able to handle it fine.