CS6320:  SW Engineering of Web Based Systems

 

GAE Datastore

 

  • NOSQL distributed data storage as a service

  • Scalability

    • writes scale by automatically distributing data as necessary.
    • reads scale because the only queries supported are those whose performance scales with the size of the result set (as opposed to the data set). This means that a query whose result set contains 100 entities performs the same whether it searches over a hundred entities or a million. This property is the key reason some types of queries are not supported.
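A rough way to picture why such queries scale with the result set: if a query is answered by a scan over a sorted index, the work done depends on how many index rows match, not on how many rows exist. The sketch below is a toy model in plain Java, not the App Engine API; the class PriceIndex and its methods are made-up names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model (NOT the App Engine API): an index kept sorted by property value.
class PriceIndex {
    // price -> ids of the entities that have that price, in sorted price order
    private final NavigableMap<Integer, List<String>> index = new TreeMap<>();

    void add(String entityId, int price) {
        index.computeIfAbsent(price, p -> new ArrayList<>()).add(entityId);
    }

    // Range query: seek to the first matching index row, then read only matching rows.
    // Rows outside [low, high] are never visited, however many of them exist.
    List<String> idsWithPriceBetween(int low, int high) {
        List<String> result = new ArrayList<>();
        for (List<String> ids : index.subMap(low, true, high, true).values()) {
            result.addAll(ids);
        }
        return result;
    }
}
```

A query that cannot be answered by one contiguous scan like this is the kind of query the datastore tends not to support.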


    Balance of strong and eventual consistency.

  • STRONGLY CONSISTENT = entity lookups by key and ancestor queries always receive strongly consistent data.
  • EVENTUALLY CONSISTENT = all other queries are eventually consistent. Together, the two consistency models allow your application to deliver a great user experience while handling large amounts of data and users.
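One simplified way to think about the difference: a lookup by key reads the stored entity directly, while a general query is answered from an index that may be brought up to date slightly later. The sketch below is a toy model in plain Java under that assumption; it is not the App Engine API, it only handles newly added entities, and ToyStore and applyPendingIndexUpdates are made-up names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (NOT the App Engine API) of the two consistency levels.
class ToyStore {
    private final Map<String, String> colorByKey = new HashMap<>();        // primary storage
    private final Map<String, List<String>> keysByColor = new HashMap<>(); // query index
    private final List<String> pendingIndexUpdates = new ArrayList<>();

    void put(String key, String color) {
        colorByKey.put(key, color);   // visible to key lookups immediately
        pendingIndexUpdates.add(key); // the index catches up later
    }

    // Lookup by key: always sees the latest write (strongly consistent).
    String getByKey(String key) {
        return colorByKey.get(key);
    }

    // Query by property: answered from the index, which may lag behind (eventually consistent).
    List<String> queryByColor(String color) {
        return new ArrayList<>(keysByColor.getOrDefault(color, new ArrayList<>()));
    }

    // Stands in for the background work that eventually updates the index.
    void applyPendingIndexUpdates() {
        for (String key : pendingIndexUpdates) {
            keysByColor.computeIfAbsent(colorByKey.get(key), c -> new ArrayList<>()).add(key);
        }
        pendingIndexUpdates.clear();
    }
}
```

Right after put("dog1", "brown"), getByKey("dog1") already returns the new value, but queryByColor("brown") stays empty until the pending index updates have been applied.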

     

What do you use it for? Any time you have structured data that you want to scale, for example:

    • real-time inventory and product details for a retailer.

    • User profiles that deliver a customized experience based on the user’s past activities and preferences.

    • Transactions based on ACID properties, for example, transferring funds from one bank account to another.

    • AND MUCH MORE

 

 

data store

Let's learn...

datastore = GAE's "database"

    • not the traditional relational databases --> yields greater scalability
    • more like object database

datastore entity = has one or more (name, value) pairs

    values are primitive data types

    each entity is of a named kind

 

 

 

 

 

Concepts of an Object based GAE datastore

  • Entity = Object (loosely think of this as a row in a relational database--a "data entry")



  • Properties= these store the data of an Entity (loosely think of this in a relational database as the field values in a data entry)
    • COMPONENTS OF A PROPERTY:
      • name
      • value(s) - a property may have more than one value (think about Entity=Dog, Property=color (values=white and brown))
        • each value is one of many data-types like string, an integer, a date-time, or a null value.
        • NOTE: the values do not have to be of the same type. This is a departure from the concept of a field in a relational database, whose value is singular and always of the same type.
                  

  • Key = each entity has a key that uniquely identifies it across entire system
    • COMPONENTS OF A KEY:

      • application ID = this makes sure nothing else about the key can collide with the entities of any other application.
        • It also ensures that no other app can access your app's data, and that your app cannot access data for other apps.
        • You won't see the app ID mentioned in the datastore API; this is automatic.

      • kind = An entity's kind categorizes the entity for the purposes of queries, and for ensuring the uniqueness of the rest of the key.
        • example: a shopping cart application represents each customer order with an entity of the kind "Order."
        • You specify the kind when you create the entity.
        • This is somewhat different from the relational database concept of a table, but that is the closest match.

      • entity ID = This can be an arbitrary string specified by the app or it can be generated automatically by the datastore.
              CREATED (in only one of the following ways):
        • an entity ID given by the app called key name, will be a string
        • an entity ID generated by the datastore called an ID, will be an integer
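To make the concepts above concrete, here is a toy model in plain Java of a key (application ID + kind + entity ID) and an entity whose properties can hold several values of different types. This is only a sketch for thinking about the structure; ToyKey and ToyEntity are made-up names and are not the classes of the App Engine API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (NOT the App Engine API) of a key: application ID + kind + entity ID.
final class ToyKey {
    final String appId; // in real GAE this is filled in automatically
    final String kind;
    final String name;  // set when the app chooses the entity ID (a "key name", a string)
    final Long id;      // set when the datastore generates the entity ID (an integer)

    private ToyKey(String appId, String kind, String name, Long id) {
        this.appId = appId;
        this.kind = kind;
        this.name = name;
        this.id = id;
    }

    // Exactly one of the two factory methods is used, so only one of name/id is ever set.
    static ToyKey withName(String appId, String kind, String name) {
        return new ToyKey(appId, kind, name, null);
    }

    static ToyKey withId(String appId, String kind, long id) {
        return new ToyKey(appId, kind, null, id);
    }

    @Override
    public String toString() {
        return appId + "/" + kind + "/" + (name != null ? "name=" + name : "id=" + id);
    }
}

// Toy model of an entity: a key plus named properties, each with one or more values.
class ToyEntity {
    final ToyKey key; // final: a key cannot be changed once set
    private final Map<String, List<Object>> properties = new LinkedHashMap<>();

    ToyEntity(ToyKey key) {
        this.key = key;
    }

    // Values of one property may be of different types.
    void setProperty(String name, Object... values) {
        properties.put(name, new ArrayList<>(Arrays.asList(values)));
    }

    List<Object> getProperty(String name) {
        return properties.get(name);
    }
}
```

For the Dog example above, dog.setProperty("color", "white", "brown") stores two values under one property name.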

Comparing GAE Datastore to Relational Database --Caution from your Book.

It's tempting to compare these concepts with similar concepts in relational
databases: kinds are tables; entities are rows; properties are fields
or columns. That's a useful comparison, but watch out for differences.

Unlike a table in a relational database, there is no relationship between
an entity's kind and its properties. Two entities of the same kind can
have different properties set or not set, and can each have a property of
the same name but with values of different types.
You can (and often
will) enforce a data schema in your own code, and App Engine includes
libraries to make this easy, but this is not required by the datastore.
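
As a rough illustration of enforcing a schema in your own code, the sketch below checks a set of properties against required names and types before the application would store them. It is plain Java and does not use the App Engine libraries mentioned above; BookSchema and its rules are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model (NOT an App Engine library): schema checks done by the application itself.
class BookSchema {
    // Hypothetical schema for kind "Book": property name -> required Java type.
    private static final Map<String, Class<?>> REQUIRED = new LinkedHashMap<>();
    static {
        REQUIRED.put("title", String.class);
        REQUIRED.put("copyrightYear", Integer.class);
    }

    // Throws if a required property is missing or has the wrong type.
    // The datastore itself would accept either entity; only this code objects.
    static void validate(Map<String, Object> properties) {
        for (Map.Entry<String, Class<?>> rule : REQUIRED.entrySet()) {
            Object value = properties.get(rule.getKey());
            if (!rule.getValue().isInstance(value)) {
                throw new IllegalArgumentException(
                    "Property '" + rule.getKey() + "' must be a "
                        + rule.getValue().getSimpleName());
            }
        }
    }
}
```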

Also unlike relational databases, keys are not properties. You can perform queries on key names just like properties, but you cannot change
a key name after the entity has been created.


And of course, a relational database cannot store multiple values in a single cell, while an App Engine property can have multiple values.

 

Some things about keys

  • Keys can not be changed once set

  • To recreate the concept of a foreign key in a relational database, in GAE we can store the key of another entity B inside an entity A. The stored key is then entity A's reference to entity B.
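
The reference idea can be pictured with a toy model in plain Java: entity A holds the key of entity B as an ordinary property value, and the application follows it with a second lookup. This is only a sketch, not the App Engine API; ReferenceDemo and the key strings used with it are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (NOT the App Engine API): entities stored by key, one referencing another.
class ReferenceDemo {
    // key -> (property name -> value)
    static final Map<String, Map<String, Object>> store = new HashMap<>();

    static void put(String key, Map<String, Object> properties) {
        store.put(key, properties);
    }

    // Read the key stored in 'referenceProperty' of entity A, then look up entity B by that key.
    static Map<String, Object> follow(String keyOfA, String referenceProperty) {
        Object keyOfB = store.get(keyOfA).get(referenceProperty);
        return store.get(keyOfB);
    }
}
```

Note that nothing checks that the referenced entity exists; unlike a relational foreign key constraint, keeping references valid is the application's job in this model.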

 


GAE: low-level package to access datastore

com.google.appengine.api.datastore.* package

access

 

 

 

 

 

 

 

 

STEPS

STEP 1: Create instance of DataStore (ds) for this application      
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

STEP 2: Create instance of Entity called book     

Entity book = new Entity("Book");

STEP 3: Set various properties of book Entity with name, value pairs   

  book.setProperty(name, value);

STEP 4: Add Entity instance book to the DataStore ds

ds.put(book);

 

 

Example Code

Showing code in Java here (can also do in Python)

NOTICE: the application code sets up the entity. No structure is set up in the datastore beforehand.

 import java.io.IOException;
 import java.util.Calendar;
 import java.util.Date;
 import java.util.GregorianCalendar;
 import javax.servlet.http.HttpServlet;
 import javax.servlet.http.HttpServletRequest;
 import javax.servlet.http.HttpServletResponse;
 import com.google.appengine.api.datastore.DatastoreService;
 import com.google.appengine.api.datastore.DatastoreServiceFactory;
 import com.google.appengine.api.datastore.Entity;


 // ...

 // STEP 1: Create instance of DatastoreService for this application
 DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

 // STEP 2: Create instance of Entity (of kind "Book") called book
 Entity book = new Entity("Book");

 // STEP 3: Set various properties of book Entity with name, value pairs
 book.setProperty("title", "The Grapes of Wrath");
 book.setProperty("author", "John Steinbeck");
 book.setProperty("copyrightYear", 1939);

 // STEP 3 (continued): Create instance of java.util.Date and set it as a property associated with name "authorBirthdate"
 Date authorBirthdate = new GregorianCalendar(1902, Calendar.FEBRUARY, 27).getTime();
 book.setProperty("authorBirthdate", authorBirthdate);

 // STEP 4: Add Entity instance book to the DataStore ds
 ds.put(book);
// ...

 


GAE: higher-level language specific packages to access datastore

higher level language access

    • Python Datastore API
    • Java - Java Persistence API and Java Data Objects -- Using this over the low-level GAE API may be better as it makes your Java based web app more portable to other servers/platforms beyond GAE

 

 


Monitor your Datastore usage and Datastore Entity statistics on GAE app

watch out $$$$$   

You can see your datastore statistics by clicking Datastore Statistics in the admin module

 

For this example application (see article) you see that the data store entity "SSItemMark" takes up 97% of the datastore size

and there are 306,569 of them (this can get $$$costly)

datastore statistics

 

 

Why Object Database --- Google's BigTable

  • BigTable, the "data objects in the cloud" technology which undergirds Google's massive applications, has the magic property of being essentially infinitely scalable with respect to the amount of data, and the amount of transaction activity. It is essentially the horizontal "partitioning" or "sharding" of data taken to the extreme.



  • Many web application developers use relational databases without leveraging most of the relational features. They are paying a big cost for relational functionality without real need. Google App Engine acknowledges this fact, and provides the true object interface that most application developers are using anyway.

  • Google App Engine forces you to be explicit about data indexes, but this is something you have to do anyway when horizontally scaling traditional databases. In traditional web application architectures, scalability almost always involves partitioning data among several database instances. The moment you partition data among multiple SQL stores, you have to think about indexes, because searches across those stores require you to perform some sort of scatter/gather (in functional programming & Google parlance, "MapReduce"), and if you aren't careful about how you build indexes, you'll end up with incredible inefficiencies (like having to merge large data sets from multiple stores in memory).

 

 

What is horizontal partitioning -> DB entries (rows) distributed across several partitions

Example horizontal partitioning by region closeness --- here into 2 partitions (shards)

horizontal partitioning

 

How does Google do this? It uses techniques from MapReduce to do the splitting. When you need to get data from the database, it must go out to the different partitions (shards), run the query on each, and then combine the results. THIS is not simple, and it's a great feature that Google gives you its infrastructure to do this!
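
The routing and scatter/gather steps described above can be sketched with a toy model in plain Java. It partitions rows by region and answers a query by asking every shard and merging the partial results. This illustrates the general idea only and is not how Google implements it; ShardedCustomers and its methods are made-up names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of horizontal partitioning (NOT Google's implementation).
class ShardedCustomers {
    // One list of customer names per region; each list stands in for a separate database.
    private final Map<String, List<String>> shards = new LinkedHashMap<>();

    ShardedCustomers(String... regions) {
        for (String region : regions) {
            shards.put(region, new ArrayList<>());
        }
    }

    // Routing: the region decides which shard stores the row.
    void insert(String region, String customerName) {
        shards.get(region).add(customerName);
    }

    // Scatter/gather: run the filter on every shard, then merge and sort the partial results.
    List<String> namesStartingWith(String prefix) {
        List<String> merged = new ArrayList<>();
        for (List<String> shard : shards.values()) {
            for (String name : shard) {
                if (name.startsWith(prefix)) {
                    merged.add(name);
                }
            }
        }
        Collections.sort(merged);
        return merged;
    }
}
```

Even in this tiny model the merge step happens in memory on the caller's side, which is the cost the notes above warn about when indexes are not planned carefully.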

 

© Lynne Grewe