CS6320:  SW Engineering of Web Based Systems

 

GAE Datastore -- Indexes

 

How getting information from the GAE datastore differs from a traditional DB

  • Relational database = riffles through the original records and performs calculations at query time to determine the answer.
  • GAE datastore = simply finds the answer in a list of possible answers prepared in advance.
    • Because every query needs only a simple scan of an index, the application gets results back quickly.

 

Wow --- how can GAE know all the answers in advance??

  • App Engine can do this because it knows which questions are going to be asked --- you tell it through the creation of indexes.

  • NOTE: some relational databases can be told to maintain a limited set of indexes to speed up some kinds of queries.
  • App Engine is different: it maintains an index for every query the application is going to perform.   

  • for large amounts of data, App Engine can spread the data and the indexes across many machines, and get results back from all of them without an expensive aggregate operation.

Index Strategy --- fast, but more limited than a Relational DB

  • Datastore's built-in query engine is weak compared to some relational databases, and is not suited to sophisticated
    data processing applications that would prefer slow but powerful runtime queries to fast simple ones.
  • HOWEVER --- web applications need fast results, and a relational DB that computes answers at query time can struggle to deliver them.

    App Engine uses a model suited to scalable web applications: calculate the answers to known questions when the data
    is written, so reading is fast.

 

 

 

Index -- what is it???

A "table" of answers corresponding to a specific query, from which you can get entities very quickly.

For every query an application performs, App Engine maintains an index, a single table
of possible answers for the query.

 

 

Example of an Index


Consider the following simple query:
SELECT * FROM Player WHERE name = '********' (you fill in value for ******)


GAE uses an index

  • has keys of every Player entity

  • has value of each entity's name property, sorted by the name property values in ascending order.

Example of the index

INDEX

entity kind = Player

key = Player / id

properties = "name" property values

Filters:

      sort = sorted by name in ascending order

     conditional = WHERE name = 'druidjane'

 

 

 

GAE's retrieval from an Index

Option 1 = when the index is sorted

Using the example index above, suppose the query is SELECT * FROM Player WHERE name = 'druidjane'. GAE:

  • finds the first row in the index that matches

  • scans down to the first row that doesn't match.
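The two steps above can be sketched in plain Java (no App Engine classes). This is only an illustration: NameIndexScan, Row and scanEquals are hypothetical names, and the real datastore performs this scan internally on its own storage.

```java
import java.util.ArrayList;
import java.util.List;

class NameIndexScan {
    // Hypothetical model of one index row: the indexed name plus the entity's key id.
    static final class Row {
        final String name;
        final long playerId;
        Row(String name, long playerId) { this.name = name; this.playerId = playerId; }
    }

    // rows must already be sorted by name in ascending order, as the index is.
    static List<Long> scanEquals(List<Row> rows, String wanted) {
        // Step 1: binary search for the first row whose name is >= wanted.
        int lo = 0, hi = rows.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (rows.get(mid).name.compareTo(wanted) < 0) lo = mid + 1;
            else hi = mid;
        }
        // Step 2: scan down until the first row that does not match.
        List<Long> keys = new ArrayList<>();
        for (int i = lo; i < rows.size() && rows.get(i).name.equals(wanted); i++) {
            keys.add(rows.get(i).playerId);
        }
        return keys;
    }
}
```

The work done depends on the number of matching rows, not on the total number of Player entities, which is why the scan stays fast as data grows.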

 

GAE distributes an Index (horizontally) across different machines -- how retrieval really works

  • Preparation: Entities and indexes are distributed across multiple machines

  • STEP 1: each machine scans its own index in parallel with the others.

  • STEP 2: each machine returns results to App Engine

  • STEP 3: App Engine delivers the final result set to the app, in order, as if all of the results were in one large index.
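STEP 3 can be pictured as a merge of already-sorted lists. The sketch below is plain Java and assumes STEPS 1 and 2 have already produced one sorted result list per machine; ShardedIndexMerge and mergeSorted are hypothetical names, not App Engine APIs, and the real merge strategy is internal to the datastore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class ShardedIndexMerge {
    // Each inner list holds one machine's results, already sorted ascending.
    static List<String> mergeSorted(List<List<String>> shardResults) {
        // Queue entries are {shardIndex, positionInShard}, ordered by the value they point at.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            (a, b) -> shardResults.get(a[0]).get(a[1])
                          .compareTo(shardResults.get(b[0]).get(b[1])));
        for (int s = 0; s < shardResults.size(); s++) {
            if (!shardResults.get(s).isEmpty()) heads.add(new int[] {s, 0});
        }
        List<String> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();                 // smallest remaining value
            List<String> shard = shardResults.get(head[0]);
            merged.add(shard.get(head[1]));
            if (head[1] + 1 < shard.size()) {          // advance within that shard
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }
}
```

The app sees one ordered list and never needs to know how many machines contributed to it.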

 

GAE updates every index that includes an entity whenever you alter that entity

  • App Engine updates all relevant indexes when property values change.

  • Example

    • App retrieves a Player entity, changes the name, then saves the entity with
      a call to the put() method,

    • App Engine updates the appropriate row in the previous index.

    • It also moves the row if necessary so the ordering of the index is preserved.

    • call to put() does not return until all appropriate indexes are updated.
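A toy model of this behaviour, in plain Java with no App Engine classes. ToyPlayerStore, put and idsInIndexOrder are hypothetical names used only for illustration; the real datastore maintains its indexes internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class ToyPlayerStore {
    private static final class Row {
        final String name;
        final long id;
        Row(String name, long id) { this.name = name; this.id = id; }
    }

    // Stored entities: id -> current name.
    private final Map<Long, String> names = new HashMap<>();
    // The "name" index: rows kept sorted by name, then by id.
    private final TreeSet<Row> nameIndex = new TreeSet<>(
        Comparator.comparing((Row r) -> r.name).thenComparingLong(r -> r.id));

    // Saves the entity and updates the index before returning.
    void put(long id, String name) {
        String old = names.put(id, name);
        if (old != null) nameIndex.remove(new Row(old, id)); // drop the stale row
        nameIndex.add(new Row(name, id));                    // new row lands in sorted position
    }

    List<Long> idsInIndexOrder() {
        List<Long> ids = new ArrayList<>();
        for (Row r : nameIndex) ids.add(r.id);
        return ids;
    }
}
```

Renaming a player changes where its row sits in the index, which is the "moves the row" step described above.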

Default Indexes -- already exist for each Kind in your Project

  • GAE maintains 2 indexes for every property name in an entity kind

    • Index 1 = property values sorted in ascending order

    • Index 2 = property values sorted in descending order.

  • Index 3 = index of the entities of each kind.

  • SO each Kind has 1 (Index 3) + 2 * Number_Of_Indexed_Properties_In_Kind (Index 1 and Index 2) default indexes. Example: a Kind with 3 indexed properties has 1 + 2 * 3 = 7 default indexes.

 

What kinds of Queries can you do with these Default Indexes?

    • Index 3 -> A simple query for all entities of a given kind, no filters or sort orders
    • Index 1 or 2 -> One filter on a property using the equality (=) operator
    • Index 1 or 2 -> Filters using greater-than or less-than operators (>, >=, <, <=) on a single property
    • Index 1 or 2 -> One sort order, ascending or descending, and no filters, or with filters only on the
      same property used with the sort order
    • Index 3 -> Filters or a sort order on the entity key (the index of entities of a kind is ordered by key)
    • Index 3-> Kindless queries with or without key filters
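The list above boils down to "everything in the query touches at most one property". A deliberately simplified sketch of that rule in plain Java follows; DefaultIndexCheck and coveredByDefaultIndexes are hypothetical names, and the check encodes only the cases listed in these notes, not every rule the real query planner applies.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DefaultIndexCheck {
    // filterProps: property names used in filters; sortProps: property names used in sort orders.
    // Returns true when the query matches one of the default-index cases listed in the notes.
    static boolean coveredByDefaultIndexes(Set<String> filterProps, List<String> sortProps) {
        if (sortProps.size() > 1) return false;      // more than one sort order
        Set<String> touched = new HashSet<>(filterProps);
        touched.addAll(sortProps);
        return touched.size() <= 1;                  // zero or one property involved in total
    }
}
```

Under this simplification, a query filtering on both lastName and height is not covered and needs a custom index.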

 

Other Indexes you need to create -- for any query the Default Indexes cannot answer

 

  • You must tell GAE which other indexes to provide for queries that do not fall under the default indexes (Index 1, 2 or 3), for example a query that filters on one property and filters or sorts on another.

  • Use a configuration file

    • Java = WEB-INF/datastore-indexes.xml

    • Python = index.yaml

The Eclipse plugin for GAE will autogenerate the above configuration file for you when you make a "non-default" query.

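Suppose we have the following code to form a query

                     Below implements: select * from Person where lastName = lastNameParam AND height < maxHeightParam

import com.google.appengine.api.datastore.Query;

// lastNameParam and maxHeightParam are values supplied by the caller.
Query q = new Query("Person");
q.addFilter("lastName", Query.FilterOperator.EQUAL, lastNameParam);     // equality filter
q.addFilter("height", Query.FilterOperator.LESS_THAN, maxHeightParam);  // inequality filter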


Indexes file

This query requires a custom index to be specified in your application's
war/WEB-INF/datastore-indexes.xml file.

 


When you run your application in the SDK, it will automatically add an entry to this file.

When you upload your application, the custom index definition will be automatically uploaded, too.

The entry for this query will look like:

<?xml version="1.0" encoding="utf-8"?>
<datastore-indexes>
    <datastore-index kind="Person" ancestor="false">
        <property name="lastName" direction="asc" />
        <property name="height" direction="asc" />
    </datastore-index>
</datastore-indexes>

You can read all about Datastore indexes in the Introduction to Indexes section.
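Why does the index list lastName first and height second? A sketch in plain Java (no App Engine classes) of how rows sorted that way place all matches for "lastName = X AND height < H" in one contiguous run. PersonCompositeIndex, Row and query are hypothetical names; the linear walk to the start of the run stands in for the seek a real index would do.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PersonCompositeIndex {
    static final class Row {
        final String lastName;
        final long height;
        final long id;
        Row(String lastName, long height, long id) {
            this.lastName = lastName; this.height = height; this.id = id;
        }
    }

    // Same ordering as the index definition: lastName ascending, then height ascending.
    static final Comparator<Row> ORDER =
        Comparator.comparing((Row r) -> r.lastName).thenComparingLong(r -> r.height);

    static List<Long> query(List<Row> rows, String lastName, long maxHeight) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(ORDER);
        List<Long> ids = new ArrayList<>();
        boolean inRun = false;
        for (Row r : sorted) {
            boolean match = r.lastName.equals(lastName) && r.height < maxHeight;
            if (match) {
                ids.add(r.id);
                inRun = true;
            } else if (inRun) {
                break; // first non-matching row after the run: nothing later can match
            }
        }
        return ids;
    }
}
```

If the index were sorted by height first, the rows for one lastName would be scattered and a single scan could not answer the query.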

 

 

If Entity A does not have a PARTICULAR property set, while OTHER Entities of the same KIND C do ---> Entity A will not be in any Index on that property for Kind C, so queries that filter or sort on that property never return Entity A.
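A small sketch of this rule in plain Java, with entities modelled as property maps. PropertyIndexBuilder and keysInIndex are hypothetical names for illustration only; the sketch assumes a property explicitly set to null still counts as set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PropertyIndexBuilder {
    // entities: entity key -> (property name -> value).
    // Returns the keys that would get a row in the index on the given property.
    static List<String> keysInIndex(Map<String, Map<String, Object>> entities, String property) {
        // TreeMap only makes the output order deterministic (sorted by entity key).
        TreeMap<String, Map<String, Object>> byKey = new TreeMap<>(entities);
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : byKey.entrySet()) {
            // An entity that never set the property gets no index row at all.
            // Assumption: a property explicitly set to null still counts as set.
            if (e.getValue().containsKey(property)) keys.add(e.getKey());
        }
        return keys;
    }
}
```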

 

 

Stipulating that a property should NOT BE INDEXED

  • WHY: This saves space in index tables, and reduces the amount of time it takes to
    save the entity.

  • JAVA: use the setUnindexedProperty() method of the Entity object, instead of the setProperty()
    method.

 

How does an Index do ORDERING when a property can store different data types (values are not forced to be the same type)?

  • Each type of property value has its own rules for comparing two values
    of the same type, and these rules are mostly intuitive: integers are sorted in numeric
    order, strings in Unicode order, and so forth.

  • Two entities can have values of different types for the same property, so App Engine
    also has rules for comparing such values, though these rules are not so intuitive.

  • RULE 1: Values are ordered first by type, then within their type.

    • For instance, all integers sort before all strings in ascending order.

    • One effect of this that might be surprising is that all floats sort after all integers, so the float 1.5 comes after the integer 2.

    • POTENTIAL PROBLEM: datastore treats floats and integers as separate value types, and so sorts them separately.

      • SOLUTION If your app relies on the correct ordering of numbers, make sure all numbers
        are stored using the same type of value.

  • RULE 2: Every value is stored as one of 8 distinct types, and RULE 1 orders by those types

      • The datastore supports several additional types (beyond the 8 core types such as int and float) by storing them as one
        of the eight types, then automatically marshaling them between the internal representation and the
        value your app sees.

      • Example = a date-time value is actually stored as an integer, and will be sorted amongst other integer values in an index.

      • Datastore value types, listed in sort order from first to last
        (columns: Datastore type | Java | Python | Order within the type)

        Null | null | None | -
        Integer and date-time | long (other integer types are widened), java.util.Date, datastore.Rating | long, datetime.datetime, db.Rating | Numeric (date-time is chronological)
        Boolean | boolean (true or false) | bool (True or False) | False, then true
        Byte string | datastore.ShortBlob | db.ByteString | Byte order
        Unicode string | java.lang.String, datastore.Category, datastore.Email, datastore.IMHandle, datastore.Link, datastore.PhoneNumber, datastore.PostalAddress | unicode, db.Category, db.Email, db.IM, db.Link, db.PhoneNumber, db.PostalAddress | Unicode order
        Floating-point number | double | float | Numeric
        Geographical point | datastore.GeoPt | db.GeoPt | By latitude, then longitude (floating-point numbers)
        A Google account | users.User | users.User | By email address, Unicode order
        Entity key | datastore.Key | db.Key | Kind (byte string), then ID (numeric) or name (byte string)
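RULE 1 can be sketched as a comparator that ranks by type first and compares values only within a type. This is plain Java modelling just five of the types; MixedTypeOrder, typeRank and sorted are hypothetical names, and String.compareTo is only an approximation of the datastore's Unicode ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MixedTypeOrder {
    // Lower rank sorts first, following the order of the type table above.
    static int typeRank(Object v) {
        if (v == null) return 0;               // Null
        if (v instanceof Long) return 1;       // Integer (and date-time)
        if (v instanceof Boolean) return 2;    // Boolean
        if (v instanceof String) return 3;     // Unicode string
        if (v instanceof Double) return 4;     // Floating-point number
        throw new IllegalArgumentException("type not modelled in this sketch: " + v.getClass());
    }

    static final Comparator<Object> ORDER = (a, b) -> {
        int byType = Integer.compare(typeRank(a), typeRank(b));
        if (byType != 0) return byType;        // different types: the type decides
        if (a == null) return 0;               // both null
        @SuppressWarnings("unchecked")
        Comparable<Object> ca = (Comparable<Object>) a;
        return ca.compareTo(b);                // same type: natural ordering of that type
    };

    static List<Object> sorted(List<Object> values) {
        List<Object> copy = new ArrayList<>(values);
        copy.sort(ORDER);
        return copy;
    }
}
```

Sorting the values 1.5, "abc", 2, true, 10 with this comparator puts 2 and 10 first and 1.5 last, which is the surprising float-after-integer effect described in RULE 1.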

 

 

What happens in GAE if a Query is made that needs a non-existent Index?

The query fails. In Java, the datastore throws a DatastoreNeedIndexException.

© Lynne Grewe