GAE Datastore -- Indexes
Difference in getting information from GAE datastore and traditional DB
- Relational database = riffles through the original records and performs calculations to determine the answer
- GAE datastore = simply finds the answer in a list of possible answers prepared in advance.
- need only do a simple scan of an index for every query, so the application gets results back quickly
Wow --- how can GAE know all the answers in advance??
- App Engine can do this because it knows which questions are going to be asked --- you tell it through the creation of indexes.
- NOTE: some relational databases can be told to maintain a limited set of indexes to speed up some kinds of queries.
- App Engine is different: it maintains an index for every query the application is going to perform.
- for large amounts of data, App Engine can spread the data and the indexes across many machines, and get results back from all of them without an expensive aggregate operation.
Index Strategy --- fast but has limits compared to a Relational DB
- Datastore's built-in query engine is weak compared to some relational databases, and is not suited to sophisticated data processing applications that would prefer slow but powerful runtime queries to fast simple ones.
- HOWEVER --- web applications need fast results --- relational DBs have problems with this.
- App Engine uses a model suited to scalable web applications: calculate the answers to known questions when the data is written, so reading is fast.
Index -- what is it???
A "table" of answers corresponding to a specific query, from which you can get entities very quickly.
For every query an application performs, App Engine maintains an index, a single table of possible answers for the query.
Example of an Index
Consider the following simple query:
SELECT * FROM Player WHERE name = '********' (you fill in value for ******)
GAE uses an index that
- has the keys of every Player entity
- has the value of each entity's name property, sorted by the name property values in ascending order.
INDEX
entity = Player
key = Player/id
properties = "name" property values
Filters:
sort = sorted by name in ascending order
conditional = WHERE name = 'druidjane'
GAE's retrieval from an Index
using the above Example Index, say we have the query SELECT * FROM Player WHERE name = 'druidjane'
- finds the first row in the index that matches
- scans down to the first row that doesn't match.
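The two retrieval steps above can be sketched in plain Java. This is only an illustration of the idea, not App Engine's actual implementation: IndexRow and NameIndexScan are hypothetical names, and the index is modeled as an in-memory list already sorted by name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one index row: a name value plus the Player key id.
class IndexRow {
    final String name;
    final long playerId;
    IndexRow(String name, long playerId) { this.name = name; this.playerId = playerId; }
}

class NameIndexScan {
    // sortedRows must already be sorted by name ascending, like the index above.
    static List<Long> scan(List<IndexRow> sortedRows, String wanted) {
        // Step 1: binary search for the first row whose name is >= wanted.
        int lo = 0, hi = sortedRows.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedRows.get(mid).name.compareTo(wanted) < 0) lo = mid + 1;
            else hi = mid;
        }
        // Step 2: scan down until the first row that does not match.
        List<Long> keys = new ArrayList<>();
        for (int i = lo; i < sortedRows.size() && sortedRows.get(i).name.equals(wanted); i++) {
            keys.add(sortedRows.get(i).playerId);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<IndexRow> index = Arrays.asList(
            new IndexRow("bob", 7), new IndexRow("druidjane", 2),
            new IndexRow("druidjane", 9), new IndexRow("zed", 4));
        System.out.println(scan(index, "druidjane")); // prints [2, 9]
    }
}
```

Because matching rows sit next to each other in the sorted index, the cost depends on the number of results rather than on the number of Player entities.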
GAE distributes an Index (horizontally) on different machines -- how Retrieval really works
- Entities and indexes are distributed across multiple machines
- each machine scans its own index in parallel with the others.
- each machine returns results to App Engine
- App Engine delivers the final result set to the app, in order, as if all of the results were in one large index.
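One way to picture the last step is a merge of per-machine result lists that are each already sorted. The sketch below is an assumption about how such a merge could look, not a description of App Engine internals; ShardMerge and mergeSorted are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class ShardMerge {
    // Each inner list is one machine's scan result, already sorted ascending.
    static List<String> mergeSorted(List<List<String>> shardResults) {
        // Queue entries are {shardIndex, positionInShard}, ordered by the value they point at.
        PriorityQueue<int[]> heads = new PriorityQueue<int[]>(
            (a, b) -> shardResults.get(a[0]).get(a[1])
                          .compareTo(shardResults.get(b[0]).get(b[1])));
        for (int s = 0; s < shardResults.size(); s++) {
            if (!shardResults.get(s).isEmpty()) heads.add(new int[] {s, 0});
        }
        List<String> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();                 // smallest remaining value across shards
            List<String> shard = shardResults.get(head[0]);
            merged.add(shard.get(head[1]));
            if (head[1] + 1 < shard.size()) heads.add(new int[] {head[0], head[1] + 1});
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> shards = Arrays.asList(
            Arrays.asList("adam", "mia"),
            Arrays.asList("bob", "zed"),
            new ArrayList<String>());
        System.out.println(mergeSorted(shards)); // prints [adam, bob, mia, zed]
    }
}
```

The merge only ever compares the current head of each list, so no machine has to see the others' data to keep the final order.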
GAE updates indexes (any involving the entity) whenever you alter an entity
- App Engine updates all relevant indexes when property values change.
- Example: the app retrieves a Player entity, changes the name, then saves the entity with a call to the put() method,
- App Engine updates the appropriate row in the previous index.
- It also moves the row if necessary so the ordering of the index is preserved.
- call to put() does not return until all appropriate indexes are updated.
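The update-on-write behaviour can be mimicked with a sorted map: the stale row is removed and the new row lands at its sorted position before put() returns. This is a toy model under that assumption; NameIndex and its methods are hypothetical and unrelated to the real datastore API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class NameIndex {
    private final Map<Long, String> nameByKey = new HashMap<>();          // current name per entity key
    private final TreeMap<String, TreeSet<Long>> rows = new TreeMap<>();  // name -> keys, sorted by name

    // Simulates put(): the index is fully updated before this method returns.
    void put(long key, String newName) {
        String oldName = nameByKey.put(key, newName);
        if (oldName != null) {
            // Remove the stale row for the old name.
            TreeSet<Long> keys = rows.get(oldName);
            keys.remove(key);
            if (keys.isEmpty()) rows.remove(oldName);
        }
        // Insert the row for the new name; the TreeMap keeps the ordering.
        rows.computeIfAbsent(newName, n -> new TreeSet<Long>()).add(key);
    }

    List<String> namesInOrder() { return new ArrayList<>(rows.keySet()); }

    List<Long> keysFor(String name) {
        return new ArrayList<>(rows.getOrDefault(name, new TreeSet<Long>()));
    }

    public static void main(String[] args) {
        NameIndex index = new NameIndex();
        index.put(1, "bob");
        index.put(2, "druidjane");
        index.put(1, "zoe");                       // rename: the row moves after "druidjane"
        System.out.println(index.namesInOrder());  // prints [druidjane, zoe]
    }
}
```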
Default Indexes -- already exist for each Kind in your Project
- GAE maintains 2 indexes for every property name in an entity kind
- Index 1 = property values sorted in ascending order
- Index 2 = property values sorted in descending order.
- Index 3 = index of entities of each kind.
- SO each Kind has Index 3 + Number_Of_Property_Names_In_Kind * 2 (for Index 1 and Index 2)
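As a quick worked example of the counting rule above, assume a hypothetical Player kind with three property names (name, level, score). The helper name DefaultIndexCount is made up for this sketch.

```java
class DefaultIndexCount {
    // One kind index (Index 3) plus an ascending and a descending index per property name.
    static int count(int propertyNames) {
        return 1 + 2 * propertyNames;
    }

    public static void main(String[] args) {
        // Hypothetical Player kind with properties: name, level, score.
        System.out.println(count(3)); // prints 7
    }
}
```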
What kind of Queries can you do with these Default Indexes?
- Index 3 -> A simple query for all entities of a given kind, no filters or sort orders
- Index 1 or 2 -> One filter on a property using the equality (=) operator
- Index 1 or 2 -> Filters using greater-than or less-than operators (>, >=, <, <=) on a single property
- Index 1 or 2 -> One sort order, ascending or descending, and no filters, or with filters only on the same property used with the sort order
- Index 1 or 2 -> Filters or a sort order on the entity key
- Index 3 -> Kindless queries with or without key filters
Other Indexes you need to create -- any time a query has conditions or sorts beyond the defaults
- must tell GAE what other Indexes to provide for queries you will make that do not fall under the default Indexes 1, 2 or 3.
- Use a configuration file
- Java = WEB-INF/datastore-indexes.xml
- Python = index.yaml
Suppose we have the following code to form a query.
The code below implements: SELECT * FROM Person WHERE lastName = lastNameParam AND height < maxHeightParam
import com.google.appengine.api.datastore.Query;

// Equality filter on lastName plus an inequality filter on height
Query q = new Query("Person");
q.addFilter("lastName", Query.FilterOperator.EQUAL, lastNameParam);
q.addFilter("height", Query.FilterOperator.LESS_THAN, maxHeightParam);
This query requires a custom index to be specified in your application's
war/WEB-INF/datastore-indexes.xml file.
When you run your application in the SDK, it will automatically add an entry to this file.
When you upload your application, the custom index definition will be automatically uploaded, too.
The entry for this query will look like:
<?xml version="1.0" encoding="utf-8"?>
<datastore-indexes>
  <datastore-index kind="Person" ancestor="false">
    <property name="lastName" direction="asc" />
    <property name="height" direction="asc" />
  </datastore-index>
</datastore-indexes>
You can read all about Datastore indexes in the Introduction to Indexes section.
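To see why an index ordered by lastName and then height suits this query, here is a plain-Java sketch. It does not use the App Engine SDK; PersonRow and CompositeIndex are hypothetical names, and a TreeSet stands in for the index. With rows sorted this way, every match for "lastName = X AND height < H" forms one contiguous range.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical index row: the two indexed property values plus the entity key id.
class PersonRow {
    final String lastName;
    final int height;
    final long key;
    PersonRow(String lastName, int height, long key) {
        this.lastName = lastName; this.height = height; this.key = key;
    }
}

class CompositeIndex {
    // Same ordering as the index definition: lastName asc, then height asc (key breaks ties).
    static final Comparator<PersonRow> ORDER =
        Comparator.comparing((PersonRow r) -> r.lastName)
                  .thenComparingInt(r -> r.height)
                  .thenComparingLong(r -> r.key);

    private final TreeSet<PersonRow> rows = new TreeSet<>(ORDER);

    void add(String lastName, int height, long key) {
        rows.add(new PersonRow(lastName, height, key));
    }

    // lastName = X AND height < maxHeight is the half-open range [from, to) in the sorted rows.
    List<Long> query(String lastName, int maxHeight) {
        PersonRow from = new PersonRow(lastName, Integer.MIN_VALUE, Long.MIN_VALUE);
        PersonRow to = new PersonRow(lastName, maxHeight, Long.MIN_VALUE);
        List<Long> keys = new ArrayList<>();
        for (PersonRow r : rows.subSet(from, to)) keys.add(r.key);
        return keys;
    }

    public static void main(String[] args) {
        CompositeIndex index = new CompositeIndex();
        index.add("Smith", 150, 1);
        index.add("Smith", 180, 2);
        index.add("Smith", 200, 3);
        index.add("Jones", 160, 4);
        System.out.println(index.query("Smith", 190)); // prints [1, 2]
    }
}
```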
An Entity A that does not have a PARTICULAR property set, even though OTHER Entities of the same KIND C set it ---> Entity A will not be in any Index that uses this property for Kind C
Stipulating that a property should NOT BE INDEXED
- WHY >>> This saves space in index tables, and reduces the amount of time it takes to save the entity.
- JAVA: use the setUnindexedProperty() method of the Entity object, instead of the setProperty() method.
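Both points (an entity missing a property, and a property stored as unindexed) mean the same thing for an index: no row is written. The sketch below models that with a toy class. SketchEntity and PropertyIndexBuilder are hypothetical; only the method names setProperty() and setUnindexedProperty() mirror the real Entity API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy entity: keeps indexed and unindexed property values apart.
class SketchEntity {
    final long key;
    final Map<String, Object> indexed = new HashMap<>();
    final Map<String, Object> unindexed = new HashMap<>();
    SketchEntity(long key) { this.key = key; }

    void setProperty(String name, Object value) {
        unindexed.remove(name);
        indexed.put(name, value);
    }

    void setUnindexedProperty(String name, Object value) {
        indexed.remove(name);
        unindexed.put(name, value);
    }
}

class PropertyIndexBuilder {
    // Returns the keys of the entities that get a row in the index for the given property.
    static List<Long> keysWithRow(List<SketchEntity> entities, String property) {
        List<Long> keys = new ArrayList<>();
        for (SketchEntity e : entities) {
            if (e.indexed.containsKey(property)) keys.add(e.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        SketchEntity a = new SketchEntity(1);
        a.setProperty("nickname", "jane");                 // indexed
        SketchEntity b = new SketchEntity(2);
        b.setProperty("level", 3);                         // nickname never set
        SketchEntity c = new SketchEntity(3);
        c.setUnindexedProperty("nickname", "a long bio");  // stored, but not indexed
        System.out.println(keysWithRow(Arrays.asList(a, b, c), "nickname")); // prints [1]
    }
}
```

A query that filters or sorts on nickname would therefore only ever see entity 1 in this model.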
How does an Index do ORDERING -- when a property can store different data types (not forced to be the same data type)?
- Each type of property value has its own rules for comparing two values of the same type, and these rules are mostly intuitive: integers are sorted in numeric order, strings in Unicode order, and so forth.
- Two entities can have values of different types for the same property, so App Engine also has rules for comparing such values, though these rules are not so intuitive.
- RULE 1: Values are ordered first by type, then within their type.
- For instance, all integers are sorted above (before) all strings.
- One effect of this that might be surprising is that all floats are sorted below (after) all integers.
- POTENTIAL PROBLEM: datastore treats floats and integers as separate value types, and so sorts them separately.
- SOLUTION: If your app relies on the correct ordering of numbers, make sure all numbers are stored using the same type of value.
- RULE 2: Converts all data types to 8 distinctive types for use in RULE 1
- datastore supports several additional types (beyond the 8, such as int and float) by storing them as one of the eight types, then automatically marshaling them between the internal representation and the value your app sees.
- Example = a date-time value is actually stored as an integer, and will be sorted amongst other integer values in an index.
Datastore type | Java | Python | Sort order
Null | null | None | -
Integer and date-time | long (other integer types are widened), java.util.Date, datastore.Rating | long, datetime.datetime, db.Rating | Numeric (date-time is chronological)
Boolean | boolean (true or false) | bool (True or False) | False, then true
Byte string | datastore.ShortBlob | db.ByteString | Byte order
Unicode string | java.lang.String, datastore.Category, datastore.Email, datastore.IMHandle, datastore.Link, datastore.PhoneNumber, datastore.PostalAddress | unicode, db.Category, db.Email, db.IM, db.Link, db.PhoneNumber, db.PostalAddress | Unicode order
Floating-point number | double | float | Numeric
Geographical point | datastore.GeoPt | db.GeoPt | By latitude, then longitude (floating-point numbers)
A Google account | users.User | users.User | By email address, Unicode order
Entity key | datastore.Key | db.Key | Kind (byte string), then ID (numeric) or name (byte string)
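RULE 1 can be illustrated with a comparator that ranks values by type first and only then compares within the type. This sketch assumes the rows of the table are listed in sort order and models just null, Long, Boolean, String and Double; MixedTypeOrder and typeRank are hypothetical names, and String.compareTo is used as an approximation of Unicode order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MixedTypeOrder {
    // Assumed type rank, following the table's row order (only five types are modeled).
    static int typeRank(Object v) {
        if (v == null) return 0;
        if (v instanceof Long) return 1;
        if (v instanceof Boolean) return 2;
        if (v instanceof String) return 3;
        if (v instanceof Double) return 4;
        throw new IllegalArgumentException("type not modeled in this sketch: " + v.getClass());
    }

    // Order by type rank first, then by the natural order within the type.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static final Comparator<Object> ORDER = (a, b) -> {
        int byType = Integer.compare(typeRank(a), typeRank(b));
        if (byType != 0 || a == null) return byType;
        return ((Comparable) a).compareTo(b);
    };

    static List<Object> sorted(List<Object> values) {
        List<Object> copy = new ArrayList<>(values);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Object> values = Arrays.asList(3.5, 2L, "apple", 10L, null, true);
        // The float 3.5 ends up after the integers and even after the string.
        System.out.println(sorted(values)); // prints [null, 2, 10, true, apple, 3.5]
    }
}
```

The output shows the surprise from the notes: 3.5 is numerically between 2 and 10, yet it sorts last because floats are a separate type.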
What happens in GAE if a Query is made that needs a non-existent Index?
- the query fails