Long (and failing) bulk data loads to Google App Engine datastore

开发者 https://www.devze.com 2023-04-10 16:41 Source: Internet
I'm developing an application on Google App Engine using the current django-nonrel and the now-default High Replication Datastore. I'm currently trying to bulk load a 180MB CSV file locally on a dev instance with the following command:

appcfg.py upload_data --config_file=bulkloader.yaml --filename=../my_data.csv --kind=Place --num_threads=4 --url=http://localhost:8000/_ah/remote_api --rps_limit=500

bulkloader.yaml

python_preamble:
- import: base64
- import: re
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.ext.db
- import: google.appengine.api.datastore
- import: google.appengine.api.users

transformers:

- kind: Place
  connector: csv 
  connector_options:
      encoding: utf-8
      columns: from_header

  property_map:
    - property: __key__
      external_name: appengine_key
      export_transform: transform.key_id_or_name_as_string

    - property: name
      external_name: name

The bulk load succeeds for a truncated, 1000-record version of the CSV, but the full set eventually bogs down and starts erroring, "backing off" and waiting longer and longer. The bulkloader log that I'm tailing doesn't reveal anything helpful, and neither does the server's stderr.

Any help in understanding this bulk load process would be appreciated. My plan is to eventually load big data sets into the Google datastore, but this isn't promising.


180MB is a lot of data to load into the dev_appserver - it's not designed for large (or even medium) datasets; it's built entirely for small-scale local testing. Your best bet would be to reduce the size of your test dataset; if you can't do that, try the --use_sqlite command line flag to use the new sqlite-based local datastore, which is more scalable.
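If you go the route of shrinking the test dataset, something like the following could produce a smaller CSV that still has the header row the bulkloader needs for `columns: from_header`. This is only a rough sketch: the function name `truncate_csv` and the output file name are made up for illustration, and it is a standalone Python 3 script run outside the SDK, not part of the bulkloader itself.

```python
import csv


def truncate_csv(src_path, dst_path, max_rows):
    """Copy the header plus at most max_rows data rows from src_path to dst_path.

    Returns the number of data rows written (hypothetical helper, not an SDK API).
    """
    written = 0
    # newline="" lets the csv module handle line endings and quoted newlines itself.
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input: nothing to copy
        writer.writerow(header)  # keep the header so column names still resolve
        for row in reader:
            if written >= max_rows:
                break
            writer.writerow(row)
            written += 1
    return written
```

For example, `truncate_csv("../my_data.csv", "../my_data_sample.csv", 1000)` would write a 1000-record sample (the sample file name is an assumption), which you could then pass to `--filename` in the same `appcfg.py upload_data` command. Using the `csv` module rather than cutting by line count keeps records with embedded newlines intact.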
