Full-text search on App Engine: integrating Lucene through Compass

Using Compass + JDO for search on App Engine

Compass
http://www.compass-project.org/
Demo video
http://www.kimchy.org/searchable-google-appengine-with-compass/

The information here applies to Compass 2.3.0-beta.


How to use

Download module
1. Go to http://build.compass-project.org/ and select the Nightly Build of Compass Trunk.
2. On the build results page, click "Artifacts" and select "Release".
3. A list of files should appear; download compass-2.3.0-beta1.zip from it.
Module Placement
1. Unzip the downloaded module and add the following files to the CLASSPATH. (With the Google Plugin, copy them to "war/WEB-INF/lib" and also add them to the Build Path of the Eclipse project.)
  1. commons-logging.jar
  2. compass-2.3.0-beta1.jar
  3. lucene-core.jar

Initialization

Following the PMF.java shown in the demo video, Compass is initialized in a static initializer.

Excerpt from PMF.java

private static final Compass compass;
private static final CompassGps compassGps;

static {
    // Store the index on App Engine and disable the executor manager,
    // since App Engine does not allow applications to create threads.
    compass = new CompassConfiguration()
            .setConnection("gae://index")
            .setSetting(CompassEnvironment.ExecutorManager.EXECUTOR_MANAGER_TYPE, "disabled")
            .addScan("jp/co/topgate/sandbox/compass/model")
            .buildCompass();

    // INSTANCE is the PersistenceManagerFactory held by PMF.java.
    compassGps = new SingleCompassGps(getCompass());
    compassGps.addGpsDevice(new Jdo2GpsDevice("appengine", INSTANCE));
    compassGps.start();
    compassGps.index();
}

public static Compass getCompass() {
    return compass;
}

The argument to addScan() names the package that contains the JDO entity classes.

Entity Class

Annotate the class with "@Searchable".
Annotate the field used as the entity's primary key with "@SearchableId".
- However, "com.google.appengine.api.datastore.Key" cannot be bound directly, so expose it through a getter like the one below.
  
  @SearchableId
  public Long getKeyValue() {
      return key.getId();
  }
Annotate each field to be searched with "@SearchableProperty".
- However, "com.google.appengine.api.datastore.Text" cannot be bound directly either, so expose it through a getter like the one below.
  
  @SearchableProperty
  public String getContent1String() {
      if (content1 == null) {
          return "";
      }
      return content1.getValue();
  }
With @SearchableProperty(name = "field name"), the search field can use a name different from the entity's attribute name.
- The attributes used when creating an org.apache.lucene.document.Field can also be specified.

Search

Example of getting results

@SuppressWarnings("serial")
public static class SearchResult implements Serializable {
    public final String keyValue;
    public final String content;
    public final String nickname;
    public final String email;

    // Copies the indexed property values out of the Compass resource.
    public SearchResult(Resource resource) {
        this.keyValue = resource.getId();
        this.content = (String) resource.getProperty("content1String").getObjectValue();
        this.nickname = (String) resource.getProperty("nickname").getObjectValue();
        this.email = (String) resource.getProperty("email").getObjectValue();
    }
}

private List<SearchResult> search(String keyWords) {
    CompassSearchSession search = PMF.getCompass().openSearchSession();
    CompassHits hits = search.find(keyWords);
    int length = hits.length();
    List<SearchResult> result = new ArrayList<SearchResult>(length);
    for (int i = 0; i < length; i++) {
        Resource resource = hits.resource(i);
        result.add(new SearchResult(resource));
    }
    return result;
}

To search a specific field

field name:search terms

To restrict the search to a Kind

search terms alias:Kind name (Class.getSimpleName())
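
The two query-string forms above can be assembled by a small helper. What follows is only a minimal sketch in plain Java: the class and method names (QueryStrings, fieldQuery, withAlias) are hypothetical and not part of Compass, and it assumes simple terms, since it does not escape any query-syntax special characters.

```java
// Hypothetical helper, not part of Compass: builds query strings of the
// "field:terms" and "terms alias:Kind" forms described above.
final class QueryStrings {
    private QueryStrings() {}

    // Restricts a term to a single search field, e.g. nickname:alice
    static String fieldQuery(String field, String term) {
        return field + ":" + term;
    }

    // Appends an alias clause, using the simple class name as the Kind name.
    static String withAlias(String query, Class<?> kind) {
        return query + " alias:" + kind.getSimpleName();
    }
}
```

The resulting string would then be passed to CompassSearchSession.find(...) as in the search example. How the two clauses combine depends on the default operator configured for the query parser, which this sketch does not set.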

Notes

When an entity is saved through JDO, the getters annotated with @SearchableProperty are called to obtain the field values and the index is created; this seems to work.
- If the index information is added to the entity class with annotations (as described on this page), the entity must always be saved through JDO. An entity saved through the low-level API gets no index entry (naturally enough).
- For an unowned relationship, the values of the referenced entity can also be fetched and indexed. However, the getter has to return the value at the moment the entity is persisted, so in practice the referenced entity must be fetched and held before saving.
  
  unowned
  
  @Persistent private Key otherEntityKey;
  @NotPersistent private OtherEntity otherEntity;
  
  @SearchableProperty(name = "otherEntity")
  public String getOtherEntityValue() {
      return otherEntity != null ? otherEntity.getValue() : "";
  }
Can index creation be decoupled from saving the entity?
- Once the Task Queue is available on the Java platform, saving only the entity in the request and deferring index creation looks like a good approach.
  - Compass itself may well come to support the Task Queue.
To control index creation yourself
- Skip CompassGps and control it by calling Compass.getSearchEngineIndexManager().
  - Since this is a thin layer over Lucene, part of the Lucene interface may be usable as well?
- To index entities individually, control it by calling Compass.openIndexSession().
To build queries dynamically
- Get a query builder with Compass.queryBuilder() and use it as needed.
- It offers a full set of operations: between, eq, ge, gt, le, lt, and, or, like, wildcard, and so on.
- CompassMultiPropertyQueryStringBuilder searches across multiple properties at once.
- Internally, CompassSearchSession.find(queryString) calls CompassSearchSession.queryBuilder().queryString(queryString).toQuery().hits().
To add sorting to a search
- Use CompassQuery.addSort(String propertyName, CompassQuery.SortDirection direction) and its variants.
To add limit and offset
- I looked into this but could not work it out for certain. CompassHits has detach(int from, int size), but it is unclear whether this reduces the data actually loaded, so it may not help much.
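
As a rough illustration of the limit/offset idea, the window handed to a call such as detach(from, size) can be computed and clamped in plain Java first. This is a minimal sketch under stated assumptions: PageWindow and its members are hypothetical names, not Compass API, and the sketch only does the arithmetic, not the search.

```java
// Hypothetical helper, not part of Compass: turns (offset, limit) into a
// window that never runs past the number of hits.
final class PageWindow {
    final int from;
    final int size;

    private PageWindow(int from, int size) {
        this.from = from;
        this.size = size;
    }

    static PageWindow of(int offset, int limit, int totalHits) {
        if (offset < 0 || limit < 0 || totalHits < 0) {
            throw new IllegalArgumentException("offset, limit and totalHits must be non-negative");
        }
        // Clamp the start to the end of the hits, then shrink the page to what is left.
        int from = Math.min(offset, totalHits);
        int size = Math.min(limit, totalHits - from);
        return new PageWindow(from, size);
    }
}
```

With 25 hits, an offset of 20 and a limit of 10 yield from = 20 and size = 5. Those two values could then be passed on as the from and size arguments; the source does not say how an out-of-range window is handled, which is why the sketch clamps beforehand.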
To highlight matched text in search results
- Search engines generally present highlighted text as the fragment (a few hundred characters, for example) around the best match for the search terms; Lucene and others call this the bestFragment.
- The method that returns this fragment is CompassHits.highlighter(i).fragment(propertyName).
  
  To obtain the highlighted fragments
  
      CompassSearchSession searchSession = getCompass().openSearchSession();
      CompassHits hits = searchSession.find(text);
  
      int length = hits.getLength();
      if (length > 0) {
          for (int i = 0; i < length; i++) {
              // The highlighted fragment of the "pr" property is returned here.
              String fragment = hits.highlighter(i).fragment("pr");
              Resource resource = hits.resource(i);
          }
      }
      // results is declared and filled outside this excerpt.
      return results;
- To display this in Wicket, the IData side will have to obtain the fragment from the highlighter in one form or another.
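
To make the idea of a fragment around a match concrete, here is a minimal plain-Java sketch. It is not the algorithm used by Lucene's or Compass's highlighter: it takes the first case-insensitive match only, does no scoring and no HTML escaping, and the names SimpleFragment and around are hypothetical.

```java
// Illustration only: cuts a window of text around the first match and
// wraps the match in <b> tags. Not the Lucene/Compass highlighter.
final class SimpleFragment {
    private SimpleFragment() {}

    // Returns the index of the first case-insensitive match, or -1.
    static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }

    // Keeps up to `context` characters on each side of the match.
    static String around(String text, String term, int context) {
        if (term.isEmpty()) {
            return "";
        }
        int at = indexOfIgnoreCase(text, term);
        if (at < 0) {
            return "";
        }
        int matchEnd = at + term.length();
        int start = Math.max(0, at - context);
        int end = Math.min(text.length(), matchEnd + context);
        return text.substring(start, at)
                + "<b>" + text.substring(at, matchEnd) + "</b>"
                + text.substring(matchEnd, end);
    }
}
```

For example, SimpleFragment.around("The quick brown fox", "QUICK", 4) keeps four characters on each side of the match. A real highlighter additionally picks the best-scoring fragment among several candidates.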

Tags: gae, google, java, lucene, search

Posted by IT瘾 on April 27, 2010, 5:01:56 PM

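An introduction to Agile project management with Scrum

Scrum is an iterative, incremental software development process, commonly used in agile software development. In English, the word "scrum" refers to the scrum in rugby.

Although Scrum was developed for managing software development projects, it can equally be used to run software maintenance teams, or as a program management approach: Scrum of Scrums.

Tags: other

Posted by IT瘾 on April 26, 2010, 8:50:57 PM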

org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x10)

org.apache.xmlrpc.XmlRpcException: Failed to parse XML-RPC request: An invalid XML character (Unicode: 0x10) was found in the element content of the document.

Caused by: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x10) was found in the element content of the document.
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.apache.xmlrpc.server.XmlRpcStreamServer.getRequest(XmlRpcStreamServer.java:65)
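
The character 0x10 is a control character that XML 1.0 does not allow, which is why the parser rejects the request. One common way to avoid the error, not necessarily the fix described in the full post, is to strip such characters from strings before they are placed in the XML-RPC request. The sketch below is plain Java with hypothetical names (XmlChars, isValid, strip); the ranges follow the Char production of the XML 1.0 specification.

```java
// Hypothetical helper: removes code points that XML 1.0 does not allow.
final class XmlChars {
    private XmlChars() {}

    // True for code points permitted by the XML 1.0 Char production.
    static boolean isValid(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Copies the input, dropping every invalid code point (including unpaired surrogates).
    static String strip(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isValid(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Calling XmlChars.strip(value) on each string parameter before the request is sent keeps tabs and line breaks but drops characters such as 0x10.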

Tags: google, java, other, xml

Posted by IT瘾 on April 25, 2010, 6:11:57 PM

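An IT technology recommendation app built on GAE, Google's cloud platform

IT Technology Network is a website built on Google App Engine, Google's cloud computing service.

IT Technology Network focuses on IT technology, covering news and blogs about software development, the IT industry, Java, databases, Unix, web development, open source projects, the Internet, and more.
The goal is to provide quality-assured, concise, frequently updated IT content and a search service.
IT Technology Network is really an aggregator of Chinese IT technology sites, for example 腾讯IT, Linux伊甸园, ChinaUnix, JavaEye blogs, Java blogs, and the ITPUB technology portal.

This is something of a trial application for Google App Engine. I have gradually come to see that cloud computing is not just a concept, and that cloud computing in China may still lag far behind. As Jack Ma put it, cloud computing may hold disruptive power: "What I fear most is old wine in a new bottle. You cannot see what they are up to, and when it suddenly erupts it is the most frightening. Something you have never heard of is not frightening. Yahoo was doing search engines back then, and then Google came along. Many people at Yahoo thought it was much the same as what they had, and later it nearly killed them."

Tags: gae, google, java, other

Posted by IT瘾 on April 22, 2010, 10:22:19 AM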

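Google App Engine performance tuning: page performance optimization

GAE provides simple, practical APIs and development tools; combined with existing frameworks, Java developers can easily build their own business applications.

This article first covers performance optimization techniques for the page layer: with simple configuration and a small amount of code you can get a good performance gain. Later articles ...

The techniques described here have been verified on this blog: later statistics show that the home page's processing time dropped from an average of 400 ms to an average of 26 ms, a roughly 15-fold improvement.

Tags: gae, google, java

Posted by IT瘾 on April 18, 2010, 10:55:05 PM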

Backing up and downloading data from Google App Engine for Java

You can use the Python tool bulkloader.py to create a datastore backup of a GAE Java app.

You just have to set up remote_api by adding following lines to web.xml:

<?xml version="1.0" encoding="utf-8"?>
<web-app>
  <!-- Add this to your web.xml to enable remote API on Java. -->
  <servlet>
    <servlet-name>remoteapi</servlet-name>
    <servlet-class>com.google.apphosting.utils.remoteapi.RemoteApiServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>remoteapi</servlet-name>
    <url-pattern>/remote_api</url-pattern>
  </servlet-mapping>
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>remoteapi</web-resource-name>
      <url-pattern>/remote_api</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>admin</role-name>
    </auth-constraint>
  </security-constraint>
</web-app>

After that you can use bulkloader.py with --dump to download a backup and with --restore to upload a backup to the datastore.

You can also use the --kind=... argument to download all entities of a specific kind:

bulkloader.py --dump --app_id=<app-id> --kind=<kind> --url=http://<appname>.appspot.com/remote_api --filename=<data-filename>

You can download and upload every entity of a kind in a format suitable for backup and restore, all without writing any additional code or configuration. To download all entities of all kinds, run the following command:

bulkloader.py --dump --app_id=<app-id> --url=http://<appname>.appspot.com/remote_api --filename=<data-filename>

Downloading Data from App Engine

To start a data download, run appcfg.py download_data with the appropriate arguments:

appcfg.py download_data --config_file=album_loader.py --filename=album_data_archive.csv --kind=Album <app-directory>

If you are using a Google Apps domain name and need appcfg.py to sign in using an account on that domain, you must specify the --auth_domain=... option, whose value is your domain name.

If the transfer is interrupted, you can resume the transfer from where it left off using the --db_filename=... and --result_db_filename=... arguments. These arguments are the names of the progress file and the results file created by the tool, which are either names you provided with the arguments when you started the transfer, or default names that include a timestamp. This assumes you have sqlite3 installed, and did not disable progress files with --db_filename=skip.

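Tags: gae, google, java

Posted by IT瘾 on April 17, 2010, 8:56:12 AM

Performance problems of Google App Engine for Java

Google App Engine automatically shuts down or starts the JVM of a Java web application depending on load. As a result, many HTTP requests trigger a JVM start and a deployment of the web application, so those requests are very slow, response performance is poor, and the user experience suffers. Since most applications also need to connect to the DataStore, they have to initialize the metadata of the data classes and connect to the database, which inevitably takes a lot of time. Even Google itself has no good solution to this problem.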

What is a loading request?

Some requests run slower because App Engine needs to create a new Java virtual machine to service the request. We call that kind of request, a Loading Request. During a loading request, your application undergoes initialization (such as class loading, JIT compiling, etc) which causes the request to take longer.

For slow requests which are already close to App Engine's request deadline, the extra initialization can push it past the deadline, causing a DeadlineExceededException.

What causes loading requests?

App Engine spins up JVMs on demand, so there are several reasons why you may receive a loading request:

You just uploaded a new version of your application.
Your application may not be getting any traffic.
Your traffic has become high enough to need another JVM to scale.

You can expect that during the course of developing your application, you will often experience the first two scenarios. In comparison, for a production app receiving even a very small but steady amount of traffic, loading requests are relatively infrequent.

How do I distinguish normal requests from loading requests in my application logs?

You can register a ServletContextListener in your web.xml which logs from its contextInitialized method. For example:

// web.xml snippet
<listener>
  <listener-class>
  com.example.LogLoadingRequest
  </listener-class>
</listener>

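// LogLoadingRequest.java
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

public class LogLoadingRequest implements ServletContextListener {
  private static final Logger logger = Logger.getLogger(LogLoadingRequest.class.getName());
  // Called when the web application starts in a new JVM, i.e. during a loading request.
  public void contextInitialized(ServletContextEvent sce) {
    logger.log(Level.INFO, "Loading request occurring.");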
  }

  public void contextDestroyed(ServletContextEvent sce) {
  }
}

In the future, the Admin Console logs viewer will mark loading requests specifically so that they can be easily identified.

Do I need to be concerned about high CPU warnings in the admin console for my loading requests?

App Engine provides high CPU warnings to help you determine which requests might need optimization. In the case of loading requests, though, the execution time is artificially longer due to the extra application initialization required. In addition, the number of loading requests is inversely proportional to the amount of traffic your application receives. So, while your CPU usage due to additional traffic will increase, your CPU usage due to loading requests will decrease.

Given that, your time is most often better spent focusing on optimizing other high CPU warnings in relation to your application's total CPU usage.

How can I speed up loading requests?

Here are a few suggestions:

Perform application initialization lazily, rather than eagerly, so it doesn't all occur within a single request.
Share expensive initialization between JVMs. For example, put data which is expensive to read or compute into memcache, where it can be quickly read by other JVMs during startup.
Move initialization from application startup to build-time where reasonable. For example, convert a complex datafile into a simple, quick-to-read datafile in your build process.
Use slimmer dependencies. For example, prefer a library that is optimized to your task, as opposed to a large library that performs very heavy initialization.
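
The first suggestion, lazy initialization, can be sketched with the initialization-on-demand holder idiom. This is a minimal example under stated assumptions: LazyConfig, its LOADS counter, and the "loaded" value are hypothetical stand-ins for whatever expensive setup an application performs.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of lazy initialization: the expensive object is built on first use,
// not when the application starts.
final class LazyConfig {
    // Counts constructions; present only to make the laziness observable.
    static final AtomicInteger LOADS = new AtomicInteger();

    private final String value;

    private LazyConfig() {
        LOADS.incrementAndGet();
        value = "loaded"; // stand-in for expensive work such as parsing a large file
    }

    // The JVM initializes Holder only when Holder.INSTANCE is first accessed.
    private static final class Holder {
        static final LazyConfig INSTANCE = new LazyConfig();
    }

    static LazyConfig get() {
        return Holder.INSTANCE;
    }

    String value() {
        return value;
    }
}
```

Only the first request that calls LazyConfig.get() pays the construction cost, and class initialization is thread-safe without explicit locking, so requests that never need the object are not slowed down by it.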

What is Google doing to speed up loading requests?

With the release of 1.2.8, we've introduced a new class-loading optimization called precompilation. We've seen improvements to loading requests of 30% and greater. For now, you'll need to opt into precompilation in order to take advantage of it. You can do this by including the following code in your appengine-web.xml file:
```
<precompilation-enabled>true</precompilation-enabled>
```
In the future, we plan to turn this optimization on for all applications.
We're also making runtime optimizations guided by profiling applications with longer loading requests. In addition, we've provided profiling feedback to third-party language runtimes such as Groovy and JRuby and suggestions for optimization of their own libraries and runtimes.
We're actively working on further startup optimizations.

Can I pay to keep a JVM reserved for my application?

We've seen this request from some developers with low-traffic applications who'd like to reduce the percentage of loading requests they receive. Although we have many improvements in the pipeline to improve loading request performance, we'd like to gauge the general interest in this feature. If you'd like to be able to reserve a JVM at a price, please star this issue. If there's a particular pricing scheme you're interested in, let us know.

Should I run a cron job to keep my JVMs alive and reduce my loading requests?

We discourage developers from doing this because it increases the average number of loading requests for all low-traffic applications. Instead, we will continue to improve the performance of loading requests for everyone, and you can use the advice on this page to optimize your application's startup performance.

Tags: gae, google, java

Posted by IT瘾 on April 16, 2010, 8:42:33 AM

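Limitations of Google App Engine cloud computing

Google App Engine is exciting, but for various reasons it has many restrictions, and some of them are quite annoying.

Developers have read-only access to the App Engine file system.
App Engine can execute code only in response to an HTTP request (except for scheduled background tasks, task queues, and the XMPP service).
Users can upload arbitrary Python modules, but they must be pure Python, with no C extensions or other code that has to be compiled.
App Engine returns at most 1000 rows per Datastore request.
Java applications can use only a subset of the standard JRE class library (the JRE class whitelist).

Java applications cannot create new threads.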

Metric	Limit
Applications per developer	10
Time per request	30 seconds
Files per application	1000
HTTP response size	10 MB
Datastore item size	1 MB
Application code size	150 MB
Tags: gae, google, java

Posted by IT瘾 on April 15, 2010, 5:43:15 PM
