
    1. Implementing a simple search feature

    This chapter covers only Lucene's basic search API, which involves the following classes:

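    Lucene's basic search API:

| Class | Purpose |
|---|---|
| IndexSearcher | The entry point for searching an index. All searches go through one of IndexSearcher's overloaded search methods. |
| Query (and subclasses) | Each subclass encapsulates the logic for a particular type of search. A Query instance is passed to IndexSearcher's search method. |
| QueryParser | Parses a human-readable expression into a concrete Query instance. |
| Hits | Holds the search results; returned by IndexSearcher's search method. |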

    Let's look at a few examples from the book:

    LiaTestCase.java: a base class that extends JUnit's TestCase. Several of the examples that follow extend it.

```java
package lia.common;

import junit.framework.TestCase;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.search.Hits;
import org.apache.lucene.document.Document;

import java.io.IOException;
import java.util.Date;
import java.text.ParseException;
import java.text.SimpleDateFormat;

/**
 * LIA base class for test cases.
 */
public abstract class LiaTestCase extends TestCase {
  // Location of the prebuilt test index, passed in with -Dindex.dir=...
  private String indexDir = System.getProperty("index.dir");
  protected Directory directory;

  protected void setUp() throws Exception {
    // false: open the existing index instead of creating a new one
    directory = FSDirectory.getDirectory(indexDir, false);
  }

  protected void tearDown() throws Exception {
    directory.close();
  }

  /**
   * For troubleshooting: prints the score and title of every hit.
   */
  protected final void dumpHits(Hits hits) throws IOException {
    if (hits.length() == 0) {
      System.out.println("No hits");
    }

    for (int i = 0; i < hits.length(); i++) {
      Document doc = hits.doc(i);
      System.out.println(hits.score(i) + ":" + doc.get("title"));
    }
  }

  protected final void assertHitsIncludeTitle(Hits hits, String title)
      throws IOException {
    for (int i = 0; i < hits.length(); i++) {
      Document doc = hits.doc(i);
      if (title.equals(doc.get("title"))) {
        return;  // title found: the assertion passes
      }
    }

    fail("title '" + title + "' not found");
  }

  protected final Date parseDate(String s) throws ParseException {
    return new SimpleDateFormat("yyyy-MM-dd").parse(s);
  }
}
```

    I. Searching for a specific term, and parsing user-entered expressions with QueryParser

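    To search for a specific term, a TermQuery is all you need; single-term queries are especially suited to searching Keyword fields. Parsing expressions typed by users fits the way users actually search, and that parsing is done by QueryParser. If an expression can't be parsed, a ParseException is thrown carrying detailed error information, so you can give the user an appropriate hint. Parsing also requires an Analyzer to analyze the user's input: the Analyzer produces the Terms that are then assembled into a Query instance, so different Analyzers can produce different Queries.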

    Here is an example, BasicSearchingTest.java:

```java
package lia.searching;

import lia.common.LiaTestCase;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class BasicSearchingTest extends LiaTestCase {

  public void testTerm() throws Exception {
    IndexSearcher searcher = new IndexSearcher(directory);
    Term t = new Term("subject", "ant");             // build a Term
    Query query = new TermQuery(t);
    Hits hits = searcher.search(query);              // search
    assertEquals("JDwA", 1, hits.length());          // check the result

    t = new Term("subject", "junit");
    hits = searcher.search(new TermQuery(t));
    assertEquals(2, hits.length());

    searcher.close();
  }

  public void testKeyword() throws Exception {       // keyword search
    IndexSearcher searcher = new IndexSearcher(directory);
    Term t = new Term("isbn", "1930110995");         // keyword term
    Query query = new TermQuery(t);
    Hits hits = searcher.search(query);
    assertEquals("JUnit in Action", 1, hits.length());

    searcher.close();
  }

  public void testQueryParser() throws Exception {   // QueryParser
    IndexSearcher searcher = new IndexSearcher(directory);

    // Parse a query expression into a Query instance
    Query query = QueryParser.parse("+JUNIT +ANT -MOCK",
                                    "contents",
                                    new SimpleAnalyzer());
    Hits hits = searcher.search(query);
    assertEquals(1, hits.length());
    Document d = hits.doc(0);
    assertEquals("Java Development with Ant", d.get("title"));

    query = QueryParser.parse("mock OR junit",
                              "contents",
                              new SimpleAnalyzer());
    hits = searcher.search(query);
    assertEquals("JDwA and JIA", 2, hits.length());

    searcher.close();
  }
}
```

    2. Using IndexSearcher

    Since IndexSearcher is so central, let's see how to use it. There are two ways to construct an IndexSearcher:

    ■ By Directory
    ■ By a file system path

    Using a Directory is recommended, because your code then doesn't depend on where the index is stored. In LiaTestCase.java above we constructed a Directory:

    directory = FSDirectory.getDirectory(indexDir, false);

    and then used it to construct an IndexSearcher:

    IndexSearcher searcher = new IndexSearcher(directory);

    You can then search with the searcher's search methods (there are six overloads; check the Javadoc to see when each is appropriate). A search returns a Hits object containing the results. Let's look at Hits next:

    I. Working with Hits

    Hits 4個(gè)方法, 如下

    Hits methods for efficiently accessing search results

| Hits method | Return value |
|---|---|
| length() | Number of documents in the Hits collection |
| doc(n) | Document instance of the nth top-scoring document |
| id(n) | Document ID of the nth top-scoring document |
| score(n) | Normalized score (based on the score of the topmost document) of the nth top-scoring document, guaranteed to be greater than 0 and less than or equal to 1 |

    These methods give you the information about each result. Hits also caches Documents to improve performance; by default, the top 100 results are retrieved and cached up front.

    Note: The methods doc(n), id(n), and score(n) require documents to be loaded from the index when they aren’t already cached. This leads us to recommend only calling these methods for documents you truly need to display or access; defer calling them until needed.

    II. Paging through Hits

    There are two ways to page through Hits:

    ■ Keep the original Hits and IndexSearcher instances available while the user is navigating the search results.
    ■ Requery each time the user navigates to a new page.

    The second approach is recommended: it is simpler over a stateless protocol such as HTTP (think of a web search like Google's).
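
    A minimal sketch of the requery-per-page approach, assuming each request carries a 1-based page number and a fixed page size (hypothetical parameters, not part of Lucene). It only computes which hit indexes belong on a page; in a real request handler you would re-run searcher.search(query) on every request and then call hits.doc(i) only for the indexes in that range, which also follows the advice above to load documents only when needed.

```java
/** Hypothetical helper: maps a 1-based page number onto a range of hit indexes. */
final class HitPager {
  private HitPager() {}

  /**
   * Returns {start, end} (end exclusive) of the hits to display for the page,
   * clamped to totalHits. A page past the end yields an empty range.
   */
  static int[] pageRange(int totalHits, int page, int pageSize) {
    if (page < 1 || pageSize < 1) {
      throw new IllegalArgumentException("page and pageSize must be >= 1");
    }
    long startLong = (long) (page - 1) * pageSize;  // long avoids int overflow
    int start = (int) Math.min(startLong, totalHits);
    int end = (int) Math.min((long) start + pageSize, totalHits);
    return new int[] {start, end};
  }

  /** Number of pages needed to show totalHits results. */
  static int pageCount(int totalHits, int pageSize) {
    return (totalHits + pageSize - 1) / pageSize;
  }

  public static void main(String[] args) {
    int totalHits = 23;  // e.g. hits.length() after re-running the query
    int[] r = pageRange(totalHits, 3, 10);
    // A request handler would now call hits.doc(i) for i in [r[0], r[1]).
    System.out.println("page 3 shows hits " + r[0] + " to " + (r[1] - 1)
        + " of " + pageCount(totalHits, 10) + " pages");
  }
}
```

    Keeping no state between requests is the point of the design: every page is computed from scratch, so nothing has to survive between HTTP requests.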

    III. Reading an index into memory

    Sometimes, to make full use of system resources and improve performance, you can load the index into memory and search it there:

    RAMDirectory ramDir = new RAMDirectory(dir);

    The RAMDirectory constructor has several overloads for building a RAMDirectory from different sources; see the Javadoc.

    3. Understanding Lucene scoring

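    By default, the results in the Hits returned by a Lucene search are sorted by score, and each document's score is computed with the following formula:

$$\mathrm{score}(q,d) = \Big( \sum_{t \in q} tf(t \text{ in } d) \cdot idf(t) \cdot boost(t.field \text{ in } d) \cdot lengthNorm(t.field \text{ in } d) \Big) \cdot coord(q,d) \cdot queryNorm(q)$$

    The factors in the formula are: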

| Factor | Description |
|---|---|
| tf(t in d) | Term frequency factor for the term (t) in the document (d). |
| idf(t) | Inverse document frequency of the term. |
| boost(t.field in d) | Field boost, as set during indexing. |
| lengthNorm(t.field in d) | Normalization value of a field, given the number of terms within the field. This value is computed during indexing and stored in the index. |
| coord(q, d) | Coordination factor, based on the number of query terms the document contains. |
| queryNorm(q) | Normalization value for a query, given the sum of the squared weights of each of the query terms. |

    For more on scoring, see the Javadoc for the Similarity class.

    The Explanation class reveals the details of every factor that contributes to a document's score, and its toString method prints them. You obtain an Explanation from IndexSearcher's explain(query, docId) method, as follows:

```java
package lia.searching;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.FSDirectory;

public class Explainer {
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: Explainer <index dir> <query>");
      System.exit(1);
    }

    String indexDir = args[0];
    String queryExpression = args[1];

    FSDirectory directory =
        FSDirectory.getDirectory(indexDir, false);

    Query query = QueryParser.parse(queryExpression,
        "contents", new SimpleAnalyzer());

    System.out.println("Query: " + queryExpression);

    IndexSearcher searcher = new IndexSearcher(directory);
    Hits hits = searcher.search(query);

    for (int i = 0; i < hits.length(); i++) {
      // Generate an Explanation of how this document scored for the query
      Explanation explanation =
          searcher.explain(query, hits.id(i));

      System.out.println("----------");
      Document doc = hits.doc(i);
      System.out.println(doc.get("title"));
      System.out.println(explanation.toString());  // print the score breakdown
    }

    searcher.close();
    directory.close();
  }
}
```

    The output is:

```
Query: junit
----------
JUnit in Action
0.65311843 = fieldWeight(contents:junit in 2), product of:
  1.4142135 = tf(termFreq(contents:junit)=2)   // (1) junit appears twice in contents
  1.8472979 = idf(docFreq=2)
  0.25 = fieldNorm(field=contents, doc=2)
----------
Java Development with Ant
0.46182448 = fieldWeight(contents:junit in 1), product of:
  1.0 = tf(termFreq(contents:junit)=1)   // (2) junit appears once in contents
  1.8472979 = idf(docFreq=2)
  0.25 = fieldNorm(field=contents, doc=1)
```

    (1) JUnit in Action has the term junit twice in its contents field. The contents field in our index is an aggregation of the title and subject fields to allow a single field for searching.

    (2) Java Development with Ant has the term junit only once in its contents field.
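
    To see where these numbers come from, here is a minimal plain-Java sketch (no Lucene needed) of the per-field factors as we understand Lucene's DefaultSimilarity to define them: tf = √freq, idf = ln(numDocs / (docFreq + 1)) + 1, and lengthNorm = 1 / √(number of terms). Two inputs are assumptions rather than facts from the output: numDocs = 7, which we infer because it is the value that makes idf(docFreq=2) come out to 1.8472979 under this formula, and fieldNorm = 0.25, copied from the output because Lucene stores norms in a lossy single byte. For a single-term query the query-side factors should cancel (queryNorm is then 1/idf), which is consistent with Explainer printing only the fieldWeight.

```java
/** Plain-Java sketch of the DefaultSimilarity per-field scoring factors. */
final class ScoringSketch {
  private ScoringSketch() {}

  /** Term frequency factor: square root of the raw count. */
  static float tf(float freq) {
    return (float) Math.sqrt(freq);
  }

  /** Inverse document frequency: rarer terms weigh more. */
  static float idf(int docFreq, int numDocs) {
    return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
  }

  /** Shorter fields weigh more; e.g. lengthNorm(16) == 0.25 before byte encoding. */
  static float lengthNorm(int numTerms) {
    return (float) (1.0 / Math.sqrt(numTerms));
  }

  /** fieldWeight = tf * idf * fieldNorm, the product Explanation prints. */
  static float fieldWeight(int freq, int docFreq, int numDocs, float fieldNorm) {
    return tf(freq) * idf(docFreq, numDocs) * fieldNorm;
  }

  public static void main(String[] args) {
    int numDocs = 7;          // assumption: inferred from idf(docFreq=2) = 1.8472979
    float fieldNorm = 0.25f;  // copied from the Explainer output, not recomputed
    System.out.println("JUnit in Action: " + fieldWeight(2, 2, numDocs, fieldNorm));
    System.out.println("Java Development with Ant: " + fieldWeight(1, 2, numDocs, fieldNorm));
  }
}
```

    The only difference between the two documents is tf: junit occurs twice in one and once in the other, so their scores differ by a factor of √2.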

    You can also call toHtml to render an Explanation as HTML. The Nutch project makes heavy use of Explanation (see the Nutch project documentation).

    4. Creating queries programmatically

    IndexSearcher's search method takes a Query instance. Query has several subclasses, each suited to different situations; let's look at them:

    TermQuery
    The simplest query (mentioned above): Term t = new Term("contents", "junit"); new TermQuery(t) constructs one. TermQuery treats the search condition as a single keyword that must match an indexed term exactly, so it suits fields indexed with Field.Keyword, for example.

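    RangeQuery
    As the name suggests, a search over a range: RangeQuery query = new RangeQuery(begin, end, included); where begin and end are Terms and the boolean parameter says whether the boundary terms themselves are included. In query syntax this is written "[begin TO end]" (inclusive) or "{begin TO end}" (exclusive).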
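
    RangeQuery compares term text, not numbers. A minimal plain-Java sketch of the boundary check, assuming terms are ordered with String.compareTo (which, to our understanding, is how Lucene 1.4's Term.compareTo orders text within a field). It also shows why the book stores pubmonth as fixed-width YYYYMM text: zero-padded values sort correctly as strings, while unpadded numbers do not.

```java
/** Plain-Java sketch of a RangeQuery-style boundary check on term text. */
final class RangeSketch {
  private RangeSketch() {}

  /** True if term lies between lower and upper in string (term dictionary) order. */
  static boolean inRange(String term, String lower, String upper, boolean inclusive) {
    int lo = term.compareTo(lower);
    int hi = term.compareTo(upper);
    return inclusive ? (lo >= 0 && hi <= 0) : (lo > 0 && hi < 0);
  }

  public static void main(String[] args) {
    // [200401 TO 200412] versus {200401 TO 200412} on YYYYMM text
    System.out.println(inRange("200401", "200401", "200412", true));   // true
    System.out.println(inRange("200401", "200401", "200412", false));  // false
    // Unpadded numbers compare as text: "9" sorts after "20"
    System.out.println(inRange("9", "10", "20", true));                // false
  }
}
```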

    PrefixQuery
    As the name suggests, matches terms that start with a given string; in query syntax it is written "something*".
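
    Conceptually, a PrefixQuery expands into every indexed term that starts with the prefix. Because the term dictionary is sorted, the expansion can seek to the prefix and stop at the first term that no longer matches. A minimal plain-Java sketch of that idea, using a TreeSet as a stand-in for the term dictionary (an assumption for illustration; Lucene walks its own term enumeration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Sketch: expand a prefix against a sorted term dictionary (TreeSet stand-in). */
final class PrefixSketch {
  private PrefixSketch() {}

  static List<String> expand(TreeSet<String> terms, String prefix) {
    List<String> matches = new ArrayList<>();
    // tailSet starts at the first term >= prefix, like seeking in the term dictionary
    for (String term : terms.tailSet(prefix)) {
      if (!term.startsWith(prefix)) {
        break;  // sorted order: no later term can share the prefix
      }
      matches.add(term);
    }
    return matches;
  }

  public static void main(String[] args) {
    TreeSet<String> terms = new TreeSet<>(Arrays.asList(
        "ant", "java", "javacc", "javascript", "junit", "lucene"));
    System.out.println(expand(terms, "java"));  // [java, javacc, javascript]
  }
}
```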

    BooleanQuery
    A query that combines other queries logically. You add queries to it and mark their logical relationship with:

    public void add(Query query, boolean required, boolean prohibited)

    The two boolean flags express the AND, OR, and NOT relationships (setting both to true makes no logical sense). In query syntax they correspond to AND, OR, NOT, or to + and -. A BooleanQuery can hold many clauses, but if their number exceeds the limit set by setMaxClauseCount(int) (1024 by default), a TooManyClauses exception is thrown.

    Table 3: Combinations of the two parameters

| | required = false | required = true |
|---|---|---|
| prohibited = false | Clause is optional | Clause must match |
| prohibited = true | Clause must not match | Invalid |
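
    The table can be read as a matching rule. A minimal plain-Java sketch, assuming the semantics as we understand them: a document matches when every required clause matches, no prohibited clause matches, and, if there are no required clauses, at least one optional clause matches. The Clause class is a hypothetical stand-in recording the two flags plus whether that clause matched a given document; scoring is ignored.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of how BooleanQuery's required/prohibited flags decide a match. */
final class BooleanSketch {
  private BooleanSketch() {}

  /** Hypothetical stand-in for one clause evaluated against one document. */
  static final class Clause {
    final boolean required;
    final boolean prohibited;
    final boolean matches;  // did this clause's query match the document?

    Clause(boolean required, boolean prohibited, boolean matches) {
      if (required && prohibited) {
        throw new IllegalArgumentException("required and prohibited: invalid combination");
      }
      this.required = required;
      this.prohibited = prohibited;
      this.matches = matches;
    }
  }

  static boolean documentMatches(List<Clause> clauses) {
    boolean hasRequired = false;
    boolean anyOptionalMatch = false;
    for (Clause c : clauses) {
      if (c.prohibited && c.matches) {
        return false;                   // a NOT clause hit
      }
      if (c.required) {
        hasRequired = true;
        if (!c.matches) {
          return false;                 // an AND clause missed
        }
      } else if (!c.prohibited && c.matches) {
        anyOptionalMatch = true;        // an OR clause hit
      }
    }
    return hasRequired || anyOptionalMatch;
  }

  public static void main(String[] args) {
    // "+junit +ant -mock" against a document containing junit and ant but not mock
    System.out.println(documentMatches(Arrays.asList(
        new Clause(true, false, true),
        new Clause(true, false, true),
        new Clause(false, true, false))));  // true
  }
}
```

    Note the consequence of the last rule: a query made only of prohibited clauses matches nothing, because there is no required or optional clause to satisfy.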

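    PhraseQuery
    Matches a phrase, optionally loosely: for example, "quick fox" should be able to match "quick brown fox" or "quick brown high fox". For this, PhraseQuery provides setSlop(). During matching, Lucene may shift the terms' positions, and the slop limits how many positional moves are allowed; if the actual text can be turned into an exact match of the phrase within that many moves, the document matches. The default slop is 0, so by default only exact phrases match. Raising the slop enables loose matching: matching "quick fox" against "quick brown fox" takes a slop of 1 (moving fox back one position). Note that PhraseQuery does not require the terms to appear in query order: against the same text, "fox quick" needs a slop of 3, so with a slop of 3 or more "fox quick" matches as well (testReverse below confirms that 2 is not enough). For phrases of more than two terms, the slop bounds the total number of moves needed to bring all the terms into place. An example makes this clearer: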

```java
package lia.searching;

import junit.framework.TestCase;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.store.RAMDirectory;

import java.io.IOException;

public class PhraseQueryTest extends TestCase {
  private IndexSearcher searcher;

  protected void setUp() throws IOException {
    // set up a sample document in an in-memory index
    RAMDirectory directory = new RAMDirectory();
    IndexWriter writer = new IndexWriter(directory,
        new WhitespaceAnalyzer(), true);
    Document doc = new Document();
    doc.add(Field.Text("field",
              "the quick brown fox jumped over the lazy dog"));
    writer.addDocument(doc);
    writer.close();

    searcher = new IndexSearcher(directory);
  }

  protected void tearDown() throws IOException {
    searcher.close();
  }

  private boolean matched(String[] phrase, int slop)
      throws IOException {
    PhraseQuery query = new PhraseQuery();
    query.setSlop(slop);

    for (int i = 0; i < phrase.length; i++) {
      query.add(new Term("field", phrase[i]));
    }

    Hits hits = searcher.search(query);
    return hits.length() > 0;
  }

  public void testSlopComparison() throws Exception {
    String[] phrase = new String[] {"quick", "fox"};

    assertFalse("exact phrase not found", matched(phrase, 0));

    assertTrue("close enough", matched(phrase, 1));
  }

  public void testReverse() throws Exception {
    String[] phrase = new String[] {"fox", "quick"};

    assertFalse("hop flop", matched(phrase, 2));
    assertTrue("hop hop slop", matched(phrase, 3));
  }

  public void testMultiple() throws Exception {      // phrases of several terms
    assertFalse("not close enough",
        matched(new String[] {"quick", "jumped", "lazy"}, 3));

    assertTrue("just enough",
        matched(new String[] {"quick", "jumped", "lazy"}, 4));

    assertFalse("almost but not quite",
        matched(new String[] {"lazy", "jumped", "quick"}, 7));

    assertTrue("bingo",
        matched(new String[] {"lazy", "jumped", "quick"}, 8));
  }
}
```
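
    The thresholds in PhraseQueryTest (1, 3, 4 and 8) can be reproduced with a simple model: give each phrase term its offset in the phrase (0, 1, 2, ...), subtract that offset from the term's position in the text, and take the spread (max minus min) of those differences as the slop needed. This is a plain-Java sketch under our own simplifying assumption, which we believe is close to how Lucene's sloppy phrase matching measures distance, not Lucene's actual scorer. It tries every occurrence of each term and keeps the smallest spread; positions come from splitting on whitespace, as WhitespaceAnalyzer does in the test.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: minimum slop a phrase needs to match a whitespace-tokenized text. */
final class SlopSketch {
  private SlopSketch() {}

  /** Returns the smallest slop that matches, or -1 if some phrase term never occurs. */
  static int requiredSlop(String text, String[] phrase) {
    if (phrase.length == 0) {
      return 0;
    }
    String[] tokens = text.split("\\s+");  // positions 0, 1, 2, ...
    List<List<Integer>> positions = new ArrayList<>();
    for (String term : phrase) {
      List<Integer> ps = new ArrayList<>();
      for (int p = 0; p < tokens.length; p++) {
        if (tokens[p].equals(term)) {
          ps.add(p);
        }
      }
      if (ps.isEmpty()) {
        return -1;  // a missing term can't be fixed by any slop
      }
      positions.add(ps);
    }
    return search(positions, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
  }

  // Tries every occurrence of term i, tracking min/max of (position - offset in phrase).
  private static int search(List<List<Integer>> positions, int i, int min, int max) {
    if (i == positions.size()) {
      return max - min;  // spread of relative positions = moves needed
    }
    int best = Integer.MAX_VALUE;
    for (int p : positions.get(i)) {
      int rel = p - i;
      best = Math.min(best, search(positions, i + 1, Math.min(min, rel), Math.max(max, rel)));
    }
    return best;
  }

  static boolean matches(String text, String[] phrase, int slop) {
    int needed = requiredSlop(text, phrase);
    return needed >= 0 && needed <= slop;
  }

  public static void main(String[] args) {
    String text = "the quick brown fox jumped over the lazy dog";
    System.out.println(requiredSlop(text, new String[] {"quick", "fox"}));             // 1
    System.out.println(requiredSlop(text, new String[] {"fox", "quick"}));             // 3
    System.out.println(requiredSlop(text, new String[] {"quick", "jumped", "lazy"}));  // 4
    System.out.println(requiredSlop(text, new String[] {"lazy", "jumped", "quick"}));  // 8
  }
}
```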

    WildcardQuery
    Uses ? (exactly one character) and * (zero or more characters); for example, ?ild* matches wild, mild, wildcard, and so on. Note that the closeness of a wildcard match does not affect relevance: wildcard and mild are treated as equally good matches for ?ild*.
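
    A minimal plain-Java sketch of those wildcard semantics, translating ? to exactly one character and * to any run of characters, and matching against whole terms. This is our own illustration via java.util.regex, not how Lucene's wildcard term enumeration is implemented.

```java
import java.util.regex.Pattern;

/** Sketch: Lucene-style wildcard matching of a whole term via a regex. */
final class WildcardSketch {
  private WildcardSketch() {}

  static boolean matches(String wildcard, String term) {
    StringBuilder regex = new StringBuilder();
    for (char c : wildcard.toCharArray()) {
      if (c == '?') {
        regex.append('.');   // exactly one character
      } else if (c == '*') {
        regex.append(".*");  // zero or more characters
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));  // literal character
      }
    }
    return Pattern.matches(regex.toString(), term);  // must match the whole term
  }

  public static void main(String[] args) {
    for (String term : new String[] {"wild", "mild", "wildcard", "ild", "child"}) {
      System.out.println(term + " -> " + matches("?ild*", term));
    }
  }
}
```

    "ild" fails because ? must consume exactly one character, and "child" fails because only one character may precede "ild".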

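    FuzzyQuery
    Matches similar English words approximately; for example, fuzzy and wuzzy can be considered similar. It is reasonably useful for English tense changes and plural forms, and unlike a wildcard match, how close a term is affects its relevance. In query syntax it is written "fuzzy~". It is most useful when you've forgotten how a word is spelled: for example, if you search Google for liceue and nothing is found, Google asks whether you meant Lucene. This query is of little use for Chinese text, however.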
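
    FuzzyQuery is built on Levenshtein edit distance: the number of single-character insertions, deletions, and substitutions needed to turn one term into another. The sketch below computes that distance in plain Java, plus a normalized similarity of 1 - distance / (length of the shorter term) checked against a 0.5 threshold. The normalization and the threshold are assumptions modelled on our understanding of Lucene 1.4's FuzzyQuery defaults; check the FuzzyQuery Javadoc for your version.

```java
/** Sketch: Levenshtein edit distance and an assumed normalized similarity. */
final class FuzzySketch {
  private FuzzySketch() {}

  static int editDistance(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j;                                    // insert j characters
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;                                    // delete i characters
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev;                               // reuse the two rows
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  /** Assumed normalization: 1 - distance / length of the shorter term. */
  static float similarity(String a, String b) {
    int shorter = Math.min(a.length(), b.length());
    if (shorter == 0) {
      return a.equals(b) ? 1f : 0f;
    }
    return 1f - (float) editDistance(a, b) / shorter;
  }

  public static void main(String[] args) {
    float minSimilarity = 0.5f;  // assumed default threshold
    String[][] pairs = {{"fuzzy", "wuzzy"}, {"liceue", "lucene"}, {"fuzzy", "lucene"}};
    for (String[] p : pairs) {
      float s = similarity(p[0], p[1]);
      System.out.println(p[0] + " ~ " + p[1] + ": distance " + editDistance(p[0], p[1])
          + ", similarity " + s + ", match " + (s > minSimilarity));
    }
  }
}
```

    Because the score is graded rather than all-or-nothing, closer spellings rank higher, which is the difference from WildcardQuery noted above.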

    5. Parsing query expressions: QueryParser

    For a product meant for ordinary users, query expressions are a friendlier interface. Let's see how QueryParser handles them.

    Note: Whenever special characters are used in a query expression, you need to provide an escaping mechanism so that the special characters can be used in a normal fashion. QueryParser uses a backslash (\) to escape special characters within terms. The escapable characters are as follows: \ + - ! ( ) : ^ ] { } ~ * ? (In short: special characters must be escaped.)
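
    A minimal sketch of escaping user text before handing it to QueryParser, prefixing each character from the list above with a backslash. The helper name escapeQueryText is hypothetical (not a Lucene API); adjust the character set if your QueryParser version treats other characters as special.

```java
/** Sketch: backslash-escape QueryParser special characters in user input. */
final class QueryEscaper {
  private QueryEscaper() {}

  // The escapable characters listed in the note above.
  private static final String SPECIAL = "\\+-!():^]{}~*?";

  static String escapeQueryText(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (SPECIAL.indexOf(c) >= 0) {
        sb.append('\\');  // escape with a backslash
      }
      sb.append(c);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(escapeQueryText("c++ (beta)"));  // c\+\+ \(beta\)
  }
}
```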

    QueryParser turns the query conditions a user enters into a Query. Query's toString method prints the equivalent expression QueryParser produced, which is a good way to check whether QueryParser is behaving as you intend. Note: QueryParser uses an Analyzer, and some Analyzers drop stop words, so the toString of the parsed Query is not necessarily identical to the original string.

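    Boolean operators:

    Expressed with AND, OR, and NOT (or with + and -); easy to understand.

    Grouping:
    For example, "(a AND b) OR c" uses parentheses for grouping, which is also easy to understand.

    Field selection:
    QueryParser searches a default field, which is specified in code when QueryParser parses the expression. If users need to search a different field, they can use the syntax fieldname:a; for several terms in the same field, use fieldname:(a b c).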

    Range search:

    Use [begin TO end] (inclusive) or {begin TO end} (exclusive).

    注意: Nondate range queries use the beginning and ending terms as the user entered them, without modification. In other words, the beginning and ending terms are not analyzed. Start and end terms must not contain whitespace, or parsing fails. In our example index, the field pubmonth isn’t a date field; it’s text of the format YYYYMM.

    When handling dates, you can call QueryParser's setLocale method to set the locale and deal with internationalization (I18N).

    Phrase query:

    A string enclosed in double quotes creates a PhraseQuery. The text between the quotes is analyzed before the Query is built, so some stop words may be dropped. For example:

```java
public void testPhraseQuery() throws Exception {
  // "this" and "is" are stop words for StandardAnalyzer, so they are dropped
  Query q = QueryParser.parse("\"This is Some Phrase*\"",
      "field", new StandardAnalyzer());
  assertEquals("analyzed",
      "\"some phrase\"", q.toString("field"));  // no "this is" in the result

  // A quoted single term is reduced to a plain TermQuery
  q = QueryParser.parse("\"term\"", "field", new StandardAnalyzer());
  assertTrue("reduced to TermQuery", q instanceof TermQuery);
}
```

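    Wildcard search:
    Note that by default QueryParser does not allow * at the start of a term. The main purpose is to keep users from accidentally typing a leading * and causing serious performance problems, since a leading wildcard forces Lucene to enumerate every term in the field.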

    Fuzzy query:

    A term ending in ~ denotes a fuzzy query.

    Wildcard and fuzzy searches each have their own performance issues, which are discussed later.

    Boosting a query:

    A caret (^) followed by a floating-point value sets the boost for that term. For example, in junit^2.0 testing, the junit TermQuery gets a boost of 2.0, while the testing TermQuery keeps the default boost of 1.0. Try it and see whether Google search supports this feature. :)
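
    To make the ^ syntax concrete, here is a hypothetical plain-Java helper (not part of Lucene) that splits a simple term^boost token into its term and boost, defaulting to 1.0 when no ^ is present. It ignores quoting, escaping, and field prefixes, all of which the real QueryParser handles.

```java
/** Sketch: split a "term^boost" token into its term and boost (default 1.0). */
final class BoostSketch {
  private BoostSketch() {}

  /** Hypothetical value holder for a parsed token. */
  static final class BoostedTerm {
    final String term;
    final float boost;

    BoostedTerm(String term, float boost) {
      this.term = term;
      this.boost = boost;
    }

    @Override
    public String toString() {
      return term + " (boost " + boost + ")";
    }
  }

  static BoostedTerm parse(String token) {
    int caret = token.lastIndexOf('^');
    if (caret < 0) {
      return new BoostedTerm(token, 1.0f);  // no ^: default boost
    }
    return new BoostedTerm(token.substring(0, caret),
        Float.parseFloat(token.substring(caret + 1)));
  }

  public static void main(String[] args) {
    for (String token : "junit^2.0 testing".split(" ")) {
      System.out.println(parse(token));  // junit (boost 2.0), then testing (boost 1.0)
    }
  }
}
```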

    QueryParser is really handy, but it isn't always the right fit for your situation. Here is the authors' view:

    To QueryParse or not to QueryParse?

    QueryParser is a quick and effortless way to give users powerful query construction, but it isn’t right for all scenarios. QueryParser can’t create every type of query that can be constructed using the API. In chapter 5, we detail a handful of API-only queries that have no QueryParser expression capability. You must keep in mind all the possibilities available when exposing free-form query parsing to an end user; some queries have the potential for performance bottlenecks, and the syntax used by the built-in QueryParser may not be suitable for your needs. You can exert some limited control by subclassing QueryParser (see section 6.3.1).

    Should you require different expression syntax or capabilities beyond what QueryParser offers, technologies such as ANTLR and JavaCC are great options. We don’t discuss the creation of a custom query parser; however, the source code for Lucene’s QueryParser is freely available for you to borrow from.

    You can often obtain a happy medium by combining a QueryParser-parsed query with API-created queries as clauses in a BooleanQuery. This approach is demonstrated in section 5.5.4. For example, if users need to constrain searches to a particular category or narrow them to a date range, you can have the user interface separate those selections into a category chooser or separate date-range fields.

    OK, that wraps up chapter 3. You can now add basic search to your application. Congratulations!

    To sum up :)

    Lucene rapidly provides highly relevant search results to queries. Most applications need only a few Lucene classes and methods to enable searching. The most fundamental things for you to take from this chapter are an understanding of the basic query types (of which TermQuery, RangeQuery, and BooleanQuery are the primary ones) and how to access search results.

    Although it can be a bit daunting, Lucene’s scoring formula (coupled with the index format discussed in appendix B and the efficient algorithms) provides the magic of returning the most relevant documents first. Lucene’s QueryParser parses human-readable query expressions, giving rich full-text search power to end users. QueryParser immediately satisfies most application requirements; however, it doesn’t come without caveats, so be sure you understand the rough edges. Much of the confusion regarding QueryParser stems from unexpected analysis interactions; chapter 4 goes into great detail about analysis, including more on the QueryParser issues.

    And yes, there is more to searching than we’ve covered in this chapter, but understanding the groundwork is crucial. Chapter 5 delves into Lucene’s more elaborate features, such as constraining (or filtering) the search space of queries and sorting search results by field values; chapter 6 explores the numerous ways you can extend Lucene’s searching capabilities for custom sorting and query parsing.
    posted on 2007-01-05 10:11 by Lansing, filed under: Search engines