nativeFont vs. logicalFont: a major performance difference under JDK 1.4

zht — Mon, 02 Aug 2010 13:32:00 GMT

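Recently I ran into a strange problem. I wrote a GUI program and at first did not set a font; performance was acceptable. The default font looked ugly, though, so I switched to another font, and to my surprise, when testing on JDK 1.4 the program became dramatically slower than before.
I later found that on JDK 1.5 and above the two cases perform almost identically (see the attached screenshots).
In addition, JDK 1.6u10 and later bring big improvements to Swing performance and features,
so if circumstances allow, it is best to move to JDK 1.6u10 or later.

Test code: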

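import java.awt.BorderLayout;
import java.awt.Color;
import java.awt.Font;
import java.util.ArrayList;
import java.util.List;

import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.JScrollPane;

// Internal Sun class, not part of the public Java API.
import sun.java2d.SunGraphicsEnvironment;

// TWaver classes; package names assumed from the TWaver Java distribution.
import twaver.Node;
import twaver.TDataBox;
import twaver.TWaverConst;
import twaver.chart.BarChart;

public class FontDemo extends JPanel {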

    public static void main(String[] args) {
        JFrame f = new JFrame();
        f.setTitle("TWaver Chinese Community");
        f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        f.setContentPane(new FontDemo());
        f.setSize(800, 600);
        f.setLocationRelativeTo(null);
        f.setVisible(true);
    }

    private TDataBox box = new TDataBox();
    private BarChart chart = new BarChart(box);

    private static final int times = 1000;
    private static final int style = Font.BOLD;
    private static final int size = 16;

    public FontDemo() {
        initBox();
        initChart();
        initGUI();
    }

    private void initGUI() {
        this.setLayout(new BorderLayout());

        JScrollPane pane = new JScrollPane(chart.getLegendPane());

        this.add(chart, BorderLayout.CENTER);
        this.add(pane, BorderLayout.EAST);
    }

    private void initBox() {
        final List localFonts = new ArrayList();
        List nativeFonts = new ArrayList();

        // get all available fontFamily names
        Font[] fonts = SunGraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        for (int i = 0; i < fonts.length; i++) {
            // separate logical and native font
            if (SunGraphicsEnvironment.isLogicalFont(fonts[i])) {
                localFonts.add(fonts[i]);
            }
            else {
                nativeFonts.add(fonts[i]);
            }
        }

        System.out.println("///////////// localFonts test /////////////");
        for (int i = 0; i < localFonts.size(); i++) {
            Font font = (Font) localFonts.get(i);
            long start = System.currentTimeMillis();
            for (int k = 0; k < times; k++) {
                createFont(font);
            }
            long spendTime = System.currentTimeMillis() - start;
            Node n = new Node();
            n.setName(font.getName());
            n.putChartValue(spendTime);
            n.putChartColor(Color.GREEN);
            box.addElement(n);
            //            System.out.println(">" + spendTime + "\t" + font.getName());
        }
        System.out.println("\n///////////// nativeFonts test /////////////");
        for (int i = 0; i < nativeFonts.size(); i++) {
            Font font = (Font) nativeFonts.get(i);
            long start = System.currentTimeMillis();
            for (int k = 0; k < times; k++) {
                createFont(font);
            }
            long spendTime = System.currentTimeMillis() - start;
            Node n = new Node();
            n.setName(font.getName());
            n.putChartValue(spendTime);
            n.putChartColor(Color.RED);
            box.addElement(n);
            //            System.out.println(">" + spendTime + "\t" + font.getName());
        }
        System.out.println("\n$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$\n");
    }

    private void createFont(Font font) {
        //     font.deriveFont(style, size);
        new Font(font.getName(), style, size);
    }

    private void initChart() {
        chart.setLegendLayout(TWaverConst.LEGEND_LAYOUT_VERTICAL);
        chart.setLegendOrientation(TWaverConst.LABEL_ORIENTATION_HORIZONTAL);
        chart.setYScaleTextVisible(true);
        chart.setShadowOffset(1);
    }
}
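The test above relies on sun.java2d.SunGraphicsEnvironment, an internal class that is not available on every JVM. As a hedged alternative, the sketch below separates logical from physical fonts by family name only, using the five logical font families the Java platform defines. The class name LogicalFonts and its method are hypothetical helpers, not part of the original test; pass it the result of Font.getFamily().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Classifies font family names without touching sun.* internals (illustrative helper). */
final class LogicalFonts {
    // The five logical font families defined by the Java platform.
    private static final Set<String> LOGICAL = new HashSet<String>(Arrays.asList(
            "dialog", "dialoginput", "monospaced", "serif", "sansserif"));

    private LogicalFonts() {}

    /** Returns true if the family name is one of the Java logical fonts. */
    static boolean isLogical(String family) {
        return family != null && LOGICAL.contains(family.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String[] samples = { "Dialog", "SansSerif", "Arial", "SimSun" };
        for (String name : samples) {
            System.out.println(name + " -> " + (isLogical(name) ? "logical" : "physical"));
        }
    }
}
```

In the benchmark, the check SunGraphicsEnvironment.isLogicalFont(fonts[i]) could then be swapped for LogicalFonts.isLogical(fonts[i].getFamily()), assuming a name-based classification is good enough for the comparison.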

Test results on JDK 1.4

Test results on JDK 1.6

Posted by zht, 2010-08-02 21:32

How to read JVM memory usage in Java

zht — Fri, 18 Apr 2008 06:10:00 GMT

Runtime runtime = Runtime.getRuntime();
long total = runtime.totalMemory();

long free = runtime.freeMemory();
System.out.println(total+"-"+free);

totalMemory()
    /**
     * Returns the total amount of memory in the Java virtual machine.
     * The value returned by this method may vary over time, depending on
     * the host environment.
     *
     * Note that the amount of memory required to hold an object of any
     * given type may be implementation-dependent.
     *
     * @return the total amount of memory currently available for current
     *          and future objects, measured in bytes.
     */
freeMemory()
    /**
     * Returns the amount of free memory in the Java Virtual Machine.
     * Calling the gc method may result in increasing the value returned
     * by freeMemory.
     *
     * @return an approximation to the total amount of memory currently
     *          available for future allocated objects, measured in bytes.
     */

Using these methods, you can write a panel similar to the Windows Task Manager.
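As a starting point for such a panel, the following minimal sketch takes periodic readings with the Runtime methods shown above. The class MemorySnapshot and its field names are hypothetical; only totalMemory(), freeMemory(), and maxMemory() come from java.lang.Runtime.

```java
/** A point-in-time reading of JVM heap usage (illustrative helper class). */
final class MemorySnapshot {
    final long total; // bytes currently reserved by the JVM
    final long free;  // bytes within 'total' that are currently unused
    final long max;   // upper limit the JVM will attempt to use

    MemorySnapshot(long total, long free, long max) {
        this.total = total;
        this.free = free;
        this.max = max;
    }

    /** Reads the current values from the running JVM. */
    static MemorySnapshot take() {
        Runtime rt = Runtime.getRuntime();
        return new MemorySnapshot(rt.totalMemory(), rt.freeMemory(), rt.maxMemory());
    }

    /** Bytes in use: reserved minus free. */
    long used() {
        return total - free;
    }

    /** Used memory as a whole-number percentage of the reserved memory. */
    int usedPercent() {
        return total == 0 ? 0 : (int) (used() * 100 / total);
    }

    public static void main(String[] args) throws InterruptedException {
        // Print three readings, one second apart, the way a monitor panel would poll.
        for (int i = 0; i < 3; i++) {
            MemorySnapshot s = take();
            System.out.println("used=" + s.used() / 1024 + "KB"
                    + " total=" + s.total / 1024 + "KB"
                    + " (" + s.usedPercent() + "%)");
            Thread.sleep(1000);
        }
    }
}
```

A Swing panel could call MemorySnapshot.take() from a javax.swing.Timer and repaint a bar or line chart with the result.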

Posted by zht, 2008-04-18 14:10

Reflect&Proxy

zht — Thu, 03 Apr 2008 02:57:00 GMT

Reflection and dynamic proxies are two facilities provided by Java.
The examples below show how to use them.

1. Reflect
With reflection, an instance of a class can be created without calling 'new' directly,
as follows:
(1) Creating the class instance:
Class clazz = Class.forName("twaver.Node");//Node.class;
Constructor cs = clazz.getConstructor(new Class[] { Object.class });
Object object = cs.newInstance(new Object[] { "ID-679" });
getConstructor takes parameterTypes, the array of the constructor's parameter types.
Passing new Class[] { Object.class }
selects the constructor
that takes a single Object parameter.
cs.newInstance then creates the instance, passing the arguments new Object[] { "ID-679" }.
(2) Getting and invoking a method:
Method getIDMethod = clazz.getMethod("getID", new Class[] {});
Method setNameMethod = clazz.getMethod("setName", new Class[] { String.class });
getMethod takes name, the name of the method,
and parameterTypes, the array of the method's parameter types.
setNameMethod.invoke(object, new Object[] { "todd.zhang" });
This invokes setName on the object instance with the arguments new Object[] { "todd.zhang" }.

import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

import twaver.Node;

public class ReflectTest {
    public static void main(String[] args) throws Exception {
        Node node = new Node("ID-679");
        node.setName("todd.zhang");
        System.out.println(node.getID());
        System.out.println(node.getName());

        System.out.println("-----------------------");

        Class clazz = Class.forName("twaver.Node");//Node.class;

        Constructor cs = clazz.getConstructor(new Class[] { Object.class });
        Method getIDMethod = clazz.getMethod("getID", new Class[] {});
        Method setNameMethod = clazz.getMethod("setName", new Class[] { String.class });
        Method getNameMethod = clazz.getMethod("getName", new Class[] {});

        Object object = cs.newInstance(new Object[] { "ID-679" });
        setNameMethod.invoke(object, new Object[] { "todd.zhang" });
        System.out.println(getIDMethod.invoke(object, new Object[] {}));
        System.out.println(getNameMethod.invoke(object, new Object[] {}));
    }
}
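For readers without the TWaver library on the classpath, here is a minimal sketch of the same three steps (look up the class, pick a constructor, invoke methods) against java.lang.StringBuilder from the JDK. The class ReflectSketch and its build method are hypothetical names used only for illustration.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

/** Reflection walk-through using only JDK classes (illustrative). */
final class ReflectSketch {
    private ReflectSketch() {}

    /** Builds "id:name" purely through reflective calls on StringBuilder. */
    static String build(String id, String name) {
        try {
            // Step 1: look the class up by name instead of referring to it directly.
            Class<?> clazz = Class.forName("java.lang.StringBuilder");
            // Step 2: select the constructor that takes one String, then instantiate.
            Constructor<?> ctor = clazz.getConstructor(String.class);
            Object builder = ctor.newInstance(id);
            // Step 3: look up methods by name and parameter types, then invoke them.
            Method append = clazz.getMethod("append", String.class);
            append.invoke(builder, ":");
            append.invoke(builder, name);
            Method toStringMethod = clazz.getMethod("toString");
            return (String) toStringMethod.invoke(builder);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflection failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("ID-679", "todd.zhang"));
    }
}
```

Catching ReflectiveOperationException in one place keeps the example short; real code would usually distinguish a missing class from a failed invocation.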

2. Proxy
A dynamic proxy is an application of reflection.
ClassLoader classLoader = ProxyAnything.class.getClassLoader();
Class[] interfaces = new Class[] { Interface_A.class, Interface_B.class };
// the interfaces that the proxy will implement
InvocationHandler handler = new ProxyAnything();
// the handler behind the proxy: every call to a method of Interface_A or Interface_B is intercepted by it
// each call goes to the handler first, and the handler decides how to deal with it
Proxy proxy = (Proxy) Proxy.newProxyInstance(classLoader, interfaces, handler);
// the proxy acts as an implementation of the interfaces, so it can be cast to either of them
proxy instanceof Interface_A will be true

Interface_A a = (Interface_A) proxy;
Interface_B b = (Interface_B) proxy;
a.do_A1();
b.do_B2();

One InvocationHandler may serve several proxy instances. In
public Object invoke(Object proxy, Method m, Object[] args) throws Throwable
the proxy argument tells the handler which proxy instance the call came through.
The proxy itself is only a shell; every operation is handled by the handler.

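import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

interface Interface_A {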
    public void do_A1();
    public void do_A2();
    public void do_A3();
}

interface Interface_B {
    public void do_B1();
    public void do_B2();
    public void do_B3();
}

public class ProxyAnything implements InvocationHandler {

    private Interface_A businessA;
    private Interface_B businessB;

    public ProxyAnything() {
        this.businessA = new Interface_A() {
            public void do_A1() {
                System.out.println("doing A1");
            }

            public void do_A2() {
                System.out.println("doing A2");
            }

            public void do_A3() {
                System.out.println("doing A3");
            }
        };
        this.businessB = new Interface_B() {
            public void do_B1() {
                System.out.println("doing B1");
            }

            public void do_B2() {
                System.out.println("doing B2");
            }

            public void do_B3() {
                System.out.println("doing B3");
            }
        };
    }

    public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
        if (m.getDeclaringClass() == Interface_A.class) {
            if (m.getName().equals("do_A3")) {
                System.out.println("you can not invoke do_A3");
            } else {
                return m.invoke(this.businessA, args);
            }
        }
        if (m.getDeclaringClass() == Interface_B.class) {
            System.out.println(m.getName() + " is called.");
            return m.invoke(this.businessB, args);
        }

        return null;
    }

    public static void main(String[] args) throws Exception {
        ClassLoader classLoader = ProxyAnything.class.getClassLoader();
        Class[] interfaces = new Class[] { Interface_A.class, Interface_B.class };
        InvocationHandler handler = new ProxyAnything();
        Proxy proxy = (Proxy) Proxy.newProxyInstance(classLoader, interfaces, handler);

        if (proxy instanceof Interface_A) {
            System.out.println("proxy instanceof Interface_A");
        }
        if (proxy instanceof Interface_B) {
            System.out.println("proxy instanceof Interface_B");
        }

        Interface_A a = (Interface_A) proxy;
        Interface_B b = (Interface_B) proxy;

        a.do_A1();
        a.do_A2();
        a.do_A3();
        b.do_B1();
        b.do_B2();
        b.do_B3();

    }

}
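A common use of the same mechanism is to wrap an existing object and observe its calls. The sketch below is one possible shape for that, assuming a hypothetical Greeter interface and a RecordingHandler class; neither appears in the example above. It also unwraps InvocationTargetException so that callers see the target's original exception.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical business interface used only for this sketch. */
interface Greeter {
    String greet(String name);
}

/** Forwards every call to a real target and records the method names. */
final class RecordingHandler implements InvocationHandler {
    private final Object target;
    private final List<String> calls = new ArrayList<String>();

    RecordingHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        calls.add(method.getName());
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            // Rethrow what the target itself threw, not the reflective wrapper.
            throw e.getCause();
        }
    }

    /** Names of the methods invoked through the proxy so far. */
    List<String> calls() {
        return calls;
    }

    /** Creates a proxy that implements Greeter and routes calls to this handler. */
    Greeter asGreeter() {
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[] { Greeter.class }, this);
    }

    public static void main(String[] args) {
        RecordingHandler handler = new RecordingHandler(new Greeter() {
            public String greet(String name) {
                return "hello " + name;
            }
        });
        Greeter greeter = handler.asGreeter();
        System.out.println(greeter.greet("todd"));
        System.out.println("calls: " + handler.calls());
    }
}
```

Because the handler only knows the target as an Object, the same class could wrap any interface implementation, which is the point the ProxyAnything example makes with two interfaces.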

Posted by zht, 2008-04-03 10:57

Learning Lucene

zht — Tue, 30 Oct 2007 02:02:00 GMT

(Reposted)

This article first introduces some basic Lucene concepts, then builds a small application that shows how to create an index with Lucene and search it.

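Introduction to Lucene

Lucene is a Java-based full-text information retrieval toolkit. It is not a complete search application; rather, it provides indexing and search capabilities for your own application. Lucene is currently an open source project in the Apache Jakarta family, and it is the most popular open source full-text retrieval toolkit for Java.

Many applications already build their search on Lucene, for example the search function of the Eclipse help system. Lucene indexes textual data, so as long as you can convert the data you want to index into text, Lucene can index and search it. To index HTML or PDF documents, for example, you first convert them to plain text, then hand the converted content to Lucene for indexing, then save the resulting index files to disk or keep them in memory, and finally run the user's queries against the index. Because Lucene does not prescribe the format of the documents to be indexed, it fits almost any search application.

Figure 1 shows the relationship between a search application and Lucene, and also reflects the workflow of building a search application with Lucene:

Figure 1. The relationship between a search application and Lucene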

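Indexing and searching

Indexing is the core of a modern search engine. Building an index means processing the source data into index files that are very convenient to query. Why is the index so important? Suppose you need to find the documents that contain a certain keyword in a large collection. Without an index, you would have to read the documents into memory one after another and check each for the keyword, which takes a very long time, whereas a search engine returns its results within milliseconds. The index is what makes this possible. You can think of an index as a data structure that gives you fast random access to the keywords stored in it and, through them, to the documents associated with each keyword. Lucene uses a mechanism called an inverted index. An inverted index maintains a table of words or phrases, and for every word or phrase in the table there is a list describing which documents contain it. When the user enters a query, the results can therefore be found very quickly. Part two of this series describes Lucene's indexing mechanism in detail. Because Lucene offers a simple, easy-to-use API, even readers who are new to full-text indexing can easily index their documents with it.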
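The inverted index described above can be sketched in a few lines of plain Java. This is a toy model under simple assumptions (lowercasing and splitting on non-word characters), not how Lucene stores its index; the class name TinyInvertedIndex is hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** A toy inverted index: each word maps to the ids of the documents containing it. */
final class TinyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings =
            new HashMap<String, SortedSet<Integer>>();

    /** Tokenizes the text and records docId under every word it contains. */
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split yields an empty first token when text starts with punctuation
            }
            SortedSet<Integer> docs = postings.get(token);
            if (docs == null) {
                docs = new TreeSet<Integer>();
                postings.put(token, docs);
            }
            docs.add(docId);
        }
    }

    /** Returns the ids of documents containing the word; empty if there are none. */
    SortedSet<Integer> search(String word) {
        SortedSet<Integer> docs = postings.get(word.toLowerCase(Locale.ROOT));
        return docs == null ? new TreeSet<Integer>() : docs;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(0, "Lucene is a search library");
        index.add(1, "Eclipse help uses Lucene");
        index.add(2, "Swing fonts");
        System.out.println("lucene -> " + index.search("lucene"));
    }
}
```

A lookup is a single map access regardless of how many documents were added, which is the property the paragraph above contrasts with scanning every document.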

�Ҏ��档徏立好索引后，��可以在�q�些索引上面�q�行搜烦了。搜索引擎首先会�Ҏ��索的关键词进行解析，然后再在建立好的索引上面�q�行查找�Q�最�l�返回和用户输入的关键词相关联的文档�?/span>

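Lucene package analysis

Lucene is distributed as a single JAR file. Below we look at the main Java packages inside this JAR to give readers a first overview.

Package: org.apache.lucene.document

This package provides the classes needed to wrap the documents to be indexed, such as Document and Field. Every document is ultimately wrapped in a Document object.

Package: org.apache.lucene.analysis

The main job of this package is to tokenize documents. Because a document must be tokenized before it can be indexed, this package can be seen as preparation for indexing.

Package: org.apache.lucene.index

This package provides classes that help create an index and update an existing one. Its two fundamental classes are IndexWriter and IndexReader: IndexWriter creates an index and adds documents to it, and IndexReader is used to delete documents from an index.

Package: org.apache.lucene.search

This package provides the classes needed to search an existing index, such as IndexSearcher and Hits. IndexSearcher defines the methods for searching a given index, and Hits holds the search results.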

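A simple search application

Suppose a directory on our computer contains many text documents and we need to find which of them contain a certain keyword. To do this, we first use Lucene to index the documents in that directory and then search the index for the documents we want. This example should give readers a clear idea of how to build their own search application with Lucene.

Building the index

To index documents, Lucene provides five fundamental classes: Document, Field, IndexWriter, Analyzer, and Directory. Their purposes are described below: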

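Document

A Document describes a document, which here may be an HTML page, an e-mail, or a text file. A Document object consists of multiple Field objects. You can think of a Document as a record in a database and each Field as one field of that record.

Field

A Field object describes one attribute of a document. For example, the subject and body of an e-mail can be described by two separate Field objects.

Analyzer

Before a document is indexed, its content must be tokenized, and this is the Analyzer's job. Analyzer is an abstract class with several implementations; choose the one that suits the language and the application. The Analyzer passes the tokenized content to the IndexWriter, which builds the index.

IndexWriter

IndexWriter is the core class Lucene uses to create an index. Its job is to add Document objects to the index one by one.

Directory

This class represents the storage location of a Lucene index. It is an abstract class that currently has two implementations: FSDirectory, which represents an index stored in the file system, and RAMDirectory, which represents an index stored in memory.

Now that we are familiar with the classes needed for indexing, we can index the text files in a directory. Listing 1 gives the source code for indexing the text files in a directory.

Listing 1. Indexing text files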

package TestLucene;

import java.io.File;
import java.io.FileReader;
import java.io.Reader;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

/**
 * This class demonstrates the process of creating an index with Lucene
 * for text files.
 */
public class TxtFileIndexer {
    public static void main(String[] args) throws Exception {
        // indexDir is the directory that hosts Lucene's index files
        File indexDir = new File("D:\\luceneIndex");
        // dataDir is the directory that hosts the text files to be indexed
        File dataDir = new File("D:\\luceneData");
        Analyzer luceneAnalyzer = new StandardAnalyzer();
        File[] dataFiles = dataDir.listFiles();
        // true: create a new index instead of opening an existing one
        IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer, true);
        long startTime = new Date().getTime();
        for (int i = 0; i < dataFiles.length; i++) {
            if (dataFiles[i].isFile() && dataFiles[i].getName().endsWith(".txt")) {
                System.out.println("Indexing file " + dataFiles[i].getCanonicalPath());
                Document document = new Document();
                Reader txtReader = new FileReader(dataFiles[i]);
                // Store the path so it can be printed for each search hit
                document.add(new Field("path", dataFiles[i].getCanonicalPath(),
                        Field.Store.YES, Field.Index.TOKENIZED));
                // The contents are tokenized and indexed from the reader
                document.add(new Field("contents", txtReader));
                indexWriter.addDocument(document);
            }
        }
        indexWriter.optimize();
        indexWriter.close();
        long endTime = new Date().getTime();
        System.out.println("It takes " + (endTime - startTime)
                + " milliseconds to create index for the files in directory "
                + dataDir.getPath());
    }
}

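In Listing 1, note that the IndexWriter constructor takes three arguments. The first specifies where the index is stored; it can be a File object, an FSDirectory object, or a RAMDirectory object. The second is an implementation of the Analyzer class, that is, the tokenizer used to tokenize document content for this index. The third is a boolean: true means create a new index, false means operate on an existing index. The program then iterates over all the text files in the directory and creates a Document object for each one. It puts two attributes of each text file, its path and its contents, into two Field objects, adds those Field objects to the Document, and finally adds the Document to the index with IndexWriter's addDocument method. That completes index creation. Next we turn to searching the index we have built.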

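Searching documents

Searching with Lucene is just as convenient as indexing. In the previous section we built an index for the text documents in a directory; now we search that index for documents containing a keyword or phrase. Lucene provides several fundamental classes for this: IndexSearcher, Term, Query, TermQuery, and Hits. Their functions are described below.

Query

This is an abstract class with several implementations, such as TermQuery, BooleanQuery, and PrefixQuery. Its purpose is to wrap the user's query string in a Query that Lucene can understand.

Term

A Term is the basic unit of search. A Term object consists of two String fields and can be created with a single statement: Term term = new Term("fieldName", "queryWord"); The first argument names the Field of the document to search in, and the second is the keyword to search for.

TermQuery

TermQuery is a subclass of the abstract class Query and the most basic query type Lucene supports. A TermQuery object is created as follows: TermQuery termQuery = new TermQuery(new Term("fieldName", "queryWord")); Its constructor takes a single argument, a Term object.

IndexSearcher

IndexSearcher is used to search an existing index. It opens the index read-only, so multiple IndexSearcher instances can operate on the same index.

Hits

Hits holds the search results.

With these classes introduced, we can search the index built earlier. Listing 2 gives the code that performs the search.

Listing 2. Searching an existing index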

package TestLucene;

import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.FSDirectory;

/**
 * This class demonstrates the process of searching
 * an existing Lucene index.
 */
public class TxtFileSearcher {
    public static void main(String[] args) throws Exception {
        String queryStr = "lucene";
        // This is the directory that hosts the Lucene index
        File indexDir = new File("D:\\luceneIndex");
        // Check that the index exists before trying to open it
        if (!indexDir.exists()) {
            System.out.println("The Lucene index does not exist");
            return;
        }
        FSDirectory directory = FSDirectory.getDirectory(indexDir, false);
        IndexSearcher searcher = new IndexSearcher(directory);
        Term term = new Term("contents", queryStr.toLowerCase());
        TermQuery luceneQuery = new TermQuery(term);
        Hits hits = searcher.search(luceneQuery);
        for (int i = 0; i < hits.length(); i++) {
            Document document = hits.doc(i);
            System.out.println("File: " + document.get("path"));
        }
        searcher.close();
    }
}

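In Listing 2, the IndexSearcher constructor takes an object of type Directory. Directory is an abstract class that currently has two subclasses, FSDirectory and RAMDirectory. Our program passes in an FSDirectory object, which represents an index stored on disk. Once the constructor returns, the IndexSearcher has opened the index read-only. The program then constructs a Term object that specifies searching the document contents for the keyword "lucene". It builds a TermQuery from the Term, passes the TermQuery to IndexSearcher's search method, and stores the returned results in a Hits object. Finally, a loop prints the paths of all matching documents. That completes our search application; building a search application with Lucene turns out to be quite simple.

Summary

This article first introduced some basic Lucene concepts, then built a small application showing how to create an index with Lucene and search it. We hope it helps readers who are learning Lucene.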

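Architecture overview

Figure 1 shows the architecture of Lucene's indexing mechanism. Different parsers are used for different types of documents. For an HTML document, for example, the HTML parser does some preprocessing such as stripping the HTML tags. The parser outputs text content, from which Lucene's Analyzer extracts the index terms and related information such as term frequency. Lucene's IndexWriter then writes this information to the index files.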
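To make the two stages concrete, here is a rough sketch of the pipeline in plain Java: a naive tag stripper stands in for the HTML parser, and a word counter stands in for the analyzer. Both are simplifying assumptions (a regular expression is not a real HTML parser, and Lucene's analyzers do much more); the class TextPipeline is hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy version of the parse-then-analyze pipeline (illustrative only). */
final class TextPipeline {
    private TextPipeline() {}

    /** Stage 1: replace anything that looks like an HTML tag with a space. */
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    /** Stage 2: lowercase, split into words, and count occurrences of each term. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> frequencies = new TreeMap<String, Integer>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // leading separators produce one empty token
            }
            Integer old = frequencies.get(token);
            frequencies.put(token, old == null ? 1 : old + 1);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Lucene</h1><p>Lucene indexes text.</p></body></html>";
        System.out.println(termFrequencies(stripTags(html)));
    }
}
```

The term-to-frequency map is the kind of information the paragraph above says is written to the index files.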

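Figure 1. Architecture of Lucene's indexing mechanism

Indexing documents with Lucene

Next I will show, step by step, how to create an index for your documents with Lucene. As long as you can convert the files you want to index to text, Lucene can index them. For example, to index HTML or PDF documents, you first extract the text from them and then hand that text to Lucene for indexing. The following example shows how to index files with the .txt extension.

1. Prepare the text files

First put some text files with the .txt extension in a directory; on Windows, for example, you could put them under C:\files_to_index.

2. Create the index

Listing 1 shows the code that creates an index for the documents we prepared.

Listing 1. Indexing your documents with Lucene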

package lucene.index;

import java.io.File;
import java.io.FileReader;
import java.io.Reader;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

/**
 * This class demonstrates the process of creating an index with Lucene
 * for text files in a directory.
 */
public class TextFileIndexer {
    public static void main(String[] args) throws Exception {
        // fileDir is the directory that contains the text files to be indexed
        File fileDir = new File("C:\\files_to_index");
        // indexDir is the directory that hosts Lucene's index files
        File indexDir = new File("C:\\luceneIndex");
        Analyzer luceneAnalyzer = new StandardAnalyzer();
        IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer, true);
        File[] textFiles = fileDir.listFiles();
        long startTime = new Date().getTime();
        // Add documents to the index
        for (int i = 0; i < textFiles.length; i++) {
            if (textFiles[i].isFile() && textFiles[i].getName().endsWith(".txt")) {
                System.out.println("File " + textFiles[i].getCanonicalPath()
                        + " is being indexed");
                Reader textReader = new FileReader(textFiles[i]);
                Document document = new Document();
                document.add(Field.Text("content", textReader));
                document.add(Field.Text("path", textFiles[i].getPath()));
                indexWriter.addDocument(document);
            }
        }
        indexWriter.optimize();
        indexWriter.close();
        long endTime = new Date().getTime();
        System.out.println("It took " + (endTime - startTime)
                + " milliseconds to create an index for the files in the directory "
                + fileDir.getPath());
    }
}

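As Listing 1 shows, creating an index for documents with Lucene is very convenient. Let us go through the key statements in Listing 1, starting with this one:

Analyzer luceneAnalyzer = new StandardAnalyzer();

This statement creates an instance of StandardAnalyzer, a class that extracts index terms from text. It is just one implementation of the abstract class Analyzer; there are other subclasses, such as SimpleAnalyzer.

Now look at another statement:

IndexWriter indexWriter = new IndexWriter(indexDir,luceneAnalyzer,true);

This statement creates an instance of IndexWriter, another key class in Lucene's indexing mechanism. It can create a new index, or open an existing index and add documents to it. Its constructor takes three arguments: the first specifies the path where the index files are stored, the second specifies which analyzer to use during indexing, and the last is a boolean, where true means create a new index and false means open an existing one.

The following code shows how to add a document to the index.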

Document document = new Document();
document.add(Field.Text("content",textReader));
document.add(Field.Text("path",textFiles[i].getPath()));
indexWriter.addDocument(document);

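The first line creates an instance of Document, which consists of one or more fields (Field). You can think of this class as representing an actual document, such as an HTML page, a PDF document, or a text file. The fields of a Document are typically attributes of the actual document; for an HTML page they might include the title, the content, and the URL. Different types of Field control which parts of a document are indexed and which are stored. For more information about Lucene fields, see the Lucene documentation. The second and third lines add two fields to the document. Each field has two attributes, a name and a value. In our example the fields are named "content" and "path" and hold the content and the path of the text file being indexed. The last line adds the prepared document to the index.

After adding documents to the index, do not forget to close it, which ensures that Lucene writes the added documents to disk. The following statement closes the index:

indexWriter.close();

With the code in Listing 1 you can add text documents to an index. Next we look at another important operation on an index: deleting documents from it.

Deleting documents from an index

The IndexReader class is responsible for deleting documents from an existing index, as shown in Listing 2.

Listing 2. Deleting documents from an index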

File indexDir = new File("C:\\luceneIndex");
IndexReader ir = IndexReader.open(indexDir);
ir.delete(1);
ir.delete(new Term("path", "C:\\file_to_index\\lucene.txt"));
ir.close();

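In Listing 2, the second line uses the static method IndexReader.open(indexDir) to obtain an IndexReader instance; the argument specifies the path where the index is stored. IndexReader offers two ways to delete a document, shown on the third and fourth lines. The third line deletes a document by its number; every document has a number generated automatically by the system. The fourth line deletes the document whose path is "C:\file_to_index\lucene.txt", so you can conveniently delete a document by specifying its file path. Note that although this makes the document unsearchable, it is not physically removed: Lucene only records which documents have been deleted in a file with the .del extension. Because nothing is physically removed, documents marked as deleted are easy to restore. As Listing 3 shows, you open the index and call ir.undeleteAll() to restore them.

Listing 3. Restoring deleted documents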

File indexDir = new File("C:\\luceneIndex");
IndexReader ir = IndexReader.open(indexDir);
ir.undeleteAll();
ir.close();

You may now be wondering how to physically delete documents from an index. This is also simple; Listing 4 shows how.

Listing 4. Physically deleting documents

File indexDir = new File("C:\\luceneIndex");
Analyzer luceneAnalyzer = new StandardAnalyzer();
IndexWriter indexWriter = new IndexWriter(indexDir,luceneAnalyzer,false);
indexWriter.optimize();
indexWriter.close();

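In Listing 4, the third line creates an IndexWriter instance and opens an existing index. The fourth line optimizes the index, and during optimization all documents marked as deleted are physically removed.
Lucene does not provide a direct way to update a document. To update one, first delete it from the index and then add the new version to the index.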
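The mark-then-purge behavior described above can be modeled with a bit set of deletion marks. The sketch below is a toy model of the idea, not Lucene's implementation; the class TombstoneStore and its method names are hypothetical, chosen to echo delete, undeleteAll, and optimize.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy document store with logical deletion and a compacting optimize step. */
final class TombstoneStore {
    private final List<String> docs = new ArrayList<String>();
    private final BitSet deleted = new BitSet();

    /** Adds a document and returns its automatically assigned number. */
    int add(String doc) {
        docs.add(doc);
        return docs.size() - 1;
    }

    /** Marks a document as deleted; its content stays in place. */
    void delete(int docId) {
        if (docId < 0 || docId >= docs.size()) {
            throw new IndexOutOfBoundsException("no such document: " + docId);
        }
        deleted.set(docId);
    }

    /** Clears all deletion marks, restoring every document not yet purged. */
    void undeleteAll() {
        deleted.clear();
    }

    /** Number of live (not deleted) documents. */
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }

    /** Number of stored documents, including those marked as deleted. */
    int maxDoc() {
        return docs.size();
    }

    /** Physically removes marked documents; surviving documents are renumbered. */
    void optimize() {
        List<String> kept = new ArrayList<String>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                kept.add(docs.get(i));
            }
        }
        docs.clear();
        docs.addAll(kept);
        deleted.clear();
    }

    public static void main(String[] args) {
        TombstoneStore store = new TombstoneStore();
        store.add("a.txt");
        int b = store.add("b.txt");
        store.delete(b);
        System.out.println("live=" + store.numDocs() + " stored=" + store.maxDoc());
        store.optimize();
        System.out.println("live=" + store.numDocs() + " stored=" + store.maxDoc());
    }
}
```

In this model an update is simply delete(oldId) followed by add(newVersion), matching the delete-then-add approach the text recommends.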

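Improving indexing performance

With Lucene you can make full use of the machine's hardware resources to speed up indexing. When you need to index a large number of files, you will notice that the bottleneck is writing the index files to disk. To address this, Lucene keeps a buffer in memory. How do we control that buffer? Fortunately, IndexWriter provides three parameters that adjust the size of the buffer and how often index files are written to disk.
1. Merge factor (mergeFactor)
This parameter determines how many documents can be stored in one Lucene index segment and how often segments on disk are merged into a larger segment. For example, with a merge factor of 10, once 10 documents are held in memory they must all be written to a new segment on disk, and once there are 10 segments on disk they are merged into one new segment. The default value is 10, which is unsuitable when the number of documents to index is very large. For batch indexing, a larger value gives better indexing performance.
2. Minimum merge documents (minMergeDocs)

This parameter also affects indexing performance. It determines how many documents must accumulate in memory before they are written to disk. The default value is 10; if you have enough memory, setting it as high as practical significantly improves indexing performance.
3. Maximum merge documents (maxMergeDocs)

This parameter determines the maximum number of documents in one segment. Its default value is Integer.MAX_VALUE. A large value improves indexing efficiency and search speed; since the default is already the largest integer, there is generally no need to change it.
Listing 5 shows how the three parameters are used. It is very similar to Listing 1, except that it sets the three parameters just described.

Listing 5. Improving indexing performance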

/**
 * This class demonstrates how to improve the indexing performance
 * by adjusting the parameters provided by IndexWriter.
 */
public class AdvancedTextFileIndexer {
    public static void main(String[] args) throws Exception {
        // fileDir is the directory that contains the text files to be indexed
        File fileDir = new File("C:\\files_to_index");
        // indexDir is the directory that hosts Lucene's index files
        File indexDir = new File("C:\\luceneIndex");
        Analyzer luceneAnalyzer = new StandardAnalyzer();
        File[] textFiles = fileDir.listFiles();
        long startTime = new Date().getTime();
        int mergeFactor = 10;
        int minMergeDocs = 10;
        int maxMergeDocs = Integer.MAX_VALUE;
        IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer, true);
        indexWriter.mergeFactor = mergeFactor;
        indexWriter.minMergeDocs = minMergeDocs;
        indexWriter.maxMergeDocs = maxMergeDocs;
        // Add documents to the index
        for (int i = 0; i < textFiles.length; i++) {
            if (textFiles[i].isFile() && textFiles[i].getName().endsWith(".txt")) {
                Reader textReader = new FileReader(textFiles[i]);
                Document document = new Document();
                document.add(Field.Text("content", textReader));
                document.add(Field.Keyword("path", textFiles[i].getPath()));
                indexWriter.addDocument(document);
            }
        }
        indexWriter.optimize();
        indexWriter.close();
        long endTime = new Date().getTime();
        System.out.println("MergeFactor: " + indexWriter.mergeFactor);
        System.out.println("MinMergeDocs: " + indexWriter.minMergeDocs);
        System.out.println("MaxMergeDocs: " + indexWriter.maxMergeDocs);
        System.out.println("Document number: " + textFiles.length);
        System.out.println("Time consumed: " + (endTime - startTime) + " milliseconds");
    }
}

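This example shows how much flexibility Lucene gives us in adjusting the buffer size and the frequency of disk writes. Let us look at the key statements. The following code creates an IndexWriter instance and assigns its three parameters.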

int mergeFactor = 10;
int minMergeDocs = 10;
int maxMergeDocs = Integer.MAX_VALUE;
IndexWriter indexWriter = new IndexWriter(indexDir,luceneAnalyzer,true);
indexWriter.mergeFactor = mergeFactor;
indexWriter.minMergeDocs = minMergeDocs;
indexWriter.maxMergeDocs = maxMergeDocs;

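Now let us see how different values of these three parameters affect indexing time; note the relationship between the parameter values and the indexing time. We prepared 10000 test documents for this experiment. Table 1 shows the results.

Table 1. Test results

Table 1 shows clearly how the three parameters affect indexing time. In practice you will often change the merge factor and the minimum merge documents to improve indexing performance. Given enough memory, you can set both as high as practical to speed up indexing. There is generally no need to change the maximum merge documents, because it already defaults to the largest possible value.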
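To get a feel for how the merge factor and the minimum merge documents interact, the following sketch simulates segment creation and merging. It is a deliberately simplified model, assuming documents are flushed in groups of minMergeDocs and that mergeFactor equally sized trailing segments are merged into one; maxMergeDocs is left at its effectively unlimited default. The class MergeSimulator is hypothetical and does not reproduce Lucene's actual merge code.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified simulation of segment flushing and merging (illustrative only). */
final class MergeSimulator {
    private MergeSimulator() {}

    /** Returns the on-disk segment sizes after adding docCount documents. */
    static List<Integer> simulate(int docCount, int mergeFactor, int minMergeDocs) {
        if (mergeFactor < 2 || minMergeDocs < 1) {
            throw new IllegalArgumentException("mergeFactor >= 2 and minMergeDocs >= 1 required");
        }
        List<Integer> segments = new ArrayList<Integer>();
        int buffered = 0; // documents held in memory, not yet flushed
        for (int i = 0; i < docCount; i++) {
            buffered++;
            if (buffered == minMergeDocs) {
                segments.add(buffered); // flush the buffer as a new segment
                buffered = 0;
                mergeWhilePossible(segments, mergeFactor);
            }
        }
        return segments; // documents still buffered are not counted
    }

    /** Merges the last mergeFactor segments as long as they all have the same size. */
    private static void mergeWhilePossible(List<Integer> segments, int mergeFactor) {
        while (segments.size() >= mergeFactor) {
            int n = segments.size();
            int size = segments.get(n - 1);
            int merged = 0;
            for (int i = n - mergeFactor; i < n; i++) {
                if (segments.get(i) != size) {
                    return; // trailing segments differ in size: nothing to merge
                }
                merged += segments.get(i);
            }
            segments.subList(n - mergeFactor, n).clear();
            segments.add(merged);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(1234, 10, 10));
        System.out.println(simulate(1234, 100, 100));
    }
}
```

Raising either parameter means fewer, larger writes in this model, which is the intuition behind the advice to increase them when memory allows.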

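Analysis of the Lucene index file structure

Before analyzing Lucene's index file structure, we need to understand the concept of an inverted index. An inverted index organizes documents around index terms: each term points to a list of documents, all of which contain that term. In a forward index, by contrast, the document is at the center, and each document points to the list of terms it contains. With an inverted index you can easily find which documents contain a given term. Lucene uses the inverted index as its basic index structure.

Logical view of the index files

Lucene has the concept of an index segment; each segment contains a certain number of documents, and each segment can be searched on its own. Figure 2 shows the logical view of the Lucene index structure. The number of segments is determined by the total number of indexed documents and the maximum number of documents a segment can hold.

Figure 2. Logical view of the index files

Key index files in Lucene

The following sections analyze the main index files in Lucene. For some files not every field is covered, but this does not affect the understanding of the index files.
1. Segments file

This file holds information about the segments in the index, including the name and size of each segment. Table 2 shows its structure.

Table 2. Structure of the segments file

2. Field information file

As we know, a document in the index consists of one or more fields. This file holds information about the fields in each segment. Table 3 shows its structure.

Table 3. Structure of the field information file

3. Term information file

This is the most central of the index files. It stores the values of all index terms together with related information, sorted by term. Table 4 shows its structure.

Table 4. Structure of the term information file

4. Frequency file
This file holds, for each index term, the list of documents that contain it and how often the term occurs in each document. When Lucene finds a term in the term information file that matches the search word, it looks in the frequency file to find which documents contain that term. Table 5 shows the approximate structure of this file; not all of its fields are included.

Table 5. Structure of the frequency file

5. Position file
This file holds the positions at which each index term occurs in each document. This information can be used in ranking the search results. Table 6 shows its structure.

Table 6. Structure of the position file

So far we have covered the main index file structures in Lucene. We hope this helps you understand Lucene's physical storage structure.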
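The information held by the term, frequency, and position files can be pictured as nested maps. The sketch below keeps, for every term, the documents containing it and the positions where it occurs; the frequency is simply the number of positions. This in-memory layout is an assumption made for illustration and says nothing about Lucene's on-disk encoding; the class PositionalIndex is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy postings structure: term -> (document id -> positions of the term). */
final class PositionalIndex {
    // TreeMap keeps terms sorted, echoing the sorted term information file.
    private final Map<String, Map<Integer, List<Integer>>> postings =
            new TreeMap<String, Map<Integer, List<Integer>>>();

    /** Splits the text on whitespace and records each token's position. */
    void add(int docId, String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            String term = tokens[pos];
            if (term.isEmpty()) {
                continue; // only happens for blank input
            }
            Map<Integer, List<Integer>> docs = postings.get(term);
            if (docs == null) {
                docs = new TreeMap<Integer, List<Integer>>();
                postings.put(term, docs);
            }
            List<Integer> positions = docs.get(docId);
            if (positions == null) {
                positions = new ArrayList<Integer>();
                docs.put(docId, positions);
            }
            positions.add(pos);
        }
    }

    /** Position data: where the term occurs in the given document. */
    List<Integer> positions(String term, int docId) {
        Map<Integer, List<Integer>> docs = postings.get(term.toLowerCase(Locale.ROOT));
        if (docs == null || !docs.containsKey(docId)) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(docs.get(docId));
    }

    /** Frequency data: how often the term occurs in the given document. */
    int frequency(String term, int docId) {
        return positions(term, docId).size();
    }

    /** The documents that contain the term. */
    Set<Integer> documents(String term) {
        Map<Integer, List<Integer>> docs = postings.get(term.toLowerCase(Locale.ROOT));
        if (docs == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(docs.keySet());
    }

    public static void main(String[] args) {
        PositionalIndex index = new PositionalIndex();
        index.add(0, "to be or not to be");
        index.add(1, "be quick");
        System.out.println("docs with 'be': " + index.documents("be"));
        System.out.println("positions of 'be' in doc 0: " + index.positions("be", 0));
    }
}
```

Positions make phrase matching and proximity-based ranking possible, which is why the text notes that they can take part in ranking the results.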

Summary

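Many well-known organizations already use Lucene; for example, Lucene provides search for the Eclipse help system and for MIT's OpenCourseWare. We hope this article has given you an understanding of Lucene's indexing mechanism, and that you will find creating an index with Lucene very simple.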


Posted by zht, 2007-10-30 10:02