When the wind of 柳上原 blows toward the horizon...

True happiness comes from creating

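In programming you sometimes need to split a piece of text into tokens. For example, 14+2*3 has to become 14, +, 2, *, 3, and select a from b has to become select, a, from, b. Writing code for one such case is not hard, but with reuse in mind I wrote the general-purpose class below. The caller only supplies two regular expressions, one for the valid characters and one for the separator characters, and the program splits the string into tokens and labels each one with its type. The source follows: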

1. The Token class, which represents one token and has two properties, text and type:

package com.heyang.tokenmaker;

/**
* Token class, holding a text and a type.
* Author: heyang (heyang78@gmail.com)
*/
public class Token{
    // Type for valid content
    public static final String Type_Content="Content";

    // Type for separators (the value is spelled "Seperator", which is what the printed output shows)
    public static final String Type_Separator="Seperator";

    // Token text
    private String text;

    // Token type
    private String type;

    /**
     * Constructor.
     * @param text the token text
     * @param type the token type, either Type_Content or Type_Separator
     */
    public Token(String text,String type){
        this.text=text;

        // Constant-first comparison, so a null type is rejected below instead of causing a NullPointerException
        if(Type_Content.equals(type) || Type_Separator.equals(type)){
            this.type=type;
        }
        else{
            throw new IllegalArgumentException(type+" is not a valid type.");
        }

    }

    public String getText() {
        return text;
    }

    public String getType() {
        return type;
    }
}
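
As an aside, the runtime check in the constructor exists only because the type is a plain String. A possible variant, sketched here with the hypothetical names TypedToken and Type (neither is part of the class above), uses an enum so that an invalid type cannot be passed in the first place:

```java
import java.util.Objects;

// Hypothetical variant of Token: an enum replaces the two String constants,
// so the compiler rejects invalid types and no runtime type check is needed.
final class TypedToken {
    enum Type { CONTENT, SEPARATOR }

    private final String text;
    private final Type type;

    TypedToken(String text, Type type) {
        // Null is still possible with an enum, so reject it explicitly
        this.text = Objects.requireNonNull(text, "text");
        this.type = Objects.requireNonNull(type, "type");
    }

    String getText() { return text; }

    Type getType() { return type; }

    @Override
    public String toString() { return text + "\t" + type; }

    public static void main(String[] args) {
        TypedToken plus = new TypedToken("+", Type.SEPARATOR);
        System.out.println(plus);
    }
}
```

The trade-off is that callers comparing against the String constants would have to switch to the enum values.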

2. The TokenMaker class, which does the splitting:

package com.heyang.tokenmaker;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.lang.StringUtils;

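/**
* Takes a string and converts it into tokens stored in a list.
* Author: 何楊 (heyang78@gmail.com)
*/
public class TokenMaker{
    // Source string
    private String sourceString;

    // Regular expression matching a single valid character (it is applied to one character at a time)
    private String validPatten;

    // Regular expression matching a single separator character (it is applied to one character at a time)
    private String separatorPattern;

    // List of tokens
    private List<Token> tokens;

    /**
     * Constructor.
     * @param sourceString the text to split into tokens
     * @param validPatten regular expression matching one valid character
     * @param seperatorPattern regular expression matching one separator character
     * @throws Exception if the text contains a character that is not valid
     */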
    public TokenMaker(String sourceString,String validPatten,String seperatorPattern) throws Exception{
        this.sourceString=sourceString;
        this.validPatten=validPatten;
        this.separatorPattern=seperatorPattern;

        findTokens(sourceString);
    }

    /**
     * Finds the tokens in the source string and stores them in the list.
     *
     * @param sourceString the text to split into tokens
     * @throws Exception if a character does not match the valid-character pattern
     */
    private void findTokens(String sourceString)throws Exception{
        tokens=new ArrayList<Token>();

        // Accumulates the characters of the current content token
        StringBuilder word = new StringBuilder();

        for (int i = 0; i < sourceString.length(); i++) {
            // Take each character as a one-character string
            String str = String.valueOf(sourceString.charAt(i));

            // Validate the character
            if(!isValid(str)){
                throw new Exception("Illegal character '"+str+"' found in "+this.sourceString+"; cannot evaluate.");
            }

            if (StringUtils.isBlank(str)) {
                // Whitespace ends the current word; empty words are dropped by addTokenToList
                addTokenToList(new Token(word.toString(),Token.Type_Content));
                word.setLength(0);
            }
            else if(isSeparator(str)){
                // Store the word collected so far
                addTokenToList(new Token(word.toString(),Token.Type_Content));

                // Store the separator itself
                addTokenToList(new Token(str,Token.Type_Separator));

                // Start a new word
                word.setLength(0);
            }
            else {
                // Any other valid character extends the current word
                word.append(str);
            }
        }

        // Store the last word. Flushing here replaces the "~" end marker,
        // which cut the input short whenever the text itself contained "~".
        addTokenToList(new Token(word.toString(),Token.Type_Content));
    }

    /**
     * Adds a token to the token list, skipping tokens whose text is empty or blank.
     *
     * @param token the token to add
     */
    private void addTokenToList(Token token){
        if(token.getText().trim().length()>0){
            tokens.add(token);
        }
    }

    /**
     * Prints the tokens in the list.
     *
     */
    public void printTokens(){
        System.out.println("\nTokens for the text "+sourceString+":");
        System.out.println("No.\tText\tType");
        System.out.println("-------------------------");

        int index=1;

        for(Token token:tokens){
            System.out.println((index++)+"\t"+token.getText()+"\t"+token.getType());
        }
    }

    /**
     * Checks whether the character is valid.
     *
     * @param str a one-character string
     * @return true if it matches the valid-character pattern
     */
    private boolean isValid(String str){
        Pattern p = Pattern.compile(validPatten,Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(str);
        return m.find();
    }

    /**
     * Checks whether the character is a separator.
     *
     * @param str a one-character string
     * @return true if it matches the separator pattern
     */
    private boolean isSeparator(String str) {
        Pattern p = Pattern.compile(separatorPattern,Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(str);
        return m.find();
    }

    /**
     * Returns the token list.
     *
     * @return the tokens, in source order
     * Created: 2010-6-27 12:46:47 AM
     * Modified: 2010-6-27 12:46:47 AM
     */
    public List<Token> getTokens() {
        return tokens;
    }

    /**
     * Returns the texts of the tokens as a list.
     *
     * @return the text of each token, in source order
     * Created: 2010-6-27 08:59:34 AM
     * Modified: 2010-6-27 08:59:34 AM
     */
    public List<String> getTokenConcents(){
        List<String> ls=new ArrayList<String>();

        for(Token token:tokens){
            ls.add(token.getText());
        }

        return ls;
    }

    /**
     * Test.
     *
     * @param args not used
     * @throws Exception if one of the test strings contains an invalid character
     * Created: 2010-6-27 09:00:02 AM
     * Modified: 2010-6-27 09:00:02 AM
     */
    public static void main(String[] args)  throws Exception{
        new TokenMaker("96.2+8*5-12*(4-1)/2^(3%10)","[0-9\\.+-[*]/()\\^\\%]","[+-[*]/()\\^\\%]").printTokens();
        new TokenMaker("select A a,b, v from ta,tb where 1=1 and 2=2 order by a asc","[\\w=<>!\\s,]","[\\s,]").printTokens();
    }
}
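
The separator pattern in main, [+-[*]/()\\^\\%], is easy to misread: the hyphen sits between + and an opening bracket, and [*] is a nested character class that Java merges into the outer one. The check below is a sketch, assuming the flat class [-+*/()^%] is an equivalent and more readable spelling. The class name SeparatorPatternCheck and its members are made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical check, not part of TokenMaker: lists the printable ASCII
// characters accepted by the original separator pattern and by a flat rewrite.
final class SeparatorPatternCheck {
    // The separator pattern exactly as passed in main()
    static final Pattern ORIGINAL = Pattern.compile("[+-[*]/()\\^\\%]");

    // Assumed equivalent: hyphen first, no nested class, no escapes
    static final Pattern FLAT = Pattern.compile("[-+*/()^%]");

    // Returns every printable ASCII character the pattern accepts, in code-point order
    static String accepted(Pattern pattern) {
        StringBuilder sb = new StringBuilder();
        for (char c = ' '; c <= '~'; c++) {
            if (pattern.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("original: " + accepted(ORIGINAL));
        System.out.println("flat:     " + accepted(FLAT));
    }
}
```

If both lines print the same characters, the flat form can replace the original in the call without changing the result for ASCII input.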

3. The result of splitting an arithmetic expression and a SQL statement:

Tokens for the text 96.2+8*5-12*(4-1)/2^(3%10):
No.    Text    Type
-------------------------
1    96.2    Content
2    +    Seperator
3    8    Content
4    *    Seperator
5    5    Content
6    -    Seperator
7    12    Content
8    *    Seperator
9    (    Seperator
10    4    Content
11    -    Seperator
12    1    Content
13    )    Seperator
14    /    Seperator
15    2    Content
16    ^    Seperator
17    (    Seperator
18    3    Content
19    %    Seperator
20    10    Content
21    )    Seperator

Tokens for the text select A a,b, v from ta,tb where 1=1 and 2=2 order by a asc:
No.    Text    Type
-------------------------
1    select    Content
2    A    Content
3    a    Content
4    ,    Seperator
5    b    Content
6    ,    Seperator
7    v    Content
8    from    Content
9    ta    Content
10    ,    Seperator
11    tb    Content
12    where    Content
13    1=1    Content
14    and    Content
15    2=2    Content
16    order    Content
17    by    Content
18    a    Content
19    asc    Content
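
Note that 1=1 and 2=2 each come out as a single Content token, because = is listed among the valid characters but not among the separators. The sketch below shows how the separator pattern alone decides this. SeparatorDemo and its split method are hypothetical, a simplified stand-in for TokenMaker with no validity check and no token types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for TokenMaker: no validity check and no
// token types, just enough to show how the separator pattern shapes the result.
final class SeparatorDemo {
    static List<String> split(String source, String separatorPattern) {
        Pattern separator = Pattern.compile(separatorPattern);
        List<String> tokens = new ArrayList<String>();
        StringBuilder word = new StringBuilder();

        for (int i = 0; i < source.length(); i++) {
            String ch = String.valueOf(source.charAt(i));
            if (separator.matcher(ch).matches()) {
                // A separator ends the current word
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                // Whitespace separators are dropped, visible ones are kept
                if (!ch.trim().isEmpty()) {
                    tokens.add(ch);
                }
            } else {
                word.append(ch);
            }
        }
        // Flush the last word
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Separators as in the post: "=" stays inside the word
        System.out.println(split("where 1=1 and 2=2", "[\\s,]"));   // [where, 1=1, and, 2=2]
        // With "=" added to the separators it becomes its own token
        System.out.println(split("where 1=1 and 2=2", "[\\s,=]"));  // [where, 1, =, 1, and, 2, =, 2]
    }
}
```

Because separators are matched one character at a time, a two-character operator such as <= would still come out as two tokens under this scheme.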

posted on 2010-06-27 09:34 by 何楊

Feedback

# re: A class that splits a string into tokens 2010-06-27 20:34 feenn

Good on you for giving it a try. The most general-purpose option, though, is a lexer generator. Take a look at JFlex.
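
Short of adopting a lexer generator, one middle ground is a single regular expression with one named group per token kind, applied repeatedly from the current position. The sketch below is neither JFlex nor the TokenMaker class from the post. The class name RegexLexerSketch and the group names NUMBER, WORD and OP are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A sketch of a regex-driven scanner. Unlike a character-by-character
// splitter it can recognize multi-character tokens such as ">=" or "96.2".
final class RegexLexerSketch {
    // One alternative per token kind; two-character operators are tried first
    private static final Pattern TOKEN = Pattern.compile(
            "(?<NUMBER>\\d+(?:\\.\\d+)?)"
            + "|(?<WORD>[A-Za-z_]\\w*)"
            + "|(?<OP>[<>!]=|[-+*/^%(),=<>!])");

    // Returns the tokens as "KIND:text" strings, in source order
    static List<String> lex(String source) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = TOKEN.matcher(source);
        int pos = 0;

        while (pos < source.length()) {
            // Skip whitespace between tokens
            if (Character.isWhitespace(source.charAt(pos))) {
                pos++;
                continue;
            }
            // Try to match one token starting exactly at pos
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                        "Unexpected character '" + source.charAt(pos) + "' at index " + pos);
            }
            if (m.group("NUMBER") != null) {
                tokens.add("NUMBER:" + m.group("NUMBER"));
            } else if (m.group("WORD") != null) {
                tokens.add("WORD:" + m.group("WORD"));
            } else {
                tokens.add("OP:" + m.group("OP"));
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("a >= 96.2"));  // [WORD:a, OP:>=, NUMBER:96.2]
        System.out.println(lex("8*(4-1)"));
    }
}
```

A generated lexer remains the better fit once the token rules grow beyond a handful of alternatives.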
