
    Original source: http://chemhack.com/cn/2008/11/faster-fingerprint-search-with-java-cdk/

    Rich Apodaca wrote a great series of posts named Fast Substructure Search Using Open Source Tools, providing details on substructure search with MySQL. However, MySQL's poor functions for operating on binary data limit the implementation of similar structure search, which typically depends on calculating the Tanimoto coefficient. We are going to use Java and CDK to add this feature.
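For readers unfamiliar with it, the Tanimoto coefficient of two fingerprints is the number of bits set in both divided by the number of bits set in either. The following is a minimal sketch of that definition using only java.util.BitSet. The class name TanimotoSketch and the choice to return 0 for two empty fingerprints are assumptions made for illustration; the search code in this post relies on CDK's own Tanimoto.calculate instead.

```java
import java.util.BitSet;

// Hypothetical helper, not part of CDK: illustrates the Tanimoto definition.
final class TanimotoSketch {

    // Returns |A and B| / |A or B| for two fingerprints.
    static float tanimoto(BitSet a, BitSet b) {
        // Work on a copy so the caller's fingerprint is not modified.
        BitSet both = (BitSet) a.clone();
        both.and(b);
        int common = both.cardinality();
        // |A or B| = |A| + |B| - |A and B|
        int union = a.cardinality() + b.cardinality() - common;
        // Assumption: two empty fingerprints are treated as having similarity 0.
        return union == 0 ? 0f : (float) common / union;
    }
}
```

Two fingerprints that share two set bits out of six distinct set bits would score one third under this definition, and identical non-empty fingerprints score 1.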

    As the default output of the CDK fingerprinter, java.util.BitSet implements Serializable, which makes it a convenient format for storing fingerprint data. Java itself provides several collections in the java.util package, such as ArrayList, LinkedList and Vector. Because the search engine will be accessed from the web, and therefore by several threads at once, the thread-unsafe ArrayList and LinkedList are ruled out. How about Vector? Once all the fingerprint data has been prepared, the only collection operation a similarity search needs is iteration. No adds, no deletes. So a lightweight array is enough.
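Because BitSet is Serializable, an array of fingerprints can be written and read back with standard Java object serialization. The sketch below shows one way this might look; the class FingerprintStoreSketch and its method names are hypothetical, and it serializes to a byte array rather than to whatever storage the original search engine uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical helper: round-trips fingerprints through Java serialization.
final class FingerprintStoreSketch {

    // Serializes the whole fingerprint array into a byte array.
    static byte[] toBytes(BitSet[] fingerprints) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(fingerprints);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Restores a fingerprint array previously produced by toBytes.
    static BitSet[] fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (BitSet[]) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Once the array has been loaded, it is only read, never modified, which is why several request threads can iterate over it without extra synchronization.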

    Most of the molecule information is stored in a MySQL database, so we are going to map each fingerprint to the corresponding row in the data table. Here is the MolDFData class; we use a long variable to store the corresponding primary key in the data table.

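    import java.io.Serializable;
    import java.util.BitSet;

    // Pairs a fingerprint with the primary key of its row in the MySQL data table.
    public class MolDFData implements Serializable {
        private long id;             // primary key of the corresponding row
        private BitSet fingerprint;  // fingerprint bits as produced by CDK

        public MolDFData(long id, BitSet fingerprint) {
            this.id = id;
            this.fingerprint = fingerprint;
        }

        public long getId() {
            return id;
        }

        public void setId(long id) {
            this.id = id;
        }

        public BitSet getFingerprint() {
            return fingerprint;
        }

        public void setFingerprint(BitSet fingerprint) {
            this.fingerprint = fingerprint;
        }
    }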

    This is how we store our fingerprints.

    private MolDFData[] arrayData;

    Similarity search is no big deal. Just calculate the Tanimoto coefficient; if it is bigger than the minimum similarity you set, add the entry to the result.

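        // Needs java.util.BitSet, Collections, LinkedList and List, plus
        // org.openscience.cdk.similarity.Tanimoto and
        // org.openscience.cdk.exception.CDKException.
        public List<SearchResultData> searchTanimoto(BitSet bt, float minSimilarity) {
            List<SearchResultData> resultList = new LinkedList<SearchResultData>();
            for (int i = 0; i < arrayData.length; i++) {
                MolDFData aListData = arrayData[i];
                try {
                    float coefficient = Tanimoto.calculate(aListData.getFingerprint(), bt);
                    if (coefficient > minSimilarity) {
                        resultList.add(new SearchResultData(aListData.getId(), coefficient));
                    }
                } catch (CDKException e) {
                    // Skip entries whose fingerprint cannot be compared with the query.
                }
            }
            // Sort once, after all matches have been collected, not on every iteration.
            Collections.sort(resultList);
            return resultList;
        }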

    Pretty ugly code? Maybe. But it really works, at an acceptable speed.

    Tests were done using the code below on a MacBook (Intel Core Duo 1.83 GHz, 2 GB RAM).

    long t3 = System.currentTimeMillis();
    List<SearchResultData> listResult = se.searchTanimoto(bs, 0.8f);
    long t4 = System.currentTimeMillis();
    System.out.println("Thread: Search done in " + (t4 - t3) + " ms.");

    In my database of 87364 commercial compounds, it takes 335 ms.
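The search method depends on a SearchResultData class that the post does not show. Since the results are passed to Collections.sort without a comparator, that class has to implement Comparable. The following is only a guess at its shape: the field names, the getters, and the choice to put the most similar hit first are all assumptions.

```java
// Hypothetical reconstruction: the original SearchResultData is not shown in the post.
final class SearchResultData implements Comparable<SearchResultData> {
    private final long id;            // primary key of the matching row
    private final float coefficient;  // Tanimoto coefficient against the query

    SearchResultData(long id, float coefficient) {
        this.id = id;
        this.coefficient = coefficient;
    }

    long getId() {
        return id;
    }

    float getCoefficient() {
        return coefficient;
    }

    // Assumption: higher coefficients sort first, so the best hit leads the list.
    @Override
    public int compareTo(SearchResultData other) {
        return Float.compare(other.coefficient, this.coefficient);
    }
}
```

With this ordering, the first element of the list returned by searchTanimoto would be the compound most similar to the query.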

    posted on 2009-10-18 14:09 by 周銳. Categories: Chemistry, Java, CDK