March 28, 2008
While analyzing "IO Error: Connection reset", many articles suggested that it could be caused by the Java security code (reading /dev/random) used in JDBC connections. However, that is not the root cause in my case.
In my environment, Java already uses /dev/urandom.
1. $JAVA_HOME/jre/lib/security/java.security
securerandom.source=file:/dev/./urandom
2. check with strace.
only -Djava.security.egd=file:/dev/../dev/urandom triggers the system call (a read on /dev/urandom);
all the other path formats, like the ones below, are OK:
-Djava.security.egd=file:/dev/./urandom
-Djava.security.egd=file:///dev/urandom
3. I kept checking the entropy pool size and never saw it exhausted:
while [ 1 ];
do
cat /proc/sys/kernel/random/entropy_avail
sleep 1
done
Usually the available entropy is in the range of 1000 to 3000.
So far, there is no clue about the root cause of "IO Error: Connection reset".
I encountered many issues during the installation of Oracle Grid Infrastructure (GI) and Database;
with the help of articles and documents found through Google,
I finally made it. For the record, here are the detailed issues encountered and the solutions applied.
Most of the issues were encountered during the GI installation.
Pre-installation tasks.
Issue 1: swap space is not big enough (1.3.1 Verify System Requirements).
grep MemTotal /proc/meminfo
264G
grep SwapTotal /proc/meminfo
2G
During the OS installation I took the default options, so the swap space is only 2G.
Oracle recommends more than 16G of swap space when there is more than 32G of RAM.
dd if=/dev/zero of=/home/swapfile bs=1024 count=33554432
33554432+0 records in
33554432+0 records out
34359738368 bytes (34 GB) copied
mkswap /home/swapfile
chmod 0600 /home/swapfile
Lessons learned: set up the swap space properly according to the DB requirements when installing the OS. (Note that a newly created swap file also has to be activated with swapon, and added to /etc/fstab if it should survive reboots.)
Issue 2: cannot find oracleasm-kmp-default on the Oracle site.
(1.3.6 Prepare Storage for Oracle Automatic Storage Management)
Installing oracleasmlib and oracleasm-support is easy: just download them from Oracle and install them.
Originally the oracleasm kernel module was provided by Oracle, but now I cannot find it there; finally I
realized that the oracleasm kernel module is now provided by the OS vendor.
In my case, it has to be installed from the SUSE media:
a. find its name (oracleasm-kmp-default):
zypper se oracle
b. mount the DVD and install:
zypper in oracleasm-kmp-default
rpm -qa|grep oracleasm
oracleasm-kmp-default-2.0.8_k3.12.49_11-3.20.x86_64
oracleasm-support-2.1.8-1.SLE12.x86_64
oracleasmlib-2.0.12-1.SLE12.x86_64
oracleasm configure -i
oracleasm createdisk DATA /dev/<...>
oracleasm listdisks
--DATA
ls /dev/oracleasm/disks
Installation tasks:
Issue 3: the installation always failed at the user equivalence check after starting the installer (OUI) as user oracle,
even though a manual check with runcluvfy found no issue at all.
./runcluvfy.sh stage -pre crsinst -n , -verbose
I worked around it by using another user instead of oracle, but that triggered the next issue.
Issue 4: cannot see the ASM disks in OUI; no matter how I changed the disk discovery path, the disk list stayed empty,
although I could find the disk manually:
/usr/sbin/oracleasm-discover 'ORCL:*'
Discovered disk: ORCL:DATA
Root cause: the ASM disks had been configured and created as user oracle, while I was installing GI
as a different user, so I could not see the disks that had been created.
Changing the owner of the disk device files solved the issue:
ls /dev/oracleasm/disks
chown /dev/oracleasm/disks -R
Issue 5: root.sh execution failed.
Failed to create keys in the OLR, rc = 127, Message:
clscfg.bin: error while loading shared libraries: libcap.so.1:
cannot open shared object file: No such file or directory
Fixed the issue with the command below:
zypper in libcap1
ohasd failed to start
Failed to start the Clusterware. Last 20 lines of the alert log follow:
2016-07-24 23:10:28.502:
[client(1119)]CRS-2101:The OLR was formatted using version 3.
I found a good document from SUSE,
"Oracle RAC 11.2.0.4.0 on SUSE Linux Enterprise Server 12 - x86_64",
which makes it clear that SUSE 12 is supported by Oracle GI 11.2.0.4. It also mentions
Patch 18370031:
"During the Oracle Grid Infrastructure installation,
you must apply patch 18370031 before configuring the software that is installed."
Patch 18370031 is mentioned in one of Oracle's installation guides for Linux
but not in the other; I mainly followed the latter and therefore missed the patch.
The issue disappeared after I installed patch 18370031.
./OPatch/opatch napply -oh -local /18370031
Errors in file :
ORA-27091: unable to queue I/O
ORA-15081: failed to submit an I/O operation to a disk
ORA-06512: at line 4
Solved by changing the owner of the files related to disk DATA:
ls -l /dev/oracleasm/iid
chown the folder /dev/oracleasm/iid and some .* hidden files inside it.
Issues during DB installation
Issue 6: error reported in invoking target 'agent nmhs'.
vi $ORACLE_HOME/sysman/lib/ins_emagent.mk
Search for the line
$(MK_EMAGENT_NMECTL)
Change it to:
$(MK_EMAGENT_NMECTL) -lnnz11
refer to
https://community.oracle.com/thread/1093616?tstart=0
Many years ago I set up a dual-boot system with Ubuntu and Windows. Recently, since I got a dedicated machine for Ubuntu, I uninstalled Ubuntu from the old computer, and then the system would no longer boot. GRUB works by handing control from the MBR to the Ubuntu system partition, and the Ubuntu side in turn provides the boot entry for Windows; with Ubuntu removed, that boot chain was broken.
The problem itself is not hard to solve: borrow a Windows installation disc and restore the MBR. But that requires the Windows Administrator password, and since the system was not installed by me, I simply did not know it.
Some posts suggested cracking the Administrator password; I tried and found it too troublesome. There was data on the machine, so reinstalling was not an option either.
The final solution was to install a new Windows on the former Ubuntu partition, which turned the machine into a dual-Windows system. After the installation and a reboot I could enter either system (the new or the old Windows), because the MBR had been updated automatically during installation. The remaining steps were to change the old system's Administrator password and delete the now redundant new Windows.
While using Gmail I accidentally clicked the "Archive" button, and an important email disappeared for several days; only today did I stumble upon it again.
The explanation I found online is:
Archiving moves messages from your inbox into All Mail, so you can tidy up your inbox without deleting anything.
Hard to understand; frankly, this feature only adds trouble for me. It seems every tool requires you to adapt to it and break it in.
I recently bought my son a puzzle called Huarongdao (Klotski). Although it is billed as one of China's four classic brain-teasers, it is actually only about a hundred years old and was introduced from abroad; the localization was done very well, though, a creative absorption of a foreign invention.
Solving the puzzle by hand is somewhat difficult. Of course solutions have already been published, but I still solved it once more by writing a program, and found my programming in this area rather weak: most of the time went into debugging.
I started with depth-first search, which gave me a rough idea of what the answer should look like. Later I switched to breadth-first search and obtained the optimal solution. Another change: at first I only allowed moving one square at a time, but then I learned that the traditional rule counts all consecutive moves of one block as a single step, so I adjusted the algorithm accordingly.
The hardest part was the user interface. For debugging I threw together an Applet, but it was not presentable enough to hand to my son.
Just using this blog to share some meta information: git://github.com/ueddieu/mmix.git http://github.com/ueddieu/mmix.git
After two weeks' struggle, I have successfully installed Gentoo, a popular GNU/Linux distribution. For the record, the obstacles I encountered are listed below
(though I cannot remember every solution exactly).
0. Failed to emerge gpm when installing the links package.
If I recall correctly, it was resolved by installing gpm manually. 1. I encountered an issue when installing glib 2.22.5:
no update-desktop-database,
which is in dev-util/desktop-file-utils. When I tried to emerge that, there was a circular dependency on glib; no clean solution,
and I forget how I resolved the problem.
2. Later, after glib was installed, the ~amd64 keyword let me install gpm-1.20.6, but it conflicted with the manually installed gpm.
I removed the conflicting files and the emerge succeeded.
3. Failed to emerge tiff.
Edit package.keywords to add the following:
/ ~amd64
With that I am able to use the latest tiff, a beta version that is unstable and masked out.
4. Later atk-1.28.0 failed to emerge.
Edit /etc/make.conf with the following:
FEATURES="-stricter"
Then it emerged successfully with only some complaints. Without this setting, the warnings from GCC make the emerge fail.
5. When I ran
emerge --update system
gcc was to be upgraded from 4.3.4 to 4.4.3, but it failed because of compilation warnings, again. Adding "-stricter" to the FEATURES variable in /etc/make.conf worked around it.
6. The installation takes a long time; KDE itself took more than 10 hours. There is still a lot of room for improvement! Anyway, it is nice to be able to use it daily.
In the file C:\Documents and Settings\<user_name>\Application Data\Subversion\servers, add:
all=*.*
[all]
http-proxy-host = ***.**.com
http-proxy-port = 8080
Here "all" maps to all servers.
The complexity of the network environment affects our work. Take proxy settings: ideally one global setting would be enough, but in reality we have to configure each program separately, and the syntax differs every time.
Today I copied a script from a Word document to the command line and ran it. Unexpectedly, Word had automatically inserted a space, which made the execution fail. Specifically,
call ttGridCreate('$TT_GRID');
had been turned by Word into
call ttGridCreate(' $TT_GRID');
That space is not easy to spot, especially when you are not the author of the script. Stay alert!
Today I did a simple performance test comparing the cost of different ways of accessing a Java object's properties (a minimal sketch of the test follows this list):
1. Direct field access.
2. Access through getter methods.
3. Storing and accessing the value in a Map.
4. Reflection via Field.
5. Reflection via Method.
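For reference, here is a hedged sketch of the five access styles; the class and field names are mine, not from the original test, and the absolute numbers depend heavily on JVM warm-up and the JIT.
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// A minimal sketch of the five access styles compared above (made-up names).
public class AccessBench {
    public int value = 1;
    public int getValue() { return value; }

    public static void main(String[] args) throws Exception {
        AccessBench o = new AccessBench();
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("value", 1);
        Field f = AccessBench.class.getField("value");
        Method m = AccessBench.class.getMethod("getValue");

        int n = 100000, sum = 0;
        long t = System.nanoTime();
        for (int i = 0; i < n; i++) sum += o.value;               // 1. direct field access
        System.out.println("field:       " + (System.nanoTime() - t));

        t = System.nanoTime();
        for (int i = 0; i < n; i++) sum += o.getValue();          // 2. getter access
        System.out.println("method:      " + (System.nanoTime() - t));

        t = System.nanoTime();
        for (int i = 0; i < n; i++) sum += map.get("value");      // 3. Map access
        System.out.println("map:         " + (System.nanoTime() - t));

        t = System.nanoTime();
        for (int i = 0; i < n; i++) sum += f.getInt(o);           // 4. reflective Field access
        System.out.println("refl field:  " + (System.nanoTime() - t));

        t = System.nanoTime();
        for (int i = 0; i < n; i++) sum += (Integer) m.invoke(o); // 5. reflective Method access
        System.out.println("refl method: " + (System.nanoTime() - t));

        System.out.println("checksum " + sum);  // keep sum live so the loops are not optimized away
    }
}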
Repeated 100 times, the results are as follows (in nanoseconds):
* 100 field access, 14,806
* 100 method access, 20,393
* 100 map access, 66,489
* 100 reflection field access, 620,190
* 100 reflection method access, 1,832,356
Repeated 100,000 times, the results are as follows (in nanoseconds):
* 100000 field access, 2,938,362
* 100000 method access, 3,039,772
* 100000 map access, 10,784,052
* 100000 reflection field access, 144,489,034
* 100000 reflection method access, 37,525,719
From the results we can see:
1. getter/setter performance is already close to direct field access (roughly 50% slower); there is no need to avoid getters/setters in favor of direct field access for performance reasons.
2. Replacing a POJO with a Map costs roughly three times as much as getters/setters.
3. Reflective access is 50 to 150 times slower than getters/setters; use it with care, and keep its considerable cost in mind when pursuing flexibility.
4. Note that when the repetition count increases to 100,000, the gap between method access and field access shrinks; more interestingly, reflective Method access becomes about four times faster than reflective Field access. This is mainly the effect of the JIT.
The results basically match my expectations. However, performance evaluation easily leads to one-sided conclusions; if anything here is wrong, corrections are welcome. Thanks.
0. I am reading the source code of Tomcat 6.0.26. To make the effort pay off,
I am writing down some notes for the record. Thanks to the articles about the Tomcat
source code, and especially the book <<How Tomcat Works>>.
1. There are two concepts of "server" here. One is called Server, which
is for managing Tomcat itself (start and stop); the other is the Connector,
which is the server that serves application requests. They listen on different
ports, as server.xml clearly shows.
<Server port="8005" shutdown="SHUTDOWN">
<Service name="Catalina">
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" />
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
Although Server is the top-level element, logically it should not be.
Actually, in code, Bootstrap starts the service first, which
in turn starts the Server and the Server's services.
2. My focus is on the Connector part; I care about how a request is serviced by
Tomcat. Here are some key classes.
Connector --> ProtocolHandler (Http11Protocol
and AjpProtocol) --> JIoEndpoint
--> Handler (Http11ConnectionHandler
and AjpConnectionHandler)
3. Connector is the most obvious class, but the entry point is not there.
The sequence is like this:
Connector.Acceptor.run()
--> JIoEndpoint.processSocket(Socket socket)
--> SocketProcessor.run()
--> Http11ConnectionHandler.process(Socket socket)
--> Http11Processor.process(Socket socket)
--> CoyoteAdapter.service(Request req, Response res)
The core logic is in the method Http11Processor.process(Socket socket);
CoyoteAdapter.service(Request req, Response res) bridges the Connector module and the Container module.
Any comments are welcome. I may continue the source code reading and dig deeper into it if time permits.
It is handy to be able to navigate the source code with Ctrl+] in Cscope, but I always forget how to navigate back and have wasted effort many times. So, for the record: Ctrl+t navigates back in Cscope.
One more time: Ctrl+] and Ctrl+t navigate forward and back in Cscope.
How to read the source code in <<TCP/IP Illustrated Volume 2>>
1. Get the source code; the original link provided in the book is no longer available,
so you may need to google it.
2. Install cscope and vim.
3. Refer to http://cscope.sourceforge.net/large_projects.html for the following steps.
The command below would include all the source code of the whole OS, not only the kernel:
find src -name '*.[ch]' > cscope.files
We actually only care about the kernel source:
find src/sys -name '*.[ch]' > cscope.files
(Then build the cross-reference database, e.g. with cscope -b.)
4. wc cscope.files
1613 1613 45585 cscope.files
5. vim
:help cscope
then you can read the help details.
6. If you run vim in the folder where cscope.out resides, it will be loaded
automatically.
7. Try a few commands:
:cs find g mbuf
:cs find f vm.h
They work. A good start.
P.S. This book is quite old; if you know it well and can recommend a better alternative for learning TCP/IP, please post a comment. Thanks in advance.
My son was playing the piano. His aunt said, "He just likes to play the difficult (nan) ones."
My niece said, "He plays the 'boy' (nan) ones; I'll play the 'girl' ones." (In Chinese, "difficult" and "boy" are both pronounced nan.)
Back in high school I learned that the sieve method can produce primes. At that time I also had a wrong conjecture about finding primes:
I thought that near the product of two primes there is a high probability of finding a prime. For example, 7 x 11 = 77, and nearby there is 79, which is prime.
Even then I had noticed that 11 x 11 = 121 and 7 x 17 = 119, but I wrongly concluded that the rule only fails when one of the factors is a square or a higher power.
Later, when I had a computer, I verified it with a program and found plenty of counterexamples, and felt quite embarrassed about the old conjecture.
Although the conjecture is wildly wrong, it is related to modern prime theory, especially twin primes. We now know that
there are infinitely many primes, but that the proportion of primes among the natural numbers tends to zero;
therefore the proportion of twin primes also tends to zero. Whether there are infinitely many twin primes is still unproven.
The naive idea behind the conjecture is this: for the product A of any two primes, either A is 3n+2 or A is 3n+1; if A is 3n+2, only A+2
can possibly be prime, and if A is 3n+1, only A-2 can possibly be prime. In reality, however, the proportion of cases in which the conjecture holds is very low.
I wrote a program to verify it: among 16-bit integers, only about 10% satisfy the hypothesis.
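Out of curiosity, here is a hedged re-creation of that check (my own sketch, not the original program; it does not try to reproduce the exact 10% figure, only the shape of the experiment):
// For products A = p*q of two primes (A kept within 16 bits, and A not a
// multiple of 3), test whether the candidate predicted by the mod-3 rule
// (A+2 when A = 3n+2, A-2 when A = 3n+1) is actually prime.
public class TwinGuess {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; d * d <= n; d++) if (n % d == 0) return false;
        return true;
    }

    public static void main(String[] args) {
        int total = 0, hits = 0;
        for (int p = 2; p < 256; p++) {
            if (!isPrime(p)) continue;
            for (int q = p; q * p < 65536; q++) {
                if (!isPrime(q)) continue;
                int a = p * q;
                if (a % 3 == 0) continue;              // one factor is 3; the rule says nothing here
                int candidate = (a % 3 == 2) ? a + 2 : a - 2;
                total++;
                if (isPrime(candidate)) hits++;
            }
        }
        System.out.println(hits + " hits out of " + total + " prime products");
    }
}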
Because I am behind a proxy, git clone with MSYSGIT always failed. The following environment variable has to be set:
export http_proxy="http://<proxy domain name>:<port>"
After that, git clone over the http protocol works without any problem, but the git protocol still has issues.
Later I found that git push and git pull often did not work either. After several attempts I found that using the fully spelled-out command line solves the problem.
The process was as follows:
git pull -- fails
git pull origin -- fails
git pull git@github.com:ueddieu/mmix.git -- it works.
It seems the command-line shortcuts lack some user information, such as the user name "git"
(which is kind of strange at first glance).
git push -- fails
git push origin -- fails
git push git@github.com:ueddieu/mmix.git master -- it works.
Anyway, now I can check in code smoothly. :)
There are a few cases in which an invisible blank character causes
problems, and it is hard to detect because the character cannot be seen.
One famous case is the '\t' character used by Makefiles, where it marks
the start of a command. If it is replaced by a space character, the Makefile does
not work, but you cannot see the difference just by looking at the file.
This kind of problem can drive newbies crazy.
Last week I encountered a similar issue, which was also caused by an unnecessary
blank space.
As you may know, '\' is used as a line continuation when you have a very long line, e.g.
when you configure the classpath for Java in a property file, you may have something like this:
classpath=/lib/A.jar;/lib/B.jar;\
/lib/C.jar;/lib/D.jar;\
/lib/E.jar;/lib/log4j.jar;\
/lib/F.jar;/lib/httpclient.jar;
But if you add an extra blank space after the '\', you will not get the complete
content of the classpath. Only when '\' is immediately followed by '\n' on Unix, or '\r''\n'
on Windows, does it work as a line continuation; otherwise, e.g. when '\' is followed by
' ''\n', the line ends at the '\n', and the content after it becomes the start of
a new line.
Fortunately, it is easy to check for this kind of extra blank space with vi on Unix:
use the command '$' to go to the end of the line. If there is no extra blank space after '\',
the cursor will sit on the '\'; if there is any blank space after '\', the cursor
will be positioned after the '\'.
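If you prefer a programmatic check, here is a small hedged Java sketch that flags lines ending with a backslash followed by trailing whitespace (the property file name is made up):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Reports every line whose trailing '\' is followed by spaces or tabs.
public class TrailingSpaceCheck {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader("app.properties"));
        String line;
        int no = 0;
        while ((line = in.readLine()) != null) {
            no++;
            // anything, then a literal backslash, then one or more spaces/tabs at end of line
            if (line.matches(".*\\\\[ \\t]+")) {
                System.out.println("line " + no + ": whitespace after the trailing '\\'");
            }
        }
        in.close();
    }
}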
Mom and son
"Mom, you haven't been eating fish lately, so you've gotten dumber" -- 2009.6.2
My son asked to record again, and we were doing the dialogue from the script.
"..." (son)
"..., I like mangoes" (mom)
"Mom, I just taught you yesterday and you've forgotten already? It's 'I like watermelon'."
"Oh, Mom's memory isn't what it used to be!"
"It's because you haven't eaten fish these past few days, so you've gotten dumber. Eat more tomorrow!"
"Mom, you look quite cute in those clothes" -- the evening of 2009-6-9
The story my son had picked was finished. "OK, let's go to sleep!" I said.
"Hey, Mom, we haven't recorded yet; I'll go get the mp3 player." Ever since I first suggested recording him, my son has insisted on it every day.
...
"..., Mom, you're quite cute! ..." Right in the middle of an enthusiastic recording, my son suddenly went off script.
"What?" I didn't hear him clearly.
"You look quite cute in those clothes!" he repeated with a sly grin. "Because your clothes look like a little zebra!" I finally understood.
When installing OpenLDAP I mainly followed http://hexstar.javaeye.com/blog/271912
One new problem I ran into was that ldapsearch reported the following error:
can not find libdb-4.7.so.
My fix was to create a symbolic link /usr/lib/libdb-4.7.so pointing to /usr/local/BerkeleyDB/lib/libdb-4.7.so
After that I encountered no other problems.
Everything has its underlying reason; even behavior that looks completely unreasonable on the surface has its own cause. I realized this
once again because of an episode with my son this morning.
This morning my son refused to get up; he cried and fussed in bed, would not let his mom go to work, and wanted her to stay and sleep with him.
She had to catch the shuttle bus and had no time to stay, so she left me at home with him. I lay with him for a while, and only after ten minutes of talking did I learn that he had his reason.
Last night his mom and I were both very tired, and I said we would go to bed early that day and sleep together with him. But because we had just
returned to Shanghai there was a lot to do, and in the end we were still busy until half past ten. So my son said that Dad had lied. Of course he may also have had
other reasons, such as wanting us to sleep with him every night.
The analysis of the MOR (MXOR) instruction implementation in MMIXware
-- a stupid way to understand the source code.
The implementation of MOR (MXOR) is in the file mmix-arith.w:
octa bool_mult(y,z,xor)
octa y,z; /* the operands */
bool xor; /* do we do xor instead of or? */
{
  octa o,x;
  register tetra a,b,c;
  register int k;
  for (k=0,o=y,x=zero_octa;o.h||o.l;k++,o=shift_right(o,8,1))
    if (o.l&0xff) {
      a=((z.h>>k)&0x01010101)*0xff;
      b=((z.l>>k)&0x01010101)*0xff;
      c=(o.l&0xff)*0x01010101;
      if (xor) x.h^=a&c, x.l^=b&c;
      else x.h|=a&c, x.l|=b&c;
    }
  return x;
}
It took me several hours to understand the details.
If we treat each octabyte as an 8x8 bit matrix, with each row corresponding to a byte, then
y MOR z = z (matrix_multiply) y.
For a=((z.h>>k)&0x01010101)*0xff:
(z.h>>k)&0x01010101 picks out the last bit of each of the four bytes in (z.h>>k); depending on that bit,
((z.h>>k)&0x01010101)*0xff expands the bit (either 0 or 1) into the whole row.
e.g.
        0xff
* 0x01010101
------------
= 0xff + 0xff00 + 0xff0000 + 0xff000000
= 0xffffffff
(depending on the last bit in each row of z, the result could be #ff00ff00, #ff0000ff, etc.)
Similarly, b=((z.l>>k)&0x01010101)*0xff expands the last bit of each byte into the
whole byte.
Overall, after these two steps every byte of z is replaced by copies of its bit k, i.e. z becomes the replication of one of its columns; since k varies
from 0 to 7, the loop actually covers all the columns.
For c=(o.l&0xff)*0x01010101, it takes the last byte of o.l and replicates it into the other three bytes;
since the same c is or-ed/xor-ed into both the h and the l halves, there is no need to build a separate value for the high half.
one example,
let (z.h>>k)&0x01010101 = 0x01000101, then a= 0xff00ffff;
let (z.l>>k)&0x01010101 = 0x01010001, then b= 0xffff00ff;
let (o.l&0xff)=0xuv, then c= 0xuvuvuvuv;
then a&c=0xuv00uvuv;
b&c=0xuvuv00uv;
Consider element [i,j] of the result x. In this round, what value is accumulated into it by the or (xor) operation?
It is (the jth bit of the last byte of o.l) & (the ith entry of the last column of z) (do not consider the looping for now).
In this round, the 64 combinations of i and j contribute values to the 64 bits of x.
Notice that o loops over y from the last byte to the first byte, so there are 8 loops/rounds. In a later round,
say the kth round,
element [i,j] accumulates (the jth bit of the (k+1)th-from-last row of y) & (the ith entry of the (k+1)th-from-last column of z).
That means the jth column of y is multiplied by the ith row of z, which conforms to the definition of
z matrix_multiply y.
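As a final sanity check of the reading above, here is a hedged Java re-expression of bool_mult that folds o.h and o.l into a single 64-bit long; it is my own sketch, not MMIXware code:
// Mirrors bool_mult from mmix-arith.w on one 64-bit value.
public class BoolMult {
    static long boolMult(long y, long z, boolean xor) {
        long x = 0;
        for (int k = 0; k < 8; k++) {
            long yByte = (y >>> (8 * k)) & 0xffL;                 // byte k of y, counted from the low end
            if (yByte == 0) continue;
            long a = ((z >>> k) & 0x0101010101010101L) * 0xffL;   // bit k of every byte of z, expanded to a full byte
            long c = yByte * 0x0101010101010101L;                 // byte k of y replicated into all eight bytes
            if (xor) x ^= a & c;
            else x |= a & c;
        }
        return x;
    }

    public static void main(String[] args) {
        // The identity matrix (one bit per byte, on the diagonal) should leave the other operand unchanged.
        long identity = 0x8040201008040201L;
        System.out.println(Long.toHexString(boolMult(0x123456789abcdef0L, identity, false)));
    }
}
Multiplying by the identity pattern and getting the other operand back unchanged is a quick way to convince yourself that the routine really behaves like a matrix product.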
Games and mathematics are closely related; playing the nine linked rings recently made me feel this even more strongly.
I started playing the nine linked rings because Knuth's book mentions the relationship between Gray codes and the puzzle. To understand the algorithm for generating Gray codes, I bought the puzzle to play with; after all,
the description in the book is not as easy to grasp as actually playing it.
Through the game, I not only learned to solve the nine linked rings, but also mastered one algorithm for generating Gray codes.
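As an aside, the best-known formula is the binary-reflected Gray code; the sketch below may or may not be the exact algorithm the puzzle taught me, but it shows the defining property (successive values differ in exactly one bit):
public class GrayCode {
    public static void main(String[] args) {
        int bits = 4;
        for (int i = 0; i < (1 << bits); i++) {
            int gray = i ^ (i >>> 1);   // binary-reflected Gray code of i
            System.out.println(i + " -> " + Integer.toBinaryString(gray));
        }
    }
}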
A detailed reading of a piece of beautiful and tricky bitwise-operation code.
The following code is from MMIXware; it is used in implementing the wyde difference of two octabytes,
in the file "mmix-arith.w":
tetra wyde_diff(y,z)
tetra y,z;
{
  register tetra a=((y>>16)-(z>>16))&0x10000;
  register tetra b=((y&0xffff)-(z&0xffff))&0x10000;
  return y-(z^((y^z)&(b-a-(b>>16))));
}
It is hard to understand without some thinking and verification; here is the process I used
to check the correctness of this algorithm.
Let y = 0xuuuuvvvv;
z = 0xccccdddd; (please note that the [c]s may be different hex digits.)
then y>>16 = 0x0000uuuu;
z>>16 = 0x0000cccc;
then ((y>>16)-(z>>16)) = 0xffffgggg if #uuuu < #cccc, or
((y>>16)-(z>>16)) = 0x0000gggg if #uuuu >= #cccc;
so variable a = 0x00010000 if #uuuu < #cccc, or
variable a = 0x00000000 if #uuuu >= #cccc.
Similarly, we get
variable b = 0x00010000 if #vvvv < #dddd, or
variable b = 0x00000000 if #vvvv >= #dddd.
For (b-a-(b>>16)), there are four different results depending on the relation between a and b:
when #uuuu >= #cccc and #vvvv >= #dddd, (b-a-(b>>16)) = 0x00000000;
when #uuuu >= #cccc and #vvvv < #dddd, (b-a-(b>>16)) = 0x0000ffff;
when #uuuu < #cccc and #vvvv >= #dddd, (b-a-(b>>16)) = 0xffff0000;
when #uuuu < #cccc and #vvvv < #dddd, (b-a-(b>>16)) = 0xffffffff.
You can see that >= maps to #0000 and < maps to #ffff.
For y-(z^((y^z)&(b-a-(b>>16)))): when (b-a-(b>>16)) is 0x00000000, z^((y^z)&(b-a-(b>>16))) is
z^((y^z)&0) = z^0 = z, so y-(z^((y^z)&(b-a-(b>>16)))) = y-z.
Similarly, when (b-a-(b>>16)) is 0xffffffff, z^((y^z)&(b-a-(b>>16))) is
z^((y^z)&0xffffffff) = z^(y^z) = y, so y-(z^((y^z)&(b-a-(b>>16)))) = 0.
When (b-a-(b>>16)) is 0xffff0000 or 0x0000ffff, we can treat y and z as two separate wydes,
and each wyde in the result is correct.
You may think it is a little stupid to verify this kind of detail, but from my point of view,
without such detailed analysis I cannot understand the algorithm in the code. With hard
work like this, I finally understood it; the pleasure is worth the effort.
I wonder how the author discovered such an ingenious algorithm.
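To back up the pencil-and-paper check, here is a hedged brute-force comparison of the bit trick against a straightforward per-wyde saturating subtraction (my own sketch, assuming the routine is meant to compute the saturating wyde-wise difference):
import java.util.Random;

// Compares wyde_diff against a naive per-wyde max(difference, 0) on random inputs.
public class WydeDiffCheck {
    static int wydeDiff(int y, int z) {                  // mirrors the C code above
        int a = ((y >>> 16) - (z >>> 16)) & 0x10000;
        int b = ((y & 0xffff) - (z & 0xffff)) & 0x10000;
        return y - (z ^ ((y ^ z) & (b - a - (b >>> 16))));
    }

    static int naive(int y, int z) {                     // saturating subtraction, wyde by wyde
        int hi = Math.max((y >>> 16) - (z >>> 16), 0);
        int lo = Math.max((y & 0xffff) - (z & 0xffff), 0);
        return (hi << 16) | lo;
    }

    public static void main(String[] args) {
        Random r = new Random();
        for (int i = 0; i < 1000000; i++) {
            int y = r.nextInt(), z = r.nextInt();
            if (wydeDiff(y, z) != naive(y, z))
                throw new AssertionError(Integer.toHexString(y) + " " + Integer.toHexString(z));
        }
        System.out.println("all random cases agree");
    }
}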
Last night, around midnight, Ruirui crawled out from under his blanket and lay on top of it. As a result he coughed badly and threw up several times. After some loquat syrup he quickly fell asleep again.
Not long after, he started having a nightmare. Here is what he said in his sleep:
"Daddy's hand is gone"
"Daddy fell down"
"My dear daddy is gone, what am I going to do"
Probably he has been watching too many of the Hongen "GOGO Learns English" DVDs lately.
After reading <<MMIX: A RISC Computer for the New Millennium>>, I was inspired to create an MMIX simulator in Java.
Donald Knuth has already created a high-quality MMIX simulator in C, so why bother creating a new one in Java?
First, I want to learn more about how the computer works; I think re-implementing a simulator for MMIX can
help me gain a better understanding.
Second, I want to exercise my Java skills.
After about one month's work, I realized that I cannot finish it by myself, so I am looking for help.
If you are interested in MMIX and know Java, please give me a hand.
Currently I have finished most of the instructions, but some important and complex ones are not completed
yet.
I have developed a few JUnit test cases for some instructions, but they are far from covering all of them (there are 256 instructions in total).
A few of the sample MMIX programs in Donald Knuth's MMIXware package, such as cp.mmo and hello.mmo, can be
simulated successfully, but there is much more to support.
To help on this project, you first need access to the current source code, which is hosted on Google
Code. Please follow the steps below to access it.
Use this command to anonymously check out the latest project source code:
# Non-members may check out a read-only working copy anonymously over HTTP.
svn checkout http://mmix.googlecode.com/svn/trunk/ mmix-read-only
If you are willing to help, please comment on this blog with your email address.
There are many questions coming into my mind when I read Linux kernel books and source code. As time goes by, I become more knowledgeable than before and can answer some of those questions by myself; here is the first question I answered on my own.
Q: Why does the kernel have to map high memory into kernel space? Why not just allocate the high memory and map it only in the user process?
A: Because the kernel also needs to access the high memory before it returns the allocated memory to the user process. For example, the kernel must zero or otherwise initialize the page for security reasons. Please refer to Linux Device Drivers, page 9.
Q: Why not let the C library zero or initialize the page? That would save the kernel's effort and simplify the kernel.
A: Besides requesting memory through the C library, a user program can also request memory through a direct system call; in that situation the security would not be guaranteed, and the information left in the memory would be leaked.
9/26/2008 8:57AM
Today I want to survey the different ways to substitute text in a file. For the record, I am writing them down.
1. Use UltraEdit; it is super easy for a Windows user if you have UltraEdit installed.
Use Ctrl + R to get the Replace dialog and follow your intuition.
2. Use vi on Unix.
:s/xx/yy/
replaces the first xx with yy on the current line (use :%s/xx/yy/g to replace every occurrence in the whole file).
3. Use a filter, such as sed or awk, on Unix.
sed -e 's/xx/yy/g' file.in > file.out
replaces xx with yy on all lines. sed does not change the original input file, so I redirect the output to file.out.
1 WYSIWYM vs WYSIWYG
WYSIWYM stands for What You See Is What You Mean; WYSIWYG stands for What You See Is What You Get.
Microsoft Word is always cited as an example of WYSIWYG. Today I had a look at a tool named LyX, which is an example of WYSIWYM. From an end user's point of view, there are more similarities than differences between them.
They both display the resulting layout on the fly; they both provide a button to typeset the document.
The difference I can see between them is that LyX uses text files while Word uses binary files, but I don't think that matters much.
In my humble opinion, the real difference between Word and LyX/LaTeX is the following: in Word you typeset at a lower level, where you can control all the details but it also takes more effort; in LyX/LaTeX you typeset at a higher level, where you only need to figure out the logical structure of the document. The resulting layout is not decided by you; you simply reuse the layouts developed by experts. I think that is the key advantage of WYSIWYM.
Yesterday we found that the application could not send mail successfully, and the performance of the module using the email feature was also very bad. I suspected it was caused by the mail server host name not being resolvable on the application server.
I executed the following command:
host <mail server host name>
It showed a strange IP, which means the mail server host name could not be resolved properly.
Then I executed the command below:
man host
The output told me to look at /etc/resolv.conf,
so I opened it with
vi /etc/resolv.conf
The content was as follows:
nameserver <name server 1>
nameserver <name server 2>
I updated the config with the correct DNS server IPs, and
everything is OK now.
P.S. It seems that the ping and host commands resolve names differently; for some host names, I can ping them but cannot host them.
The reality is far from the ideal - the inelegance in operating systems
I am interested in operating systems. As I learn more and more concepts and details, I realize that the reality is far from the ideal. The root cause is history and, to some extent, backward compatibility: we cannot afford to build a brand new thing from scratch, so we have to carry many old things along into everything new.
Let me give some examples of how history has made current operating systems complicated and inelegant.
1. DMA
DMA stands for Direct Memory Access, which is a way to improve parallelism in a computer system. Basically, with DMA, a peripheral device can access main memory while the CPU is running. But for historical reasons, on the x86 platform some DMA devices have only 24 address lines, which limits their reach to 16M. Since the x86 platform also lacks an IO-MMU to remap addresses, the memory usable for such DMA is [0, 16M). This definitely complicates memory management.
2. High Memory
Since the Linux kernel has only 1G of linear address space, it cannot map all 4G of physical memory on a 32-bit machine. This is actually a design issue in Linux with historical roots: it did not anticipate that physical memory would some day become so large. Later, to support more than 1G of physical memory, the CONFIG_HIGHMEM compile option was added. There are other ways to fix this problem as well, such as the 4G kernel space / 4G user space split.
3. PAE
PAE stands for Physical Address Extension; it makes it possible to support up to 64G of physical memory. But to me it is just a temporary solution that does not deserve the effort; I do not even want to look at the corresponding documents. It does not make much sense; I would prefer to move directly to a 64-bit platform, although 64-bit platforms have their own problems.
The above are just a few examples of inelegance in hardware, mostly caused by history. I wonder how we can keep up rapid development under the burden of history; maybe at some point we will finally need to throw the history away and move on with a brand new start.
Virtual Memory Area
Virtual Memory Area is also called Memory Region in some books.
In a process address space there are many memory areas; contiguous addresses are divided into different memory areas if their access rights differ. For example, in one Java process there were 359 memory areas.
So the kernel needs an efficient way to insert into, remove from, and search the list of memory areas. The semantics of the find-area API are as follows (a small sketch follows this list):
Return null if
1. the list itself is empty, or
2. the list is not empty and the address is bigger than the last memory area.
Return the found area if
1. the address is within the region of one area, or
2. the address is not within any area but is not bigger than the last area;
that means it falls in a hole between areas, and the area to the right of the hole is returned.
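For illustration only, here is a minimal sketch (in Java rather than kernel C) of the described lookup semantics, assuming non-overlapping areas indexed by their end address; all the names are made up:
import java.util.Map;
import java.util.TreeMap;

// Areas are half-open intervals [start, end), kept in a TreeMap keyed by end address.
class AreaIndex {
    static final class Area {
        final long start, end;
        Area(long start, long end) { this.start = start; this.end = end; }
    }

    private final TreeMap<Long, Area> byEnd = new TreeMap<Long, Area>();

    void insert(Area a) { byEnd.put(a.end, a); }

    // Returns the first area whose end lies above addr: either the area containing
    // addr, or the area just to the right of the hole addr falls into.
    // Returns null when there are no areas or addr is beyond the last one.
    Area find(long addr) {
        Map.Entry<Long, Area> e = byEnd.higherEntry(addr);
        return (e == null) ? null : e.getValue();
    }
}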
The kernel tries to use as little resource as possible. Here is an example: originally, in kernel 2.4, the size of the kernel stack was 8K; now, in kernel 2.6, it can be 4K if you enable that option at compile time.
Why would the kernel spend effort on such a feature when most PCs have more than 1 gigabyte of memory? I think it has something to do with the C10K problem; C10K means 10 thousand concurrent processes (threads). In a system with more than 10 thousand processes, such as a web server, saving 4K on every kernel stack adds up to 4K * 10K = 40M of memory saved in total, which is a big deal!
How is that possible? Originally the kernel-mode stack was also used for exception and interrupt handling, but exception and interrupt handling is not specific to any process. So in 2.6, interrupts and exceptions have their own per-CPU stacks, and the kernel stack is only used by the process running in kernel mode; the actual kernel stack therefore did not really become smaller.
2.4: one 8K stack shared between process kernel mode, exceptions, and interrupts
vs.
2.6: a 4K stack for the process kernel-mode stack,
a 4K stack for the exception stack,
a 4K stack for the interrupt stack
Besides this, in the 8K stack of 2.4 the task_struct sits at the bottom of the stack and costs about 1K; in the 4K stack of 2.6 only the thread_info (about 50 bytes) is at the bottom of the stack, and the task_struct is put into a per-CPU data structure.
Here is just a high-level summary of my understanding of Linux kernel memory management. I think it can help in achieving a better understanding of the book <<Understanding the Linux Kernel>>.
It is said that memory management is the most complex subsystem in the Linux kernel, yet at the same time there are not many system calls for it, because most of the complex mechanisms, such as COW (Copy On Write) and demand paging, happen transparently to the user process. For a user process to successfully reference a linear memory address, the following factors are necessary:
the vm_area_struct (Virtual Memory Area, Memory Region) is set up correctly;
physical memory is allocated;
the Page Global Directory, Page Tables, and the corresponding entries are correctly set up according to the Virtual Memory Area and the physical memory.
These three factors can be further simplified as:
virtual memory,
physical memory,
and the mapping between virtual memory and physical memory.
From the user process's perspective, only virtual memory is visible: when a user process asks for memory, it gets virtual memory; physical memory may not have been allocated yet. All three factors are managed by the kernel and can be thought of as three resources it manages. The kernel not only needs to manage the virtual memory of the user address space, but also the virtual memory of the kernel address space.
When the user process tries to use its virtual memory but the physical memory has not been allocated yet, a page fault happens; the kernel takes charge of it, allocates the physical memory and sets up the mapping; the user process then re-executes the instruction and everything moves forward smoothly. This is called demand paging.
Besides that, there are many more concepts, such as memory mapping and non-linear memory mapping. I will continue this article when I dig into the details.
ps -H -A
can show the relationships between all the processes in a tree format. It is helpful when you want to research the internals of UNIX.
init
keventd
ksoftirqd/0
bdflush
kswapd
We can see from the above that all the processes are children of init (directly or indirectly); in particular, the kernel threads are also children of the init process.
Process 0 is special; it is not displayed.
From the following:
sshd
sshd
sshd
bash
vim
cscope
sshd
sshd
bash
ps
we can see how ssh works; actually I have created two ssh sessions to the server.
According to the following description in Xusage.txt:
-Xms<size> set initial Java heap size
-Xmx<size> set maximum Java heap size
java -Xms512M should allocate at least 512M of memory for Java, but when checked with top on Linux, the RSS and SIZE values are far smaller than 512M. My understanding is that when Java requests memory from the operating system it uses the mmap2 or old_mmap system calls, and neither of them actually allocates physical memory; they only allocate virtual memory. So the pre-allocated memory is only committed when it is actually used.
There is not much grammar to regular expressions; here is just an incomplete summary for future reference.
meta-character
. any character
| or
() grouping
[] character class
[^] negative character class
Greedy Quantifier
? optional
* any amount
+ at least one
lazy quantifier
??
*?
+?
possessive quantifier
?+
*+
++
position related
^ start of the line
\A
$ end of the line
\Z
\< start of the word
\> end of the word
\b start or end of the word
non-capturing group (?:Expression)
non-capturing atomic group (?>Expression)
positive lookahead (?=Expression)
negative lookahead (?!Expression)
positive lookbehind (?<=Expression)
negative lookbehind (?<!Expression)
\Q start quoting
\E end quoting
mode modifier
(?modifier)Expression(?-modifier)
valid modifier
i case insensitive match mode
x free spacing
s dot matches all match mode
m enhanced line-anchor match mode
(?modifier:Expression)
comments:
(?#Comments)
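A small, hedged Java illustration of a few of the constructs above (non-capturing group, lazy quantifier, lookahead, and an inline mode modifier); the sample text is made up:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {
    public static void main(String[] args) {
        String text = "price: 42USD, price: 7EUR";
        // (?:price:\s*) non-capturing group, \d+? lazy quantifier,
        // (?=USD) positive lookahead: capture digits only when followed by USD.
        Pattern p = Pattern.compile("(?:price:\\s*)(\\d+?)(?=USD)");
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println(m.group(1));                 // prints 42
        }
        // (?i) mode modifier: case-insensitive match mode.
        System.out.println("Hello".matches("(?i)hello"));   // prints true
    }
}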
Kernel memory mapping summary
Today I finally became clear about the relationships between
fixed mapping,
permanent kernel mapping,
temporary kernel mapping, and
noncontiguous memory area mapping.
(I feel that most of the names are not appropriate; for some of them the name will mislead the reader.)
The 4G linear virtual address space is divided into two major parts:
kernel space mapping [3G, 4G)
user space mapping [0, 3G)
The kernel space mapping is divided into more pieces:
linear mapping [3G, 3G + 896M)
non-linear mapping [3G + 896M + 8M, 4G)
1. Fixed mapping (a misleading name; it should be "compile-time mapping", since the virtual address is decided at compile time)
2. Temporary mapping
3. Permanent mapping
4. noncontiguous memory area mapping (Vmalloc area)
The following is the diagram for the reference.
FIXADDR_TOP (=0xfffff000)
fixed_addresses (temporary kernel mapping is part of it)
#define __FIXADDR_SIZE (__end_of_permanent_fixed_addresses << PAGE_SHIFT)
FIXADDR_START (FIXADDR_TOP - __FIXADDR_SIZE)
temp fixed addresses (used in boot time)
#define __FIXADDR_BOOT_SIZE (__end_of_fixed_addresses << PAGE_SHIFT)
FIXADDR_BOOT_START (FIXADDR_TOP - __FIXADDR_BOOT_SIZE)
Persistent kmap area (4M)
PKMAP_BASE ( (FIXADDR_BOOT_START - PAGE_SIZE*(LAST_PKMAP + 1)) & PMD_MASK )
2*PAGE_SIZE
VMALLOC_END (PKMAP_BASE-2*PAGE_SIZE) or (FIXADDR_START-2*PAGE_SIZE)
noncontiguous memory area mapping (Vmalloc area)
VMALLOC_START (((unsigned long) high_memory + 2*VMALLOC_OFFSET-1) & ~(VMALLOC_OFFSET-1))
high_memory = min(896M, physical memory size)
Below is an excerpt of the source code.
#ifdef CONFIG_X86_PAE
#define LAST_PKMAP 512
#else
#define LAST_PKMAP 1024
#endif
#define VMALLOC_OFFSET (8*1024*1024)
#define VMALLOC_START (((unsigned long) high_memory + \
2*VMALLOC_OFFSET-1) & ~(VMALLOC_OFFSET-1))
#ifdef CONFIG_HIGHMEM
# define VMALLOC_END (PKMAP_BASE-2*PAGE_SIZE)
#else
# define VMALLOC_END (FIXADDR_START-2*PAGE_SIZE)
#endif
enum fixed_addresses {
FIX_HOLE,
FIX_VDSO,
FIX_DBGP_BASE,
FIX_EARLYCON_MEM_BASE,
#ifdef CONFIG_X86_LOCAL_APIC
FIX_APIC_BASE, /* local (CPU) APIC) -- required for SMP or not */
#endif
#ifdef CONFIG_X86_IO_APIC
FIX_IO_APIC_BASE_0,
FIX_IO_APIC_BASE_END = FIX_IO_APIC_BASE_0 + MAX_IO_APICS-1,
#endif
#ifdef CONFIG_X86_VISWS_APIC
FIX_CO_CPU, /* Cobalt timer */
FIX_CO_APIC, /* Cobalt APIC Redirection Table */
FIX_LI_PCIA, /* Lithium PCI Bridge A */
FIX_LI_PCIB, /* Lithium PCI Bridge B */
#endif
#ifdef CONFIG_X86_F00F_BUG
FIX_F00F_IDT, /* Virtual mapping for IDT */
#endif
#ifdef CONFIG_X86_CYCLONE_TIMER
FIX_CYCLONE_TIMER, /*cyclone timer register*/
#endif
#ifdef CONFIG_HIGHMEM
FIX_KMAP_BEGIN, /* reserved pte's for temporary kernel mappings */
FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1,
#endif
#ifdef CONFIG_ACPI
FIX_ACPI_BEGIN,
FIX_ACPI_END = FIX_ACPI_BEGIN + FIX_ACPI_PAGES - 1,
#endif
#ifdef CONFIG_PCI_MMCONFIG
FIX_PCIE_MCFG,
#endif
#ifdef CONFIG_PARAVIRT
FIX_PARAVIRT_BOOTMAP,
#endif
__end_of_permanent_fixed_addresses,
/* temporary boot-time mappings, used before ioremap() is functional */
#define NR_FIX_BTMAPS 16
FIX_BTMAP_END = __end_of_permanent_fixed_addresses,
FIX_BTMAP_BEGIN = FIX_BTMAP_END + NR_FIX_BTMAPS - 1,
FIX_WP_TEST,
__end_of_fixed_addresses
}
scale up - vertical scaling
scale out - horizontal scaling
Scale out:
1. Use share-nothing clustering architectures.
Session failover cannot completely avoid errors when failures happen, as my earlier article mentioned, and it does damage performance and scalability.
2. Use scalable session replication mechanisms.
The most scalable one is paired-node replication; the least scalable solution is using a database as the session persistence storage.
3. Use collocated deployment instead of a distributed one.
4. Shared resources and services.
Database servers, JNDI trees, LDAP servers, and external file systems can be shared by the nodes in the cluster.
5. Memcached.
Memcached's magic lies in its two-stage hash approach. It behaves like a giant hash table of key = value pairs: give it a key, and set or get some arbitrary data. When doing a memcached lookup, the client first hashes the key against the whole list of servers; once it has chosen a server, the client sends its request, and that server does an internal hash lookup for the actual item data. (A small sketch of the client-side stage follows this list.)
6. Terracotta
Terracotta extends the Java Memory Model of a single JVM to include a cluster of virtual machines such that threads on one virtual machine can interact with threads on another virtual machine as if they were all on the same virtual machine with an unlimited amount of heap.
7. Use unorthodox approaches to achieve high scalability.
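To make the two-stage idea in item 5 concrete, here is a hedged sketch of the client-side stage (the names are made up, and real memcached clients use consistent hashing rather than a plain modulo):
import java.util.Arrays;
import java.util.List;

// Stage 1: the client hashes the key against the server list to pick a node.
// Stage 2 then happens on the chosen server: an internal hash-table lookup of
// the key to find the stored item (only described here, not implemented).
class TwoStageHashSketch {
    private final List<String> servers;
    TwoStageHashSketch(List<String> servers) { this.servers = servers; }

    String pickServer(String key) {
        int idx = Math.floorMod(key.hashCode(), servers.size());
        return servers.get(idx);
    }

    public static void main(String[] args) {
        TwoStageHashSketch client =
            new TwoStageHashSketch(Arrays.asList("cache1:11211", "cache2:11211"));
        System.out.println(client.pickServer("user:42"));   // which node holds this key
    }
}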
Today I ran into a strange Hibernate problem. (The Hibernate I am using is version 2.1, which is rather old; I do not know whether the problem still exists in Hibernate 3.)
Below is the captured exception stack:
java.lang.ClassCastException: java.lang.Boolean
at net.sf.hibernate.type.StringType.set(StringType.java:26)
at net.sf.hibernate.type.NullableType.nullSafeSet(NullableType.java:48)
at net.sf.hibernate.type.NullableType.nullSafeSet(NullableType.java:35)
at net.sf.hibernate.persister.EntityPersister.dehydrate(EntityPersister.java:393)
at net.sf.hibernate.persister.EntityPersister.insert(EntityPersister.java:466)
at net.sf.hibernate.persister.EntityPersister.insert(EntityPersister.java:442)
at net.sf.hibernate.impl.ScheduledInsertion.execute(ScheduledInsertion.java:29)
at net.sf.hibernate.impl.SessionImpl.executeAll(SessionImpl.java:2382)
at net.sf.hibernate.impl.SessionImpl.execute(SessionImpl.java:2335)
at net.sf.hibernate.impl.SessionImpl.flush(SessionImpl.java:2204)
The strange thing is that the program ran fine on my local Tomcat, but broke as soon as it was deployed to the Linux server.
After careful analysis, I found that the object being persisted defines both a get method and an is method for the same property. A simplified example:
public class FakePO {
String goodMan;
public String getGoodMan() {
return goodMan;
}
public void setGoodMan(String goodMan) {
this.goodMan = goodMan;
}
public boolean isGoodMan(){
return "Y".equalsIgnoreCase(goodMan);
}
}
I suspected that this derived helper method isGoodMan() caused the problem. By tracing the Hibernate 2 source code, I found that Hibernate 2 accesses the PO through the reflection API as follows:
private static Method getterMethod(Class theClass, String propertyName) {
Method[] methods = theClass.getDeclaredMethods();
for (int i=0; i<methods.length; i++) {
// only carry on if the method has no parameters
if ( methods[i].getParameterTypes().length==0 ) {
String methodName = methods[i].getName();
// try "get"
if( methodName.startsWith("get") ) {
String testStdMethod = Introspector.decapitalize( methodName.substring(3) );
String testOldMethod = methodName.substring(3);
if( testStdMethod.equals(propertyName) || testOldMethod.equals(propertyName) ) return methods[i];
}
// if not "get" then try "is"
/*boolean isBoolean = methods[i].getReturnType().equals(Boolean.class) ||
methods[i].getReturnType().equals(boolean.class);*/
if( methodName.startsWith("is") ) {
String testStdMethod = Introspector.decapitalize( methodName.substring(2) );
String testOldMethod = methodName.substring(2);
if( testStdMethod.equals(propertyName) || testOldMethod.equals(propertyName) ) return methods[i];
}
}
}
return null;
}
Reading the code above carefully, you can see that Hibernate simply walks through the class's declared methods and checks whether the name matches the property name; it does not check whether the method's return type matches the property's type. So in our example it may return either the get method or the is method, depending on the order of the method list, and that order is exactly what has no guarantee at all. This also explains why the problem only shows up on particular platforms.
Recently I have been reading the implementation of the write system call. Although some details are still unclear to me, I now have a rough understanding of the mechanism. A summary:
Assume the most common case here and ignore Direct IO. Seen as a whole, writing content to a file takes the following steps.
1. sys_write copies the content the user process wants to write into the kernel's page cache for that file; sys_write itself ends here.
2. The pdflush kernel threads (periodically, or triggered by kernel thresholds) flush the dirty page cache; in fact they only submit IO requests to the underlying driver.
3. The IO requests are not executed synchronously; they are scheduled by the underlying driver, which issues the DMA operations.
4. After the physical IO completes, an interrupt notifies the kernel, and the kernel updates the status of the IO.
I have to go put my son to bed now; I will flesh out each part when I have time.
The call path of sys_write (my Linux kernel version is 2.6.24, and the file system is ext3):
asmlinkage ssize_t sys_write(unsigned int fd, const char __user * buf, size_t count)
vfs_write(file, buf, count, &pos);
file->f_op->write(file, buf, count, pos);
Here file->f_op holds function pointers that are initialized when a file is opened; for the ext3 file system the corresponding function is do_sync_write.
Below are the key points of its implementation.
for (;;) {
300 ret = filp->f_op->aio_write(&kiocb, &iov, 1, kiocb.ki_pos);
301 if (ret != -EIOCBRETRY)
302 break;
303 wait_on_retry_sync_kiocb(&kiocb);
304 }
305
306 if (-EIOCBQUEUED == ret)
307 ret = wait_on_sync_kiocb(&kiocb);
filp->f_op->aio_write(&kiocb, &iov, 1, kiocb.ki_pos); is the core of the implementation; the function pointer points to ext3_file_write.
The purpose of line 307 is to wait for the IO to complete. "Complete" here only means the request has entered the IO queue, not that the physical IO has finished.
generic_file_aio_write(iocb, iov, nr_segs, pos);
__generic_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos);
generic_segment_checks(iov, &nr_segs, &ocount, VERIFY_READ);
generic_file_buffered_write(iocb, iov, nr_segs, pos,ppos,count,written);
generic_file_direct_IO(WRITE, iocb, iov, pos, *nr_segs);
The call sequence below these functions is still quite long, and I cannot digest it all at once; it is recorded here only for my own reference.
Recently I started reading some source code under Unix. Here are a few observations.
1. To do a good job, one must first sharpen one's tools.
At first I searched for keywords with a combination of find, xargs and egrep, and reading code that way was very inefficient. After installing ctags it became much more convenient. I had not installed ctags at first because I expected the installation to be a hassle; in fact it is easy, just a few steps, and a quick Google search sorted it out.
2. Practice promptly.
Although my initial way of reading code was rather clumsy, that drive was very useful; only by actually doing it can you make progress. Otherwise I would probably still be stuck at the stage of reading code printed in books.
3. Unix tools look less friendly than Windows tools. Actually that is not true; the entry barrier is just a bit higher, which is why most people, like me, did not dare touch them. Once past the threshold, you find that Unix tools are really small and powerful. Take reading source code with VIM + Ctags: the cost/benefit ratio is excellent, very much in the spirit of the 80/20 rule.