
Close reading of a paper - Application-Level Isolation and Recovery with Solitude

ZelluX — Wed, 28 May 2008 07:23:00 GMT

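A file-system-based security scheme. It works mainly by running untrusted programs in isolation, recording taint, and recovering after an incident.
My presentation: http://docs.google.com/Presentation?id=dcjk4xx7_473cv5ddgc8

For lack of time, the presentation does not cover the paper's solution for inter-process communication.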

Two papers I read recently

ZelluX — Tue, 20 May 2008 12:18:00 GMT

Summary: one paper introduces a brand-new Web architecture; the other covers ways to detect a virtual machine.

Reading notes - SubVirt: Implementing malware with virtual machines (2)

ZelluX — Tue, 06 May 2008 06:35:00 GMT

Summary: an attack technique that uses virtual machines (part 2)

Reading notes - SubVirt: Implementing malware with virtual machines (1)

ZelluX — Mon, 05 May 2008 13:53:00 GMT

Summary: an attack technique that uses virtual machines

Streamware ppt

ZelluX — Wed, 16 Apr 2008 06:57:00 GMT

To get anything done you really do have to stay away from the network. Reading the paper in bed, I understood it much faster.
I finally finished the slides for tonight's talk.

Weekly Report

ZelluX — Wed, 26 Mar 2008 09:01:00 GMT

Trying out Google Document

Debug log - SPEC2006 470.lbm

ZelluX — Mon, 24 Mar 2008 13:16:00 GMT

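A program that uses the Lattice Boltzmann Method to simulate an incompressible fluid in three-dimensional space; a schematic is at the bottom.
Porting this program really was exhausting -_-b

Brook itself has quite a few defects and bugs, and I am not used to the coding style of scientific computing programs, so most of the time went into fixing bugs.
The most satisfying bug to kill (that is the only way to put it >,<):
Every cell has a flag field. Its type is double, but the program treats it as an integer through a MAGIC_CAST macro.
Initially each cell's flag is ~f, that is, a double whose bits up to bit 28 are all 1 and whose bits 29~32 are 0. By the IEEE standard this should be a NaN.
On the CPU this causes no problem, but on the GPU it does: the GPU does not support this kind of cast, so any arithmetic on this double turns every result into NaN.
Solution:
Before passing the data to the GPU, convert these flag values into doubles the GPU can operate on. The simplest way is to convert each one to int first (with truncation), invert it, and then pass it to the GPU.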
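
The effect described above can be reproduced on a CPU with a small sketch. The bit pattern below (all 64 bits set) is an assumption chosen for illustration, not the actual flag value used by 470.lbm, and the program and variable names are hypothetical.

```fortran
program flag_nan_sketch
  use, intrinsic :: iso_fortran_env, only: int64, real64
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
  integer(int64) :: bits
  real(real64)   :: flag_as_double, safe_flag

  ! Hypothetical flag word with every bit set (not the real lbm value).
  bits = -1_int64

  ! Reinterpret the same 64 bits as a double, as a C-style cast macro would.
  flag_as_double = transfer(bits, flag_as_double)
  print *, 'raw flag is NaN:         ', ieee_is_nan(flag_as_double)
  print *, 'NaN survives arithmetic: ', ieee_is_nan(flag_as_double * 2.0_real64)

  ! Workaround in the spirit of the post: invert the integer and store it
  ! as an ordinary numeric value before handing it to the GPU.
  safe_flag = real(not(bits), real64)
  print *, 'converted flag is NaN:   ', ieee_is_nan(safe_flag)
end program flag_nan_sketch
```

An exponent field of all ones with a nonzero mantissa is a NaN under IEEE 754, which is why a flag that is harmless as an integer poisons every floating-point result once hardware treats it as a number.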


Reading notes

ZelluX — Sat, 15 Mar 2008 06:46:00 GMT

Summary: covers various papers, surveys, and workshop talks

GP-GPU reading notes (5)

ZelluX — Fri, 15 Feb 2008 11:51:00 GMT

Mars: A MapReduce Framework on Graphics Processors
by Bingsheng He @ Hong Kong Univ. of Sci. & Tech.
Naga K. Govindaraju @ Microsoft Corp.
Qiong Luo, Tuyong Wang @ Sina Corp.

Notes on some key points:
1. Introduction
Three challenges in implementing the MapReduce framework on the GPU:
First, the synchronization overhead in the run-time system of the framework must be low.
Second, a fine-grained load balancing scheme is required.
Third, the core tasks of MapReduce, including string processing, file manipulation and concurrent reads and writes, are unconventional to GPUs and must be handled efficiently.
Each thread is responsible for a Map or a Reduce task with a small number of key/value pairs as input.
Performance improvement: 1.5-16 times

2. Preliminary and Related Work
2.1. Graphics Processors
It is desirable to schedule the tasks between the CPU and the GPU to fully exploit their computation power.
Given a kernel program, the occupancy of the GPU is the ratio of active schedule units to the maximum number of schedule units supported on the GPU.
The GPU has a hardware feature called coalesced access to exploit the spatial locality of memory accesses among threads.

2.2. GPGPU
2.3. MapReduce
Map: (k1, v1) -> (k2, v2)*
Reduce: (k2, v2*) -> v3*

3. Design and Implementation
3.1. Design Goals
3.2. System Workflow and Configuration
3.3. APIs
3.4. Implementation Techniques
Based on this compilation information and the total computation resources on the GPU, we set the number of threads per thread group and the number of thread groups to achieve a high occupancy at run time.

4. Evaluation
4.1. Experimental Setup


GP-GPU reading notes (4)

ZelluX — Sun, 10 Feb 2008 08:13:00 GMT

4.2. Data Structures

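The GPU Memory Model
Data is usually stored in two-dimensional textures: first because a one-dimensional texture can hold very little, and second because current GPUs have trouble writing to a three-dimensional texture efficiently.
Iteration
The stream programming model includes an implicit parallel traversal.
Generalized Arrays via Address Translation
The data structures mainly used in GPGPU programming are random-access multidimensional containers, including sparse and dense arrays. Each structure defines a virtual grid domain and a physical grid domain, plus an address translator that converts between the two.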
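
A minimal sketch of such an address translator for a dense array, assuming a 3D virtual domain that is first flattened to 1D and then laid out row by row in a 2D texture. The sizes, the zero-based coordinates, and all names are hypothetical.

```fortran
program address_translation
  implicit none
  integer, parameter :: nx = 4, ny = 3, nz = 2   ! virtual 3D domain (hypothetical sizes)
  integer, parameter :: tex_w = 8                ! width of the physical 2D texture (hypothetical)
  integer :: i, j, k, lin, u, v

  ! Zero-based virtual coordinates of one element.
  i = 2; j = 1; k = 1

  ! Virtual 3D address -> 1D address.
  lin = i + nx * (j + ny * k)

  ! 1D address -> physical 2D address (column u, row v).
  u = mod(lin, tex_w)
  v = lin / tex_w

  print *, 'linear address:', lin
  print *, 'texture coords:', u, v

  ! The translation is invertible: the 2D address gives back the 1D one.
  print *, 'round trip ok: ', (v * tex_w + u == lin)
end program address_translation
```

The virtual domain is what the kernel author reasons about; the physical domain is whatever the texture hardware can store efficiently.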
4.2.1. Dense Arrays
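Multidimensional arrays are usually mapped to one dimension first, and then to two dimensions.
4.2.2. Sparse Arrays
These fall into two kinds, static and dynamic, depending on whether the positions and number of nonzero elements change.
4.2.3. Adaptive Structures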


GP-GPU reading notes (3)

ZelluX — Sat, 09 Feb 2008 05:14:00 GMT

4. GPGPU Techniques
4.1. Stream Operations
4.1.1. Map
Given a stream of data elements and a function, map will apply the function to every element in the stream.
4.1.2. Reduce
Sometimes a computation requires computing a smaller stream from a larger input stream, possibly to a single element stream. This type of computation is called a reduction. For example, computing the sum or maximum of all the elements in a stream.
On GPUs, reductions can be performed by alternately rendering to and reading from a pair of textures.
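
A CPU-side sketch of this ping-pong reduction, with the two columns of `buf` standing in for the pair of textures. The sum operator, the power-of-two length, and all names are assumptions for illustration.

```fortran
program pingpong_reduce
  implicit none
  integer, parameter :: n = 8        ! assumed to be a power of two
  real    :: buf(n, 2)               ! two buffers playing the role of the texture pair
  integer :: src, dst, len, i, tmp

  buf(:, 1) = (/ (real(i), i = 1, n) /)   ! input stream 1..8
  src = 1
  dst = 2
  len = n

  do while (len > 1)
     len = len / 2
     ! One "pass": each output element combines two inputs.
     ! The iterations are independent, so a GPU can run them in parallel.
     do i = 1, len
        buf(i, dst) = buf(2*i - 1, src) + buf(2*i, src)
     end do
     ! Swap roles: the buffer just written becomes the next input.
     tmp = src
     src = dst
     dst = tmp
  end do

  print *, 'sum =', buf(1, src)
end program pingpong_reduce
```

Each pass halves the stream, so n elements need log2(n) passes.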
In other words, divide and conquer: keep swapping the input and output data, and each pass shrinks the data by a fixed ratio.
4.1.3. Scatter and Gather
If the write and read operations access memory indirectly, they are called scatter and gather respectively.
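
The two operations can be sketched with Fortran vector subscripts. The arrays and the index vector are made up for illustration.

```fortran
program gather_scatter
  implicit none
  real    :: a(5), g(3), s(5)
  integer :: idx(3)

  a   = (/ 10.0, 20.0, 30.0, 40.0, 50.0 /)
  idx = (/ 5, 1, 3 /)

  ! Gather: read through an index vector, g(k) = a(idx(k)).
  g = a(idx)

  ! Scatter: write through an index vector, s(idx(k)) = g(k).
  ! The indices must be distinct for this assignment to be well defined.
  s = 0.0
  s(idx) = g

  print *, 'gathered: ', g
  print *, 'scattered:', s
end program gather_scatter
```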
4.1.4. Stream Filtering
This stream filtering operation is essentially a nonuniform reduction.
4.1.5. Sort
Classic sorting algorithms are data-dependent and generally require scatter operations.
The main algorithms are all related to sorting networks. There is also an adaptive sort, whose cost depends on how ordered the original sequence already is.
4.1.6. Search
4.2. Data Structures


GP-GPU reading notes (2)

ZelluX — Fri, 08 Feb 2008 08:05:00 GMT

2.4 GPU Program Flow Control
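The latest GPUs support several forms of branching, but because they are highly parallel by nature, these branches must be used with care.
2.4.1 Hardware Mechanisms for Flow Control
Three main implementations:
Predication: not a true data-dependent branch
MIMD branching
SIMD branching: all points execute the same instruction at the same time, so every point should take the same branch
2.4.2 Moving Branching Up The Pipeline
2.4.2.1 Static Branch Resolution
Static analysis, to avoid branches inside loops. The survey gives an example of solving a partial differential equation on a discrete spatial grid. I did not fully follow it, but roughly the loop is split into two parts.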
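
One way to read the loop-splitting idea, as a sketch rather than the survey's actual example: a 1D grid whose boundary points are copied and whose interior points are averaged. The grid values and names are hypothetical.

```fortran
program static_branch_resolution
  implicit none
  integer, parameter :: n = 6
  real    :: u(n), a(n), b(n)
  integer :: i

  u = (/ 1.0, 4.0, 9.0, 16.0, 25.0, 36.0 /)

  ! Version 1: the boundary test is evaluated at every grid point.
  do i = 1, n
     if (i == 1 .or. i == n) then
        a(i) = u(i)
     else
        a(i) = 0.5 * (u(i-1) + u(i+1))
     end if
  end do

  ! Version 2: the branch is resolved statically by splitting the work
  ! into a boundary part and a branch-free interior loop.
  b(1) = u(1)
  b(n) = u(n)
  do i = 2, n - 1
     b(i) = 0.5 * (u(i-1) + u(i+1))
  end do

  print *, 'same result:', all(a == b)
end program static_branch_resolution
```

On SIMD hardware the second form matters because the interior loop contains no branch that could diverge between neighbouring points.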
2.4.2.2 Pre-computation
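Sometimes the result of a branch stays constant over a period of time or over several loop iterations. In that case it only needs to be recomputed when it is known that the result will change.
2.4.2.3 Z-Cull
Modern GPUs have a set of techniques for avoiding work on pixels that will never be seen, and Z-cull is one of them. Put simply, Z-cull discards points that fail the depth test (those covered along the Z axis). In fluid simulation, marking the Z depth of land-locked obstacle cells as 0 lets the GPU skip the computation for those points.
2.4.2.4 Data-Dependent Looping With Occlusion Queries
Another technique for avoiding work on invisible points.

3 Programming Systems
GPU architectures evolve very quickly, so profiling and tuning have to be handled by the GPU vendors.
3.1 High-level Shading Languages
Cg, HLSL: very close to the underlying hardware
OpenGL Shading Language: has some features that do not map directly to hardware, such as integer support
Sh, Ashli, ...
3.2 GPGPU Languages and Libraries
The languages above all require the programmer to write code from the viewpoint of geometric primitives. The systems below try to abstract some GPGPU functionality and hide the underlying GPU implementation.
Brook: the thing I was dealing with a few weeks ago
Scout, Glift: never heard of either...
3.3 Debugging Tools
GPU debugging facilities are quite limited. A debugger has to be able to show debug information for many points at the same moment. One printf-style approach is to render the values directly to the screen (yikes, in GPGPU programming wouldn't that just garble the display >,<).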

GP-GPU reading notes (1)

ZelluX — Thu, 07 Feb 2008 08:31:00 GMT

Winter-break homework from the lab =_=
No.1
A Survey of General-Purpose Computation on Graphics Hardware
on EUROGRAPHICS 2005

1. Why GP-GPU?
1.1 Powerful and Inexpensive
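High memory bandwidth: Nvidia GeForce 6800 Ultra - 35.2GB/sec
Strong compute power: ATI X800 XT - 63GFLOPS, Intel Pentium4 SSE unit(3.7GHz) - 14.8GFLOPS
Cutting-edge process technology: the most recently announced GPU (as of the survey's publication) contains 300 million transistors, built on a 0.011-micron process
Rapid growth: the throughput of the GeForce 6800 is twice that of the 5900. GPU compute capability typically grows at an average yearly rate of 1.7x (pixels/second) to 2.3x (vertices/second), while by Moore's law the corresponding figure for CPUs is about 1.4x per year. As a rough estimate, GPU performance doubles every six months.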

1.2 Flexible and Programmable

1.3 Limitations and Difficulties
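The GPU's computing power rests on a highly specialized architecture, so many applications are a poor fit for it. Word processing, for example, is mostly memory traffic and is hard to parallelize.
Today's GPUs also lack some basic computing features, such as integer arithmetic. Many support only 32-bit floating point (the recent R670 instruction set apparently can handle the double type), which rules out a lot of scientific computing on the GPU.
Even for problems that do suit the GPU, actually using one raises plenty of difficulties. The GPU programming model is very different, and efficient GPU programming is not just a matter of learning one more high-level language. To tap GPU compute power today, programmers need both the relevant scientific computing knowledge and computer graphics knowledge. Even so, the performance gains a GPU offers remain very tempting.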
1.4 GPGPU Today
http://gpgpu.org
Some GPGPU applications:
Dense and sparse matrix multiplication (computation)
Multigrid and conjugate-gradient solvers for systems of partial differential equations (computation)
Ray tracing (image processing)
Photon mapping (image processing)
Fluid mechanics solvers (physical simulation)
Datamining operations (databases / data mining)

2. Overview of Programmable Graphics Hardware
2.1 Overview of the Graphics Pipeline
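Today's GPUs all use an architecture known as the graphics pipeline. The pipeline is divided into stages, and in hardware each stage is implemented in a task-parallel machine organization.
2.2 Programmable Hardware
GPU vendors have turned the fixed-function pipeline into a more flexible, programmable one, mainly in the geometry stage and the fragment stage. The formerly fixed operations are replaced by user-defined vertex programs and fragment programs.
In general, each programmable stage reads a limited number of 4*32-bit floating-point vectors and outputs a limited number of 4*32-bit floating-point vectors. Each programmable stage can access constant registers and can also read and write its own registers.
2.3 Introduction to the GPU Programming Model
A typical GPGPU program uses the fragment processor as its compute engine. The usual structure is:
a. The programmer identifies the parallel parts of the application. The application is divided into independent parallelizable sections, each treated as a kernel and implemented as a fragment program. The inputs and outputs of each kernel are one or more data arrays, stored as textures in GPU memory. In stream terms, the data in these textures forms a stream, and each element of a stream is processed separately by the kernel.
b. Before a kernel is invoked, the computation range must be set; the programmer can pass vertex data to the GPU. Note that GPU performance is limited when processing one-dimensional arrays.
c. The rasterizer generates one fragment per pixel.
d. Every fragment is processed by the same active kernel program. A fragment program can read arbitrary global memory, but can only write to the frame buffer location determined by the rasterizer. (I have not fully understood this part yet.)
e. The output of each fragment is a value or a vector. It can be the final result of the program, or be saved as a texture for later computation. Complex applications usually need several passes through the pipeline (multipass).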

ZelluX — Wed, 16 Jan 2008 03:58:00 GMT

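The remaining two weeks
My part is mainly Fortran -> IL.
The main questions:

After Fortran is lowered to High WHIRL, how do we emit IL?
    1. Look at brook and see whether its code can be reused
    2. Or try converting WHIRL directly to Brook IR, then call those routines to generate IL automatically?

How do we call CAL from Fortran?
    1. How does F77 call library functions?
    2. What is the call overhead?

Some optimization-related papers; CC has already collected a few
    1. Alan Leung on 6th Workshop on Compiler-Driven Performance
    2. RapidMind Development Platform
    3. LiquidSIMD

Other questions
    1. How do we control the tradeoff that decides whether to run something on the GPU? Or should it be controlled dynamically?

That is all I can think of for now. One step at a time.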

vectorization

ZelluX — Wed, 02 Jan 2008 11:07:00 GMT

Summary: http://wikipedia.answers.com/vectorization

ZelluX — Sun, 16 Dec 2007 13:03:00 GMT

This involves optimizing some programs from spec2006, orz
Posting some material


Fortran guide

Quick-start guide to Fortran

Some advice on learning Fortran

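2006-8-6
I assume everyone knows some C. Fortran is actually not very different from C.
Here is what I consider a reasonable way to learn Fortran quickly.
When learning Fortran you will run into Fortran 77, Fortran 90 and so on. The two differ little. I suggest learning Fortran 90 or later, which is freer in form (for everyday use, that is; its other advantages may not show), and this will also help when you later study program packages written in it.
Most of us learn Fortran in order to program and compute, not for its own sake. So my advice is not to learn Fortran the way you learned C, chewing through a detailed textbook from cover to cover. Everyone already has a decent C background, and few have the energy to study this in depth. It is better to read a short tutorial (I will attach some), pick up the basic statements, and then start by reading the simplest programs. You will soon get a feel for Fortran's format and can start writing your own programs. I suggest learning in this order:
1. Write some programs that only do input and output, then try combining input and output with files (reading data from a file, writing data to a file);
2. Then learn conditionals and loops; a few examples are enough to pick these up quickly;
3. After that, write subprograms, that is, calls between program units. By then, having seen my first example (PROGRAM A), you should be able to write a simple program with a subprogram call. At this point the basics are done and you can write structurally more complex programs;
4. Finally, learn how to compile multiple source files, or even mix languages (for example, compiling C and Fortran sources together). I am not familiar with compiling multiple files, so I leave that to comrade siriusbobo :-)
Looking up references and usage when you hit a problem while programming is a perfectly good approach; there is no need to learn everything up front. Of course, if you have the time and energy I strongly recommend reading a textbook properly and not rushing. A solid foundation is always a good thing.
Fortran's advantage over C lies in its rich resources; C's advantage may be that it is more flexible and compiles more efficiently. In my everyday use neither set of advantages and disadvantages really shows. My own feeling is that Fortran is closer to everyday scientific notation, somewhat stricter, less error-prone and closer to habit, and that declaring variables and functions is more convenient and flexible than in C. Take the use of an external subprogram as an example: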
******************************************************************************
PROGRAM A
real z
read *,z
call f(z)
y=z
print *,y
end
subroutine f(x)
x=x**2
return
end
******************************************************************************
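You only need to add a "subroutine" program unit, and the main program can invoke it with "call". You can also write several subroutines, and one subroutine can "call" another.
For everyday learning, besides writing subroutines, the other thing used most is reading and writing files: "read" to read and "write" to write, as follows: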
******************************************************************************
PROGRAM B
real x
open (1,file='in.dat',status='unknown')
open (2,file='out.dat',status='unknown')
read (1,100) x
100 format (1e12.7)
close(1)
write (2,200) x
200 format (1e15.8)
close(2)
end
******************************************************************************
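If "*" is used, the default form applies; see the help or related material for details. A good habit is to write a small test program at any time to check what you have learned or what you suspect.
In the program above, "100" and "200" are statement labels. Labels exist so that statements can be jumped to, which makes loops, conditional control and so on possible, but they are discouraged for the sake of structured programs. A jump using a goto statement and a statement label looks like this: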
******************************************************************************
PROGRAM C
integer n
real z
n=0
read *,z
1 call f(z)
y=z
n=n+1
if (n<10) goto 1
print *,y
end
subroutine f(x)
x=x**2
return
end
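******************************************************************************
Jumps like this are common in F77 and rare from F90 on. Labels such as "100 format (1e12.7)" are still used often, though: they describe the format of the data being read or written and can be placed anywhere in the program. See the documentation for details.
About comments:
Fortran comments start with "!" or "C". On Windows the compiler is usually "Compaq Visual Fortran", which supports two source forms: "Free Format", with ".f90" files, and "Fixed Format", with ".for" files. Both comment styles ("!" and "C") work only in ".for" files; ".f90" files accept only "!".
About the difficulties of learning:
Algorithms are the soul of a language, true, and the hardest part. But everyone has presumably learned C and met plenty of algorithms, and what can be implemented in C is not hard to implement in Fortran, so this "soul" is not the main topic here.
The data types of constants, variables and arrays, and the formatting of reads and writes for those types, are where mistakes often happen. Below are mainly the points I think need attention and the mistakes I have made or seen.
Like C, Fortran has integer (INTEGER), real (REAL) and double precision (REAL*8, REAL(8) or DOUBLE PRECISION) types, which matter a good deal in scientific computing. Taking reals as an example:
REAL is normally equivalent to REAL*4 or REAL(4), which is single precision;
double precision is written DOUBLE PRECISION in F77, and can be written REAL*8 or REAL(8) in F90. Double precision variables are essential in high-precision computation. An ordinary real constant can be written in decimal or exponent form; a double precision constant is written in exponent form with the exponent letter E changed to D, for example:
REAL: 100.0 or 1e2; in double precision it has to be written 1D2
Since Fortran does not require every variable to be declared, a statement like the following is sometimes placed at the top of each program or subprogram:
IMPLICIT DOUBLE PRECISION(A-H,O-Z)
It means that variables whose names begin with A-H or O-Z are double precision by default (when not declared); all others are integer. For example: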
******************************************************************************
PROGRAM D
IMPLICIT DOUBLE PRECISION(A-H,O-Z)
J1=1D-2
J2=-0.5D-1
x=J1+J2
print *,x
end
******************************************************************************
PROGRAM E
implicit double precision (A-I,O-Z)
double precision a,i,e1,e2
data j2 /0.87450547081842D-3/
data j3 /-0.11886910646016D-4/
data j5 /-0.17242068505339D-5/
data j7 /0.10566966079622D-6/
write(*,*) "please input a"
read(*,*) a
write(*,*) "please input i"
read(*,*) i
e1=(j3*sin(i)/(2*a*j2)-5*j5*sin(i)*(1-7*sin(i)**2/2+21*sin(i)**4/8)&
&/(2*a**3*(2-5*sin(i)**2/2))+35*j7*sin(i)*(1-27*sin(i)**2/4+99&
&*sin(i)**4/8-429*sin(i)**6/64)/(3*a**5*(2-5*sin(i)**2/2)))
e2=-(j3*sin(i)/(2*a*j2)-5*j5*sin(i)*(1-7*sin(i)**2/2+21*sin(i)**4/8)&
&/(2*a**3*(2-5*sin(i)**2/2))+35*j7*sin(i)*(1-27*sin(i)**2/4+99&
&*sin(i)**4/8-429*sin(i)**6/64)/(3*a**5*(2-5*sin(i)**2/2)))
write(*,"(E9.2E3)") e1,e2
stop
end
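******************************************************************************
The first program prints 0.000000000000000E+000 rather than -0.04.
The second program, for any input a and i, does not produce the expected result but prints NAN and NAN. On NAN: sometimes when an argument falls outside a function's domain, the program does not report an error at run time but prints NAN, and those are the places to check first. There are other causes too, but I have not met many, so I can only share what I know.
Both programs fail because variables beginning with J are not covered by the double precision default, yet they are assigned double precision values, so they are stored as integers and the results differ from what was expected. To get the expected results, either declare the J variables as REAL*8, or change
implicit double precision (A-I,O-Z) to
implicit double precision (A-J,O-Z).
Simply deleting the IMPLICIT statement does not help, because names beginning with J are INTEGER under the default rule anyway.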
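
A minimal corrected version of PROGRAM D, as a sketch: it declares every variable explicitly under `implicit none` instead of relying on an IMPLICIT range. The program name `d_fixed` is made up here.

```fortran
program d_fixed
  implicit none
  ! Explicit declarations: j1 and j2 are now double precision, not default INTEGER.
  double precision :: j1, j2, x

  j1 = 1d-2       !  0.01
  j2 = -0.5d-1    ! -0.05
  x  = j1 + j2    ! -0.04, up to rounding

  print *, x
end program d_fixed
```

With `implicit none` the compiler rejects any undeclared name, so a forgotten declaration becomes a compile-time error instead of a silent truncation to integer.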
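Arrays can be defined with DIMENSION. Note, however, that if nothing is declared at the top of the program (no implicit none), an array defined with DIMENSION whose name begins with a letter from I to N holds integers, and its values print as integers. The same happens for names outside the declared range after a statement such as implicit double precision (A-H,O-Z). For example: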
******************************************************************************
PROGRAM F
dimension m(2)
m(1)=1.5
m(2)=2.5
print *,m(1),m(2)
end
******************************************************************************
The output is "1, 2" rather than "1.500000, 2.500000".
If m is renamed to a in the program, the output is "1.500000, 2.500000".
So a better approach is to define the array with REAL (or REAL*8):
******************************************************************************
PROGRAM G
real m(2)
m(1)=1.5
m(2)=2.5
print *,m(1),m(2)
end
******************************************************************************
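Also, a variable can be assigned without being declared, but that leads to problems like those in PROGRAM D and E above. I therefore suggest declaring every non-integer variable. It is a nuisance, but it prevents mistakes, and this kind of error can trouble a beginner for a long time.
When declaring variables you will often see two styles. Taking REAL as an example, you can write
real m
or real:: m
The first style cannot assign an initial value in the declaration; it has to be written like this: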
******************************************************************************
PROGRAM H
real m
m=1.0
print *,m
end
******************************************************************************
The second style can:
******************************************************************************
PROGRAM I
real:: m=1.0
print *,m
end
******************************************************************************

Some free Fortran compilers
Free Fortran Compilers

Taken from http://www.thefreecountry.com/compilers/fortran.shtml
This page lists free Fortran compilers for various operating systems. Some of the compilers are compliant with the ANSI Fortran 77 specifications, others with Fortran 95, and so on. Some of them may also come complete with debuggers, editors and an integrated development environment (IDE).

If you need a book on Fortran, you may want to check out the selection of books available at Amazon.com.

Disclaimer

The information provided on this page comes without any warranty whatsoever. Use it at your own risk. Just because a program, book or service is listed here or has a good review does not mean that I endorse or approve of the program or of any of its contents. All the other standard disclaimers also apply.

Free Fortran Compilers and IDEs

Sun Studio Compilers and Tools

Sun Studio Compilers and Tools for Linux and Solaris OS on Sparc and x86/x64 platforms includes command line tools as well as a NetBeans-based IDE for developing, compiling and debugging C, C++ and Fortran programs. It also includes performance analysis tools.

Intel Fortran Compiler for Linux

The Intel Fortran Compiler for Linux is free for personal, non-commercial use (registration required). It features an optimizing compiler, the Intel Debugger (GUI and command-line), mixed language support (C and Fortran), full compliance with the ISO Fortran 95 standard, support for the evolving Fortran 2003 standard, multi-threaded application support (OpenMP and auto-parallelization), ability to handle big-endian data files, compatibility with various Linux tools (like make, gdb and Emacs), substantial compatibility with Compaq Visual Fortran, etc. The optimizing compiler supports interprocedural optimization, profile guided optimization, automatic vectorizer, etc.

G95

G95 is an open source Fortran 95 compiler. At the time this was written, most of the ISO Fortran 95 standard has been implemented. Platforms supported include Linux(x86, Intel IA64, AMD x86_64), Windows, Macintosh OS X, FreeBSD, Sparc Solaris and HP-UX.

Gfortran

gfortran is a Fortran 95 compiler. It runs on Linux and Windows (under cygwin).

Salford FTN95 Fortran 95 Compiler

Salford FTN95 is a Fortran 95 compiler that supports Fortran 77, Fortran 90 and Fortran 95. The compiler generates executables for Win32 (both Win32 console and GUI applications) and the Microsoft .NET framework. It comes with CHECKMATE, a tool that lets programmers check the correctness of their code at runtime. Also included is Plato 3 (an IDE), full source level debugging, documentation and examples. You may only generate code for your personal use on your home computer, and all executables will display a banner on execution.

Salford FTN77 PE ANSI Fortran 77 Compiler

The Salford FTN77 PE (Personal Edition) comes with a full optimising ANSI Fortran 77 compiler with support for various common extensions (including MIL-STD-1753), linker, libraries, make utility, librarian and a full screen debugger. The compiler has a built-in assembler for inline assembly, and the ability to link with code from other sources (such as C++ Fortran 90 and Fortran 95 code). It is free for personal use and for use by students. It supports Windows 95, 98 and NT.

Open Source Watcom / OpenWatcom Fortran Compiler

The Watcom (now OpenWatcom) Fortran 77 compiler is now available free of charge, complete with source code. This compiler, which generates code for Win32, Windows 3.1 (Win16), OS/2, Netware, MSDOS (16 and 32 bit), etc, was a well-known compiler some years back (until Sybase terminated it).

This system comes with the GNU G77 Fortran compiler (among other things, including a C/C++ compiler), which you can use to generate Win32 executables from F77 code. Like many systems based on the GNU tools, Mingw32 comes complete with various programming tools, such as a program maintenance program (ie, make), text processing tools (sed, grep), lexical analyser generator (flex), parser generator (bison), etc.

DJGPP GNU G77 (Fortran 77) for MSDOS

This is a development system based on the well-known GNU compiler system that includes compilers for Fortran 77, C, C++, Objective C, etc. It generates 32 bit MSDOS executables that are Windows 95 long-filename-aware. It is a very complete system with IDEs, graphics libraries, lexical analyser generators (flex), parser generators (bison), text processing utilities (like grep, sed), a program maintenance utility (ie, make), a dos extender, and so on. The compiler, utilities and libraries come with source code.

f2j - Fortran to Java Compiler

f2j translates Fortran 77 source code to Java class files. It is distributed under the GNU GPL and runs on Linux, SunOS/Solaris.

F2C - Fortran to C Translator

This is a well-known Fortran to C converter that comes with source code. The site also includes pre-compiled binaries (executables) for MSDOS and Microsoft Windows, although these are by no means the only systems supported - the compiler works on Unix systems like BSD, Linux, etc. You have to compile the compiler yourself on those systems. Libraries containing the runtime support needed (together with the C source code) are also included. You need a C compiler to generate binaries from your Fortran sources.

FORCE Project - Fortran Compiler and Editor

FORCE is actually just an IDE for Fortran 77 that integrates the GNU Fortran 77 compiler (G77).

Emx/Rsx G77 (GNU Fortran)

This is another GNU Fortran port. The RSX port compiles DOS extended console applications for Win32 and the EMX port generates MSDOS extended applications as well as OS/2 applications. The compiler supports the Fortran 77 syntax.

Lcc-Win32 Fortran Compiler

LCC-Win32 is primarily a free C compiler and its programming environment for Win32, but it also appears to have a Fortran compiler available for download from their website. It apparently compiles Fortran 77 code (with some common extensions) to C which is subsequently compiled by the C compiler to generate a Win32 native executable. The entire process is integrated seamlessly into the IDE so you might not even realise that intermediate C files were being generated (they are deleted automatically when they are no longer needed). The IDE supports syntax highlighting in C and Fortran.

Compaq Fortran for Linux Alpha

This Fortran compiler is for Linux Alpha systems only. It implements the full Fortran-95 language as well as a few language extensions. It comes with a debugger (ladebug), an extended maths library (the Compaq Extended Math Library, CXML) containing technical and scientific subroutines. The licence for the free version allows it to be used for personal and educational purposes, and prohibits its use in any commercial venture.

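My summary of basic Fortran usage

Author: gator

Contents:
I. Notes
II. Overview
III. Data types and basic input/output
IV. Flow control
V. Loops
VI. Arrays
VII. Functions
VIII. Files

I. Notes
Most of this text is my notes on Peng Guolun's "Fortran 95 Programming" (《Fortran 95 程序设计》). I only read as far as chapter 9, so this covers mainly chapters 1~9, all of it basic usage (the book has 16 chapters). The notes mostly record the places where Fortran differs from C, chiefly in syntax. I hope they help people who have learned C but never touched Fortran. If you want a clearer picture, I recommend the book itself; the author writes very well and very clearly. With a C background the first nine chapters should go quickly, a day or two. If you read this text patiently, you should be able to use the basic features without trouble. Also, I had never used Fortran before, and I read the book in a hurry to finish a document, so I did not have time to think most things through and simply followed the author's explanations. These notes are therefore still armchair theory. If anything is wrong, please point it out. Thank you!
In the code, everything after ! is a comment.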

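II. Overview
1. The name
Fortran=Formula Translator/Translation
The name tells you what is special about it: it translates text close to mathematical notation into machine language. Indeed, from the very beginning IBM designed it for numerical computation and scientific data processing, and its powerful array operations serve that goal. Fortran laid the foundation for the development of high-level languages, and it is now widely used in research and engineering.

2. Main Fortran versions and their differences
Over its history Fortran has had many compiler versions. The ones in wide use today are Fortran 77 and Fortran 90. Fortran 90 adds many useful features to Fortran 77 and improves the source layout of 77, so 90 is recommended for new code. Since many existing programs are available only in 77, you need to know some basics of 77, at least enough to read 77 programs. The differences in format are as follows.
Fortran 77: fixed format, source file extension .f or .for
(1) A line starting with C, c or * is treated as a comment;
(2) The first six characters of each line cannot hold program code. They can be left blank, or columns 1~5 can hold a numeric statement label (used for formatted input/output and so on); columns 7~72 hold the program code; everything from column 73 on is ignored;
(3) A line that is too long can be continued; the sixth character of the continuation line must be any character other than "0".
Fortran 90: free format, extension .f90
(1) "!" starts a comment;
(2) Each line can hold 132 characters; statement labels go at the very start of the line;
(3) & continues a line, placed at the end of the line or at the start of the next one.
Everything below is about Fortran 90.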

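3. Some features of Fortran, and some differences from C
There are many; they show up below as each topic is covered. Here are just a few:
(1) Case-insensitive.
(2) No semicolon is needed at the end of a statement.
(3) Spaces between tokens in the code carry no meaning.
(4) Unlike C, Fortran does not use { }.
(5) There are additional complex and logical data types. For example, complex:
    complex :: a     ! how to declare a complex number; complex numbers clearly make scientific computing easier and meet engineering needs
    a=(1.0,2.0)      ! a=1+2i
(6) There is an exponentiation operator (**). The exponent can be a real number as well as an integer, for example square root and cube root: a=4.0**0.5, a=8.0**(1.0/3.0).
(7) Arrays support whole-array operations, and it is easy to operate on a subset of the elements.
(8) In some cases you can declare an array whose size is determined later, which is very practical.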

4. Basic structure of a Fortran program
First, the so-called "Hello Fortran" program.
    program main            ! program start; main is the name of the program, entirely your choice
    write(*,*) "Hello"      ! main body
    stop                    ! terminate the program
    end [program [main]]    ! end closes the code and marks it as complete. The parts in [ ] may be omitted; same below.
Next, a more practical program, to get a feel for the language. It computes the surface area of a cylinder and asks for the base radius and the height, showing some characteristic Fortran usage along the way. The program is taken from Wikipedia, or rather from a page on www.answers.com that quotes Wikipedia. The site is worth a look; you can find plenty of interesting things there.
    program cylinder      ! give the main program a name
    ! Calculate the area of a cylinder.
    ! Declare variables and constants.
    ! constants=pi
    ! variables=radius squared and height
    implicit none         ! Require all variables to be explicitly declared
                          ! this is almost always written; explained further below
    integer :: ierr
    character :: yn
    real :: radius, height, area
    real, parameter :: pi = 3.1415926536    ! how a constant is declared
    interactive_loop: do    ! do loop; Fortran loops can carry a label, here interactive_loop

    ! Prompt the user for radius and height
    ! and read them.
    write (*,*) 'Enter radius and height.'    ! screen output
    read (*,*,iostat=ierr) radius,height      ! keyboard input; the value of iostat tells whether the read succeeded
    ! If radius and height could not be read from input,
    ! then cycle through the loop.
    if (ierr /= 0) then
    write(*,*) 'Error, invalid input.'
    cycle interactive_loop    ! cycle corresponds to continue in C
    end if
    ! Compute area. The ** means "raise to a power."
    area = 2 * pi * (radius**2 + radius*height)    ! exponentiation is more convenient than in C
    ! Write the input variables (radius, height)
    ! and output (area) to the screen.
    write (*,'(1x,a7,f6.2,5x,a7,f6.2,5x,a5,f6.2)') &    ! "&" continues the line; this also shows formatted output
    'radius=',radius,'height=',height,'area=',area
    yn = ' '
    yn_loop: do             ! another, nested do loop
    write(*,*) 'Perform another calculation? y[n]'
    read(*,'(a1)') yn
    if (yn=='y' .or. yn=='Y') exit yn_loop
    if (yn=='n' .or. yn=='N' .or. yn==' ') exit interactive_loop
    end do yn_loop          ! end of the nested do loop
    end do interactive_loop
    end program cylinder
That is the main structure of a Fortran program. Usually there are also module parts before the main program, and functions after it.

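III. Data types and basic input/output
1. Data types, declarations and initial values
(1) integer: short integer kind=2, long integer kind=4
    integer([kind=]2) :: a=3
If declared as integer:: a, the default is the long integer.
    ! "::" is required when a declaration also assigns an initial value, and when an attribute follows the type name; otherwise it may be omitted
    ! An example of an attribute: declaring a constant with
    real, parameter :: pi=3.1415926
    ! Here parameter is the attribute.
(2) real: single precision kind=4 (default), double precision kind=8
    real([kind=]8) :: a=3.0
There is also an exponent form: 1E10 is single precision, 1D10 is double precision.
(3) complex: single and double precision
    complex([kind=]4) b
(4) character
    character([len=]10) c     ! len is the maximum length
(5) logical
    logical*2 :: d=.true.     (equivalent to logical(2)::d=.true.)
(6) User-defined type: type, similar to struct in C
In Fortran 77, initial values are usually given with the DATA statement, which can initialize several variables at once:
    data a,b,string /1, 2.0, 'fortran'/
Unlike C, a Fortran variable can be used without being declared, because it has a default type (governed by the implicit statement). By default, variables whose names begin with i, j, k, l, m or n are integer and the rest are real. To turn this off, put implicit none before the declarations. Peng Guolun recommends always using it.
Another difference in declarations is that Fortran has "equivalence":
    integer a,b
    equivalence(a,b)
This makes a and b share the same memory. It can save memory, and sometimes shortens code: after equivalence(a variable with a very long name, such as one element of a three-dimensional array, a), writing the program in terms of a is much more concise.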

2. Basic input/output
Input:
    read(*,*) a           ! read from the keyboard
Output:
    write(*,*) "text"     ! print to the screen. Fortran 77 uses 'text'; in Fortran 90 either " or ' generally works
    print *,"text"        ! can only print to the screen
(*,*) written in full is (unit=*,fmt=*). unit is the input/output location, such as the screen or a file; fmt is the format. If both are *, the defaults described above are used. The * after print means the default output format.

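IV. Flow control
1. Operators
(1) Relational operators
    == /= > >= < <=                      ! Fortran 90 usage
    .EQ. .NE. .GT. .GE. .LT. .LE.        ! Fortran 77 usage
(2) Logical operators that combine conditions
    .AND. .OR. .NOT. .EQV. .NEQV.
    ! Only .NOT. takes a single expression; the others need an expression on both sides (a variable of type logical will do)
    ! .EQV. is true when both sides have the same logical value; .NEQV. is true when they differ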

2. IF
(1) Basic form:
    if (logical expression) then
    ……
    end if
If only one statement follows then, it can be written as
    if (logical expression) ……     ! then and end if can be omitted
(2) Multiple conditions:
    if (condition 1) then
    ……
    else if (condition 2) then
    ……
    else if (condition 3) then
    ……
    else
    ……
    end if
(3) Nesting:
    if (logical expression) then
      if (logical expression) then
        if (logical expression) then
        else if (logical expression) then
        ……
        else
        ……
        end if
      end if
    end if
(4) Arithmetic IF:
    program example
    implicit none
    real c
    write (*,*) "input a number"
    read (*,*) c
    if(c) 10,20,30     ! 10, 20 and 30 are statement labels; control goes to label 10/20/30 when c is less than/equal to/greater than 0
    10 write (*,*) "A"
    goto 40            ! goto can jump to any label before or after it, but overuse wrecks the program structure
    20 write (*,*) "B"
    goto 40
    30 write (*,*) "C"
    goto 40
    40 stop
    end

3. SELECT CASE
Similar to the switch statement in C
    select case(variable)
    case(value 1)       ! e.g. case(1:5) runs this block when 1<=variable<=5
    ……                  ! case(1,3,5) runs this block when the variable equals 1, 3 or 5
    case(value 2)       ! the values in parentheses must be integer, character or logical constants, not real
    …
    case default
    ……
    end select

4. PAUSE, CONTINUE
pause suspends execution; press enter to continue.
continue seems to do nothing useful; it can serve as a marker that closes a block of code.

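V. Loops
1. DO
    do counter=initial, final, step     ! counter runs from initial to final in increments of step
    ……                                  ! each value of counter is one iteration; step defaults to 1 if omitted
    ……
    ……                                  ! the loop body does not need { }
    ……
    end do
Fortran 77 does not terminate the loop with end do; it looks like this instead:
    do <label of the last line of the loop> counter=initial, final, step
    ……
    <label> ……                          ! this is the last line of the do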

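2. DO WHILE
    do while(logical expression)
    ……
    ……
    end do
Similar to while(logical expression) {……} in C.
The loop in the cylinder surface-area program at the beginning is probably of this kind too, except that it controls the loop through an if statement inside the body. That evidently works, though I did not see it written that way in the book. It can probably also be counted as the form below.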

3. I did not find a loop statement corresponding to C's do{……}while(logical expression);, but the following guarantees at least one iteration:
    do while(.true.)
    ……
    ……
    if(logical expression) exit     ! exit is like break in C; C's continue is cycle in Fortran
    end do

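4. A Fortran speciality: named loops. They can be written like this, which is less error-prone:
    outer: do i=1,3
    inner: do j=1,3
    ……
    end do inner
    end do outer
And like this, which is very convenient:
    loop1: do i=1,3
    loop2: do j=1,3
    if(i==3) exit loop1      ! exit terminates the whole loop loop1
    if(j==2) cycle loop2     ! cycle skips the rest of this iteration of loop2 and starts the next one
    write(*,*) i,j
    end do loop2
    end do loop1
There are also some loop forms used mainly for array operations. They are specific to Fortran and very practical.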

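VI. Arrays
1. Array declarations
Unlike C, Fortran writes array indices inside ( ), and a higher-dimensional array still uses a single pair of ( ), for example
    integer a(5)       ! declare a one-dimensional integer array
    real :: b(3,6)     ! declare a two-dimensional real array
The type can be integer, real, character, logical or type. Up to 7 dimensions are allowed.
The array size must be a constant. Unlike C, however, Fortran also has a way to use arrays of variable size:
    integer, allocatable :: a(:)     ! declare an array of variable size
Once the required size has been obtained in some way, use the statement
    allocate(a(size))                ! allocate memory
After that the array behaves exactly like one declared in the ordinary way.
Unlike C, Fortran indices start at 1 by default, and this can be changed in the declaration:
    integer a(-3:1)          ! indices are -3, -2, -1, 0, 1
    integer b(2:3,-1:3)      ! b(2~3,-1~3) are the usable elements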

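2. Array storage in memory
Unlike C, a Fortran array such as a(2,2) is stored in memory in the order a(1,1),a(2,1),a(1,2),a(2,2). The rule is that the lower dimension varies first, then the higher dimension. This rule is called column major.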
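
A short sketch that makes the storage order visible. The 2x3 shape and the names are arbitrary choices for illustration.

```fortran
program column_major_demo
  implicit none
  integer :: a(2,3), flat(6), i, j

  ! reshape fills the array in storage order, so the values 1..6
  ! land in a(1,1), a(2,1), a(1,2), a(2,2), a(1,3), a(2,3).
  a = reshape((/ (i, i = 1, 6) /), (/ 2, 3 /))

  print *, 'a(1,1), a(2,1), a(1,2):', a(1,1), a(2,1), a(1,2)

  ! Flattening the array again recovers the storage order.
  flat = reshape(a, (/ 6 /))
  print *, 'storage order:', flat

  ! For contiguous memory access the first index belongs in the inner loop,
  ! the opposite of the usual C habit.
  do j = 1, 3
     do i = 1, 2
        a(i,j) = a(i,j) * 10
     end do
  end do
  print *, 'scaled:', a
end program column_major_demo
```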

3. Initial values
(1) The most ordinary way:
    integer a(5)
    data a /1,2,3,4,5/
or
    integer :: a(5)=(/1,2,3,4,5/)
With integer :: a(5)=5, all 5 elements are 5.
For a two-dimensional array, the values are assigned in storage order. In standard Fortran the one-dimensional list has to be reshaped first:
    integer :: a(2,2)=reshape((/1,2,3,4/),(/2,2/))
which is equivalent to a(1,1)=1,a(2,1)=2,a(1,2)=3,a(2,2)=4
(2) Using a Fortran speciality, the implied loop. An example makes it clear.
    integer a(5)
    integer i
    data (a(i),i=2,4)/2,3,4/     ! (a(i),i=2,4) means i loops from 2 to 4 with the default step 1
It can also be written like this:
    integer i
    integer :: a(5)=(/1,(2,i=2,4),5/)     ! the five elements are 1, 2, 2, 2, 5
    integer :: b(5)=(/(i, i=1,5)/)        ! the five elements are 1, 2, 3, 4, 5
Implied loops can be nested:
    data ((a(i,j),i=1,2),j=1,2) /1,2,3,4/     ! a(1,1)=1,a(2,1)=2,a(1,2)=3,a(2,2)=4

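4. Operating on whole arrays
Let a and b be arrays of the same type, rank and size
    a=5             ! every element is set to 5
    a=(/1,2,3/)     ! assuming a is one-dimensional: a(1)=1,a(2)=2,a(3)=3
    a=b             ! element-by-element assignment; a, b, c must have the same rank and size, here and below
    a=b+c
    a=b-c
    a=b*c
    a=b/c
    a=sin(b)        ! intrinsic functions can all be used this way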

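5. Operating on part of an array
a is a one-dimensional array
    a(3:5)=(/3,4,5/)     ! a(3)=3,a(4)=4,a(5)=5
    a(1:5:2)=3           ! a(1)=3,a(3)=3,a(5)=3
    a(3:)=5              ! a(3) and every element after it are set to 5
    a(1:3)=b(4:6)        ! in assignments like this both sides must have the same number of elements
    a(:)=b(:,2)          ! a(1)=b(1,2),a(2)=b(2,2), and so on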
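
The section forms listed above can be tried out in a small runnable sketch. The array sizes and values are made up for illustration.

```fortran
program array_sections
  implicit none
  integer :: a(6), b(6,2), i

  a = 0
  a(1:5:2) = 3      ! strided section: a(1), a(3), a(5)
  a(4:) = 5         ! open-ended section: a(4), a(5), a(6)

  b(:,1) = (/ (i,    i = 1, 6) /)
  b(:,2) = (/ (10*i, i = 1, 6) /)

  ! Both sides are sections of three elements, so the shapes conform.
  a(1:3) = b(4:6, 2)

  print *, a
end program array_sections
```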

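6. WHERE
where looks like if but is used only for setting arrays. Let a and b be two arrays of the same type, rank and size
    where(a<3)
    b=a                                 ! elements of a smaller than 3 are assigned to the corresponding elements of b
    end where
Another example:
    where(a(1:3)/=0) c(1:3)=a(1:3)      ! end where is omitted because only one statement follows
where can be nested, and like a do loop it can carry a name label.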

7. FORALL
A bit like the for loop in C:
    forall(triplet1[,triplet2 [,triplet3…]],mask)
where a triplet has the form i=2:6:2 and describes a loop; if the last number is omitted, the step is 1.
For example:
    forall(i=1:5,j=1:5,a(i,j)<10)
    a(i,j)=1
    end forall
Another example:
    forall(i=1:5,j=1:5,a(i,j)/=0) a(i,j)=1/a(i,j)
forall can also be nested, like nested for loops in C.

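VII. Functions
Fortran has two kinds of procedures: subroutines (subroutine) and user-defined functions (function). A user-defined function is essentially a mathematical function: you pass it arguments and it returns the function value. A subroutine need not work that way and can have no return value. Argument types must match when passing arguments, the same as in C.
1. Subroutines
Purpose: to separate out a frequently used piece of code with a specific job so that it can be called conveniently.
By convention subroutines are placed after the end of the main program.
Form:
    subroutine name (parameter1, parameter2)
    ! Give the subroutine a meaningful name. Arguments can be passed, which is how results are returned. The parentheses can also be empty, meaning no arguments.
    implicit none
    integer:: parameter1, parameter2     ! the types of the received arguments must be declared
    ……                                   ! from here on, writing the code is no different from the main program
    ……
    return     ! unlike C, this means control returns to the caller, which continues with the statements after the call. It need not be at the end; it can be placed elsewhere in the subroutine with the same effect, and the code after return is not executed.
    end [subroutine name]
Calling: use the call statement directly; no declaration is needed. At the call site write:
    call name(parameter1,parameter2)
Notes:
a. Subroutines can call each other. Just call directly, as the main program would.
b. Argument passing works differently from C. Fortran uses call by address/reference: the argument passed and the argument received in the subroutine share the same address, even if their names differ. So if the subroutine changes the value of a received argument, the argument that was passed changes too.
c. Variables defined inside each subroutine are independent, as in C. Statement labels are independent too. So identical variable names or labels in different subroutines and the main program do not affect each other.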

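2. User-defined functions

The obvious difference from a subroutine is that a function must be declared in the main program before it can be used. The calling syntax differs too. By convention, a function does not change the values of its arguments; if the passed arguments need to be changed, a subroutine is normally used instead.

Declaration:

    real, external :: function_name

User-defined functions are generally also placed after the main program.

Form:

    function function_name(parameter1, parameter2)
    implicit none
    real :: parameter1, parameter2   ! declare the argument types; this is required
    real :: function_name            ! declare the return type; this is required
    ……
    ……
    function_name=….                 ! the expression for the return value
    return
    end

The return type can also be declared directly, which is more concise:

    real function function_name(parameter1, parameter2)
    implicit none
    real :: parameter1, parameter2   ! this is still required
    ……
    ……
    function_name=….                 ! the expression for the return value
    return
    end

Calling: function_name(parameter1,parameter2). No call statement is needed.

User-defined functions can call each other; the callee must be declared beforehand in that case as well.

In short: a user-defined function must be declared before it is called, a subroutine need not be.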

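3. Variables in procedures

(1) Make sure the types match. Fortran even allows numeric constants to be passed, but the result is only as expected when the constant's type matches the declared argument type. For example, call ShowReal(1.0) must be written with 1.0 and not 1.

(2) Array arguments are passed by address, as in C, but the address need not be that of the first element; it can be the address of any chosen element. Given an array a(5), call function(a) passes the address of a(1), while call function(a(3)) passes the address of a(3).

(3) When a multidimensional array is a procedure argument, the opposite of C applies: the size of the last dimension may be omitted, while all other dimensions must be given. This follows from Fortran's column-major storage order.

(4) Inside a procedure, an array that is a dummy argument may be declared with a variable as its size, or even with the size left unspecified. For example:

    subroutine Array(num,size)
    implicit none
    integer :: size
    integer num(size)   ! the array size is determined by an argument passed in, which is very practical
    ……
    ……
    return
    end

(5) The save attribute keeps the value of a local variable after the call, so that on the next call the variable still holds the value saved last time. Just add save to the declaration:

    integer, save :: a=1

(6) Procedures themselves can be passed as arguments (user-defined functions, library functions and subroutines all work), similar to function pointers in C. The procedure being passed must be declared both in the main program and in the procedure that calls it:

    real, external :: function   ! user-defined function
    real, intrinsic :: sin       ! library function
    external sub                 ! subroutine

(7) Interfaces (interface): a program block describing a procedure. It is required in these cases:

a. the function returns an array
b. arguments are passed by specifying their position (keyword arguments)
c. the called procedure takes a variable number of arguments
d. pointer arguments are passed
e. the function returns a pointer

The exact usage is easy to follow with examples, but the examples are all long, so consult a book.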

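4. Global variables

Their purpose needs no explanation. How they work: variables are matched by their relative position in the declaration, not by name as in C.

If the main program declares:

    integer :: a,b
    common a,b   ! this is how global variables are defined

and a subroutine or user-defined function declares:

    integer :: c,d
    common c,d

then a and c share the same memory, and so do b and d.

A large number of global variables becomes tedious. They can be grouped by hand: just add a block name after common in the declaration, e.g. common /group1/ a and common /group2/ b. Then not every global has to be listed where it is used; declaring common /group1/ c is enough to reach the global shared by a and c.

A block data program unit can be used for initialization. The data statement mentioned earlier cannot be applied directly to global variables in the main program or in procedures. They can be assigned values individually there; to use data, it has to be written like this:

    block data [name]
    implicit none
    integer a,b,c
    real d,e
    common a,b,c
    common /group1/ d,e
    data a,b,c,d,e /1,2,3,4.0,5.0/
    end [block data [name]]

Note that the Fortran standard only permits variables in named common blocks to be initialized this way; initializing the blank common block (a, b, c here) relies on compiler tolerance.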

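5. Module

A module is not a procedure. It is used to package program units, usually grouping procedures and variables with related functionality. It is simple to use but very convenient and makes programs more concise: for example, global variables no longer have to be declared in a long list every time; they are written in a module and simply used. Modules are generally written before the main program.

Form:

    module module_name
    ……
    ……
    end [module [module_name]]

Usage: in the main program or a procedure, add the line use module_name before the declarations.

Procedures in a module must come after the contains statement (write contains on a line of its own and put the procedures below it; all of them go after this one contains). Variables defined in the module can be used directly in the module's procedures, the procedures can call each other directly, and even user-defined functions in the module need no prior declaration when called.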

6. include

Can be placed wherever needed to insert another file (normally expected in the same directory). For example:

    include 'funcion.f90'

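VIII. Files

1. Text files

Fortran has two ways of accessing files, corresponding to two kinds of files:

Sequential access: used for text files
Direct access: used for binary files

Only text file access is covered here. The general pattern is:

    character(len=20) :: filenamein="in.txt", filenameout="out.txt"   ! file names
    logical alive
    integer :: fileidin=10, fileidout=20
    ! 10 and 20 are the unit numbers given to the files. Any positive integer except 1, 2, 5 and 6 will do,
    ! because 2 and 6 are the default output (screen) and 1 and 5 the default input (keyboard).
    integer :: error
    real :: in, out
    ! The following checks whether a file with the given name exists.
    inquire(file=filenamein, exist=alive)   ! alive is set to .true. if the file exists
    if(.NOT. alive) then
      write(*,*) trim(filenamein), " doesn't exist."   ! trim removes the trailing blanks of filenamein
                                                       ! so that the output looks tidy
      stop
    end if
    open([unit=]fileidin, file=filenamein, status="old")
    open([unit=]fileidout, file=filenameout[, status="new"])
    ! unit specifies where input/output goes. An existing file must be opened with status="old",
    ! a new file with status="new". If status is not given, the default is status="unknown",
    ! which overwrites an existing file or opens a new one.
    ……
    read([unit=]fileidin, [fmt=]100, iostat=error) in   ! error=0 means the data was read correctly
    100 format(1X,F6.3)
    ! Formatted input/output: the format can be written separately under a statement label,
    ! or directly inside read/write.
    write([unit=]fileidout, "(1X,F6.3)") out
    close(fileidin)
    close(fileidout)
    ! 1X stands for one blank. F6.3 means a real value takes 6 characters (including the decimal point),
    ! three of them after the decimal point.
    ! Also common: I3 for integers, three characters wide; A8 for character data, 8 characters wide.
    ! A new line is written with /

Binary files are read differently and are not covered here.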

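2. Internal files

Another very practical input/output feature is the internal file. This example shows the idea:

    integer :: a=1, b=2
    character(len=20) :: string
    write(unit=string, fmt="(I2,'+',I2,'=',I2)") a, b, a+b
    write(*,*) string

The output is 1+2=3, with each number right-aligned in a two-character field because of I2. The reverse direction works as well:

    integer a
    character(len=20) :: string="123"
    read(string,*) a
    write(*,*) a

which prints 123.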

! End of article.

ZelluX 2007-12-16 21:03

Sampling

ZelluX — Fri, 14 Dec 2007 05:42:00 GMT

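The CAL sample programs use the sample instruction a lot. Here is a brief introduction found via Google:

Antialiasing

Shrinking the pixels makes an image finer and reduces jagged edges to some extent, but as long as pixels are large enough to be told apart, aliasing is unavoidable. Antialiasing methods generally rely on sampling at multiple points (note "points", not "pixels"; the difference becomes clear below).

I. Theory and methods:

1. Oversampling:

(1) Method:

First, render the scene at a higher resolution than the display (front buffer):

Assuming the current front/back buffer resolution is 800 × 600, the scene can first be rendered to a 1600 × 1200 render target (a texture);

Then, derive the low-resolution result from the high-resolution render target:

The average color of each 2 × 2 block of pixels becomes the color of the final rendered pixel.

(2) Advantage: it noticeably reduces the distortion caused by aliasing.

(3) Disadvantages: a larger buffer is needed, and filling it costs more performance;

sampling several pixels for each output pixel lowers performance further;

because of these drawbacks, D3D does not use this antialiasing method.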
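The 2 × 2 averaging step of oversampling can be sketched as follows. This is a minimal illustration on a grayscale image stored as a list of rows, assuming even dimensions; the function name downsample_2x2 and the sample data are made up for the example and are not taken from any graphics API.

```python
def downsample_2x2(image):
    """Average each 2x2 block of an even-sized grayscale image (list of rows)."""
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("image dimensions must be even")
    result = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            total = (image[y][x] + image[y][x + 1]
                     + image[y + 1][x] + image[y + 1][x + 1])
            row.append(total / 4.0)
        result.append(row)
    return result


# A hard diagonal edge rendered at 4x4, then resolved to 2x2.
high_res = [
    [1, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
low_res = downsample_2x2(high_res)
print(low_res)  # [[0.75, 1.0], [0.0, 0.75]]
```

The two pixels that straddle the edge end up at 0.75 instead of a hard 0 or 1, which is the smoothing effect described above, bought with four times the rendering work.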

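2. Multisampling:

(1) Method:

Each pixel is shaded only once, but N points are taken inside each pixel (N depends on the sampling pattern in use): final color of the pixel = original color of the pixel * number of sample points covered by the polygon / total number of sample points;

(2) Advantage: it reduces aliasing distortion without increasing the number of shading samples, and unlike Oversampling it does not need a larger back buffer.

(3) Disadvantage: originally a polygon determines a pixel's color only when it covers the pixel's center (in the pixel pipeline this typically means fetching the appropriate texture color and modulating it with the color output by the vertex pipeline). With Multisampling, if the polygon covers some of the sample points but not the pixel center, the pixel's color is still determined by that polygon. Texture addressing can then go wrong, which for a texture atlas produces a different artifact: wrong colors along polygon edges.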
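The multisampling formula above can be written out directly. The sketch below assumes a hypothetical four-point sample pattern inside a unit pixel and a stand-in "polygon" given as a point-in-shape test; neither comes from a real GPU or from D3D.

```python
def multisample_color(shaded_color, sample_points, inside):
    """Scale a once-shaded color by the fraction of sample points covered."""
    covered = sum(1 for p in sample_points if inside(p))
    return shaded_color * covered / len(sample_points)


# Hypothetical 4-sample pattern inside the unit pixel [0,1) x [0,1).
SAMPLES = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]


def left_half(p):
    """Stand-in polygon: covers every point with x < 0.5."""
    return p[0] < 0.5


print(multisample_color(1.0, SAMPLES, left_half))  # 0.5
```

The color is computed once and only the coverage count varies per sample point, which is why the method avoids the extra shading cost of oversampling.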

3. Centroid Sampling:

(1) Method:

To fix the texture addressing errors that Multisampling causes with texture atlases, the color at the pixel center is no longer used as the "original color of the pixel". Instead, the color at the center point of those sample points in the pixel that are covered by the polygon is used. This guarantees that the point being shaded always lies inside the polygon (in other words, the texture address never leaves the polygon).

(2) How to use it:

① Any Pixel Shader that takes an input with the COLOR semantic uses centroid sampling automatically;

② Manually append the _centroid extension to the semantic of a Pixel Shader input parameter, for example:

    float4 TexturePointCentroidPS( float4 TexCoord : TEXCOORD0_centroid ) : COLOR0
    {
        return tex2D( PointSampler, TexCoord );
    }

(3) Note:

Centroid sampling is mainly meant for Multisampling with texture atlases. When a whole texture maps onto a single polygon mesh, centroid sampling can actually introduce errors.
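The idea behind centroid sampling can be sketched by averaging the covered sample positions. This is only an illustration of the description above: the sample pattern and the stand-in polygon are hypothetical, and real hardware may choose the evaluation point differently.

```python
def centroid_of_covered(sample_points, inside):
    """Return the average position of the covered sample points, or None if none are covered."""
    covered = [p for p in sample_points if inside(p)]
    if not covered:
        return None
    n = len(covered)
    return (sum(x for x, _ in covered) / n, sum(y for _, y in covered) / n)


# Hypothetical 4-sample pattern inside the unit pixel; the pixel center is (0.5, 0.5).
SAMPLES = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]


def left_half(p):
    """Stand-in polygon: covers every point with x < 0.5, so it misses the pixel center."""
    return p[0] < 0.5


print(left_half((0.5, 0.5)))                    # False
print(centroid_of_covered(SAMPLES, left_half))  # (0.25, 0.5)
```

The pixel center falls outside the polygon, while the centroid of the covered samples lies inside it, so attributes interpolated there stay within the polygon.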

ZelluX 2007-12-14 13:42

Resources on Inter-Procedural Analysis (3)

ZelluX — Tue, 27 Nov 2007 07:24:00 GMT
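A tutorial on ORC (Open Research Compiler) that contains a good deal of IPA material:
http://m.tkk7.com/Files/zellux/ORC-PACT02-tutorial.rar

The second edition of the Dragon Book also seems to cover a lot of IPA optimization and call graph material. Time to chew through it.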

ZelluX 2007-11-27 15:24

Resources on Inter-Procedural Analysis (2)

ZelluX — Mon, 26 Nov 2007 04:53:00 GMT
A paper from the High Performance Computing Tools Group, Computer Science Department, University of Houston:
Overview of the Open64 Compiler Infrastructure
VI.4. Interprocedural Analysis
Interprocedural Analysis (IPA) is performed in the following phases of Open64:
• Inliner phase
• IPA local summary phase
• IPA analysis phase
• IPA optimization phase
• IPA miscellaneous
By default the IPA does the function inlining in the inliner facility. The local summary phase is done in the IPL module and the analysis phase and optimization phase in the ipa-link module.
During the analysis phase, it does the following:
• IPA_Padding Analysis (common blocks Padding/Split Analysis)
• Construction of the Callgraph
Then it does space and multigot partitioning of the Callgraph. The partitioning algorithm takes into account whether it is doing partitioning for solving space or the multigot problem.
During the optimization phase the following phases are performed:
• IPA Global Variable Optimization
• IPA Dead function elimination
• IPA Interprocedural Alias Analysis
• IPA Cloning Analysis (It propagates information about formal parameters used as symbolic terms in array section summaries. This information is later used to trigger cloning.)
• IPA Interprocedural Constant propagation
• IPA Array_Section Analysis
• IPA Inlining Analysis
• Array section summaries arrays for the Dependence Analyzer of the Loop Nest Optimizer.

ZelluX 2007-11-26 12:53

Resources on Inter-Procedural Analysis (1)

ZelluX — Sun, 25 Nov 2007 15:04:00 GMT

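I suddenly have to work on a related compiler optimization project, so I am posting some IPA material from overseas sites here first; reaching foreign sites from the education network is inconvenient.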

GCC wiki:

Analysis and optimizations that work on more than one procedure at a time. This is usually done by walking the Strongly Connected Components of the call graph, and performing some analysis and optimization across some set of procedures (be it the whole program, or just a subset) at once.

GCC has had a callgraph for a few versions now (since GCC 3.4 in the FSF releases), but the procedures didn't have control flow graphs (CFGs) built. The tree-profiling-branch in GCC CVS now has a CFG for every procedure built and accessible from the callgraph, as well as a basic IPA pass manager. It also contains in-progress interprocedural optimizations and analyses: interprocedural constant propagation (with cloning for specialization) and interprocedural type escape analysis.
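Walking the strongly connected components of a call graph, as the GCC wiki excerpt mentions, can be sketched with Tarjan's algorithm. This is a generic textbook version, not GCC's implementation; the call graph and the procedure names in it are invented for the example.

```python
def strongly_connected_components(call_graph):
    """Tarjan's algorithm; call_graph maps a procedure name to the list of its callees."""
    index_of, lowlink, on_stack = {}, {}, set()
    stack, components = [], []
    counter = [0]

    def visit(node):
        index_of[node] = lowlink[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for callee in call_graph.get(node, []):
            if callee not in index_of:
                visit(callee)
                lowlink[node] = min(lowlink[node], lowlink[callee])
            elif callee in on_stack:
                lowlink[node] = min(lowlink[node], index_of[callee])
        if lowlink[node] == index_of[node]:
            # node is the root of a component: pop its members off the stack
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in call_graph:
        if node not in index_of:
            visit(node)
    return components


# Hypothetical call graph: eval and apply are mutually recursive.
calls = {
    "main": ["parse", "eval"],
    "parse": [],
    "eval": ["apply"],
    "apply": ["eval"],
}
print(strongly_connected_components(calls))
# [['parse'], ['apply', 'eval'], ['main']]
```

Tarjan's algorithm emits components with callees before their callers, so a bottom-up interprocedural pass can process the list in order, treating each mutually recursive group as a unit.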

IBM XL Fortran V10.1 for Linux:

Benefits of interprocedural analysis (IPA)

Interprocedural Analysis (IPA) can analyze and optimize your application as a whole, rather than on a file-by-file basis. Run during the link step of an application build, the entire application, including linked libraries, is available for interprocedural analysis. This whole program analysis opens your application to a powerful set of transformations available only when more than one file or compilation unit is accessible. IPA optimizations are also effective on mixed language applications.

Figure 2. IPA at the link step
The following are some of the link-time transformations that IPA can use to restructure and optimize your application:

Inlining between compilation units

Complex data flow analyses across subprogram calls to eliminate parameters or propagate constants directly into called subprograms.

Improving parameter usage analysis, or replacing external subprogram calls to system libraries with more efficient inline code.

Restructuring data structures to maximize access locality.

In order to maximize IPA link-time optimization, you must use IPA at both the compile and link step. Objects you do not compile with IPA can only provide minimal information to the optimizer, and receive minimal benefit. However when IPA is active on the compile step, the resulting object file contains program information that IPA can read during the link step. The program information is invisible to the system linker, and you can still use the object file and link without invoking IPA. The IPA optimizations use hidden information to reconstruct the original compilation and can completely analyze the subprograms the object contains in the context of their actual usage in your application.

During the link step, IPA restructures your application, partitioning it into distinct logical code units. After IPA optimizations are complete, IPA applies the same low-level compilation-unit transformations as the -O2 and -O3 base optimizations levels. Following those transformations, the compiler creates one or more object files and linking occurs with the necessary libraries through the system linker.

It is important that you specify a set of compilation options as consistent as possible when compiling and linking your application. This includes all compiler options, not just -qipa suboptions. When possible, specify identical options on all compilations and repeat the same options on the IPA link step. Incompatible or conflicting options that you specify to create object files, or link-time options in conflict with compile-time options can reduce the effectiveness of IPA optimizations.

Using IPA on the compile step only

IPA can still perform transformations if you do not specify IPA on the link step. Using IPA on the compile step initiates optimizations that can improve performance for an individual object file even if you do not link the object file using IPA. The primary focus of IPA is link-step optimization, but using IPA only on the compile-step can still be beneficial to your application without incurring the costs of link-time IPA.

Figure 3. IPA at the compile step

IPA Levels and other IPA suboptions

You can control many IPA optimization functions using the -qipa option and suboptions. The most important part of the IPA optimization process is the level at which IPA optimization occurs. Default compilation does not invoke IPA. If you specify -qipa without a level, or specify -O4, IPA optimizations are at level one. If you specify -O5, IPA optimizations are at level two.

Table 5. The levels of IPA

IPA Level Behaviors

qipa=level=0

Automatically recognizes standard library functions

Localizes statically bound variables and procedures

Organizes and partitions your code according to call affinity, expanding the scope of the -O2 and -O3 low-level compilation unit optimizer

Lowers compilation time in comparison to higher levels, though limits analysis

qipa=level=1

Level 0 optimizations

Performs procedure inlining across compilation units

Organizes and partitions static data according to reference affinity

qipa=level=2

Level 0 and level 1 optimizations

Performs whole program alias analysis which removes ambiguity between pointer references and calls, while refining call side effect information

Propagates interprocedural constants

Eliminates dead code

Performs pointer analysis

Performs procedure cloning

Optimizes intraprocedural operations, using specifically:
Value numbering
Code propagation and simplification
Code motion, into conditions and out of loops
Redundancy elimination techniques

IPA includes many suboptions that can help you guide IPA to perform optimizations important to the particular characteristics of your application. Among the most relevant to providing information on your application are:

lowfreq which allows you to specify a list of procedures that are likely to be called infrequently during the course of a typical program run. Performance can increase because optimization transformations will not focus on these procedures.

partition which allows you to specify the size of the regions within the program to analyze. Larger partitions contain more procedures, which result in better interprocedural analysis but require more storage to optimize.

threads which allows you to specify the number of parallel threads available to IPA optimizations. This can provide an increase in compilation-time performance on multi-processor systems.

clonearch which allows you to instruct the compiler to generate duplicate subprograms with each tuned to a particular architecture.

Using IPA across the XL compiler family

The XL compiler family shares optimization technology. Object files you create using IPA on the compile step with the XL C, C++, and Fortran compilers can undergo IPA analysis during the link step. Where program analysis shows that objects were built with compatible options, such as -qnostrict, IPA can perform transformations such as inlining C functions into Fortran code, or propagating C++ constant data into C function calls.

ZelluX 2007-11-25 23:04

