<?xml version="1.0" encoding="utf-8" standalone="yes"?>
http://m.tkk7.com/zellux/category/29378.html
zh-cn, Wed, 28 May 2008 07:25:22 GMT

Paper reading - Application-Level Isolation and Recovery with Solitude
http://m.tkk7.com/zellux/archive/2008/05/28/203510.html
ZelluX, Wed, 28 May 2008 07:23:00 GMT

My presentation:
http://docs.google.com/Presentation?id=dcjk4xx7_473cv5ddgc8

For lack of time, it does not cover the paper's solution for inter-process communication.

ZelluX 2008-05-28 15:23

Two papers I read recently
http://m.tkk7.com/zellux/archive/2008/05/20/201737.html
ZelluX, Tue, 20 May 2008 12:18:00 GMT

ZelluX 2008-05-20 20:18

Reading notes - SubVirt: Implementing malware with virtual machines (2)
http://m.tkk7.com/zellux/archive/2008/05/06/198693.html
ZelluX, Tue, 06 May 2008 06:35:00 GMT

ZelluX 2008-05-06 14:35

Reading notes - SubVirt: Implementing malware with virtual machines (1)
http://m.tkk7.com/zellux/archive/2008/05/05/198564.html
ZelluX, Mon, 05 May 2008 13:53:00 GMT

ZelluX 2008-05-05 21:53

Streamware ppt
http://m.tkk7.com/zellux/archive/2008/04/16/193446.html
ZelluX, Wed, 16 Apr 2008 06:57:00 GMT

Finally finished the slides I have to present tonight.



ZelluX 2008-04-16 14:57

Weekly Report
http://m.tkk7.com/zellux/archive/2008/03/26/188797.html
ZelluX, Wed, 26 Mar 2008 09:01:00 GMT



ZelluX 2008-03-26 17:01

Debug log - SPEC2006 470.lbm
http://m.tkk7.com/zellux/archive/2008/03/24/188352.html
ZelluX, Mon, 24 Mar 2008 13:16:00 GMT

Porting this program was truly exhausting -_-b

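Brook itself has quite a few flaws and bugs, and I am not used to the coding style of scientific-computing programs, so most of the time went into fixing bugs.
The most satisfying bug to squash (there is no other way to put it >,<):
Every cell has a flag. Its type is double, but the program treats it as an integer through a MAGIC_CAST macro.
Initially every cell's flag is ~f, that is, a double whose bits 1-28 are all 1 and whose bits 29-32 are 0. By the IEEE standard this should be a NaN.
On the CPU this causes no trouble, but on the GPU it does: the GPU does not support this kind of cast, so any arithmetic on such a double turns every result into NaN.
Fix:
Before passing the data to the GPU, convert these flag values into doubles the GPU can operate on. The simplest way is to cast them all to int first (this truncates), invert them, and then pass them to the GPU.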
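
The failure mode described above can be reproduced on a CPU. The sketch below is an illustration in Python, not the Brook or 470.lbm code: the all-ones bit pattern, the `mask` parameter and the helper names are assumptions chosen for the example, and it does not try to reproduce the exact `~f` layout. It stores an integer pattern in the storage of a double (the effect of a reinterpreting cast), shows that the result reads back as NaN and poisons arithmetic, and then converts the flag by value into an ordinary double.

```python
import math
import struct


def reinterpret_as_double(bits):
    """Return the double whose 64-bit storage holds `bits` (like a pointer cast in C)."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def reinterpret_as_int(value):
    """Return the 64-bit integer stored in the bytes of a double."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


# Hypothetical flag pattern: every bit set, so the exponent field is all ones
# and the mantissa is non-zero, which IEEE 754 defines as NaN.
FLAG_BITS = 0xFFFFFFFFFFFFFFFF

cell_flag = reinterpret_as_double(FLAG_BITS)
print(math.isnan(cell_flag))              # the stored flag is a NaN
print(math.isnan(cell_flag * 2.0 + 1.0))  # arithmetic on it stays NaN


def to_gpu_safe(flag_double, mask=0xFFFF):
    """Convert a reinterpreted flag into an ordinary numeric double.

    `mask` is an assumption: it keeps only the low bits that carry flag
    information, so the integer value is exactly representable as a double.
    """
    return float(reinterpret_as_int(flag_double) & mask)


safe_flag = to_gpu_safe(cell_flag)
print(safe_flag, math.isnan(safe_flag))
```

The design point is that the conversion happens on the CPU, where the integer view of the storage is still available, so the device only ever sees values it can do floating-point arithmetic on.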

what_you_compute.png


ZelluX 2008-03-24 21:16

Reading notes
http://m.tkk7.com/zellux/archive/2008/03/15/186488.html
ZelluX, Sat, 15 Mar 2008 06:46:00 GMT

ZelluX 2008-03-15 14:46

GP-GPU reading notes (5)
http://m.tkk7.com/zellux/archive/2008/02/15/180111.html
ZelluX, Fri, 15 Feb 2008 11:51:00 GMT

Mars: A MapReduce Framework on Graphics Processors
by Bingsheng He @ Hong Kong Univ. of Sci. & Tech. 
    Naga K. Govindaraju @ Microsoft Corp.
    Qiong Luo, Tuyong Wang @ Sina Corp.

Some key excerpts:
1. Introduction
Three challenges in implementing the MapReduce framework on the GPU:
First, the synchronization overhead in the run-time system of the framework must be low.
Second, a fine-grained load balancing scheme is required.
Third, the core tasks of MapReduce, including string processing, file manipulation and concurrent reads and writes, are unconventional to GPUs and must be handled efficiently.
Each thread is responsible for a Map or a Reduce task with a small number of key/value pairs as input.
Performance improvement: 1.5-16 times

2. Preliminary and Related Work
2.1. Graphics Processors
It is desirable to schedule the tasks between the CPU and the GPU to fully exploit their computation power.
Given a kernel program, the occupancy of the GPU is the ratio of active schedule units to the maximum number of schedule units supported on the GPU.
The GPU has a hardware feature called coalesced access to exploit the spatial locality of memory accesses among threads.
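
As a small worked example of the occupancy definition above, the sketch below computes the ratio for a kernel whose register usage limits how many schedule units can be active. All numbers and both function names are hypothetical; real limits depend on the specific GPU and are not given in the notes.

```python
def active_units(max_units, total_registers, registers_per_unit):
    """Schedule units that fit when registers are the limiting resource (assumed model)."""
    return min(max_units, total_registers // registers_per_unit)


def occupancy(active, max_units):
    """Occupancy = active schedule units / maximum schedule units supported."""
    if max_units <= 0 or not 0 <= active <= max_units:
        raise ValueError("expected 0 <= active <= max_units and max_units > 0")
    return active / max_units


# Hypothetical device: 24 schedule units at most, 8192 registers in total.
# Hypothetical kernel: each schedule unit needs 512 registers.
units = active_units(max_units=24, total_registers=8192, registers_per_unit=512)
print(units, occupancy(units, 24))
```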

2.2. GPGPU
2.3. MapReduce
Map: (k1, v1) -> (k2, v2)*
Reduce: (k2, v2*) -> v3*
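
The two signatures above can be made concrete with a sequential sketch. This is plain Python for illustration only, not the Mars API; `map_reduce`, `count_words_map` and `count_words_reduce` are names invented for the example.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run Map: (k1, v1) -> (k2, v2)* and then Reduce: (k2, v2*) -> v3*."""
    groups = defaultdict(list)
    # Map phase: every input pair may emit any number of intermediate pairs.
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    # Reduce phase: each key sees all of its intermediate values at once.
    return {k2: list(reduce_fn(k2, values)) for k2, values in groups.items()}


def count_words_map(doc_id, text):
    for word in text.split():
        yield word.lower(), 1


def count_words_reduce(word, counts):
    yield sum(counts)


docs = [("a", "gpu map reduce"), ("b", "map on the GPU")]
word_counts = map_reduce(docs, count_words_map, count_words_reduce)
print(word_counts)
```

In a GPU setting the two loops would be spread across threads, which is where the synchronization and load-balancing challenges listed in the introduction come from.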

3. Design and Implementation
3.1. Design Goals
3.2. System Workflow and Configuration
3.3. APIs
3.4. Implementation Techniques
Based on this compilation information and the total computation resources on the GPU, we set the number of threads per thread group and the number of thread groups to achieve a high occupancy at run time.

4. Evaluation
4.1. Experimental Setup




ZelluX 2008-02-15 19:51

GP-GPU reading notes (4)
http://m.tkk7.com/zellux/archive/2008/02/10/179557.html
ZelluX, Sun, 10 Feb 2008 08:13:00 GMT

4.2. Data Structures

The GPU Memory Model
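Data is usually stored in 2D textures: first because a 1D texture can hold very little, and second because current GPUs cannot efficiently write to a 3D texture.
Iteration
The stream programming model includes an implicit parallel traversal.
Generalized Arrays via Address Translation
The main data structures used in GPGPU programming are random-access multidimensional containers, including sparse and dense arrays. Each structure defines a virtual grid domain, a physical grid domain, and an address translator that converts between the two.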
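
A minimal sketch of an address translator for one simple case: a dense 3D virtual grid that is flattened to a 1D index and then laid out row by row in a 2D physical texture. The function names and the row-major layout are assumptions made for illustration; the survey does not prescribe this exact mapping.

```python
def virtual_to_physical(x, y, z, dims, tex_width):
    """Map a 3D virtual coordinate to a 2D texture coordinate (u, v)."""
    nx, ny, nz = dims
    linear = x + nx * (y + ny * z)      # 3D -> 1D
    return linear % tex_width, linear // tex_width  # 1D -> 2D


def physical_to_virtual(u, v, dims, tex_width):
    """Inverse translation: 2D texture coordinate back to the 3D virtual coordinate."""
    nx, ny, nz = dims
    linear = v * tex_width + u          # 2D -> 1D
    return linear % nx, (linear // nx) % ny, linear // (nx * ny)  # 1D -> 3D


dims = (4, 3, 2)   # hypothetical virtual grid of 4 x 3 x 2 cells
tex_width = 8      # hypothetical texture width in texels
print(virtual_to_physical(1, 2, 1, dims, tex_width))
print(physical_to_virtual(5, 2, dims, tex_width))
```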
4.2.1. Dense Arrays
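Multidimensional arrays are usually mapped to 1D first and then to 2D.
4.2.2. Sparse Arrays
These fall into two kinds, static and dynamic, depending on whether the positions and number of non-zero elements change.
4.2.3. Adaptive Structures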



ZelluX 2008-02-10 16:13

GP-GPU reading notes (3)
http://m.tkk7.com/zellux/archive/2008/02/09/179495.html
ZelluX, Sat, 09 Feb 2008 05:14:00 GMT

4.1. Stream Operations
4.1.1. Map
Given a stream of data elements and a function, map will apply the function to every element in the stream.
4.1.2. Reduce
Sometimes a computation requires computing a smaller stream from a larger input stream, possibly to a single element stream. This type of computation is called a reduction. For example, computing the sum or maximum of all the elements in a stream.
On GPUs, reductions can be performed by alternately rendering to and reading from a pair of textures.
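
The alternating render/read scheme can be sketched with two ordinary lists standing in for the pair of textures. This is a CPU illustration under that assumption; `reduce_pass` and `ping_pong_reduce` are made-up names, and a halving factor of two per pass is chosen for simplicity.

```python
def reduce_pass(src, op):
    """One pass: each output element combines two neighbouring input elements."""
    dst = []
    for i in range(0, len(src), 2):
        if i + 1 < len(src):
            dst.append(op(src[i], src[i + 1]))
        else:
            dst.append(src[i])  # odd element out is carried to the next pass
    return dst


def ping_pong_reduce(data, op):
    """Reduce a stream to one element by swapping a read buffer and a write buffer."""
    if not data:
        raise ValueError("cannot reduce an empty stream")
    buffers = [list(data), []]
    src = 0
    passes = 0
    while len(buffers[src]) > 1:
        dst = 1 - src
        buffers[dst] = reduce_pass(buffers[src], op)  # "render" into the other buffer
        src = dst                                     # the output becomes the next input
        passes += 1
    return buffers[src][0], passes


print(ping_pong_reduce([3, 1, 4, 1, 5, 9, 2, 6], max))
print(ping_pong_reduce(list(range(10)), lambda a, b: a + b))
```

Each pass roughly halves the stream, so an input of n elements needs about log2(n) passes.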
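In other words, this is divide and conquer: the input and output buffers are swapped repeatedly, and each pass shrinks the data by a fixed ratio.
4.1.3. Scatter and Gather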
If the write and read operations access memory indirectly, they are called scatter and gather respectively.
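
A minimal sketch of the two operations on plain lists, with hypothetical helper names: gather is an indirect read (`out[i] = source[indices[i]]`) and scatter is an indirect write (`out[indices[i]] = values[i]`).

```python
def gather(source, indices):
    """Indirect read: element i of the result comes from source[indices[i]]."""
    return [source[i] for i in indices]


def scatter(values, indices, size, fill=0):
    """Indirect write: values[i] is stored at position indices[i] of the result."""
    out = [fill] * size
    for value, i in zip(values, indices):
        out[i] = value
    return out


picked = gather([10, 20, 30, 40], [3, 0, 2])
placed = scatter(picked, [3, 0, 2], size=4)
print(picked, placed)
```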
4.1.4. Stream Filtering
This stream filtering operation is essentially a nonuniform reduction.
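
The reduction is nonuniform because the output size depends on the data. One common way to express this, shown here as a sequential sketch and not necessarily the method the survey describes, is to compute an exclusive prefix sum over keep-flags and use it as the write position for each surviving element. The names `exclusive_scan` and `compact` are illustrative.

```python
def exclusive_scan(flags):
    """Return running totals before each element, plus the overall total."""
    positions, total = [], 0
    for flag in flags:
        positions.append(total)
        total += flag
    return positions, total


def compact(stream, keep):
    """Keep only the elements for which `keep` is true, preserving their order."""
    flags = [1 if keep(x) else 0 for x in stream]
    positions, count = exclusive_scan(flags)
    out = [None] * count
    for x, flag, pos in zip(stream, flags, positions):
        if flag:
            out[pos] = x  # each survivor knows its output slot in advance
    return out


print(compact([5, -2, 7, 0, -9, 3], lambda x: x > 0))
```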
4.1.5. Sort
Classic sorting algorithms are data-dependent and generally require scatter operations.
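
A sorting network avoids the data-dependent control flow mentioned above: the sequence of compare-exchange steps is fixed by the input length alone. Bitonic sort is one well-known example; the sketch below is a sequential Python version for power-of-two lengths, written for illustration and not taken from the survey.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic sorting network."""
    n = len(values)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a non-zero power of two")
    a = list(values)
    k = 2
    while k <= n:                # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:            # distance between compared elements
            for i in range(n):   # on a GPU, every i could run in parallel
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

Which pairs get compared never depends on the values, only on `i`, `j` and `k`, which is what makes this style of algorithm a good fit for hardware without efficient scatter.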
The main algorithms are all related to sorting networks; there is also an adaptive sort whose behavior depends on how sorted the original sequence already is.
4.1.6. Search
4.2. Data Structures
    

ZelluX 2008-02-09 13:14

GP-GPU reading notes (2)
http://m.tkk7.com/zellux/archive/2008/02/08/179459.html
ZelluX, Fri, 08 Feb 2008 08:05:00 GMT

 The latest GPUs support several forms of branching, but because of their highly parallel nature, these branches must be used with care.
2.4.1 Hardware Mechanisms for Flow Control
 Three main implementations:
 Predication: not a true data-dependent branch
 MIMD branching
 SIMD branching: only one instruction executes at a time, so all elements should take the same branch
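
The difference between these mechanisms can be illustrated on the CPU. The sketch below is a simplified model and not vendor documentation: `predicated` evaluates both sides of a branch and selects with a 0/1 mask, and `simd_branch_cost` uses an assumed cost model in which a SIMD group pays for both sides whenever its elements disagree.

```python
def branchy(x):
    """Reference version with a real data-dependent branch."""
    return x * 2.0 if x >= 0.0 else -x


def predicated(xs):
    """Predication: compute both branch results, then select with a mask."""
    out = []
    for x in xs:
        mask = float(x >= 0.0)   # 1.0 where the condition holds, else 0.0
        then_value = x * 2.0     # always evaluated
        else_value = -x          # always evaluated
        out.append(mask * then_value + (1.0 - mask) * else_value)
    return out


def simd_branch_cost(conditions, group_size):
    """Count branch sides executed when each group shares one instruction stream."""
    cost = 0
    for start in range(0, len(conditions), group_size):
        group = conditions[start:start + group_size]
        diverged = any(group) and not all(group)
        cost += 2 if diverged else 1
    return cost


print(predicated([3.0, -2.0, 0.0]))
print(simd_branch_cost([True] * 4 + [False] * 4, 4))
print(simd_branch_cost([True, False] * 4, 4))
```

Coherent conditions cost one side per group, while alternating conditions cost both sides for every group, which is the reason the notes say all elements should take the same branch.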
2.4.2 Moving Branching Up The Pipeline
2.4.2.1 Static Branch Resolution
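 Use static analysis to avoid branches inside loops. The paper gives an example of solving a partial differential equation on a discrete spatial grid; I did not really follow it, but roughly the loop is split into two parts.
2.4.2.2 Pre-computation
 Sometimes the result of a branch is constant over a period of time or over several loop iterations. In that case it only needs to be recomputed when the result is known to change.
2.4.2.3 Z-Cull
 Modern GPUs have a family of techniques for avoiding work on pixels that will never be seen, and Z-cull is one of them. Put simply, Z-cull discards points that fail the depth test (covered along the Z axis). In fluid simulation, marking the Z depth of land-locked obstacle cells as 0 lets the computation skip those points.
2.4.2.4 Data-Dependent Looping With Occlusion Queries
 Likewise a technique for avoiding work on invisible points.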
 
3 Programming Systems
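 GPU architectures evolve very quickly, so profiling and tuning have to be addressed by the GPU vendors.
3.1 High-level Shading Languages
 Cg, HLSL: very close to the underlying hardware
 OpenGL Shading Language: has some features that do not map directly to hardware, such as integer support
 Sh, Ashli, ...
3.2 GPGPU Languages and Libraries
 The languages above all require the programmer to write code from the viewpoint of geometric primitives. The following systems try to abstract out some GPGPU functionality and hide the underlying GPU implementation.
  Brook: something I worked with a few weeks ago
  Scout, Glift: never heard of either...
3.3 Debugging Tools
 GPU debugging facilities are very limited. A debugger must be able to show debug information for many points at one moment. One printf-style approach is to display the values directly on screen (yikes, with GPGPU programs would that not just fill the screen with garbage >,<).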

ZelluX 2008-02-08 16:05

GP-GPU reading notes (1)
http://m.tkk7.com/zellux/archive/2008/02/07/179433.html
ZelluX, Thu, 07 Feb 2008 08:31:00 GMT

No.1
A Survey of General-Purpose Computation on Graphics Hardware
on EUROGRAPHICS 2005

1. Why GP-GPU?
1.1 Powerful and Inexpensive
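High memory bandwidth: Nvidia GeForce 6800 Ultra - 35.2GB/sec
Strong compute capability: ATI X800 XT - 63GFLOPS, Intel Pentium4 SSE unit(3.7GHz) - 14.8GFLOPS
Use of cutting-edge process technology: the most recently announced GPU (as of the survey's publication) contains 300 million transistors and is built on a 0.011 micron process
Rapid progress: the throughput of the GeForce 6800 is twice that of the 5900. GPU compute capability typically grows at an average yearly rate of 1.7x (pixels/second) and 2.3x (vertices/second), whereas by Moore's law the corresponding figure for CPUs is about 1.4x per year. Roughly speaking, GPU performance doubles every six months.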

1.2 Flexible and Programmable

1.3 Limitations and Difficulties
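The GPU's computing power rests on its highly specialized architecture, so many applications are not suited to running on the GPU. Word processing, for example, consists mainly of memory communication and is hard to parallelize.
Today's GPUs also lack some basic computing features, such as integer arithmetic. Many support only 32-bit floating point (the recent R670 instruction set apparently can handle the double type), so a lot of scientific computing cannot be done on the GPU.
Moreover, even for problems that fit these characteristics, actually using the GPU raises plenty of issues. The GPU programming model is very different, and efficient GPU programming is not just a matter of learning one more high-level language. To harness GPU computing power today, programmers need both the relevant scientific-computing knowledge and computer-graphics knowledge. Even so, the performance gains the GPU offers remain very tempting.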
1.4 GPGPU Today
http://gpgpu.org
Some GPGPU applications include:
Dense and sparse matrix multiplication (numerical computing)
Multigrid and conjugate-gradient solvers for systems of partial differential equations (numerical computing)
Ray tracing (image processing)
Photon mapping (image processing)
Fluid mechanics solvers (physical simulation)
Datamining operations (databases / data mining)

2. Overview of Programmable Graphics Hardware
2.1 Overview of the Graphics Pipeline
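Today's GPUs all use an architecture known as the graphics pipeline. The pipeline is divided into stages, and in hardware each stage is implemented on a task-parallel machine organization.
2.2 Programmable Hardware
GPU vendors have turned the fixed-function pipeline into a more flexible, programmable one, mainly in the geometry stage and the fragment stage. The original fixed operations are replaced by user-defined vertex programs and fragment programs.
Generally speaking, these programmable stages read a limited number of 4 x 32-bit floating-point vectors and output a limited number of 4 x 32-bit floating-point vectors. Each programmable stage can access constant registers and can also read and write its own registers.
2.3 Introduction to the GPU Programming Model
Typical GPGPU programs use the fragment processor as the computation engine. The usual structure is:
a. The programmer identifies the parallel parts of the application. The application is split into independent parallelizable sections, each treated as a kernel and implemented as a fragment program. The inputs and outputs of each kernel are one or more data arrays stored as textures in GPU memory. In stream terminology, the data in these textures form streams, and every element of a stream is processed separately by the kernel.
b. Before a kernel is invoked, the range of computation must be specified; the programmer does this by passing vertex data to the GPU. Note that GPU performance is limited when processing 1D arrays.
c. The rasterizer generates one fragment for each pixel.
d. Each fragment is processed by <strong>the same</strong> active kernel program. A fragment program can read from arbitrary global memory but can only write to the frame-buffer location determined by the rasterizer. <strong>I have not fully understood this part yet.</strong>
e. The output of each fragment is a value or a vector of values. It can be the final result of the program, or it can be saved as a texture for later computation. Complex applications usually need several passes through the pipeline (multipass).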
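
Steps a to e can be mimicked on the CPU to make the data flow concrete. In the sketch below, nested lists stand in for textures, `run_kernel` plays the role of the rasterizer plus fragment processor, and the two kernels are invented for illustration; none of this is a real graphics API.

```python
def run_kernel(kernel, width, height, *textures):
    """Invoke `kernel` once per fragment (x, y) in the computation range."""
    return [[kernel(x, y, *textures) for x in range(width)] for y in range(height)]


def scale_add_kernel(x, y, a_tex, b_tex):
    """Each fragment reads its own texel from two inputs and writes one output."""
    return 2.0 * a_tex[y][x] + b_tex[y][x]


def blur_kernel(x, y, tex):
    """Reads may gather from arbitrary texels; the write goes only to (x, y)."""
    row = tex[y]
    left = row[max(x - 1, 0)]
    right = row[min(x + 1, len(row) - 1)]
    return (left + row[x] + right) / 3.0


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[10.0, 20.0], [30.0, 40.0]]

# Multipass: the output "texture" of the first kernel is the input of the second.
pass1 = run_kernel(scale_add_kernel, 2, 2, a, b)
pass2 = run_kernel(blur_kernel, 2, 2, pass1)
print(pass1)
print(pass2)
```

The kernel never chooses where its result is stored; the caller fixes the output position for every fragment, which mirrors the restriction in step d.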

ZelluX 2008-02-07 16:31
Drafting a plan</title><link>http://m.tkk7.com/zellux/archive/2008/01/16/175647.html</link><dc:creator>ZelluX</dc:creator><author>ZelluX</author><pubDate>Wed, 16 Jan 2008 03:58:00 GMT</pubDate><guid>http://m.tkk7.com/zellux/archive/2008/01/16/175647.html</guid><wfw:comment>http://m.tkk7.com/zellux/comments/175647.html</wfw:comment><comments>http://m.tkk7.com/zellux/archive/2008/01/16/175647.html#Feedback</comments><slash:comments>1</slash:comments><wfw:commentRss>http://m.tkk7.com/zellux/comments/commentRss/175647.html</wfw:commentRss><trackback:ping>http://m.tkk7.com/zellux/services/trackbacks/175647.html</trackback:ping><description><![CDATA[Two weeks left<br /> My main responsibility is the Fortran -> IL part<br /> The main open questions<br /> <br /> After Fortran is converted to High WHIRL, how do we emit IL?<br />     1. Look at Brook and see whether its code can be reused<br />     2. Or try converting WHIRL directly to Brook IR, then call those routines to generate IL automatically?<br /> <br /> How do we call CAL from Fortran?<br />     1. How can F77 call library functions?<br />     2. What is the overhead of such calls?<br /> <br /> Some optimization-related papers; CC has already collected a few<br />     1. Alan Leung on 6th Workshop on Compiler-Driven Performance<br />     2. RapidMind Development Platform<br />     3. LiquidSIMD<br /> <br /> Other questions<br />     1. How do we control the tradeoff that decides whether something runs on the GPU? Or should it be controlled dynamically?<br /> <br /> That is all I can think of for now; one step at a time <img src ="http://m.tkk7.com/zellux/aggbug/175647.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://m.tkk7.com/zellux/" target="_blank">ZelluX</a> 2008-01-16 11:58 <a href="http://m.tkk7.com/zellux/archive/2008/01/16/175647.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>vectorization
http://m.tkk7.com/zellux/archive/2008/01/02/172281.html
ZelluX, Wed, 02 Jan 2008 11:07:00 GMT

ZelluX 2008-01-02 19:07
居然要看Fortran?/title><link>http://m.tkk7.com/zellux/archive/2007/12/16/168108.html</link><dc:creator>ZelluX</dc:creator><author>ZelluX</author><pubDate>Sun, 16 Dec 2007 13:03:00 GMT</pubDate><guid>http://m.tkk7.com/zellux/archive/2007/12/16/168108.html</guid><wfw:comment>http://m.tkk7.com/zellux/comments/168108.html</wfw:comment><comments>http://m.tkk7.com/zellux/archive/2007/12/16/168108.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://m.tkk7.com/zellux/comments/commentRss/168108.html</wfw:commentRss><trackback:ping>http://m.tkk7.com/zellux/services/trackbacks/168108.html</trackback:ping><description><![CDATA[涉及C化spec2006中的一些程序,orz<br /> 贴资?br /> <br /> <h1 class="book-heading">Fortran导引</h1> <p>Fortran入门快速指?/p> <h1 class="book-heading">Fortran学习的一些徏?/h1> <pre>2006-8-6 怿大家都对C语言有一定的了解Q其实Fortran跟C相差不是很多? 我把自己认ؓ比较合理快速学习Fortran的方法说下? 学习FortranQ会遇到Fortran77&Fortran90{等Q两者差别不大,学习Fortran90或更 高,更加自由些(仅对一般用而言Q其他优势可能体C出来Q,对自׃后学习他 的程序包也会有好处? 大家一般只是ؓ了编Eؓ了计而学FortranQ而不是ؓ了学习Fortran而学FortranQ所? 我的是学习Fortran不要像学C那样拿一本很详细的教材从头至֭下来Q一个大安 有不错的C语言基础Q而且也没有太多的_֊M门研I这些,倒不如看些简易的教材Q我 会附上)Q掌握基本语句之后直接从看最单的E序开始。这P很快׃体会到Fortra n的格式,可以开始自己写E序了。学习的序我徏议如下: 1?~一些仅含输入输出的E序Q然后可以尝试把输入输出同文件结合v来(从文仉? 数据、写数据Q; 2?然后可以学条件判断、@环语句,通过几个实例也可以很快掌握; 3?再往后就是写子程序,是E序的调用,怿那个时候,看了我的W一个例子(PROG RAM AQ就应该能写出简单的含函数调用的E序Q到了这里,基本上可以算告一D落Q可? q行l构上复杂的E序的编写; 4?最后,可以学一下多个程序的~译甚至是多U语aE序的؜~(如既有C又有Fortran 的多个程序一L译)。多个程序的~译我不q不熟悉Q就留给siriusbobo同志来解说吧 :-) 在编E中遇到困难然后再去查找资料和用法不׃ؓ一U好的方法,不必LL学全? 当然Q有_旉和精力的同学强烈好好看教材,不必急于求成Q有一个好的基? 是一件很好的事? Fortran相比C的优势的话在于它丰富的资源,C的优势可能是更加z,~译效率更高。但 对于我的qx使用来说Q这两者的优势、劣劉K体现不出来,自己的感觉是Fortran更接q? qx的科学语aQ比较严谨些Q更ҎL不出错,比较W合习惯Q变量、函数的声明? 也比C更方便灵z,以外函数的用ؓ例: ****************************************************************************** PROGRAM A real z read *,z call f(z) y=z print *,y end subroutine f(x) x=x**2 return end ****************************************************************************** 只需要加一?subroutine"E序D,d数即可用"call"调用Q当然也可以写多个子E序 Q其中一个子E序也可以通过"call"来调用其他子E序? 
׃般学习而言Q除了子E序的编写,另外一个用得比较多的是文g的读写操作,ȝ "read",写用"write"Q如下: ****************************************************************************** PROGRAM B real x open (1,file='in.dat',status='unknown') open (2,file='out.dat',status='unknown') read (1,100) x 100 format (1e12.7) close(1) write (2,200) x 200 format (1e15.8) close(2) end ****************************************************************************** 如果?*"的话Q就为默认Ş式,更具体的可以查看帮助或有兌料,比较好的Ҏ是随? 做一个testE序Q用来检所学或所惟? 对于上程序,出现?100","200"是语句标Pq些标号为方便语句的跌{而出玎ͼ可以? 现@环、条件控制等Q但也ؓ了ɽE序l构化而不推荐使用Q用goto语句和语句标号实? 语句的蟩转如下: ****************************************************************************** PROGRAM C integer n real z n=0 read *,z 1 call f(z) y=z n=n+1 if (n<10) goto 1 print *,y end subroutine f(x) x=x**2 return end ****************************************************************************** q类跌{在F77里经常用刎ͼF90以后q不多见Q但对于"100 format (1e12.7)"之类q是l? 常用刎ͼq是用来表示存储d的数据的格式的,可以攑֜E序M位置Q更具体的用? 要参看说明? 有关注释Q? Fortran里注释用"!"?C"Q其中,一般在Windows下?Compad Visual Fortran"~译Q? 有两U格式,一个是"Free Format"Q生?.f90"Q另外一?Fixed Format"Q生?.for "Q只?.for"里两U注释都可用Q?!"?C"Q,但在".f90"里只能用"!"? 有关学习的困难: 法是语a的灵没错,是最ȝ的,但想必大安学过CQ遇到过不少法Q这些可以用 C实现的,用Fortran实现都不是很困难Q所以这里不主要讨论q个“灵魂”性质的东ѝ? 帔R、变量、数l的数据cdQ以及数据类型的d控制倒是l常Ҏ出错的。下面主? 讲一些我认ؓ需要注意的和我曄犯过和看到过的错误? Fortran跟C一P也分整型(INTEGER)Q实?REAL)Q双_ֺ(REAL*8或REAL(8)或DOUBLE PRECISION)Q这些在U学计算中还是比较重要的Q以实型Cؓ例: 一般REAL{h于REAL*4或REAL(4),是单_ֺ的; 而双_ֺ在F77中表CZؓDOUBLE PRECISIONQ在F90中可以表CZؓREAL*8或REAL(8)Q在高精 度计中Q双_ֺ的变量是很有必要的,对于一般实数可以表CZؓ数形式或指数Ş式, 而双_ֺ都表C成指数形式Q但指数E要改成DQ如Q? REAL:100.0?e2,双精度下得表示?D2 ׃Fortran中不需要对每个变量都进行声明,所以有时候会在每个程序或子程序开头做? 说明Q如下: IMPLICIT DOUBLE PRECISION(A-H,O-Z) 代表以A-H以及O-Z字母开头的变量默认Q在不声明的情况下)是双_ֺ的,否则则是整型 的,如下Q? 
****************************************************************************** PROGRAM D IMPLICIT DOUBLE PRECISION(A-H,O-Z) J1=1D-2 J2=-0.5D-1 x=J1+J2 print *,x end ****************************************************************************** PROGRAM E implicit double precision (A-I,O-Z) double precision a,i,e1,e2 data j2 /0.87450547081842D-3/ data j3 /-0.11886910646016D-4/ data j5 /-0.17242068505339D-5/ data j7 /0.10566966079622D-6/ write(*,*) "please input a" read(*,*) a write(*,*) "please input i" read(*,*) i e1=(j3*sin(i)/(2*a*j2)-5*j5*sin(i)*(1-7*sin(i)**2/2+21*sin(i)**4/8)& &/(2*a**3*(2-5*sin(i)**2/2))+35*j7*sin(i)*(1-27*sin(i)**2/4+99& &*sin(i)**4/8-429*sin(i)**6/64)/(3*a**5*(2-5*sin(i)**2/2))) e2=-(j3*sin(i)/(2*a*j2)-5*j5*sin(i)*(1-7*sin(i)**2/2+21*sin(i)**4/8)& &/(2*a**3*(2-5*sin(i)**2/2))+35*j7*sin(i)*(1-27*sin(i)**2/4+99& &*sin(i)**4/8-429*sin(i)**6/64)/(3*a**5*(2-5*sin(i)**2/2))) write(*,"(E9.2E3)") e1,e2 stop end ****************************************************************************** W一个程序输Z?0.4而是0.000000000000000E+000 W二个程序Q意输入a、iQƈ未得到希望得到的l果Q而是输出NAN和NANQ关于NANq个? 误,有时候函数定义域不符合的时候,q行q不报错而是输出NANQ这个时候检查程序这? 地方是检查的重点Q当Ӟ会有其他情况Q但我碰到的不多Q只好就我所知跟大家交流一 下? q两个程序都因ؓJ开头的变量不属于默认双_ֺ变量Q而用双精度表C给它们赋gQ导 致结果跟预期不一_在程序中把这些以J开头的变量用REAL*8声明一下,或把 implicit double precision (A-I,O-Z)改ؓQ? implicit double precision (A-J,O-Z)Q或把这个语句去? 可以得到预期的l果了? 对于数组Q可以用DIMENSION定义Q但需要注意的是,若在E序头未做声明(implicit noneQ时Q用DIMENSION定义数组Ӟ当数l名首字母不属于(A-J,O-Z)里时Q其D出时 为整型,当然做了如下声明情况也会如此Q(implicit double precision (A-I,O-Z)Q? 如下Q? 
****************************************************************************** PROGRAM F dimension m(2) m(1)=1.5 m(2)=2.5 print *,m(1),m(2) end ****************************************************************************** 输出的结果是“1Q?”而不?#8220;1.500000,2.500000” 当把E序中m改ؓaӞ输出“1.500000,2.500000” 所以,比较好的Ҏ是尝试用REAL来定义数l(当然也可以用REAL*8Q: ****************************************************************************** PROGRAM G real m(2) m(1)=1.5 m(2)=2.5 print *,m(1),m(2) end ****************************************************************************** 另外Q要说的是,变量可以不定义而直接赋|但会出现如上面PROGRAM D-E的问题,所? 大家在编E的时候对非整型变量声明一下,管ȝQ但不容易出错,有时候正是这 c错误会让初学者困扰好久? 定义变量Ӟl常会看CU定义的写法Q以REALZQ? 可以? real m ?real:: m W一U方式不可以直接赋|必须写成q样Q? ****************************************************************************** PROGRAM H real m m=1.0 print *,m end ****************************************************************************** W二U则可以Q? ****************************************************************************** PROGRAM I real:: m=1.0 print *,m end ****************************************************************************** </pre> <h1 class="book-heading">一些免费的Fortran~译?/h1> <p>Free Fortran Compilers</p> <p>取自 <a target="_blank">http://www.thefreecountry.com/compilers/fortran.shtml</a><br /> This page lists free Fortran compilers for various operating systems. Some of the compilers are compliant with the ANSI Fortran 77 specifications, others with Fortran 95, and so on. Some of them may also come complete with debuggers, editors and an integrated development environment (IDE). </p> <p>If you need a book on Fortran, you may want to check out the selection of books available at <a >Amazon.com</a>. </p> <p>Disclaimer</p> <p>The information provided on this page comes without any warranty whatsoever. Use it at your own risk. Just because a program, book or service is listed here or has a good review does not mean that I endorse or approve of the program or of any of its contents. 
All the other standard disclaimers also apply. </p> <p>Free Fortran Compilers and IDEs<br /> <dl> <dt><a >Sun Studio Compilers and Tools</a> <dd> <p>Sun Studio Compilers and Tools for Linux and Solaris OS on Sparc and x86/x64 platforms includes command line tools as well as a NetBeans-based IDE for developing, compiling and debugging C, C++ and Fortran programs. It also includes performance analysis tools. </p> <dt><a >Intel Fortran Compiler for Linux</a> <dd> <p>The Intel Fortran Compiler for Linux is free for personal, non-commercial use (registration required). It features an optimizing compiler, the Intel Debugger (GUI and command-line), mixed language support (C and Fortran), full compliance with the ISO Fortran 95 standard, support for the evolving Fortran 2003 standard, multi-threaded application support (OpenMP and auto-parallelization), ability to handle big-endian data files, compatibility with various Linux tools (like make, gdb and Emacs), substantial compatibility with Compaq Visual Fortran, etc. The optimizing compiler supports interprocedural optimization, profile guided optimization, automatic vectorizer, etc. </p> <dt><a >G95</a> <dd> <p>G95 is an open source Fortran 95 compiler. At the time this was written, most of the ISO Fortran 95 standard has been implemented. Platforms supported include Linux(x86, Intel IA64, AMD x86_64), Windows, Macintosh OS X, FreeBSD, Sparc Solaris and HP-UX. </p> <dt><a >Gfortran</a> <dd> <p>gfortran is a Fortran 95 compiler. It runs on Linux and Windows (under cygwin). </p> <dt><a >Salford FTN95 Fortran 95 Compiler</a> <dd> <p>Salford FTN95 is a Fortran 95 compiler that supports Fortran 77, Fortran 90 and Fortran 95. The compiler generates exectuables for Win32 (but Win32 console and GUI applications) and the Microsoft .NET framework. It comes with CHECKMATE, a tool that lets programmers check the correctness of their code at runtime. 
Also included is Plato 3 (an IDE), full source level debugging, documentation and examples. You may only generate code for your personal use on your home computer, and all executables will display a banner on execution. </p> <dt><a >Salford FTN77 PE ANSI Fortran 77 Compiler</a> <dd> <p>The Salford FTN77 PE (Personal Edition) comes with a full optimising ANSI Fortran 77 compiler with support for various common extensions (including MIL-STD-1753), linker, libraries, make utility, librarian and a full screen debugger. The compiler has a built-in assembler for inline assembly, and the ability to link with code from other sources (such as C++ Fortran 90 and Fortran 95 code). It is free for personal use and for use by students. It supports Windows 95, 98 and NT. </p> <dt><a >Open Source Watcom / OpenWatcom Fortran Compiler</a> <dd> <p>The Watcom (now OpenWatcom) Fortran 77 compiler is now available free of charge, complete with source code. This compiler, which generates code for Win32, Windows 3.1 (Win16), OS/2, Netware, MSDOS (16 and 32 bit), etc, was a well-known compiler some years back (until Sybase terminated it). </p> <dt><a S G77 (GNU Fortran)</a> <dd> <p>This system comes with the GNU G77 Fortran compiler (among other things, including a C/C++ compiler), which you can use to generate Win32 executables from F77 code. Like many systems based on the GNU tools, Mingw32 comes with complete with various programming tools, such as a program maintainence program (ie, make), text processing tools (sed, grep), lexical analyser generator (flex), parser generator (bison), etc. </p> <dt><a >DJGPP GNU G77 (Fortran 77) for MSDOS</a> <dd> <p>This is a development system based on the well-known GNU compiler system that includes compilers for Fortran 77, C, C++, Objective C, etc. It generates 32 bit MSDOS executables that is Windows 95 long-filename-aware. 
It is a very complete system with IDEs, graphics libraries, lexical analyser generators (flex), parser generators (bison), text processing utilities (like grep, sed), a program maintainence utility (ie, make), a dos extender, and so on. The compiler, utilities and libraries come with source code. </p> <dt><a >f2j - Fortran to Java Compiler</a> <dd> <p>f2j translates Fortran 77 source code to Java class files. It is distributed under the GNU GPL and runs on Linux, SunOS/Solaris. </p> <dt><a >F2C - Fortran to C Translator</a> <dd> <p>This is a well-known Fortran to C converter that comes with source code. The site also includes pre-compiled binaries (executables) for MSDOS and Microsoft Windows, although these are by no means the only systems supported - the compiler works on Unix systems like BSD, Linux, etc. You have to compile the compiler yourself on those systems. Libraries containing the runtime support needed (together with the C source code) are also included. You need a <a target="_top">C compiler</a> to generate binaries from your Fortran sources. </p> <dt><a >FORCE Project - Fortran Compiler and Editor</a> <dd> <p>FORCE is actually just an IDE for Fortran 77 that integrates the GNU Fortran 77 compiler (G77). </p> <dt><a >Emx/Rsx G77 (GNU Fortran)</a> <dd> <p>This is another GNU Fortran port. The RSX port compiles DOS extended console applications for Win32 and the EMX port generates MSDOS extended applications as well as OS/2 applications. The compiler supports the Fortran 77 syntax. </p> <dt><a >Lcc-Win32 Fortran Compiler</a> <dd> <p>LCC-Win32 is primarily a free C compiler and its programming environment for Win32, but it also appears to have a Fortran compiler available for download from their website. It apparently compiles Fortran 77 code (with some common extensions) to C which is subsequently compiled by the C compiler to generate a Win32 native executable. 
The entire process is integrated seamlessly into the IDE so you might not even realise that intemediate C files were being generated (they are deleted automatically when they are no longer needed). The IDE supports syntax highlighting in C and Fortran. </p> <dt><a >Compaq Fortran for Linux Alpha</a> <dd> <p>This Fortran compiler is for Linux Alpha systems only. It implements the full Fortran-95 language as well as a few language extensions. It comes with a debugger (ladebug), an extended maths library (the Compaq Extended Math Library, CXML) containing technical and scientific subroutines. The licence for the free version allows it to be used for personal and educational purposes, and prohibits its use in any commercial venture. </p> </dd></dl> <h1 class="book-heading">我的Fortran基本用法结</h1> <pre>作者:gator </pre> <pre>目录Q? 一、说? 二、概q? 三、数据类型及基本输入输出 四、流E控? 五、@? 六、数l? 七、函? 八、文? </pre> <pre>一、说? 本文多数内容是我d国u《Fortran 95 E序设计》的W记。只dW九章,主要?~9 章,都是最基本的用法(原书?6章)。这里主要摘录了我看书过E中ȝ的一些Fortran和C? 同的地方Q主要是语法斚w。希望这份笔记能够给学过C但没有接触过Fortran的同学带M些帮 助。要惛_更清楚些Q推荐看一下原书,觉得作者真的写得很好,很清楚;如果有C语言的基Q? 看完前九应该很快的,׃两天p了。觉得如果耐心看完本文Q基本功能应该也可以利用v 来了。外Q由于我之前没有用过FortranQ这ơؓ了赶文档看书又看得很_浅Q大多数东西看过 之后都没得及仔细惻I只是按着作者的意思去理解。所以这份笔记还处于U怸谈兵的层ơ。如? 有不妥的方,希望大家指正。谢谢! 文中<span style="color: #0000ff">蓝色</span>的部分是E序代码Q?span style="color: #ff0000">!后面的内容ؓ注释</span>? </pre> <pre> </pre> <pre>二、概q? 1、名词解? Fortran=<span style="color: #339966">For</span>mula <span style="color: #ff6600">Tran</span>slator/Translation 一看就知道有什么特色了Q可以把接近数学语言的文本翻译成机械语言。的,从一开? QIBM设计的时候就是ؓ了方便数D和U学数据处理。设计强大的数组操作是Z实现q一 目标。ortran奠定了高U语a发展的基。现在Fortran在科研和机械斚w应用很广? </pre> <pre>2、Fortran的主要版本及差别 按其发展历史QFortran~译器的版本其实很多。现在在q泛使用的是Fortran 77和Fortr an90。ortran 90在Fortran 77基础上添加了不少使用的功能,q且改良?7~程的版面格式, 所以编E时推荐使用90。鉴于很多现成的E序只有77版本Q有必要知道77的一些基本常识,臛_? 证能够看77E序。以下是77?0的一些格式上的区别? <em>Fortran 77Q?/em> 固定格式Qfixed formatQ,E序代码扩展名:.f?for Q?Q若某行以C,c?开_则该行被当成注释Q? Q?Q每行前六个字符不能写程序代码,可空着Q或?~5字符以数字表明行代码Q用作格 式化输入出等Q;7~72为程序代码编写区Q?3往后被忽略Q? Q?Q太长的话可以箋行,所l行的第六个字符必须?0"以外的Q何字W? 
<em>Fortran 90Q?/em>自由格式Qfree formatQ, 扩展名:.f90 Q?Q以"!"引导注释Q? Q?Q每行可132字符Q行代码攑֜每行最前面Q? Q?Q以&l行Q放在该行末或下行初? 以下都是讨论Fortran 90? </pre> <pre>3、Fortran的一些特点,和C的一些不? 其实很多Q在下面涉及具体斚w时可以看到。这里只是大致提一些? Q?Q不分大写 Q?Q每句末不必要写分? Q?Q程序代码命令间的空格没有意? Q?Q不像CQFortran不用{ } Q?Q数据类型多Z复数和逻辑判断cd。比如复数类? <span style="color: #0000ff">complex :: a </span> !声明复数的方法。复数显然方便了U学计算Q满了工程斚w需? <span style="color: #0000ff">a=(1.0,2.0) </span> ! a=1+i Q?Q多Z乘幂q算Q?*Q。乘q除了整数还可以是实数Ş式。如开方,开立方 <span style="color: #0000ff">a=4.0**0.5Qa=8.0**(1.0/3.0)</span>? Q?Q数l有一些整体操作的功能Q可以方便的寚w分元素进行操? Q?Q有些情况下可以声明大小待定的数l,很实用的功能 </pre> <pre>4、Fortran的基本程序结? 先看一看所谓的"Hello Fortran"E序? <span style="color: #0000ff">program main </span> !E序开始,main是program的名字,完全自定? <span style="color: #0000ff">write(*,*) "Hello" </span> !ȝ? <span style="color: #0000ff">stop </span> !l止E序 <span style="color: #0000ff">end [program[main]]</span> !end用于装代码Q表CZ码编写完毕。[ ]中的内容可省略,下同? 再看一D实用一些的E序Q好有点感性认识。程序用于计圆q表面U,要求输入底面 半径和。其中展CZFortran的一些特色用法。程序摘自维基。其实是一个叫<a target="_blank">www.answers.com</a> 的网上引的维基的|页。推荐去看看!能查C有意思的东西? <span style="color: #0000ff">program cylinder </span> !l主函数起个名字 ! Calculate the area of a cylinder. ! Declare variables and constants. ! constants=pi ! variables=radius squared and height <span style="color: #0000ff">implicit none </span> ! Require all variables to be explicitly declared !q个一般都是要写上的。下面会q一步说明? <span style="color: #0000ff">integer :: ierr character :: yn real :: radius, height, area real, parameter :: pi = 3.1415926536 </span> !q是帔R的声明方? <span style="color: #0000ff">interactive_loop: do </span> !do循环QFortran中的循环可以加标{,如d前面? !interactive_loop是标签 </pre> <pre>! Prompt the user for radius and height<br /> ! and read them.<br /> <span style="color: #0000ff">write (*,*) 'Enter radius and height.' </span> !屏幕输出<br /> <span style="color: #0000ff">read (*,*,iostat=ierr) radius,height </span> !键盘输入。isotat的值用判断输入成功否?br /> ! If radius and height could not be read from input,<br /> ! 
then cycle through the loop.<br /> <span style="color: #0000ff">if (ierr /= 0) then <br /> write(*,*) 'Error, invalid input.'<br /> cycle interactive_loop </span> !cycle 相当于C里的continue<br /> <span style="color: #0000ff">end if</span><br /> ! Compute area. The ** means "raise to a power."<br /> <span style="color: #0000ff">area = 2 * pi * (radius**2 + radius*height) </span> ! 指数q算比C方便<br /> ! Write the input variables (radius, height)<br /> ! and output (area) to the screen. <br /> <span style="color: #0000ff">write (*,'(1x,a7,f6.2,5x,a7,f6.2,5x,a5,f6.2)') & </span> </pre> <pre> !"&"表示l行。这里还昄了格式化输出 <span style="color: #0000ff">'radius=',radius,'height=',height,'area=',area yn = ' ' yn_loop: do </span> !内嵌的另一个do循环 <span style="color: #0000ff">write(*,*) 'Perform another calculation? y[n]' read(*,'(a1)') yn if (yn=='y' .or. yn=='Y') exit yn_loop if (yn=='n' .or. yn=='N' .or. yn==' ') exit interactive_loop end do yn_loop </span> !l束内嵌do循环 <span style="color: #0000ff">end do interactive_loop end program cylinder </span> FortranE序的主要结构就是这样了。一般还会有些module的部分在d数前Q函数在d 数后? </pre> <pre> </pre> <pre>三、数据类型及基本输入输出 1、数据类型,声明及赋初? Q?QintegerQ?短整型kind=2, 长整型kind=4 <span style="color: #0000ff">integer([kind=]2) :: a=3</span> 如果声明成integer:: aQ则默认为长整型? !"::" 在声明ƈ同时赋初值时必须要写上;cd名后面有形容词时也必M?:Q其他情况可略去 !所谓Ş容词Q可以看一下这个。比如声明常? <span style="color: #0000ff">realQparameter :: pi=3.1415926</span> 。parameter是形容词? Q?QrealQ单_ֺkind=4Q默认)Q双_ֺkind=8 <span style="color: #0000ff">real([kind=]8) :: a=3.0</span> q有指数的Ş式,?E10为单_ֺQ?D10为双_ֺ Q?Qcomplex 单精度和双精? <span style="color: #0000ff">complex([kind=]4) b</span> Q?Qcharacter <span style="color: #0000ff">character([len=]10) c </span> !len为最大长? Q?Qlogical <span style="color: #0000ff">logical*2 :: d=.ture. </span>({h?span style="color: #0000ff">logical(2)::d=.ture.</span>) Q?Q自定义cdtypeQ类gC中的struct Fortran 77中给变量赋初值常用DATA命oQ可同时l多个变量赋初? 
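<span style="color: #0000ff">data a,b,string /1, 2.0, 'fortran'/</span>
Unlike in C, a Fortran variable can be used without being declared; it then gets a default type (this is governed by the implicit statement). By default, variables whose names start with i, j, k, l, m or n are integer and all others are real. To switch this off, put implicit none before the declaration part of the program. Peng Guolun uses this statement as a matter of course.
Another difference in declarations is Fortran's "equivalence declaration":
<span style="color: #0000ff">integer a,b
equivalence(a,b)</span>
This makes a and b share the same memory. It can save memory, and sometimes it shortens the code: after equivalence(a variable with a very long name, such as one element of a three-dimensional array, a), writing the program in terms of a is much more concise.
</pre> <pre>2. Basic input and output
Input: <span style="color: #0000ff">read(*,*) a </span> !read from the keyboard
Output: <span style="color: #0000ff">write(*,*) "text"</span> !write to the screen. Fortran 77 uses 'text'. In Fortran 90 either " " or ' ' is accepted
<span style="color: #0000ff">print *,"text" </span> !can only write to the screen
(*,*) written in full is (unit=*,fmt=*). unit is the input/output location, such as the screen or a file; fmt is the format. If both are written as *, the defaults described above are used. The * after print means output in the default format.
</pre> <pre> </pre> <pre>IV. Flow control
1. Operators
(1) Comparison operators
== /= > >= < <= !Fortran 90 style
.EQ. .NE. .GT. .GE. .LT. .LE. !Fortran 77 style
(2) Operators that combine logical values
.AND. .OR. .NOT. .EQV. .NEQV.
!only .NOT. takes a single expression; the others need an expression on both sides (a logical variable is fine)
!.EQV. is true when both sides have the same logical value; .NEQV. is true when they differ
</pre> <pre>2. IF<br /> (1) Basic form:<br /> <span style="color: #0000ff">if(condition) then<br /> ……<br /> end if </span><br /> If only one statement follows then, this can be written as<br /> if(condition) …… !then and end if may be omitted<br /> (2) Multiple branches:<br /> <span style="color: #0000ff">if(condition1) then<br /> ……<br /> else if(condition2) then<br /> ……<br /> else if(condition3) then<br /> ……<br /> else<br /> ……<br /> end if</span><br /> (3) Nesting:<br /> <span style="color: #0000ff">if(condition) then<br /> if(condition) then<br /> if(condition) then<br /> else if(condition) 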
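then<br /> ……<br /> else</span><br /> <span style="color: #0000ff"> ……<br /> end if <br /> end if<br /> end if</span><br /> (4) Arithmetic IF:<br /> <span style="color: #0000ff">program example<br /> implicit none<br /> real c<br /> write (*,*) "input a number"<br /> read (*,*) c<br /> if(c) 10,20,30</span> !10, 20 and 30 are statement labels; depending on whether c is less than, equal to or greater than 0, execution continues at label 10, 20 or 30<br /> <span style="color: #0000ff">10 write (*,*) "A"<br /> goto 40 </span> !goto can jump to any labelled statement before or after it, but overusing it wrecks the program structure<br /> <span style="color: #0000ff">20 write (*,*) "B"<br /> goto 40<br /> 30 write (*,*) "C"<br /> goto 40<br /> 40 stop<br /> end</span> </pre> <pre>3. SELECT CASE<br /> Similar to the switch statement in C<br /> <span style="color: #0000ff">select case(variable)<br /> case(value1)</span> ! for example, case(1:5) runs this block when 1<=variable<=5<br /> <span style="color: #0000ff">…… </span> !case(1,3,5) runs this block when the variable equals 1, 3 or 5<br /> <span style="color: #0000ff">case(value2)</span> !the values in parentheses must be integer, character or logical constants; real is not allowed<br /> <span style="color: #0000ff">…<br /> case default<br /> ……<br /> end select</span> </pre> <pre> 4. PAUSE, CONTINUE
pause suspends the program; pressing Enter resumes it.
continue does not seem to do much; it can serve as a marker that closes a block of code.
</pre> <pre> </pre> <pre>V. Loops
1. DO
<span style="color: #0000ff">do counter=initial, final, step </span> !counter runs from the initial value to the final value in increments of step;
<span style="color: #0000ff">…… </span> !each value of counter corresponds to one iteration. If step is omitted it is taken to be 1
<span style="color: #0000ff">…… </span>
<span style="color: #0000ff">…… </span> !the loop body does not need { } either
<span style="color: #0000ff">…… </span>
<span style="color: #0000ff">end do</span>
Fortran 77 does not end the loop with end do; it looks like this instead:
<span style="color: #0000ff">do label-of-last-line counter=initial, final, step
……
label-of-last-line …… </span> !this is the last line of the do loop
</pre> <pre>2. DO WHILE
<span style="color: #0000ff">do while(condition)
……
……
end do</span>
Similar to while(condition) {……} in C.
The cylinder-area program at the beginning is probably of this kind as well, except that it controls the loop with if statements inside the body. That evidently works too, although I have not seen it written that way in this book. It arguably belongs to the next category.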
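</pre> <pre>3. I have not seen a loop statement matching C's do{……}while(condition);, but the following guarantees<br /> that the body runs at least once:<br /> <span style="color: #0000ff">do while(.true.)<br /> …… <br /> …… <br /> if(condition) exit !exit is like break in C. C's continue is cycle in Fortran<br /> end do</span> </pre> <pre> 4. A Fortran speciality: named loops
They can be written like this, which is less error-prone:
<span style="color: #0000ff">outer: do i=1,3
inner: do j=1,3
……
end do inner
end do outer</span>
Or like this, which is very convenient:
<span style="color: #0000ff">loop1: do i=1,3
loop2: do j=1,3
if(i==3) exit loop1 </span> !exit terminates the whole loop loop1
<span style="color: #0000ff">if(j==2) cycle loop2 </span> !cycle skips the rest of this iteration of loop2 and starts its next iteration
<span style="color: #0000ff">write(*,*) i,j
end do loop2
end do loop1</span>
There are further loop forms used mainly for array operations; they are specific to Fortran and very practical.
</pre> <pre> </pre> <pre>VI. Arrays
1. Declaring arrays
Unlike in C, array indices in Fortran are written inside ( ), and a multi-dimensional array still uses a single pair of ( ):
<span style="color: #0000ff">integer a(5) </span> !declares a one-dimensional integer array
<span style="color: #0000ff">real :: b(3,6)</span> !declares a two-dimensional real array
The element type may be integer, real, character, logical or a user-defined type. Up to 7 dimensions are allowed.
The array size must be a constant. Unlike C, however, Fortran also offers arrays whose size is decided at run time:
<span style="color: #0000ff">integer, allocatable :: a(:)</span> </pre> <pre>!declares an array of variable size. Once the required size is known by some means, use the following statement:
<span style="color: #0000ff">allocate(a(size))</span> !allocates the memory
From then on the array behaves exactly like one declared in the ordinary way.
Unlike in C, Fortran indices start at 1 by default, and this can be changed in the declaration:
<span style="color: #0000ff">integer a(-3:1) </span> ! the indices are -3, -2, -1, 0, 1
<span style="color: #0000ff">integer b(2:3,-1:3)</span> !the usable elements are b(2~3,-1~3)
</pre> <pre>2. How arrays are laid out in memory
Unlike in C, a Fortran array such as a(2,2) is stored in the order a(1,1),a(2,1),a(1,2),a(2,2). The rule is that the lower dimensions vary first and the higher dimensions after them. This is called column-major order.
</pre> <pre>3. Initial values
(1) The most common ways:
<span style="color: #0000ff">integer a(5)
data a /1,2,3,4,5/</span>
or <span style="color: #0000ff">integer :: a(5)=(/1,2,3,4,5/)</span>
With <span style="color: #0000ff">integer :: a(5)=5</span>, all 5 elements are 5.
For <span style="color: #0000ff">integer :: a(2,2)=reshape((/1,2,3,4/),(/2,2/)) </span>
the memory layout of the elements makes this equivalent to <span style="color: #0000ff">a(1,1)=1,a(2,1)=2,a(1,2)=3,a(2,2)=4</span>
(2) Using a Fortran speciality, the implied loop. The examples make it clear.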
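<span style="color: #0000ff">integer a(5)
integer i
data (a(i),i=2,4)/2,3,4/ </span> !(a(i),i=2,4) means i loops from 2 to 4 with the default increment of 1
This also works:
<span style="color: #0000ff">integer i
integer :: a(5)=(/1,(2,i=2,4),5/) </span> !the five elements are set to 1, 2, 2, 2, 5
<span style="color: #0000ff">integer :: b(5)=(/(i, i=1,5)/) </span> !the five elements are set to 1, 2, 3, 4, 5
Implied loops can be nested:
<span style="color: #0000ff">data ((a(i,j),i=1,2),j=1,2) /1,2,3,4/</span> !a(1,1)=1,a(2,1)=2,a(1,2)=3,a(2,2)=4
</pre> <pre>4. Operating on whole arrays
Let a and b be arrays of the same type, rank and size.
<span style="color: #0000ff">a=5 </span> !every element is set to 5
<span style="color: #0000ff">a=(/1,2,3/) </span>!assuming a is one-dimensional: a(1)=1,a(2)=2,a(3)=3
<span style="color: #0000ff">a=b </span> !element-by-element assignment; a, b and c must have the same rank and size, here and below
<span style="color: #0000ff">a=b+c
a=b-c
a=b*c
a=b/c
a=sin(b) </span> !all intrinsic functions can be used this way
</pre> <pre>5. Operating on part of an array
Let a be a one-dimensional array.
<span style="color: #0000ff">a(3:5)=(/3,4,5/) </span> !a(3)=3,a(4)=4,a(5)=5
<span style="color: #0000ff">a(1:5:2)=3 </span> !a(1)=3,a(3)=3,a(5)=3
<span style="color: #0000ff">a(3:)=5 </span> !a(3) and every element after it is set to 5
<span style="color: #0000ff">a(1:3)=b(4:6) </span> !forms like this require the same number of elements on both sides
<span style="color: #0000ff">a(:)=b(:,2) </span> !a(1)=b(1,2),a(2)=b(2,2), and so on
</pre> <pre>6. WHERE
where looks like if but is used only to set arrays. Let a and b be two arrays of the same type, rank and size.
<span style="color: #0000ff">where(a<3)
b=a </span> !elements of a that are less than 3 are assigned to the corresponding elements of b
<span style="color: #0000ff">end where </span>
Another example: <span style="color: #0000ff">where(a(1:3)/=0) c=a</span> !end where is omitted because only one statement follows
!where can be nested, and like a do loop it can carry a name label.
</pre> <pre>7. FORALL
Somewhat like the for loop in C:
<span style="color: #0000ff">forall(triplet1[,triplet2 [,triplet3…]],mask)</span>
Each triplet has the form i=2:6:2 and describes a loop; if the last number is omitted the increment is 1.
For example:
<span style="color: #0000ff">forall(i=1:5,j=1:5,a(i,j)<10)
a(i,j)=1
end forall</span>
Or: <span style="color: #0000ff"> forall(i=1:5,j=1:5,a(i,j)/=0) a(i,j)=1/a(i,j)</span>
forall can also be nested, just like nested for loops in C.
</pre> <pre> </pre> <pre>VII. Functions
Fortran has two kinds of procedures: subroutines (subroutine) and user-defined functions (function). A user-defined function is essentially a function in the mathematical sense: it is normally given arguments and returns a value. A subroutine need not work that way and may return nothing. When passing arguments, make sure the types match, just as in C.
1. Subroutines
Purpose: to factor out a frequently used piece of code with a specific job so that it is easy to call.
By convention, subroutines are placed after the end of the main program.
Form: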
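<span style="color: #0000ff">subroutine name (parameter1, parameter2) </span>
!give the subroutine a meaningful name. Arguments can be passed, which is how results come back. The parentheses may also be left
empty, meaning no arguments are passed.
<span style="color: #0000ff">implicit none
integer:: parameter1, parameter2 </span>
!the types of the received arguments have to be declared.
<span style="color: #0000ff">…… </span> !from here on, the code is written exactly as in a main program.
<span style="color: #0000ff">…… </span>
<span style="color: #0000ff">return </span>!unlike in C, this means that when the subroutine finishes, control returns to the caller and continues with the statements after the call. It need not come </pre> <pre> !last. It can appear elsewhere in the subroutine with the same effect; whatever follows return in the subroutine is not executed.
<span style="color: #0000ff">end [subroutine name]</span>
Calling: use the call statement directly; no declaration is needed. At the call site write:
<span style="color: #0000ff">call name(parameter1,parameter2)</span>
Notes:
a. Subroutines can call one another. Just call them, exactly as the main program would.
b. Argument passing works differently from C. Fortran passes by address (call by address/reference): the argument at the call site and the dummy argument inside the subroutine use the same address, even though their names may differ. So if the subroutine changes the value of a dummy argument, the argument that was passed changes as well.
c. Variables defined inside each subroutine are independent, as in C. Statement labels are independent too. The same variable names and statement labels in different subroutines and in the main program therefore do not interfere with one another.
</pre> <pre>2. User-defined functions
The obvious difference from a subroutine is that a function must be declared in the main program before it can be used. The way it is called differs as well. Also, by convention a function does not change the values of its arguments; if the arguments need to be modified, use a subroutine.
Declaration: <span style="color: #0000ff">real, external :: function_name</span>
User-defined functions are usually placed after the main program as well.
Form:
<span style="color: #0000ff">function function_name(parameter1, parameter2)
implicit none
real:: parameter1, parameter2 </span> !declares the argument types; this is required
<span style="color: #0000ff">real::function_name</span> !declares the type of the return value; this is required
<span style="color: #0000ff">……
……
function_name=…. </span> !the expression for the return value
<span style="color: #0000ff">return
end </span>
The return type can also be declared directly, which is more concise:
<span style="color: #0000ff">real function function_name(parameter1, parameter2)
implicit none
real:: parameter1, parameter2 </span> !this is still required
<span style="color: #0000ff">……
……
function_name=…. </span> !the expression for the return value
<span style="color: #0000ff">return
end </span>
Calling: <span style="color: #0000ff">function_name(parameter1,parameter2)</span>
No call statement is needed.
User-defined functions can call one another; the callee must again be declared first.
In short, a user-defined function must be declared before it is called, whereas a subroutine need not be.
</pre> <pre>3. Variables in functions
(1) Make sure the types match. Fortran even allows numeric constants to be passed, but the result is only what you expect if the type matches the argument type in the function definition. For example, call ShowReal(1.0) must use 1.0, not 1.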
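(2) Array arguments are passed by address, as in C, but the address need not be that of the first element; it can be the address of any chosen element. Given an array a(5), call function(a) passes the address of a(1), and call function(a(3)) passes the address of a(3).
(3) For a multi-dimensional array argument, the rule is the reverse of C's: the size of the last dimension may be omitted, while all other dimensions must be given. This follows from Fortran's column-major storage of array elements.
(4) Inside a function, if an array is a received argument, its size can be given by a variable in the declaration, or even left unspecified. For example:
<span style="color: #0000ff">subroutine Array(num,size)
implicit none
integer:: size
integer num(size)</span> !an array can be declared whose size is determined by an argument that is passed in. This is very practical
<span style="color: #0000ff">……
……
return
end</span>
(5) The save attribute keeps the value of a variable in a function after the call returns, so that on the next call the variable still holds the value it had at the end of the previous call. Just add save to the declaration:
<span style="color: #0000ff">integer, save :: a=1</span>
(6) Passing procedures (user-defined functions, library functions and subroutines can all be passed). This resembles function pointers in C. The procedure being passed must be declared both in the main program and in the procedure that receives and calls it. For example:
<span style="color: #0000ff">real, external :: function </span> !a user-defined function
<span style="color: #0000ff">real, intrinsic :: sin</span> !a library function
<span style="color: #0000ff">external sub </span> !a subroutine
(7) Interfaces for functions (interface): a separate program block. An interface is required when:
a. the function returns an array
b. arguments are passed by specifying their position (by keyword)
c. the called function takes a variable number of arguments
d. a pointer is passed as an argument
e. the function returns a pointer.
The details are easy to follow from examples, but the examples are all long. See the book.
</pre> <pre>4. Global variables<br /> Their purpose needs no explanation. How they work: they are matched by their relative position in the declaration, unlike in C, where they are matched by name.<br /> If the main program declares:<br /> <span style="color: #0000ff">integer :: a,b<br /> common a,b </span> !this is how global variables are defined<br /> and a subroutine or user-defined function declares:<br /> <span style="color: #0000ff">integer :: c,d<br /> common c,d</span><br /> then a and c share the same memory, and b and d share the same memory.<br /> This gets tedious when there are many global variables. They can be grouped by hand: just add a block name after common in the declaration<br /> . For example<br /> <span style="color: #0000ff">common /group1/ a</span> and <span style="color: #0000ff">common /group2/ b</span>. Then not every global variable has to be<br /> listed when they are used; declaring <span style="color: #0000ff">common /group1/ c</span> is enough to share the global variable a under the name c.<br /> A block data program unit can be used. In the main program and in functions, the data statement mentioned earlier cannot be used directly to give<br /> global variables initial values. They can be assigned values individually; to use a data statement it has to be done like this:<br /> <span style="color: #0000ff">block data [name]<br /> implicit none<br /> integer a,b,c<br /> real d,e<br /> common a, b, c<br /> common /group1/ d,e<br /> data a,b,c,d,e /1,2,3,4.0,5.0/<br /> end [block data [name]]</span> </pre> <pre> 5. Module
A module is not a function. It is used to package program units, usually grouping functions and variables with related purposes. It is simple to use but offers many conveniences and makes programs more concise; for example, a long list of global variables no longer has to be declared every time, since it can be written once in a module and then used. Modules are normally written before the start of the main program.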
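Form: <span style="color: #0000ff"> module module_name
……
……
end [module [module_name]]</span>
Use: in a main program or function that uses the module, write the line use module_name before the declarations.
Functions in a module must come after a contains statement (that is, write contains on a line of its own and start the functions below it; when there are several functions, they all go after this one contains). Variables defined in the module can be used directly in the module's functions, the functions can call one another directly, and even a user-defined function in the module does not have to be declared before it is called.
</pre> <pre> 6. include can be placed wherever it is needed and inserts another file (which must be in the same directory). For example:<br /> <span style="color: #0000ff">include 'funcion.f90'</span> </pre> <pre> </pre> <pre> VIII. Files
1. Text files
Fortran has two ways of reading and writing files, corresponding to two kinds of file:
sequential access, used for text files
direct access, used for binary files
Only reading and writing text files is covered here. The usual pattern is as follows.
<span style="color: #0000ff">character(len=20)::filenamein="in.txt", filenameout="out.txt"</span> !file names
<span style="color: #0000ff">logical alive
integer::fileidin=10,fileidout=20 </span>
!10 and 20 are the unit numbers given to the files; any positive integer other than 1, 2, 5 and 6 will do, because 2 and 6 are the default output location (the screen)
and 1 and 5 are the default input location (the keyboard)
<span style="color: #0000ff">integer::error
real::in,out</span>
!the following lines check whether a file with the given name exists
<span style="color: #0000ff">inquire(file=filenamein, exist=alive) </span> !if the file exists, alive is set to .true.
<span style="color: #0000ff">if(.NOT. alive) then
write(*,*) trim(filenamein), " doesn't exist."</span>!trim removes the trailing blanks from the string in filenamein
!so that the output looks tidier
<span style="color: #0000ff">stop
end if
open([unit=]fileidin, file=filenamein, status="old")
open([unit=]fileidout,file=filenameout[,status="new"])</span>
!unit specifies the input/output location. An existing file must be opened with status="old"; a new file is opened with status="new";
!if status is not given it defaults to status="unknown", which overwrites an existing file or opens a new one ……
<span style="color: #0000ff">read([unit=]fileidin, [fmt=]100,iostat=error )in </span> !error=0 means the data was read correctly
<span style="color: #0000ff">100 format(1X,F6.3) </span>
!input and output follow a given format; the format can be written separately under a statement label, or directly inside the read/write
<span style="color: #0000ff">write([unit=]fileidout, "(1X,F6.3)")out
close(fileidin)
close(fileidout)</span>
!1X stands for one blank. F6.3 means a real value occupying 6 characters (including the decimal point), three of them after the decimal point.
!Other common descriptors are I3, for integer data occupying three characters, and A8, for character data occupying 8 characters. A line break is written as /
Reading and writing binary files works somewhat differently and is not covered here.
</pre> <pre>2. Internal files
Another very practical input/output feature is the internal file. The following example makes it clear.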
<span style="color: #0000ff">integer::a=1,b=2
character(len=20)::string
write(unit=string,fmt="(I2,'+',I2,'=',I2)")a,b,a+b
write(*,*)string</span>
This prints 1+2=3 (each number right-justified in a two-character field). The reverse direction works too:
<span style="color: #0000ff">integer a
character(len=20)::string="123"
read(string,*)a
write(*,*)a</span>
This prints 123.
</pre> <pre>!End of the article.
</pre> <img src ="http://m.tkk7.com/zellux/aggbug/168108.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://m.tkk7.com/zellux/" target="_blank">ZelluX</a> 2007-12-16 21:03 <a href="http://m.tkk7.com/zellux/archive/2007/12/16/168108.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Samplinghttp://m.tkk7.com/zellux/archive/2007/12/14/167758.htmlZelluXZelluXFri, 14 Dec 2007 05:42:00 GMThttp://m.tkk7.com/zellux/archive/2007/12/14/167758.htmlhttp://m.tkk7.com/zellux/comments/167758.htmlhttp://m.tkk7.com/zellux/archive/2007/12/14/167758.html#Feedback0http://m.tkk7.com/zellux/comments/commentRss/167758.htmlhttp://m.tkk7.com/zellux/services/trackbacks/167758.htmlThe CAL sample programs contain many sample instructions; here is a short introduction found via Google:

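Antialiasing

Making pixels smaller gives a finer image and reduces jaggies to some extent, but as long as pixels are large enough to be told apart, jaggies are unavoidable. Antialiasing methods generally work by sampling at multiple points (note "points", not "pixels"; the difference becomes clear below).

I. Theory and methods:

1) Oversampling (supersampling):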

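(1) Method:

 First, render the scene at a higher resolution than the display (front buffer):

If the current front/back buffer resolution is 800 × 600, the scene can first be rendered to a 1600 × 1200 render target (a texture).

 Then derive the low-resolution result from the high-resolution render target:

      Take the average color of each 2 × 2 block of pixels as the final color of the rendered pixel.

(2) Advantage: it markedly reduces the distortion caused by jaggies.

(3) Disadvantages: it needs a larger buffer, and filling that buffer costs more performance;

           sampling several pixels for each output pixel lowers performance further;

           because of these drawbacks, D3D does not use this antialiasing method.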
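
The 2 × 2 averaging step described above can be sketched in a few lines. This is only an illustration under simple assumptions: the image is a grayscale list of rows with even dimensions, and `downsample_2x2` is a name made up for this example, not part of any graphics API.

```python
def downsample_2x2(image):
    """Average each 2x2 block of a grayscale image given as a list of rows."""
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("image dimensions must be even")
    result = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            # Sum the four high-resolution pixels that map to one output pixel.
            total = (image[y][x] + image[y][x + 1]
                     + image[y + 1][x] + image[y + 1][x + 1])
            row.append(total / 4)
        result.append(row)
    return result


# A 4x4 "high-resolution" render containing a hard diagonal edge.
hi_res = [
    [0, 0, 255, 255],
    [0, 255, 255, 255],
    [0, 0, 0, 255],
    [0, 0, 0, 0],
]
print(downsample_2x2(hi_res))  # [[63.75, 255.0], [0.0, 63.75]]
```

Blocks that straddle the edge come out as intermediate gray values, which is what softens the staircase; the cost is that four pixels were rendered and stored for every pixel displayed.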

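2) Multisampling:

(1) Method:

The pixel is shaded only once; instead, N sample points (how many depends on the sampling pattern) are taken inside each pixel, and the final color of the pixel = the original color of the pixel * the number of sample points covered by the polygon / the total number of sample points.

(2) Advantage: it reduces the distortion caused by jaggies without increasing the number of shading samples, and compared with oversampling it does not need a larger back buffer.

(3) Disadvantage: originally, a polygon determines the color of a pixel only when it covers the pixel's center (in the pixel pipeline this typically means looking up the appropriate texture color and modulating it with the color output by the vertex pipeline). With multisampling, if the polygon covers some of the sample points but not the pixel center, the pixel color is still determined by that polygon. Texture addressing can then go wrong, and with a texture atlas this produces a different artifact: wrong colors along polygon edges.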
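
The coverage formula above can be written out as a small sketch. It follows the formula exactly as stated in the text, which amounts to blending against a black background; real hardware stores a color per sample and resolves them later. The function name `resolve_multisample` and the four-sample mask are assumptions made for this example.

```python
def resolve_multisample(shaded_color, coverage_mask):
    """Scale one shaded color by the fraction of covered sample points."""
    covered = sum(1 for hit in coverage_mask if hit)
    total = len(coverage_mask)
    return tuple(channel * covered / total for channel in shaded_color)


# The pixel is shaded once, but only three of its four sample points
# lie inside the polygon.
print(resolve_multisample((200, 100, 0), [True, True, False, True]))
# (150.0, 75.0, 0.0)
```

Only the coverage test runs per sample point; the shading work is done once per pixel, which is where the saving over oversampling comes from.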

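3) Centroid Sampling:

(1) Method:

     To fix the texture-addressing errors that multisampling causes with texture atlases, the color at the pixel center is no longer used as "the original color of the pixel"; instead, "the color at the center of those sample points in the pixel that are covered by the polygon" is used. This guarantees that the shaded point always lies inside the polygon (that is, the texture address never falls outside the polygon).

(2) How to use it:

         ① Any Pixel Shader that takes an input with the COLOR semantic applies centroid sampling automatically.

     ② Append the _centroid modifier by hand to the semantic of a Pixel Shader input parameter, for example:

   float4  TexturePointCentroidPS( float4 TexCoord : TEXCOORD0_centroid ) : COLOR0

{

  return tex2D( PointSampler, TexCoord );

}

(3) Note:

    Centroid sampling is mainly meant for multisampling with texture atlases. When a whole texture maps onto a single polygon mesh, centroid sampling can actually introduce errors.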
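
To make the idea of "the center of the covered sample points" concrete, here is a minimal sketch. The four-sample pattern and the function name `centroid_of_covered` are hypothetical; actual hardware chooses its own sample positions and may pick the shading location differently.

```python
# Hypothetical sample pattern inside a unit pixel; (0.5, 0.5) is the pixel center.
SAMPLE_POINTS = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]


def centroid_of_covered(sample_points, coverage_mask):
    """Return the mean position of the covered sample points, or None if none is covered."""
    covered = [point for point, hit in zip(sample_points, coverage_mask) if hit]
    if not covered:
        return None
    cx = sum(x for x, _ in covered) / len(covered)
    cy = sum(y for _, y in covered) / len(covered)
    return (cx, cy)


# A polygon edge covers only the right-hand column of sample points, so the
# pixel center at x = 0.5 lies outside the polygon.
print(centroid_of_covered(SAMPLE_POINTS, [False, True, False, True]))  # (0.75, 0.5)
```

Shading at (0.75, 0.5) instead of the pixel center keeps the interpolated texture coordinate on the covered side of the edge, which is the property the text relies on for texture atlases.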



ZelluX 2007-12-14 13:42 Post a comment
]]>
Resources on Inter-Procedural Analysis (3)http://m.tkk7.com/zellux/archive/2007/11/27/163465.htmlZelluXZelluXTue, 27 Nov 2007 07:24:00 GMThttp://m.tkk7.com/zellux/archive/2007/11/27/163465.htmlhttp://m.tkk7.com/zellux/comments/163465.htmlhttp://m.tkk7.com/zellux/archive/2007/11/27/163465.html#Feedback0http://m.tkk7.com/zellux/comments/commentRss/163465.htmlhttp://m.tkk7.com/zellux/services/trackbacks/163465.htmlhttp://m.tkk7.com/Files/zellux/ORC-PACT02-tutorial.rar

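The second edition of the Dragon Book also seems to cover a lot of IPA optimization and call-graph material, so that is next on the reading list.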

ZelluX 2007-11-27 15:24 Post a comment
]]>
Resources on Inter-Procedural Analysis (2)http://m.tkk7.com/zellux/archive/2007/11/26/163137.htmlZelluXZelluXMon, 26 Nov 2007 04:53:00 GMThttp://m.tkk7.com/zellux/archive/2007/11/26/163137.htmlhttp://m.tkk7.com/zellux/comments/163137.htmlhttp://m.tkk7.com/zellux/archive/2007/11/26/163137.html#Feedback1http://m.tkk7.com/zellux/comments/commentRss/163137.htmlhttp://m.tkk7.com/zellux/services/trackbacks/163137.html Overview of the Open64 Compiler Infrastructure
VI.4. Interprocedural Analysis
Interprocedural Analysis (IPA) is performed in the following phases of Open64:
• Inliner phase
• IPA local summary phase
• IPA analysis phase
• IPA optimization phase
• IPA miscellaneous
By default the IPA does the function inlining in the inliner facility. The local summary phase is done in the IPL module and the analysis phase and optimization phase in the ipa-link module.
During the analysis phase, it does the following:
• IPA_Padding Analysis (common blocks Padding/Split Analysis)
• Construction of the Callgraph
Then it does space and multigot partitioning of the Callgraph. The partitioning algorithm takes into account whether it is doing partitioning for solving space or the multigot problem.
During the optimization phase the following phases are performed:
• IPA Global Variable Optimization
• IPA Dead function elimination
• IPA Interprocedural Alias Analysis
• IPA Cloning Analysis (it propagates information about formal parameters used as symbolic terms in array section summaries; this information is later used to trigger cloning)
• IPA Interprocedural Constant propagation
• IPA Array_Section Analysis
• IPA Inlining Analysis
• Array section summaries for the Dependence Analyzer of the Loop Nest Optimizer.
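
As a rough illustration of one item in the list above, dead function elimination can be thought of as a reachability question on the call graph. The sketch below is not Open64's algorithm; it assumes a toy call graph stored as a dictionary, made-up function names, and no indirect calls or address-taken functions.

```python
from collections import deque


def reachable_functions(call_graph, roots):
    """Return every function reachable from the roots by following call edges."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def dead_functions(call_graph, roots):
    """Return the functions that no root can reach, i.e. removal candidates."""
    all_functions = set(call_graph)
    for callees in call_graph.values():
        all_functions.update(callees)
    return all_functions - reachable_functions(call_graph, roots)


# Hypothetical whole-program call graph: caller -> list of callees.
call_graph = {
    "main": ["parse", "run"],
    "run": ["helper"],
    "helper": [],
    "parse": [],
    "debug_dump": ["helper"],
    "old_init": ["debug_dump"],
}
print(sorted(dead_functions(call_graph, ["main"])))  # ['debug_dump', 'old_init']
```

The example also shows why the analysis has to see the whole program: `debug_dump` has a caller, but that caller is itself unreachable, which a file-by-file view might not reveal.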


ZelluX 2007-11-26 12:53 Post a comment
]]>
Resources on Inter-Procedural Analysis (1)http://m.tkk7.com/zellux/archive/2007/11/25/163046.htmlZelluXZelluXSun, 25 Nov 2007 15:04:00 GMThttp://m.tkk7.com/zellux/archive/2007/11/25/163046.htmlhttp://m.tkk7.com/zellux/comments/163046.htmlhttp://m.tkk7.com/zellux/archive/2007/11/25/163046.html#Feedback0http://m.tkk7.com/zellux/comments/commentRss/163046.htmlhttp://m.tkk7.com/zellux/services/trackbacks/163046.htmlI suddenly have to do a related compiler-optimization project, so here is some IPA material from overseas sites to start with; reaching foreign sites from the education network is inconvenient.

GCC wiki:

Analysis and optimizations that work on more than one procedure at a time. This is usually done by walking the strongly connected components of the call graph and performing some analysis and optimization across some set of procedures (be it the whole program, or just a subset) at once.

GCC has had a callgraph for a few versions now (since GCC 3.4 in the FSF releases), but the procedures didn't have control flow graphs (CFGs) built. The tree-profiling-branch in GCC CVS now has a CFG for every procedure built and accessible from the callgraph, as well as a basic IPA pass manager. It also contains in-progress interprocedural optimizations and analyses: interprocedural constant propagation (with cloning for specialization) and interprocedural type escape analysis.
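
The GCC description mentions walking the strongly connected components of the call graph. A minimal sketch of that idea, using Tarjan's algorithm on a toy call graph, is shown below. It is not GCC's implementation; the dictionary representation and the function names are assumptions for illustration, and the recursive form is only suitable for small graphs.

```python
def strongly_connected_components(call_graph):
    """Tarjan's algorithm; components come out with callees before their callers."""
    index_of = {}
    lowlink = {}
    stack = []
    on_stack = set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index_of[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for callee in call_graph.get(node, ()):
            if callee not in index_of:
                visit(callee)
                lowlink[node] = min(lowlink[node], lowlink[callee])
            elif callee in on_stack:
                lowlink[node] = min(lowlink[node], index_of[callee])
        if lowlink[node] == index_of[node]:
            # node is the root of a component: pop its members off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in call_graph:
        if node not in index_of:
            visit(node)
    return components


# Hypothetical call graph: even and odd are mutually recursive.
call_graph = {
    "main": ["even", "log"],
    "even": ["odd"],
    "odd": ["even", "log"],
    "log": [],
}
print(strongly_connected_components(call_graph))
# [['log'], ['even', 'odd'], ['main']]
```

Because each component is emitted only after everything it calls, an analysis that visits components in this order sees summaries of the callees before it reaches their callers, and mutually recursive procedures are handled together as one unit.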

IBM XL Fortran V10.1 for Linux:

Benefits of interprocedural analysis (IPA)

Interprocedural Analysis (IPA) can analyze and optimize your application as a whole, rather than on a file-by-file basis. Run during the link step of an application build, the entire application, including linked libraries, is available for interprocedural analysis. This whole program analysis opens your application to a powerful set of transformations available only when more than one file or compilation unit is accessible. IPA optimizations are also effective on mixed language applications.

 

Figure 2. IPA at the link step

The following are some of the link-time transformations that IPA can use to restructure and optimize your application:

  • Inlining between compilation units
  • Complex data flow analyses across subprogram calls to eliminate parameters or propagate constants directly into called subprograms.
  • Improving parameter usage analysis, or replacing external subprogram calls to system libraries with more efficient inline code.
  • Restructuring data structures to maximize access locality.

In order to maximize IPA link-time optimization, you must use IPA at both the compile and link step. Objects you do not compile with IPA can only provide minimal information to the optimizer, and receive minimal benefit. However when IPA is active on the compile step, the resulting object file contains program information that IPA can read during the link step. The program information is invisible to the system linker, and you can still use the object file and link without invoking IPA. The IPA optimizations use hidden information to reconstruct the original compilation and can completely analyze the subprograms the object contains in the context of their actual usage in your application.

During the link step, IPA restructures your application, partitioning it into distinct logical code units. After IPA optimizations are complete, IPA applies the same low-level compilation-unit transformations as the -O2 and -O3 base optimizations levels. Following those transformations, the compiler creates one or more object files and linking occurs with the necessary libraries through the system linker.

It is important that you specify a set of compilation options as consistent as possible when compiling and linking your application. This includes all compiler options, not just -qipa suboptions. When possible, specify identical options on all compilations and repeat the same options on the IPA link step. Incompatible or conflicting options that you specify to create object files, or link-time options in conflict with compile-time options can reduce the effectiveness of IPA optimizations.

Using IPA on the compile step only

IPA can still perform transformations if you do not specify IPA on the link step. Using IPA on the compile step initiates optimizations that can improve performance for an individual object file even if you do not link the object file using IPA. The primary focus of IPA is link-step optimization, but using IPA only on the compile-step can still be beneficial to your application without incurring the costs of link-time IPA.

 

Figure 3. IPA at the compile step

IPA Levels and other IPA suboptions

You can control many IPA optimization functions using the -qipa option and suboptions. The most important part of the IPA optimization process is the level at which IPA optimization occurs. Default compilation does not invoke IPA. If you specify -qipa without a level, or specify -O4, IPA optimizations are at level one. If you specify -O5, IPA optimizations are at level two.

Table 5. The levels of IPA
IPA Level Behaviors
qipa=level=0
  • Automatically recognizes standard library functions
  • Localizes statically bound variables and procedures
  • Organizes and partitions your code according to call affinity, expanding the scope of the -O2 and -O3 low-level compilation unit optimizer
  • Lowers compilation time in comparison to higher levels, though limits analysis
qipa=level=1
  • Level 0 optimizations
  • Performs procedure inlining across compilation units
  • Organizes and partitions static data according to reference affinity
qipa=level=2
  • Level 0 and level 1 optimizations
  • Performs whole program alias analysis which removes ambiguity between pointer references and calls, while refining call side effect information
  • Propagates interprocedural constants
  • Eliminates dead code
  • Performs pointer analysis
  • Performs procedure cloning
  • Optimizes intraprocedural operations, using specifically:
    • Value numbering
    • Code propagation and simplification
    • Code motion, into conditions and out of loops
    • Redundancy elimination techniques

IPA includes many suboptions that can help you guide IPA to perform optimizations important to the particular characteristics of your application. Among the most relevant to providing information on your application are:

  • lowfreq which allows you to specify a list of procedures that are likely to be called infrequently during the course of a typical program run. Performance can increase because optimization transformations will not focus on these procedures.
  • partition which allows you to specify the size of the regions within the program to analyze. Larger partitions contain more procedures, which result in better interprocedural analysis but require more storage to optimize.
  • threads which allows you to specify the number of parallel threads available to IPA optimizations. This can provide an increase in compilation-time performance on multi-processor systems.
  • clonearch which allows you to instruct the compiler to generate duplicate subprograms with each tuned to a particular architecture.

Using IPA across the XL compiler family

The XL compiler family shares optimization technology. Object files you create using IPA on the compile step with the XL C, C++, and Fortran compilers can undergo IPA analysis during the link step. Where program analysis shows that objects were built with compatible options, such as -qnostrict, IPA can perform transformations such as inlining C functions into Fortran code, or propagating C++ constant data into C function calls.



ZelluX 2007-11-25 23:04 Post a comment
]]>