<?xml version="1.0" ?> 
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/">
<channel>
  <title>AMD Developer Forums - AMD Core Math Library (ACML)</title> 
  <description></description> 
  <link>http://forums.amd.com/forum/index.cfm?forumid=9</link> 
  <generator>FuseTalk Hosting Executive Plan</generator> 

	<item>
		<title>Valgrind memcheck complains about ACML cpuid function</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=122280</link> 
		<pubDate>2009-11-18T18:01:39 -05.00</pubDate> 
		<dc:creator>martinkr</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>Dear all,</p>
<p>I'm using the ACML library on an Intel Core2 processor under Ubuntu Linux. When I use valgrind's memcheck (valgrind version 3.5) to debug my programs, I get a bunch of warnings from ACML's acmlcpuid function. While this does not affect the program's functionality, it makes it difficult for me to find actual errors hidden among hundreds of warnings. Is anyone else having the same problem? If it is a bug in the ACML library (and not some shortcoming of valgrind), I would be very happy if it could be fixed in the next ACML release.</p>
<p>Here is the error message (just call the sgemm function):</p>
<p>==1940== Conditional jump or move depends on uninitialised value(s)<br />==1940==&nbsp;&nbsp;&nbsp; at 0x4E9CF55: IdentifyCPUTYPE (in /usr/local/lib/libacml.so)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x4E9F7E1: acmlcpuid (in /usr/local/lib/libacml.so)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x554DF81: sgemmp_ (in /usr/local/lib/libacml.so)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x554F782: sgemm_ (in /usr/local/lib/libacml.so)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x554D662: sgemm (in /usr/local/lib/libacml.so)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x40081E: main (test.cpp:20)<br />==1940== <br />==1940== Conditional jump or move depends on uninitialised value(s)<br />==1940==&nbsp;&nbsp;&nbsp; at 0x4C26470: strncmp (mc_replace_strmem.c:386)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x4E9D09D: IdentifyCPUTYPE (in /usr/local/lib/libacml.so)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x4E9F7E1: acmlcpuid (in /usr/local/lib/libacml.so)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x554DF81: sgemmp_ (in /usr/local/lib/libacml.so)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x554F782: sgemm_ (in /usr/local/lib/libacml.so)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x554D662: sgemm (in /usr/local/lib/libacml.so)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x40081E: main (test.cpp:20)</p>
<p>...</p>
<p>==1940== Conditional jump or move depends on uninitialised value(s)<br />==1940==&nbsp;&nbsp;&nbsp; at 0x4E9B2AC: IdentifyIntelCpu (in /usr/local/lib/libacml.so)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x4E9D100: IdentifyCPUTYPE (in /usr/local/lib/libacml.so)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x4E9F7E1: acmlcpuid (in /usr/local/lib/libacml.so)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x554DF81: sgemmp_ (in /usr/local/lib/libacml.so)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x554F782: sgemm_ (in /usr/local/lib/libacml.so)<br />==1940==&nbsp;&nbsp;&nbsp; by 0x554D662: sgemm (in /usr/local/lib/libacml.so)<br /> ==1940==&nbsp;&nbsp;&nbsp; by 0x40081E: main (test.cpp:20)</p>
<p><br />Best,</p>
<p>Martin</p>]]></description>
	</item>

	<item>
		<title>Parallelized BLAS routines with OpenMP</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=122111</link> 
		<pubDate>2009-11-16T11:18:48 -05.00</pubDate> 
		<dc:creator>eS-Tea</dc:creator>
   	    <slash:comments>3</slash:comments> 
		<description><![CDATA[ <p>Hello everyone!<br /><br />I am currently checking whether the OpenMP version of ACML is suitable for our algorithms which make extensive use of BLAS routines.<br />The release notes of ACML version 2.5.3 indicate that "SMP support has been added to various key level 2 and level 3 BLAS routines".<br />I also found a list of LAPACK routines parallelized with OpenMP in the release notes of ACML Version 3.6.0 but I did not find a corresponding<br />list for the BLAS routines.<br /><br />Can anyone provide me with such a list of BLAS routines which benefit from OpenMP?<br /><br />Regards,<br />Sven</p>]]></description>
	</item>

	<item>
		<title>Incompatibility between OpenMP in ACML and gfortran</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=121681</link> 
		<pubDate>2009-11-07T14:10:24 -05.00</pubDate> 
		<dc:creator>nmmamd</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <pre id="comment_text_0">This is ACML 4.3.0<br /><br />gfortran -fopenmp is incompatible with -lacml_mp under at least some<br />circumstances.  The program is LAPACK Cholesky and its solver, converted<br />to Fortran 90 and instrumented for a course.  It fails with 4.4.1 in ACML<br />(i.e. the LAPACK call returns an erroneous error value), and in 4.3.2<br />the solver (which does NOT call ACML) takes 10 times as long as it should<br />do.  Omitting the -fopenmp and using -lacml instead both work perfectly;<br />i.e. it fails ONLY if both -fopenmp and -lacml_mp are used.  Oh, and it<br />works with -llapack, too.  Oh, joy.<br /><br />Here is some of the gfortran grobble:<br /><br />gosset$gfortran -v<br />Using built-in specs.<br />Target: x86_64-unknown-linux-gnu<br />Configured with: ../configure --prefix=/home/nmm/gfortran --disable-shared<br />--disable-threads --disable-bootstrap -enable-languages=fortran<br />--enable-werror=yes --enable-checking=all --disable-decimal-float<br />Thread model: single<br />gcc version 4.4.1 (GCC)<br />gosset$/usr/bin/gfortran -v<br />Using built-in specs.<br />Target: x86_64-suse-linux<br />...<br />Thread model: posix<br />gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux)<br /><br />The source is fairly short and clean, but fred_f_1000 is 16 MB.<br />The 4.4.1 failure shows with a 160 MB test file, but not the 4.3.2<br />one.  However, I can provide the generating programs, which are very<br />short.<br /></pre>
]]></description>
	</item>

	<item>
		<title>acml 4.3.0 + openmp = Segmentation fault (core dumped)</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=120517</link> 
		<pubDate>2009-10-18T06:00:02 -05.00</pubDate> 
		<dc:creator>Ruslan Tabolin</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>I use gfortran 4.4.1 (and ifort 11.0) with&nbsp;acml 4.3.0 in Ubuntu 9.10 32bit (and Scientific Linux 5.3&nbsp;32bit). I compile this code:</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;program eigenlapack&nbsp;&nbsp; &nbsp; &nbsp;</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;PARAMETER &nbsp; &nbsp; &nbsp; &nbsp;(n=1000)</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;PARAMETER &nbsp; &nbsp; &nbsp; &nbsp;(lwork=8*n)&nbsp;&nbsp; &nbsp; &nbsp;</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;integer &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; m, info, iwork(5*n), IL, IU, ifail(n),i,j</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;REAL &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;w(n), work(lwork), a(n,n), VL, VU, z(n,n)</p>
<p>&nbsp;</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;open(11,file='eigen.txt',form='formatted',status='UNKNOWN')</p>
<p>&nbsp;</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;IL = 1</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;IU = 5</p>
<p>&nbsp;</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;do 10 i=1,n</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;do 20 j=1,n</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;a(i,j)=rand()*100</p>
<p>&nbsp;&nbsp; 20 continue</p>
<p>&nbsp;&nbsp; 10 continue</p>
<p>&nbsp;</p>
<p>!$omp parallel default(firstprivate)</p>
<p>&nbsp;</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;CALL SSYEVX ('N', 'A', &nbsp;'U', n, a, n, VL, VU, IL, IU, 0.0d0,&nbsp;</p>
<p>&nbsp;&nbsp; &nbsp; 1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;m, w, z, n, work, lwork, &nbsp;iwork, ifail, info)</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;write(*,'(3x,a)') 'Number of eigenvalues:'</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;write(*,'(3x,I100)') m</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;write(*,'(3x,a)') 'Eigenvalues:'</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;write(11,'(10000f12.2)') (w(i),i=1,n)</p>
<p>&nbsp;</p>
<p>!$omp end parallel</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;end</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>If I set "export OMP_NUM_THREADS=1" then it works correctly. But if I use more than one thread (OMP_NUM_THREADS=2) then it doesn't work:</p>
<p>&nbsp;</p>
<p>Segmentation fault (core dumped)</p>
<p>&nbsp;</p>
<p>It works&nbsp;with n=416 or less.&nbsp;</p>
<p>The serial version of the code works correctly.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>What can I do?</p>]]></description>
	</item>

	<item>
		<title>open64_mp missing?</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=120396</link> 
		<pubDate>2009-10-15T18:14:00 -05.00</pubDate> 
		<dc:creator>jacob_liberman</dc:creator>
   	    <slash:comments>4</slash:comments> 
		<description><![CDATA[ <p>Is there an mp version of ACML 4.3.0 for Open64?</p>
<p>(Similar to gfortran64_mp)</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>ACML-GPU on single-precision only GPUs</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=120007</link> 
		<pubDate>2009-10-08T10:43:55 -05.00</pubDate> 
		<dc:creator>black_jack</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>Hi,<br />my question is very simple: is it possible to use the GPU acceleration for ACML on single-precision only GPUs?<br />I ask this because I tried to run the sgemm C example on my notebook, equipped with a Mobile Radeon HD 4570, and I got the following error:</p>
<p>ACML example: SGEMM call<br />--------------------------------------------------------------<br />[...]<br />calclCompile returned ERROR (Operational error)<br />No error<br />ERROR: ILScanILBinary: Unsupported opcode for architecture<br />ERROR: failed to compile compute kernels for DGEMM<br />ERROR: Failed to initialize GPU(s) for SGEMM<br /><br />Warning: libCALBLAS is not reentrant!<br />Multi-threaded use of this library is not supported.<br />When GPU-accellerated ACML routines are called on multiple threads<br />concurrently, the requests will be executed serially, even if multiple<br />GPUs are present.<br />[...]</p>
<p>I recompiled the source code myself with Windows SDK v.7, and the result is the same. This is very strange, because I have used the Stream SDK (v1.4) with no problems...<br />My OS is Vista x64 (SP2), and the ATI drivers are the latest supported by Dell (ver.8.634.3).</p>
<p>Thanks,<br />Giacomo</p>]]></description>
	</item>

	<item>
		<title>How to use ACML with different versions of GCC/GFORTRAN</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=119824</link> 
		<pubDate>2009-10-05T01:33:37 -05.00</pubDate> 
		<dc:creator>jkvelu</dc:creator>
   	    <slash:comments>1</slash:comments> 
		<description><![CDATA[ <p>
<p class="MsoNormal">How to use ACML with different versions of GCC/GFORTRAN</p>
<p class="MsoNormal">We regularly receive comments in our archive survey section about using ACML with different versions of GCC/GFORTRAN, so we have added an article on &ldquo;How to use ACML with different versions of GCC/GFORTRAN&rdquo; here: <a href="http://developer.amd.com/documentation/articles/pages/ACMLwithDifferentGCCGFORTRAN.aspx">http://developer.amd.com/documentation/articles/pages/ACMLwithDifferentGCCGFORTRAN.aspx</a></p>
<p class="MsoNormal">&nbsp;</p>
<p class="MsoNormal">We hope this article is useful. Please post your comments or share your experience here.</p>
<p class="MsoNormal">&nbsp;</p>
</p>]]></description>
	</item>

	<item>
		<title>The problem of FFT performance of ACML-GPU</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=119298</link> 
		<pubDate>2009-09-24T00:48:16 -05.00</pubDate> 
		<dc:creator>scutan</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>Hi, all.</p>
<p>I have tested FFT performance using the ACML-GPU library, and I found that it is very poor on my ATI card.</p>
<p>For instance, an FFT on 2M complex float values takes about 140ms. I have also tested FFT performance on an nVidia card, where the same size takes only 39ms. Also, my nVidia card is much cheaper than the ATI card.</p>
<p>So, is it because the FFT&nbsp; of ACML-GPU is not accelerated?</p>
<p>Thanks. &nbsp;</p>]]></description>
	</item>

	<item>
		<title>ACML-GPU status</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=118710</link> 
		<pubDate>2009-09-11T05:22:04 -05.00</pubDate> 
		<dc:creator>w.miah@rl.ac.uk</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>Does anyone know the status of ACML subroutines implemented for the GPU? For example, are all the BLAS (level 1, 2 and 3) subroutines implemented for the FireStream GPUs?</p>
<p>I have code that calls BLAS subroutines which I want executed on the GPU. Will it be as simple as re-linking the object file with the ACML-GPU libraries?</p>
<p>Thanks in advance.</p>]]></description>
	</item>

	<item>
		<title>No GPU found</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=118081</link> 
		<pubDate>2009-08-27T11:48:55 -05.00</pubDate> 
		<dc:creator>gsteri1</dc:creator>
   	    <slash:comments>4</slash:comments> 
		<description><![CDATA[ <p>Hi,</p>
<p>&nbsp;</p>
<p>I apologize if I am missing something obvious. I installed a new Radeon HD 4850 based card into an AMD Linux box running OpenSUSE 11.1. The card seems to work. fgl_fglxgears works fine. When I attempt to use the examples in acmlg1 (specifically info) I get:</p>
<p>&nbsp;</p>
<p>CAL RT version: 1.3.186<br />CAL CL version: 1.3.186<br /><br />GPUs found: 0</p>
<p>&nbsp;</p>
<p>Is the 4850 card not supported, or have I made a configuration mistake?</p>
<p>&nbsp;</p>
<p>Many thanks,</p>
<p>&nbsp;</p>
<p>-Greg</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>Small Lapack &amp; ACML (with GPU) Performance Test needed</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=117584</link> 
		<pubDate>2009-08-17T12:42:09 -05.00</pubDate> 
		<dc:creator>ilghiz</dc:creator>
   	    <slash:comments>5</slash:comments> 
		<description><![CDATA[ <p>Hi,</p>
<p>Our company is going to upgrade its hardware for internal use and we are deciding what to buy. Previously, for pure CPU work, AMD platforms were cheaper in terms of achieved GFlop/s per dollar. We now do some development on NVIDIA cards, and an NVIDIA GTX 260 plus 4-core AMD Opterons works out cheaper for us than 4-core AMD Opterons alone. However, we have not measured an HD4870 plus 4/6-core Opterons, and we have no way to test GPU-enabled ACML ourselves.</p>
<p>Could anybody help us by running a small test on a GPU such as the 4870?</p>
<p>It is pure C: http://www.elegant-mathematics.com/software/em-acml-gpu-test.c</p>
<p>Please compile it with the LAPACK and ACML libraries under Linux. It takes several minutes to run and produces one page of output.</p>
<p>It would be very kind if you could run the same test with and without GPU acceleration and send me both results together with your CPU and GPU configuration.</p>
<p>Thank you</p>
<p>Ilgis</p>
<p>--</p>
<p>Dr. Ilgis Ibragimov, VP</p>
<p>Elegant Mathematics Ltd.</p>]]></description>
	</item>

	<item>
		<title>When will a GPU-accelerated version of FFT be available?</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=117335</link> 
		<pubDate>2009-08-12T06:11:10 -05.00</pubDate> 
		<dc:creator>Raistmer</dc:creator>
   	    <slash:comments>5</slash:comments> 
		<description><![CDATA[ AFAIK, ACML lacks GPU-accelerated FFT routines.<br />When will they be added?<br />]]></description>
	</item>

	<item>
		<title>Sparse Matrices</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=117114</link> 
		<pubDate>2009-08-07T11:00:13 -05.00</pubDate> 
		<dc:creator>donpellegrino</dc:creator>
   	    <slash:comments>1</slash:comments> 
		<description><![CDATA[ <p>Given an AMD 64-bit Linux environment, what is the best way to work with sparse matrices?&nbsp; I hunted through the AMD Core Math Library but I didn't see support for sparse matrices.&nbsp; I found the documentation for "3 BLAS: Basic Linear Algebra Subprograms" at http://developer.amd.com/cpu/Libraries/acml/onlinehelp/Documents/BLAS.html#index-sparse-BLAS-47 which reads:</p>
<p>"ACML also includes interfaces to the extensions to Level 1 BLAS known as the sparse BLAS. These routines perform operations on a sparse vector x which is stored in compressed form and a vector y in full storage form.  See reference [4] for more information."</p>
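<p>To make the quoted description concrete, here is a minimal sketch of one such operation, a dot product between a sparse vector x in compressed form and a vector y in full storage form. It is an illustration only: <code>sparse_dot</code> is a hypothetical name, not an ACML routine, and it assumes 0-based indices as is usual in C (the Fortran sparse BLAS use 1-based indices).</p>
<pre>
```c
/* Hypothetical helper, not part of ACML.
 * x    : the nz stored (nonzero) values of the sparse vector
 * indx : position of each stored value in the full vector (0-based here)
 * y    : vector in full storage form
 * Returns the sum over k of x[k] * y[indx[k]]. */
double sparse_dot(int nz, const double *x, const int *indx, const double *y)
{
    double sum = 0.0;
    int k;
    for (k = 0; k != nz; ++k)
        sum += x[k] * y[indx[k]];
    return sum;
}
```
</pre>
<p>Only the stored entries are visited, so the cost depends on the number of nonzeros rather than on the full vector length.</p>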
<p>So it would seem that ACML supports sparse vectors but not sparse matrices.&nbsp; If this is the case can someone recommend an alternative library?</p>]]></description>
	</item>

	<item>
		<title>Problem with linking acml4.3.0\ifort32\lib\libacml.lib</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=116971</link> 
		<pubDate>2009-08-04T15:27:03 -05.00</pubDate> 
		<dc:creator>kcvogelsang</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>Hi,</p>
<p>I'm trying to build a DLL that links to acml4.3.0\ifort32\lib\libacml.lib.<br />It all compiles fine without warnings, but I get this linker error:</p>
<p>error LNK2019: unresolved external symbol _cfft1d@20 referenced in function _Stft50InverseFftInit@16</p>
<p>The file with Stft50InverseFFTInit in it is a plain .c file and was linked into a static .lib. This .lib and libacml.lib are then linked into the final DLL.</p>
<p>Any ideas?</p>
<p>Thanks,<br />-Carlo</p>]]></description>
	</item>

	<item>
		<title>Problem with SuperLU on windows 32bit</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=116563</link> 
		<pubDate>2009-07-26T01:56:38 -05.00</pubDate> 
		<dc:creator>byteme</dc:creator>
   	    <slash:comments>4</slash:comments> 
		<description><![CDATA[ <p>I am getting this:</p>
<p>&nbsp;** ACML error: on entry to ZTRSV&nbsp; parameter number&nbsp; 1 had an illegal value</p>
<p>using SuperLU with the Intel Fortran 32-bit ACML library; however, all the other functions seem to work.</p>
<p>Parameter 1 is called using the form:</p>
<p>"U"</p>
<p>which is a char *.&nbsp; What is the proper calling convention for the Fortran interface when passing these single-character arguments?&nbsp; I am using Visual C++ 2008 Express.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>Can someone please tell me how to use ACML</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=116416</link> 
		<pubDate>2009-07-23T03:28:10 -05.00</pubDate> 
		<dc:creator>prako</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>Hi All,</p>
<p>I want to use ACML to try to optimise a serial application. I use Visual Studio 2005 on Windows XP with a 4-core AMD Opteron CPU. Please tell me how to link ACML with my application once I have downloaded it.</p>]]></description>
	</item>

	<item>
		<title>Sunperf and ACML</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=115698</link> 
		<pubDate>2009-07-06T14:40:17 -05.00</pubDate> 
		<dc:creator>mbaran</dc:creator>
   	    <slash:comments>1</slash:comments> 
		<description><![CDATA[ <p>Can ACML and the Sun Performance Library be used together?</p>
<p>&nbsp;</p>
<p>When I do a</p>
<p>#include &lt;sunperf.h&gt;</p>
<p>#include &lt;acml.h&gt;</p>
<p>&nbsp;</p>
<p>I get TONS of errors</p>]]></description>
	</item>

	<item>
		<title>Portability - BLAS, LAPACK</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=115562</link> 
		<pubDate>2009-07-03T20:52:38 -05.00</pubDate> 
		<dc:creator>mbaran</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>All,</p>
<p>I am a newb here, so please bear with me.</p>
<p>I have a project to create some DSP code in C. However, this code needs to be portable. Our computing platform has a bunch of AMD cores, so I would like to use ACML.</p>
<p>Is there a way to write C code that calls BLAS and LAPACK so that things are portable? If, years from now, someone decides to buy different CPUs, I don't want to write things again.</p>
<p>From my understanding, there are three ways to call BLAS and LAPACK:</p>
<ul>
<li>direct FORTRAN calls</li>
<li>C interface routines (CBLAS, CLAPACK)</li>
<li>ACML wrappers</li>
</ul>
<p>The direct FORTRAN calls are difficult. However, they definitely will be portable with any vendor that implements BLAS or LAPACK.</p>
<p>The vendor wrappers are nice and easy to understand. However, you cannot directly take code written for, say, the Sun Performance Library and run it on another box that implements BLAS or LAPACK.</p>
<p>CBLAS and CLAPACK are something in between: they are easier to call from C.</p>
<p>&nbsp;</p>
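<p>A minimal sketch of the first option (a direct FORTRAN-style call hidden behind one small C function), to make the comparison concrete. It assumes the common convention that the Fortran routine is visible from C as the lowercase name with a trailing underscore and that every argument is passed by pointer; this varies by compiler and platform, so it is an assumption to check, not a rule. <code>portable_daxpy</code> is a hypothetical wrapper name, and the <code>daxpy_</code> body below is only a stand-in so that the sketch compiles without any library.</p>
<pre>
```c
/* Stand-in for the vendor's Fortran DAXPY (y = scale*x + y), positive
 * increments only. When linking against a real BLAS, delete this body and
 * keep just the declaration; the symbol name is an assumption (see above). */
void daxpy_(const int *n, const double *scale, const double *x,
            const int *incx, double *y, const int *incy)
{
    int i;
    for (i = 0; i != *n; ++i)
        y[i * *incy] += *scale * x[i * *incx];
}

/* Hypothetical wrapper: the only place that knows about the Fortran
 * calling convention. The rest of the code calls this plain C function. */
void portable_daxpy(int n, double scale, const double *x, double *y)
{
    const int one = 1;
    daxpy_(&n, &scale, x, &one, y, &one);
}
```
</pre>
<p>With this layout, moving to another vendor's library should only mean revisiting the wrapper file, not every call site.</p>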
<p>Can anyone speak on this? Anything would be great. Thank you!</p>]]></description>
	</item>

	<item>
		<title>Using ACML</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=115498</link> 
		<pubDate>2009-07-02T07:27:12 -05.00</pubDate> 
		<dc:creator>prako</dc:creator>
   	    <slash:comments>1</slash:comments> 
		<description><![CDATA[ <p>Hi All,<br />I'm new to ACML. My application is legacy single-threaded VC++ code (MSVS2005), a good part of which involves linear algebra, FFTs, etc. I would like to know how I should start using ACML in my application to increase its performance.</p>]]></description>
	</item>

	<item>
		<title>ACML access with MSVS</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=115497</link> 
		<pubDate>2009-07-02T05:58:43 -05.00</pubDate> 
		<dc:creator>prako</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>Hi,<br />I'm using MSVS 2005. How can I link ACML with my projects in VS2005?</p>]]></description>
	</item>

	<item>
		<title>Subcontracting for AMD/ATI (copy from General discussion forum)</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=115476</link> 
		<pubDate>2009-07-01T17:36:15 -05.00</pubDate> 
		<dc:creator>ilghiz</dc:creator>
   	    <slash:comments>3</slash:comments> 
		<description><![CDATA[ <div class="MessageText_Container">
<p>Hi,</p>
<p>I would like advice on whom I can forward my question to. Our company has developed a lot of numerical software, including sparse/dense/compressed linear system solvers, CFD, wave propagation and many others. Last year we ported a lot of our software to NVIDIA GPUs (I can provide 4 reference links from the NVIDIA corporate site to our site).</p>
<p>We can use our 17 years of software development experience to considerably improve ACML so that it competes with MKL and beats most CUDA-enabled BLAS/LAPACK and other scientific libraries.</p>
<p>It would be very kind if somebody can forward this message or my contacts to some executives in AMD for discussion.</p>
<p>Sincerely,</p>
<p>Ilgis Ibragimov</p>
<p>--</p>
<p>Dr. Ilgis Ibragimov</p>
<p>Vice-President</p>
<p>Elegant Mathematics Ltd.</p>
<p>+49-163-7414473</p>
</div>]]></description>
	</item>

	<item>
		<title>dynamic versus static linking</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=115242</link> 
		<pubDate>2009-06-25T18:33:15 -05.00</pubDate> 
		<dc:creator>byteme</dc:creator>
   	    <slash:comments>1</slash:comments> 
		<description><![CDATA[ <p>Are statically linked archives faster than shared libraries on AMD 64?&nbsp; Is the performance difference significant for the ACML?</p>]]></description>
	</item>

	<item>
		<title>(32bit static) Windows builds with a free Fortran compiler&amp;co.</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=115184</link> 
		<pubDate>2009-06-24T07:43:33 -05.00</pubDate> 
		<dc:creator>domagoj.saric</dc:creator>
   	    <slash:comments>5</slash:comments> 
		<description><![CDATA[ <p>1.</p>
<p>Looking at the archive I found that version 3.6.0 of the ACML is the last one that provides a 32bit static .lib build for the Win32 platform with a free (GNU) Fortran compiler. My question is why was this practice discontinued?</p>
<p>It seems rather pointless to provide and distribute a free library that forces you to use a 3rd party commercial compiler (and therefore brings no benefit to you/AMD/the library developer) only to be able to link statically with the library (because the ACML library needs functions from the Fortran compiler's runtime lib)...</p>
<p>Is there any chance that 32bit and 64bit builds with free tools will be included in the official releases again? <img src="i/expressions/face-icon-small-wink.gif" border="0"></p>
<p>&nbsp;</p>
<p>2.</p>
<p>While here I'd also like to request that builds that link both statically and dynamically to the CRT be available (otherwise you force users to a specific linkage strategy that might not suit their needs).</p>
<p>&nbsp;</p>
<p>3.</p>
<p>It would also be welcome if you could provide (Windows) static libs that are built with "link time code generation" (the /GL and /LTCG switches in MSVC) so as to provide more efficient (most notably in terms of final binary size) final binaries.</p>
<p>Connected to the "space efficiency" issue is the usage of the CRT: looking at the distributed libs one can see that they mention (for example) many functions from the printf family. I suppose that those are from debugging code that was not properly removed from release builds and are probably completely useless/pointless in a math library (used in a GUI environment). If I am correct in this assumption then it would also be welcomed if such "useless code" be removed from release builds <img src="i/expressions/face-icon-small-wink.gif" border="0"></p>]]></description>
	</item>

	<item>
		<title>GNU OpenMP Linking issues</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=115135</link> 
		<pubDate>2009-06-23T13:53:19 -05.00</pubDate> 
		<dc:creator>idg101</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>I am using acml 4.2.0 and gcc 4.4 and when I try to compile, I get the following linking errors:</p>
<p>&nbsp;</p>
<p>12:34pm|sf1&gt; make -f Makefile.integration_test.openmp<br />g++ -I/mnt/ddn3/Geomatics/lib/acml4.2.0/gfortran64_mp/include&nbsp; -L/mnt/ddn3/Geomatics/lib/gcc-4.4/lib64/ -L/mnt/ddn3/Geomatics/lib/acml4.2.0/gfortran64_mp/lib -lgfortran -lm -lpng -lstdc++ -lrt -lacml_mp integration_test.o CUls.o CNnls.o CUtils.o CMatrix.o CMathFunctions.o CStopwatch.o CDataBlock_Float32.o CLogger.o CPng.o&nbsp; -o integration_test</p>
<p><br />/home/isaacg/opt/gcc-4.4/lib64/libgomp.so.1: undefined reference to `pthread_setaffinity_np@GLIBC_2.3.4'<br />/home/isaacg/opt/gcc-4.4/lib64/libgomp.so.1: undefined reference to `pthread_attr_setaffinity_np@GLIBC_2.3.4'<br />/home/isaacg/opt/gcc-4.4/lib64/libgomp.so.1: undefined reference to `pthread_getaffinity_np@GLIBC_2.3.4'<br />/home/isaacg/opt/gcc-4.4/lib64/libgomp.so.1: undefined reference to `__sched_cpucount@GLIBC_2.6'<br />collect2: ld returned 1 exit status<br />make: *** [integration_test] Error 1</p>
<p>Please help!</p>
<p>&nbsp;</p>
<p>Thanks in advance,</p>
<p>Isaac</p>]]></description>
	</item>

	<item>
		<title>Linking trouble with VS2008 &amp; Ifort v11</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=114425</link> 
		<pubDate>2009-06-08T11:31:11 -05.00</pubDate> 
		<dc:creator>_Sigma</dc:creator>
   	    <slash:comments>3</slash:comments> 
		<description><![CDATA[ <p>I'm trying to use the fastsinf and fastexpf routines of ACML in some existing fortran code that I am compiling under Windows XP with Visual Studio 2008 and Intel fortran v11.0 build 20080930.</p>
<p>My steps:</p>
<ol>
<li>Download the ifort release of ACML</li>
<li>Replace an exp call with fastexpf</li>
<li>Go to project properties -&gt; linker -&gt; input -&gt; additional dependencies and add c:\AMD\acml4.2.0\ifort32_mp\lib\libacml_mp_dll.lib, which is where the library is located.</li>
</ol>
<p>However, upon build, it says that there is an unresolved external symbol _FASTEXPF in my function.</p>
<p>I am primarily a C++ coder, so I am familiar with linking problems in that language. However, I'm a bit new to Fortran, so perhaps I am missing something simple. I would appreciate any insight into this.</p>
<p>//edit</p>
<p>I should also note that I have tried all the variants, _mp and otherwise, static and shared. All result in the same error. I am also compiling with /Qopenmp, so that shouldn't be the problem.</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>Plato + ACML 4.2.0 32bit for Windows</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=114096</link> 
		<pubDate>2009-06-01T04:46:02 -05.00</pubDate> 
		<dc:creator>weikoon</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>I downloaded Intel Fortran ACML 4.2.0 32bit for Windows to use with Salford Plato 3.</p>
<p>I link to the library (static/dynamic) via the Reference in the project tree. After compilation, I get the message:</p>
<p>WARNING the following symbols are missing:<br />_fltused&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; C:\AMD\acml4.2.0\ifort32\lib\libacml.lib (dgetri.obj/&nbsp;&nbsp;&nbsp; )<br />_alloca_probe&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; C:\AMD\acml4.2.0\ifort32\lib\libacml.lib (dtrtri.obj/&nbsp;&nbsp;&nbsp; )<br />_chkstk&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; C:\AMD\acml4.2.0\ifort32\lib\libacml.lib (dgemv.obj/&nbsp;&nbsp;&nbsp;&nbsp; )<br />__intel_f2int&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; C:\AMD\acml4.2.0\ifort32\lib\libacml.lib (/12742&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; )<br />.....</p>
<p>What's wrong? My questions are:</p>
<p>Do I need to compile ACML? How?</p>
<p>What's the header file in the \ifort32\INCLUDE\ directory for?</p>
<p>I would be really glad if anyone can advise.</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>Fast Fourier Transform in ACML 4.2.0</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=112946</link> 
		<pubDate>2009-05-06T12:22:44 -05.00</pubDate> 
		<dc:creator>maxbelkin</dc:creator>
   	    <slash:comments>6</slash:comments> 
		<description><![CDATA[ <p>Hi,</p>
<p>I've got a small problem with <strong>ZFFT2DX</strong> in <strong>ACML 4.2.0</strong>.</p>
<p>I wrote a sample program. All it does is take a <strong>2D</strong> array of <strong>real</strong> data, apply the forward Fourier transform, multiply the result by <span style="text-decoration: underline;"><strong>1/scale</strong></span> and then apply the backward Fourier transform. In the ideal case I would get the original array back. But in the real world I get tiny imaginary parts, which are close to 0. Is there any way to get rid of these tiny non-zero elements in the final result?</p>
<p>The reason I'm asking is that in my basic program I combine two real quantities into one complex number and perform the procedure described above (in order to take one FFT instead of two). Currently this means that one quantity influences the other.</p>
<p>I use <strong><em><span style="text-decoration: underline;">ACML 4.2.0</span></em></strong> in conjunction with <span style="text-decoration: underline;"><em><strong>ifort</strong></em></span> (latest version) OR <strong><em><span style="text-decoration: underline;">gfortran</span></em></strong>. Same results for both compilers.</p>
<p><strong>Compiling lines:</strong></p>
<p><em>gfortran -m64 -static acml_fft_check.f -L/opt/acml4.2.0/gfortran64/lib -lacml</em></p>
<p><em>ifort -O3 -static acml_fft_check.f -L/opt/acml4.2.0/ifort64/lib/ -lacml</em></p>
<p>&nbsp;</p>
<p><strong>The code of sample program:</strong></p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; program compare_ffts<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; parameter (N=5)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; implicit real*8 (a-h,o-z)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; complex*16 COMM(N*N+6*N+200), m1(N,N)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; complex*16 CMPT2(N,N), diff(N,N), CHECK(N,N)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; real*8 cor1, sc, scale<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; logical ltrans, inpl<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; CMPT2=dcmplx(0.0d0,0.0d0)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; sc=1.0d0<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ltrans=.false.<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; inpl=.false.<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; call zfft2dx(100,sc,ltrans,inpl,N,N,CMPT1,1,N,m1,1,N,COMM,INFO)<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do j=1,N<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do i=1,N<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; CMPT2(i,j)=((dble(j)-1.0d0)*dble(N)+dble(i))*1.0d0<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; CHECK(i,j)=CMPT2(i,j)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end do<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end do</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; sc=1.0d0<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ltrans=.true.<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; inpl=.true.<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; CALL zfft2dx(-1,sc,ltrans,inpl,N,N,CMPT2,1,N,m1,1,N,COMM,INFO)</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; scale=N**2</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do j=1,N<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do i=1,N<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; CMPT2(i,j)=CMPT2(i,j)/scale<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end do<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end do<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; CALL zfft2dx(1,sc,ltrans,inpl,N,N,CMPT2,1,N,m1,1,N,COMM,INFO)<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; write(*,*) 'CMPT_ACML AFTER'<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; diff=CMPT2-CHECK<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; write(*,*) 'mean_real= ', sum(dble(diff))/scale<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; write(*,*) 'mean_imag= ', sum(dimag(diff))/scale<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end</p>
<p>&nbsp;</p>
<p>I should also mention that there is no problem if<strong> N&lt;=4.</strong></p>
<p>Also, these non-zero imaginary values are smaller if <strong>N = 4, 16, 64, 256.</strong> However, a further increase to <strong>N=512, 1024</strong> doesn't lead to a decrease in the error.</p>
<p>In my basic program I use <strong>N=512</strong>.</p>
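<p>A minimal sketch of the same round trip, under the assumption that the leftover imaginary parts are ordinary floating-point round-off rather than anything specific to ACML. It does not call ZFFT2DX: the helper dft2 is a hypothetical, naive pure-Python DFT used only for illustration, applied to the same 5x5 real data as the Fortran sample. If that assumption holds, the spurious imaginary parts should be many orders of magnitude smaller than the data, and for purely real input they can simply be dropped by keeping the real part.</p>
<pre>
```python
import cmath

def dft2(x, sign):
    """Naive unnormalised 2D DFT of a square list of lists.

    sign=-1 gives the forward transform, sign=+1 the backward one.
    Hypothetical helper for illustration only; this is not the ACML routine.
    """
    n = len(x)
    # Twiddle factors w[k] = exp(sign * 2*pi*i * k / n)
    w = [cmath.exp(sign * 2j * cmath.pi * k / n) for k in range(n)]
    # Transform along the second index, then along the first.
    rows = [[sum(x[i][j] * w[(j * q) % n] for j in range(n)) for q in range(n)]
            for i in range(n)]
    return [[sum(rows[i][q] * w[(i * p) % n] for i in range(n)) for q in range(n)]
            for p in range(n)]

n = 5
# Same test data as the Fortran sample: real values 1..n*n, zero imaginary part.
a = [[complex(j * n + i + 1, 0.0) for j in range(n)] for i in range(n)]

fwd = dft2(a, -1)                                    # forward transform
scaled = [[v / (n * n) for v in row] for row in fwd]  # divide by scale = N**2
back = dft2(scaled, +1)                              # backward transform

# Size of the round-off residue left after the round trip.
max_imag = max(abs(v.imag) for row in back for v in row)
max_real_err = max(abs(back[i][j].real - a[i][j].real)
                   for i in range(n) for j in range(n))
print("largest spurious imaginary part:", max_imag)
print("largest error in the real part: ", max_real_err)

# For purely real input, the residue can be discarded by keeping the real part.
cleaned = [[v.real for v in row] for row in back]
```
</pre>
<p>When two real arrays are packed into one complex array, the same round-off would show up in both halves, so it cannot be removed entirely, only compared against the accuracy the application needs.</p>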
<p>&nbsp;</p>
<p><strong>GCC/GFORTRAN:</strong> version 4.3.2</p>
<p><strong>IFORT:</strong> Version 11.0</p>]]></description>
	</item>

	<item>
		<title>ACML 4.0.0 on Intel Xeon 5420</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=112528</link> 
		<pubDate>2009-04-27T10:04:53 -05.00</pubDate> 
		<dc:creator>3on</dc:creator>
   	    <slash:comments>1</slash:comments> 
		<description><![CDATA[ <p>Hi,</p>
<p>I tried to benchmark the DGEMM &amp; SGEMM operations of the 32-bit ACML OpenMP library compiled by a Fortran compiler on a 2x 4-core Intel Xeon 5420 with 32-bit Windows XP SP3.</p>
<p>Version 4.2.0 and 4.1.0 did not run at all.</p>
<p>Version 4.0.0 ran surprisingly faster than Intel MKL 10.1.0.018 on larger matrices (256x256 for SGEMM and 2048x2048 for DGEMM). ACML's SGEMM operation on 4096x4096 matrices was nearly 5 times faster than MKL with 8 threads!</p>
<p>I experimented with setting different numbers of threads by calling the omp_set_num_threads function before calling the DGEMM or SGEMM routine, but could not get any difference in the performance of ACML. There was no change with the OMP_NUM_THREADS environment variable either. MKL was always affected when changing the number of threads, so my code should be OK.</p>
<p>Any idea how to set the number of threads for ACML 4.0.0 on Intel Xeon 5420?</p>]]></description>
	</item>

	<item>
		<title>GCC 4.1.2 or G++</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=112524</link> 
		<pubDate>2009-04-27T08:11:21 -05.00</pubDate> 
		<dc:creator>kty027</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>Hi</p>
<p>I'm Mr. Kim.</p>
<p>I am looking for some information.</p>
<p>We have an Intel (CPU) system and an AMD (CPU) system.</p>
<p>We will compile a program using GCC 4.1.2 (from AMD) on the AMD system.</p>
<p>So I wonder: will the binary (compiled with GCC 4.1.2 from AMD) run on both systems (Intel and AMD)?</p>
<p>We want to use the same binary, without recompiling on the Intel system.</p>]]></description>
	</item>

	<item>
		<title>MSVCR90.DLL needed?</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=112393</link> 
		<pubDate>2009-04-24T11:25:51 -05.00</pubDate> 
		<dc:creator>Xezlec</dc:creator>
   	    <slash:comments>5</slash:comments> 
		<description><![CDATA[ <p>It looks like ACML 4.1.0 does not come with the MSVCR90.DLL redistributable, which it requires.&nbsp; Am I correct in that observation?</p>
<p>I think that would technically be a bug in the distribution, because this DLL, though redistributable, is not publicly available, and I don't own a copy (because I do not happen to have Visual Studio 2008).</p>
<p>&nbsp;</p>]]></description>
	</item>

	<item>
		<title>ACML 4.1.0 vs MKL 10.q opt2356 E5530 DGEMV()</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=112043</link> 
		<pubDate>2009-04-16T13:10:32 -05.00</pubDate> 
		<dc:creator>brockp</dc:creator>
   	    <slash:comments>4</slash:comments> 
		<description><![CDATA[ <p>I am seeing some interesting behavior of the new Intel chips vs. Barcelonas, and of MKL vs. ACML, for the DGEMV kernel.</p>
<p>&nbsp;</p>
<p>Problem size is 1010, OpenMP with 8 cores</p>
<p>dgemv()</p>
<table>
<tr><th>CPU</th><th>MFlop/s</th><th>BLAS Lib</th></tr>
<tr><td>opt2356</td><td>838</td><td>ACML 4.1</td></tr>
<tr><td>E5530</td><td>4435</td><td>ACML 4.1</td></tr>
<tr><td>opt2356</td><td>858</td><td>MKL</td></tr>
<tr><td>E5530</td><td>3743</td><td>MKL</td></tr>
</table>
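<p>To put the dgemv() figures listed above side by side, here is a small sketch that computes the E5530-to-opt2356 ratio for each library. The dictionary layout and variable names are only illustrative; the numbers are the four MFlop/s values quoted above.</p>
<pre>
```python
# MFlop/s figures copied from the dgemv() results above: (CPU, BLAS library) -> rate
dgemv_mflops = {
    ("opt2356", "ACML 4.1"): 838,
    ("E5530", "ACML 4.1"): 4435,
    ("opt2356", "MKL"): 858,
    ("E5530", "MKL"): 3743,
}

# Ratio of the E5530 rate to the opt2356 rate, per library.
speedup = {}
for lib in ("ACML 4.1", "MKL"):
    speedup[lib] = dgemv_mflops[("E5530", lib)] / dgemv_mflops[("opt2356", lib)]
    print(f"{lib}: E5530 runs dgemv at {speedup[lib]:.2f}x the opt2356 rate")
```
</pre>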
<p>&nbsp;</p>
<p>The strange thing is that the DGEMM() kernel and DDOT() run at about the same speed on both systems, with both BLAS libraries.&nbsp; ACML has issues with dgemm() on the Intel and MKL has issues with dgemm() on the AMD, no surprise.</p>
<p>&nbsp;</p>
<p>I expected the triple-channel memory bandwidth of the Intel to show a 50% improvement in ddot() and similar kernels, but I am not seeing one.</p>
<p>&nbsp;</p>
<p>I do like the improved DGEMV() performance of the new Intel platform, and I wish I had tested it on a Shanghai. I also like how ACML is getting the same performance bumps in DGEMV() as MKL. Portability is nice, I must say.</p>
<p>&nbsp;</p>
<p>Any comments would be appreciated.</p>
&nbsp;]]></description>
	</item>

	<item>
		<title>QR Factorization</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=111915</link> 
		<pubDate>2009-04-14T04:24:16 -05.00</pubDate> 
		<dc:creator>Nir</dc:creator>
   	    <slash:comments>5</slash:comments> 
		<description><![CDATA[ <p>Hi,</p>
<p>I'm trying to use the ACML LAPACK/BLAS routines. Specifically, I use the Fortran interfaces from C (i.e. function names ending with underscores).</p>
<p>The problem I face is that the ACML function signatures differ from the LAPACK/BLAS standard. For instance, in LAPACK dormqr takes 13 parameters, while the ACML version requires 15 parameters.</p>
<p>The first problem is that I now have to change my code specifically for ACML. The second problem is that there's no documentation for the additional parameters.</p>
<p>Where can I find documentation for the additional parameters? Is there a header file that complies with LAPACK/BLAS?</p>
<p>Thanks!</p>]]></description>
	</item>

	<item>
		<title>ACML on Windows with GFortran</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=111048</link> 
		<pubDate>2009-03-25T18:42:37 -05.00</pubDate> 
		<dc:creator>dgreisen</dc:creator>
   	    <slash:comments>4</slash:comments> 
		<description><![CDATA[ <p>I'm running 32-bit Windows and using GFortran with MinGW. I'm wondering which version of ACML I should download. Should I use one of the GFortran builds for Linux, or the PGI or Intel builds for Windows?</p>
<p>Also, if I'd like to be able to run the .exe on both 32-bit and 64-bit Windows systems, should I choose 32-bit or 64-bit ACML for the best performance?</p>
<p>Thank you,</p>
<p>Daniel</p>]]></description>
	</item>

	<item>
		<title>ACML 4.2.0 ifort 32 bit windows, how to link with the static library</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=110187</link> 
		<pubDate>2009-03-09T18:23:44 -05.00</pubDate> 
		<dc:creator>jarod4</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>I am trying to link with the static library (libacml.lib) in VS 2008. The linker reports a missing ifconsol.lib. The DLL version works fine.</p>
<p>Ifconsol.lib is a part of Intel Fortran, and I don't have it.</p>
<p>Can someone tell me how to solve this problem?&nbsp;</p>
<p>Thanks&nbsp;</p>]]></description>
	</item>

	<item>
		<title>SuperLU with ACML</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=109913</link> 
		<pubDate>2009-03-04T12:27:23 -05.00</pubDate> 
		<dc:creator>uokuyucu</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>Hi folks,</p>
<p>I'm trying to use ACML's BLAS library in SuperLU</p>
<p>(http://crd.lbl.gov/~xiaoye/SuperLU/)</p>
<p>and somehow I am not able to manage it. ACML's BLAS library is not recognized as a library by the SuperLU make process. Has anyone experienced such a thing?</p>]]></description>
	</item>

	<item>
		<title>acml4.2.0 and VC6</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=108632</link> 
		<pubDate>2009-02-11T21:35:49 -05.00</pubDate> 
		<dc:creator>generic</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>Version 4.2.0 for Windows does not work with VC6. Version 4.1.0 seems to work fine. Is this intended?</p>
<p>&nbsp;</p>
<p>Also, the documentation refers to VC example projects that are not included in the release.</p>]]></description>
	</item>

	<item>
		<title>ACML and numpy, revisited</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=107485</link> 
		<pubDate>2009-01-24T15:08:39 -05.00</pubDate> 
		<dc:creator>gideonsimpson</dc:creator>
   	    <slash:comments>2</slash:comments> 
		<description><![CDATA[ <p>I would like to configure my numpy to use the ACML's BLAS/LAPACK.&nbsp; However, as I gather from a post several months ago, there is a problem with this as the ACML does not include a CBLAS.&nbsp; I was wondering if anyone had successfully accomplished this goal, and what the easiest way to get it done was.</p>]]></description>
	</item>

	<item>
		<title>ACML bug (or undocumented feature?) : 32 bit versions for GFortran, PGI Fortran</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=106528</link> 
		<pubDate>2009-01-09T13:10:15 -05.00</pubDate> 
		<dc:creator>shamsundar@uh.edu</dc:creator>
   	    <slash:comments>3</slash:comments> 
		<description><![CDATA[ While debugging some code, I found that the ACML 32-bit libraries for GFortran (4.2.0, downloaded today) and PGI Fortran (3.6.0) for Linux contained code sequences such as<br /><br />f3 dd 06             	repz fldl (%esi)<br />f3 dd 07             	repz fldl (%edi)<br />f3 d8 c9             	repz fmul %st(1),%st<br /><br />in routines such as "dmmkern30x87_". <br /><br />Valgrind flags these instructions as invalid and raises a SIGILL exception. <br /><br />The AMD CPU instruction set descriptions state that "rep", "repz" and "repnz" prefixes apply only to string operations.<br /><br />Is there a gap in my understanding of this issue?<br /><br />I am running openSUSE 11.0-x64, on an HP PC with an Athlon-X2 CPU, 4G RAM.<br /><br />Thanks.<br /><br />N. Shamsundar<br />University of Houston]]></description>
	</item>

	<item>
		<title>ACML in C++</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=105830</link> 
		<pubDate>2008-12-29T13:31:29 -05.00</pubDate> 
		<dc:creator>quimey</dc:creator>
   	    <slash:comments>5</slash:comments> 
		<description><![CDATA[ <p>Hi:</p>
<p>How can I use ACML in C++? I am using Fedora 9 with gcc 4.3.0 and I want to use LAPACK. Do you know of a reference for its routines? There is something on AMD's page, but I don't understand it.</p>
<p>Thanks</p>]]></description>
	</item>

	<item>
		<title>linking problems with gcc version 4.2</title>
		<link>http://forums.amd.com/forum/messageview.cfm?catid=217&amp;threadid=105287</link> 
		<pubDate>2008-12-20T14:42:38 -05.00</pubDate> 
		<dc:creator>r.lopez.negrete</dc:creator>
   	    <slash:comments>4</slash:comments> 
		<description><![CDATA[ <p>Hi all,</p>
<p>I've been having some problems linking the library version libacml-4.2.0 to a third-party solver (Ipopt). I'm using Ubuntu 7.10 with gnu-gcc/gfortran version</p>
<p>$ gfortran-4.2 -v<br />Using built-in specs.<br />Target: x86_64-linux-gnu<br />Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu<br />Thread model: posix<br />gcc version 4.2.1 (Ubuntu 4.2.1-5ubuntu4)</p>
<p>I've been getting the following errors, and my question is whether I need gcc version 4.3 for this to work.</p>
<p>/usr/bin/ld: warning: libgfortran.so.3, needed by /opt/acml4.2.0/gfortran64/lib/libacml.so, not found (try using -rpath<br />&nbsp;or -rpath-link)<br />/opt/acml4.2.0/gfortran64/lib/libacml.so: undefined reference to `_gfortran_compare_string@GFORTRAN_1.0'<br />/opt/acml4.2.0/gfortran64/lib/libacml.so: undefined reference to `_gfortran_transfer_character@GFORTRAN_1.0'<br />/opt/acml4.2.0/gfortran64/lib/libacml.so: undefined reference to `_gfortran_transfer_integer@GFORTRAN_1.0'<br />/opt/acml4.2.0/gfortran64/lib/libacml.so: undefined reference to `_gfortran_stop_numeric@GFORTRAN_1.0'<br />/opt/acml4.2.0/gfortran64/lib/libacml.so: undefined reference to `_gfortran_st_write_done@GFORTRAN_1.0'<br />/opt/acml4.2.0/gfortran64/lib/libacml.so: undefined reference to `_gfortran_pow_i4_i4@GFORTRAN_1.0'<br />/opt/acml4.2.0/gfortran64/lib/libacml.so: undefined reference to `_gfortran_transfer_real@GFORTRAN_1.0'<br />/opt/acml4.2.0/gfortran64/lib/libacml.so: undefined reference to `_gfortran_st_read@GFORTRAN_1.0'<br />/opt/acml4.2.0/gfortran64/lib/libacml.so: undefined reference to `_gfortran_st_read_done@GFORTRAN_1.0'<br />/opt/acml4.2.0/gfortran64/lib/libacml.so: undefined reference to `_gfortran_st_write@GFORTRAN_1.0'<br />/opt/acml4.2.0/gfortran64/lib/libacml.so: undefined reference to `_gfortran_internal_pack@GFORTRAN_1.0'<br />/opt/acml4.2.0/gfortran64/lib/libacml.so: undefined reference to `_gfortran_concat_string@GFORTRAN_1.0'<br />/opt/acml4.2.0/gfortran64/lib/libacml.so: undefined reference to `_gfortran_internal_unpack@GFORTRAN_1.0'<br />/opt/acml4.2.0/gfortran64/lib/libacml.so: undefined reference to `_gfortran_runtime_error@GFORTRAN_1.0'<br />/opt/acml4.2.0/gfortran64/lib/libacml.so: undefined reference to `_gfortran_string_index@GFORTRAN_1.0'</p>
<p>Thanks!</p>
<p>Rodrigo</p>]]></description>
	</item>

</channel>
</rss>
