MottS Posted April 11, 2003

Hi all,

I recompiled my Gentoo system this week and I've learned quite a lot. First of all, I compiled my system with some additional options:

    CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer"

Last time I only used the i686 option, and that was it. Now my system is compiled from the base up to KDE with flags optimized for my hardware. -fomit-frame-pointer tells the compiler not to keep a frame pointer in a register for functions that don't need one; that frees up a register, but it makes debugging and backtraces after a crash much harder. I don't want to debug anything, so I put that flag there.

Here is a website listing the safe flags for pretty much every type of computer: http://www.freehackers.org/gentoo/gccflags.../flag_gcc3.html
And another one for the 'not so safe' flags.. lol (didn't try them tho): http://www.freehackers.org/gentoo/gccflags...ag_gcc3opt.html

Now, when the time came to compile kde-base, I used a 'myconf' variable to pass some options to the ./configure command. I used:

    --enable-final --disable-debug --enable-fast-malloc=full

About --enable-final, from the kde-freebsd mailing list:

[kde-freebsd] Re: --enable-final
Will Andrews <will@csociety.org>
Mon, 21 Jan 2002 12:51:51 -0500

On Mon, Jan 21, 2002 at 12:46:21PM -0500, Alan E wrote:
> I see configure says "build size optimized apps (experimental - needs lots of
> memory)".
>
> Assuming that space vs speed is a tradeoff, is there a good reason why we
> would want to optimize for space instead of speed?

This is a compile-time thing only, I don't believe there are much (if any) runtime benefits. The benefit from --enable-final is that it jams all the source files for a particular binary or library into one *.cpp and compiles that. It speeds up compile time, especially on machines with enough memory to swallow some of the larger libs/bins. It cut down kdelibs builds on puck by 50%, which is no small feat.
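
To picture what "jams all the source files into one *.cpp" means, here is a minimal sketch of the idea in shell. It is not how KDE's build system actually implements --enable-final; the helper name and the file names are made up for the illustration.

```shell
#!/usr/bin/env bash
# Sketch of the "one big translation unit" idea behind --enable-final.
# Assumption: make_final_source and the file names below are hypothetical;
# the real KDE build system generates its combined file on its own.

# make_final_source FILE...: print a source file that #includes every FILE
make_final_source() {
    local src
    for src in "$@"; do
        printf '#include "%s"\n' "$src"
    done
}

# The compiler would then be run once on the combined file instead of
# once per source file.
make_final_source part_one.cpp part_two.cpp part_three.cpp
```

Compiling the combined file means shared headers are parsed only once, which fits the shorter build times (and the bigger memory appetite) mentioned in the quoted message.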
After kde-base I decided to compile kdenetwork, but I didn't want all the stuff in that package, only KMail. So I used the DO_NOT_COMPILE variable to list the parts of the package that should not be compiled. It's really simple, and only KMail was compiled. Like this:

    export DO_NOT_COMPILE="kdict kit knewsticker knode kpf kppp ktalkd kxmlrpc"
    emerge kdenetwork

It took about 2 days to compile everything, but now my system flies .. KDE is quick and I LOVE IT! Just wanted to share .. :P

PS: .. I still like Mandrake

MottS

EDIT: I changed --enable-fast-malloc=fast to --enable-fast-malloc=full
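
If you would rather say "keep only this" than list everything to skip, something like the following sketch could build the DO_NOT_COMPILE value. The list of parts is an assumption (the eight names from the post plus kmail), not the authoritative content of kdenetwork, and the helper name is made up.

```shell
#!/usr/bin/env bash
# Sketch: derive DO_NOT_COMPILE from "everything except what I want".
# Assumption: ALL_PARTS is just the eight names from the post plus kmail;
# check the real subdirectories of the package before relying on it.
ALL_PARTS="kdict kit kmail knewsticker knode kpf kppp ktalkd kxmlrpc"

# exclude_all_but KEEP...: print every entry of ALL_PARTS that is not in KEEP
exclude_all_but() {
    local part keep skip out=""
    for part in $ALL_PARTS; do
        skip=no
        for keep in "$@"; do
            if [ "$part" = "$keep" ]; then skip=yes; fi
        done
        if [ "$skip" = no ]; then out="$out $part"; fi
    done
    echo "${out# }"   # drop the leading space
}

export DO_NOT_COMPILE="$(exclude_all_but kmail)"
echo "DO_NOT_COMPILE=$DO_NOT_COMPILE"
# emerge kdenetwork   # left commented out: only meaningful on a Gentoo box
```

With kmail as the only part to keep, this gives the same list as the export line in the post.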
kmack Posted April 11, 2003

Motts, thanks for sharing your experience! I am a L O N G way from being ready to try that, but it is interesting to know that it did seem to make a difference.. I have been wondering if it would be worth the effort. Your experience will prompt me to do some reading and preparing for when it is my time to try that level of customization! K3wl! 8) Congrats!
Ixthusdan Posted April 11, 2003

I was thinking about trying gentoo a while back. Do you think the athlon flags make a difference? For Mandrake too? I recently upgraded to an xp1800+, so now I am wondering if I should go to the specialized rpm's. Gentoo looks like fun!
aru Posted April 11, 2003

Just in case you don't know: you can optimize your Mandrake by recompiling your installed programs, but you can also do it 'a la gentoo', as johnnyv showed the other day in http://www.mandrakeusers.org/viewtopic.php...hp?t=4117#29382

The process basically involves the sequential (and double) recompilation of binutils, glibc and gcc, to optimize the compiler for your own architecture. Once that is done, everything you recompile afterwards is built with that optimized toolchain.
MottS Posted April 11, 2003

> Do you think the athlon flags make a difference?

I guess yes. Or maybe my system is fast because I only installed a couple of packages.. not the whole thing. I can't really measure the effect of -march=athlon-xp because I used other flags in addition to this one. To compile kde-base I also used some KDE-specific options like --enable-final and --enable-fast-malloc=fast. But as a whole, my system is faster and I can feel it.

I also like the DO_NOT_COMPILE option. I didn't need kppp (I use DSL) and I didn't care about ktalkd, kxmlrpc and so on, so being able to compile and install only specific parts of a KDE package is really nice. I guess that would be feasible with a .src.rpm, but I have no clue how to do that.

You also have to keep in mind that compiling a system 'a la gentoo' takes a really LONG time. It took me about 4 days to compile the thing. Well, I compiled the system a first time, but I was using some weird optimisation flags I found somewhere on the net and that didn't seem to work. So after 2 days (I go to work while the system is compiling, so I don't really care) I reformatted the partition and started compiling the system again, but with safer flags, and that worked :P I will keep my system this way for a while.. I'm tired now .. :lol:

MOttS
MottS Posted April 11, 2003

> The process basically involves the sequential (and double) recompilation of binutils, glibc and gcc, to optimize the compiler for your own architecture. Once that is achieved, all your recompilations will be 100% optimized.

You are right. This is called the bootstrap process when you install Gentoo. It is the first thing you have to do in the chrooted environment. From http://www.gentoo.org/doc/en/gentoo-x86-in....xml#doc_chap11 :

> bootstrap.sh will build binutils, gcc, gettext, and glibc, rebuilding binutils, gcc, and gettext after glibc. Needless to say, this process takes a while.
qnr Posted April 11, 2003

> I was thinking about trying gentoo a while back. Do you think the athlon flags make a difference? For Mandrake too? I recently upgraded to an xp1800+, so now I am wondering if I should go to the specialized rpm's. Gentoo looks like fun!

I'm using Athlon Thunderbird flags (Athlon 1200) here in Source Mage, and yes, they do make a big difference.
Ixthusdan Posted April 11, 2003

So, if I understand correctly, I need to compile my building environment for my box, so that everything after that will be optimized for my box. Do I get it? Obviously, you guys think this is best as opposed to some premade rpm's.
MottS Posted April 11, 2003

Exactly. But in addition, you have to use flags so that the binaries are built for your system. For example, compiling with CFLAGS="-march=athlon-xp" produces binaries that will only run on an Athlon XP or a Duron (1000 to 1300 MHz). The compiler is then free to use CPU instructions (SSE, for example) that only that kind of processor understands and that older ones like i586 machines do not. This tunes the binaries for your system, so they can run faster. However, if you swap your CPU for a Pentium or something else (downgrading from an Athlon XP to an Athlon Thunderbird, for instance), then I doubt that the system will even boot... it's a choice.

MOttS
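
Before picking a -march value it can help to check what the CPU actually supports. The sketch below is only a rough check: it assumes the GCC description of athlon-xp (MMX, 3DNow!, enhanced 3DNow! and SSE) maps to the /proc/cpuinfo flag names mmx, 3dnow, 3dnowext and sse, and the sample flags line is made up rather than taken from a real machine.

```shell
#!/usr/bin/env bash
# Sketch: check that a CPU flags line contains everything a given -march needs.
# Assumption: the four flag names checked below stand in for what
# -march=athlon-xp may use; treat this as a rough check only.

# has_cpu_flags "FLAGS LINE" FLAG...: succeed only if every FLAG is present
# as a whole word in the flags line
has_cpu_flags() {
    local have=" $1 "
    shift
    local want
    for want in "$@"; do
        case "$have" in
            *" $want "*) ;;                         # found, keep checking
            *) echo "missing: $want"; return 1 ;;
        esac
    done
    return 0
}

# On Linux the real line can be read with:
#   cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
# Hypothetical flags line, made up for this sketch:
cpu_flags="fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow"

if has_cpu_flags "$cpu_flags" mmx 3dnow 3dnowext sse; then
    echo "flags needed for -march=athlon-xp are present"
fi
```

If a flag is missing, binaries built with that -march value can die with an illegal instruction error on that machine, which is the compatibility break described in the post.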
Guest fubar::chi Posted April 16, 2003

I don't know. I'm using an Athlon XP here and I used Motts' flags (a while back) plus some others (1.4 r3 I think), and I didn't see any appreciable increase, except maybe for bootup and shutdown (shutdown especially was lightning fast). After I saw that the mdk KDE packages were actually faster than the Gentoo optimized ones, I ditched Gentoo and came prancing back to mdk. I think optimizations are good if you have a slower processor, but then it takes a god-awful time to compile. With a faster processor your compiles are faster, but you don't really see the increase because of how fast your processor is (rest o' my system is crap, especially mobo).
Ixthusdan Posted April 16, 2003

Well, at the moment i'm doing nothing because I trashed my box with a bios flash. I'm waiting for the new chip to show up! I like Mandrake, but I'm thinking about dumping windigs completely and giving gentoo its own partition(s).
aRTee Posted April 16, 2003

Apart from the fact that it can be good fun and very rewarding to completely build your own system, not to mention the great learning experience, I don’t really see why you should recompile your system to get a few seconds here and there if it costs 4 days to compile and set things up.... ;)

If I ever have time left, I will definitely try gentoo, or lfs, but for now, I still want to install SuSE to try out (someone gave me a cd to try it), never used that, also on the agenda is check out more stuff on mdk9.1 (my remote is not working yet). And of course, work on my website.

Question: could anyone with such an optimised system compile something for i586 and compare the speed with athlon optimised code? Sort of, benchmark the system? Try it with something like, divx encoding, ogg encoding etc. Those are tasks that are normally long and there it would help if you can speed things up, even if it’s only a few % (just encoded a movie to divx for the first time, 2 pass: 5:30 + 6:20 hrs..)
MottS Posted April 16, 2003

> Question: could anyone with such an optimised system compile something for i586 and compare the speed with athlon optimised code? Sort of, benchmark the system? Try it with something like, divx encoding, ogg encoding etc. Those are tasks that are normally long and there it would help if you can speed things up, even if it’s only a few % (just encoded a movie to divx for the first time, 2 pass: 5:30 + 6:20 hrs..)

I'm going to try tonight, man. I will run 'lame -v whatever.wav whatever.mp3' with my system as it is. I'm using

    CHOST="i686-pc-linux-gnu"
    CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer"
    CXXFLAGS="$CFLAGS"

Then I will recompile LAME with the following flags:

    CHOST="i586-pc-linux-gnu"
    CFLAGS="-mcpu=i586"
    CXXFLAGS="$CFLAGS"

and will issue the same command on the same wav file. Then I will see the difference. IMHO, it takes too long to test with mencoder (DVD->DivX) .. if anyone is interested tho.. :)

MottS
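
For a comparison like this it helps to time both builds in exactly the same way. Below is a minimal sketch of a timing helper; the name time_cmd is made up, and it assumes GNU date (for the %N nanoseconds format). The shell's own time keyword would work just as well.

```shell
#!/usr/bin/env bash
# Sketch: time one command and print "LABEL <elapsed seconds>".
# Assumption: GNU date is available (its %N format gives nanoseconds).

# time_cmd LABEL CMD [ARGS...]: run CMD once, discard its output, print timing
time_cmd() {
    local label=$1
    shift
    local start end
    start=$(date +%s.%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s.%N)
    awk -v l="$label" -v s="$start" -v e="$end" \
        'BEGIN { printf "%s %.3f\n", l, e - s }'
}

# Stand-in command so the sketch runs anywhere; for the real test it would be
# something like:  time_cmd athlon-xp lame -v whatever.wav whatever.mp3
time_cmd demo sleep 1
```

Running the same line once per build, on the same wav file and on an otherwise idle machine, keeps the two numbers comparable.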
aRTee Posted April 16, 2003

Hey great Motts, nice to find someone willing to try this! Also, you don't have to rip a whole dvd, you could just rip some part that takes, say, 10 minutes. At least then you could tell the difference.. (If you try with something that just takes a few secs you'd never be sure that it was a fair comparison..)
MottS Posted April 17, 2003

Ok, I finally used the Red Hat optimization flags instead of the Mandrake ones. They are worse than the Mandrake ones, so we'll see. Here they are:

    CHOST="i386"
    CFLAGS="-mcpu=i386"
    CXXFLAGS="${CFLAGS}"

They use -mcpu instead of -march. From /etc/make.conf (the Gentoo Portage config.. I don't think you'll find it on a Mandrake machine tho):

    # -mcpu=<cpu-type> means optimize code for the particular type of CPU without
    # breaking compatibility with other CPUs.
    #
    # -march=<cpu-type> means to take full advantage of the ABI and instructions
    # for the particular CPU; this will break compatibility with older CPUs (for
    # example, -march=athlon-xp code will not run on a regular Athlon, and
    # -march=i686 code will not run on a Pentium Classic.

Hey !! weird. With -mcpu it's like optimizing without optimizing... that sounds weird to me, doesn't it? :-D

The command was really simple. I used

    time lame -h track*.wav track*.mp3

in a directory where there were 5 songs (.wav, extracted from a CD with cdparanoia). The 'time' command reports the real, user and system time a command takes. From the man page:

DESCRIPTION
The time command runs the specified program command with the given arguments. When command finishes, time writes a message to standard output giving timing statistics about this program run. These statistics consist of (i) the elapsed real time between invocation and termination, (ii) the user CPU time (the sum of the tms_utime and tms_cutime values in a struct tms as returned by times(2)), and (iii) the system CPU time (the sum of the tms_stime and tms_cstime values in a struct tms as returned by times(2)).

Here are the results:

RedHat optimized test

LAME version 3.93 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD
Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz
Encoding track01.cdda.wav to track01.mp3
Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
8946/8949 (100%)| 1:04/ 1:04| 1:05/ 1:05| 3.6361x| 0:00
average: 128.0 kbps LR: 681 (7.610%) MS: 8268 (92.39%)
Writing LAME Tag...done
real 1m5.484s
user 1m3.410s
sys 0m0.870s

LAME version 3.93 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD
Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz
Encoding track02.cdda.wav to track02.mp3
Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
10798/10800 (100%)| 1:18/ 1:18| 1:22/ 1:22| 3.6121x| 0:00
average: 128.0 kbps LR: 646 (5.981%) MS: 10154 (94.02%)
Writing LAME Tag...done
real 1m22.031s
user 1m17.100s
sys 0m1.000s

LAME version 3.93 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD
Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz
Encoding track03.cdda.wav to track03.mp3
Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
9376/9378 (100%)| 1:08/ 1:08| 1:08/ 1:08| 3.5865x| 0:00
average: 128.0 kbps LR: 429 (4.575%) MS: 8949 (95.43%)
Writing LAME Tag...done
real 1m8.784s
user 1m7.190s
sys 0m1.110s

LAME version 3.93 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD
Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz
Encoding track04.cdda.wav to track04.mp3
Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
10513/10516 (100%)| 1:16/ 1:16| 1:18/ 1:18| 3.5680x| 0:00
average: 128.0 kbps LR: 1014 (9.642%) MS: 9502 (90.36%)
Writing LAME Tag...done
real 1m18.552s
user 1m15.780s
sys 0m1.200s

LAME version 3.93 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD
Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz
Encoding track05.cdda.wav to track05.mp3
Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
11131/11134 (100%)| 1:17/ 1:17| 1:18/ 1:18| 3.7345x| 0:00
average: 128.0 kbps LR: 1083 (9.727%) MS: 10051 (90.27%)
Writing LAME Tag...done
real 1m18.774s
user 1m16.530s
sys 0m1.340s

root@localhost mp3 # ls -l /usr/bin/lame
-rwxr-xr-x 1 root root 337972 Apr 16 18:32 /usr/bin/lame
root@localhost mp3 # ls -l /usr/lib/libmp3lame.so.0.0.0
-rwxr-xr-x 1 root root 340434 Apr 16 18:32 /usr/lib/libmp3lame.so.0.0.0
root@localhost mp3 # ls -l /usr/bin/mp3rtp
-rwxr-xr-x 1 root root 333864 Apr 16 18:32 /usr/bin/mp3rtp

Gentoo optimized test

LAME version 3.93 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD
Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz
Encoding track01.cdda.wav to track01.mp3
Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
8946/8949 (100%)| 0:56/ 0:56| 0:58/ 0:58| 4.1274x| 0:00
average: 128.0 kbps LR: 681 (7.610%) MS: 8268 (92.39%)
Writing LAME Tag...done
real 0m58.448s
user 0m55.900s
sys 0m0.730s

LAME version 3.93 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD
Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz
Encoding track02.cdda.wav to track02.mp3
Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
10798/10800 (100%)| 1:07/ 1:07| 1:08/ 1:08| 4.1689x| 0:00
average: 128.0 kbps LR: 646 (5.981%) MS: 10154 (94.02%)
Writing LAME Tag...done
real 1m8.328s
user 1m6.950s
sys 0m0.740s

LAME version 3.93 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD
Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz
Encoding track03.cdda.wav to track03.mp3
Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
9376/9378 (100%)| 1:00/ 1:00| 1:00/ 1:00| 4.0732x| 0:00
average: 128.0 kbps LR: 429 (4.575%) MS: 8949 (95.43%)
Writing LAME Tag...done
real 1m0.260s
user 0m59.300s
sys 0m0.850s

LAME version 3.93 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD
Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz
Encoding track04.cdda.wav to track04.mp3
Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
10513/10516 (100%)| 1:07/ 1:07| 1:08/ 1:08| 4.0571x| 0:00
average: 128.0 kbps LR: 1014 (9.642%) MS: 9502 (90.36%)
Writing LAME Tag...done
real 1m8.241s
user 1m6.930s
sys 0m0.780s

LAME version 3.93 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD
Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz
Encoding track05.cdda.wav to track05.mp3
Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
11131/11134 (100%)| 1:08/ 1:08| 1:08/ 1:08| 4.2722x| 0:00
average: 128.0 kbps LR: 1083 (9.727%) MS: 10051 (90.27%)
Writing LAME Tag...done
real 1m8.230s
user 1m7.210s
sys 0m0.880s

root@localhost mp3 # ls -l /usr/bin/lame
-rwxr-xr-x 1 root root 331636 Apr 16 19:05 /usr/bin/lame
root@localhost mp3 # ls -l /usr/lib/libmp3lame.so.0.0.0
-rwxr-xr-x 1 root root 334495 Apr 16 19:05 /usr/lib/libmp3lame.so.0.0.0
root@localhost mp3 # ls -l /usr/bin/mp3rtp
-rwxr-xr-x 1 root root 327880 Apr 16 19:05 /usr/bin/mp3rtp

Results

To get the % difference I did (RedHat-Gentoo)/Gentoo, i.e. how much longer the Red Hat build took compared to the Gentoo build. Notice that I converted the times to seconds before doing the calculation (as expected!). So we have:

Track#1: real ==> 12% longer, user ==> 13.4% longer, sys ==> 19.2% longer
Track#2: real ==> 20% longer, user ==> 15.2% longer, sys ==> 35.1% longer
Track#3: real ==> 14.1% longer, user ==> 13.3% longer, sys ==> 30.6% longer
Track#4: Too lazy
Track#5: Too lazy

Also notice that the size of the executables and of the library went down a bit, which makes a system smaller when you think about the number of executables and libs you have in /bin, /usr/bin etc... Imagine the results on KDE or Gnome !!! Are you convinced?

MOttS
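
For anyone who wants to redo the arithmetic, including the two tracks that were skipped, here is a small sketch of the (RedHat-Gentoo)/Gentoo calculation. The helper names are made up; the times are the "real" values copied from the two runs in the post.

```shell
#!/usr/bin/env bash
# Sketch of the (RedHat - Gentoo) / Gentoo calculation described in the post.
# Assumption: the helper names are hypothetical; only the times come from
# the post ("real" lines of the Red Hat and Gentoo runs).

# to_seconds "1m5.484s": convert the format printed by time into seconds
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ print $1 * 60 + $2 }'
}

# pct_longer SLOW FAST: how much longer SLOW took, as a percentage of FAST
pct_longer() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

# Columns: track number, Red Hat "real" time, Gentoo "real" time
while read -r track redhat gentoo; do
    echo "track $track: real $(pct_longer "$redhat" "$gentoo")% longer with the Red Hat flags"
done <<'EOF'
1 1m5.484s 0m58.448s
2 1m22.031s 1m8.328s
3 1m8.784s 1m0.260s
4 1m18.552s 1m8.241s
5 1m18.774s 1m8.230s
EOF
```

Swapping in the user or sys columns works the same way. The sys times are all around one second, so their percentages swing a lot from small absolute differences.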