
Just want to share..



Hi all

 

I recompiled my Gentoo system this week and I've learned quite a lot. First of all, I compiled my system with some additional options:

CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer"

Last time I used the i686 option... that's it. Now my system is compiled from the base up to KDE with flags optimized for my system. The -fomit-frame-pointer flag tells GCC not to keep the frame pointer in a register for functions that don't need one; that frees up a register, but it makes debugging and backtraces harder after a crash. I don't want to debug anything, so I put that flag there. Here is a website listing the safe flags for pretty much every type of computer.

 

http://www.freehackers.org/gentoo/gccflags.../flag_gcc3.html

 

Another one for 'not so safe' flags.. lol (didn't try tho)

 

http://www.freehackers.org/gentoo/gccflags...ag_gcc3opt.html

 

Now, when the time came to compile kde-base, I used a 'myconf' variable to pass some options to the ./configure command. I used these:

--enable-final --disable-debug --enable-fast-malloc=full

Here is what a post on the kde-freebsd mailing list says about --enable-final:

[kde-freebsd] Re: --enable-final 

Will Andrews <will@csociety.org>

Mon, 21 Jan 2002 12:51:51 -0500 

 

 

On Mon, Jan 21, 2002 at 12:46:21PM -0500, Alan E wrote: 

> I see configure says "build size optimized apps (experimental - needs lots of 

> memory)". 

> Assuming that space vs speed is a tradeoff, is there a good reason why we 

> would want to optimize for space instead of speed? 

 

This is a compile-time thing only, I don't believe there are much 

(if any) runtime benefits. The benefit from --enable-final is 

that it jams all the source files for a particular binary or 

library into one *.cpp and compiles that. It speeds up compile 

time, especially on machines with enough memory to swallow some 

of the larger libs/bins. It cut down kdelibs builds on puck by 

50%, which is no small feat.
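
As a rough illustration of what the reply above describes (this is not KDE's actual build machinery), the "one big source file" idea can be sketched in Python. The function name `make_final_source` and the file names are made up for the example; the only point is that the compiler then sees a single file instead of many.

```python
from pathlib import PurePosixPath


def make_final_source(sources):
    """Return the text of one .cpp file that #includes every given source.

    Compiling this single file instead of each source separately is the
    basic idea behind a "final" (all-in-one) build.
    """
    lines = ["// generated: all sources of one target in a single file"]
    for src in sorted(sources):
        # Only C++ sources are pulled in; headers arrive through them.
        if PurePosixPath(src).suffix == ".cpp":
            lines.append(f'#include "{src}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(make_final_source(["main.cpp", "widget.cpp", "widget.h"]))
```

Since everything lands in one translation unit, the compiler needs enough memory to hold the whole thing at once, which matches the "needs lots of memory" warning quoted from configure.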

After kde-base I decided to compile kdenetwork, but I didn't want everything in that package, only KMail. So I used the DO_NOT_COMPILE variable to list the parts of the package that should not be compiled. It's really simple, and only KMail was compiled. Like this:

export DO_NOT_COMPILE="kdict kit knewsticker knode kpf kppp ktalkd kxmlrpc"

emerge kdenetwork

It took about 2 days to compile everything, but now my system flies.. KDE is quick and I LOVE IT!

 

Just wanted to share .. :P

 

PS: .. :oops: I still like Mandrake :oops:

 

MottS

 

EDIT: I changed --enable-fast-malloc=fast for --enable-fast-malloc=full :oops:

Link to comment
Share on other sites

Motts, thanks for sharing your experience! I am a L O N G way from being ready to try that, but it is interesting to know that it did seem to make a difference.. I have been wondering if it would be worth the effort. Your experience will prompt me to do some reading and preparing for when it is my time to try that level of customization! K3wl! 8)

 

Congrats!

Link to comment
Share on other sites

I was thinking about trying gentoo a while back. Do you think the athlon flags make a difference? For Mandrake too? I recently upgraded to an xp1800+, so now I am wondering if I should go to the specialized rpm's. Gentoo looks like fun!

Link to comment
Share on other sites

Just in case you don't know.

 

You already know that you can optimize your mandrake by recompiling your installed programs, but also you can do it 'a la gentoo', as johnnyv showed the other day in http://www.mandrakeusers.org/viewtopic.php...hp?t=4117#29382

 

The process basically involves the sequential (and double) recompilation of binutils, glibc and gcc, to optimize the compiler for your own architecture. Once that is achieved, all your recompilations will be 100% optimized.

Link to comment
Share on other sites

Do you think the athlon flags make a difference?

 

I guess so. Or maybe my system is fast because I only installed a couple of packages.. not the whole thing. I can't really measure the effect of -march=athlon-xp because I used other flags in addition to that one. To compile kde-base I also used some KDE-specific options like --enable-final and --enable-fast-malloc=full. But as a whole, my system is faster and I can feel it. I also like the DO_NOT_COMPILE option. I didn't need kppp (I use DSL) and I didn't care about ktalkd, kxmlrpc and so on, so being able to compile/install specific parts of a KDE package is really nice. I guess that would be feasible with a .src.rpm, but I have no clue how to do that.

 

You also have to keep in mind that compiling a system 'a la gentoo' takes really LONG. It took me about 4 days to compile the thing. Well, I compiled the system a first time, but I was using some weird optimization flags I found somewhere on the net, and they didn't seem to work. So after 2 days (I go to work while the system is compiling so I don't really care) I reformatted the partition and restarted the compile with safer flags, and that worked :P I will keep my system this way for a while.. I'm tired now .. :lol:

 

MOttS

Link to comment
Share on other sites

The process basically involves the sequential (and double) recompilation of binutils, glibc and gcc, to optimize the compiler for your own architecture. Once that is achieved, all your recompilations will be 100% optimized.

 

You are right. This is called the bootstrap process when you install Gentoo. It is the first thing you have to do in the chrooted environment.

 

from http://www.gentoo.org/doc/en/gentoo-x86-in....xml#doc_chap11

 

bootstrap.sh will build binutils, gcc, gettext, and glibc, rebuilding binutils, gcc, and gettext after glibc. Needless to say, this process takes a while.
Link to comment
Share on other sites

I was thinking about trying gentoo a while back. Do you think the athlon flags make a difference? For Mandrake too? I recently upgraded to an xp1800+, so now I am wondering if I should go to the specialized rpm's. Gentoo looks like fun!

 

I'm using Athlon Thunderbird flags (Athlon 1200) here in Source Mage, and yes, they do make a big difference.

Link to comment
Share on other sites

So, if I understand correctly, I need to compile my building environment for my box, so that everything after that will be optimized for my box. Do I get it?

Obviously, you guys think this is best as opposed to some premade rpm's.

Link to comment
Share on other sites

Exactly. But in addition, you have to use flags so that the binaries are built for your system. For example, compiling with 'gcc -march=athlon-xp' produces binaries that will only run on an Athlon XP or a compatible CPU such as the Duron (1000 to 1300 MHz). The compiler will use instructions (SSE, for instance) that only those processors understand, not older ones like i586 machines. This lets the binaries be tuned for your system, so they can run faster. However, if you change your CPU for a Pentium or something else (downgrading from an Athlon XP to an Athlon Thunderbird, for instance), I doubt the system will even boot... it's a choice.

 

MOttS
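
To make the compatibility point above a bit more concrete, here is a small Python sketch that reads the "flags" line of /proc/cpuinfo-style text and checks for the instruction sets that -march=athlon-xp allows GCC to use. The function names and the sample text are made up for the example, and the required set (MMX, 3DNow!, SSE) is a simplification, so treat it as a rough check rather than a full compatibility test.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def can_run_athlon_xp_code(cpuinfo_text):
    # Assumption for this sketch: these three features decide compatibility.
    # -march=athlon-xp lets GCC emit MMX, 3DNow! and SSE instructions.
    required = {"mmx", "3dnow", "sse"}
    return required <= cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    # Made-up sample text; on a real Linux box you would read /proc/cpuinfo.
    sample = "model name\t: example cpu\nflags\t\t: fpu mmx 3dnow\n"
    print(can_run_athlon_xp_code(sample))
```

A CPU whose flags line lacks "sse" (an Athlon Thunderbird, for example) fails the check, which is the downgrade case described above.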

Link to comment
Share on other sites

Guest fubar::chi

I don't know. I'm using an Athlon XP here and I used MottS' flags (a while back) plus some others (1.4 r3, I think), and I didn't see any appreciable increase (except maybe for bootup and shutdown; shutdown especially was lightning fast).

After I saw that the mdk KDE packages were actually faster than the Gentoo optimized ones, I ditched Gentoo and came prancing back to mdk :twisted:

I think optimizations are good if you have a slower processor, but then it takes a god-awful time to compile. With a faster processor your compiles are faster, but you don't really see the increase because of how fast your processor is (the rest o' my system is crap, especially the mobo).

Link to comment
Share on other sites

Well, at the moment i'm doing nothing because I trashed my box with a bios flash. :oops: I'm waiting for the new chip to show up! I like Mandrake, but I'm thinking about dumping windigs completely and giving gentoo its own partition(s).

Link to comment
Share on other sites

Apart from the fact that it can be good fun and very rewarding to completely build your own system, not to mention the great learning experience, I don’t really see why you should recompile your system to get a few seconds here and there if it costs 4 days to compile and set things up.... ;)

 

If I ever have time left, I will definitely try Gentoo or LFS, but for now I still want to install SuSE to try it out (someone gave me a CD; I've never used it). Also on the agenda: checking out more stuff on mdk9.1 (my remote is not working yet).

And of course, work on my website.

 

Question: could anyone with such an optimised system compile something for i586 and compare the speed with athlon optimised code?

Sort of, benchmark the system?

Try it with something like, divx encoding, ogg encoding etc. Those are tasks that are normally long and there it would help if you can speed things up, even if it’s only a few % (just encoded a movie to divx for the first time, 2 pass: 5:30 + 6:20 hrs..)

Link to comment
Share on other sites

with such an optimised system compile something for i586 and compare the speed with athlon optimised code?

Sort of, benchmark the system?

Try it with something like, divx encoding, ogg encoding etc. Those are tasks that are normally long and there it would help if you can speed things up, even if it’s only a few % (just encoded a movie to divx for the first time, 2 pass: 5:30 + 6:20 hrs..)

 

I'm going to try tonight, man. I will run 'lame -v whatever.wav whatever.mp3' with my system as it is. I'm using

CHOST="i686-pc-linux-gnu" 

CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer" 

CXXFLAGS="$CFLAGS"

 

Then I will recompile Lame with the following flags:

CHOST="i586-pc-linux-gnu" 

CFLAGS="-mcpu=i586" 

CXXFLAGS="$CFLAGS"

and will issue the same command on the same wav file. I will see the diff. IMHO, it is too long to test with mencoder (DVD->Divx) .. if anyone is interested tho.. :)

 

MottS

Link to comment
Share on other sites

Hey great Motts, nice to find someone willing to try this!

 

Also, you don't have to rip a whole dvd, you could just rip some part that takes, say, 10 minutes. At least then you could tell the difference..

(If you try with something that just takes a few secs you'd never be sure that it was a fair comparison..)

Link to comment
Share on other sites

Ok, I finally used the Red Hat optimization flags instead of the Mandrake ones. They are worse (less optimized) than the Mandrake ones, so we'll see. Here they are:

CHOST="i386"

CFLAGS="-mcpu=i386"

CXXFLAGS="${CFLAGS}"

They use -mcpu instead of -march. From /etc/make.conf (the Portage config on Gentoo.. I don't think you'll find it on a Mandrake machine tho):

# -mcpu=<cpu-type> means optimize code for the particular type of CPU without

# breaking compatibility with other CPUs.

#

# -march=<cpu-type> means to take full advantage of the ABI and instructions

# for the particular CPU; this will break compatibility with older CPUs (for

# example, -march=athlon-xp code will not run on a regular Athlon, and

# -march=i686 code will not run on a Pentium Classic).

Hey !! Weird. It's like optimizing without optimizing... that sounds weird to me, doesn't it? :-D

 

The command was really simple. I used time lame -h track*.wav track*.mp3 in a directory containing 5 songs (.wav files, extracted from a CD with cdparanoia). The 'time' command reports the real, user and sys time a command takes. From the man page:

DESCRIPTION

The time command runs the specified program command with the given arguments. When command finishes, time writes a message to standard output giving timing statistics about this program run. These statistics consist of (i) the elapsed real time between invocation and termination, (ii) the user CPU time (the sum of the tms_utime and tms_cutime values in a struct tms as returned by times(2)), and (iii) the system CPU time (the sum of the tms_stime and tms_cstime values in a struct tms as returned by times(2)).
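
For anyone who wants the same three numbers without the shell's time command, here is a minimal Python sketch in the spirit of the man page excerpt above. The name `time_command` is made up for the example. It only counts the CPU time of the child process (the tms_cutime / tms_cstime part), and it assumes a Unix-like system, since the children fields of `os.times()` are only filled in there.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (real, user, sys) seconds, roughly like time(1)."""
    before = os.times()
    start = time.monotonic()
    subprocess.run(argv, check=True)
    real = time.monotonic() - start
    after = os.times()
    # children_user / children_system correspond to tms_cutime / tms_cstime:
    # CPU time used by child processes that have already been waited for.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system


if __name__ == "__main__":
    # A throwaway CPU-bound child process, just to have something to time.
    cmd = [sys.executable, "-c", "sum(range(10**6))"]
    real, user, system = time_command(cmd)
    print(f"real {real:.3f}s  user {user:.3f}s  sys {system:.3f}s")
```

For an encoder like lame, user time is the number to watch, since almost all of the work is computation inside the process.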

 

Here are the results:

 

RedHat optimized test

 

LAME version 3.93 MMX (http://www.mp3dev.org/)

CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD

Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz

Encoding track01.cdda.wav to track01.mp3

Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

8946/8949 (100%)| 1:04/ 1:04| 1:05/ 1:05| 3.6361x| 0:00

average: 128.0 kbps LR: 681 (7.610%) MS: 8268 (92.39%)

 

Writing LAME Tag...done

 

real 1m5.484s

user 1m3.410s

sys 0m0.870s

 

LAME version 3.93 MMX (http://www.mp3dev.org/)

CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD

Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz

Encoding track02.cdda.wav to track02.mp3

Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

10798/10800 (100%)| 1:18/ 1:18| 1:22/ 1:22| 3.6121x| 0:00

average: 128.0 kbps LR: 646 (5.981%) MS: 10154 (94.02%)

 

Writing LAME Tag...done

 

real 1m22.031s

user 1m17.100s

sys 0m1.000s

 

LAME version 3.93 MMX (http://www.mp3dev.org/)

CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD

Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz

Encoding track03.cdda.wav to track03.mp3

Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

9376/9378 (100%)| 1:08/ 1:08| 1:08/ 1:08| 3.5865x| 0:00

average: 128.0 kbps LR: 429 (4.575%) MS: 8949 (95.43%)

 

Writing LAME Tag...done

 

real 1m8.784s

user 1m7.190s

sys 0m1.110s

 

LAME version 3.93 MMX (http://www.mp3dev.org/)

CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD

Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz

Encoding track04.cdda.wav to track04.mp3

Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

10513/10516 (100%)| 1:16/ 1:16| 1:18/ 1:18| 3.5680x| 0:00

average: 128.0 kbps LR: 1014 (9.642%) MS: 9502 (90.36%)

 

Writing LAME Tag...done

 

real 1m18.552s

user 1m15.780s

sys 0m1.200s

 

LAME version 3.93 MMX (http://www.mp3dev.org/)

CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD

Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz

Encoding track05.cdda.wav to track05.mp3

Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

11131/11134 (100%)| 1:17/ 1:17| 1:18/ 1:18| 3.7345x| 0:00

average: 128.0 kbps LR: 1083 (9.727%) MS: 10051 (90.27%)

 

Writing LAME Tag...done

 

real 1m18.774s

user 1m16.530s

sys 0m1.340s

 

root@localhost mp3 # ls -l /usr/bin/lame

-rwxr-xr-x 1 root root 337972 Apr 16 18:32 /usr/bin/lame

root@localhost mp3 # ls -l /usr/lib/libmp3lame.so.0.0.0

-rwxr-xr-x 1 root root 340434 Apr 16 18:32 /usr/lib/libmp3lame.so.0.0.0

root@localhost mp3 # ls -l /usr/bin/mp3rtp

-rwxr-xr-x 1 root root 333864 Apr 16 18:32 /usr/bin/mp3rtp

 

Gentoo optimized test

 

LAME version 3.93 MMX (http://www.mp3dev.org/)

CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD

Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz

Encoding track01.cdda.wav to track01.mp3

Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

8946/8949 (100%)| 0:56/ 0:56| 0:58/ 0:58| 4.1274x| 0:00

average: 128.0 kbps LR: 681 (7.610%) MS: 8268 (92.39%)

 

Writing LAME Tag...done

 

real 0m58.448s

user 0m55.900s

sys 0m0.730s

 

LAME version 3.93 MMX (http://www.mp3dev.org/)

CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD

Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz

Encoding track02.cdda.wav to track02.mp3

Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

10798/10800 (100%)| 1:07/ 1:07| 1:08/ 1:08| 4.1689x| 0:00

average: 128.0 kbps LR: 646 (5.981%) MS: 10154 (94.02%)

 

Writing LAME Tag...done

 

real 1m8.328s

user 1m6.950s

sys 0m0.740s

 

LAME version 3.93 MMX (http://www.mp3dev.org/)

CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD

Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz

Encoding track03.cdda.wav to track03.mp3

Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

9376/9378 (100%)| 1:00/ 1:00| 1:00/ 1:00| 4.0732x| 0:00

average: 128.0 kbps LR: 429 (4.575%) MS: 8949 (95.43%)

 

Writing LAME Tag...done

 

real 1m0.260s

user 0m59.300s

sys 0m0.850s

 

LAME version 3.93 MMX (http://www.mp3dev.org/)

CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD

Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz

Encoding track04.cdda.wav to track04.mp3

Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

10513/10516 (100%)| 1:07/ 1:07| 1:08/ 1:08| 4.0571x| 0:00

average: 128.0 kbps LR: 1014 (9.642%) MS: 9502 (90.36%)

 

Writing LAME Tag...done

 

real 1m8.241s

user 1m6.930s

sys 0m0.780s

 

LAME version 3.93 MMX (http://www.mp3dev.org/)

CPU features: i387, MMX (ASM used), 3DNow! (ASM used), SIMD

Using polyphase lowpass filter, transition band: 15115 Hz - 15648 Hz

Encoding track05.cdda.wav to track05.mp3

Encoding as 44.1 kHz 128 kbps j-stereo MPEG-1 Layer III (11x) qval=2

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

11131/11134 (100%)| 1:08/ 1:08| 1:08/ 1:08| 4.2722x| 0:00

average: 128.0 kbps LR: 1083 (9.727%) MS: 10051 (90.27%)

 

Writing LAME Tag...done

 

real 1m8.230s

user 1m7.210s

sys 0m0.880s

 

root@localhost mp3 # ls -l /usr/bin/lame

-rwxr-xr-x 1 root root 331636 Apr 16 19:05 /usr/bin/lame

root@localhost mp3 # ls -l /usr/lib/libmp3lame.so.0.0.0

-rwxr-xr-x 1 root root 334495 Apr 16 19:05 /usr/lib/libmp3lame.so.0.0.0

root@localhost mp3 # ls -l /usr/bin/mp3rtp

-rwxr-xr-x 1 root root 327880 Apr 16 19:05 /usr/bin/mp3rtp

 

Results

To get the % difference I computed (RedHat - Gentoo) / Gentoo, so each figure is relative to the Gentoo time. Note that I converted the times to decimal seconds before doing the calculation (as expected!). So we have

 

Track#1:

real ==> 12% reduction

user ==> 13.4% reduction

sys ==> 19.2% reduction

 

Track#2:

real ==> 20% reduction

user ==> 15.2% reduction

sys ==> 35.1% reduction

 

Track#3:

real ==> 14.1% reduction

user ==> 13.3% reduction

sys ==> 30.6% reduction

 

Track#4:

Too lazy

 

Track#5:

Too lazy
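
For the tracks that were skipped, the arithmetic is easy to script. The sketch below is only an illustration: the function names are made up, and the 'real' times are copied from the two runs above. It shows the formula used in this post, (RedHat - Gentoo) / Gentoo, next to the reduction in the strict sense, (RedHat - Gentoo) / RedHat, which comes out a little smaller.

```python
import re


def to_seconds(stamp):
    """Convert a time(1) value such as '1m5.484s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def percent_longer(slow, fast):
    """(slow - fast) / fast in percent: the formula used in the post."""
    return 100.0 * (to_seconds(slow) - to_seconds(fast)) / to_seconds(fast)


def percent_reduction(slow, fast):
    """(slow - fast) / slow in percent: time saved relative to the slow run."""
    return 100.0 * (to_seconds(slow) - to_seconds(fast)) / to_seconds(slow)


if __name__ == "__main__":
    # 'real' times from the runs above: (Red Hat flags, Gentoo flags).
    real_times = {
        1: ("1m5.484s", "0m58.448s"),
        2: ("1m22.031s", "1m8.328s"),
        3: ("1m8.784s", "1m0.260s"),
        4: ("1m18.552s", "1m8.241s"),
        5: ("1m18.774s", "1m8.230s"),
    }
    for track, (redhat, gentoo) in real_times.items():
        longer = percent_longer(redhat, gentoo)
        saved = percent_reduction(redhat, gentoo)
        print(f"track {track}: {longer:.1f}% longer, {saved:.1f}% reduction")
```

The sys times are all around one second, so their percentages swing a lot from run to run; the real and user figures are the more meaningful ones.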

 

Also notice that the size of the executables and the library went down a bit, which makes a system smaller when you think about the number of executables and libs you have in /bin, /usr/bin, etc...

 

Imagine the results on KDE or Gnome !!! Are you convinced?

 

MOttS

Link to comment
Share on other sites
