shric a day ago | next |

A fun read on word count optimization can be found in Abrash's Black Book:

https://www.jagregory.com/abrash-black-book/#lessons-learned...

You can skim the asm if you wish; the tricks explained around it are worth it imho.

Joker_vD a day ago | root | parent |

I wonder if large lookup tables/table-driven state machines are still as good as they used to be. After all, even with all the on-chip caches, the additional memory accesses today seem to be slower than doing some multi-instruction SIMD voodoo.
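
For readers unfamiliar with the approach, a table-driven word counter can be sketched roughly as below. This is only an illustration, not the code from the linked chapter or from any real wc; the names `CLASS_OF`, `NEXT_STATE`, `WORD_START` and `count_words` are made up for this sketch, and it assumes the "spaces, tabs or newlines" definition of a separator.

```python
# Hypothetical sketch of a table-driven word counter; not taken from any real wc.
# Byte classes: 0 = separator (space, tab, newline), 1 = anything else.
CLASS_OF = [1] * 256
for sep in b" \t\n":
    CLASS_OF[sep] = 0

# States: 0 = between words, 1 = inside a word.
# NEXT_STATE[state][byte_class] gives the state after consuming the byte.
NEXT_STATE = [[0, 1], [0, 1]]
# WORD_START[state][byte_class] is 1 exactly when the byte begins a new word.
WORD_START = [[0, 1], [0, 0]]


def count_words(data: bytes) -> int:
    """Count maximal runs of non-separator bytes using only table lookups."""
    state = 0
    words = 0
    for byte in data:  # iterating over bytes yields integers 0..255
        cls = CLASS_OF[byte]
        words += WORD_START[state][cls]
        state = NEXT_STATE[state][cls]
    return words
```

The inner loop has no branches on the data, only lookups, which is the appeal of the technique. Whether it beats a SIMD version on a given machine depends on cache behaviour and would have to be measured.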

Joker_vD a day ago | prev | next |

> A word is a maximal string of characters delimited by spaces, tabs or newlines.

And then the actual code explicitly filters out and ignores every character larger than 0x7F. Just why.

Tor3 a day ago | root | parent | prev | next |

ASCII is 7 bits (the eighth bit would be parity), so that makes perfect sense, in an ASCII world.

Joker_vD a day ago | root | parent |

So a character such as "B" would have this parity bit set, and therefore should be filtered out and not count as a letter, in the ASCII world?

Tor3 5 hours ago | root | parent | next |

Parity bits are not part of the character. They are for detecting transmission errors. You filter off the parity bit before looking at the byte.

Joker_vD 7 minutes ago | root | parent |

But this is not what the code is doing, is it? It's not doing (ch & 0x7F), it's doing ch <= 0x7F. And the parity checking/filtering is done in the tape drive/serial port driver anyhow, it would never reach wc in the first place.
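
The difference between the two expressions can be shown with a small sketch. The helper names `strip_high_bit` and `is_seven_bit` are invented for illustration and are not from the wc source being discussed; the byte 0xC6 is used as an example of a value with the top bit set.

```python
def strip_high_bit(ch: int) -> int:
    """Masking: clear bit 7 and keep the low seven bits, like (ch & 0x7F)."""
    return ch & 0x7F


def is_seven_bit(ch: int) -> bool:
    """Filtering: accept the byte only if bit 7 is already clear, like ch <= 0x7F."""
    return ch <= 0x7F


byte = 0xC6  # 0x46 with the top bit set

masked = strip_high_bit(byte)  # the byte is kept, minus its top bit
accepted = is_seven_bit(byte)  # the byte is rejected outright
```

Masking maps 0xC6 to 0x46 and keeps processing it, while the comparison simply rejects 0xC6. A byte that already has the top bit clear passes through both unchanged.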

aap_ a day ago | root | parent | prev | next |

There are only 7 bits in ASCII. An 8th can be used for parity when transmitting data but a regular program will never see it. Anything above 0x7F is simply not a character.

epcoa 16 hours ago | root | parent | prev |

What in the hell are you going on about? B is 0x46 which is < 0x7F.

Joker_vD 15 hours ago | root | parent |

I am going on about the parity bit. 0x46 has an odd number of bits set (three, to be precise), so for the parity to check out (that is, for the number of set bits to be even), the parity bit needs to be set, and the resulting encoding has to be 0xC6, with four bits set.
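
The arithmetic here can be checked with a short sketch. The function name `with_even_parity` is made up for illustration, and the sketch assumes even parity carried in bit 7, as described in the comment.

```python
def with_even_parity(code: int) -> int:
    """Return a 7-bit code with bit 7 set so that the total number of 1 bits is even."""
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a 7-bit value")
    ones = bin(code).count("1")
    # An odd count of set bits needs the parity bit; an even count does not.
    return code | 0x80 if ones % 2 else code


encoded = with_even_parity(0x46)  # 0x46 = 0b1000110, three bits set
```

Here `encoded` is 0xC6, which has four bits set. A code that already has an even number of set bits, such as 0x41, comes back unchanged.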

Tor3 5 hours ago | root | parent |

The parity bit is not part of the character. It's external, an error detecting device. To read ASCII you always look at bits 6..0, seven bits. You don't filter away the character because it has the parity bit set, you filter off the parity bit (whether it's set or not).

ivan_gammel a day ago | root | parent | prev |

Because they thought that a word is something said in a human language that they can understand.

Joker_vD a day ago | root | parent |

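Mi ne pensas ke lingvoj kiuj usas ekskluzive la basan latinan alfabeton estas komprepeneblaj per si mem. [Esperanto: "I don't think that languages which exclusively use the basic Latin alphabet are comprehensible by themselves."]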

actionfromafar a day ago | root | parent | prev |

Ze riform iz komplit.

Joker_vD a day ago | root | parent |

The [z] and [ð] are phonemically different in English, just as [i] and [i:] are, so it'd actually be "Ðe riform is komplijt". American rhoticity prevents us from spelling it "rifoom" as would be proper, unfortunately.