People installing memory without proper antistatic precautions more like!
"Actually, the American billion is 1000 times bigger than the UK billion."
No, they're the same. When I wrote my previous posting I was unsure about the British nomenclature, but the European nomenclature in general, spearheaded by the French, does use powers of a million when applying the prefixes to BIllion, TRIllion, QUADrillion, and so forth. (The American way is to count powers of a thousand, less one: a BIllion is 1000^3, a TRIllion is 1000^4.)
From what I've learnt this at least used to be the norm for British English as well, but might now see more influence from their western colonies, resulting in occasional confusion.
Perhaps DDR-4 should have ECC as a built-in requirement at a sacrifice to some of the extra speed.
@ Typical- have to agree.
What better way to create a new line of business: put the fear into companies, then come out with the new catchphrase "CLOUD COMPUTING" and sell it. The previous article about SideKick is a prime example of why you shouldn't hand over your data, and the way the EULAs are written, you won't have a leg to stand on, because no company that sponsors "Cloud Computing" is going to allow itself to be sued to oblivion.
A good PSU and a decent motherboard can have a profound impact on stability. Issues such as chipset compatibility, poor layout, trace routing and EMI can cause many problems.
There are problems and then there are symptoms of problems!
Electronics are now built to "work?" until just past the warranty expiration date.
The report did not differentiate between hard and soft failures. Also, they did not spend any money on failure analysis of the bad memory. Without knowing why the modules failed it is very difficult to place the blame. However, it is a known fact that the RoHS initiative has reduced the reliability of all of our electronics. Why does the military refuse to use these parts? Duh. I have 30-year-old electronics that still work. However, this will now be a thing of the past!!!
I'd be very curious as to how many of those modules were from reputable suppliers and had been handled and installed following correct anti-static precautions at *all* times.
What many people (sadly including some hardware techs and field circus people) don't seem to understand is that static damage often isn't something that's instantly obvious; it can be very subtle.
A small (and undetectable to the person installing it) static discharge can lightly cook the memory, and might result in it failing earlier than it should, or becoming flaky, without causing a consistent problem.
Cheap unbranded/knockoff RAM is another cause. If the costs to manufacture something are the same, then in a cost-sensitive market cutting corners gets a cheaper product and delivers higher margins, at least in the short term. If you don't care about your brand image (because you don't have one), then it's easy to cut corners.
I'd also like to see a breakdown of where the errors were coming from, whether it's the controller, the RAM chips themselves, or down to board layout/construction issues. Also whether the motherboard chipset is in any way guilty. Poor motherboard timing could cause a memory error, even when the memory is well within spec.
Fully agree about one error being a good predictor of future errors; hard disks show that very strongly IMO.
Ever since the 430HX chipset (for the Pentium CPU), Intel have had an ECC memory option for high-end desktop systems.
It's only the 925x chipset and the new Core i7 CPU that don't have it. It's a complete breach of tradition that only the Xeon version features it. It's not like it isn't available; it's just disabled because of some acute outbreak of insanity in Intel's marketing department.
Memory is so cheap nowadays that there's no reason for us not to all start using ECC in new systems. Back in the old days, we had parity memory at least.
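For anyone wondering what the practical difference between parity and ECC is, here is a minimal sketch. It assumes a toy Hamming(7,4) code over 4 data bits; real ECC modules typically use a wider code over 64 data bits, and the function names are made up for illustration. A single parity bit can only tell you that some bit flipped, while the Hamming syndrome also tells you which one, so it can be flipped back.

```python
def parity_bit(bits):
    """Even parity over a list of 0/1 values: detects one flip, cannot locate it."""
    return sum(bits) % 2


def hamming74_encode(data):
    """Encode 4 data bits as 7 bits laid out as p1 p2 d1 p3 d2 d3 d4 (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code):
    """Return (data_bits, syndrome); syndrome is the 1-based position of a
    single flipped bit, or 0 if the word checks out clean."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # correct the single-bit error in place
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    stored = hamming74_encode(word)
    stored[4] ^= 1  # simulate a single-bit upset in position 5
    recovered, position = hamming74_decode(stored)
    print("recovered:", recovered, "flipped position:", position)
```

Note that this toy code only handles one flipped bit per word; two flips in the same word would be miscorrected, which is why server memory adds an extra overall parity bit for double-error detection.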
I wish things like ECC were standard, but the stuff is harder to find, there are fewer choices, and it costs way too much more. There are even better memory types that have been designed that are faster, static even in no power state and last orders of magnitude longer. But we only hear about them and then they are gone. Not enough profit in them. :(
Reading the report, 1/3 experienced at least 1 memory error per year. Chances increase after an error has occurred.
Solution 1, use ECC; any major server does anyway, even nVidia's Fermi :-P
Solution 2, reboot every 6 months.
Solution 3, shield against EM radiation, which is the primary cause of memory errors, or so we're led to believe.
Being an American born and raised here, I am pretty sure we use 10^9 for billion.
Goes like this
10^1 ten
10^2 hundred
10^3 thousand
10^6 million
10^9 billion
10^12 trillion
10^15 quadrillion
10^18 quintillion
...
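The two naming schemes argued about in this thread can be sketched in a few lines. This is only an illustration with made-up helper names: in the short scale the n-th "-illion" is 1000^(n+1), in the long scale it is 1000000^n, so the two agree on a million and drift apart from a billion onwards.

```python
# Prefixes for the first few "-illions"; the list is deliberately short.
PREFIXES = ["m", "b", "tr", "quadr", "quint"]


def short_scale_exponent(n):
    """Power of ten of the n-th '-illion' in the short scale: 1000^(n+1)."""
    return 3 * (n + 1)


def long_scale_exponent(n):
    """Power of ten of the n-th '-illion' in the long scale: 1000000^n."""
    return 6 * n


def scale_table():
    """Rows of (name, short-scale exponent, long-scale exponent)."""
    return [
        (prefix + "illion", short_scale_exponent(n), long_scale_exponent(n))
        for n, prefix in enumerate(PREFIXES, start=1)
    ]


if __name__ == "__main__":
    for name, short, long_ in scale_table():
        print(f"{name:12s} short scale: 10^{short:<3d} long scale: 10^{long_}")
```

So a long-scale billion (10^12) is 1,000 times a short-scale billion (10^9), not the other way round.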
I could think of a billion reasons not to be, but that would depend on your definition of "is".
How much would you pay to hold Buck Fifty?
... and British Billion are the same - Have been since the seventies.
Don't know where D got his info from; Maybe just stepped out of his Tardis?
@Tom Welsh
you write:
"Yet this very year, when I looked around for an affordable PC with ECC RAM, I was told that 'only servers get that'."
You can get one. AMD systems include the memory controller in the actual processor instead of the chipset. As a result, you can get a very fast Phenom II and DDR2 or DDR3 ECC memory.
I have a 3.0GHz Phenom II 940, 8GB of DDR2 ECC memory, and the whole system cost me less than $900. You don't need to buy a "server class" system anymore to get decent memory.
I was going to say something, but I forgot my POV when I started typing...
I blame my old 64 level soggy MLC memory...
;-)
Love the big white beard.
What color are your suspenders?
I read this article about a month ago.
How can it be listed as an October 13th article?
I think they're all taught the American billion nowadays; at least that's what I've faced at work. When I ask them "which billion?", they all look at me strangely and say "how many billions do we have? There's only one way...".
Kinds, nap...
Actually, the American billion is 1000 times bigger than the UK billion.
The UK currently uses the short scale, 1 billion = 10^9, and not the long scale of 10^12.
Of the whole PC mentality, that is. Pile it high, sell it cheap, if anything goes wrong reboot, failing that buy another, after all they're cheap.
Unfortunately the thing that is left out of all this clever entrepreneurial calculation is the value of people's irreplaceable data, and of course their personal time. Needless to say Intel, Microsoft, and all the other big PC players couldn't care less about their customers' time, data, or indeed anything else. Just so long as they can move mountains of "product" every year - and more of it every year. Which, let's face it, is easier if the stuff keeps breaking down and needs replacing.
Let's see - I think it would have been about 1980 when I was working in DEC's Remote Diagnosis Centre in Basingstoke, and a very senior colleague who had gone to the States to help design next-generation VAXen emailed to ask us if we thought it would be a good tradeoff to drop ECC on our RAM, as calculations showed the total expected downtime from memory errors would be less than the time gained by faster computation. Working in front-line support (and thus in constant touch with actual customers and their concerns) we thought about it for about ten seconds before chorusing a unanimous "No way!"
It doesn't take much thought to see that wall-clock time is the least of it. Computers simply shouldn't fail if there is any reasonably feasible way to prevent it. A single-bit RAM error could (worst case) wipe out a hard drive or a whole RAID set, and lose data worth millions that took real people years to collect.
Yet this very year, when I looked around for an affordable PC with ECC RAM, I was told that "only servers get that". Why on earth??? Servers are no more likely than clients to contain irreplaceable data; and they are far more likely to be properly backed up, too.
It's time we started thinking about putting computing on solid footing, and expecting our hardware to deliver predictable, reliable, repeatable results would be a good first step. (Don't even get me started on the state of software...)
yup.
http://en.wikipedia.org/wiki/Long_and_short_scales
sheesh. what _do_ they teach kids in schools these days?
There are American and European ways to count to a billion? That's news to me.
First get the numbers straight:
- The test period was 2.5 years, not 3.
- The failure rate was per American billion of hours, not the 1,000 times larger "European" billion.
For a computer with two memory modules it boils down to an average of one failure per year of hard work 24/7.
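The "one failure per year" figure can be checked with a quick back-of-the-envelope calculation. This sketch assumes the 25,000 to 75,000 failures per billion hours quoted from the article are per module, that a billion means 10^9, and that the machine runs all 8,760 hours of the year; the function name is just for illustration.

```python
HOURS_PER_YEAR = 24 * 365   # 8760 hours of 24/7 operation
SHORT_BILLION = 10 ** 9     # the "American" billion


def failures_per_year(rate_per_billion_hours, modules=2):
    """Expected failures per year for a machine with the given number of modules."""
    per_module_hour = rate_per_billion_hours / SHORT_BILLION
    return per_module_hour * HOURS_PER_YEAR * modules


if __name__ == "__main__":
    low = failures_per_year(25_000)
    high = failures_per_year(75_000)
    print(f"two modules: {low:.2f} to {high:.2f} failures per year")
    print(f"midpoint: {(low + high) / 2:.2f}")
```

With two modules this works out to somewhere between roughly 0.4 and 1.3 failures per year, which is where the "about one a year" average comes from.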
"In a three year programme, researchers found that memory modules failed an average of 25,000 to 75,000 times for every billion hours of operation."
3 years? Really? Nothing better to do? They could have asked a selection of system builders the fail rate in systems and scaled up from that. Memory failure rate is shockingly high no matter what the brand involved but that's pretty much just the way it is.