
SMS-Modules - Maintenance - Ron's Bug Logs


Purpose of this page

Begin discussions of:
a) more effective SMS card fault finding,
b) finding and fixing marginal components during maintenance periods rather than during general usage,
c) Ron's Bug Logs.


Since late January 2005, we have had enough 50 Hz power to operate the 1401 processor - a bit marginally, but SMS card problem chasing could begin.

For the past 24 months (February 2005 through January 2007) we have been locating SMS faults as they occur: replacing the SMS card from spares when available, and depending on the special skills of Tim Coslet and Ron Crane to find and replace bad components. (Often Ron Williams or Bob Feretich has isolated the probable fault to a specific transistor, diode, inductor, ... and added a note to that effect to the bag with the defective card.)

There has been some pressure/worry that the special skills to do the above will be in short supply in the future.
Also, it would be good to find marginal components at a more convenient time (maintenance period) rather than during general usage.

E-mails - "in no particular order" ;-))

from Robert Garner January 30, 2007
Hi Grant, 

Thanks for getting the variable-edge-rate pulse generator! 

Your idea of locally heating (or cooling) a gate of cards is intriguing. 
May be worth trying, particularly if it doesn't find very many problems. 

Alternatively, we could also try just spray cooling cards in a gate, one by one. 
At least you might know WHICH card may be a culprit (or some other card's 
signal connected to it.) 

...

...
How was this approached at DEC? (i.e., how many bad boards did 
voltage margining tend to reveal in a single PDP-1 thru -9 class of machine?) 

I still think we need to get a "semi-automated" SMS card tester going 
to search for weak levels, slow transitions, voltage margins, etc. 
As we "decided" a year ago, we would then want to go thru ALL 3,000 cards 
(or at least the low hanging fruit; i.e., easier ones, skip Star cards for instance.) 
Plan was to do all cards of each type in all gates (~200) at a shot. 
This will be a slow process. 

Although I don't pretend to believe everything is "fine" 
(as I've said: the system has to be able to run for several months without 
crashing before we can really let it loose on docents and demo it), 
I may have a slightly less pessimistic feeling about its health. 

Recognizing the "known" temperature sensitivity problem for what it was, 
(which Ron found as a bad transistor in an overlap signal on a STAR card on Jan 13th) 
for many months the 1401 CPU has otherwise been stable (i.e., not degrading), 
until this 1402 interface failure last Weds. 
No 1401 instructions have (re-)failed after having been repaired. 
The 1403 interface has been failure free for about a year. 
(Bob - I recall you had a TAU/729 card that used to work re-fail?) 

I agree that we should find out why the 1402 is having problems writing 9 amps' worth 
of cores with a card of all group marks (we need to measure the core currents), 
but this problem is not a show stopper (hasn't gotten in the way of any card 
binary running to date.) 

We also only have 8KB of memory, so it would be good to try to fix the two 4K problems 
(in one case we need to observe the core write currents again), but again, this shouldn't get in 
the way of running demos... 

Let's discuss further... 

- Robert


Ron's Bug Logs

Ron'sBugLog01-28-.jpg
Ron'sBugLog29-56-.jpg
Ron'sBugLog57-84-.jpg
Ron'sBugLog85-112-.jpg
Ron'sBugLog113-116-.jpg

