`
`-
`
`.
`
`’3 b1"
`
`.,
`
`»
`
`fi“.
`“\
`y].
http://groups.google.com/groups?selm=1991Aug7.225159.786%40newshost.anu.edu.au&oe=UTF-8&output=gplain
`
From: cmf851@anu.oz.au (Albert Langer)
Newsgroups: alt.sources.d,comp.archives.admin
Subject: Re: dl/describe (File descriptions) posted to alt.sources
Message-ID: <1991Aug7.225159.786@newshost.anu.edu.au>
Date: 7 Aug 91 22:51:59 GMT
References: <1991Aug7.124457.6814@csv.viccol.edu.au> <1991Aug7.133643.6817@csv.viccol.edu.au>
Sender: news@newshost.anu.edu.au
Followup-To: comp.archives.admin
Organization: Computer Services Centre, Australian National University,
 Canberra, Australia.
Lines: 391
`
`,
`
In article <1991Aug7.131948.8817@csv.viccol.edu.au> timcc@csv.viccol.edu.au
(Tim Cook) writes:

>I have just posted a new version of "dl/describe" (previously known as
>"dls/describe") to alt.sources. This action was prompted by Christian
>Schlichtherle's <chris@attron.ruhr.de> posting of "dls" and then "vls".
`
Competition works wonders!
`
>I have also added a few things. The "dl" command now has a recursive
>(-R) option, just like ls(1)
[...]
`
I am very glad to see you have taken up this suggestion. But I still think
it is important for ALL options of ls (including the no options case)
to work IDENTICALLY to ls itself - especially when the program is
called with the name ($0) ls. That should be a VERY easy change to
make (just call ls :-) and should remove ANY reservations about
replacing ls with dl for ftp sites. (Descriptions should be added
whenever a special option, never used by either Sys V or BSD ls
is given, or perhaps also as a default when $0 is not ls, with an option
to suppress it.)
`
Another suggestion: It would be nice to allow multiple line
descriptions, even though that may be awkward with dbm. As well as
allowing for cases when one line just is not enough, this could
permit eventual transition to complete MARC records from the OCLC
project or other forms of more detailed cataloging. Also long
filenames should be handled transparently. A simple display convention
could be that anything starting in the first column is a new filename
(or a continuation of a previous ridiculously long filename that did
not end before the end of the previous line). Anything that starts
with whitespace is a continuation of a description from the previous
line or, if at the top, is a description of the directory.
(A similar convention is used in FILES.BBS files that play a similar
role on some BBSes).
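
A minimal sketch of a reader for the display convention just described, in Python. The function name `parse_listing` is hypothetical, and two simplifying assumptions are made that the post does not state: the filename is the first whitespace-delimited token on a first-column line (anything after it starts the description), and the long-filename continuation case is not handled.

```python
def parse_listing(text):
    """Parse a listing that follows the first-column/whitespace convention.

    Returns (directory_description, entries), where entries is a list of
    (filename, description) pairs in the order they appear.
    """
    directory_lines = []
    entries = []  # each item is [filename, [description lines]]
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no information
        if line[0] in " \t":
            # Leading whitespace: continuation of the previous description,
            # or part of the directory description if no file was seen yet.
            target = entries[-1][1] if entries else directory_lines
            target.append(line.strip())
        else:
            # First column: a new filename. Assumption: the name is the
            # first token and the rest of the line begins its description.
            parts = line.split(None, 1)
            rest = parts[1].strip() if len(parts) > 1 else ""
            entries.append([parts[0], [rest] if rest else []])
    return (" ".join(directory_lines),
            [(name, " ".join(desc)) for name, desc in entries])
```

Description lines are joined with single spaces, so a multiple line description comes back as one string per file.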
`
>#ifndef MODEST
>
>I would implore all those looking for a system of getting and listing
>descriptive file comments to investigate dl/describe. I think it is a
>very good solution (I especially think the use of DBM files to store
>descriptions to be a superior method), and it has already been installed
>in several Anonymous FTP sites around the world. I plan on making it as
>portable as possible, and getting it to mesh with the archie system, and
>as many ftpd's as possible.
>
>#endif
`
I would like to endorse that strongly, with no risk of immodesty :-)

I hope you quickly add utilities for mv, cp, rm and ln as Chris did,
and also for processing MANIFEST files. These should be quite
trivial. (I also hope Chris upgrades vls for full compatibility
with ls and to use dbm and that competition continues :-)
`
Again, these utilities should function IDENTICALLY to the normal
Sys V and BSD versions with ALL options, so that users can simply
include them ahead of the normal versions in their paths without
ANY scripts breaking. Incidentally, you might want to think about
a possible variation that redefines the appropriate system
calls themselves in an installable file system. As well as providing
transparent use of descriptions along with files for C programs
as well as shell scripts, this might be a way to implement long
file names and even symbolic links on a Sys V 3.2 installable
file system - thus earning eternal gratitude, at perhaps hard
work, from many people unable to upgrade to Sys V R4.
`
`‘
`
`%
`
SOLVING THE FTP AND CATALOGING PROBLEM
`
It looks like all the pieces are falling into place for a thorough
solution to ftp problems (X.500, archie, prospero, WAIS, the OCLC project,
Mark Moraes' batched ftp and now dl or vls).
`
`;
`
dl is much smaller than some of the others but I think it is also
VERY important so as to capture descriptions along with filenames
themselves. This provides the raw material for a decent indexing
facility for X.500, archie, prospero and WAIS without extra work adding
descriptions centrally. That is essential both as an interim measure and
to provide raw material for future upgrading to proper catalog
records when those have been defined by the OCLC project.
`
`‘
`
It should just become expected that anybody making a file
available for ftp will ALSO add a one line description - as
is usually the case on the most primitive BBSes. The "burden"
is unlikely to be resented as it makes the files that much
more useful to the ftp site as well, and it is a far smaller
burden for each ftp site than it would be for central
database maintainers.
`
The quicker dl (or vls) gets widely used at ftp sites, the quicker
it will be possible to find potentially interesting packages by
keyword queries on archie etc instead of just by file name or
portion of filename (or with restriction to only incomplete
keyword indexes maintained centrally).
`'
`
`Mtpz/I‘Qmuns.gwgfe.com/grougs7se£m=r1991Aug7.,1;5%40rwwshostawflduwfiaefiUTFMOUipW-ngaifl[20WH7/29/2003 5:19:17 PM]:
`
`
`
`W Whip:l/groups.googlcmm/gwupskcimaI 991Au37.225 i59.786%40nmh03!.anu.edu.auWUfi-‘sgwumuwmsin
`
In the meantime it is a great utility for enhancing the normal use
of any ftp site and simplifying administration of directory description
files etc (and likewise for keeping track of files for any user on
any unix system).
`
As soon as it gets widely used, tools will be needed, as Tim
mentioned, for getting it to mesh with the archie system and
with ftp. In particular:
`
1. Collection of descriptions by archie and delivery (optionally)
along with filenames in response to queries.

2. Ditto for prospero.

3. Searching of raw descriptions by keyword queries for X.500, archie and
WAIS.

It would be nice to also provide this within dl itself - or just as a simple
utility to periodically do a recursive listing of descriptions and either
index that file or use grep on it.

4. Facilities for reviewing multiple raw descriptions of the same
filename and selecting one, or editing a new description that can
optionally be used as a "revised" description instead of the raw
description in response to a query (and for keyword searching).

5. Ways to pickup descriptions and merge them automatically in an
ftp session. (One way involves running the dl command and processing
the output locally, but an alternative may be to get the dbm file
from each directory and process that - if it can be made machine
independent).

6. A way to feed back "revised" descriptions from 4 to the individual
ftp sites for optional replacement of or addition to their raw descriptions -
perhaps using methods also developed for 5.
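
The keyword searching in item 3 could be sketched roughly as below, in Python. This is only an illustration: the names `build_index` and `search` are hypothetical, and it assumes the descriptions have already been collected into a mapping from file path to description text, since the post does not say how a recursive listing would be stored.

```python
import re
from collections import defaultdict


def build_index(descriptions):
    """Map each lowercased word to the set of paths whose description has it."""
    index = defaultdict(set)
    for path, text in descriptions.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index


def search(index, *keywords):
    """Return the sorted paths whose descriptions contain every keyword."""
    sets = [index.get(word.lower(), set()) for word in keywords]
    return sorted(set.intersection(*sets)) if sets else []
```

An index like this answers a query without rescanning the listing, which is the trade-off against simply running grep over the listing file.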
`
UNIQUE IDENTIFIERS
`
Finally, another issue that will arise is that of uniquely identifying
files which may have different names and/or be in different directories
on different systems (and also of being sure that files with the same
name are identical - we don't even have the date preserved across
ftp transfers and can only rely on the file size).
`
Specifying a file in a news article in the form host.domain:path/filename
is fine as far as it goes. But any automatic extraction of that to
place a file request will go to the particular ftp site even though
closer sites may have the same item. Cache and mirror sites can
partially solve the problem, especially if extended more widely
through an enhancement of Mark Moraes' scripts, but there will still
be a strong temptation to just take the easy way out, and not bother
installing software that adds a delay consulting archie
or any other method to check where to obtain a file locally.
`
If comp.archives and WAIS etc provide a unique identifier
for each file which is independent of location, and there are convenient
ways to automatically insert that identifier into a news article when
referring to a file, then users would HAVE to lookup a directory before
ftping the file, and could then be automatically informed of the
nearest location. (This need be no burden on the user - they should
be able to request by the unique identifier and have the request
acted upon by the appropriate ftp archive in one operation while
reading news or mail - at dial-up sites just as well as on the
internet). Incidentally, funding this aspect of comp.archives and
archie etc could be justified by NSFnet and regional networks as
being concerned solely with bandwidth conservation rather than
information value adding.
`
A simple method of defining a unique identifier that does NOT include
a particular site identifier would be to use a
hash function on the entire contents of the file. This can be generated
locally without requiring a registration system and if long enough the
chances of collision are negligible. I would suggest using a cryptographic
hash function such as MD5 which generates a 16 byte result. The extra
work to use cryptographic hashing is only done once when assigning the
unique identifier and is therefore unimportant. But for any users
that DO wish to check validity, it provides a VERY secure means of ensuring
they have got an uncorrupted version of the specific file they were told
about, regardless of where they can get it from. (There
is currently no publicly known way to generate a file that would
produce the same 16 byte MD5 code as any given file).
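
As an illustration, the identifier suggested above could be computed along these lines in Python. The function name `file_identifier` and the chunk size are assumptions for the sketch; `hashlib.md5` is the standard library's MD5 implementation.

```python
import hashlib


def file_identifier(path, chunk_size=65536):
    """Return the 16-byte MD5 digest of the entire contents of a file."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()
```

Because only the contents are hashed, the same file yields the same identifier whatever its name, directory or host.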
`
Instead of providing accession lists with bibliographic information
in order to establish union catalogs, it should be quite simple for
ftp sites to notify the MD5 codes and local directory path/filename
of new files to central database servers. Use of MD5 could prevent
possible sabotage of a system based on easily duplicated CRCs as
well as providing a valuable service combating dissemination of viruses
and serving various other authentication functions.
`
Utilities for inserting the unique code into a news article or mail
message (along with a marker for automatic extraction) can simply
calculate the MD5 function from a local copy of the file. But to speed
things up it might be better to include the result in the local dl or
vls system where it can be accessed quickly (though not normally displayed
unless asked for). This could be combined with an enhancement to allow
find like tree searching of ALL file descriptions rather than just
those in a particular directory, and/or use a separate index by MD5
code. If a separate index is provided for a (hidden) MD5 subfield of
the descriptions, similar indexing could be made available on other
fields or on all words of the description at the same time.
`
A simple ftp implementation would just hardlink every file available for ftp
to a filename encoding of its MD5 token. Users would then ftp the
directory path and filename of the MD5 token and obtain the file. An
archie or similar lookup could first determine which nearby systems have the
file (though come to think of it, that database lookup may as well
also provide the local directory and filename for it). For dial-up
sites a mail-server request could be chained until it reached a site
with directory access, and the files requested added to temporary
caches on the way back.
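
The hardlink idea could look something like this in Python. It is a sketch under stated assumptions: the function name `link_by_identifier` and the token directory are hypothetical, and the token is written as 32 hexadecimal characters purely for readability, ignoring any limit on filename length.

```python
import hashlib
import os


def link_by_identifier(path, token_dir):
    """Hard-link `path` into `token_dir` under a name derived from its MD5."""
    with open(path, "rb") as f:
        token = hashlib.md5(f.read()).hexdigest()  # hex naming is an assumption
    os.makedirs(token_dir, exist_ok=True)
    target = os.path.join(token_dir, token)
    if not os.path.exists(target):
        # A hard link is a second name for the same file, so no data is copied.
        os.link(path, target)
    return target
```

Hard links only work within one file system, so the token directory would have to live on the same file system as the archive itself.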
`
For compatibility with ALL Posix systems, only 14 character file and
directory names are available. Using a simple 6 bit encoding with
only 64 characters allows a filename to represent 84 bits (14 x 6).
Directory names could be added to provide the remaining 44 bits or the
MD5 code could be truncated if increased collisions and less security
were acceptable.
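
One possible way to lay this out, sketched in Python. The choice of alphabet is an assumption: the URL-safe base64 alphabet (A-Z, a-z, 0-9, "-" and "_") is used because it has exactly 64 characters, all of them in the Posix portable filename character set. The function name `token_path` and the 8/14 split are likewise illustrative.

```python
import base64


def token_path(digest):
    """Split a 16-byte digest into an 8-character directory name and a
    14-character file name, using 6 bits per character."""
    if len(digest) != 16:
        raise ValueError("expected a 16-byte digest")
    # 16 bytes encode to 24 characters, the last two being '=' padding.
    text = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    # 128 bits need 22 characters: 14 (84 bits) for the file name and
    # 8 for the directory name (44 bits plus 4 unused padding bits).
    return text[:8], text[8:]
```

Both names stay within 14 characters, and the original digest can be recovered by decoding the two parts joined together. A name produced this way may begin with "-", which a real scheme might prefer to avoid.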
`
Any collisions could be placed in a public central list, and the files
affected assigned new unique identifiers (e.g. append the MD5 result to
the file and try again). It probably would not be worth it, but
individual sites could maintain copies of the public list so as to rename
any files for which collisions later occur (or provide both identifiers
where there is no local collision).
`
PACKAGES CONTAINING A DIRECTORY OR DIRECTORY TREE
`
A related problem is that essentially the same collection of information
may be available as different .tar.Z or ZOO or ZIP or shar files etc.
This happens especially with files distributed through sources newsgroups
and archived with different methods (or even with the same methods, but
including the local headers, which are different). It will also happen
where a local modification has been added to a package.
`
Ultimately these do have to be regarded as DIFFERENT files and any
connections between them listed separately. Nevertheless a user may
be wondering whether to ftp a package that has a new MD5 code to see
if it contains new revisions and it would be nice to be able to
tell the user without the need for collecting the entire package.
`
A simple convention should require that the code is always calculated
on the raw file rather than on the .Z version (or equivalent for any
other compression scheme). Also text files should be encoded from the
unix form (ASCII code with LF as line end and TABs not expanded).
`
Likewise the code for a tar or cpio or ZIP archive etc or a collection
of shar files (with or without uuencoding etc) could be the
code obtained by applying MD5 again to the concatenation of the codes
of the extracted files, in numeric order. (This deliberately
loses any date and mode or ownership information and also loses
the filename and directory structure information although there are
arguments for retaining the latter and it could be done easily enough
by preceding each MD5 code with the filepath relative to directory .
as the top of the package).
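
For a tar archive, the convention above might be sketched as follows in Python. The function name `package_code` is hypothetical, only regular files are counted, and "numeric order" is taken to mean ascending order of the 16-byte codes read as big-endian numbers, which is one reading of the post.

```python
import hashlib
import io
import tarfile


def package_code(tar_bytes):
    """MD5 of the concatenated MD5 codes of a tar archive's regular files,
    with the codes taken in ascending numeric order."""
    codes = []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as archive:
        for member in archive:
            if member.isfile():
                data = archive.extractfile(member).read()
                codes.append(hashlib.md5(data).digest())
    # Sorting equal-length digests bytewise is the same as sorting them as
    # big-endian numbers; names, dates, modes and owners play no part.
    return hashlib.md5(b"".join(sorted(codes))).digest()
```

Two archives holding the same file contents get the same code even when member names, ordering or timestamps differ, and changing any one file changes the code.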
`
That convention would help a lot, but does not solve the problem
concerning packages that ARE slightly different.
`
The best approach for the latter would be for Prospero and archie etc to
explode the contents of such files and list the individual
items within them. This can easily be done using the convention that
the .Z or .tar.Z or .ZIP file etc is treated as though it were a
directory, and the contents of that directory are then listed as files
(including
sub-directories and similar package files within a package file
treated recursively). I believe there are some ftp implementations
that already provide something like that, as individual files can
be requested or a tar file of a directory, and so that either
uncompressed or .Z versions can be requested.
`
It would be nice if dl or vls could support this too, with
descriptions (and automatic extraction of existing descriptions
where they are available from within ZIP and ZOO files etc).
Some thought should be given to conventions to include dl or
vls description file equivalents such as MANIFEST and FILES.BBS files
within packages that don't have their own facilities for archive
description (e.g. shar, tar and cpio - unlike ZIP and ZOO),
or to automatically make use of those already available.
`
There may also be techniques for showing similarity between
packages (e.g. that only 1 file is different or new), using some form
of multi-attribute hashing instead of the simple convention I
suggested above. This should be examined before adopting any
such convention.
`‘
`~
`
`.
`(x don't recall the details of multi~attribute hashing but it invmlves '
`composing an overall hash result out of bits chasen from the result of haahifig”
`each at
`the attributea at that
`itama with timilar attributea'will cluatex
`together. Care would be required ta allow identification of two paCkaggs
`which differed only in 1 E113 being modifiiad or adfied, while,prcaerving
`cryptographic security).
`'
`a...
`
`‘
`
`Opinicna dieclaimed (Authoritative answer {rem opinicn server)
`Header reply addreaa wrung. U82 cmeSlflcsc2¢anu,edu.au
`
`
`