Proteomics

Reader Eliot Morrison, a protein biochemist, has been looking for the longest English word found in the human proteome — the full set of proteins that can be expressed by the human body. Proteins are chains composed of amino acids, and the most common 20 are represented by the letters A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, and Y. “These amino acids have different chemical properties,” Eliot writes, “and the sequence influences how the whole chain folds in three dimensions, which in turn determines the structural and functional properties of the protein.”

The longest English word he’s found is TARGETEER, at nine letters, in the uncharacterized protein C12orf42. The whole sequence of C12orf42 is:

MSTVICMKQR EEEFLLTIRP FANRMQKSPC YIPIVSSATL WDRSTPSAKH IPCYERTSVP 
CSRFINHMKN FSESPKFRSL HFLNFPVFPE RTQNSMACKR LLHTCQYIVP RCSVSTVSFD 
EESYEEFRSS PAPSSETDEA PLIFTARGET EERARGAPKQ AWNSSFLEQL VKKPNWAHSV 
NPVHLEAQGI HISRHTRPKG QPLSSPKKNS GSAARPSTAI GLCRRSQTPG ALQSTGPSNT 
ELEPEERMAV PAGAQAHPDD IQSRLLGASG NPVGKGAVAM APEMLPKHPH TPRDRRPQAD 
TSLHGNLAGA PLPLLAGAST HFPSKRLIKV CSSAPPRPTR RFHTVCSQAL SRPVVNAHLH
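For anyone who wants to try a hunt of their own, here is a minimal sketch of how such a search might work. It is an assumption about the approach, not Eliot's actual method. The function names and the tiny word list are hypothetical stand-ins for a real dictionary, and the fragment is copied from the third row of the sequence above. Because the 20-letter alphabet lacks B, J, O, U, X, and Z, any word containing those letters can be ruled out before searching.

```python
# The 20 one-letter amino acid codes listed in the post.
AMINO_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")


def spellable(word):
    """Return True if the word uses only the 20 amino acid letters."""
    return set(word.upper()) <= AMINO_LETTERS


def longest_words(sequence, words, min_len=3):
    """Return the candidate words found in the sequence, longest first.

    Whitespace in the sequence is removed first, so a word may span the
    10-letter groups used in the printed layout.
    """
    seq = "".join(sequence.split()).upper()
    hits = [
        w.upper()
        for w in words
        if len(w) >= min_len and spellable(w) and w.upper() in seq
    ]
    # Sort by descending length, then alphabetically for stable output.
    return sorted(hits, key=lambda w: (-len(w), w))


# Two adjacent groups from the C12orf42 sequence above.
fragment = "PLIFTARGET EERARGAPKQ"

# Hypothetical mini word list; a real search would load a full dictionary.
sample_words = ["target", "targeteer", "tee", "era", "object", "gap"]

print(longest_words(fragment, sample_words))
# ['TARGETEER', 'TARGET', 'ERA', 'GAP', 'TEE']
```

Note that TARGETEER straddles two of the printed 10-letter groups, which is why the sketch strips the spaces before searching.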

And there are more: “There are also a number of eight-letter words found: ASPARKLE (UniProt code: Q86UW7), DATELESS (Q9ULP0-3), GALAGALA (Q86VD7), GRISETTE (Q969Y0), MISSPEAK (Q8WXH0), REELRALL (Q96FL8), RELASTER (Q8IVB5), REVERSAL (Q5TZA2), and SLAVERER (Q2TAC2).” I wonder if there’s a sentence in us somewhere.

(Thanks, Eliot.)