Faculty Fellows
2000-2001 Final Projects
Bioinformatics
Exercise I
Kathleen L. McGuire, Ph.D., EdCenter Faculty Fellow
Bio585
MB610
1.
Go to the Entrez homepage at www.ncbi.nlm.nih.gov/Entrez/
a)
Note the options open to you; explore the site if you wish to know what
is available at Entrez.
b)
Click on Nucleotide
c)
Pick a human gene of immunological significance and enter it as your search term.
If the number of hits is too large to evaluate:
1) Use the Boolean terms "and", "or" and "not" ("not" is particularly useful if you look
at the list and pick keywords that will allow you to eliminate records
you don't want);
2) Search for human sequences only (the genus and species, Homo sapiens,
must be used; the databases don't accept common names); and
3) Also use Limits
to eliminate patents, ESTs, STSs, etc. (Do not eliminate GSTs)
d)
Choose a record for a genomic gene. That the sequence is genomic
is often indicated in the name of the record by the word "gene" (in contrast
to labels like mRNA, partial sequence, or promoter). Make sure that
your record contains the "complete cds" (complete coding sequence).
Note the accession number for the record of the genomic sequence and look
over the record carefully, noting what information it contains.
Either using the same gene, or a different one if your original search
did not reveal one, find a record for a cDNA. This is usually indicated
by the term mRNA in the name of the record (in contrast to labels like
gene, partial sequence, or promoter). Make sure that your cDNA contains
a complete cds, note the accession number and look carefully at the information
provided in the record.
e)
Answer the following questions: What are the major differences in
the records for these two kinds of sequences? What are the similarities?
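
The search strategy in step 1c can also be written out as a single Entrez query string. The sketch below is only an illustration and is not part of the exercise: the field tags [Title] and [Organism] and the E-utilities esearch address are assumptions beyond what this handout uses (the handout works through the web form), and the gene name and excluded keywords are hypothetical examples. The code builds the query and the URL but does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed address of the NCBI E-utilities search service (not used in the handout).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(keyword, organism="Homo sapiens", exclude=()):
    """Combine a keyword, an organism restriction and NOT terms into one query."""
    query = f'"{keyword}"[Title] AND "{organism}"[Organism]'
    for word in exclude:
        # Each NOT term removes records whose title contains that word.
        query += f" NOT {word}[Title]"
    return query


def esearch_url(query, db="nuccore", retmax=20):
    """Return a search URL for the nucleotide database (nothing is fetched here)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": query, "retmax": retmax})


if __name__ == "__main__":
    # Hypothetical example: interleukin 2, excluding receptor and partial records.
    q = build_query("interleukin 2", exclude=["receptor", "partial"])
    print(q)
    print(esearch_url(q))
```

Pasting the printed query into the Entrez search box should behave like steps 1) and 2) combined; the Limits in step 3) still have to be set on the web page.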

2.
Enter the Biology Workbench at http://workbench.sdsc.edu/
a)
If you are new, set up a free account. Follow the instructions. WRITE
DOWN YOUR USER NAME AND PASSWORD!
b) Click on Session
Tools, highlight Start new session and click Run
c) Name your session and click Start
session (to resume session, mark the old session and highlight Resume
session and click Run)
HINTS FOR BWB:
ALWAYS USE RETURN TO GO BACK. DO NOT USE THE "BACK" BUTTON.
MAKE SURE THE SESSION YOU WANT TO USE IS LISTED AT THE TOP OF THE PAGE
BEFORE YOU START TO WORK.

3. Click on Nucleic Tools
a)
Repeat the search that you did in Entrez (remember to indicate that you want
the Homo sapiens gene and cDNA). To do this, highlight Ndjinn - Multiple database search and click Run
b) Choose a database (can choose more than one) -
For this exercise choose Genbank Primate Sequences.
To do this check the box next to this database.
c) Select Exact Match
and Show All Hits
d) Click Search (If too many hits are obtained to evaluate,
you can re-search with "and, or & not" terms).
e) To verify that you have the sequences you desire,
highlight sequence of interest, click on Show Record, and look at
the record (make sure that you have human sequences and that they contain
a complete cds). Are the records you obtained in BWB identical
to those obtained in Entrez? Did you get the same hits with this
search? Which method do you prefer for searching the database?
HINT: If your search does not yield the same records that you found
in Entrez, you can re-search in BWB using the accession numbers you recorded
above. This may or may not work to bring up the same records you
obtained in Entrez.
f) You want to download the genomic sequence and the cDNA sequence
into your BWB session. To do this, highlight sequences of interest,
click on Import Sequences.
The sequences you have highlighted will now be in the nucleic acid portion
of your session.

4. In Protein Tools, select Ndjinn
- Multiple database search
a)
Search GBPRI
b) Enter search terms used above for the cDNA sequence only and
hit Search; remember to use and, or & not terms to limit the results obtained.
c) Import Sequence
of interest (be sure to choose the human gene) into the Protein Tools section of your
session. It will go into the Protein Tools section automatically
because you are already using Protein Tools.
d) Select a sequence, highlight View Protein Sequence and
click Run
e) Note the first 6 amino acids of your protein.
What are they?

5. Return to Nucleic Tools, select the
cDNA sequence
a)
Highlight Sixframe - Generate & import 6 frame
translations on a NS, click Run
b) Select Show sequence being translated and 3 forward frames, then Submit
c) Import Sequences
(the translation will be deposited in the Protein
Tools section of your session)
d) Which reading frame encodes the protein? How can you tell?
e) Identify the Start Codon/Methionine in your cDNA
and the translated protein sequence.
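
To see what the three forward frames in step 5 represent, the translation can be reproduced by hand. This is a minimal sketch using the standard genetic code; it is not the Sixframe program itself, and the short test sequence is made up for illustration. The frame that encodes the protein is usually the one with a long run from a methionine (M) to a stop (*) with no stops in between.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame):
    """Translate forward frame 1, 2 or 3; '*' is a stop, 'X' an unknown codon."""
    seq = seq.upper().replace("U", "T")
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_orf(protein):
    """Longest stretch in one translated frame that runs from an M to a stop."""
    best = ""
    for segment in protein.split("*")[:-1]:  # keep only segments that end in a stop
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


if __name__ == "__main__":
    cdna = "GGATGGCCTAAGC"  # made-up sequence, not a real gene
    for frame in (1, 2, 3):
        protein = translate(cdna, frame)
        print(frame, protein, "ORF:", longest_orf(protein) or "none")
```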

6. Return to Nucleic Tools, select cDNA
sequence
a)
Highlight TACG - Analyze a NS for restriction enzyme
sites, click Run
b) Use all default parameters, with two exceptions: select 3
Forward Frames under "Output Parameters - Display Translation"
and enter 0 for "Smallest fragment cutoff size"
c) Print the whole file; Identify start codon and
stop codons and the open reading frame
d) Compare this output to that seen above in #5. Which
is easier to use when you are working with a nucleic acid sequence?
Which type of information would be more useful in the lab if you wanted
to manipulate the sequence of the protein?
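
The kind of search behind a restriction map like the one in step 6 can be sketched as a plain scan for recognition sequences. This is not the TACG program, only an illustration with three common enzymes; the test sequence is made up. The positions reported are where each recognition site starts (counting from 1), not where the enzyme cuts. All three sites are palindromic, so scanning one strand is enough.

```python
# Recognition sequences of three widely used restriction enzymes.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [1-based start positions of its recognition site]}."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        i = seq.find(site)
        while i != -1:
            positions.append(i + 1)       # convert to 1-based numbering
            i = seq.find(site, i + 1)     # step by one so overlapping sites are found
        hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    cdna = "AAGAATTCGGGGATCCTT"  # made-up sequence, not a real gene
    for enzyme, positions in find_sites(cdna).items():
        print(enzyme, positions if positions else "does not cut")
```

An enzyme that does not cut inside your cDNA is a candidate for cloning, which is the information the graduate assignment at the end of this exercise relies on.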

7. In Nucleic Tools, select your
cDNA sequence
a)
Highlight BLASTN - compare a NS to a NS DB
and click Run
b) Highlight Genbank Other Mammalian Species database;
otherwise use default parameters, click Submit
c) How many "hits" did you obtain? Look at what
species are represented there. Which species is the most closely
related to the human gene? The most distant? (Be careful--make
sure that you are looking at the same gene. Sometimes the most distantly
related product in a search is a different gene.)

8. In Protein Tools, select your
protein sequence
a)
Highlight BLASTP - compare a PS to a PS DB
and click Run
b) Select Swissprot database; otherwise use default parameters,
click Submit
c) How many "hits" did you obtain? What is the
format of the information obtained? Look at what species are represented
there. Which species is the most closely related to the human gene?
The most distant?
d) Return to Protein Tools
e) Highlight BLASTP - compare a PS to a PS DB and click Run
f) Mark Use the SDSC Non-redundant Database; click Submit
g) Did you get more or fewer "hits" than with Swissprot?
How do the "hits" compare? What is the format of the information
obtained?
h) Highlight BLASTP - compare a PS to a PS DB and click Run
i) Highlight Genbank Other Mammalian Species database;
otherwise use default parameters, click Submit
j) How many "hits" did you get? Are there fewer
or more? Look at what species are represented there. Which
species is the most closely related to the human gene? The most
distant?
k) Select at least 15 "hits" from different species
(make sure you are picking records for the SAME GENE from other species) and click Import Sequences. These will
go into the Protein Tools section of your session. If
necessary, do a BLASTP search of the rodent or primate databases to pick
up 15 different species. (HINT: you can identify the species, and whether the
record has a complete protein sequence, with View Database Records of Imported Sequences. If you can't identify the SAME
GENE in at least 10 species, pick a different gene.)

9. In Protein Tools, select the
records for the protein sequences of human and other species.
a)
Highlight CLUSTALW and hit Run; Use default parameters
and Submit
b) Which two species are the closest relatives?
Which are the farthest apart? What data gives you this information?
c) Import Alignment;
it can be found in the Alignment Tools of your session.
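
One way to put a number on "closest relatives" in step 9b is the percent identity between each pair of aligned sequences. The sketch below assumes the sequences have already been aligned (equal length, with "-" for gaps), as in a ClustalW alignment; the three short sequences and their labels are made up for illustration, and ClustalW's own scores are calculated differently.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of identical residues, ignoring columns where either has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def closest_pair(alignment):
    """Return the pair of labels with the highest percent identity."""
    return max(
        combinations(alignment, 2),
        key=lambda pair: percent_identity(alignment[pair[0]], alignment[pair[1]]),
    )


if __name__ == "__main__":
    # Made-up aligned fragments, not real protein sequences.
    alignment = {"human": "MKT-LLV", "chimp": "MKT-LLV", "mouse": "MRTALIV"}
    for a, b in combinations(alignment, 2):
        print(a, b, round(percent_identity(alignment[a], alignment[b]), 1))
    print("closest:", closest_pair(alignment))
```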

10. In Alignment Tools, select the
ClustalW alignment now in your session
a)
Highlight Edit Aligned Sequences and
click Run
b) Under Label: rename sequences to common names, click Save
c) Highlight BOXSHADE - Color coded plots of prealigned sequences
and click Run
d) Use default parameters; click Submit
e) What does this tell you? What do the different
colors mean?

11. In Alignment Tools, select the
ClustalW alignment now in your session
a)
Highlight DRAWTREE - Draw unrooted phylogenetic tree
from alignment and click Run
b) Use default parameters; click Submit
c) What does the unrooted tree tell you? Can
you tell which species are most closely related? Which looks the
furthest from humans?
d) Highlight DRAWGRAM - Draw rooted phylogenetic tree from alignment
and click Run
e) Use default parameters; click Submit
f) What does the rooted tree tell you? Can you
tell which species are most closely related? Which is the furthest
from humans?
(HINT: I prefer to use the short labels myself rather than
the accession numbers)
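
To get a feel for how a rooted tree groups species, here is a small sketch that builds one from pairwise distances. It uses UPGMA, a simple clustering method chosen only for illustration; it is not what DRAWTREE or DRAWGRAM do (they draw a tree that has already been calculated). The species labels and distances are made up; a distance could be taken as 1 minus the fraction of identical residues between two aligned sequences.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster labels by average distance; return the tree in Newick format.

    names: list of labels.
    dist:  {(label_a, label_b): distance} with one entry per unordered pair.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    sizes = {name: 1 for name in names}  # cluster label -> number of species in it
    while len(sizes) > 1:
        # Join the two clusters that are currently closest together.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        for other in sizes:
            if other in (a, b):
                continue
            # Distance to the new cluster is the size-weighted average.
            d[frozenset((merged, other))] = (
                d[frozenset((a, other))] * sizes[a]
                + d[frozenset((b, other))] * sizes[b]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


if __name__ == "__main__":
    # Made-up distances for illustration only.
    distances = {
        ("human", "chimp"): 0.01,
        ("human", "mouse"): 0.15,
        ("chimp", "mouse"): 0.16,
    }
    print(upgma(["human", "chimp", "mouse"], distances))
```

Species joined first (innermost parentheses) are the most closely related under these distances; the species joined last is the furthest from the rest.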

12. Return to the BWB homepage; click on Protein
Explorer
a)
Click on Find your Molecules PDB ID code
b) Click on PDB
Lite
c) Click on USA database
d) Enter your search term and hit return
e) Click on Retrieve
released data matching your query
f) Note PDB ID number of your molecule; select the
sequence you want and click on View/Analyze/Save Macromolecule
g) Click on Protein Explorer; then Start
Protein Explorer
h) Look at the structure of your molecule and investigate what is
available in Protein Explorer
FOR GRADUATE STUDENTS ONLY:
Using the restriction enzyme map generated in #6 above, design two cloning
strategies. One clone is to be a mammalian expression vector using the
pCIneo vector from Promega (http://www.promega.com/)
and the second will be a bacterial expression vector for a His-tagged
fusion protein using the pQE 30-32 vectors from Qiagen (http://www.qiagen.com/).
REMEMBER:
If you do not have convenient existing restriction enzyme sites, there
are three major ways to introduce them into your DNA sequence. One is
by PCR, the second by cloning into an intermediate vector, and the third
by the use of double-stranded oligonucleotide linkers. (There are other
ways as well, if you prefer to use them.) I can answer questions for you
if you are having trouble with this concept.
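
The first of these approaches, introducing sites by PCR, can be sketched as primer design: each primer carries a few extra bases, a restriction site, and then the sequence that anneals to the cDNA. Everything specific in the sketch below is an assumption for illustration: the choice of BamHI and HindIII, the "GCGC" extra bases, the annealing length, and the made-up coding sequence. Check the vendor's vector map for the sites actually available, and note that the sketch does not check the reading frame with a His tag or any other vector feature.

```python
# Recognition sequences; all three are palindromic.
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT", "EcoRI": "GAATTC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(cds, fwd_enzyme, rev_enzyme, anneal=18, clamp="GCGC"):
    """Return (forward, reverse) primers, written 5' to 3', that add the two sites.

    Raises ValueError if either enzyme also cuts inside the coding sequence,
    because digesting the PCR product would then destroy the insert.
    """
    cds = cds.upper()
    for enzyme in (fwd_enzyme, rev_enzyme):
        if SITES[enzyme] in cds:
            raise ValueError(f"{enzyme} cuts inside the coding sequence")
    forward = clamp + SITES[fwd_enzyme] + cds[:anneal]
    reverse = clamp + SITES[rev_enzyme] + reverse_complement(cds[-anneal:])
    return forward, reverse


if __name__ == "__main__":
    cds = "ATGGCTAGCAAACTGTAA"  # made-up coding sequence, not a real gene
    fwd, rev = design_primers(cds, "BamHI", "HindIII", anneal=6)
    print("forward:", fwd)
    print("reverse:", rev)
```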