http://www.edcenter.sdsu.edu/faculty-fellows/spring2001/kathym/

Faculty Fellows 2000-2001 Final Projects

Bioinformatics Exercise I

Kathleen L. McGuire, Ph. D., EdCenter Faculty Fellow
 

Bio585           MB610

1.  Go to the Entrez homepage at www.ncbi.nlm.nih.gov/Entrez/

a)  Note the options open to you; explore the site if you wish to know what is available at Entrez.

b)  Click on Nucleotide

c)  Pick a human gene of immunological significance and enter search term; if the number of hits is too large to evaluate:

1)  Use the Boolean terms "and", "or" and "not" ("not" is particularly useful: look at the list of hits and pick keywords that will eliminate the records you don't want);
2)  Search for human sequence only (genus and species must be used; the databases don't accept common names); and

3)  Also use Limits to eliminate patents, ESTs, STSs, etc.  (Do not eliminate GSTs)

d)  Choose a record for a genomic gene.  That the sequence is genomic is often indicated by the word "gene" in the name of the record (in contrast to labels like mRNA, partial sequence, or promoter).  Make sure that your record contains the "complete cds" (complete coding sequence).  Note the accession number for the genomic record and look over the record carefully, noting what information it contains.

Next, find a record for a cDNA, using the same gene if possible or a different gene if your original search did not turn up a cDNA.  A cDNA is usually indicated by the term mRNA in the name of the record (in contrast to labels like gene, partial sequence, or promoter).  Make sure that your cDNA contains a complete cds, note the accession number, and look carefully at the information provided in the record.

e)  Answer the following questions:  What are the major differences in the records for these two kinds of sequences?  What are the similarities?
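The Boolean narrowing described in step c) can also be written out as a single query string. The sketch below is only an illustration and is not part of the exercise: it assumes the Entrez field tags [Title] and [Organism] and the NCBI E-utilities esearch address, neither of which appears in the instructions above, and the gene name and excluded keywords are arbitrary examples. It builds the query and the URL but does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumption: the NCBI E-utilities search endpoint (not mentioned in the exercise).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(gene_terms, organism="Homo sapiens", exclude=()):
    """Combine gene terms with OR, require the organism with AND,
    and drop unwanted keywords with NOT."""
    gene_clause = " OR ".join(f'"{term}"[Title]' for term in gene_terms)
    query = f'({gene_clause}) AND "{organism}"[Organism]'
    for word in exclude:
        query += f" NOT {word}[Title]"
    return query


def esearch_url(query, db="nuccore", retmax=20):
    """Return the search URL for the query without sending it."""
    return ESEARCH + "?" + urlencode({"db": db, "term": query, "retmax": retmax})


# Hypothetical example: interleukin 2, excluding receptor and patent records.
query = build_query(["interleukin 2"], exclude=["receptor", "patent"])
print(query)
print(esearch_url(query))
```

Note that the organism is given as genus and species, as required in step c) 2), and that each "not" term removes one class of unwanted records.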
 


2.  Enter the Biology Workbench at http://workbench.sdsc.edu/

a)  If you are new, set up a free account. Follow the instructions. WRITE DOWN YOUR USER NAME AND PASSWORD!
b)  Click on Session Tools, highlight Start new session and click Run

c)  Name your session and click Start session (to resume session, mark the old session and highlight Resume session and click Run)

HINTS FOR BWB: ALWAYS USE RETURN TO GO BACK--DO NOT USE "BACK" BUTTON
MAKE SURE THE SESSION YOU WANT TO USE IS LISTED AT THE TOP OF THE PAGE BEFORE YOU START TO WORK


3.  Click on Nucleic Tools

a)  Repeat the search that you did in Entrez (remember to indicate that you want the Homo sapiens gene and cDNA).  To do this, highlight Ndjinn - Multiple database search and click Run
b)  Choose a database (can choose more than one) - For this exercise choose Genbank Primate Sequences.  To do this check the box next to this database.

c)  Select Exact Match and Show All Hits

d)  Click Search  (If too many hits are obtained to evaluate, you can search again using the terms "and", "or" and "not").

e)  To verify that you have the sequences you want, highlight the sequence of interest, click on Show Record, and look at the record (make sure that you have human sequences and that they contain a complete cds).   Are the records you obtained in BWB identical to those obtained in Entrez?  Did you get the same hits with this search?  Which method do you prefer for searching the database?  HINT:  If your search does not yield the same records that you found in Entrez, you can search again in BWB using the accession numbers you recorded above.  This may or may not bring up the same records you obtained in Entrez.

f)  You want to download the genomic sequence and the cDNA sequence into your BWB session.  To do this, highlight sequences of interest, click on Import Sequences.  The sequences you have highlighted will now be in the nucleic acid portion of your session.


4.  In Protein Tools, select Ndjinn - Multiple database search

a)  Search GBPRI
b)  Enter the search terms used above for the cDNA sequence only and hit Search; remember to use the terms "and", "or" and "not" to limit the results obtained.

c) Import Sequence of interest (be sure to choose the human gene) into the Protein Tools section of your session.  It will go into the Protein Tools section automatically because you are already using Protein Tools.

d)  Select a sequence, highlight View Protein Sequence and click Run

e)  Note the first 6 amino acids of your protein.  What are they?


5.  Return to Nucleic Tools, select the cDNA sequence

a)  Highlight Sixframe - Generate & import 6 frame translations on a NS, click Run
b)  Select Show sequence being translated and 3 forward frames, then Submit

c) Import Sequences (the translation will be deposited in the Protein Tools section of your session)

d)  Which reading frame encodes the protein? How can you tell?

e)  Identify the Start Codon/Methionine in your cDNA and the translated protein sequence.
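To see what the three forward frames in this step represent, the following sketch translates a short sequence in each frame and looks for a stretch that begins with methionine and ends at a stop codon. It is a minimal illustration using the standard genetic code, not the Sixframe program itself; the sequence is a made-up toy example and the function names are hypothetical.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate one forward frame (0, 1 or 2); '*' marks a stop codon."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_orf(protein):
    """Longest stretch that starts with M and is followed by a stop."""
    best = ""
    # The last piece after splitting has no stop codon after it, so skip it.
    for segment in protein.split("*")[:-1]:
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


# Toy sequence (not a real cDNA): 2 leading bases, a short ORF, 2 trailing bases.
cdna = "GCATGGCTAAAGGATCCTGGTAACT"
for frame in range(3):
    protein = translate(cdna, frame)
    print(frame + 1, protein, longest_orf(protein) or "(no complete ORF)")
```

In a real cDNA, the frame that encodes the protein is normally the one with a long uninterrupted stretch from a methionine to a stop codon, and it should match the protein sequence you viewed in #4.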


6.  Return to Nucleic Tools, select cDNA sequence

a)  Highlight TACG - Analyze a NS for restriction enzyme sites, click Run
b)  Use all default parameters except use 3 Forward Frames under "Output Parameters - Display Translation" and 0 for "Smallest fragment cutoff size"

c)  Print the whole file; Identify start codon and stop codons and the open reading frame

d)  Compare this output to that seen above in #5. Which is easier to use when you are working with a nucleic acid sequence?  Which type of information would be more useful in the lab if you wanted to manipulate the sequence of the protein?
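At its core, a restriction analysis like the one in this step is a search for short recognition sequences. The sketch below is a simplified illustration, not the TACG program: it reports where each recognition sequence begins (not the exact cut position) for three common enzymes chosen here as examples, on a made-up toy sequence.

```python
# Recognition sequences for three widely used enzymes. All three are
# palindromic, so searching one strand is enough.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(seq, site):
    """Return the 1-based positions where the recognition sequence begins."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(site, start + 1)  # step by one to allow overlaps
    return positions


def restriction_map(seq, enzymes=SITES):
    """Map each enzyme name to the list of its site positions in seq."""
    return {name: find_sites(seq, site) for name, site in enzymes.items()}


# Toy sequence (not a real cDNA).
toy = "GCATGGCTAAAGGATCCTGGTAACTGAATTCAA"
for name, positions in restriction_map(toy).items():
    print(name, positions if positions else "does not cut")
```

Enzymes that do not cut within your coding sequence are the ones to keep in mind if you later want to move the whole sequence into a vector.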


7.  In Nucleic Tools, select your cDNA sequence

a)  Highlight BLASTN - compare a NS to a NS DB and click Run
b)  Highlight Genbank Other Mammalian Species database; otherwise use default parameters, click Submit

c)  How many "hits" did you obtain?  Look at what species are represented there.  Which species is the most closely related to the human gene?  The most distant?  (Be careful--make sure that you are looking at the same gene.  Sometimes the most distantly related product in a search is a different gene.)


8.  In Protein Tools, select your protein sequence

a)  Highlight BLASTP - compare a PS to a PS DB and click Run
b)  Select Swissprot database; otherwise use default parameters, click Submit

c)  How many "hits" did you obtain?  What is the format of the information obtained?  Look at what species are represented there.  Which species is the most closely related to the human gene?  The most distant?

d)  Return to Protein Tools

e)  Highlight BLASTP - compare a PS to a PS DB and click Run

f)  Mark Use the SDSC Non-redundant Database; click Submit

g)  Did you get more or fewer "hits" than with Swissprot?  How do the "hits" compare?  What is the format of the information obtained?

h)  Highlight BLASTP - compare a PS to a PS DB and click Run

i)  Highlight Genbank Other Mammalian Species database; otherwise use default parameters, click Submit

j)  How many "hits" did you get?  Are there fewer or more?  Look at what species are represented there.  Which species is the most closely related to the human gene?  The most distant?

k)  Select at least 15 "hits" from different species (make sure you are picking records for the SAME GENE from other species); Import Sequences; these will go into the Protein Tools section of your session. If necessary, do a BLASTP search of the rodent or primate databases to pick up 15 different species. (HINT: you can identify the species, and whether the record has a complete protein sequence, with View Database Records of Imported Sequences. If you can't identify the SAME GENE in at least 10 species, pick a different gene.)


9.  In Protein Tools, select the records for the protein sequences of human and other species

a)  Highlight CLUSTALW and hit Run; Use default parameters and Submit
b)  Which two species are the closest relatives?  Which are the farthest apart?  What data gives you this information?

c) Import Alignment; it can be found in the Alignment Tools of your session.
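One simple way to compare species from an alignment is pairwise percent identity. The sketch below is an illustration under stated assumptions: the three aligned sequences are invented, and the measure (identical residues divided by the columns where neither sequence has a gap) is a common convention that may differ from the scores CLUSTALW itself reports.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of identical residues over columns with no gap in either row."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Invented aligned sequences of equal length ('-' marks a gap).
alignment = {
    "human":   "MKT-AYIAKQR",
    "mouse":   "MKTLAYVAKQR",
    "chicken": "MRT-SYLSKHR",
}

scores = {
    (first, second): percent_identity(alignment[first], alignment[second])
    for first, second in combinations(alignment, 2)
}
# List the pairs from most similar to least similar.
for (first, second), score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{first:8s} {second:8s} {score:5.1f}%")
```

The pair with the highest identity is the closest by this measure; compare that with what the CLUSTALW output tells you.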


10.  In Alignment Tools, select the ClustalW alignment now in your session

a)  Highlight Edit Aligned Sequences and click Run
b)  Under Label: rename sequences to common names, click Save

c)  Highlight BOXSHADE - Color coded plots of prealigned sequences and click Run

d)  Use default parameters; click Submit

e)  What does this tell you?  What do the different colors mean?


11.  In Alignment Tools, select the ClustalW alignment now in your session

a)  Highlight DRAWTREE - Draw unrooted phylogenetic tree from alignment and click Run
b)  Use default parameters; click Submit

c)  What does the unrooted tree tell you?  Can you tell which species are most closely related?  Which looks the furthest from humans?

d)  Highlight DRAWGRAM - Draw rooted phylogenetic tree from alignment and click Run

e)  Use default parameters; click Submit

f)  What does the rooted tree tell you?  Can you tell which species are most closely related?  Which is the furthest from humans?

(HINT: I prefer to use the short labels myself rather than the accession numbers)
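To get a feel for how a rooted tree groups species, the sketch below clusters a small table of pairwise distances with UPGMA and prints the grouping in Newick format. UPGMA is used here only because it is simple to follow; it is not necessarily the method behind the Workbench tree tools, and the species and distances are invented for illustration.

```python
def upgma(names, distances):
    """Cluster names by UPGMA and return the grouping as a Newick string.

    distances maps frozenset({a, b}) to the distance between a and b.
    Branch lengths are left out to keep the sketch short.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of members
    d = dict(distances)
    while len(sizes) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        del d[pair]
        for other in sizes:
            dist_a = d.pop(frozenset((a, other)))
            dist_b = d.pop(frozenset((b, other)))
            # Average weighted by how many sequences each cluster holds.
            d[frozenset((merged, other))] = (
                dist_a * size_a + dist_b * size_b
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


# Invented distances (fraction of differing residues).
species = ["human", "mouse", "chicken", "zebrafish"]
dist = {
    frozenset(("human", "mouse")): 0.10,
    frozenset(("human", "chicken")): 0.50,
    frozenset(("mouse", "chicken")): 0.52,
    frozenset(("human", "zebrafish")): 0.70,
    frozenset(("mouse", "zebrafish")): 0.72,
    frozenset(("chicken", "zebrafish")): 0.68,
}
tree = upgma(species, dist)
print(tree)
```

Species that are joined first (the innermost parentheses) are the most similar; the species joined last is the most distant from the rest.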


12.  Return to the BWB homepage; click on Protein Explorer

a)  Click on Find your Molecules PDB ID code
b)  Click on PDB Lite

c)  Click on USA database

d)  Enter your search term and hit return

e)  Click on Retrieve released data matching your query

f)  Note PDB ID number of your molecule; select the sequence you want and click on View/Analyze/Save Macromolecule

g)  Click on Protein Explorer; then Start Protein Explorer

h)  Look at the structure of your molecule and investigate what is available in Protein Explorer

 

FOR GRADUATE STUDENTS ONLY:

Using the restriction enzyme map generated in #6 above, design two cloning strategies. One clone is to be a mammalian expression vector using the pCIneo vector from Promega (http://www.promega.com/) and the second will be a bacterial expression vector for a His-tagged fusion protein using the pQE 30-32 vectors from Qiagen (http://www.qiagen.com/).

REMEMBER: If you do not have convenient existing restriction enzyme sites, there are three major ways to introduce them into your DNA sequence: by PCR, by cloning into an intermediate vector, or by using double-stranded oligonucleotide linkers. (There are other ways as well, if you prefer to use them.) I can answer questions for you if you are having trouble with this concept.
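As an illustration of the PCR option, the sketch below lists enzymes that do not cut a coding sequence and builds a primer pair that adds a chosen site to each end. Everything specific in it is an assumption made for the example: the toy insert, the enzyme list, the two-base "GC" clamp, and the short annealing length. Whether a given site is actually present in the pCIneo or pQE polylinkers, and whether the insert stays in frame with the His tag, must be checked against the vendors' vector maps.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Example enzymes with palindromic recognition sequences.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
}


def reverse_complement(seq):
    """Return the reverse complement, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def non_cutters(insert, sites=SITES):
    """Enzymes whose recognition sequence is absent from the insert."""
    return [name for name, site in sites.items() if site not in insert.upper()]


def add_site_primers(insert, fwd_enzyme, rev_enzyme, anneal=18, clamp="GC"):
    """Build forward and reverse primers (5' to 3') carrying added sites.

    Each primer is: clamp + recognition site + annealing region.
    """
    insert = insert.upper()
    forward = clamp + SITES[fwd_enzyme] + insert[:anneal]
    reverse = clamp + SITES[rev_enzyme] + reverse_complement(insert[-anneal:])
    return forward, reverse


# Toy coding sequence (not a real gene); it contains a BamHI site.
insert = "ATGGCTAAAGGATCCTGGCAGTTCGACCTGAAATAA"
usable = non_cutters(insert)
print("Enzymes that do not cut the insert:", usable)

# A short annealing region is used only because the toy insert is short.
forward, reverse = add_site_primers(insert, "EcoRI", "XhoI", anneal=12)
print("Forward primer:", forward)
print("Reverse primer:", reverse)
```

Choosing two different non-cutting enzymes, as here, lets the insert go into the vector in only one orientation.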

 
 
