Friday, October 5, 2018

Python Virtual Environment

A virtual environment is a named, isolated, working copy of Python that maintains its own files, directories, and paths so that you can work with specific versions of libraries or Python itself without affecting other Python projects. Virtual environments make it easy to cleanly separate different projects and avoid problems with different dependencies and version requirements across components.

Activating a virtual environment (whether a virtualenv or a conda environment) modifies PATH and other shell variables to point to the isolated Python setup you created.
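The effect of activation can be observed from Python by reading those shell variables. This is a minimal sketch that assumes the variables set by the standard activation scripts (VIRTUAL_ENV for virtualenv, CONDA_DEFAULT_ENV for conda) and that activation prepends the environment's bin directory to PATH. The function name describe_active_env is hypothetical.

```python
import os
from typing import Mapping


def describe_active_env(environ: Mapping[str, str] = os.environ) -> dict:
    """Report which environment, if any, the shell variables point to."""
    path_entries = environ.get("PATH", "").split(os.pathsep)
    return {
        # Set by virtualenv's activate script.
        "virtualenv": environ.get("VIRTUAL_ENV"),
        # Set by conda's activation (source activate / conda activate).
        "conda": environ.get("CONDA_DEFAULT_ENV"),
        # Activation prepends the environment's bin directory, so it is searched first.
        "first_path_entry": path_entries[0] or None,
    }


if __name__ == "__main__":
    print(describe_active_env())
```

Run it before and after activating an environment. The first PATH entry should switch to the environment's bin directory.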


1. virtualenv

virtualenv is a popular tool for creating isolated Python environments. It works by installing a set of files in a directory (e.g. env/). Its activate script then modifies the PATH environment variable, prefixing it with a custom bin directory (e.g. env/bin/). A copy of, or symlink to, the python or python3 binary is placed in this directory. Because Python looks for its libraries relative to its own location first, running that interpreter picks up the packages installed in the environment directory rather than those of the system installation.
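The mechanism described above can be seen by building an environment from Python. virtualenv is a third-party package, so this sketch swaps in the standard library's venv module, which creates a very similar layout: an interpreter in bin/ (Scripts\ on Windows) plus a pyvenv.cfg file that marks the directory as an environment. The helper name create_env is hypothetical.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create an isolated environment with the standard-library venv module and return its interpreter path."""
    # with_pip=False keeps the sketch fast and offline; pass True to bootstrap pip as well.
    venv.create(env_dir, with_pip=False)
    bin_name = "Scripts" if sys.platform == "win32" else "bin"
    exe_name = "python.exe" if sys.platform == "win32" else "python"
    return env_dir / bin_name / exe_name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "yourenvname"
        python = create_env(env_dir)
        print("interpreter:", python, "exists:", python.exists())
        # At startup Python looks for pyvenv.cfg near its executable to detect an environment.
        print((env_dir / "pyvenv.cfg").read_text())
```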

Once the environment is activated, you can install packages into it using pip.


Install virtualenv:
  $ pip install virtualenv

Create a virtual environment:
  $ virtualenv yourenvname

Activate the virtual environment:
  $ source yourenvname/bin/activate

To leave the virtual environment:
  $ deactivate
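After activation, one way to confirm that the environment's interpreter is the one running is to compare sys.prefix with sys.base_prefix. This is a minimal sketch assuming virtualenv 20 or later, or the standard-library venv. A fallback checks sys.real_prefix, which older virtualenv releases set. It does not detect conda environments, which are full installations where the two prefixes are equal. The function name in_virtual_env is hypothetical.

```python
import sys


def in_virtual_env() -> bool:
    """Return True if this interpreter runs inside a venv/virtualenv environment."""
    # Old virtualenv releases (before 20) recorded the base installation in sys.real_prefix.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    # venv and modern virtualenv: sys.prefix is the environment directory,
    # while sys.base_prefix is the installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtual_env())
    print("sys.prefix:", sys.prefix)
```

Run it with python both inside and outside the activated environment. Only the former should report True.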

To install all the required packages listed in requirements.txt:
  $ pip install -r requirements.txt

To record all the installed packages in a requirements.txt file:
  $ pip freeze > requirements.txt
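pip freeze remains the reliable way to produce requirements.txt. To inspect similar information from inside Python, the standard library's importlib.metadata (Python 3.8+) can list installed distributions. This sketch only approximates pip freeze: it does not skip the packaging tools (such as pip itself) that pip freeze omits by default, and it does not handle editable or VCS installs the way pip does. The name frozen_requirements is hypothetical.

```python
from importlib.metadata import distributions
from typing import List


def frozen_requirements() -> List[str]:
    """Return sorted 'name==version' lines for the installed distributions."""
    pins = set()  # a set removes duplicates seen on more than one sys.path entry
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name and dist.version:  # skip entries with incomplete metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(frozen_requirements()))
```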

To delete a virtual environment, remove its directory:
  $ rm -r /path/to/yourenvname


2. conda
The conda command is the preferred interface for managing installations and virtual environments with the Anaconda Python distribution.

To see a list of available Python versions:
 $ conda search "^python$"

Create a virtual environment with Python version x.x for your project (the trailing anaconda installs the full Anaconda package set; omit it for a minimal environment):
 $ conda create -n yourenvname python=x.x anaconda

To activate or switch into your virtual environment (on conda 4.4 and later, use conda activate yourenvname instead):
 $ source activate yourenvname

To see a list of all your environments:
 $ conda info -e


To install additional packages into your virtual environment only:
 $ conda install -n yourenvname [package]

If you omit “-n yourenvname”, conda installs the package into the currently active environment, which is the root (base) installation when no environment is active.

To end a session in the current environment (on conda 4.4 and later, use conda deactivate instead):
 $ source deactivate

Wednesday, January 24, 2018

RMSE vs. R-squared


A well-fitting regression model produces predicted values close to the observed data values. The mean model, which uses the mean for every predicted value, would generally be used if there were no informative predictor variables. The fit of a proposed regression model should therefore be better than the fit of the mean model.
Three statistics are used in Ordinary Least Squares (OLS) regression to evaluate model fit: R-squared, the overall F-test, and the Root Mean Square Error (RMSE). All three are based on two sums of squares: Sum of Squares Total (SST) and Sum of Squares Error (SSE). SST measures how far the data are from the mean and SSE measures how far the data are from the model’s predicted values. Different combinations of these two values provide different information about how the regression model compares to the mean model.
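A small worked example makes SST and SSE concrete. The sketch below fits a one-predictor OLS line with the closed-form least-squares formulas and compares it with the mean model. The data and the helper names (fit_simple_ols, sums_of_squares, rmse) are made up for illustration. RMSE is taken here as the square root of SSE divided by n minus the number of fitted parameters, which is one common convention; some texts divide by n instead.

```python
import math
from typing import Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the least-squares line for one predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sums_of_squares(y: Sequence[float], y_hat: Sequence[float]) -> Tuple[float, float]:
    """Return (SST, SSE): squared distance of the data from the mean and from the predictions."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return sst, sse


def rmse(sse: float, n: int, n_params: int = 0) -> float:
    """Square root of SSE / (n - n_params); n_params=0 gives the plain per-observation RMSE."""
    return math.sqrt(sse / (n - n_params))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]  # hypothetical predictor values
    y = [2, 4, 5, 4, 5]  # hypothetical observed outcomes
    intercept, slope = fit_simple_ols(x, y)
    model_pred = [intercept + slope * xi for xi in x]
    mean_pred = [sum(y) / len(y)] * len(y)  # the mean model predicts y-bar everywhere

    sst, sse = sums_of_squares(y, model_pred)
    print("regression model: SST =", sst, "SSE =", sse)
    print("mean model: SST, SSE =", sums_of_squares(y, mean_pred))
    # Intercept and slope are two fitted parameters, so n_params=2 divides by n - 2.
    print("RMSE:", rmse(sse, len(y), n_params=2))
```

For this made-up data, SST is 6 and the regression's SSE is about 2.4. The mean model's SSE equals SST, since its predictions are exactly the mean.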
The difference between SST and SSE is the improvement in prediction from the regression model, compared to the mean model. Dividing that difference by SST gives R-squared. It is the proportional improvement in prediction from the regression model, compared to the mean model. It indicates the goodness of fit of the model.
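In symbols, with observations y_i, model predictions ŷ_i, and sample mean ȳ over n observations (notation introduced here), the quantities just described are:

```latex
\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,

R^2 = \frac{\mathrm{SST} - \mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}.
```

For example, if SST = 6 and SSE = 2.4, then R-squared = (6 − 2.4) / 6 = 3.6 / 6 = 0.6.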
R-squared has the useful property that its scale is intuitive. For an OLS model with an intercept, evaluated on the data it was fit to, it ranges from zero to one: zero indicates that the proposed model does not improve prediction over the mean model, and one indicates perfect prediction. Because SST is fixed for a given data set, R-squared rises linearly as SSE falls, so improvements in the regression model produce proportional increases in R-squared.
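The two endpoints of the scale can be checked directly. This sketch computes R-squared as (SST − SSE) / SST, as described above, for two hypothetical sets of predictions: the mean model and a perfect predictor. The function name r_squared and the data are illustrative.

```python
from typing import Sequence


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Proportional improvement of the predictions y_hat over the mean model."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)  # spread around the mean model
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # spread around the predictions
    return (sst - sse) / sst


if __name__ == "__main__":
    y = [2.0, 4.0, 5.0, 4.0, 5.0]
    mean_pred = [sum(y) / len(y)] * len(y)
    print(r_squared(y, mean_pred))  # mean model, no improvement → 0.0
    print(r_squared(y, y))          # perfect prediction → 1.0
```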