Thursday, 17 August 2023

Address similarity detection using bags of words in Python


Principles

A bag-of-words model makes texts comparable by turning each one into a vector of word counts. The distance between any two vectors can then be computed to measure how close the texts are. Close elements can finally be connected in a graph, which gives a formal relation between them and makes it explicit that they are similar.
 


 

Loading the dataset

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import euclidean_distances
import numpy as np

# I use a corpus that does not contain real addresses, so as not to disclose the dataset I actually used
corpus = pd.read_csv("addresses.csv")

Preprocessing data

Before we convert the data into a bag of words, we need to apply some transformations. Typically, our dataset contains addresses in various countries, and the name of a country can differ depending on the language of the country the address was issued in.
 

  
import re
# Read a file where countries are mapped to their English translation
countries=pd.read_csv("countries.csv",index_col=0)

# Select the countries that need to be substituted
# (.copy() avoids modifying a view of the original dataframe)
replacements=countries[countries.originalcountry!=countries.tobename].copy()
replacements.originalcountry="\\s"+replacements.originalcountry+"[\\s$]"
replacements.tobename=" "+replacements.tobename

replacements=replacements.set_index('originalcountry')
replacements.tobename=replacements.tobename.replace(r"\s","",regex=True)
# The replacement function needs a dictionary, so we extract that representation from the dataframe.
replacements=replacements.tobename.to_dict()
# Replace the company-form abbreviations
replacements.update({"s.a.r.l":"sarl"})
replacements.update({"s.a.u":"sau"})
replacements.update({"s.a.s":"sas"})
replacements.update({"s.a":"sa"})
replacements.update({"limited":"ltd"})


# The lines below prepare the replacement pattern
rep = dict((re.escape(k), v) for k, v in replacements.items())
# Python 3 renamed dict.iteritems to dict.items, so use rep.items() on recent versions
pattern = re.compile("|".join(rep.keys()))
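The compiled pattern still has to be applied to the text. A minimal sketch of that step follows, assuming a small hand-written replacement table instead of the one built from countries.csv; the function name normalize and the sample values are hypothetical. Keys are sorted longest first so that "s.a.r.l" is tried before its prefix "s.a".

```python
import re

# Hypothetical replacement table; the real one is built from countries.csv.
replacements = {
    "allemagne": "germany",
    "s.a.r.l": "sarl",
    "s.a": "sa",
    "limited": "ltd",
}

# Longest keys first, so that "s.a.r.l" wins over its prefix "s.a".
keys = sorted(replacements, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(k) for k in keys))


def normalize(text):
    """Lowercase the text and substitute every known variant in one pass."""
    return pattern.sub(lambda m: replacements[m.group(0)], text.lower())


print(normalize("Dupont S.A.R.L, Berlin, Allemagne"))
```

Because the keys are matched as plain substrings, a real table would probably need word boundaries around short keys to avoid rewriting the inside of longer words.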

    

Converting into bags of words



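# The vectorizer converts the sentences into a bag-of-words representation
vectorizer = CountVectorizer(strip_accents="ascii")
# fit_transform expects an iterable of strings; the text is assumed to be in the first column
features = vectorizer.fit_transform(corpus.iloc[:, 0]).toarray()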
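To make the representation concrete, here is a dependency-free sketch of roughly what the vectorizer computes: one count per vocabulary word for each address. The sample addresses and the helper names tokenize and to_vector are hypothetical, and accent stripping is left out.

```python
import re
from collections import Counter

# Hypothetical sample addresses, not taken from the original dataset.
addresses = [
    "12 rue de la Paix Paris France",
    "12 rue de la paix 75002 Paris",
    "1 Main Street London United Kingdom",
]


def tokenize(text):
    # Lowercase and keep tokens of two or more word characters,
    # similar to the default token pattern of CountVectorizer.
    return re.findall(r"\b\w\w+\b", text.lower())


# The vocabulary is the sorted set of all tokens seen in the corpus.
vocabulary = sorted({tok for addr in addresses for tok in tokenize(addr)})


def to_vector(text):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


vectors = [to_vector(addr) for addr in addresses]
for addr, vec in zip(addresses, vectors):
    print(vec, addr)
```

The first two addresses share most of their words, so their vectors differ in only a few positions, which is what the distance computation exploits.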
  
  

Computing the distance

from sklearn.metrics.pairwise import pairwise_distances

distances = pairwise_distances(features, metric='cosine')
np.save("cosinedistances", distances)
print("Save complete")
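For intuition, the cosine distance between two count vectors can be computed by hand as one minus the cosine of the angle between them. The sketch below uses hypothetical toy vectors; treating an all-zero vector as maximally distant is an assumption made here for simplicity.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: an empty vector is treated as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical count vectors over a four-word vocabulary.
a = [1, 1, 1, 0]
b = [1, 1, 0, 1]
c = [0, 0, 0, 2]

print(cosine_distance(a, b))  # two of three words shared
print(cosine_distance(a, c))  # no word shared
```

Vectors with no word in common are at distance 1, and the distance shrinks towards 0 as the word counts align.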

Identifying the synonyms 
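
One way to realise the graph idea from the Principles section is to link every pair of addresses whose distance falls under a threshold, then read off the connected components. The sketch below is only an illustration under assumptions: the function name group_similar, the 0.3 threshold and the toy distance matrix are hypothetical, and a small union-find structure stands in for a graph library.

```python
def group_similar(distances, threshold):
    """Group indices whose pairwise distance is at most `threshold`.

    Two items end up in the same group when a chain of close pairs
    connects them (connected components of the similarity graph).
    """
    n = len(distances)
    parent = list(range(n))

    def find(i):
        # Follow parent links up to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if distances[i][j] <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical symmetric distance matrix for four addresses.
distances = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]

groups = group_similar(distances, threshold=0.3)
print(groups)
```

The threshold controls how aggressive the grouping is: a larger value merges more addresses, at the risk of chaining unrelated ones together.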

Consideration on performances


 


Monday, 10 October 2022

Getting started with Kubernetes using MicroK8s

Reference site

https://microk8s.io/docs/getting-started

 

Install

sudo snap install microk8s --classic
List the services
microk8s kubectl get services
List the pods
microk8s kubectl get pods
 

Push a Docker image to a registry

microk8s enable registry
 
docker tag d17287c3708f localhost:32000/jupyter:1
docker push localhost:32000/jupyter:1
Create a deployment
microk8s kubectl create deployment jupyter --image=localhost:32000/jupyter:1
Deploy
microk8s kubectl scale deployment jupyter --replicas=1


Dashboard proxy (remotely accessible)
 microk8s dashboard-proxy
 
 microk8s kubectl port-forward -n kube-system service/kubernetes-dashboard 10443:443
 
 
Delete a pod
microk8s kubectl delete pods <podname>

Apply a change

microk8s kubectl apply -f volume.yml
Create a persistent volume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: streamlit-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/home/clement/appstreamlit"
 
 

Friday, 25 September 2020

Mirroring the Ubuntu repository in a corporate Artifactory

 

Configure Artifactory with a remote repository of type Generic.


Provide the URL of the repository you want to mirror and select the type Generic.
If you are behind a proxy, you can select it as well.



     

On the Linux machine that should use this mirror, edit the file /etc/apt/sources.list.

Change the default location from archive.ubuntu.com to your internal mirror.
That's it.
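
The edit to sources.list can also be scripted. Below is a minimal sketch that rewrites the archive.ubuntu.com entries in a text held in memory; the mirror URL and the function name point_to_mirror are hypothetical, and the exact path under your Artifactory instance depends on how the remote repository was named.

```python
import re

# Hypothetical internal mirror URL; replace it with your own remote repository.
MIRROR = "https://artifactory.mycompany.example/artifactory/ubuntu-remote"

# Matches archive.ubuntu.com and its country mirrors such as fr.archive.ubuntu.com.
ARCHIVE = re.compile(r"https?://([a-z]{2}\.)?archive\.ubuntu\.com/ubuntu/?")


def point_to_mirror(sources_text, mirror=MIRROR):
    """Return sources.list content with the default archive replaced by the mirror."""
    return ARCHIVE.sub(mirror.rstrip("/") + "/", sources_text)


sample = (
    "deb http://archive.ubuntu.com/ubuntu focal main restricted\n"
    "deb http://fr.archive.ubuntu.com/ubuntu/ focal-updates main\n"
    "deb http://security.ubuntu.com/ubuntu focal-security main\n"
)

result = point_to_mirror(sample)
print(result)
```

Entries that point elsewhere, such as security.ubuntu.com in this sample, are left untouched; they would need their own remote repository if they must also go through Artifactory.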



Tuesday, 12 May 2020

Analysing discrete correlation

survey <- read.csv("survey-covid.csv", stringsAsFactors = FALSE)
# Keep only the finalized responses (the status column name is assumed here)
surveyFinalized <- survey[survey$Response.Status == "finalized", ]
names(surveyFinalized)[names(surveyFinalized) == "What.is.your.home.situation..you.are.in.living...."] <- "FlatOrHouse"
names(surveyFinalized)[names(surveyFinalized) == "What.is.your.family.situation."] <- "situation"
names(surveyFinalized)[names(surveyFinalized) == "How.worried.are.you.about.the.impact.of.coronavirus.on.you.personally."] <- "anxiety"
View(surveyFinalized)
surveyFinalized$situation[grepl("without child", surveyFinalized$situation, fixed = TRUE)] <- "Alone"
surveyFinalized$situation[grepl("with child", surveyFinalized$situation, fixed = TRUE)] <- "Family"
surveyFinalized$FlatOrHouse[grepl("ppart", surveyFinalized$FlatOrHouse, fixed = TRUE)] <- "Apartment"
surveyFinalized$FlatOrHouse[grepl("Apart", surveyFinalized$FlatOrHouse, fixed = TRUE)] <- "Apartment"
surveyFinalized$FlatOrHouse[grepl("flat", surveyFinalized$FlatOrHouse, fixed = TRUE)] <- "Apartment"
surveyFinalized$FlatOrHouse[grepl("city", surveyFinalized$FlatOrHouse, fixed = TRUE)] <- "Apartment"
surveyFinalized$FlatOrHouse[grepl("country", surveyFinalized$FlatOrHouse, fixed = TRUE)] <- "House"
surveyFinalized$FlatOrHouse[grepl("house", surveyFinalized$FlatOrHouse, fixed = TRUE)] <- "House"

library("ggplot2")

# The order of the levels is assumed
surveyFinalized$anxiety <- factor(surveyFinalized$anxiety, levels = c("Not at all worried", "Not so worried", "Somewhat worried", "Very worried", "Extremely worried"))
relevel(surveyFinalized$anxiety, "Very worried")

(d <- ggplot(surveyFinalized, aes(situation, anxiety)) +
    ggtitle("General anxiety depending on situation") +
    geom_jitter())

(d <- ggplot(surveyFinalized, aes(FlatOrHouse, anxiety)) +
    ggtitle("General anxiety depending on setup") +
    geom_jitter())


Friday, 27 March 2020

Docker campaign

My notes for getting started with Docker:

Configure your internal docker repository

Find your internal Docker Trusted Registry (DTR):
dtr.mycompany.com

In your /etc/docker/daemon.json file, make sure that you trust your registry:
{
    "insecure-registries" : ["dtr.mycompany.com"]
}

curl -k https://dtr.mycompany.com/ca -o /usr/local/share/ca-certificates/dtr.mycompany.com.crt
update-ca-certificates
service docker restart
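
If daemon.json already holds other settings, the registry entry should be merged in rather than overwriting the file. The following sketch shows one way to do that on the file content; the function name trust_registry and the sample "log-driver" setting are hypothetical, and writing the result back to /etc/docker/daemon.json is left out.

```python
import json


def trust_registry(daemon_config_text, registry):
    """Return daemon.json content with `registry` listed in insecure-registries."""
    # An empty or missing file is treated as an empty configuration.
    config = json.loads(daemon_config_text) if daemon_config_text.strip() else {}
    registries = config.setdefault("insecure-registries", [])
    if registry not in registries:
        registries.append(registry)
    return json.dumps(config, indent=4)


# Hypothetical existing configuration with an unrelated setting.
existing = '{"log-driver": "json-file"}'
updated = trust_registry(existing, "dtr.mycompany.com")
print(updated)
```

Running the function twice with the same registry leaves the list unchanged, so it is safe to call from a provisioning script.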

Allow a non-root user to use Docker

usermod -aG docker myuser
su - myuser
newgrp docker

Create a Dockerfile

Create the Dockerfile:

FROM  dtr.mycompany.com/mycompany/httpd:latest
LABEL maintainer="clement_soullard@infosys.com"
EXPOSE 80
COPY dist/ui  /usr/local/apache2/htdocs/
COPY docker/certs/server.crt /usr/local/apache2/conf/
COPY docker/certs/server.key /usr/local/apache2/conf/
COPY docker/httpd.conf /usr/local/apache2/conf/httpd.conf
RUN sed -i \
        -e 's/^#\(Include .*httpd-ssl.conf\)/\1/' \
        -e 's/^#\(LoadModule .*mod_ssl.so\)/\1/' \
        -e 's/^#\(LoadModule .*mod_socache_shmcb.so\)/\1/' \
        conf/httpd.conf

Useful commands

Stopping a container

docker stop specworks-ui

Removing a container

docker rm specworks-ui

Building an image

docker image build -t specworks-ui:1.0 .

Running a container

docker container run --publish 8000:80 --detach --name specworks-ui specworks-ui:1.0

Opening a shell in a container

docker exec -it specworks-ui bash

Creating a network

docker network create -d bridge specworks-net

Copying files into a container (the cp command works recursively)
docker cp docker/httpd.conf specworks-ui:/usr/local/apache2/conf/httpd.conf

Clean up the docker cache

docker system prune -a -f 

Download images through a proxy

mkdir /etc/systemd/system/docker.service.d
Now create a file called /etc/systemd/system/docker.service.d/http-proxy.conf that adds the HTTP_PROXY environment variable:
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:80/"
If you have internal Docker registries that you need to contact without proxying you can specify them via the NO_PROXY environment variable:
Environment="HTTP_PROXY=http://proxy.example.com:80/"
Environment="NO_PROXY=localhost,127.0.0.0/8,docker-registry.somecorporation.com"
Flush changes:
$ sudo systemctl daemon-reload
Verify that the configuration has been loaded:
$ sudo systemctl show --property Environment docker
Environment=HTTP_PROXY=http://proxy.example.com:80/
Restart Docker:
$ sudo systemctl restart docker

 


Thursday, 30 January 2020

Managing entries and passwords in LDAP from the command line

First, install the OpenLDAP client utilities to connect to your LDAP server. LDAP is a protocol, which means you can use the OpenLDAP tools to connect to other LDAP servers, such as Apache DS:

yum install openldap-clients.x86_64

You can change the password using the admin bind account



ldappasswd -H ldap://:10399 -x -D "uid=admin,ou=system" -W -S "uid=bhashya_avula,ou=Persons,dc=mycompany,dc=com"



Or you can use the user account to change the password

ldappasswd -H ldap://:10399 -x -D "uid=xcvvsd,ou=Persons,dc=,dc=com" -W -A -S

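If the account is locked, you can unlock it using Apache DS. First, make sure that your connection displays the operational attributes by enabling the corresponding option in the connection properties. From the command line, the trailing + in the search below requests the same operational attributes: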


ldapsearch -H ldap://localhost:10389 -b "uid=admin,ou=system" -wXXXXX -s sub "(cn=Soullard)" +

Then remove the pwdAccountLockedTime attribute.